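MySQL utf8mb4, errors when saving emoji

I'm trying to save usernames from a service in my MySQL database. These names can contain emoji such as 🍰 (just an example).

After searching a bit, I found this Stack Overflow question, which links to this tutorial. I followed the steps, and everything looks correctly configured.

I have a database (character set and collation set to utf8mb4 / utf8mb4_unicode_ci), a table called TestTable configured the same way, and a "Text" column, also configured the same way (VARCHAR(191), utf8mb4_unicode_ci).

When I try to save emoji, I get an error: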

Example of error for shortcake (🍰):
Warning: #1300 Invalid utf8 character string: 'F09F8D'
Warning: #1366 Incorrect string value: '\xF0\x9F\x8D\xB0' for column 'Text' at row 1

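The only emoji I can save properly is the sun (☀).

Though, to be honest, I didn't try all of them.

Is there something I'm missing in the configuration?

Please note: none of the save tests involved a client application. I use phpMyAdmin to change the values manually and save the data. The proper configuration of the client side is something I will take care of after the server saves emoji correctly.

Another side note: currently, when saving emoji, I either get the error above, or I get no error and the data for Username 🍰 is stored as Username ????. Whether I get an error depends on how I save. When creating/saving through an SQL statement, the value is saved with question marks; when editing inline, it is saved with question marks; when editing with the edit button, I get the error.

Thank you

EDIT 1: Alright, I think I found the problem, but not the solution. It looks like the database-specific variables did not change properly.

When I'm logged in as root on my server and read out the (global) variables:
Query used: SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%';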

+--------------------------+--------------------+
| Variable_name            | Value              |
+--------------------------+--------------------+
| character_set_client     | utf8mb4            |
| character_set_connection | utf8mb4            |
| character_set_database   | utf8mb4            |
| character_set_filesystem | binary             |
| character_set_results    | utf8mb4            |
| character_set_server     | utf8mb4            |
| character_set_system     | utf8               |
| collation_connection     | utf8mb4_unicode_ci |
| collation_database       | utf8mb4_unicode_ci |
| collation_server         | utf8mb4_unicode_ci |
+--------------------------+--------------------+
10 rows in set (0.00 sec)

For my database (in phpMyAdmin, same query), it looks like this:

+--------------------------+--------------------+
| Variable_name            | Value              |
+--------------------------+--------------------+
| character_set_client     | utf8               |
| character_set_connection | utf8mb4            |
| character_set_database   | utf8mb4            |
| character_set_filesystem | binary             |
| character_set_results    | utf8               |
| character_set_server     | utf8               |
| character_set_system     | utf8               |
| collation_connection     | utf8mb4_unicode_ci |
| collation_database       | utf8mb4_unicode_ci |
| collation_server         | utf8mb4_unicode_ci |
+--------------------------+--------------------+

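How can I adjust these settings for the specific database? Also, even though the first set shown above is my default, when I create a new database I get the second set as its settings.

EDIT 2:

Here is my my.cnf file: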

[client]
port=3306
socket=/var/run/mysqld/mysqld.sock
default-character-set = utf8mb4


[mysql]
default-character-set = utf8mb4


[mysqld_safe]
socket=/var/run/mysqld/mysqld.sock


[mysqld]
user=mysql
pid-file=/var/run/mysqld/mysqld.pid
socket=/var/run/mysqld/mysqld.sock
port=3306
basedir=/usr
datadir=/var/lib/mysql
tmpdir=/tmp
lc-messages-dir=/usr/share/mysql
log_error=/var/log/mysql/error.log
max_connections=200
max_user_connections=30
wait_timeout=30
interactive_timeout=50
long_query_time=5
innodb_file_per_table
character-set-client-handshake = FALSE
character-set-server = utf8mb4
collation-server = utf8mb4_unicode_ci


!includedir /etc/mysql/conf.d/

character_set_client, _connection, and _results must all be utf8mb4 for that shortcake to be eatable.

Something, somewhere, is setting a subset of those individually. Rummage through my.cnf and phpmyadmin's settings -- something is not setting all three.

If SET NAMES utf8mb4 is executed, all three are set correctly.
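As a quick way to spot which of the three is off, the rows returned by the SHOW VARIABLES query from the question can be checked programmatically. This is only a sketch in Python: it works on plain (name, value) pairs, so it assumes the rows have already been fetched with whatever client library is in use, and the function name is made up for illustration.

```python
# The three session variables that must agree for 4-byte characters to survive
REQUIRED = (
    "character_set_client",
    "character_set_connection",
    "character_set_results",
)

def mismatched_charset_vars(rows, expected="utf8mb4"):
    """Return the names of the required variables whose value is not `expected`."""
    values = dict(rows)
    return [name for name in REQUIRED if values.get(name) != expected]

# Values copied from the phpMyAdmin output in the question
phpmyadmin_rows = [
    ("character_set_client", "utf8"),
    ("character_set_connection", "utf8mb4"),
    ("character_set_results", "utf8"),
]
print(mismatched_charset_vars(phpmyadmin_rows))
# ['character_set_client', 'character_set_results']
```

For the phpMyAdmin session in the question, this flags the client and results character sets, which matches the symptoms described there.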

The sun shone because it is only 3 bytes (E2 98 80); MySQL's utf8 is sufficient for Unicode characters whose UTF-8 encoding is at most 3 bytes. The shortcake needs 4 bytes (F0 9F 8D B0), which only utf8mb4 can hold.
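The byte counts are easy to confirm outside MySQL. The following Python sketch prints the UTF-8 bytes of both characters and includes a small helper (the name is hypothetical) that reports whether a string contains any character needing 4 bytes, which is to say any code point above U+FFFF.

```python
def needs_utf8mb4(text):
    """True if any character in `text` takes 4 bytes in UTF-8 (code point > U+FFFF)."""
    return any(ord(ch) > 0xFFFF for ch in text)

sun = "\u2600"            # the sun character
shortcake = "\U0001F370"  # the shortcake emoji

print(sun.encode("utf-8").hex(" "))        # e2 98 80
print(shortcake.encode("utf-8").hex(" "))  # f0 9f 8d b0
print(needs_utf8mb4("Username " + sun))        # False
print(needs_utf8mb4("Username " + shortcake))  # True
```

The second line of output is the same byte sequence that appears in the "Incorrect string value" warning in the question.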

It is likely that your service/application is connecting with "utf8" instead of "utf8mb4" for the client character set. That's up to the client application.

For a PHP application see http://php.net/manual/en/function.mysql-set-charset.php or http://php.net/manual/en/mysqli.set-charset.php

For a Python application see https://github.com/PyMySQL/PyMySQL#example or http://docs.sqlalchemy.org/en/latest/dialects/mysql.html#mysql-unicode
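For the SQLAlchemy case, the character set is usually requested through a charset query parameter on the connection URL. The sketch below only builds such a URL with the standard library; the helper name, credentials, host and database name are placeholders, and the mysql+pymysql scheme assumes the PyMySQL driver is the one in use.

```python
from urllib.parse import quote_plus, urlencode

def mysql_url(user, password, host, database, charset="utf8mb4"):
    """Build a SQLAlchemy-style MySQL URL that asks for the given charset."""
    query = urlencode({"charset": charset})
    return (
        f"mysql+pymysql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}/{database}?{query}"
    )

# Placeholder credentials, for illustration only
print(mysql_url("app", "s3cret", "localhost", "testdb"))
# mysql+pymysql://app:s3cret@localhost/testdb?charset=utf8mb4
```

The resulting string would then be passed to SQLAlchemy's create_engine; whether the connection really ends up as utf8mb4 can be confirmed with the SHOW VARIABLES query from the question.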

Also, check that your columns really are utf8mb4. One direct way is like this:

mysql> SELECT character_set_name FROM information_schema.`COLUMNS`  WHERE table_name = "user"   AND column_name = "displayname";
+--------------------+
| character_set_name |
+--------------------+
| utf8mb4            |
+--------------------+
1 row in set (0.00 sec)

For me, it turned out that the problem lay in the mysql client.

The mysql client overrides the character set configured in the server's my.cnf, which resulted in an unintended character set.

So all I needed to do was add character-set-client-handshake = FALSE. It stops the client's setting from overriding my character set configuration.

my.cnf would look like this:

[mysqld]
character-set-client-handshake = FALSE
character-set-server = utf8mb4
...

Hope it helps.

I'm not proud of this answer, because it uses brute force to clean the input. It's brutal, but it works.

function cleanWord($string, $debug = false) {
    $new_string = "";

    // Walk the input one byte at a time; this assumes a single-byte
    // encoding such as Windows-1252, not multi-byte UTF-8.
    for ($i = 0; $i < strlen($string); $i++) {
        $letter = substr($string, $i, 1);
        $code = ord($letter);
        if ($debug) {
            echo "Letter: " . $letter . "<BR>";
            echo "Code: " . $code . "<BR><BR>";
        }

        // Bytes that get a hand-picked HTML entity
        if ($code == 146) {
            $letter = "&acute;";
        } elseif ($code == 233) {
            $letter = "&eacute;";
        } elseif ($code == 147 || $code == 148) {
            $letter = "&quot;";
        } elseif ($code == 151) {
            $letter = "&#8211;";
        } elseif ($code > 127) {
            // Any other non-ASCII byte becomes a numeric entity
            $letter = "&#0" . $code . ";";
        }

        $new_string .= $letter;
    }
    if ($new_string != "") {
        $string = $new_string;
    }
    // optional: turn Windows line breaks into HTML line breaks
    $string = str_replace("\r\n", "<BR>", $string);

    return $string;
}

// clean up the input
$message = cleanWord($message);

// now you can insert it as part of an SQL statement
// (a prepared statement would be safer than addslashes())
$sql = "INSERT INTO tbl_message (`message`)
VALUES ('" . addslashes($message) . "')";
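For comparison, the same escape-to-entities idea can be sketched in Python, where strings are sequences of code points rather than bytes, so a 4-byte emoji becomes one numeric entity. This is a different technique from the PHP function above, not a port of it, and the function name is only illustrative. Like the PHP version, it sidesteps the character set problem instead of fixing it.

```python
def escape_non_ascii(text):
    """Replace every non-ASCII character with its decimal HTML entity."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")

print(escape_non_ascii("Username \U0001F370"))  # Username &#127856;
print(escape_non_ascii("caf\u00e9"))            # caf&#233;
```

The escaped text is pure ASCII, so it can be stored in a column of any character set, at the cost of having to unescape it wherever it is displayed outside HTML.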

ALTER TABLE table_name CHANGE column_name column_name VARCHAR(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NULL DEFAULT NULL;

Example query:

ALTER TABLE `reactions` CHANGE `emoji` `emoji` VARCHAR(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NULL DEFAULT NULL;

After that, I was able to store emoji in the table successfully.

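Consider adding

init_connect = 'SET NAMES utf8mb4'

to the my.cnf of each of your database servers.

(Still, clients can, and so will, override it. Note also that init_connect is not executed for users with the SUPER privilege, such as root.)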

Symfony 5 answer

Although this is not what was asked, people may end up here after searching the web for the same problem in Symfony.

1. Configure MySQL properly

☝️ See (and upvote if helpful) top answers here.

2. Change your Doctrine configuration

/config/packages/doctrine.yaml

doctrine:
    dbal:
        ...
        charset: utf8mb4

I was importing data with this command:

LOAD DATA LOCAL INFILE 'abc.csv' INTO TABLE abc
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'
IGNORE 1 LINES
(col1, col2, col3, col4, col5...);

This didn't work for me:

SET NAMES utf8mb4;

I had to add the CHARACTER SET clause to make it work:

LOAD DATA LOCAL INFILE
'E:\\wamp\\tmp\\customer.csv' INTO TABLE `customer`
CHARACTER SET 'utf8mb4'
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'
IGNORE 1 LINES;

Note that the target column must also be utf8mb4, not utf8; otherwise the import will store question marks like "?????" (without raising any error, though).

For CodeIgniter users: make sure the character set and collation settings in database.php are set properly. This worked for me.

$db['default']['char_set'] = 'utf8mb4';
$db['default']['dbcollat'] = 'utf8mb4_unicode_ci';