“编码 UTF-8时无法映射字符”错误

小开

The following compiles for me:

class E{
String s = "^(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z])(?=.*[~#;:?/@&!\"'%*=¼.,-])(?=[^\\s]+$).{8,24}$";
}

See:

enter image description here

小开

The Java compiler assumes that your input is UTF-8 encoded, either because you specified it to be or because it's your platform default encoding.

However, the data in your .java files is not actually encoded in UTF-8. The problem is probably the ¬ character. Make sure your editor (or IDE) of choice actually safes its file in UTF-8 encoding.

小开

The compiler is using the UTF-8 character encoding to read your source file. But the file must have been written by an editor using a different encoding. Open your file in an editor set to the UTF-8 encoding, fix the quote mark, and save it again.

Alternatively, you can find the Unicode point for the character and use a Unicode escape in the source code. For example, the character A can be replaced with the Unicode escape \u0041.

By the way, you don't need to use the begin- and end-line anchors ^ and $ when using the matches() method. The entire sequence must be matched by the regular expression when using the matches() method. The anchors are only useful with the find() method.

小开

You have encoding problem with your sourcecode file. It is maybe ISO-8859-1 encoded, but the compiler was set to use UTF-8. This will results in errors when using characters, which will not have the same bytes representation in UTF-8 and ISO-8859-1. This will happen to all characters which are not part of ASCII, for example ¬ NOT SIGN.

You can simulate this with the following program. It just uses your line of source code and generates a ISO-8859-1 byte array and decode this "wrong" with UTF-8 encoding. You can see at which position the line gets corrupted. I added 2 spaces at your source code to fit position 74 to fit this to ¬ NOT SIGN, which is the only character, which will generate different bytes in ISO-8859-1 encoding and UTF-8 encoding. I guess this will match indentation with the real source file.

 String reg = "      String reg = \"^(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z])(?=.*[~#;:?/@&!\"'%*=¬.,-])(?=[^\\s]+$).{8,24}$\";";
String corrupt=new String(reg.getBytes("ISO-8859-1"),"UTF-8");
System.out.println(corrupt+": "+corrupt.charAt(74));
System.out.println(reg+": "+reg.charAt(74));

which results in the following output (messed up because of markup):

String reg = "^(?=.[0-9])(?=.[a-z])(?=.[A-Z])(?=.[~#;:?/@&!"'%*=�.,-])(?=[^\s]+$).{8,24}$";: �

String reg = "^(?=.[0-9])(?=.[a-z])(?=.[A-Z])(?=.[~#;:?/@&!"'%*=¬.,-])(?=[^\s]+$).{8,24}$";: ¬

See "live" at https://ideone.com/ShZnB

To fix this, save the source files with UTF-8 encoding.

小开

"error: unmappable character for encoding UTF-8" means, java has found a character which is not representing in UTF-8. Hence open the file in an editor and set the character encoding to UTF-8. You should be able to find a character which is not represented in UTF-8.Take off this character and recompile.

小开

I'm in the process of setting up a CI build server on a Linux box for a legacy system started in 2000. There is a section that generates a PDF that contains non-UTF8 characters. We are in the final steps of a release, so I cannot replace the characters giving me grief, yet for Dilbertesque reasons, I cannot wait a week to solve this issue after the release. Fortunately, the "javac" command in Ant has an "encoding" parameter.

 <javac destdir="${classes.dir}" classpathref="production-classpath" debug="on"
includeantruntime="false" source="${java.level}" target="${java.level}"


encoding="iso-8859-1">


<src path="${production.dir}" />
</javac>

小开

In eclipse try to go to file properties (Alt+Enter) and change the Resource → 'Text File encoding' → Other to UTF-8. Reopen the file and check there will be junk character somewhere in the string/file. Remove it. Save the file.

Change the encoding Resource → 'Text File encoding' back to Default.

Compile and deploy the code.

小开

Thanks Michael Konietzka (https://stackoverflow.com/a/4996583/1019307) for your answer.

I did this in Eclipse / STS:

Preferences > General > Content Types > Selected "Text"
(which contains all types such as CSS, Java Source Files, ...)
Added "UTF-8" to the default encoding box down the bottom and hit 'Add'

Bingo, error gone!

小开

I observed this issue while using Eclipse. I needed to add encoding in my pom.xml file and it resolved. http://ctrlaltsolve.blogspot.in/2015/11/encoding-properties-in-maven.html

小开

For IntelliJ users, this is pretty easy once you find out what the original encoding was. You can select the encoding from the bottom right corner of your Window, you will be prompted with a dialog box saying:

The encoding you've chosen ('[encoding type]') may change the contents of '[Your file]'. Do you want to reload the file from disk or convert the text and save in the new encoding?

So if you happen to have a few characters saved in some odd encoding, what you should do is first select 'Reload' to load the file all in the encoding of the bad characters. For me this turned the ? characters into their proper value.

IntelliJ can tell if you most likely did not pick the right encoding and will warn you. Revert back and try again.

Once you can see the bad characters go away, change the encoding select box in the bottom right corner back to the format you originally intended (if you are Googling this error message, that will likely be UTF-8). This time select the 'Convert' button on the dialog.

For me, I needed to reload as 'windows-1252', then convert back to 'UTF-8'. The offending characters were single quotes (‘ and ’) likely pasted in from a Word doc (or e-mail) with the wrong encoding, and the above actions will convert them to UTF-8.

小开

I had the similar issue and I fix with the down corner of my IntelliJ.

I changed it from LF to CRLF.

Here is how it looks the down corner of the IntelliJ:

IntelliJ_image