Easy way to remove accents from a Unicode string?

I want to change this sentence:

Et ça sera sa moitié.

To:

Et ca sera sa moitie.

Is there an easy way to do this in Java, like I would do in Objective-C?

NSString *str = @"Et ça sera sa moitié.";
NSData *data = [str dataUsingEncoding:NSASCIIStringEncoding allowLossyConversion:YES];
NSString *newStr = [[NSString alloc] initWithData:data encoding:NSASCIIStringEncoding];


小开

Best answer

I finally solved it using the java.text.Normalizer class:

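import java.text.Normalizer;

public static String stripAccents(String s)
{
    // Decompose each accented character into its base letter plus combining marks (NFD).
    s = Normalizer.normalize(s, Normalizer.Form.NFD);
    // Remove the combining diacritical marks, leaving only the base letters.
    s = s.replaceAll("[\\p{InCombiningDiacriticalMarks}]", "");
    return s;
}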

小开

Assuming you are using Java 6 or newer, take a look at java.text.Normalizer, which can decompose accented characters into a base letter plus combining marks; you can then use a regex to strip the combining marks.

Otherwise, you should be able to achieve the same result using ICU4J.
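To make the decompose-then-strip idea concrete, here is a minimal sketch using only java.text.Normalizer. The class name DecompositionDemo and its helper methods are hypothetical names chosen for illustration, and the accented characters are written as Unicode escapes so the result does not depend on the source file encoding.

```java
import java.text.Normalizer;

final class DecompositionDemo {
    // NFD splits a precomposed character into its base letter plus combining marks.
    static String decompose(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    // After decomposition, drop every character in the Combining Diacritical Marks block.
    static String stripAccents(String s) {
        return decompose(s).replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }

    public static void main(String[] args) {
        String input = "moiti\u00e9"; // "moitie" with an acute accent on the final e
        String nfd = decompose(input);
        // The decomposed form is one char longer: 'e' followed by U+0301.
        System.out.println(input.length() + " -> " + nfd.length()); // prints "6 -> 7"
        System.out.println(stripAccents(input));                    // prints "moitie"
    }
}
```

The extra character in the decomposed string is the combining acute accent, which is exactly what the regex removes.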

小开

Maybe the easiest and safest way is to use StringUtils from Apache Commons Lang:

StringUtils.stripAccents(String input)

Removes diacritics (~= accents) from a string. The case will not be altered. For instance, 'à' will be replaced by 'a'. Note that ligatures will be left as is.

StringUtils.stripAccents()
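The note about ligatures can be illustrated without the library. The sketch below does not call StringUtils; it uses the plain java.text.Normalizer approach from this thread, on the assumption that it behaves similarly for these inputs. The class name LigatureDemo is hypothetical. Characters with no canonical decomposition, such as the ligature U+0153 or the letter U+00F8, pass through unchanged.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

final class LigatureDemo {
    // Precompiled pattern for the Combining Diacritical Marks block (U+0300 to U+036F).
    private static final Pattern MARKS =
            Pattern.compile("\\p{InCombiningDiacriticalMarks}+");

    // Normalizer-based stripping; this is NOT the Commons Lang implementation.
    static String strip(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return MARKS.matcher(decomposed).replaceAll("");
    }

    public static void main(String[] args) {
        // U+00E0 (a with grave) decomposes, so the accent is removed.
        System.out.println(strip("\u00e0").equals("a"));
        // U+0153 (oe ligature) has no decomposition, so it is left as is.
        System.out.println(strip("\u0153uvre").equals("\u0153uvre"));
        // U+00F8 (o with stroke) also has no decomposition and survives.
        System.out.println(strip("\u00f8").equals("\u00f8"));
    }
}
```

If such characters must be mapped to ASCII as well, they need an explicit replacement step on top of accent stripping.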

小开

The only difference from the accepted solution is that I use a + quantifier instead of a [] character class. Both work, but it is worth having this variant here as well.

String normalized = Normalizer.normalize(input, Normalizer.Form.NFD);
String accentRemoved = normalized.replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
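As a quick check that the two regex variants agree, here is a small sketch comparing them on input with a single accent and on input where one letter carries two stacked marks. The class name RegexVariants and its method names are hypothetical, chosen only for this illustration.

```java
import java.text.Normalizer;

final class RegexVariants {
    // Variant from the accepted answer: character class, removes one mark per match.
    static String withCharClass(String s) {
        String nfd = Normalizer.normalize(s, Normalizer.Form.NFD);
        return nfd.replaceAll("[\\p{InCombiningDiacriticalMarks}]", "");
    }

    // Variant from this answer: + quantifier, removes a run of marks per match.
    static String withPlus(String s) {
        String nfd = Normalizer.normalize(s, Normalizer.Form.NFD);
        return nfd.replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }

    public static void main(String[] args) {
        // U+1EC7 is 'e' with both a circumflex and a dot below (two combining marks in NFD).
        String stacked = "Vi\u1ec7t";
        System.out.println(withCharClass(stacked)); // prints "Viet"
        System.out.println(withPlus(stacked));      // prints "Viet"
    }
}
```

The results are the same because replaceAll replaces every match; the + form simply consumes consecutive marks in a single match instead of one match per mark.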

小开

For Kotlin:

import java.text.Normalizer

fun stripAccents(s: String): String {
    // Decompose accented characters (NFD), then remove the combining marks.
    val decomposed = Normalizer.normalize(s, Normalizer.Form.NFD)
    return Regex("\\p{InCombiningDiacriticalMarks}+").replace(decomposed, "")
}