Most efficient way to make the first character of a String lower case?

What is the most efficient way to make the first character of a String lower case?

I can think of a number of ways to do this:

Using charAt() with substring()

String input   = "SomeInputString";
String output  = Character.toLowerCase(input.charAt(0)) +
                 (input.length() > 1 ? input.substring(1) : "");

Or using a char array

String input  = "SomeInputString";
char c[]      = input.toCharArray();
c[0]          = Character.toLowerCase(c[0]);
String output = new String(c);

I am sure there are many other good ways to achieve this. What do you recommend?

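Strings in Java are immutable, so either way a new string will be created.

Your first example will probably be slightly more efficient, because it only needs to create a new string and not a temporary character array.

When it comes to string manipulation, take a look at Jakarta Commons Lang StringUtils.

Despite the char-oriented approaches above, I would suggest a String-oriented solution. String.toLowerCase is locale-specific, so I would take this issue into account. According to the Character.toLowerCase documentation, String.toLowerCase is to be preferred for lowercasing. Also, a char-oriented solution is not fully Unicode compatible, because Character.toLowerCase(char) cannot handle supplementary characters.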

public static final String uncapitalize(final String originalStr,
        final Locale locale) {
    final int splitIndex = 1;
    final String result;
    if (originalStr.isEmpty()) {
        result = originalStr;
    } else {
        final String first = originalStr.substring(0, splitIndex).toLowerCase(
                locale);
        final String rest = originalStr.substring(splitIndex);
        final StringBuilder uncapStr = new StringBuilder(first).append(rest);
        result = uncapStr.toString();
    }
    return result;
}

UPDATE: As an example of how important the locale setting is, let us lowercase I in Turkish and German:

System.out.println(uncapitalize("I", new Locale("tr", "TR")));
System.out.println(uncapitalize("I", new Locale("de", "DE")));

will produce two different results (a dotless ı for Turkish, a regular i for German):

ı
i
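To illustrate the point about supplementary characters, here is a minimal sketch of a variant that lowercases the first code point rather than the first char. The class name `CodePointUncapitalize` is hypothetical and not part of the answer above; the sketch assumes that treating null and empty input as a no-op is acceptable.

```java
import java.util.Locale;

final class CodePointUncapitalize {

    // Lowercases the first code point (not just the first char) using the given locale.
    static String uncapitalize(String s, Locale locale) {
        if (s == null || s.isEmpty()) {
            return s;
        }
        // Index just past the first code point: 1 for a BMP character,
        // 2 when the string starts with a surrogate pair.
        int end = s.offsetByCodePoints(0, 1);
        return s.substring(0, end).toLowerCase(locale) + s.substring(end);
    }

    public static void main(String[] args) {
        System.out.println(uncapitalize("SomeInputString", Locale.ROOT)); // someInputString

        // U+10400 (DESERET CAPITAL LETTER LONG I) is stored as a surrogate pair.
        String deseret = new String(Character.toChars(0x10400)) + "bc";
        String lowered = uncapitalize(deseret, Locale.ROOT);
        // Its lowercase counterpart is U+10428.
        System.out.println(Integer.toHexString(lowered.codePointAt(0)));
    }
}
```

Using `substring(0, 1)` on such input would split the surrogate pair, so the first code point would not be lowercased.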

I found a nice alternative if you don't want to use a third-party library:

import java.beans.Introspector;


Assert.assertEquals("someInputString", Introspector.decapitalize("SomeInputString"));
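One thing worth knowing before relying on this: Introspector.decapitalize follows JavaBeans naming rules, so it leaves the input unchanged when the first two characters are both upper case. A small sketch of that behavior (the class name `IntrospectorDemo` is just for illustration):

```java
import java.beans.Introspector;

final class IntrospectorDemo {

    public static void main(String[] args) {
        // Ordinary case: only the first character is lowercased.
        System.out.println(Introspector.decapitalize("SomeInputString")); // someInputString

        // First two characters upper case: returned unchanged.
        System.out.println(Introspector.decapitalize("URL")); // URL
        System.out.println(Introspector.decapitalize("HI"));  // HI

        // A single upper-case character is still lowercased.
        System.out.println(Introspector.decapitalize("X")); // x
    }
}
```

If "HI" should become "hI" in your use case, this method is probably not the right fit.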

If what you need is very simple (e.g. Java class names, no locales), you can also use the CaseFormat class from the Google Guava library.

String converted = CaseFormat.UPPER_CAMEL.to(CaseFormat.LOWER_CAMEL, "FooBar");
assertEquals("fooBar", converted);

Or you can prepare and reuse a converter object, which could be more efficient.

Converter<String, String> converter=
CaseFormat.UPPER_CAMEL.converterTo(CaseFormat.LOWER_CAMEL);


assertEquals("fooBar", converter.convert("FooBar"));

To better understand the philosophy behind Google Guava's string manipulation, check out this wiki page.

If you want to use Apache Commons, you can do the following:

import org.apache.commons.lang3.text.WordUtils;
[...]
String s = "SomeString";
String firstLower = WordUtils.uncapitalize(s);

Result: someString

String testString = "SomeInputString";
String firstLetter = testString.substring(0,1).toLowerCase();
String restLetters = testString.substring(1);
String resultString = firstLetter + restLetters;

I came across this only today and tried to do it myself in the most pedestrian way. It took one line, though a longish one. Here goes:

String str = "TaxoRank";


System.out.println(" Before str = " + str);


str = str.replaceFirst(str.substring(0,1), str.substring(0,1).toLowerCase());


System.out.println(" After str = " + str);

Gives:

 Before str = TaxoRank

 After str = taxoRank
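One caveat with the one-liner above: String.replaceFirst treats its first argument as a regular expression and gives `$` and `\` special meaning in the replacement. A hedged sketch of a safer variant, assuming the same idea is kept and only the quoting is added (`ReplaceFirstDemo` and `lowerFirst` are hypothetical names):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ReplaceFirstDemo {

    // Same idea as the one-liner, but the first character is matched literally.
    static String lowerFirst(String str) {
        if (str == null || str.isEmpty()) {
            return str;
        }
        String first = str.substring(0, 1);
        return str.replaceFirst(
                Pattern.quote(first),                           // no regex metacharacters
                Matcher.quoteReplacement(first.toLowerCase())); // no $ or \ group references
    }

    public static void main(String[] args) {
        System.out.println(lowerFirst("TaxoRank")); // taxoRank
        // Unquoted, a leading "(" would be an invalid pattern.
        System.out.println(lowerFirst("(Draft)"));  // (Draft)
        System.out.println(lowerFirst("$Total"));   // $Total
    }
}
```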

A very short and simple static method to achieve what you want:

public static String decapitalizeString(String string) {
    return string == null || string.isEmpty() ? "" : Character.toLowerCase(string.charAt(0)) + string.substring(1);
}

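I tested the promising approaches using JMH.

Assumption during the tests (to avoid checking the corner cases every time): the input String length is always greater than 1.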

Results

Benchmark           Mode  Cnt         Score        Error  Units
MyBenchmark.test1  thrpt   20  10463220.493 ± 288805.068  ops/s
MyBenchmark.test2  thrpt   20  14730158.709 ± 530444.444  ops/s
MyBenchmark.test3  thrpt   20  16079551.751 ±  56884.357  ops/s
MyBenchmark.test4  thrpt   20   9762578.446 ± 584316.582  ops/s
MyBenchmark.test5  thrpt   20   6093216.066 ± 180062.872  ops/s
MyBenchmark.test6  thrpt   20   2104102.578 ±  18705.805  ops/s

The scores are operations per second; higher is better.

Tests

  1. test1 was Andy's first approach (also Hllink's):

    string = Character.toLowerCase(string.charAt(0)) + string.substring(1);
    
  2. test2 was Andy's second approach. It is also Introspector.decapitalize() as suggested by Daniel, but without its two if statements. The first if was removed because of the testing assumption. The second was removed because it violated correctness (i.e. the input "HI" would return "HI"). This was almost the fastest.

    char c[] = string.toCharArray();
    c[0] = Character.toLowerCase(c[0]);
    string = new String(c);
    
  3. test3 was a modification of test2, but instead of calling Character.toLowerCase() I added 32, which works correctly only if the first character is an upper-case ASCII letter. This was the fastest. c[0] |= ' ' from Mike's comment gave the same performance.

    char c[] = string.toCharArray();
    c[0] += 32;
    string = new String(c);
    
  4. test4 used StringBuilder.

    StringBuilder sb = new StringBuilder(string);
    sb.setCharAt(0, Character.toLowerCase(sb.charAt(0)));
    string = sb.toString();
    
  5. test5 used two substring() calls.

    string = string.substring(0, 1).toLowerCase() + string.substring(1);
    
  6. test6 uses reflection to change char value[] directly in String. This was the slowest.

    try {
        Field field = String.class.getDeclaredField("value");
        field.setAccessible(true);
        char[] value = (char[]) field.get(string);
        value[0] = Character.toLowerCase(value[0]);
    } catch (IllegalAccessException e) {
        e.printStackTrace();
    } catch (NoSuchFieldException e) {
        e.printStackTrace();
    }
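
For the ASCII-only shortcut used in test3, a guarded variant might look like the sketch below. It is not one of the benchmarked tests, and the names `AsciiLowerFirst` and `lowerFirstAscii` are hypothetical; the range check is an added assumption so that input which does not start with A-Z is left alone.

```java
final class AsciiLowerFirst {

    // Lowercases the first character only when it is an ASCII upper-case letter.
    static String lowerFirstAscii(String s) {
        if (s == null || s.isEmpty()) {
            return s;
        }
        char[] c = s.toCharArray();
        if (c[0] >= 'A' && c[0] <= 'Z') {
            c[0] |= ' '; // sets bit 0x20: 'S' (0x53) becomes 's' (0x73)
        }
        return new String(c);
    }

    public static void main(String[] args) {
        System.out.println(lowerFirstAscii("SomeInputString")); // someInputString
        System.out.println(lowerFirstAscii("someInputString")); // someInputString
        System.out.println(lowerFirstAscii("1stPlace"));        // 1stPlace

        // Unguarded "+= 32" on an already lower-case letter leaves the ASCII range.
        char ch = 's';
        ch += 32;
        System.out.println((int) ch); // 147
    }
}
```

The extra comparison costs a little, so whether it keeps the speed advantage measured for test3 would need its own benchmark.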
    

Conclusions

If the String length is always greater than 0, use test2.

If not, we have to check the corner cases:

public static String decapitalize(String string) {
    if (string == null || string.length() == 0) {
        return string;
    }

    char c[] = string.toCharArray();
    c[0] = Character.toLowerCase(c[0]);

    return new String(c);
}

If you are sure that your text will always be in ASCII, and you are looking for extreme performance because you found this code to be a bottleneck, use test3.

val str = "Hello"
s"${str.head.toLower}${str.tail}"

Result:

res4: String = hello