迭代 String 中所有字符的最快方法

在 Java 中,遍历字符串中所有字符的最快方法是:

String str = "a really, really long string";
for (int i = 0, n = str.length(); i < n; i++) {
char c = str.charAt(i);
}

或者这样:

char[] chars = str.toCharArray();
for (int i = 0, n = chars.length; i < n; i++) {
char c = chars[i];
}

编辑:

我想知道的是,在长时间的迭代过程中重复调用 charAt方法的成本是否比在开始时执行一次对 toCharArray的调用,然后在迭代过程中直接访问数组的成本低或者高。

如果有人能够为不同的字符串长度提供一个健壮的基准测试,考虑到 JIT 预热时间、 JVM 启动时间等等,而不仅仅是对 System.currentTimeMillis()的两次调用之间的差异,那就太好了。

286975 次浏览

第一个使用 str.charAt的应该更快。

如果深入研究 String类的源代码,我们可以看到 charAt的实现如下:

public char charAt(int index) {
if ((index < 0) || (index >= count)) {
throw new StringIndexOutOfBoundsException(index);
}
return value[index + offset];
}

在这里,它所做的只是索引一个数组并返回值。

现在,如果我们看到 toCharArray的实施,我们会发现以下内容:

public char[] toCharArray() {
char result[] = new char[count];
getChars(0, count, result, 0);
return result;
}


public void getChars(int srcBegin, int srcEnd, char dst[], int dstBegin) {
if (srcBegin < 0) {
throw new StringIndexOutOfBoundsException(srcBegin);
}
if (srcEnd > count) {
throw new StringIndexOutOfBoundsException(srcEnd);
}
if (srcBegin > srcEnd) {
throw new StringIndexOutOfBoundsException(srcEnd - srcBegin);
}
System.arraycopy(value, offset + srcBegin, dst, dstBegin,
srcEnd - srcBegin);
}

正如你所看到的,它正在做一个 System.arraycopy,这肯定会比不做慢一点。

第二种方法会创建一个新的字符数组,并将 String 中的所有字符复制到这个新的字符数组中,因此我猜测第一种方法会更快(并且不那么占用内存)。

看起来两者都不是更快或更慢

    public static void main(String arguments[]) {




//Build a long string
StringBuilder sb = new StringBuilder();
for(int j = 0; j < 10000; j++) {
sb.append("a really, really long string");
}
String str = sb.toString();
for (int testscount = 0; testscount < 10; testscount ++) {




//Test 1
long start = System.currentTimeMillis();
for(int c = 0; c < 10000000; c++) {
for (int i = 0, n = str.length(); i < n; i++) {
char chr = str.charAt(i);
doSomethingWithChar(chr);//To trick JIT optimistaion
}
}


System.out.println("1: " + (System.currentTimeMillis() - start));


//Test 2
start = System.currentTimeMillis();
char[] chars = str.toCharArray();
for(int c = 0; c < 10000000; c++) {
for (int i = 0, n = chars.length; i < n; i++) {
char chr = chars[i];
doSomethingWithChar(chr);//To trick JIT optimistaion
}
}
System.out.println("2: " + (System.currentTimeMillis() - start));
System.out.println();
}




}




public static void doSomethingWithChar(char chr) {
int newInt = chr << 2;
}

对于长字符串,我将选择第一个。为什么复制周围的长字符串? 文件显示:

Public char [] toCharArray () 将此字符串转换为新的字符数组。

返回: 新分配的字符数组,其长度为该字符串的长度,其内容被初始化为包含由该字符串表示的字符序列。

编辑1

我改变了测试来欺骗 JIT 优化。

//编辑2

重复测试10次,让 JVM 预热。

//编辑3

结论:

首先,str.toCharArray();在内存中复制整个字符串。对于长字符串,它可能会占用内存。方法 String.charAt( )在字符串类检查索引内的字符数组中查找字符。 由于这个索引检查,看起来足够短的 String first 方法(即 chatAt方法)有点慢。但是如果 String 足够长,复制整个 char 数组就会变慢,而第一个方法会更快。字符串越长,toCharArray执行的速度越慢。尝试改变限制在 for(int j = 0; j < 10000; j++)循环看到它。 如果我们让 JVM 预热代码运行得更快,但比例是相同的。

毕竟这只是微观优化。

这只是你不用担心的微观优化。

char[] chars = str.toCharArray();

返回 str字符数组的副本(在 JDK 中,它通过调用 System.arrayCopy返回字符的副本)。

除此之外,str.charAt()只检查索引是否确实在边界内,并返回数组索引中的字符。

第一个不在 JVM 中创建额外的内存。

首次更新: 在生产环境中尝试之前(不建议) ,请先阅读以下 http://www.javaspecialists.eu/archive/issue237.html 从 Java9开始,所描述的解决方案将不再有效,因为现在 Java 默认将字符串存储为 byte []。

第二次更新: 截至2016-10-25年,在我的 AMDx648core 和 source 1.8上,使用“ charAt”和字段访问没有区别。看起来,jvm 已经进行了充分的优化,可以内联和流线化任何“ string.charAt (n)”调用。

第三次更新: 截至2020-09-07年,在我的 Ryzen 1950-X 16核心和源1.14上,‘ charAt1’比字段访问慢9倍,‘ charAt2’比字段访问慢4倍。现场访问重新成为明确的赢家。注意,程序需要对 Java9 + 版本的 jvms 使用 byte []访问。

这完全取决于被检查的 String的长度。如果正如问题所述,它是针对 很长字符串的,那么检查字符串的最快方法是使用反射来访问字符串的后台 char[]

使用 JDK 8(win32和 win64)的一个完全随机的基准测试,在64 AMD Phenom II 4核心955@3.2 GHZ (在客户机模式和服务器模式下)上使用9种不同的技术(见下文结果表明,对于小字符串,使用 String.charAt(n)是最快的,对于大字符串,使用 reflection访问 String 备份数组的速度几乎是两倍。

实验

  • 试验了9种不同的优化技术。

  • 所有字符串内容都是随机的

  • 字符串大小的测试是从0、1、2、4、8、16等开始,以2的倍数进行的。

  • 每个字符串大小测试1000次

  • 每次测试都被随机排序。换句话说,每次测试都是以随机顺序进行的,超过1000次。

  • 整个测试套件是向前和向后完成的,以显示 JVM 预热对优化和时间的影响。

  • 整个套件完成两次,一次在 -client模式,另一次在 -server模式。

结论

- 客户端模式(32位)

对于字符串 长度为1至256个字符,调用 string.charAt(i)获胜,平均每秒处理1340万到5.88亿个字符。

此外,它总体上比客户机快5.5% ,比服务器快13.9% ,如下所示:

    for (int i = 0; i < data.length(); i++) {
if (data.charAt(i) <= ' ') {
doThrow();
}
}

使用局部最终长度变量:

    final int len = data.length();
for (int i = 0; i < len; i++) {
if (data.charAt(i) <= ' ') {
doThrow();
}
}

对于长字符串 512到256K 字符长度,使用反射访问 String 的后备数组是最快的。这项技术的速度几乎是作为 String.charAt (i)(快了178%)。在这个范围内的平均速度是每秒11.11亿个字符。

字段必须提前获取,然后可以在库中对不同的字符串进行重用。有趣的是,与上面的代码不同,使用 Field 访问时,使用本地 final length 变量比在循环检查中使用‘ chars.length’快9% 。下面是如何以最快的速度设置字段访问:

   final Field field = String.class.getDeclaredField("value");
field.setAccessible(true);


try {
final char[] chars = (char[]) field.get(data);
final int len = chars.length;
for (int i = 0; i < len; i++) {
if (chars[i] <= ' ') {
doThrow();
}
}
return len;
} catch (Exception ex) {
throw new RuntimeException(ex);
}

服务器模式下的特殊注释

在我的 AMD 64机器上的64位 Java 机器上,在服务器模式下获得32个字符长度字符串后,字段访问开始获胜。在客户端模式下,直到512个字符长度时才看到这一点。

同样值得注意的是,当我在服务器模式下运行 JDK 8(32位构建)时,大字符串和小字符串的整体性能都要低7% 。这是2013年12月121日构建的 JDK 8的早期版本。因此,目前看来,32位服务器模式比32位客户端模式慢。

也就是说... ... 似乎唯一值得调用的服务器模式是在64位机器上。否则它实际上会妨碍性能。

对于运行在 AMD64上的 -server mode中的32位构建,我可以这样说:

  1. CharAt (i)显然是总体上的赢家。虽然大小在8到512个字符之间,但是在“新”“重用”和“字段”之间还是有赢家的。
  2. CharAt (i)在客户端模式下要快45%
  3. 在客户端模式下,大字符串的字段访问速度是大字符串的两倍。

同样值得一提的是,String.chars ()(Stream 和并行版本)是一个失败的例子。比其他方法慢多了。使用 Streams API 执行一般字符串运算是一种相当缓慢的方法。

愿望清单

Java String 可以有谓词来接受优化的方法,比如包含(谓词)、 forEvery (消费者)、 forEachWithIndex (消费者)。因此,用户不需要知道对 String 方法的长度或重复调用,这些方法可以帮助解析库 beep-beep beep加速。

继续做梦:)

快乐弦乐队!

~ SH

该测试使用了以下9种方法来测试字符串是否存在空格:

“ charAt1”——按照常规方法检查字符串内容:

int charAtMethod1(final String data) {
final int len = data.length();
for (int i = 0; i < len; i++) {
if (data.charAt(i) <= ' ') {
doThrow();
}
}
return len;
}

“ charAt2”——与上面相同,但是使用 String.LENGTh ()代替为长度创建最终的局部 int

int charAtMethod2(final String data) {
for (int i = 0; i < data.length(); i++) {
if (data.charAt(i) <= ' ') {
doThrow();
}
}
return data.length();
}

“ stream”——使用新的 JAVA-8字符串的 IntStream 并将谓词传递给它来执行检查

int streamMethod(final String data, final IntPredicate predicate) {
if (data.chars().anyMatch(predicate)) {
doThrow();
}
return data.length();
}

“ stream Para”——和上面一样,但是哦-啦-啦-去平行! ! !

// avoid this at all costs
int streamParallelMethod(final String data, IntPredicate predicate) {
if (data.chars().parallel().anyMatch(predicate)) {
doThrow();
}
return data.length();
}

“重用”——用字符串内容填充可重用的 char []

int reuseBuffMethod(final char[] reusable, final String data) {
final int len = data.length();
data.getChars(0, len, reusable, 0);
for (int i = 0; i < len; i++) {
if (reusable[i] <= ' ') {
doThrow();
}
}
return len;
}

“ new1”——从字符串中获取 char []的新副本

int newMethod1(final String data) {
final int len = data.length();
final char[] copy = data.toCharArray();
for (int i = 0; i < len; i++) {
if (copy[i] <= ' ') {
doThrow();
}
}
return len;
}

“ new2”——和上面一样,但用“ FOR-EACH”

int newMethod2(final String data) {
for (final char c : data.toCharArray()) {
if (c <= ' ') {
doThrow();
}
}
return data.length();
}

“ FIELD 1”——花哨! ! 获取字段以访问字符串的内部字符[]

int fieldMethod1(final Field field, final String data) {
try {
final char[] chars = (char[]) field.get(data);
final int len = chars.length;
for (int i = 0; i < len; i++) {
if (chars[i] <= ' ') {
doThrow();
}
}
return len;
} catch (Exception ex) {
throw new RuntimeException(ex);
}
}

“ field 2”——和上面一样,但使用“ FOR-EACH”

int fieldMethod2(final Field field, final String data) {
final char[] chars;
try {
chars = (char[]) field.get(data);
} catch (Exception ex) {
throw new RuntimeException(ex);
}
for (final char c : chars) {
if (c <= ' ') {
doThrow();
}
}
return chars.length;
}

客户端 -client模式的组合结果(正向和反向测试组合)

注意: Java32位的-client 模式和 Java64位的-server 模式与我的 AMD64机器上的相同。

Size     WINNER  charAt1 charAt2  stream streamPar   reuse    new1    new2  field1  field2
1        charAt    77.0     72.0   462.0     584.0   127.5    89.5    86.0   159.5   165.0
2        charAt    38.0     36.5   284.0   32712.5    57.5    48.3    50.3    89.0    91.5
4        charAt    19.5     18.5   458.6    3169.0    33.0    26.8    27.5    54.1    52.6
8        charAt     9.8      9.9   100.5    1370.9    17.3    14.4    15.0    26.9    26.4
16       charAt     6.1      6.5    73.4     857.0     8.4     8.2     8.3    13.6    13.5
32       charAt     3.9      3.7    54.8     428.9     5.0     4.9     4.7     7.0     7.2
64       charAt     2.7      2.6    48.2     232.9     3.0     3.2     3.3     3.9     4.0
128      charAt     2.1      1.9    43.7     138.8     2.1     2.6     2.6     2.4     2.6
256      charAt     1.9      1.6    42.4      90.6     1.7     2.1     2.1     1.7     1.8
512      field1     1.7      1.4    40.6      60.5     1.4     1.9     1.9     1.3     1.4
1,024    field1     1.6      1.4    40.0      45.6     1.2     1.9     2.1     1.0     1.2
2,048    field1     1.6      1.3    40.0      36.2     1.2     1.8     1.7     0.9     1.1
4,096    field1     1.6      1.3    39.7      32.6     1.2     1.8     1.7     0.9     1.0
8,192    field1     1.6      1.3    39.6      30.5     1.2     1.8     1.7     0.9     1.0
16,384   field1     1.6      1.3    39.8      28.4     1.2     1.8     1.7     0.8     1.0
32,768   field1     1.6      1.3    40.0      26.7     1.3     1.8     1.7     0.8     1.0
65,536   field1     1.6      1.3    39.8      26.3     1.3     1.8     1.7     0.8     1.0
131,072  field1     1.6      1.3    40.1      25.4     1.4     1.9     1.8     0.8     1.0
262,144  field1     1.6      1.3    39.6      25.2     1.5     1.9     1.9     0.8     1.0

服务器 -server模式的组合结果(正向和反向测试组合)

注意: 这是在 AMD64上以服务器模式运行的 Java32位测试。Java64位的服务器模式与客户机模式下的 Java32位相同,只是字段访问在32个字符大小后开始胜出。

Size     WINNER  charAt1 charAt2  stream streamPar   reuse    new1    new2  field1  field2
1        charAt     74.5    95.5   524.5     783.0    90.5   102.5    90.5   135.0   151.5
2        charAt     48.5    53.0   305.0   30851.3    59.3    57.5    52.0    88.5    91.8
4        charAt     28.8    32.1   132.8    2465.1    37.6    33.9    32.3    49.0    47.0
8          new2     18.0    18.6    63.4    1541.3    18.5    17.9    17.6    25.4    25.8
16         new2     14.0    14.7   129.4    1034.7    12.5    16.2    12.0    16.0    16.6
32         new2      7.8     9.1    19.3     431.5     8.1     7.0     6.7     7.9     8.7
64        reuse      6.1     7.5    11.7     204.7     3.5     3.9     4.3     4.2     4.1
128       reuse      6.8     6.8     9.0     101.0     2.6     3.0     3.0     2.6     2.7
256      field2      6.2     6.5     6.9      57.2     2.4     2.7     2.9     2.3     2.3
512       reuse      4.3     4.9     5.8      28.2     2.0     2.6     2.6     2.1     2.1
1,024    charAt      2.0     1.8     5.3      17.6     2.1     2.5     3.5     2.0     2.0
2,048    charAt      1.9     1.7     5.2      11.9     2.2     3.0     2.6     2.0     2.0
4,096    charAt      1.9     1.7     5.1       8.7     2.1     2.6     2.6     1.9     1.9
8,192    charAt      1.9     1.7     5.1       7.6     2.2     2.5     2.6     1.9     1.9
16,384   charAt      1.9     1.7     5.1       6.9     2.2     2.5     2.5     1.9     1.9
32,768   charAt      1.9     1.7     5.1       6.1     2.2     2.5     2.5     1.9     1.9
65,536   charAt      1.9     1.7     5.1       5.5     2.2     2.4     2.4     1.9     1.9
131,072  charAt      1.9     1.7     5.1       5.4     2.3     2.5     2.5     1.9     1.9
262,144  charAt      1.9     1.7     5.1       5.1     2.3     2.5     2.5     1.9     1.9

完全可运行的程序代码

(要在 Java7及更早版本上进行测试,请删除这两个流测试)

import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.function.IntPredicate;


/**
* @author Saint Hill <http://stackoverflow.com/users/1584255/saint-hill>
*/
public final class TestStrings {


// we will not test strings longer than 512KM
final int MAX_STRING_SIZE = 1024 * 256;


// for each string size, we will do all the tests
// this many times
final int TRIES_PER_STRING_SIZE = 1000;


public static void main(String[] args) throws Exception {
new TestStrings().run();
}


void run() throws Exception {


// double the length of the data until it reaches MAX chars long
// 0,1,2,4,8,16,32,64,128,256 ...
final List<Integer> sizes = new ArrayList<>();
for (int n = 0; n <= MAX_STRING_SIZE; n = (n == 0 ? 1 : n * 2)) {
sizes.add(n);
}


// CREATE RANDOM (FOR SHUFFLING ORDER OF TESTS)
final Random random = new Random();


System.out.println("Rate in nanoseconds per character inspected.");
System.out.printf("==== FORWARDS (tries per size: %s) ==== \n", TRIES_PER_STRING_SIZE);


printHeadings(TRIES_PER_STRING_SIZE, random);


for (int size : sizes) {
reportResults(size, test(size, TRIES_PER_STRING_SIZE, random));
}


// reverse order or string sizes
Collections.reverse(sizes);


System.out.println("");
System.out.println("Rate in nanoseconds per character inspected.");
System.out.printf("==== BACKWARDS (tries per size: %s) ==== \n", TRIES_PER_STRING_SIZE);


printHeadings(TRIES_PER_STRING_SIZE, random);


for (int size : sizes) {
reportResults(size, test(size, TRIES_PER_STRING_SIZE, random));


}
}


///
///
///  METHODS OF CHECKING THE CONTENTS
///  OF A STRING. ALWAYS CHECKING FOR
///  WHITESPACE (CHAR <=' ')
///
///
// CHECK THE STRING CONTENTS
int charAtMethod1(final String data) {
final int len = data.length();
for (int i = 0; i < len; i++) {
if (data.charAt(i) <= ' ') {
doThrow();
}
}
return len;
}


// SAME AS ABOVE BUT USE String.length()
// instead of making a new final local int
int charAtMethod2(final String data) {
for (int i = 0; i < data.length(); i++) {
if (data.charAt(i) <= ' ') {
doThrow();
}
}
return data.length();
}


// USE new Java-8 String's IntStream
// pass it a PREDICATE to do the checking
int streamMethod(final String data, final IntPredicate predicate) {
if (data.chars().anyMatch(predicate)) {
doThrow();
}
return data.length();
}


// OH LA LA - GO PARALLEL!!!
int streamParallelMethod(final String data, IntPredicate predicate) {
if (data.chars().parallel().anyMatch(predicate)) {
doThrow();
}
return data.length();
}


// Re-fill a resuable char[] with the contents
// of the String's char[]
int reuseBuffMethod(final char[] reusable, final String data) {
final int len = data.length();
data.getChars(0, len, reusable, 0);
for (int i = 0; i < len; i++) {
if (reusable[i] <= ' ') {
doThrow();
}
}
return len;
}


// Obtain a new copy of char[] from String
int newMethod1(final String data) {
final int len = data.length();
final char[] copy = data.toCharArray();
for (int i = 0; i < len; i++) {
if (copy[i] <= ' ') {
doThrow();
}
}
return len;
}


// Obtain a new copy of char[] from String
// but use FOR-EACH
int newMethod2(final String data) {
for (final char c : data.toCharArray()) {
if (c <= ' ') {
doThrow();
}
}
return data.length();
}


// FANCY!
// OBTAIN FIELD FOR ACCESS TO THE STRING'S
// INTERNAL CHAR[]
int fieldMethod1(final Field field, final String data) {
try {
final char[] chars = (char[]) field.get(data);
final int len = chars.length;
for (int i = 0; i < len; i++) {
if (chars[i] <= ' ') {
doThrow();
}
}
return len;
} catch (Exception ex) {
throw new RuntimeException(ex);
}
}


// same as above but use FOR-EACH
int fieldMethod2(final Field field, final String data) {
final char[] chars;
try {
chars = (char[]) field.get(data);
} catch (Exception ex) {
throw new RuntimeException(ex);
}
for (final char c : chars) {
if (c <= ' ') {
doThrow();
}
}
return chars.length;
}


/**
*
* Make a list of tests. We will shuffle a copy of this list repeatedly
* while we repeat this test.
*
* @param data
* @return
*/
List<Jobber> makeTests(String data) throws Exception {
// make a list of tests
final List<Jobber> tests = new ArrayList<Jobber>();


tests.add(new Jobber("charAt1") {
int check() {
return charAtMethod1(data);
}
});


tests.add(new Jobber("charAt2") {
int check() {
return charAtMethod2(data);
}
});


tests.add(new Jobber("stream") {
final IntPredicate predicate = new IntPredicate() {
public boolean test(int value) {
return value <= ' ';
}
};


int check() {
return streamMethod(data, predicate);
}
});


tests.add(new Jobber("streamPar") {
final IntPredicate predicate = new IntPredicate() {
public boolean test(int value) {
return value <= ' ';
}
};


int check() {
return streamParallelMethod(data, predicate);
}
});


// Reusable char[] method
tests.add(new Jobber("reuse") {
final char[] cbuff = new char[MAX_STRING_SIZE];


int check() {
return reuseBuffMethod(cbuff, data);
}
});


// New char[] from String
tests.add(new Jobber("new1") {
int check() {
return newMethod1(data);
}
});


// New char[] from String
tests.add(new Jobber("new2") {
int check() {
return newMethod2(data);
}
});


// Use reflection for field access
tests.add(new Jobber("field1") {
final Field field;


{
field = String.class.getDeclaredField("value");
field.setAccessible(true);
}


int check() {
return fieldMethod1(field, data);
}
});


// Use reflection for field access
tests.add(new Jobber("field2") {
final Field field;


{
field = String.class.getDeclaredField("value");
field.setAccessible(true);
}


int check() {
return fieldMethod2(field, data);
}
});


return tests;
}


/**
* We use this class to keep track of test results
*/
abstract class Jobber {


final String name;
long nanos;
long chars;
long runs;


Jobber(String name) {
this.name = name;
}


abstract int check();


final double nanosPerChar() {
double charsPerRun = chars / runs;
long nanosPerRun = nanos / runs;
return charsPerRun == 0 ? nanosPerRun : nanosPerRun / charsPerRun;
}


final void run() {
runs++;
long time = System.nanoTime();
chars += check();
nanos += System.nanoTime() - time;
}
}


// MAKE A TEST STRING OF RANDOM CHARACTERS A-Z
private String makeTestString(int testSize, char start, char end) {
Random r = new Random();
char[] data = new char[testSize];
for (int i = 0; i < data.length; i++) {
data[i] = (char) (start + r.nextInt(end));
}
return new String(data);
}


// WE DO THIS IF WE FIND AN ILLEGAL CHARACTER IN THE STRING
public void doThrow() {
throw new RuntimeException("Bzzzt -- Illegal Character!!");
}


/**
* 1. get random string of correct length 2. get tests (List<Jobber>) 3.
* perform tests repeatedly, shuffling each time
*/
List<Jobber> test(int size, int tries, Random random) throws Exception {
String data = makeTestString(size, 'A', 'Z');
List<Jobber> tests = makeTests(data);
List<Jobber> copy = new ArrayList<>(tests);
while (tries-- > 0) {
Collections.shuffle(copy, random);
for (Jobber ti : copy) {
ti.run();
}
}
// check to make sure all char counts the same
long runs = tests.get(0).runs;
long count = tests.get(0).chars;
for (Jobber ti : tests) {
if (ti.runs != runs && ti.chars != count) {
throw new Exception("Char counts should match if all correct algorithms");
}
}
return tests;
}


private void printHeadings(final int TRIES_PER_STRING_SIZE, final Random random) throws Exception {
System.out.print("  Size");
for (Jobber ti : test(0, TRIES_PER_STRING_SIZE, random)) {
System.out.printf("%9s", ti.name);
}
System.out.println("");
}


private void reportResults(int size, List<Jobber> tests) {
System.out.printf("%6d", size);
for (Jobber ti : tests) {
System.out.printf("%,9.2f", ti.nanosPerChar());
}
System.out.println("");
}
}

只是为了好奇和比较圣希尔的答案。

如果需要处理大量数据,则不应在客户机模式下使用 JVM。客户端模式不适合优化。

让我们比较一下在客户端模式和服务器模式下使用 JVM 的@Saint Hill 基准测试的结果。

Core2Quad Q6600 G0 @ 2.4GHz
JavaSE 1.7.0_40

参见: “ java-server”和“ java-client”的真正区别是什么?


客户端模式:

len =      2:    111k charAt(i),  105k cbuff[i],   62k new[i],   17k field access.   (chars/ms)
len =      4:    285k charAt(i),  166k cbuff[i],  114k new[i],   43k field access.   (chars/ms)
len =      6:    315k charAt(i),  230k cbuff[i],  162k new[i],   69k field access.   (chars/ms)
len =      8:    333k charAt(i),  275k cbuff[i],  181k new[i],   85k field access.   (chars/ms)
len =     12:    342k charAt(i),  342k cbuff[i],  222k new[i],  117k field access.   (chars/ms)
len =     16:    363k charAt(i),  347k cbuff[i],  275k new[i],  152k field access.   (chars/ms)
len =     20:    363k charAt(i),  392k cbuff[i],  289k new[i],  180k field access.   (chars/ms)
len =     24:    375k charAt(i),  428k cbuff[i],  311k new[i],  205k field access.   (chars/ms)
len =     28:    378k charAt(i),  474k cbuff[i],  341k new[i],  233k field access.   (chars/ms)
len =     32:    376k charAt(i),  492k cbuff[i],  340k new[i],  251k field access.   (chars/ms)
len =     64:    374k charAt(i),  551k cbuff[i],  374k new[i],  367k field access.   (chars/ms)
len =    128:    385k charAt(i),  624k cbuff[i],  415k new[i],  509k field access.   (chars/ms)
len =    256:    390k charAt(i),  675k cbuff[i],  436k new[i],  619k field access.   (chars/ms)
len =    512:    394k charAt(i),  703k cbuff[i],  439k new[i],  695k field access.   (chars/ms)
len =   1024:    395k charAt(i),  718k cbuff[i],  462k new[i],  742k field access.   (chars/ms)
len =   2048:    396k charAt(i),  725k cbuff[i],  471k new[i],  767k field access.   (chars/ms)
len =   4096:    396k charAt(i),  727k cbuff[i],  459k new[i],  780k field access.   (chars/ms)
len =   8192:    397k charAt(i),  712k cbuff[i],  446k new[i],  772k field access.   (chars/ms)

服务器模式:

len =      2:     86k charAt(i),   41k cbuff[i],   46k new[i],   80k field access.   (chars/ms)
len =      4:    571k charAt(i),  250k cbuff[i],   97k new[i],  222k field access.   (chars/ms)
len =      6:    666k charAt(i),  333k cbuff[i],  125k new[i],  315k field access.   (chars/ms)
len =      8:    800k charAt(i),  400k cbuff[i],  181k new[i],  380k field access.   (chars/ms)
len =     12:    800k charAt(i),  521k cbuff[i],  260k new[i],  545k field access.   (chars/ms)
len =     16:    800k charAt(i),  592k cbuff[i],  296k new[i],  640k field access.   (chars/ms)
len =     20:    800k charAt(i),  666k cbuff[i],  408k new[i],  800k field access.   (chars/ms)
len =     24:    800k charAt(i),  705k cbuff[i],  452k new[i],  800k field access.   (chars/ms)
len =     28:    777k charAt(i),  736k cbuff[i],  368k new[i],  933k field access.   (chars/ms)
len =     32:    800k charAt(i),  780k cbuff[i],  571k new[i],  969k field access.   (chars/ms)
len =     64:    800k charAt(i),  901k cbuff[i],  800k new[i],  1306k field access.   (chars/ms)
len =    128:    1084k charAt(i),  888k cbuff[i],  633k new[i],  1620k field access.   (chars/ms)
len =    256:    1122k charAt(i),  966k cbuff[i],  729k new[i],  1790k field access.   (chars/ms)
len =    512:    1163k charAt(i),  1007k cbuff[i],  676k new[i],  1910k field access.   (chars/ms)
len =   1024:    1179k charAt(i),  1027k cbuff[i],  698k new[i],  1954k field access.   (chars/ms)
len =   2048:    1184k charAt(i),  1043k cbuff[i],  732k new[i],  2007k field access.   (chars/ms)
len =   4096:    1188k charAt(i),  1049k cbuff[i],  742k new[i],  2031k field access.   (chars/ms)
len =   8192:    1157k charAt(i),  1032k cbuff[i],  723k new[i],  2048k field access.   (chars/ms)

结论:

如您所见,服务器模式要快得多。

如果考虑到 ToCharArray ()的时间复杂性,尽管@Saint Hill 给出了答案,

即使对于非常大的字符串,第一个字符串也更快。您可以运行下面的代码亲自查看。

        char [] ch = new char[1_000_000_00];
String str = new String(ch); // to create a large string


// ---> from here
long currentTime = System.nanoTime();
for (int i = 0, n = str.length(); i < n; i++) {
char c = str.charAt(i);
}
// ---> to here
System.out.println("str.charAt(i):"+(System.nanoTime()-currentTime)/1000000.0 +" (ms)");


/**
*   ch = str.toCharArray() itself takes lots of time
*/
// ---> from here
currentTime = System.nanoTime();
ch = str.toCharArray();
for (int i = 0, n = str.length(); i < n; i++) {
char c = ch[i];
}
// ---> to  here
System.out.println("ch = str.toCharArray() + c = ch[i] :"+(System.nanoTime()-currentTime)/1000000.0 +" (ms)");

产出:

str.charAt(i):5.492102 (ms)
ch = str.toCharArray() + c = ch[i] :79.400064 (ms)

String.toCharArray()创建新的字符数组,意味着分配字符串长度的内存,然后使用 System.arraycopy()复制字符串的原始字符数组,然后将这个复制返回给调用者。 CharAt ()从原始副本返回位置为 i的字符,这就是 String.charAt()String.toCharArray()快的原因。 但是,String.toCharArray()从原始 String 数组返回 copy 而不是 char,其中 String.charAt()从原始 char 数组返回字符。 下面的代码返回此字符串的指定索引处的值。

public char charAt(int index) {
if ((index < 0) || (index >= value.length)) {
throw new StringIndexOutOfBoundsException(index);
}
return value[index];
}

下面的代码返回一个新分配的字符数组,其长度为此字符串的长度

public char[] toCharArray() {
// Cannot use Arrays.copyOf because of class initialization order issues
char result[] = new char[value.length];
System.arraycopy(value, 0, result, 0, value.length);
return result;
}

在许多实际场景中,字符值和原始字符串的索引都是必需的。下面是另一套 String迭代基准测试,使用 openjdk 版本“19.0.1”2022-10-18和 JMH 1.35进行了测试。

源代码

/**
* Iterates over a string, much like {@link CharacterIterator}, but faster.
* <p>
* <strong>Caution:</strong> This class offers minimal bounds checking to eke
* out some efficiency.
* </p>
*/
public final class FastCharacterIterator {


private final String mS;
private final int mLen;


/**
* Starts at 0, not guaranteed to be within bounds.
*/
private int mPos;


/**
* Constructs a new iterator that can advance through the given string
* one character at a time.
*
* @param s The string to iterate.
*/
public FastCharacterIterator( final String s ) {
assert s != null;


mS = s;
mLen = s.length();
}


/**
* Returns the iterated index. The return value is not guaranteed to be
* within the string bounds.
*
* @return The iterated index.
*/
public int index() {
return mPos;
}


/**
* Returns the character at the currently iterated position in the string.
* This method performs bounds checking by catching an exception because
* usually parsing is complete when there are no more characters to iterate,
* meaning that 99.99% of the time, explicit bounds checking is superfluous.
*
* @return {@link CharacterIterator#DONE} if there are no more characters.
*/
public char current() {
try {
return mS.charAt( mPos );
} catch( final Exception ex ) {
return DONE;
}
}


/**
* Returns the next character in the string and consumes it.
*
* @return {@link CharacterIterator#DONE} if there are no more characters.
*/
public char advance() {
try {
return mS.charAt( ++mPos );
} catch( final Exception ex ) {
return DONE;
}
}


/**
* Returns the next character in the string without consuming it. Multiple
* consecutive calls to this method will return the same value.
*
* @return {@link CharacterIterator#DONE} if there are no more characters.
*/
public char peek() {
try {
return mS.charAt( mPos + 1 );
} catch( final Exception ex ) {
return DONE;
}
}


/**
* Advances to the next character in the string, without bounds checking.
*/
public void next() {
mPos++;
}


/**
* Advances to the previous character in the string, without bounds checking.
*/
public void prev() {
mPos--;
}


/**
* Answers whether {@link #next()} followed by {@link #current()} is safe.
*
* @return {@code true} if there are more characters to be iterated.
*/
public boolean hasNext() {
return mPos < mLen;
}


/**
* Parse all characters that match a given function.
*
* @param f The function that determines when skipping stops.
*/
@SuppressWarnings( "StatementWithEmptyBody" )
public void skip( final Function<Character, Boolean> f ) {
assert f != null;


while( f.apply( advance() ) ) ;


// The loop always overshoots by one character.
prev();
}


/**
* Creates a string from a subset of consecutive characters in the string
* being iterated. The calling class is responsible for bounds-checking.
*
* @param began The starting index, must be greater than or equal to zero
*              and less than or equal to {@code ended}.
* @param ended The ending index, must be less than the string length.
* @return A substring of the iterated string.
* @throws IndexOutOfBoundsException Either or both parameters exceed the
*                                   string's boundaries.
*/
public String substring( final int began, final int ended ) {
assert began >= 0;
assert began <= ended;
assert ended < mLen;


return mS.substring( began, ended );
}
}


测试守则

用于测试的完整源代码:

import com.whitemagicsoftware.keenquotes.util.FastCharacterIterator;
import org.openjdk.jmh.annotations.Benchmark;


import java.text.StringCharacterIterator;
import java.util.StringTokenizer;
import java.util.concurrent.atomic.AtomicInteger;


import static java.lang.Math.random;
import static java.text.CharacterIterator.DONE;


/**
* Higher scores mean faster code:
*
* <pre>
* Benchmark                     Mode   Cnt    Score    Error  Units
* test_CharArrayIterator        thrpt   25  753.960 ±  0.972  ops/s
* test_CharAtIterator           thrpt   25  878.016 ±  0.884  ops/s
* test_FastCharacterIterator    thrpt   25  803.041 ± 48.422  ops/s
* test_StreamIterator           thrpt   25  101.416 ±  0.053  ops/s
* test_StringCharacterIterator  thrpt   25  580.341 ±  0.432  ops/s
* test_StringTokenizer          thrpt   25  174.121 ±  8.282  ops/s
* </pre>
*/
@SuppressWarnings( "unused" )
public class StringIterationBenchmark {
private static final int STRLEN = 1024 * 1024;
private static final String CHARSET =
"ABCDEFGHIJKLM NOPQRSTUVWXYZ abcdefghijklm nopqrstuvxyz 01234 5 6789";


private static final String sText;


static {
final var len = CHARSET.length();
final var buffer = new StringBuilder( STRLEN );


for( var i = 0; i < STRLEN; i++ ) {
buffer.append( CHARSET.charAt( (int) (len * random()) ) );
}


sText = buffer.toString();
}


private static String getText() {
return sText;
}


@Benchmark
public void test_FastCharacterIterator() {
final var s = getText();
final var i = new FastCharacterIterator( s );
var spaces = 0;


char ch = ' ';


while( (ch = i.advance()) != DONE ) {
if( ch == ' ' ) {
spaces++;
}
}


fail( i.index(), s.length() );
}


@Benchmark
public void test_CharAtIterator() {
final var s = getText();
final var length = s.length();
var index = 0;
var spaces = 0;


while( index < length ) {
final var ch = s.charAt( index );


if( ch == ' ' ) {
spaces++;
}


index++;
}


fail( index, length );
}


@Benchmark
public void test_StringCharacterIterator() {
final var s = getText();
final var i = new StringCharacterIterator( s );
var index = 0;
var spaces = 0;


char ch = ' ';


while( ch != DONE ) {
ch = i.next();


if( ch == ' ' ) {
spaces++;
}


index++;
}


fail( index, s.length() );
}


@Benchmark
public void test_CharArrayIterator() {
final var s = getText();
final var i = s.toCharArray();
var index = 0;
var spaces = 0;


for( final var ch : i ) {
if( ch == ' ' ) {
spaces++;
}


index++;
}


fail( index, s.length() );
}


@Benchmark
public void test_StringTokenizer() {
final var s = getText();
final var i = new StringTokenizer( s, " ", true );
var index = 0;
var spaces = 0;


while( i.hasMoreTokens() ) {
final var token = i.nextToken();


if( token.isBlank() ) {
spaces++;
}


index += token.length();
}


fail( index, s.length() );
}


@Benchmark
public void test_StreamIterator() {
final var s = getText();
final var index = new AtomicInteger();
final var spaces = new AtomicInteger();


s.chars().forEach( codepoint -> {
final var ch = Character.valueOf( (char) codepoint );


if( ch == ' ' ) {
spaces.incrementAndGet();
}


index.incrementAndGet();
} );


fail( index.get(), s.length() );
}


private static void fail( final int index, final int length ) {
if( index != length ) {
throw new RuntimeException( "Fail" );
}
}
}

评论

请随意改进代码并重新运行基准测试。抱怨将发送到 /dev/null。 StackOverflow 是一个 wiki。