Java 子字符串()的时间复杂性

Java 中 String#substring()方法的时间复杂度是多少?

93390 次浏览

新答案

As of update 6 within Java 7's lifetime, the behaviour of substring changed to create a copy - so every String refers to a char[] which is 没有 shared with any other object, as far as I'm aware. So at that point, substring() became an O(n) operation where n is the numbers in the substring.

旧答案: Java 7之前

没有文档记录——但在实践中,如果假定不需要垃圾收集,则为 O (1) ,等等。

它只是构建一个新的 String对象,该对象引用相同的底层 char[],但具有不同的偏移量和计数值。因此,成本是执行验证和构造一个新的(相当小的)对象所花费的时间。这就是 O (1) ,因为讨论操作的复杂性是明智的,这些操作的复杂性随着时间的变化会随着垃圾收集、 CPU 缓存等等而变化。特别是,它不直接取决于原始字符串或子字符串的长度。

O (1)因为没有复制原始字符串,所以它只是创建一个具有不同偏移量信息的新包装器对象。

您可以自己判断是否遵循,但 Java 的性能缺陷在其他地方,而不是在字符串的子字符串中。 密码:

public static void main(String[] args) throws IOException {


String longStr = "asjf97zcv.1jm2497z20`1829182oqiwure92874nvcxz,nvz.,xo" +
"aihf[oiefjkas';./.,z][p\\°°°°°°°°?!(*#&(@*&#!)^(*&(*&)(*&" +
"fasdznmcxzvvcxz,vc,mvczvcz,mvcz,mcvcxvc,mvcxcvcxvcxvcxvcx";
int[] indices = new int[32 * 1024];
int[] lengths = new int[indices.length];
Random r = new Random();
final int minLength = 6;
for (int i = 0; i < indices.length; ++i)
{
indices[i] = r.nextInt(longStr.length() - minLength);
lengths[i] = minLength + r.nextInt(longStr.length() - indices[i] - minLength);
}


long start = System.nanoTime();


int avoidOptimization = 0;
for (int i = 0; i < indices.length; ++i)
//avoidOptimization += lengths[i]; //tested - this was cheap
avoidOptimization += longStr.substring(indices[i],
indices[i] + lengths[i]).length();


long end = System.nanoTime();
System.out.println("substring " + indices.length + " times");
System.out.println("Sum of lengths of splits = " + avoidOptimization);
System.out.println("Elapsed " + (end - start) / 1.0e6 + " ms");
}

产出:

substring 32768 times
Sum of lengths of splits = 1494414
Elapsed 2.446679 ms

If it is O(1) or not, depends. If you just reference same String in memory, then imagine 非常 long String, you make substring and stop referencing long one. Wouldn't be nice to release memory for long one?

在旧版本的 Java 中,它是 O (1)——正如 Jon 所说,它只是创建了一个新的 String,其底层 char []相同,但偏移量和长度不同。

但是,从 Java7更新6开始,这种情况实际上已经发生了改变。

消除了 char []共享,删除了偏移量和长度字段。 Substring ()现在只是将所有字符复制到一个新的 String 中。

因此,在 Java7更新6中子字符串是 O (n)

现在是线性复杂度,这是在修复子字符串的内存泄漏问题之后。

因此,从 Java 1.7.0 _ 06开始,请记住 String.substring 现在具有线性复杂性,而不是常量复杂性。

为乔恩的回答增添了证据。 我也有同样的疑问,想要检查字符串的长度是否对子字符串函数有任何影响。编写以下代码来检查子字符串实际依赖于哪个参数。

import org.apache.commons.lang.RandomStringUtils;


public class Dummy {


private static final String pool[] = new String[3];
private static int substringLength;


public static void main(String args[]) {
pool[0] = RandomStringUtils.random(2000);
pool[1] = RandomStringUtils.random(10000);
pool[2] = RandomStringUtils.random(100000);
test(10);
test(100);
test(1000);
}


public static void test(int val) {
substringLength = val;
StatsCopy statsCopy[] = new StatsCopy[3];
for (int j = 0; j < 3; j++) {
statsCopy[j] = new StatsCopy();
}
long latency[] = new long[3];
for (int i = 0; i < 10000; i++) {
for (int j = 0; j < 3; j++) {
latency[j] = latency(pool[j]);
statsCopy[j].send(latency[j]);
}
}
for (int i = 0; i < 3; i++) {
System.out.println(
" Avg: "
+ (int) statsCopy[i].getAvg()
+ "\t String length: "
+ pool[i].length()
+ "\tSubstring Length: "
+ substringLength);
}
System.out.println();
}


private static long latency(String a) {
long startTime = System.nanoTime();
a.substring(0, substringLength);
long endtime = System.nanoTime();
return endtime - startTime;
}


private static class StatsCopy {
private  long count = 0;
private  long min = Integer.MAX_VALUE;
private  long max = 0;
private  double avg = 0;


public  void send(long latency) {
computeStats(latency);
count++;
}


private  void computeStats(long latency) {
if (min > latency) min = latency;
if (max < latency) max = latency;
avg = ((float) count / (count + 1)) * avg + (float) latency / (count + 1);
}


public  double getAvg() {
return avg;
}


public  long getMin() {
return min;
}


public  long getMax() {
return max;
}


public  long getCount() {
return count;
}
}


}


Java8的执行输出如下:

 Avg: 128    String length: 2000    Substring Length: 10
Avg: 127    String length: 10000   Substring Length: 10
Avg: 124    String length: 100000  Substring Length: 10


Avg: 172    String length: 2000    Substring Length: 100
Avg: 175    String length: 10000   Substring Length: 100
Avg: 177    String length: 100000  Substring Length: 100


Avg: 1199   String length: 2000    Substring Length: 1000
Avg: 1186   String length: 10000   Substring Length: 1000
Avg: 1339   String length: 100000  Substring Length: 1000


证明子字符串函数取决于请求的子字符串的长度,而不是字符串的长度。

Java1.7.0 _ 06: O (1)之前。

Java 1.7.0 _ 06: O (n)之后。由于内存泄漏,这个更改了。字段 offsetcount从 String 中删除后,子字符串实现变为 O (n)。

详情请参阅: http://java-performance.info/changes-to-string-java-1-7-0_06/