Fastest way to write huge data in text file Java

I have to write huge data into a text [csv] file. I used BufferedWriter to write the data and it took around 40 secs to write 174 MB of data. Is this the fastest speed Java can offer?

bufferedWriter = new BufferedWriter ( new FileWriter ( "fileName.csv" ) );

Note: These 40 secs include the time of iterating and fetching the records from the resultset as well. :) . 174 MB is for 400000 rows in the resultset.

197609 views

Your transfer speed is likely not going to be limited by Java. Instead I would suspect (in no particular order)

  1. the speed of transfer from the database
  2. the speed of transfer to the disk

If you read the complete dataset and then write it out to disk, that will take longer, since the JVM will have to allocate memory, and the db read / disk write will happen sequentially. Instead I would write out to the buffered writer for every read you make from the db, and so the operation will be closer to a concurrent one (I don't know if you're doing that or not).
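The interleaved read/write described above can be sketched as follows. Since no database is available here, `fetchRows()` is a stand-in for iterating a JDBC `ResultSet`; with real JDBC you would replace the loop with `while (rs.next()) { out.write(rs.getString(...)); ... }`:

```java
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

public class StreamedCsvWrite {

    // Stand-in for a JDBC ResultSet; in real code this would be rs.next()/rs.getString(...)
    static List<String> fetchRows() {
        return Arrays.asList("1,alice", "2,bob", "3,carol");
    }

    // Write each row out as soon as it is fetched, instead of accumulating the
    // whole result set in memory first. Returns the number of bytes written.
    static long writeRows(Path target) throws IOException {
        try (BufferedWriter out = new BufferedWriter(new FileWriter(target.toFile()))) {
            for (String row : fetchRows()) {
                out.write(row);
                out.write('\n');
            }
        }
        return Files.size(target);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("rows", ".csv");
        System.out.println(writeRows(tmp) + " bytes written");
        Files.delete(tmp);
    }
}
```

This keeps memory flat and lets the disk write proceed while the next row is fetched, rather than serializing the two phases.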

You might try removing the BufferedWriter and just using the FileWriter directly. On a modern system there's a good chance you're just writing to the drive's cache memory anyway.

It takes me in the range of 4-5 seconds to write 175MB (4 million strings) -- this is on a dual-core 2.4GHz Dell running Windows XP with an 80GB, 7200-RPM disk.

Can you isolate how much of the time is record retrieval and how much is file writing?

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;

public class FileWritingPerfTest {

    private static final int ITERATIONS = 5;
    private static final double MEG = (Math.pow(1024, 2));
    private static final int RECORD_COUNT = 4000000;
    private static final String RECORD = "Help I am trapped in a fortune cookie factory\n";
    private static final int RECSIZE = RECORD.getBytes().length;

    public static void main(String[] args) throws Exception {
        List<String> records = new ArrayList<String>(RECORD_COUNT);
        int size = 0;
        for (int i = 0; i < RECORD_COUNT; i++) {
            records.add(RECORD);
            size += RECSIZE;
        }
        System.out.println(records.size() + " 'records'");
        System.out.println(size / MEG + " MB");

        for (int i = 0; i < ITERATIONS; i++) {
            System.out.println("\nIteration " + i);

            writeRaw(records);
            writeBuffered(records, 8192);
            writeBuffered(records, (int) MEG);
            writeBuffered(records, 4 * (int) MEG);
        }
    }

    private static void writeRaw(List<String> records) throws IOException {
        File file = File.createTempFile("foo", ".txt");
        try {
            FileWriter writer = new FileWriter(file);
            System.out.print("Writing raw... ");
            write(records, writer);
        } finally {
            // comment this out if you want to inspect the files afterward
            file.delete();
        }
    }

    private static void writeBuffered(List<String> records, int bufSize) throws IOException {
        File file = File.createTempFile("foo", ".txt");
        try {
            FileWriter writer = new FileWriter(file);
            BufferedWriter bufferedWriter = new BufferedWriter(writer, bufSize);

            System.out.print("Writing buffered (buffer size: " + bufSize + ")... ");
            write(records, bufferedWriter);
        } finally {
            // comment this out if you want to inspect the files afterward
            file.delete();
        }
    }

    private static void write(List<String> records, Writer writer) throws IOException {
        long start = System.currentTimeMillis();
        for (String record : records) {
            writer.write(record);
        }
        // writer.flush(); // close() should take care of this
        writer.close();
        long end = System.currentTimeMillis();
        System.out.println((end - start) / 1000f + " seconds");
    }
}

For these bulky reads from the DB you may want to tune your Statement's fetch size. It might save a lot of round trips to the DB.

http://download.oracle.com/javase/1.5.0/docs/api/java/sql/Statement.html#setFetchSize%28int%29
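As a sketch of what tuning the fetch size looks like (the SQL, fetch-size value, and method are illustrative, and `export` is not executed here since no database is available):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class FetchSizeExample {

    static final int FETCH_SIZE = 1000; // rows fetched per round trip; tune for your driver

    // Sketch only: stream a large result set in chunks of FETCH_SIZE rows
    // instead of the driver's (often small) default.
    static void export(Connection conn, String sql) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setFetchSize(FETCH_SIZE); // hint to the driver, not a hard guarantee
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    // write rs.getString(1) etc. to the BufferedWriter here
                }
            }
        }
    }

    public static void main(String[] args) {
        // 400,000 rows at a fetch size of 1,000 is only 400 round trips,
        // versus 40,000 at a fetch size of 10.
        System.out.println(400_000 / FETCH_SIZE + " round trips");
    }
}
```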

Try memory mapped files (takes 300 ms to write 174MB on my machine, Core 2 Duo, 2.5GB RAM):

// requires java.io.RandomAccessFile, java.nio.ByteBuffer, java.nio.channels.FileChannel
byte[] buffer = "Help I am trapped in a fortune cookie factory\n".getBytes();
int number_of_lines = 400000;

FileChannel rwChannel = new RandomAccessFile("textfile.txt", "rw").getChannel();
ByteBuffer wrBuf = rwChannel.map(FileChannel.MapMode.READ_WRITE, 0, buffer.length * number_of_lines);
for (int i = 0; i < number_of_lines; i++) {
    wrBuf.put(buffer);
}
rwChannel.close();
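Mapping requires knowing the final file size up front and reserves that much address space. A close relative that avoids both constraints is writing through the `FileChannel` with a reused direct `ByteBuffer`; a minimal sketch (file name and chunk size are illustrative):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ChannelWrite {

    static final byte[] LINE = "Help I am trapped in a fortune cookie factory\n"
            .getBytes(StandardCharsets.UTF_8);

    // Fill a direct buffer with many lines, then flush it to the channel in
    // large chunks, instead of mapping the whole file at once.
    static long write(Path target, int lines) throws IOException {
        ByteBuffer buf = ByteBuffer.allocateDirect(1 << 20); // 1 MB chunk
        try (FileChannel ch = FileChannel.open(target,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            for (int i = 0; i < lines; i++) {
                if (buf.remaining() < LINE.length) { // chunk full: flush it
                    buf.flip();
                    while (buf.hasRemaining()) ch.write(buf);
                    buf.clear();
                }
                buf.put(LINE);
            }
            buf.flip(); // flush the final partial chunk
            while (buf.hasRemaining()) ch.write(buf);
        }
        return Files.size(target);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("channel", ".txt");
        System.out.println(write(tmp, 400_000) + " bytes");
        Files.delete(tmp);
    }
}
```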

Just for the sake of statistics:

The machine is an old Dell with a new SSD

CPU: Intel Pentium D 2.8 GHz

SSD: Patriot Inferno 120GB

4000000 'records'
175.47607421875 MB


Iteration 0
Writing raw... 3.547 seconds
Writing buffered (buffer size: 8192)... 2.625 seconds
Writing buffered (buffer size: 1048576)... 2.203 seconds
Writing buffered (buffer size: 4194304)... 2.312 seconds


Iteration 1
Writing raw... 2.922 seconds
Writing buffered (buffer size: 8192)... 2.406 seconds
Writing buffered (buffer size: 1048576)... 2.015 seconds
Writing buffered (buffer size: 4194304)... 2.282 seconds


Iteration 2
Writing raw... 2.828 seconds
Writing buffered (buffer size: 8192)... 2.109 seconds
Writing buffered (buffer size: 1048576)... 2.078 seconds
Writing buffered (buffer size: 4194304)... 2.015 seconds


Iteration 3
Writing raw... 3.187 seconds
Writing buffered (buffer size: 8192)... 2.109 seconds
Writing buffered (buffer size: 1048576)... 2.094 seconds
Writing buffered (buffer size: 4194304)... 2.031 seconds


Iteration 4
Writing raw... 3.093 seconds
Writing buffered (buffer size: 8192)... 2.141 seconds
Writing buffered (buffer size: 1048576)... 2.063 seconds
Writing buffered (buffer size: 4194304)... 2.016 seconds

As we can see, the raw method is slower than the buffered.

package all.is.well;

import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import junit.framework.TestCase;

/**
 * @author Naresh Bhabat
 *
 * The following implementation helps to deal with extra large files in Java.
 * This program was tested with a 2GB input file.
 * There are some points where extra logic can be added in the future.
 *
 * Please note: if we want to deal with a binary input file, then instead of
 * reading lines we need to read bytes from the file object.
 *
 * It uses RandomAccessFile, which is almost like a streaming API.
 *
 * ****************************************
 * Notes regarding the executor framework:
 * ExecutorService executor = Executors.newFixedThreadPool(10);
 *
 *     for 10 threads:    total time for reading and writing the text: 349.317 seconds
 *     for 100 threads:   464.042 seconds
 *     for 1000 threads:  466.538 seconds
 *     for 10000 threads: 479.701 seconds
 */
public class DealWithHugeRecordsinFile extends TestCase {

    static final String FILEPATH = "C:\\springbatch\\bigfile1.txt.txt";
    static final String FILEPATH_WRITE = "C:\\springbatch\\writinghere.txt";
    static volatile RandomAccessFile fileToWrite;
    static volatile RandomAccessFile file;
    static volatile int position = 0;

    public static void main(String[] args) throws IOException, InterruptedException {
        long start = System.currentTimeMillis();
        try {
            fileToWrite = new RandomAccessFile(FILEPATH_WRITE, "rw"); // for random writes
            file = new RandomAccessFile(FILEPATH, "r");               // for random reads
            seriouslyReadProcessAndWriteAsynch();
        } catch (IOException e) {
            e.printStackTrace();
        }
        System.out.println(Thread.currentThread().getName());
        double timeSeconds = (System.currentTimeMillis() - start) / 1000.0;
        System.out.println("Total time required for reading the text in seconds " + timeSeconds);
    }

    /**
     * Reads lines from the input file and hands each one to a worker thread.
     *
     * @throws IOException
     */
    public static void seriouslyReadProcessAndWriteAsynch() throws IOException, InterruptedException {
        ExecutorService executor = Executors.newFixedThreadPool(10); // see timing notes in the class javadoc
        while (true) {
            final String readLine = file.readLine();
            if (readLine == null) {
                break;
            }
            executor.execute(new Runnable() {
                @Override
                public void run() {
                    // do the hard processing here in this thread; the write method
                    // below deliberately consumes some time and swallows an exception
                    writeToFile(FILEPATH_WRITE, readLine);
                }
            });
        }
        executor.shutdown();
        executor.awaitTermination(1, TimeUnit.HOURS); // replaces a busy-wait on isTerminated()
        System.out.println("Finished all threads");
        file.close();
        fileToWrite.close();
    }

    /**
     * Note: concurrent writes to the shared RandomAccessFile are not
     * synchronized here; interleaved output is possible.
     *
     * @param filePath
     * @param data
     */
    private static void writeToFile(String filePath, String data) {
        try {
            data = "\n" + data;
            if (!data.contains("Randomization")) {
                return;
            }
            System.out.println("Let us do something time consuming to make this thread busy" + (position++) + "   :" + data);
            System.out.println("Lets consume through this loop");
            int i = 1000;
            while (i > 0) {
                i--;
            }
            fileToWrite.write(data.getBytes());
            throw new Exception(); // deliberately thrown to demonstrate recovery below
        } catch (Exception exception) {
            System.out.println("exception was thrown but still we are able to proceed further"
                    + " \n This can be used for marking failure of the records");
        }
    }
}

For those who just want to improve the time of retrieving records and dumping them into a file (i.e., with no processing on the records), instead of putting them into an ArrayList, append those records to a StringBuffer. Apply the toString() function to get a single String and write it to the file all at once.

For me, the retrieval time dropped from 22 to 17 seconds.
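A minimal sketch of that approach (using `StringBuilder`, the unsynchronized drop-in for the `StringBuffer` mentioned above, since only one thread builds the output here):

```java
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

public class SingleWriteDump {

    // Accumulate all rows in one StringBuilder and write the whole thing with
    // a single write() call, instead of one call per record.
    static long dump(Path target, List<String> rows) throws IOException {
        StringBuilder sb = new StringBuilder(rows.size() * 16); // rough pre-size
        for (String row : rows) {
            sb.append(row).append('\n');
        }
        try (BufferedWriter out = new BufferedWriter(new FileWriter(target.toFile()))) {
            out.write(sb.toString());
        }
        return Files.size(target);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("dump", ".csv");
        System.out.println(dump(tmp, Arrays.asList("a,b", "c,d")) + " bytes");
        Files.delete(tmp);
    }
}
```

The trade-off is memory: the entire 174MB payload lives on the heap before the write, so this only pays off when the data comfortably fits in RAM.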