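Objective-C: Reading a file line by line

What is the appropriate way of dealing with large text files in Objective-C? Let's say I need to read each line separately and want to treat each line as an NSString. What is the most efficient way of doing this?

One solution is using the NSString method: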

+ (id)stringWithContentsOfFile:(NSString *)path
encoding:(NSStringEncoding)enc
error:(NSError **)error

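and then split the lines on a newline separator, and then iterate over the elements in the array. However, this seems fairly inefficient. Is there no easy way to treat the file as a stream, enumerating over each line, instead of reading it all in at once? Kind of like Java's java.io.BufferedReader.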


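This answer is not ObjC but C.

Since ObjC is C-based, why not use fgets?

And yes, I'm sure ObjC has its own method. I'm just not proficient enough yet to know what it is :)

You can use NSInputStream, which has a basic implementation for file streams. You can read bytes into a buffer (the read:maxLength: method). You have to scan the buffer for newlines yourself.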

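That's a great question. I think @Diederik has a good answer, although it's unfortunate that Cocoa doesn't have a mechanism for exactly what you want to do.

NSInputStream allows you to read chunks of N bytes (very similar to java.io.BufferedReader), but you have to convert them to an NSString on your own, then scan for newlines (or whatever other delimiter) and save any remaining characters for the next read, or read more characters if a newline hasn't been read yet. (NSFileHandle lets you read an NSData which you can then convert to an NSString, but it's essentially the same process.)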
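The read-scan-carry-over loop described above can be sketched in plain C with fread standing in for the stream read. This is only an illustration under my own assumptions: for_each_line, line_fn, line_stats and count_line are hypothetical names, not part of Cocoa or any library, and the lines handed to the callback are not NUL-terminated.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical callback type: receives one line (no trailing '\n'). */
typedef void (*line_fn)(const char *line, size_t len, void *ctx);

/* Example callback state: counts lines and sums their lengths. */
struct line_stats { size_t lines; size_t bytes; };

void count_line(const char *line, size_t len, void *ctx)
{
    struct line_stats *s = ctx;
    (void)line;
    s->lines++;
    s->bytes += len;
}

/* Reads `f` in fixed-size chunks and calls `fn` once per line.
   Bytes after the last newline are kept in `pending` and completed
   by the next chunk. Returns 0 on success, -1 on error. */
int for_each_line(FILE *f, size_t chunk_size, line_fn fn, void *ctx)
{
    char *chunk = malloc(chunk_size);
    char *pending = NULL;          /* carried-over partial line */
    size_t pending_len = 0;
    size_t n;
    int rc = 0;

    if (chunk == NULL) return -1;

    while ((n = fread(chunk, 1, chunk_size, f)) > 0) {
        /* Append the new chunk to whatever was left over. */
        char *grown = realloc(pending, pending_len + n);
        if (grown == NULL) { rc = -1; break; }
        pending = grown;
        memcpy(pending + pending_len, chunk, n);
        pending_len += n;

        /* Emit every complete line currently in the buffer. */
        size_t start = 0;
        char *nl;
        while ((nl = memchr(pending + start, '\n', pending_len - start)) != NULL) {
            size_t end = (size_t)(nl - pending);
            fn(pending + start, end - start, ctx);
            start = end + 1;
        }

        /* Keep only the unfinished tail for the next read. */
        memmove(pending, pending + start, pending_len - start);
        pending_len -= start;
    }
    if (ferror(f)) rc = -1;

    /* A final line without a trailing newline. */
    if (rc == 0 && pending_len > 0) fn(pending, pending_len, ctx);

    free(pending);
    free(chunk);
    return rc;
}
```

In an Objective-C version each emitted byte range would be turned into an NSString at the point where the callback is invoked; the bookkeeping stays the same.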

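Apple has a Stream Programming Guide that can help fill in the details, and this SO question may help as well if you're going to be dealing with uint8_t* buffers.

If you're going to be reading strings like this frequently (especially in different parts of your program), it would be a good idea to encapsulate this behavior in a class that can handle the details for you, or even to subclass NSInputStream (it's designed to be subclassed) and add methods that let you read exactly what you want.

For the record, I think this would be a nice feature to add, and I'll be filing an enhancement request for something that makes this possible. :-)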


Edit: It turns out this request already exists. There's a Radar dating from 2006 for this (rdar://4742914 for Apple-internal people).

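This should do the trick:

#include <stdio.h>
#import <Foundation/Foundation.h>

// Returns the next line without its newline, or nil at end of file.
NSString *readLineAsNSString(FILE *file)
{
    char buffer[4096];

    // tune this capacity to your liking -- larger buffer sizes will be faster, but
    // use more memory
    NSMutableString *result = [NSMutableString stringWithCapacity:256];

    // Read up to 4095 non-newline characters at a time until the line ends
    int charsRead;
    do
    {
        charsRead = 0;
        if(fscanf(file, "%4095[^\n]%n", buffer, &charsRead) == 1)
            [result appendFormat:@"%s", buffer];
        else
            break;
    } while(charsRead == 4095);

    // Consume the newline that ended the line (fscanf fails on an empty line
    // without consuming it); EOF with nothing read means there are no more lines
    if(fgetc(file) == EOF && [result length] == 0)
        return nil;

    return result;
}

Use it as follows:

FILE *file = fopen("myfile", "r");
if(file != NULL)
{
    NSString *line;
    while((line = readLineAsNSString(file)) != nil)
    {
        // do stuff with line; line is autoreleased, so you should NOT release it (unless you also retain it beforehand)
    }
    fclose(file);
}

This code reads non-newline characters from the file, up to 4095 at a time. If a line is longer than 4095 characters, it keeps reading until it hits a newline or end-of-file.

Note: I have not tested this code. Please test it before using it.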

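The appropriate way to read text files in Cocoa/Objective-C is documented in Apple's String Programming Guide. The section on reading and writing files should be just what you're after. PS: What is a "line"? Two sections of a string separated by "\n"? Or "\r"? Or "\r\n"? Or maybe you're actually after paragraphs? The previously mentioned guide also includes a section on splitting a string into lines or paragraphs. (This section is called "Paragraphs and Line Breaks", and is linked in the left-hand menu of the page I pointed to above. Unfortunately this site doesn't allow me to post more than one URL, as I'm not a trusted user yet.)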
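To make the "what is a line?" question concrete, here is a small C sketch that treats "\n", "\r" and "\r\n" each as one terminator while walking a buffer. The names next_line and count_lines are mine, chosen for illustration; this is not how Foundation defines line breaks, which also covers Unicode separators.

```c
#include <stddef.h>

/* Finds the end of the line starting at buf[pos].
   Stores the line length (terminator excluded) in *line_len and
   returns the index where the next line starts.
   "\n", "\r" and "\r\n" each count as a single terminator. */
size_t next_line(const char *buf, size_t len, size_t pos, size_t *line_len)
{
    size_t i = pos;
    while (i < len && buf[i] != '\n' && buf[i] != '\r') i++;
    *line_len = i - pos;
    if (i < len) {
        /* Treat CR LF as one terminator rather than two. */
        if (buf[i] == '\r' && i + 1 < len && buf[i + 1] == '\n') i++;
        i++;
    }
    return i;
}

/* Counts the lines in a buffer by repeatedly calling next_line. */
size_t count_lines(const char *buf, size_t len)
{
    size_t pos = 0, n = 0, line_len;
    while (pos < len) {
        pos = next_line(buf, len, pos, &line_len);
        n++;
    }
    return n;
}
```

Whether an empty line between two terminators should count as a line is a design choice; this sketch counts it.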

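To paraphrase Knuth: premature optimization is the root of all evil. Don't simply assume that "reading the whole file into memory" is slow. Have you benchmarked it? Do you know that it actually reads the whole file into memory? Maybe it simply returns a proxy object and keeps reading behind the scenes as you consume the string? (Disclaimer: I have no idea whether NSString actually does this.) The point is: first do things the documented way. Then, if benchmarks show that this doesn't give you the performance you want, optimize.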

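Mac OS X is Unix and Objective-C is a superset of C, so you can just use old-school fopen and fgets from <stdio.h>. It's guaranteed to work.

[NSString stringWithUTF8String:buf] converts a C string to an NSString. There are also methods for creating strings in other encodings and for creating them without copying.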
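One thing to watch with fgets is that a line longer than the buffer arrives in several pieces. A rough sketch of stitching the pieces together follows; read_line is a hypothetical helper of mine, and the 256-byte buffer is an arbitrary choice.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Reads one line of any length using repeated fgets calls.
   Returns a malloc'd string without the trailing newline (caller frees),
   or NULL at end of file or on allocation failure. */
char *read_line(FILE *f)
{
    char buf[256];
    char *line = NULL;
    size_t len = 0;

    while (fgets(buf, sizeof buf, f) != NULL) {
        size_t n = strlen(buf);
        char *grown = realloc(line, len + n + 1);
        if (grown == NULL) { free(line); return NULL; }
        line = grown;
        memcpy(line + len, buf, n + 1);   /* copies the NUL as well */
        len += n;

        /* fgets stops right after '\n'; if it is there, the line is complete. */
        if (len > 0 && line[len - 1] == '\n') {
            line[len - 1] = '\0';
            break;
        }
    }
    return line;
}
```

The returned C string could then be handed to [NSString stringWithUTF8String:], assuming the file really is UTF-8.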

Use this script, it works great:

NSString *path = @"/Users/xxx/Desktop/names.txt";
NSError *error;
NSString *stringFromFileAtPath = [NSString stringWithContentsOfFile: path
encoding: NSUTF8StringEncoding
error: &error];
if (stringFromFileAtPath == nil) {
NSLog(@"Error reading file at %@\n%@", path, [error localizedFailureReason]);
}
NSLog(@"Contents:%@", stringFromFileAtPath);

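This works for the general case of reading a string from a text file. If you want to read a longer text (a large amount of text), use one of the methods others have mentioned here, such as buffered reading (keeping only a fixed-size piece of the text in memory at a time).

Say you read a text file.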

NSString* filePath = @""; // file path...
NSString* fileRoot = [[NSBundle mainBundle]
pathForResource:filePath ofType:@"txt"];

You want to get rid of the newlines.

// read everything from text
NSString* fileContents =
[NSString stringWithContentsOfFile:fileRoot
encoding:NSUTF8StringEncoding error:nil];


// first, separate by new line
NSArray* allLinedStrings =
[fileContents componentsSeparatedByCharactersInSet:
[NSCharacterSet newlineCharacterSet]];


// then break down even further
NSString* strsInOneLine =
[allLinedStrings objectAtIndex:0];


// choose whatever input identity you have decided. in this case ;
NSArray* singleStrs =
[strsInOneLine componentsSeparatedByCharactersInSet:
[NSCharacterSet characterSetWithCharactersInString:@";"]];

That's it.

Here's a nice simple solution I use for smaller files:

NSString *path = [[NSBundle mainBundle] pathForResource:@"Terrain1" ofType:@"txt"];
NSString *contents = [NSString stringWithContentsOfFile:path encoding:NSASCIIStringEncoding error:nil];
NSArray *lines = [contents componentsSeparatedByCharactersInSet:[NSCharacterSet characterSetWithCharactersInString:@"\r\n"]];
for (NSString* line in lines) {
if (line.length) {
NSLog(@"line: %@", line);
}
}

Reading a file line by line (also for extremely big files) can be done with the following functions:

DDFileReader * reader = [[DDFileReader alloc] initWithFilePath:pathToMyFile];
NSString * line = nil;
while ((line = [reader readLine])) {
NSLog(@"read line: %@", line);
}
[reader release];

Or:

DDFileReader * reader = [[DDFileReader alloc] initWithFilePath:pathToMyFile];
[reader enumerateLinesUsingBlock:^(NSString * line, BOOL * stop) {
NSLog(@"read line: %@", line);
}];
[reader release];

The DDFileReader class that enables this is the following:

Interface file (.h):

@interface DDFileReader : NSObject {
NSString * filePath;


NSFileHandle * fileHandle;
unsigned long long currentOffset;
unsigned long long totalFileLength;


NSString * lineDelimiter;
NSUInteger chunkSize;
}


@property (nonatomic, copy) NSString * lineDelimiter;
@property (nonatomic) NSUInteger chunkSize;


- (id) initWithFilePath:(NSString *)aPath;


- (NSString *) readLine;
- (NSString *) readTrimmedLine;


#if NS_BLOCKS_AVAILABLE
- (void) enumerateLinesUsingBlock:(void(^)(NSString*, BOOL *))block;
#endif


@end

Implementation (.m):

#import "DDFileReader.h"


@interface NSData (DDAdditions)


- (NSRange) rangeOfData_dd:(NSData *)dataToFind;


@end


@implementation NSData (DDAdditions)


- (NSRange) rangeOfData_dd:(NSData *)dataToFind {


const void * bytes = [self bytes];
NSUInteger length = [self length];


const void * searchBytes = [dataToFind bytes];
NSUInteger searchLength = [dataToFind length];
NSUInteger searchIndex = 0;


NSRange foundRange = {NSNotFound, searchLength};
for (NSUInteger index = 0; index < length; index++) {
if (((char *)bytes)[index] == ((char *)searchBytes)[searchIndex]) {
//the current character matches
if (foundRange.location == NSNotFound) {
foundRange.location = index;
}
searchIndex++;
if (searchIndex >= searchLength) { return foundRange; }
} else {
searchIndex = 0;
foundRange.location = NSNotFound;
}
}
return foundRange;
}


@end


@implementation DDFileReader
@synthesize lineDelimiter, chunkSize;


- (id) initWithFilePath:(NSString *)aPath {
if (self = [super init]) {
fileHandle = [NSFileHandle fileHandleForReadingAtPath:aPath];
if (fileHandle == nil) {
[self release]; return nil;
}


lineDelimiter = [[NSString alloc] initWithString:@"\n"];
[fileHandle retain];
filePath = [aPath retain];
currentOffset = 0ULL;
chunkSize = 10;
[fileHandle seekToEndOfFile];
totalFileLength = [fileHandle offsetInFile];
//we don't need to seek back, since readLine will do that.
}
return self;
}


- (void) dealloc {
[fileHandle closeFile];
[fileHandle release], fileHandle = nil;
[filePath release], filePath = nil;
[lineDelimiter release], lineDelimiter = nil;
currentOffset = 0ULL;
[super dealloc];
}


- (NSString *) readLine {
if (currentOffset >= totalFileLength) { return nil; }


NSData * newLineData = [lineDelimiter dataUsingEncoding:NSUTF8StringEncoding];
[fileHandle seekToFileOffset:currentOffset];
NSMutableData * currentData = [[NSMutableData alloc] init];
BOOL shouldReadMore = YES;


NSAutoreleasePool * readPool = [[NSAutoreleasePool alloc] init];
while (shouldReadMore) {
if (currentOffset >= totalFileLength) { break; }
NSData * chunk = [fileHandle readDataOfLength:chunkSize];
NSRange newLineRange = [chunk rangeOfData_dd:newLineData];
if (newLineRange.location != NSNotFound) {


//include the length so we can include the delimiter in the string
chunk = [chunk subdataWithRange:NSMakeRange(0, newLineRange.location+[newLineData length])];
shouldReadMore = NO;
}
[currentData appendData:chunk];
currentOffset += [chunk length];
}
[readPool release];


NSString * line = [[NSString alloc] initWithData:currentData encoding:NSUTF8StringEncoding];
[currentData release];
return [line autorelease];
}


- (NSString *) readTrimmedLine {
return [[self readLine] stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
}


#if NS_BLOCKS_AVAILABLE
- (void) enumerateLinesUsingBlock:(void(^)(NSString*, BOOL*))block {
NSString * line = nil;
BOOL stop = NO;
while (stop == NO && (line = [self readLine])) {
block(line, &stop);
}
}
#endif


@end

The class was written by Dave DeLong.

Just like @porneL said, the C API is very handy.

NSString* fileRoot = [[NSBundle mainBundle] pathForResource:@"record" ofType:@"txt"];
FILE *file = fopen([fileRoot UTF8String], "r");
if (file != NULL) {
    char buffer[256];
    // a line longer than 255 bytes arrives in several pieces
    while (fgets(buffer, 256, file) != NULL){
        NSString* result = [NSString stringWithUTF8String:buffer];
        NSLog(@"%@",result);
    }
    fclose(file);
}

As others have answered, both NSInputStream and NSFileHandle are fine options, but it can also be done in a fairly compact way with NSData and memory mapping:

BRLineReader.h

#import <Foundation/Foundation.h>


@interface BRLineReader : NSObject


@property (readonly, nonatomic) NSData *data;
@property (readonly, nonatomic) NSUInteger linesRead;
@property (strong, nonatomic) NSCharacterSet *lineTrimCharacters;
@property (readonly, nonatomic) NSStringEncoding stringEncoding;


- (instancetype)initWithFile:(NSString *)filePath encoding:(NSStringEncoding)encoding;
- (instancetype)initWithData:(NSData *)data encoding:(NSStringEncoding)encoding;
- (NSString *)readLine;
- (NSString *)readTrimmedLine;
- (void)setLineSearchPosition:(NSUInteger)position;


@end

BRLineReader.m

#import "BRLineReader.h"


static unsigned char const BRLineReaderDelimiter = '\n';


@implementation BRLineReader
{
NSRange _lastRange;
}


- (instancetype)initWithFile:(NSString *)filePath encoding:(NSStringEncoding)encoding
{
self = [super init];
if (self) {
NSError *error = nil;
_data = [NSData dataWithContentsOfFile:filePath options:NSDataReadingMappedAlways error:&error];
if (!_data) {
NSLog(@"%@", [error localizedDescription]);
}
_stringEncoding = encoding;
_lineTrimCharacters = [NSCharacterSet whitespaceAndNewlineCharacterSet];
}


return self;
}


- (instancetype)initWithData:(NSData *)data encoding:(NSStringEncoding)encoding
{
self = [super init];
if (self) {
_data = data;
_stringEncoding = encoding;
_lineTrimCharacters = [NSCharacterSet whitespaceAndNewlineCharacterSet];
}


return self;
}


- (NSString *)readLine
{
NSUInteger dataLength = [_data length];
NSUInteger beginPos = _lastRange.location + _lastRange.length;
NSUInteger endPos = 0;
if (beginPos == dataLength) {
// End of file
return nil;
}


unsigned char *buffer = (unsigned char *)[_data bytes];
for (NSUInteger i = beginPos; i < dataLength; i++) {
endPos = i;
if (buffer[i] == BRLineReaderDelimiter) break;
}


// End of line found
_lastRange = NSMakeRange(beginPos, endPos - beginPos + 1);
NSData *lineData = [_data subdataWithRange:_lastRange];
NSString *line = [[NSString alloc] initWithData:lineData encoding:_stringEncoding];
_linesRead++;


return line;
}


- (NSString *)readTrimmedLine
{
return [[self readLine] stringByTrimmingCharactersInSet:_lineTrimCharacters];
}


- (void)setLineSearchPosition:(NSUInteger)position
{
_lastRange = NSMakeRange(position, 0);
_linesRead = 0;
}


@end
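For readers who want to see the memory-mapping idea without Foundation, here is a rough POSIX sketch that maps a file read-only and walks it line by line. It assumes a POSIX system with mmap; count_lines_mapped is a made-up name, and it only counts lines where a real reader would hand each [p, nl) range to the caller.

```c
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Maps `path` read-only and counts its lines by scanning for '\n'.
   Returns the number of lines, or -1 on error. */
long count_lines_mapped(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;

    struct stat st;
    if (fstat(fd, &st) != 0) { close(fd); return -1; }
    if (st.st_size == 0) { close(fd); return 0; }   /* mmap rejects length 0 */

    size_t size = (size_t)st.st_size;
    const char *data = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                  /* the mapping stays valid after close */
    if (data == MAP_FAILED) return -1;

    long lines = 0;
    const char *p = data;
    const char *end = data + size;
    while (p < end) {
        const char *nl = memchr(p, '\n', (size_t)(end - p));
        /* [p, nl) is one line; the last line may have no newline. */
        lines++;
        if (nl == NULL) break;
        p = nl + 1;
    }

    munmap((void *)data, size);
    return lines;
}
```

The pages are faulted in on demand as the scan touches them, which is why this style can stay compact even for large files.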

Following @Adam Rosenfield's answer, the fscanf format string would be changed as follows:

"%4095[^\r\n]%n%*[\n\r]"

It will work with OS X, Linux, and Windows line endings.

Using a category or an extension makes our lives a bit easier.
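A tiny C wrapper shows how that format string behaves on mixed line endings. The function name read_line_any_eol is mine, and the sketch assumes lines shorter than 4096 bytes. Note that the trailing %*[\n\r] swallows every consecutive line-ending character, so blank lines are skipped rather than returned, and a file that begins with a blank line makes the first call fail.

```c
#include <stdio.h>

/* Reads the next line into buf using the format string quoted above.
   Returns the number of characters stored, or -1 when nothing was read
   (end of file, or the next character is already a line ending). */
int read_line_any_eol(FILE *f, char buf[4096])
{
    int chars_read = 0;
    int matched = fscanf(f, "%4095[^\r\n]%n%*[\n\r]", buf, &chars_read);
    return matched == 1 ? chars_read : -1;
}
```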

extension String {


func lines() -> [String] {
var lines = [String]()
self.enumerateLines { (line, stop) -> () in
lines.append(line)
}
return lines
}


}


// then
for line in string.lines() {
// do the right thing
}

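I found the response by @lukaswelte and the code from Dave DeLong very helpful. I was looking for a solution to this problem but needed to parse large files by \r\n, not just \n.

The code as written contains a bug when the delimiter is more than one character long.

.h file: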

#import <Foundation/Foundation.h>


@interface FileChunkReader : NSObject {
NSString * filePath;


NSFileHandle * fileHandle;
unsigned long long currentOffset;
unsigned long long totalFileLength;


NSString * lineDelimiter;
NSUInteger chunkSize;
}


@property (nonatomic, copy) NSString * lineDelimiter;
@property (nonatomic) NSUInteger chunkSize;


- (id) initWithFilePath:(NSString *)aPath;


- (NSString *) readLine;
- (NSString *) readTrimmedLine;


#if NS_BLOCKS_AVAILABLE
- (void) enumerateLinesUsingBlock:(void(^)(NSString*, BOOL *))block;
#endif


@end

.m file:

#import "FileChunkReader.h"


@interface NSData (DDAdditions)


- (NSRange) rangeOfData_dd:(NSData *)dataToFind;


@end


@implementation NSData (DDAdditions)


- (NSRange) rangeOfData_dd:(NSData *)dataToFind {


const void * bytes = [self bytes];
NSUInteger length = [self length];


const void * searchBytes = [dataToFind bytes];
NSUInteger searchLength = [dataToFind length];
NSUInteger searchIndex = 0;


NSRange foundRange = {NSNotFound, searchLength};
for (NSUInteger index = 0; index < length; index++) {
if (((char *)bytes)[index] == ((char *)searchBytes)[searchIndex]) {
//the current character matches
if (foundRange.location == NSNotFound) {
foundRange.location = index;
}
searchIndex++;
if (searchIndex >= searchLength)
{
return foundRange;
}
} else {
searchIndex = 0;
foundRange.location = NSNotFound;
}
}


if (foundRange.location != NSNotFound
&& length < foundRange.location + foundRange.length )
{
// if the dataToFind is partially found at the end of [self bytes],
// then the loop above would end, and indicate the dataToFind is found
// when it only partially was.
foundRange.location = NSNotFound;
}


return foundRange;
}


@end


@implementation FileChunkReader


@synthesize lineDelimiter, chunkSize;


- (id) initWithFilePath:(NSString *)aPath {
if (self = [super init]) {
fileHandle = [NSFileHandle fileHandleForReadingAtPath:aPath];
if (fileHandle == nil) {
return nil;
}


lineDelimiter = @"\n";
currentOffset = 0ULL; // ???
chunkSize = 128;
[fileHandle seekToEndOfFile];
totalFileLength = [fileHandle offsetInFile];
//we don't need to seek back, since readLine will do that.
}
return self;
}


- (void) dealloc {
[fileHandle closeFile];
currentOffset = 0ULL;


}


- (NSString *) readLine {
if (currentOffset >= totalFileLength)
{
return nil;
}


@autoreleasepool {


NSData * newLineData = [lineDelimiter dataUsingEncoding:NSUTF8StringEncoding];
[fileHandle seekToFileOffset:currentOffset];
unsigned long long originalOffset = currentOffset;
NSMutableData *currentData = [[NSMutableData alloc] init];
NSData *currentLine = [[NSData alloc] init];
BOOL shouldReadMore = YES;




while (shouldReadMore) {
if (currentOffset >= totalFileLength)
{
break;
}


NSData * chunk = [fileHandle readDataOfLength:chunkSize];
[currentData appendData:chunk];


NSRange newLineRange = [currentData rangeOfData_dd:newLineData];


if (newLineRange.location != NSNotFound) {


currentOffset = originalOffset + newLineRange.location + newLineData.length;
currentLine = [currentData subdataWithRange:NSMakeRange(0, newLineRange.location)];


shouldReadMore = NO;
}else{
currentOffset += [chunk length];
}
}


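// shouldReadMore is still YES only when no delimiter was found before the end
// of the file; an empty line (delimiter found at offset 0) must stay empty
if (shouldReadMore && currentData.length > 0)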
{
currentLine = currentData;
}


return [[NSString alloc] initWithData:currentLine encoding:NSUTF8StringEncoding];
}
}


- (NSString *) readTrimmedLine {
return [[self readLine] stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
}


#if NS_BLOCKS_AVAILABLE
- (void) enumerateLinesUsingBlock:(void(^)(NSString*, BOOL*))block {
NSString * line = nil;
BOOL stop = NO;
while (stop == NO && (line = [self readLine])) {
block(line, &stop);
}
}
#endif


@end
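Searching for a multi-byte delimiter such as "\r\n" is easy to get subtly wrong. As a point of comparison, here is a plain C sketch that re-tests the delimiter at every start position, so a partial match followed by a real one (for example "\r\r\n") is still found, and a delimiter cut off at the end of the data is reported as not found. The name find_delimiter is hypothetical, and the loop is quadratic in the worst case, which is fine for short delimiters.

```c
#include <stddef.h>
#include <string.h>

/* Returns the index of the first complete occurrence of `delim` in `data`,
   or (size_t)-1 if there is none. */
size_t find_delimiter(const unsigned char *data, size_t len,
                      const unsigned char *delim, size_t delim_len)
{
    if (delim_len == 0 || delim_len > len) return (size_t)-1;

    /* Only start positions that leave room for the whole delimiter. */
    for (size_t i = 0; i + delim_len <= len; i++) {
        if (memcmp(data + i, delim, delim_len) == 0) return i;
    }
    return (size_t)-1;
}
```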

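A lot of these answers are long chunks of code, or they read in the entire file. I like to use the C functions for this very task.

FILE* file = fopen("path to my file", "r");
if (file != NULL) {
    size_t length;
    char *cLine;

    // fgetln returns NULL at end of file (or on error)
    while ((cLine = fgetln(file, &length)) != NULL) {
        char str[length+1];
        strncpy(str, cLine, length);
        str[length] = '\0';

        NSString *line = [NSString stringWithFormat:@"%s",str];
        // Do what you want here.
    }
    fclose(file);
}

Note that fgetln does not NUL-terminate the line, and the returned length includes the trailing newline when there is one. That is why str is given length + 1 bytes: the extra byte leaves room for the NUL terminator.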
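fgetln is a BSD function, so it is not available everywhere. A similar loop can be sketched with POSIX getline, which I'm swapping in here as an alternative rather than as what the answer uses. next_line_stripped is a hypothetical helper; getline keeps the line ending, so the sketch strips a trailing "\n" or "\r\n" itself.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Reads the next line into *line (grown by getline as needed) and strips
   a trailing "\n" or "\r\n". Returns the stripped length, or -1 at end
   of file or on error. The caller frees *line once, after the last call. */
ssize_t next_line_stripped(FILE *f, char **line, size_t *cap)
{
    ssize_t n = getline(line, cap, f);
    if (n < 0) return -1;
    if (n > 0 && (*line)[n - 1] == '\n') n--;
    if (n > 0 && (*line)[n - 1] == '\r') n--;
    (*line)[n] = '\0';
    return n;
}
```

Start with `char *line = NULL; size_t cap = 0;` and call the function in a loop until it returns -1; the same buffer is reused across calls.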

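I'm adding this because all the other answers I tried fell short in one way or another. The following method can handle large files, arbitrarily long lines, and empty lines. It has been tested with actual content and strips the newline characters from the output.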

- (NSString*)readLineFromFile:(FILE *)file
{
char buffer[4096];
NSMutableString *result = [NSMutableString stringWithCapacity:1000];


int charsRead;
do {
if(fscanf(file, "%4095[^\r\n]%n%*[\n\r]", buffer, &charsRead) == 1) {
[result appendFormat:@"%s", buffer];
}
else {
break;
}
} while(charsRead == 4095);


return result.length ? result : nil;
}

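Credit goes to @Adam Rosenfield and @sooop.

I see a lot of these answers rely on reading the whole text file into memory instead of taking it one chunk at a time. Here's my solution in nice modern Swift, using FileHandle to keep the memory impact low: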

enum MyError: Error {
case invalidTextFormat
}


extension FileHandle {


func readLine(maxLength: Int) throws -> String {


// Read in a string of up to the maximum length
let offset = offsetInFile
let data = readData(ofLength: maxLength)
guard let string = String(data: data, encoding: .utf8) else {
throw MyError.invalidTextFormat
}


// Check for a newline; if there is none, this is the whole string
let substring: String
if let subindex = string.firstIndex(of: "\n") {
substring = String(string[string.startIndex ... subindex])
} else {
substring = string
}


// Wind back to the correct offset so that we don't miss any lines
guard let dataCount = substring.data(using: .utf8, allowLossyConversion: false)?.count else {
throw MyError.invalidTextFormat
}
try seek(toOffset: offset + UInt64(dataCount))
return substring
}


}

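Note that this preserves the line break at the end of the line, so depending on your needs you may want to adjust the code to remove it.

Usage: simply open a file handle to your target text file and call readLine with a suitable maximum length. 1024 is standard for plain text, but I left it open in case you know it will be shorter. Note that the function will not read past the end of the file, so if you intend to parse the entire thing you may have to check manually that you haven't reached the end. Here's some sample code that shows how to open a file at myFileURL and read it line by line until the end.

do {
    let handle = try FileHandle(forReadingFrom: myFileURL)
    // seekToEnd() returns the offset of the end of the file
    let eof = try handle.seekToEnd()
    try handle.seek(toOffset: 0)

    while handle.offsetInFile < eof {
        let line = try handle.readLine(maxLength: 1024)
        // Do something with the string here
    }
    try handle.close()
} catch let error {
    print("Error reading file: \(error.localizedDescription)")
}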