How do I read the first N lines of a file?

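We have a large raw data file that we would like to trim to a specified size.

How would I go about getting the first N lines of a text file in Python? Will the OS being used have any effect on the implementation?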


Python 3:

with open("datafile") as myfile:
    head = [next(myfile) for x in range(N)]
print(head)

Python 2:

with open("datafile") as myfile:
    head = [next(myfile) for x in xrange(N)]
print head

Here's another way (Python 2 & 3):

from itertools import islice

with open("datafile") as myfile:
    head = list(islice(myfile, N))
print(head)
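One practical difference from the next() version is that islice simply stops at end of file, so a file with fewer than N lines raises no StopIteration. A minimal sketch of that behaviour as a reusable function (the name head_islice is hypothetical, not from the answer):

```python
from itertools import islice


def head_islice(path, n):
    """Return up to the first n lines of the text file at `path`."""
    with open(path) as f:
        # islice stops early at end of file, so a short file raises no error.
        return list(islice(f, n))
```

Calling head_islice("datafile", 10) on a three-line file returns just those three lines.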

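The file object does not expose a specific method for reading a given number of lines.

I guess the easiest way would be the following: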

lines = []
with open(file_name) as f:
    lines.extend(f.readline() for i in xrange(N))
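Note that readline() returns an empty string at end of file instead of raising, so with a short file the snippet above pads the list with empty strings. A small sketch that stops at end of file instead, written for Python 3 (the function name read_first_lines is an assumption for illustration):

```python
def read_first_lines(file_name, n):
    """Collect up to n lines with readline(), stopping at end of file."""
    lines = []
    with open(file_name) as f:
        for _ in range(n):
            line = f.readline()
            if not line:  # readline() returns "" only at end of file
                break
            lines.append(line)
    return lines
```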

If you want something that obviously works (without looking up esoteric stuff in manuals), needs no imports or try/except, and runs on a fair range of Python 2.x versions (2.2 to 2.6):

def headn(file_name, n):
    """Like *x head -N command"""
    result = []
    nlines = 0
    assert n >= 1
    for line in open(file_name):
        result.append(line)
        nlines += 1
        if nlines >= n:
            break
    return result


if __name__ == "__main__":
    import sys
    rval = headn(sys.argv[1], int(sys.argv[2]))
    print rval
    print len(rval)

N = 10
with open("file.txt", "r") as file:  # read mode; a file opened in append mode ("a") cannot be read
    for i in range(N):
        line = next(file).strip()
        print(line)

Based on gnibbler's top-voted answer (Nov 20 '09 at 0:27): this class adds head() and tail() methods to the file object.

class File(file):
    def head(self, lines_2find=1):
        self.seek(0)                            #Rewind file
        return [self.next() for x in xrange(lines_2find)]

    def tail(self, lines_2find=1):
        self.seek(0, 2)                         #go to end of file
        bytes_in_file = self.tell()
        lines_found, total_bytes_scanned = 0, 0
        while (lines_2find+1 > lines_found and
               bytes_in_file > total_bytes_scanned):
            byte_block = min(1024, bytes_in_file-total_bytes_scanned)
            self.seek(-(byte_block+total_bytes_scanned), 2)
            total_bytes_scanned += byte_block
            lines_found += self.read(1024).count('\n')
        self.seek(-total_bytes_scanned, 2)
        line_list = list(self.readlines())
        return line_list[-lines_2find:]

Usage:

f = File('path/to/file', 'r')
f.head(3)
f.tail(3)
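The class above subclasses the file builtin, which exists only in Python 2. For Python 3, a rough equivalent can be sketched as two plain functions. This is not the answer's seek-based algorithm: the tail here reads the file sequentially and relies on collections.deque with maxlen to keep only the last few lines in memory. The names head_lines and tail_lines are made up for this example.

```python
from collections import deque
from itertools import islice


def head_lines(path, n=1):
    """Return up to the first n lines of the file."""
    with open(path) as f:
        return list(islice(f, n))


def tail_lines(path, n=1):
    """Return up to the last n lines of the file."""
    with open(path) as f:
        # A deque with maxlen discards older items as new ones arrive,
        # so only the last n lines are held at any time.
        return list(deque(f, maxlen=n))
```

The trade-off is that tail_lines still scans the whole file, so for very large files a seek-from-the-end approach like the one above reads far less data.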

The most convenient way, in my opinion:

LINE_COUNT = 3
print [s for (i, s) in enumerate(open('test.txt')) if i < LINE_COUNT]

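A list-comprehension-based solution. The file object returned by open() supports the iteration interface. enumerate() wraps it and yields (index, item) tuples; we then check that the index is within the accepted range (if i < LINE_COUNT) and simply print the result.

Enjoy Python. ;)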

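Starting with Python 2.6, you can take advantage of the more sophisticated functions in the IO base class. So the top-rated answer above can be rewritten as: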

with open("datafile") as myfile:
    head = myfile.readlines(N)
print head

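(You don't have to worry about your file having fewer than N lines, since no StopIteration exception is thrown. Be aware, though, that the argument to readlines() is a size hint measured in bytes or characters, not a line count, so the result is generally not exactly N lines.)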
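Since the readlines() argument is documented as a size hint rather than a number of lines, it is worth checking what it actually returns. The following throwaway script is a sketch that assumes Python 3 and a temporary file created just for the comparison; it contrasts readlines(N) with islice(f, N):

```python
import os
import tempfile
from itertools import islice

# Build a small throwaway file with ten lines of seven characters each.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.writelines("line %d\n" % i for i in range(10))

N = 3
with open(tmp.name) as f:
    by_hint = f.readlines(N)       # stops once about N characters are read
with open(tmp.name) as f:
    by_count = list(islice(f, N))  # stops after N lines

print(len(by_hint), len(by_count))
os.remove(tmp.name)
```

Here the first line alone already exceeds the 3-character hint, so readlines(3) returns a single line while islice returns three.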

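If you want to read the first lines quickly and you don't care about performance, you can use .readlines(), which returns a list object, and then slice the list.

E.g. for the first 5 lines: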

with open("pathofmyfileandfileandname") as myfile:
    firstNlines = myfile.readlines()[0:5]  # put here the interval you want

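Note: the whole file is read, so this is not the best approach from a performance point of view, but it is easy to use, quick to write, and easy to remember, which makes it very convenient for a one-off calculation.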

print firstNlines

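One advantage over the other answers is that you can easily select a range of lines, e.g. skip the first 10 lines with [10:30], drop the last 10 with [:-10], or take every second line with [::2].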
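Most of these slices can also be taken without loading the whole file, because itertools.islice accepts start, stop and step arguments. A minimal sketch (the helper name line_range is hypothetical); note that islice rejects negative values, so a slice like [:-10] still needs a different approach:

```python
from itertools import islice


def line_range(path, start, stop, step=1):
    """Return lines[start:stop:step] without loading the whole file."""
    with open(path) as f:
        # stop may be None, meaning "until end of file".
        return list(islice(f, start, stop, step))
```

For example, line_range(path, 10, 30) mirrors [10:30] and line_range(path, 0, None, 2) mirrors [::2].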

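If you have a really big file, and assuming you want the output to be a numpy array, using np.genfromtxt will freeze your computer. In my experience this works much better: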

import numpy as np


def load_big_file(fname, maxrows):
    '''only works for well-formed text file of space-separated doubles'''
    rows = []  # unknown number of lines, so use list
    with open(fname) as f:
        j = 0
        for line in f:
            if j == maxrows:
                break
            else:
                line = [float(s) for s in line.split()]
                rows.append(np.array(line, dtype=np.double))
                j += 1
    return np.vstack(rows)  # convert list of vectors to array

For the first 5 lines, simply do:

N = 5
with open("data_file", "r") as file:
    for i in range(N):
        print file.next()

What I do is read the N lines using pandas. I don't think the performance is the best, but as an example, if N=1000:

import pandas as pd
yourfile = pd.read_csv('path/to/your/file.csv', nrows=1000)

#!/usr/bin/python
import subprocess

p = subprocess.Popen(["tail", "-n 3", "passlist"], stdout=subprocess.PIPE)
output, err = p.communicate()
print  output

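This method worked for me.

The two most intuitive ways of doing this would be:

  1. Iterate over the file line by line, and break after N lines.

  2. Iterate over the file line by line by calling the next() method N times. (This is essentially just a different syntax for what the top answer does.)

Here is the code: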

# Method 1:
with open("fileName", "r") as f:
    counter = 0
    for line in f:
        print line
        counter += 1
        if counter == N: break


# Method 2:
with open("fileName", "r") as f:
    for i in xrange(N):
        line = f.next()
        print line

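The bottom line is that as long as you don't load the whole file into memory, for example with readlines(), you have plenty of options.

This works for me: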

f = open("history_export.csv", "r")
line = 5
for x in range(line):
    a = f.readline()
    print(a)

This works for Python 2 & 3:

from itertools import islice

with open('/tmp/filename.txt') as inf:
    for line in islice(inf, N, N+M):
        print(line)

fname = input("Enter file name: ")
num_lines = 0

with open(fname, 'r') as f:  # count the lines
    for line in f:
        num_lines += 1

num_lines_input = int(input("Enter line numbers: "))

if num_lines_input <= num_lines:
    f = open(fname, "r")
    for x in range(num_lines_input):
        a = f.readline()
        print(a)
else:
    f = open(fname, "r")
    for x in range(num_lines_input):
        a = f.readline()
        print(a)
    print("Don't have", num_lines_input, " lines print as much as you can")

print("Total lines in the text", num_lines)


I would like to handle files with fewer than n lines by reading the whole file:

def head(filename: str, n: int):
    try:
        with open(filename) as f:
            head_lines = [next(f).rstrip() for x in range(n)]
    except StopIteration:
        with open(filename) as f:
            head_lines = f.read().splitlines()
    return head_lines

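Credit goes to John La Rooy and Ilian Iliev. The function uses exception handling for the best performance.

Revision 1: Thanks to FrankM for the feedback. To handle file existence and read permission, we can further add: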

import errno
import os


def head(filename: str, n: int):
    if not os.path.isfile(filename):
        raise FileNotFoundError(errno.ENOENT, os.strerror(errno.ENOENT), filename)
    if not os.access(filename, os.R_OK):
        raise PermissionError(errno.EACCES, os.strerror(errno.EACCES), filename)

    try:
        with open(filename) as f:
            head_lines = [next(f).rstrip() for x in range(n)]
    except StopIteration:
        with open(filename) as f:
            head_lines = f.read().splitlines()
    return head_lines

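You can either use the second version, or go with the first one and handle the file exceptions later. From a performance standpoint, the checks are quick and mostly free.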
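For the "handle the file exceptions later" route, one possible shape is to let open() fail and catch OSError, which is the base class of both FileNotFoundError and PermissionError. The sketch below is only one option: returning an empty list on failure is an assumption made for illustration, the name head_or_empty is hypothetical, and islice replaces the reopen-on-StopIteration step.

```python
from itertools import islice


def head_or_empty(filename, n):
    """Return up to n right-stripped lines, or [] if the file cannot be opened."""
    try:
        with open(filename) as f:
            # islice stops at end of file, so no second pass is needed.
            return [line.rstrip() for line in islice(f, n)]
    except OSError as exc:  # covers FileNotFoundError and PermissionError
        print("could not read %s: %s" % (filename, exc))
        return []
```

Whether to swallow the error like this or let it propagate depends on the caller; re-raising is often the better choice in library code.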

Simply convert your CSV reader object to a list using list(file_data):

import csv

with open('your_csv_file.csv') as file_obj:
    file_data = csv.reader(file_obj)
    file_list = list(file_data)
    for row in file_list[:4]:
        print(row)
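Calling list() on the reader parses every row of the file before slicing. If only the first few rows are needed, the reader can be sliced lazily instead. A small sketch, with first_rows as a hypothetical helper name:

```python
import csv
from itertools import islice


def first_rows(csv_path, n):
    """Return up to the first n parsed rows of a CSV file."""
    # newline="" is what the csv module documentation recommends for files
    # passed to csv.reader.
    with open(csv_path, newline="") as file_obj:
        reader = csv.reader(file_obj)
        return list(islice(reader, n))
```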

Here's another decent solution using a list comprehension:

file = open('file.txt', 'r')
lines = [next(file) for x in range(3)]  # first 3 lines will be in this list
file.close()
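Coming back to the original question of trimming a large file and whether the OS matters: in Python 3 text mode, line endings are translated by default, so one way to keep the output identical to the input on any platform is to open both files with newline="". The sketch below assumes Python 3; the function name trim_file and the idea of writing to a separate destination file are choices made for this example only.

```python
from itertools import islice


def trim_file(src_path, dst_path, n):
    """Copy the first n lines of src_path into dst_path; return lines written."""
    # newline="" disables newline translation, so the original line endings
    # (\n or \r\n) are copied unchanged on any operating system.
    with open(src_path, newline="") as src, \
            open(dst_path, "w", newline="") as dst:
        count = 0
        for line in islice(src, n):
            dst.write(line)
            count += 1
    return count
```

Because lines are streamed one at a time, memory use stays small regardless of the size of the source file.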