How do I read the first N lines of a file?

小开

Best answer

Python 3:

with open("datafile") as myfile:
    head = [next(myfile) for x in range(N)]
print(head)

Python 2:

with open("datafile") as myfile:
    head = [next(myfile) for x in xrange(N)]
print head

Here is another way (Python 2 & 3):

from itertools import islice

with open("datafile") as myfile:
    head = list(islice(myfile, N))
print(head)
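
One property of the islice version worth seeing in action: it does not raise when the file has fewer than N lines, it simply returns what is there. The following is a minimal Python 3 sketch, not part of the original answer; the helper name `first_lines` and the temporary sample file are assumptions made for illustration.

```python
import os
import tempfile
from itertools import islice


def first_lines(path, n):
    """Return up to n lines from the start of the file at path (hypothetical helper)."""
    with open(path) as f:
        # islice stops after n items or at end of file, whichever comes first
        return list(islice(f, n))


# Write a small three-line file to demonstrate on (made-up sample data).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("one\ntwo\nthree\n")

short = first_lines(tmp.name, 2)    # asks for fewer lines than the file holds
longer = first_lines(tmp.name, 10)  # asks for more lines than the file holds
os.remove(tmp.name)

print(short)
print(longer)
```

By contrast, the next()-based list comprehension raises StopIteration once the file runs out of lines.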

小开

File objects do not expose a dedicated method for reading a given number of lines.

I think the simplest way is:

lines = []
with open(file_name) as f:
    # xrange is Python 2; use range on Python 3
    lines.extend(f.readline() for i in xrange(N))
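
One caveat with the readline() approach: at end of file, readline() returns an empty string instead of raising, so a file shorter than N lines pads the list with '' entries. A small Python 3 sketch of that behavior, using io.StringIO as an assumed stand-in for a real file:

```python
import io

N = 4
f = io.StringIO("first\nsecond\n")  # stand-in for an open text file with 2 lines

lines = [f.readline() for _ in range(N)]
# readline() returns '' once the end of the file is reached;
# a genuinely blank line in the file would be "\n", so this filter keeps it
trimmed = [line for line in lines if line]

print(lines)
print(trimmed)
```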

小开

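If you want something that obviously works (without looking up esoteric details in the manual), needs neither imports nor try/except, and runs on a fair range of Python 2.x versions (2.2 to 2.6):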

def headn(file_name, n):
    """Like *x head -N command"""
    result = []
    nlines = 0
    assert n >= 1
    for line in open(file_name):
        result.append(line)
        nlines += 1
        if nlines >= n:
            break
    return result


if __name__ == "__main__":
    import sys
    rval = headn(sys.argv[1], int(sys.argv[2]))
    print rval
    print len(rval)

小开

N = 10
with open("file.txt", "r") as file:  # "r" opens it for reading; append mode ("a") cannot be read from
    for i in range(N):
        line = next(file).strip()
        print(line)

小开

Based on gnibbler's top-voted answer (Nov 20, 2009 at 0:27): this class adds head() and tail() methods to the file object.

class File(file):
    def head(self, lines_2find=1):
        self.seek(0)                            # rewind file
        return [self.next() for x in xrange(lines_2find)]

    def tail(self, lines_2find=1):
        self.seek(0, 2)                         # go to end of file
        bytes_in_file = self.tell()
        lines_found, total_bytes_scanned = 0, 0
        while (lines_2find+1 > lines_found and
               bytes_in_file > total_bytes_scanned):
            byte_block = min(1024, bytes_in_file-total_bytes_scanned)
            self.seek(-(byte_block+total_bytes_scanned), 2)
            total_bytes_scanned += byte_block
            lines_found += self.read(1024).count('\n')
        self.seek(-total_bytes_scanned, 2)
        line_list = list(self.readlines())
        return line_list[-lines_2find:]

Usage:

f = File('path/to/file', 'r')
f.head(3)
f.tail(3)
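
The class above subclasses the built-in file type, which no longer exists in Python 3. As a rough Python 3 counterpart, here is a sketch with two plain functions. It swaps in a different technique for tail(): a bounded collections.deque that scans the whole file, rather than seeking backwards from the end as the class does. The function names and the temporary sample file are assumptions for illustration.

```python
import os
import tempfile
from collections import deque
from itertools import islice


def head(path, n=1):
    """Return the first n lines of the file at path."""
    with open(path) as f:
        return list(islice(f, n))


def tail(path, n=1):
    """Return the last n lines of the file at path."""
    with open(path) as f:
        # a bounded deque keeps only the n most recent lines while scanning
        return list(deque(f, maxlen=n))


# Made-up five-line sample file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("".join(f"line {i}\n" for i in range(1, 6)))

first_two = head(tmp.name, 2)
last_two = tail(tmp.name, 2)
os.remove(tmp.name)

print(first_two)
print(last_two)
```

The deque version is simpler but reads every line, so for very large files the seek-based approach of the original class is likely to be faster.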

小开

The way I find most convenient:

LINE_COUNT = 3
print [s for (i, s) in enumerate(open('test.txt')) if i < LINE_COUNT]

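This solution is based on a list comprehension. The object returned by open() supports the iteration interface. enumerate() wraps it and returns (index, item) tuples; we then check that the index is inside the accepted range (i < LINE_COUNT) and simply print the result. Note that the comprehension has no break, so it still iterates over the entire file.

Enjoy Python. ;)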
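
If scanning the whole file is a concern, the same enumerate() idea can be made to stop early. The sketch below (Python 3) is one possible variation, not the answer's own code: it uses itertools.takewhile and an io.StringIO object as an assumed stand-in for open('test.txt').

```python
import io
from itertools import takewhile

LINE_COUNT = 3
f = io.StringIO("a\nb\nc\nd\ne\n")  # stand-in for open('test.txt')

# takewhile stops at the first (index, line) pair whose index is out of range,
# so the remainder of the file is never scanned
head = [s for i, s in takewhile(lambda pair: pair[0] < LINE_COUNT, enumerate(f))]

# takewhile has to consume one line past the limit to know it should stop,
# so only the lines after that one are still unread
remaining = f.read()

print(head)
```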

小开

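Starting with Python 2.6, you can take advantage of more sophisticated functions in the IO base class. So the top-rated answer above can be rewritten as: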

with open("datafile") as myfile:
    head = myfile.readlines(N)
print head

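(You don't have to worry about your file having fewer than N lines, since no StopIteration exception is thrown.) Be aware, though, that the argument to readlines() is a size hint, roughly a byte or character budget, not a line count, so the result is not guaranteed to contain exactly N lines.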
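
To see how readlines(N) differs from taking N lines, here is a small Python 3 sketch. It uses io.StringIO objects as assumed stand-ins for the file and compares the call against itertools.islice; the sample text is made up.

```python
import io
from itertools import islice

N = 3
text = "a\nb\nc\nd\n"  # four lines of two characters each

by_hint = io.StringIO(text).readlines(N)       # N is treated as a size hint
by_count = list(islice(io.StringIO(text), N))  # N is treated as a line count

print(by_hint)
print(by_count)
```

With lines this short, the size budget of 3 characters is exhausted after two lines, while islice returns three. With longer lines, readlines(N) can return even fewer.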

小开

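If you want to read the first lines quickly and you don't care about performance, you can use .readlines(), which returns a list object, and then slice the list.

For example, for the first 5 lines: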

with open("pathofmyfileandfileandname") as myfile:
    firstNlines = myfile.readlines()[0:5]  # put here the interval you want

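Note: the whole file is read, so this is not the best choice from a performance point of view. It is, however, easy to use, quick to write, and easy to remember, which makes it very convenient for one-off calculations.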

print firstNlines

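One advantage compared with the other answers is that you can easily select a range of lines, e.g. skipping the first 10 lines with [10:30], dropping the last 10 with [:-10], or taking only every other line with [::2].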
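
Two of these slices can also be taken lazily, without loading the whole file, by passing start, stop, and step to itertools.islice. The sketch below is Python 3 and uses io.StringIO with made-up rows as a stand-in for the file. islice does not accept negative indices, so [:-10] has no direct lazy equivalent here.

```python
import io
from itertools import islice

text = "".join(f"row {i}\n" for i in range(40))  # hypothetical 40-line input

# lines 10..29, the lazy counterpart of readlines()[10:30]
window = list(islice(io.StringIO(text), 10, 30))

# every other line starting with the first, the counterpart of [::2]
evens = list(islice(io.StringIO(text), 0, None, 2))

print(window[0], window[-1], evens[:3])
```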

小开

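If you have a really big file, and assuming you want the output as a numpy array, using np.genfromtxt will freeze your computer. In my experience, this works much better: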

import numpy as np


def load_big_file(fname, maxrows):
    '''only works for well-formed text file of space-separated doubles'''
    rows = []  # unknown number of lines, so use list
    with open(fname) as f:
        j = 0
        for line in f:
            if j == maxrows:
                break
            else:
                line = [float(s) for s in line.split()]
                rows.append(np.array(line, dtype=np.double))
                j += 1
    return np.vstack(rows)  # convert list of vectors to array

小开

For the first 5 lines, simply do:

N = 5
with open("data_file", "r") as file:
    for i in range(N):
        print file.next()

小开

What I do is read the N lines using pandas. I don't think the performance is the best, but for example, if N=1000:

import pandas as pd
yourfile = pd.read_csv('path/to/your/file.csv',nrows=1000)

小开

#!/usr/bin/python

import subprocess

# Note: tail prints the last 3 lines; call "head" instead to get the first lines
p = subprocess.Popen(["tail", "-n", "3", "passlist"], stdout=subprocess.PIPE)
output, err = p.communicate()
print output

This method worked for me.

小开

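The two most intuitive ways of doing this are:

Iterate over the file line by line, and break after N lines.

Iterate over the file line by line by calling the next() method N times. (This is essentially just a different syntax for what the top answer does.)

Here is the code: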

# Method 1:
with open("fileName", "r") as f:
    counter = 0
    for line in f:
        print line
        counter += 1
        if counter == N: break

# Method 2:
with open("fileName", "r") as f:
    for i in xrange(N):
        line = f.next()
        print line

The bottom line is that, as long as you don't use readlines() or otherwise pull the whole file into memory, you have plenty of options.

小开

This worked for me:

f = open("history_export.csv", "r")
line = 5
for x in range(line):
    a = f.readline()
    print(a)
f.close()  # release the file handle when done

小开

This works for Python 2 & 3:

from itertools import islice

with open('/tmp/filename.txt') as inf:
    # yields M lines starting at 0-based index N; use islice(inf, N) for the first N lines
    for line in islice(inf, N, N+M):
        print(line)

小开


fname = input("Enter file name: ")
num_lines = 0

with open(fname, 'r') as f:  # count the lines in the file
    for line in f:
        num_lines += 1

num_lines_input = int(input("Enter line numbers: "))

if num_lines_input > num_lines:
    print("Don't have", num_lines_input, "lines; printing as many as there are")

with open(fname, "r") as f:
    # never request more lines than the file contains
    for x in range(min(num_lines_input, num_lines)):
        a = f.readline()
        print(a)

print("Total lines in the text", num_lines)

小开

I wanted to handle files with fewer than n lines by reading the whole file:

def head(filename: str, n: int):
    try:
        with open(filename) as f:
            head_lines = [next(f).rstrip() for x in range(n)]
    except StopIteration:
        with open(filename) as f:
            head_lines = f.read().splitlines()
    return head_lines

Credit goes to John La Rooy and Ilian Iliev. Use this function for the best performance with exception handling.

Revision 1: Thanks to FrankM for the feedback. To handle file existence and read permission, we can further add:

import errno
import os


def head(filename: str, n: int):
    if not os.path.isfile(filename):
        raise FileNotFoundError(errno.ENOENT, os.strerror(errno.ENOENT), filename)
    if not os.access(filename, os.R_OK):
        raise PermissionError(errno.EACCES, os.strerror(errno.EACCES), filename)

    try:
        with open(filename) as f:
            head_lines = [next(f).rstrip() for x in range(n)]
    except StopIteration:
        with open(filename) as f:
            head_lines = f.read().splitlines()
    return head_lines

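You can either go with the second version, or go with the first one and handle the file exceptions later. From a performance standpoint, the checks are quick and mostly free.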
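
For the "handle the file exceptions later" route, one possible shape is to let open() raise and catch the errors at the call site. The sketch below is Python 3 and is only an illustration: `safe_head` is a hypothetical wrapper name, the file name is deliberately nonexistent, and head() is written with itertools.islice here rather than the try/except StopIteration pattern used above.

```python
from itertools import islice


def head(filename: str, n: int):
    """Return up to n lines, stripped of trailing whitespace."""
    with open(filename) as f:
        return [line.rstrip() for line in islice(f, n)]


def safe_head(filename: str, n: int):
    """Hypothetical wrapper: report file errors instead of propagating them."""
    try:
        return head(filename, n)
    except FileNotFoundError:
        print(f"{filename}: no such file")
    except PermissionError:
        print(f"{filename}: not readable")
    return []


# The file name below is assumed not to exist.
result = safe_head("no-such-file-for-this-sketch.txt", 3)
print(result)
```

Catching the exception from open() also covers the case where the file disappears or changes permissions between an up-front check and the actual open.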

小开

Simply convert the CSV file object to a list using list(file_data):

import csv

with open('your_csv_file.csv') as file_obj:
    file_data = csv.reader(file_obj)
    file_list = list(file_data)  # must run while the file is still open
for row in file_list[:4]:
    print(row)
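
Since list(file_data) parses every row of the file, a lighter variation is to take only the first rows from the reader with itertools.islice. This is a Python 3 sketch under stated assumptions: io.StringIO stands in for open('your_csv_file.csv'), and the CSV content is made up.

```python
import csv
import io
from itertools import islice

# stand-in for open('your_csv_file.csv'); the data is invented for illustration
file_obj = io.StringIO("id,name\n1,ann\n2,bob\n3,cy\n4,dee\n5,eve\n")

reader = csv.reader(file_obj)
first_rows = list(islice(reader, 4))  # parse only the first 4 rows

for row in first_rows:
    print(row)
```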

小开

Here is another decent solution using a list comprehension:

file = open('file.txt', 'r')
lines = [next(file) for x in range(3)]  # first 3 lines will be in this list
file.close()