How do I read a file one character at a time in Python?

小开

Try f.read(1). On a file opened in text mode it returns at most one character, and an empty string once the end of the file is reached.

小开

Python itself can help you with this, in interactive mode:

>>> help(file.read)
Help on method_descriptor:

read(...)
    read([size]) -> read at most size bytes, returned as a string.

    If the size argument is negative or omitted, read until EOF is reached.
    Notice that when in non-blocking mode, less data than what was requested
    may be returned, even if no size parameter was given.

小开

Best answer

with open(filename) as f:
    while True:
        c = f.read(1)
        if not c:
            print("End of file")
            break
        print("Read a character:", c)

小开

Just read a single character

f.read(1)

小开

Just:

myfile = open(filename)
onecharacter = myfile.read(1)

小开

First, open a file:

with open("filename") as fileobj:
    for line in fileobj:
        for ch in line:
            print(ch)

This goes through every line in the file and then every character in that line.
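One detail worth keeping in mind with the line-by-line approach: each line keeps its trailing newline, so the inner loop also sees '\n' as a character. A minimal sketch, using io.StringIO as a stand-in for an open file (the sample contents are made up for illustration):

```python
import io

# io.StringIO stands in for an open text file; the contents are made up.
fileobj = io.StringIO("ab\ncd\n")

# Iterating a file yields lines with their trailing newline, so the inner
# loop sees '\n' as an ordinary character.
chars = [ch for line in fileobj for ch in line]
print(chars)
```

If the newlines are not wanted, they can be skipped inside the inner loop.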

小开

I learned a new idiom for this today while watching Raymond Hettinger's Transforming Code into Beautiful, Idiomatic Python:

import functools

with open(filename) as f:
    f_read_ch = functools.partial(f.read, 1)
    for ch in iter(f_read_ch, ''):
        print('Read a character:', repr(ch))
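The idiom relies on the two-argument form of iter, which keeps calling the given callable until it returns the sentinel value. A small self-contained sketch of the same idea, using io.StringIO in place of a real file (the sample text is an assumption for illustration):

```python
import functools
import io

# io.StringIO stands in for an open text file; the contents are made up.
f = io.StringIO("abc")

# iter(callable, sentinel) calls f.read(1) until it returns '' (end of file).
chars = list(iter(functools.partial(f.read, 1), ''))
print(chars)
```

For a file opened in binary mode the sentinel would need to be b'' instead of ''.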

小开

I like the accepted answer: it is straightforward and will get the job done. I would also like to offer an alternative implementation:

def chunks(filename, buffer_size=4096):
    """Reads `filename` in chunks of `buffer_size` bytes and yields each chunk
    until no more bytes can be read; the last chunk will most likely have
    fewer than `buffer_size` bytes.

    :param str filename: Path to the file
    :param int buffer_size: Buffer size, in bytes (default is 4096)
    :return: Yields chunks of `buffer_size` size until exhausting the file
    :rtype: bytes
    """
    with open(filename, "rb") as fp:
        chunk = fp.read(buffer_size)
        while chunk:
            yield chunk
            chunk = fp.read(buffer_size)


def chars(filename, buffersize=4096):
    """Yields the contents of file `filename` character-by-character. Warning:
    will only work for encodings where one character is encoded as one byte.

    :param str filename: Path to the file
    :param int buffersize: Buffer size for the underlying chunks,
        in bytes (default is 4096)
    :return: Yields the contents of `filename` character-by-character.
    :rtype: bytes
    """
    for chunk in chunks(filename, buffersize):
        # Slice rather than iterate: iterating over bytes yields ints in
        # Python 3, while a one-byte slice works in both Python 2 and 3.
        for i in range(len(chunk)):
            yield chunk[i:i + 1]


def main(buffersize, filenames):
    """Reads several files character by character and redirects their contents
    to `/dev/null`.
    """
    for filename in filenames:
        with open("/dev/null", "wb") as fp:
            for char in chars(filename, buffersize):
                fp.write(char)


if __name__ == "__main__":
    # Try reading several files varying the buffer size
    import sys
    buffersize = int(sys.argv[1])
    filenames = sys.argv[2:]
    sys.exit(main(buffersize, filenames))

The code I suggest is essentially the same idea as the accepted answer: read a given number of bytes from the file. The difference is that it first reads a good chunk of data (4096 is a good default for x86, but you may want to try 1024 or 8192; any multiple of your page size), and then yields the characters in that chunk one by one.

The code I present may be faster for larger files. Take, for example, the entire text of War and Peace, by Tolstoy. These are my timing results (MacBook Pro using OS X 10.7.4; so.py is the name I gave to the code I pasted):

$ time python so.py 1 2600.txt.utf-8
python so.py 1 2600.txt.utf-8  3.79s user 0.01s system 99% cpu 3.808 total
$ time python so.py 4096 2600.txt.utf-8
python so.py 4096 2600.txt.utf-8  1.31s user 0.01s system 99% cpu 1.318 total
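Timings like these are easy to reproduce on your own machine. The following is only a rough sketch using timeit and a throwaway 200 KB file; the helper name read_all_chars, the file size, and the list of buffer sizes are all assumptions chosen for illustration, and the numbers it prints will vary from system to system:

```python
import os
import tempfile
import timeit


def read_all_chars(filename, buffersize):
    """Touch every byte of `filename`, reading `buffersize` bytes per call."""
    count = 0
    with open(filename, "rb") as fp:
        chunk = fp.read(buffersize)
        while chunk:
            for _ in chunk:
                count += 1
            chunk = fp.read(buffersize)
    return count


# Throwaway sample file (200 KB of filler); size chosen arbitrarily.
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"x" * 200000)

timings = {}
try:
    for size in (1, 64, 1024, 4096, 65536):
        # Run each configuration three times and record the total wall time.
        timings[size] = timeit.timeit(
            lambda: read_all_chars(tmp.name, size), number=3
        )
finally:
    os.remove(tmp.name)

for size, seconds in sorted(timings.items()):
    print("%6d bytes: %.4f s" % (size, seconds))
```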

Now: do not take the buffer size at 4096 as a universal truth; look at the results I get for different sizes (buffer size (bytes) vs wall time (sec)):

As you can see, you can start seeing gains earlier on (and my timings are likely very inaccurate); the buffer size is a trade-off between performance and memory. The default of 4096 is just a reasonable choice but, as always, measure first.
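The warning in the chars docstring matters for multi-byte encodings such as UTF-8, where a character can be split across two chunks. One possible way around that, which is not part of the answer above, is an incremental decoder from the codecs module. A sketch, where decoded_chars is a hypothetical helper name and the sample text is made up:

```python
import codecs
import os
import tempfile


def decoded_chars(filename, buffersize=4096, encoding="utf-8"):
    """Yield decoded characters, reading `buffersize` bytes at a time."""
    # The incremental decoder keeps any incomplete byte sequence between
    # calls, so a character split across two chunks is still decoded.
    decoder = codecs.getincrementaldecoder(encoding)()
    with open(filename, "rb") as fp:
        while True:
            chunk = fp.read(buffersize)
            if not chunk:
                break
            for char in decoder.decode(chunk):
                yield char
        # Flush the decoder; this raises if the file ends mid-character.
        for char in decoder.decode(b"", final=True):
            yield char


# Demo with a throwaway file; a 1-byte buffer forces every multi-byte
# character to be split across reads.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write("na\u00efve caf\u00e9".encode("utf-8"))
try:
    result = "".join(decoded_chars(tmp.name, buffersize=1))
finally:
    os.remove(tmp.name)
print(result)
```

Opening the file in text mode with an explicit encoding does the same job with less code; the sketch only shows how to keep the manual chunking if you need it.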

小开

f = open('hi.txt', 'w')
f.write('0123456789abcdef')
f.close()
f = open('hi.txt', 'r')
f.seek(12)
print(f.read(1))  # This will read just "c"
f.close()
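A caveat when seeking: in Python 3 text mode, the offset passed to seek is only guaranteed to be meaningful if it is 0 or a value previously returned by tell(). A plain number such as 12 lines up with the 13th character only when every earlier character is a single byte, as in the ASCII sample above. A small sketch of both styles, using a throwaway temporary file (the variable names are made up for illustration):

```python
import os
import tempfile

# Throwaway file so the example leaves nothing behind.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("0123456789abcdef")

try:
    # Binary mode: offsets are plain byte positions.
    with open(tmp.name, "rb") as f:
        f.seek(12)
        one_byte = f.read(1)

    # Text mode: remember a position with tell() and return to it with seek().
    with open(tmp.name, "r") as f:
        f.read(12)
        position = f.tell()
        first = f.read(1)
        f.seek(position)
        again = f.read(1)
finally:
    os.remove(tmp.name)

print(one_byte, first, again)
```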

小开

This will also work:

with open("filename") as fileObj:
    for line in fileObj:
        for ch in line:
            print(ch)

It goes through every line in the file and every character in every line.

(Note that this post now looks extremely similar to a highly upvoted answer, but this was not the case at the time of writing.)

小开

As a supplement: if the file contains a very long line that might exhaust your memory, consider reading it into a fixed-size buffer and then yielding each character:

def read_char(inputfile, buffersize=10240):
    with open(inputfile, 'r') as f:
        while True:
            buf = f.read(buffersize)
            if not buf:
                break
            for char in buf:
                yield char
        yield ''  # handle the case where the file is empty (a final '' is always yielded)


if __name__ == "__main__":
    # `process` is a placeholder for whatever you do with each character
    for char in read_char('./very_large_file.txt'):
        process(char)

小开

# Read the whole file at once into a list, then print the characters one by one
with open('file.txt') as f:
    for i in list(f.read()):
        print(i)

小开

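import os
import sys

# Put the terminal into non-canonical mode without echo (Unix only)
os.system("stty -icanon -echo")
while True:
    raw_c = sys.stdin.buffer.peek()  # buffered raw bytes, not consumed
    c = sys.stdin.read(1)
    print(f"Char: {c}")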

小开

Best answer for Python 3.8+:

with open(path, encoding="utf-8") as f:
    while c := f.read(1):
        do_my_thing(c)

You may want to specify utf-8 and avoid the platform encoding. I've chosen to do that here.
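To see why the encoding matters when reading one character at a time, compare text mode with binary mode on a character that takes more than one byte. A minimal sketch with a throwaway temporary file (the sample text and variable names are made up for illustration):

```python
import os
import tempfile

# "\u00e9" (e with acute accent) is one character but two bytes in UTF-8.
with tempfile.NamedTemporaryFile("w", encoding="utf-8", delete=False) as tmp:
    tmp.write("\u00e9!")

try:
    # Text mode: read(1) returns one decoded character.
    with open(tmp.name, encoding="utf-8") as f:
        text_char = f.read(1)
    # Binary mode: read(1) returns one raw byte, here half of the character.
    with open(tmp.name, "rb") as f:
        raw_byte = f.read(1)
finally:
    os.remove(tmp.name)

print(repr(text_char), repr(raw_byte))
```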

Function – Python 3.8+:

def stream_file_chars(path: str):
    with open(path, encoding="utf-8") as f:
        while c := f.read(1):
            yield c

Function – Python<=3.7:

def stream_file_chars(path: str):
    with open(path, encoding="utf-8") as f:
        while True:
            c = f.read(1)
            if c == "":
                break
            yield c

Function – pathlib + documentation:

from pathlib import Path
from typing import Union, Generator


def stream_file_chars(path: Union[str, Path]) -> Generator[str, None, None]:
    """Streams characters from a file."""
    with Path(path).open(encoding="utf-8") as f:
        while (c := f.read(1)) != "":
            yield c

小开

Combining qualities of some other answers, here is something that is robust against long files and long lines, while being more succinct and faster:

import functools as ft, itertools as it

with open(path) as f:
    for c in it.chain.from_iterable(
        iter(ft.partial(f.read, 4096), '')
    ):
        print(c)