Reading a file is more complicated than you might think. But with complexity, we gain a little more control. At the programmatic level, we not only have the option of reading the file all at once – which is how we normally think of opening and accessing a file’s contents – but we can also read it line-by-line. This is useful when we need to do something for each line in a file, or when we only need to deal with the first few lines of a file.
This lesson assumes that you've completed the previous lesson and your tempdata subfolder is full of Shakespeare prose and poetry.
Here's the problem we're trying to solve – if you're doing this as homework, see the full info for this exercise:
From the text file at tempdata/tragedies/hamlet, read and print the first 5 lines of text.
When you run d.py from the command-line:
0004-shakefiles $ python d.py
HAMLET


DRAMATIS PERSONAE
As I've said in another lesson: opening a file and reading a file are two different actions.
I suggest jumping into ipython and exploring this interactively before writing out the script.
This should seem familiar from the previous lesson:
>>> import os
>>> fname = os.path.join('tempdata', 'tragedies', 'hamlet')
>>> hamletfile = open(fname, 'r')
hamletfile is a file object, which means it has a read() method. However, the read() method reads the entire file as one giant text string, which we don't want to do. We want to selectively print just a few lines, i.e. the first 5 lines.
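To make the difference between read() and readline() concrete, here is a small sketch that runs without the Shakespeare files. It writes a stand-in file whose contents mimic the first five lines of Hamlet shown in this lesson; the temporary folder, the sample.txt filename, and the variable names are assumptions made for illustration only.

```python
import os
import tempfile

# Stand-in text that mimics the first five lines of the hamlet file
sample_text = "\tHAMLET\n\n\n\tDRAMATIS PERSONAE\n\n"

with tempfile.TemporaryDirectory() as tmpdir:
    # 'sample.txt' is a made-up filename for this sketch
    fname = os.path.join(tmpdir, 'sample.txt')
    samplefile = open(fname, 'w')
    samplefile.write(sample_text)
    samplefile.close()

    # read() returns the entire file as one text string
    wholefile = open(fname, 'r')
    everything = wholefile.read()
    wholefile.close()

    # readline() returns only the next line, newline character included
    linefile = open(fname, 'r')
    first_line = linefile.readline()
    linefile.close()

print(repr(everything))
print(repr(first_line))
```

For a five-line file the difference hardly matters, but for a full play read() pulls every line into memory even if we only care about the top of the file.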
To read a single line, we use the appropriately-named method, readline(). The first line contains the name of the play:
>>> hamletfile.readline()
'\tHAMLET\n'
Each time you call it, it reads the next line. The next two lines are just newline characters (i.e. "blank lines"):
>>> hamletfile.readline()
'\n'
>>> hamletfile.readline()
'\n'
The fourth line basically says, "This is the section in which we list the characters".
>>> hamletfile.readline()
'\tDRAMATIS PERSONAE\n'
The fifth line is another blank line:
>>> hamletfile.readline()
'\n'
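One detail worth knowing when calling readline() over and over: a blank line comes back as '\n', while the end of the file comes back as an empty string, ''. The sketch below shows this using io.StringIO as a stand-in for an opened file (an assumption made so the example runs without the tempdata folder); a real file object behaves the same way for readline().

```python
import io

# io.StringIO acts like an opened text file; the contents mimic
# the first five lines of the hamlet file
fakefile = io.StringIO("\tHAMLET\n\n\n\tDRAMATIS PERSONAE\n\n")

# Each call to readline() picks up where the previous call left off
lines = [fakefile.readline() for x in range(5)]

# A sixth call has nothing left to read, so it returns an empty string
after_end = fakefile.readline()

print(lines)
print(repr(after_end))
```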
It's worth just opening the tempdata/tragedies/hamlet file in your text editor to confirm that we just read through its first 5 lines.
Now let's write out the script that reads the first 5 lines and prints them to screen.
import os
fname = os.path.join('tempdata', 'tragedies', 'hamlet')
hamletfile = open(fname, 'r')
print(hamletfile.readline())
print(hamletfile.readline())
print(hamletfile.readline())
print(hamletfile.readline())
print(hamletfile.readline())
# more stuff to come...
hamletfile.close()
Sure, copy-pasting a command 5 times is easy enough. But you should get into the habit of using a for-loop whenever possible:
import os
fname = os.path.join('tempdata', 'tragedies', 'hamlet')
hamletfile = open(fname, 'r')
for x in range(5):
    print(hamletfile.readline())
hamletfile.close()
If you run the script as is, the output to screen will look like this:
HAMLET
DRAMATIS PERSONAE
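Ten lines are printed to screen, most of them blank. This is because each line read from the file already ends in a newline character, and the print() function adds another one by default. We can prevent this redundant newline output by calling the strip() method (which belongs to all text string objects) on each line read from the file. Note that strip() removes all leading and trailing whitespace, so the tab character in front of HAMLET disappears too: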
import os
fname = os.path.join('tempdata', 'tragedies', 'hamlet')
hamletfile = open(fname, 'r')
for x in range(5):
    print(hamletfile.readline().strip())
hamletfile.close()
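If you want to drop the doubled newline but keep the leading tab, there are a couple of other options besides strip(). This is only a sketch of the alternatives, using the first line of the file as a hard-coded string; which one to use depends on whether the indentation matters to you.

```python
line = '\tHAMLET\n'

# strip() removes whitespace from both ends, including the leading tab
stripped = line.strip()

# rstrip('\n') removes only trailing newline characters
right_only = line.rstrip('\n')

print(repr(stripped))     # 'HAMLET'
print(repr(right_only))   # '\tHAMLET'

# Or leave the line alone and tell print() not to add its own newline
print(line, end='')
```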
And that's "all" it takes to read and print the first n lines of Hamlet. As we'll see in subsequent exercises, it's a bit more complicated to read from an arbitrary point in a file, e.g. "the 1,000th line" or "the 5th from the last line".
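As a possible generalization to "the first n lines", here is a sketch that wraps the same idea in a function. The function name first_lines is made up for this example, and it swaps in two things the script above doesn't use: a with block, which closes the file automatically, and itertools.islice, which stops after n lines (or earlier, if the file is shorter than that).

```python
import itertools
import os


def first_lines(fname, n):
    """Return the first n lines of a text file, stripped of whitespace."""
    with open(fname, 'r') as f:
        # Iterating over a file object yields one line at a time;
        # islice stops after n lines or at the end of the file
        return [line.strip() for line in itertools.islice(f, n)]


fname = os.path.join('tempdata', 'tragedies', 'hamlet')
# Only try to read if the file from the previous lesson is actually there
if os.path.exists(fname):
    for line in first_lines(fname, 5):
        print(line)
```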