Reading and printing the first 5 lines of Shakespeare's Hamlet

How to read a text file, line by line.
This article is part of a sequence.
Extracting and Reading Shakespeare
A walkthrough of modules, file system operations, and Shakespeare.

Summary

Reading a file is more complicated than you might think, but with that complexity we gain a little more control. In a program, we can read a file all at once (which is how we normally think of opening a file and accessing its contents), or we can read it line by line. Reading line by line is useful when we need to do something for each line in a file, or when we only need the first few lines.

This lesson assumes that you've completed the previous lesson and your tempdata subfolder is full of Shakespeare prose and poetry.

The problem

Here's the problem we're trying to solve – if you're doing this as homework, see the full info for this exercise:

0004-shakefiles/d.py
Print the first 5 lines of the Hamlet text

From the text file at tempdata/tragedies/hamlet, read and print the first 5 lines of text.

Expectations

When you run d.py from the command-line:

0004-shakefiles $ python d.py
  • The program's output to screen should be:
    HAMLET
    
    
    DRAMATIS PERSONAE
    

Opening and reading the Hamlet text file

As I've said in another lesson: opening a file and reading a file are two different actions.

Opening the Hamlet file

I suggest jumping into ipython and exploring this interactively before writing out the script.

This should seem familiar from the previous lesson:

>>> import os
>>> fname = os.path.join('tempdata', 'tragedies', 'hamlet')
>>> hamletfile = open(fname, 'r')

Reading the first 5 lines of the Hamlet file

hamletfile is a file object, which means it has a read() method. However, the read() method reads the entire file as one giant text string, which we don't want to do. We only want to print a few lines: in this exercise, the first 5.
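To make the difference concrete, here's a small sketch that contrasts read() with readline(), the method introduced in the next paragraph. So that it runs anywhere, it doesn't touch tempdata/tragedies/hamlet; instead it writes a stand-in file to a temporary directory. The sample_text string and the hamlet_sample filename are my own assumptions for illustration, with the first five lines copied from the readline() results shown in this lesson and a made-up sixth line.

```python
import os
import tempfile

# Stand-in for the top of the Hamlet file (an assumption for this demo);
# the sixth line is a placeholder, not real text from the play.
sample_text = "\tHAMLET\n\n\n\tDRAMATIS PERSONAE\n\nplaceholder sixth line\n"

with tempfile.TemporaryDirectory() as tmpdir:
    fname = os.path.join(tmpdir, 'hamlet_sample')
    with open(fname, 'w') as f:
        f.write(sample_text)

    # read() returns the whole file as a single string
    with open(fname, 'r') as f:
        whole_text = f.read()

    # readline() returns one line per call, newline character included
    with open(fname, 'r') as f:
        first_line = f.readline()
        second_line = f.readline()

print(repr(whole_text))
print(repr(first_line))
print(repr(second_line))
```

With a six-line file the difference hardly matters, but for a full play, read() hands you everything even when you only want the top of the file.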

To read a single line, we use the appropriately-named method, readline(). The first line contains the name of the play:

>>> hamletfile.readline()
'\tHAMLET\n'

Each time you call it, it reads the next line. The next two lines are just newline characters (i.e. "blank lines"):

>>> hamletfile.readline()
'\n'
>>> hamletfile.readline()
'\n'

The fourth line basically says, "This is the section in which we list the characters".

>>> hamletfile.readline()
'\tDRAMATIS PERSONAE\n'

The fifth line is another blank line:

>>> hamletfile.readline()
'\n'

It's worth just opening the tempdata/tragedies/hamlet file in your text editor to confirm that we just read through its first 5 lines:

image hamlet-first-5-lines.png
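A natural question is what readline() does once it runs out of lines. As far as I know, it doesn't raise an error; it simply returns an empty string, and keeps doing so on every later call. Here's a minimal sketch using a throwaway two-line file (the two_lines filename and its contents are made up for the demo):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    fname = os.path.join(tmpdir, 'two_lines')
    with open(fname, 'w') as f:
        f.write("first\nsecond\n")

    # Call readline() four times on a file that only has two lines
    with open(fname, 'r') as f:
        results = [f.readline() for _ in range(4)]

print(results)
```

Note that a blank line in the middle of a file comes back as '\n', not as an empty string, so the two cases can be told apart.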

Before going any further, let's write out the script that reads the first 5 lines and prints them to screen:

import os
fname = os.path.join('tempdata', 'tragedies', 'hamlet')
hamletfile = open(fname, 'r')
print(hamletfile.readline())
print(hamletfile.readline())
print(hamletfile.readline())
print(hamletfile.readline())
print(hamletfile.readline())
# more stuff to come...
hamletfile.close()

Practice your for-loops

Sure, copy-pasting a command 5 times is easy enough. But you should get into the habit of using a for-loop whenever possible:

import os
fname = os.path.join('tempdata', 'tragedies', 'hamlet')
hamletfile = open(fname, 'r')
for x in range(5):
    print(hamletfile.readline())
hamletfile.close()
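Once the for-loop feels comfortable, here's one alternative way to write it, offered as a sketch and not as the way this exercise expects you to do it. It relies on two things: a with block, which closes the file for us, and itertools.islice, which takes the first n items from anything we can loop over (a file object yields its lines when looped over). The first_lines() helper is a hypothetical name of my own, and the demo again uses a stand-in file with a made-up sixth line so it can run without the tempdata folder.

```python
import os
import tempfile
from itertools import islice

def first_lines(fname, n=5):
    """Return the first n lines of the file at fname as a list of strings."""
    with open(fname, 'r') as f:      # the file is closed when the block ends
        return list(islice(f, n))    # stop after n lines

# Stand-in for the top of the Hamlet file (an assumption for this demo)
sample_text = "\tHAMLET\n\n\n\tDRAMATIS PERSONAE\n\nplaceholder sixth line\n"

with tempfile.TemporaryDirectory() as tmpdir:
    fname = os.path.join(tmpdir, 'hamlet_sample')
    with open(fname, 'w') as f:
        f.write(sample_text)
    lines = first_lines(fname, 5)

for line in lines:
    print(repr(line))
```

If the file has fewer than n lines, islice just stops early, so the list is shorter and nothing breaks.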

Stripping newline characters with strip()

If you run the script as is, the output to screen will look like this:

  HAMLET





  DRAMATIS PERSONAE




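Ten lines are printed to screen. This is because each line read from the file already ends with a newline character, and the print() function adds another newline to the end by default. We can prevent this redundant newline output by calling the strip() method (which belongs to all text string objects) on each line read from the file. Note that strip() removes all leading and trailing whitespace, so the tab character in front of HAMLET is removed too: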

import os
fname = os.path.join('tempdata', 'tragedies', 'hamlet')
hamletfile = open(fname, 'r')
for x in range(5):
    print(hamletfile.readline().strip())
hamletfile.close()
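strip() isn't the only way to avoid the doubled newlines. Here's a short sketch of two other options, using the first line of the file as read above: rstrip('\n') removes only the trailing newline and keeps the leading tab, and print()'s end argument tells print() not to add its own newline.

```python
line = '\tHAMLET\n'

stripped = line.strip()              # leading tab and trailing newline removed
right_stripped = line.rstrip('\n')   # only the trailing newline removed

print(stripped)
print(right_stripped)
print(line, end='')                  # line printed as-is, no extra newline added
```

Which one to use depends on whether the indentation matters to you. For this exercise, strip() matches the expected output.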

And that's "all" it takes to read and print the first n lines of Hamlet. As we'll see in subsequent exercises, it's a bit more complicated to read from an arbitrary point in a file, e.g. "the 1,000th line" or "the 5th from the last line".
