Opening files and reading from files

How to open files and read from files and avoid annoying mistakes when reading files

Summary

Opening files and reading their data is something we learn how to do with a simple double-click in our earliest interactions with computers. However, at the programmatic layer, things are substantially more complicated…


The basic pattern of opening and reading files in Python

Here's the official Python documentation on reading and writing files. But before reading that, let's dive into the bare minimum that I want you to know.

Let's just go straight to a code example. Pretend you have a file named example.txt in the current directory. If you don't, just create one, and then fill it with these lines and save it:

hello world
and now
I say
goodbye

Here's a short snippet of Python code to open that file and print out its contents to screen – note that this Python code has to be run in the same directory that the example.txt file exists in.

myfile = open("example.txt")
txt = myfile.read()
print(txt)
myfile.close()

Did that seem too complicated? Here's a less verbose version:

myfile = open("example.txt")
print(myfile.read())
myfile.close()

Here's how to read that file, line-by-line, using a for-loop:

myfile = open("example.txt")
for line in myfile:
    print(line)
myfile.close()

(Note: If you're getting a FileNotFoundError already – that's almost to be expected. Keep reading!)

Still seem too complicated? Well, there's no getting around the fact that at the programmatic layer, opening a file is distinct from reading its contents. Not only that, we also have to manually close the file.

Now let's take this step-by-step.

How to open a file – an interactive exploration

To open a file, we simply use the built-in open() function and pass in, as the first argument, the filename:

myfile = open("example.txt")

That seems easy enough, so let's jump into some common errors.

How to mess up when opening a file

Here is likely the most common error you'll get when trying to open a file.

FileNotFoundError: [Errno 2] No such file or directory: 'SOME_FILENAME'

In fact, I've seen students waste dozens of hours trying to get past this error message, because they don't stop to read it. So, read it: What does FileNotFoundError mean?

Try putting spaces where the capitalization occurs:

  File Not Found Error

You'll get this error because you tried to open a file that simply doesn't exist. Sometimes, it's a simple typo, trying to open() a file named "example.txt" but accidentally misspelling it as "exmple.txt".

But more often, it's because you know a file exists under a given filename, such as "example.txt" – but how does your Python code know where that file is? Is it the "example.txt" that exists in your Downloads folder? Or the one that might exist in your Documents folder? Or the thousands of other folders on your computer system?

That's a pretty complicated question. But the first step in not wasting your time is that if you ever see this error, stop whatever else you are doing. Don't tweak your convoluted for-loop. Don't try to install a new Python library. Don't restart your computer, then re-run the script to see if the error magically fixes itself.

The error FileNotFoundError occurs because either you don't know where a file actually is on your computer or, even if you do, you don't know how to tell your Python program where it is. Don't try to fix other parts of your code that aren't related to specifying filenames or paths.
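If you're not sure where Python is looking, you can ask it. The snippet below is a minimal sketch that uses the standard os module (not otherwise covered in this lesson) to show the folder Python is running in and the full path that a bare filename like "example.txt" would resolve to. The filename is just the lesson's sample; swap in your own.

```python
import os

# The folder Python treats as "here"; relative filenames are resolved against it.
current_folder = os.getcwd()
print("Python is running in:", current_folder)

# "example.txt" is the lesson's sample filename; abspath() shows the full
# path that open("example.txt") would actually try to open.
full_path = os.path.abspath("example.txt")
print("open() would look for:", full_path)

# True only if a file really exists at that location.
print("Does it exist?", os.path.isfile(full_path))
```

If the last line prints False, a FileNotFoundError is exactly what open("example.txt") would give you from that folder.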

How to fix a FileNotFoundError

Here's a surefire fix: make sure the file actually exists.

Let's start from scratch by making an error. In your system shell (i.e. Terminal), change to your Desktop folder:

$ cd ~/Desktop

Now, run ipython:

$ ipython

And now that you're in the interactive Python interpreter, try to open a filename that you know does not exist on your Desktop, and then enjoy the error message:

>>> myfile = open("whateverdude.txt")
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-1-4234adaa1c35> in <module>()
----> 1 myfile = open("whateverdude.txt")

FileNotFoundError: [Errno 2] No such file or directory: 'whateverdude.txt'

Now manually create the file on your Desktop, using Sublime Text 3 or whatever you want. Add some text to it, then save it.

this
is my
file

Look and see for yourself that this file actually exists in your Desktop folder:

[Image: desktop-whateverdude.png]

OK, now switch back to your interactive Python shell (i.e. ipython), the one that you opened after changing into the Desktop folder (i.e. cd ~/Desktop). Re-run that open() command, the one that resulted in the FileNotFoundError:

>>> myfile = open("whateverdude.txt")

Hopefully, you shouldn't get an error.

But what is that object that the myfile variable points to? Use the type() function to figure it out:

>>> type(myfile)
 _io.TextIOWrapper

And what is that? The details aren't important, other than to point out that myfile is most definitely not just a string, i.e. a str object.

Use the Tab autocomplete (i.e. type in myfile.) to get a list of existing methods and attributes for the myfile object:

myfile.buffer          myfile.isatty          myfile.readlines
myfile.close           myfile.line_buffering  myfile.seek
myfile.closed          myfile.mode            myfile.seekable
myfile.detach          myfile.name            myfile.tell
myfile.encoding        myfile.newlines        myfile.truncate
myfile.errors          myfile.read            myfile.writable
myfile.fileno          myfile.readable        myfile.write
myfile.flush           myfile.readline        myfile.writelines

Well, we can do a lot more with files than just read() from them. But let's focus on just reading for now.
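A few of the names in that list are plain attributes you can inspect without reading anything. Here's a rough sketch that does so; to make it runnable anywhere, it first creates a throwaway copy of whateverdude.txt in a temporary folder (that setup step, including writing to a file, is my own scaffolding and not part of this lesson).

```python
import os
import tempfile

# Setup only: create a throwaway file so the example runs anywhere.
folder = tempfile.mkdtemp()
path = os.path.join(folder, "whateverdude.txt")
setup = open(path, "w")
setup.write("this\nis my\nfile")
setup.close()

myfile = open(path)
print(myfile.name)        # the path that was passed to open()
print(myfile.mode)        # r  (reading is the default mode)
print(myfile.readable())  # True
print(myfile.closed)      # False
myfile.close()
print(myfile.closed)      # True
```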

How to read from a file – an interactive exploration

Assuming the myfile variable points to some kind of file object, this is how you read from it:

>>> mystuff = myfile.read()

What's in that mystuff variable? Again, use the type() function:

>>> type(mystuff)
str

It's just a string. Which means of course that we can print it out:

>>> print(mystuff)
this
is my
file

Or count the number of characters:

>>> len(mystuff)
15

Or print it out in all-caps:

>>> print(mystuff.upper())
THIS
IS MY
FILE

And that's all there is to reading from a file that has been opened.
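Since the result of read() is an ordinary string, any other string method works on it too. The sketch below assumes the same three-line contents as whateverdude.txt and, so that it runs on its own, writes them to a temporary file first (that setup is scaffolding, not part of the lesson).

```python
import os
import tempfile

# Setup only: recreate the three-line file in a temporary folder.
folder = tempfile.mkdtemp()
path = os.path.join(folder, "whateverdude.txt")
setup = open(path, "w")
setup.write("this\nis my\nfile")
setup.close()

myfile = open(path)
mystuff = myfile.read()
myfile.close()

print(len(mystuff))          # 15
print(mystuff.splitlines())  # ['this', 'is my', 'file']
print(len(mystuff.split()))  # 4  (number of words)
print(mystuff.count("i"))    # 3  (how many times the letter i appears)
```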

Now onto the mistakes.

How to mess up when reading from a file

Here's a very, very common error:

>>> filename = "example.txt"
>>> filename.read()

The error output:

AttributeError                            Traceback (most recent call last)
<ipython-input-9-441b57e838ab> in <module>()
----> 1 filename.read()

AttributeError: 'str' object has no attribute 'read'

Take careful note that this is not a FileNotFoundError. It is an AttributeError – which, admittedly, is not very clear – but read the next part:

'str' object has no attribute 'read'

The error message gets to the point: the str object – i.e. a string, e.g. something like "hello world" – does not have a read attribute.

Revisiting the erroneous code:

>>> filename = "example.txt"
>>> filename.read()

If filename points to "example.txt", then filename is simply a str object.

In other words, a file name is not a file object. Here's a clearer example of erroneous code:

>>> "example.txt".read()

And to beat the point about the head:

>>> "hello world this is just a string".read()

Why is this such a common mistake? Because in 99% of our typical interactions with files, we see a filename on our Desktop graphical interface and we double-click that filename to open it. The graphical interface obfuscates the process – and for good reason. Who cares what's happening as long as my file opens when I double-click it!

Unfortunately, we have to care when trying to read a file programmatically. Opening a file is a separate operation from reading it.

Again, here's the code, in a slightly more verbose fashion:

>>> myfilename = "example.txt"
>>> myfile = open(myfilename)
>>> mystuff = myfile.read()
>>> # do something to mystuff, like print it, or whatever
>>> myfile.close()
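If you want to see the filename-versus-file-object difference for yourself, here's a small sketch that inspects each variable along the way. It creates its own example.txt in a temporary folder so it can run anywhere; that setup step is my addition, not part of the lesson.

```python
import os
import tempfile

# Setup only: make a throwaway example.txt with the lesson's four lines.
folder = tempfile.mkdtemp()
myfilename = os.path.join(folder, "example.txt")
setup = open(myfilename, "w")
setup.write("hello world\nand now\nI say\ngoodbye\n")
setup.close()

# The filename is just text: it has no read() method.
print(type(myfilename))             # <class 'str'>
print(hasattr(myfilename, "read"))  # False

# open() turns the filename into a file object, which does have read().
myfile = open(myfilename)
print(hasattr(myfile, "read"))      # True

# read() hands back the contents as another plain string.
mystuff = myfile.read()
myfile.close()
print(type(mystuff))                # <class 'str'>
```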

The file object also has a close() method, which formally cleans up after the opened file and allows other programs to safely access it. Again, that's a low-level detail that you never think of in day-to-day computing. In fact, it's something you probably will forget in the programming context, as not closing the file won't automatically break anything (not until we start doing much more complicated types of file operations, at least…). Typically, as soon as a script finishes, any unclosed files will automatically be closed.

However, I like closing the file explicitly, not just to be on the safe side, but because it helps to reinforce the concept of that file object.
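For what it's worth, Python also has a with statement that closes the file for you when the indented block ends. This lesson sticks with explicit close() calls, so treat the following as an optional sketch of an alternative rather than the approach used here. The temporary file is scaffolding so the snippet runs on its own.

```python
import os
import tempfile

# Setup only: make a throwaway example.txt.
folder = tempfile.mkdtemp()
path = os.path.join(folder, "example.txt")
setup = open(path, "w")
setup.write("hello world\nand now\nI say\ngoodbye\n")
setup.close()

# The file is open only inside the indented block.
with open(path) as myfile:
    mystuff = myfile.read()

# No close() call needed: leaving the block closed the file.
print(myfile.closed)  # True
print(mystuff)
```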

How to read from a file – line-by-line

One of the advantages of getting down into the lower-level details of opening and reading from files is that we now have the ability to read files line-by-line, rather than one giant chunk. Again, to read files as one giant chunk of content, use the read() method:

>>> myfile = open("example.txt")
>>> mystuff = myfile.read()

It doesn't seem like such a big deal now, but that's because example.txt probably contains just a few lines. But when we deal with files that are massive – like all 3.3 million records of everyone who has donated more than $200 to a single U.S. presidential campaign committee in 2012 or everyone who has ever visited the White House – opening and reading the file all at once is noticeably slower. And it may even crash your computer.

If you've wondered why spreadsheet software, such as Excel, has a row limit (roughly 1,000,000), it's because most users want to operate on a data file all at once. However, many interesting data files are just too big for that. We'll run into those scenarios later in the quarter.

For now, here's what reading line-by-line typically looks like:

myfile = open("example.txt")
for line in myfile:
    print(line)
myfile.close()

Because each line in a text file ends with a newline character (which is represented as \n but is typically "invisible"), invoking the print() function will create double-spaced output, because print() adds its own newline to what it outputs (i.e. think back to your original print("hello world") program).

To get rid of that effect, call the strip() method, which belongs to str objects and removes whitespace characters from the left and right side of a text string:

myfile = open("example.txt")
for line in myfile:
    print(line.strip())
myfile.close()
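To make that "invisible" newline visible, you can wrap a line in repr() before printing it. The sketch below doesn't need a file at all; the strings are hand-typed stand-ins for what a line looks like when it comes out of a file. It also shows that strip() removes spaces as well as the newline, whereas rstrip("\n") removes only the trailing newline.

```python
# A stand-in for one line as Python sees it when looping over example.txt.
line = "hello world\n"

print(repr(line))          # 'hello world\n'
print(repr(line.strip()))  # 'hello world'

# strip() also eats leading and trailing spaces...
padded = "  indented line \n"
print(repr(padded.strip()))       # 'indented line'

# ...while rstrip("\n") removes only the newline at the end.
print(repr(padded.rstrip("\n")))  # '  indented line '
```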

And of course, you can make things loud with the good ol' upper() method:

myfile = open("example.txt")
for line in myfile:
    print(line.strip().upper())
myfile.close()
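Coming back to the point about massive files: the payoff of the for-loop is that you can compute things about a file while holding only one line in memory at a time. Here's a rough sketch of that idea. The 1,000-line file it generates is made up for the example (a stand-in for a genuinely large data file), and the setup step that writes it is scaffolding, not part of this lesson.

```python
import os
import tempfile

# Setup only: generate a made-up 1,000-line file in a temporary folder.
folder = tempfile.mkdtemp()
path = os.path.join(folder, "big_example.txt")
setup = open(path, "w")
for i in range(1000):
    setup.write("record number " + str(i) + "\n")
setup.close()

# Walk through the file one line at a time, keeping only running totals.
line_count = 0
longest_line = ""
myfile = open(path)
for line in myfile:
    line_count += 1
    text = line.strip()
    if len(text) > len(longest_line):
        longest_line = text
myfile.close()

print(line_count)    # 1000
print(longest_line)  # record number 100
```

At no point does the whole file sit in a single string, which is what read() would have done.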

That's it for now. We haven't covered how to write to a file (which is a far more dangerous operation) – I'm saving that for a separate lesson. But it's enough to know that when dealing with files as a programmer, we have to be much more explicit and specific in the steps.

References and Related Readings

Opening files and writing to files
How to open files and write to files and avoid catastrophic mistakes when writing to files.
Chapter 11. Files
Python Input and Output Tutorial
There are several ways to present the output of a program; data can be printed in a human-readable form, or written to a file for future use. This chapter will discuss some of the possibilities.