A list is a Python object that represents an ordered sequence of other objects. If loops allow us to magnify the effect of our code a million times over, then lists are the containers we use to easily store the increased bounty from our programs, and to pass large quantities of data into other programs.
Lists are one of the most important and ubiquitous data structures that we’ll learn about and use. This lesson is a little on the verbose side because we cover not just the basics about lists but also take the time to briefly consider what it means for an object to be “mutable”, and what it means for a method to perform an “in-place” operation on an object.
It’s recommended that you read this guide before reading about tuples and dictionaries, which are very similar to lists but are important enough to have their own guides.
Square brackets are used to denote a list object. Here's an empty list:
mylist = []
And here's a list with just a single member:
mylist = ['hello']
Lists can contain multiple objects. When the members of a list are written out in code, commas are used to separate each member.
Here's a very simple list of 2 string objects:
mylist = ["hello", "world"]
Lists can contain any other kind of object, including other lists. You can ignore the usual Python indentation rules when declaring a long list; note that in the list below, the list inside the main list counts as a single member:
>>> mylist = ["hello", "world", 1, 2, 9999, "pizza",
... 42, ["this", "is", "a", "sub list"], -100]
>>> len(mylist)
9
Given a list containing a bunch of objects, how do we actually access those objects to use in our programs?
Lists are commonly used with loops, particularly for-loops. When a list is passed into the for-loop, it yields each of its members in sequential order:
>>> mylist = ['a', 'b', 'c', 42]
>>> for x in mylist:
...     print("Hello", x)
Hello a
Hello b
Hello c
Hello 42
The members of a list are indexed with integers. Lists are zero-indexed, which means that the first member of a list is available at the index value of 0.
To access a list's members, use square brackets immediately following the list, with the desired index inside the brackets:
>>> mylist = [1, 2, 3]
>>> mylist[0]
1
>>> mylist[2]
3
>>> [4, 5, 6][1]
5
We can access members of a list starting from its endpoint by providing a negative value for the index. The value of -1 will access the final member of a list, -2 will get the 2nd to last member, and so forth.
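As a minimal sketch of negative indexing (the list and variable names here are made up for illustration), counting backward from the end looks like this:

```python
# Hypothetical list used only for illustration
mylist = ["a", "b", "c", "d"]

last = mylist[-1]              # same member as mylist[len(mylist) - 1]
second_to_last = mylist[-2]
first = mylist[-len(mylist)]   # the most negative index that is still valid

print(last, second_to_last, first)   # prints: d c a
```

In other words, a negative index i refers to the same member as the index len(mylist) + i.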
A list is just "big" enough to contain all of its existing members. Because lists are zero-indexed, the largest valid index is one less than the length of the list (hello, infamous off-by-one error!). The interpreter will raise an error if you try to access an index value equal to or bigger than the length:
>>> mylist = ["hi"]
>>> print(mylist[0])
hi
>>> print(mylist[1])
IndexError: list index out of range
We can exceed the bounds of a list with negative-number indexing as well:
>>> mylist = ["hi"]
>>> print(mylist[-1])
hi
>>> print(mylist[-2])
IndexError: list index out of range
Lists can contain objects of any type, including other lists:
>>> mylist = [1, 2, ['hello', 'world']]
To access that nested list itself – which is the last element of mylist – we use the same index-notation as we would for any other kind of object:
>>> mylist[-1]
['hello', 'world']
>>> type(mylist[-1])
list
As mylist[-1] is itself a list, we can continue to use the square-bracket index notation to access its last member, which is the string object, "world":
>>> type(mylist[-1][-1])
str
>>> mylist[-1][-1]
'world'
This notation can go as deep as your lists are nested:
>>> mylist = ['a', ['b', ['c', ['d', ['e', 'fin!']]]]]
>>> mylist[1][1][1][1][1]
'fin!'
…But obviously, try not to create lists that look like the above, contrived example.
Sometimes we only need part of a list, such as the first n members. The syntax for this uses the same square-bracket notation as for individual indexing. But instead of passing a single value, we pass in 2 values – the starting and end point, respectively – separated by a colon:
>>> mylist = [0, 1, 2, 3, 4, 5]
>>> x = mylist[0:3]
>>> type(x)
list
>>> len(x)
3
>>> print(x)
[0, 1, 2]
Note that the "slice" goes up to the number that denotes the endpoint – i.e. 3 – but does not include it. This is consistent with how lists are zero-indexed, but it is another way that the "off-by-one" error will bite you.
The Python documentation's informal tutorial covers other variations of the slice notation, which can be used for any kind of sequence, including strings.
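A few of those variations are sketched below, using the same list as above. The variable names are only illustrative; the behavior shown (omitted start or end, negative start, a third "step" value, slicing a string) is standard Python slicing:

```python
mylist = [0, 1, 2, 3, 4, 5]

first_three = mylist[:3]         # start omitted: begins at index 0
from_three = mylist[3:]          # end omitted: runs through the last member
last_two = mylist[-2:]           # negative start: the final two members
every_other = mylist[::2]        # a third value sets the step size
substring = "hello world"[0:5]   # slices work on strings too

print(first_three)   # [0, 1, 2]
print(from_three)    # [3, 4, 5]
print(last_two)      # [4, 5]
print(every_other)   # [0, 2, 4]
print(substring)     # hello
```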
A common real-world scenario for only needing part of a list is when we use a file object's readlines() method, which returns all the lines of a text file as a list. Sometimes, data providers will put metadata or blank lines at the top of the file, e.g.
Data from the California Department of Education
Last Updated: 2015-0101
id,school,city,enrollment
1001,Kennedy High School,Springfield,401
1002,Washington Elementary School,Los Angeles,2226
...etc
To skip the first 3 lines and iterate through the rest, we start at the index value of 3 (which corresponds to the 4th line, thanks to, again, Python's zero-indexed lists). The endpoint index can be left blank, which means "through the end of the list". (Setting it to -1 would stop just before the last line and silently drop it.)
for line in myfile.readlines()[3:]:
    do_something(line)
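Here is a self-contained sketch of that pattern. It assumes nothing about the real file: io.StringIO stands in for a file opened with open(), and the contents are just the sample lines shown above.

```python
import io

# io.StringIO stands in for a real file object; the text is the sample above.
myfile = io.StringIO(
    "Data from the California Department of Education\n"
    "Last Updated: 2015-0101\n"
    "id,school,city,enrollment\n"
    "1001,Kennedy High School,Springfield,401\n"
    "1002,Washington Elementary School,Los Angeles,2226\n"
)

lines = myfile.readlines()
data_lines = lines[3:]       # index 3 is the 4th line; blank endpoint = to the end
all_but_last = lines[3:-1]   # an endpoint of -1 stops before the final line

for line in data_lines:
    print(line.strip())

print(len(data_lines), len(all_but_last))   # 2 1
```

Note how the slice ending in -1 comes up one line short, which is rarely what we want when reading data.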
If you've read up to this point, then you might have noticed that I haven't discussed any of the methods belonging to the list object. Nor have I covered basic but obviously important concepts, such as how to add a new member to a list. Or remove members from a list. Or change the members of a list.
I cover those operations in the next section. However, it might be worth taking a brief detour to read the guide: The Immutable Tuple
First of all, I think it's worth looking at the tuple object, which is similar to a list and frequently used in Python programs. If you understand lists, you understand tuples. The main difference is that tuples are immutable. After seeing examples of immutable objects, you might appreciate better what it means when we say that a list is mutable, and that its in-place methods "mutate" its contents.
Lists have a variety of methods, many of which I don't believe we need to memorize at this point. Instead, I'll focus on a subset of the most useful and frequently used list methods, as well as the ones that are "safest" – i.e. least confusing to use.
This section covers the in-place methods: that is, the methods that can alter the contents of the list, such as adding members or altering their values.
In the lesson on The Immutable Tuple, I elaborate a bit more on non-in-place methods. But in case you don't want to jump to that lesson, I'll review an example here of non-in-place methods involving string objects.
Non-in-place methods don't alter their object; instead, they return a new object:
>>> s = "hey"
>>> t = s.upper()
>>> t
'HEY' # a totally different string object
>>> s
'hey' # the same object as before
So, given a list, how do we add more members to it?
Before we get to the in-place way of adding a new member to a list, I'll cover a non-in-place way here:
We can use the plus sign operator to concatenate two lists, in the same way we can use it to concatenate two string objects. The result of the concatenation, for both the list and string scenarios, is an entirely new object:
>>> x = [1, 2, 3]
>>> y = [4, 5]
>>> z = x + y
>>> len(z)
5 # z definitely points to a list bigger than x and y
>>> len(x)
3 # x seems to have the same size as before...
>>> len(y)
2 # ...as does y
Can you predict what the variables x, y, and z contain?
>>> print(x)
[1, 2, 3] # x seems the same...
>>> print(y)
[4, 5] # y seems the same...
>>> print(z)
[1, 2, 3, 4, 5]
OK, let's finally add to a list using an in-place method.
To add a new member to a list – and change the list object itself – we use the list's append() method:
>>> x = [1, 2, 3]
>>> x.append('a')
>>> print(x)
[1, 2, 3, 'a']
See how the object that the x variable refers to has changed after the append() call?
So what's the big deal? Nothing. As long as we don't care whether the contents of the list that x points to stay the same or not, then we don't have to worry about in-place vs non-in-place. But it's important to at least be aware that there is a difference…
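One situation where the difference does matter is when two variables refer to the same list. The following is a small illustrative sketch (the variable names are arbitrary): an in-place change made through one name is visible through the other, while concatenation builds a separate list.

```python
x = [1, 2, 3]
y = x                # y is a second name for the same list object, not a copy

x.append("a")        # in-place: the one shared list changes
print(y)             # [1, 2, 3, 'a']

z = x + ["b"]        # non-in-place: builds a new list; x and y are untouched
print(x)             # [1, 2, 3, 'a']
print(z)             # [1, 2, 3, 'a', 'b']

print(y is x, z is x)   # True False
```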
Suppose we want to add the contents of one list to another? Depends what you mean by "add" – once again, showing how human language isn't precise enough for computing – because append() may or may not be exactly what we want:
>>> x = [1, 2, 3]
>>> x.append([4, 5, 6])
Remember that a list's members may be any kind of Python object, including other lists. So passing a list as an argument into the append() call of another list results in a single member being added to the calling list:
>>> len(x)
4
>>> print(x)
[1, 2, 3, [4, 5, 6]]
If we were expecting something different, e.g.
>>> len(x)
6
>>> print(x)
[1, 2, 3, 4, 5, 6]
– it's not because append() has a bug. It's because we failed to express our expectations more specifically. We don't want to merely put one list inside another. We want to append each individual member of one list to another, so that the resulting list is "flat" instead of "nested".
We want the extend() method, which is also an in-place method:
>>> x = [1, 2, 3]
>>> y = [4, 5, 6]
>>> x.extend(y)
>>> print(x)
[1, 2, 3, 4, 5, 6]
(Note that the y list itself does not get mutated; only the list that calls extend(), i.e. the x list, is mutated.)
The extend() method can be used to append the individual members of other sequences, such as tuples and strings (the latter of which are sequences of individual characters):
>>> x = [1, 2, 3]
>>> mytuple = (4, 5)
>>> x.extend(mytuple)
>>> len(x)
5
>>> print(x)
[1, 2, 3, 4, 5]
>>> x.extend("hello")
>>> len(x)
10
>>> print(x)
[1, 2, 3, 4, 5, 'h', 'e', 'l', 'l', 'o']
However, if we pass a non-iterable object – i.e. something that isn't a sequence or collection of objects – into extend(), such as a number, the result will be an error:
>>> x = [1, 2, 3]
>>> x.extend(4)
TypeError: 'int' object is not iterable
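If the goal is simply to add that one number, two options work. This is a minimal sketch rather than the only way to do it: either use append(), or wrap the value in a list so that extend() has something to iterate over.

```python
x = [1, 2, 3]

x.append(4)      # add one object as a single new member
x.extend([5])    # or wrap the value in a list before extending

print(x)         # [1, 2, 3, 4, 5]
```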
This section describes how to modify the values of a list in-place.
To change the value that exists at a given list's index, we can simply reassign it as we would a variable:
>>> x = [0, 1, 2]
>>> x[-1] = "happy happy joy joy"
>>> print(x)
[0, 1, 'happy happy joy joy']
Note that we can't change the value for an index that doesn't exist for the list, i.e. one that is out of bounds for the list's size:
>>> x = []
>>> x[0] = 'first'
IndexError: list assignment index out of range
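A short sketch of how this plays out in practice: the index has to exist before it can be reassigned, so an empty list is first grown with append(). The try/except here is only to keep the example running past the error.

```python
x = []

try:
    x[0] = "first"           # index 0 does not exist yet
except IndexError as err:
    print("Caught:", err)    # Caught: list assignment index out of range

x.append("first")            # grow the list so that index 0 exists
x[0] = "FIRST"               # now reassignment works
print(x)                     # ['FIRST']
```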
This section describes how to remove items from a list using in-place methods. I don't think there really is such a thing as removing elements from a list non-in-place, i.e. in such a way that the list isn't changed. I'm not sure what the point would be.
But one quick clarification: slicing a list is not an in-place operation:
>>> list_x = [1, 2, 3, 4, 5]
>>> list_y = list_x[2:5]
>>> print(list_y)
[3, 4, 5]
>>> print(list_x)
[1, 2, 3, 4, 5]
Instead, the "slice" – i.e. list_y – is actually a new list object. The "sliced" list, list_x, is unchanged.
To actually remove an item from a list, use its pop() method. When calling pop() without any arguments, the last member of the list is removed from the list and returned:
>>> mylist = [1, 2, 3]
>>> q = mylist.pop()
>>> print(mylist)
[1, 2]
>>> print(q)
3
If we attempt to call pop() on an empty list, it will cause an error:
>>> mylist = [1, 2, 3]
>>> mylist.pop()
3
>>> mylist.pop()
2
>>> mylist.pop()
1
>>> mylist.pop()
IndexError: pop from empty list
To remove items from the front of a list, we use the pop() method but pass the value 0 as an argument:
>>> mylist = [1, 2, 3]
>>> mylist.pop(0)
1
>>> mylist.pop(0)
2
>>> mylist.pop(0)
3
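The argument to pop() doesn't have to be 0; any valid index works, including negative ones. A brief sketch with a made-up list:

```python
mylist = ["a", "b", "c", "d"]

removed = mylist.pop(1)          # remove and return the member at index 1
print(removed, mylist)           # b ['a', 'c', 'd']

removed_last = mylist.pop(-1)    # same as calling pop() with no argument
print(removed_last, mylist)      # d ['a', 'c']
```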
There are more in-place methods for lists, including:
reverse() - reverse the order of a list's members
remove(val) - delete the first member in the list that matches val
sort() - rearrange the order of a list based on a sorting algorithm
insert(x, val) - insert a new member, val, before the index at x
clear() - remove all members from the list
It's not that these methods aren't useful, it's just that their in-place effects can unintentionally lead to hard-to-find bugs.
For example, pretend we have a 4-line file that looks like this:
Oh Romeo
Wherefore are thou
Pizza?
COPYRIGHT SHAKESPEERE
Pretend we have a program in which we read the file's contents, via readlines(), so that the variable textlist points to a list object containing those text lines. That list and its members look like this:
['Oh Romeo \n', 'Wherefore are thou\n', 'Pizza?\n', 'COPYRIGHT SHAKESPEERE']
Pretend the program is required to print every line of the poem except for the copyright notice. Pretend that the program is also required, at the end, to print the total number of lines in the file, including the copyright notice.
So how to print all the text lines except for the unneeded meta-text? One approach is to use remove() to just delete that pesky copyright notice:
textlist.remove('COPYRIGHT SHAKESPEERE')
for line in textlist:
    print(line)
Seems straightforward, right? But what happens when it comes time to print the total number of lines to screen?
print("Total lines:", len(textlist))
The result of that len() function will not be 4 – it will be 3, because textlist no longer contains all the lines of the text file, due to the in-place effects of the remove() method. Sure, this may seem like an easy bug to track down in this ridiculously contrived, simple example. But it won't be in any non-trivial program.
So for these lessons and exercises, I try to compel you to work with lists and other objects in non-mutating ways:
IGNORE_THIS_LINE = 'COPYRIGHT SHAKESPEERE'
for line in textlist:
    txt = line.strip()
    if txt != IGNORE_THIS_LINE:
        print(txt)
print("Total lines:", len(textlist))
This "avoid-all-in-place-side-effects" approach sometimes results in a few extra lines of code, and more planning beforehand. But trust me; those extra minutes are much, much less than the time it takes to pore through a program, trying to figure out which function irrevocably altered an object.
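To make the non-mutating version concrete, here is a runnable sketch that uses the same four lines as the example file. The poem_lines name is introduced here only for illustration; it collects the lines we want into a new list, so textlist is never altered.

```python
# The four lines from the example file, as readlines() would return them
textlist = ['Oh Romeo \n', 'Wherefore are thou\n', 'Pizza?\n', 'COPYRIGHT SHAKESPEERE']
IGNORE_THIS_LINE = 'COPYRIGHT SHAKESPEERE'

# Collect the lines to print in a separate list; textlist itself is untouched
poem_lines = []
for line in textlist:
    txt = line.strip()
    if txt != IGNORE_THIS_LINE:
        poem_lines.append(txt)

for txt in poem_lines:
    print(txt)

print("Total lines:", len(textlist))   # Total lines: 4
```

The append() call is still an in-place method, but it mutates only the new list that we created for this purpose, not the one holding the file's contents.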
In a later lesson, I'll cover the sorted() function, which is a non-in-place way of sorting a sequence. Sorting is such a common – and complicated – task that it deserves its own tutorial.
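As a quick preview, and only a sketch with made-up numbers, the contrast between the two looks like this. Note that the in-place sort() returns None rather than the sorted list, which is a common source of confusion:

```python
scores = [30, 10, 20]

ordered = sorted(scores)    # returns a new list; scores is left alone
print(ordered)              # [10, 20, 30]
print(scores)               # [30, 10, 20]

result = scores.sort()      # in-place: scores itself is reordered
print(result)               # None
print(scores)               # [10, 20, 30]
```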
This section covers functions and list methods that don't have any effect on the list object itself.
As we've seen in previous examples, the len() function returns the number of members in a list:
>>> mylist = [1, 2, 3]
>>> len(mylist)
3
The count() method (which is common to all sequences, including string objects) takes in a single argument and returns the number of values that are equal to the argument:
>>> a_list = ['she', 'sells', 'seashells']
>>> a_list.count('she')
1
>>> b_list = [42, "42", 42, 4242]
>>> b_list.count(42)
2
Sometimes we don't know exactly where a value is inside a given list. The list's index() method takes in a single argument and returns the index value of the first matching value inside the list. However, if no value inside the list matches the argument, then an error is raised:
>>> mylist = ['a', 'b', 'b', 'a']
>>> mylist.index('b')
1
>>> mylist.index('c')
ValueError: 'c' is not in list
in and not in keywords
So the problem with using the list's index() method is that if you don't know whether a list contains an exact value, you have to deal with the ValueError stopping your program.
The in keyword, which we've used when testing for substrings in bigger strings, also works for testing whether a value exists inside a collection, such as a list.
>>> mylist = ['abba', 'dabba', 'doo']
>>> 'abba' in mylist
True
Note that this tests for whether the value is exactly equal to an object in the list – if the value we search for is merely a substring of a string inside the list, i.e. a partial match, the test will return False:
>>> 'dab' in mylist
False
We can invert the test by using not in:
>>> 'dab' not in mylist
True
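Putting in together with index() gives a way to look up a position without risking the ValueError. The two helper functions below are hypothetical names written for this sketch, not part of Python; the second shows the alternative of catching the error instead of testing first.

```python
mylist = ['abba', 'dabba', 'doo']

def find_position(items, value):
    """Return the index of value in items, or None when it is absent."""
    if value in items:
        return items.index(value)
    return None

def find_position_try(items, value):
    """Same result, but by catching the ValueError that index() raises."""
    try:
        return items.index(value)
    except ValueError:
        return None

print(find_position(mylist, 'doo'))       # 2
print(find_position(mylist, 'dab'))       # None
print(find_position_try(mylist, 'dab'))   # None
```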
Just as other objects have constructor functions – e.g. str() and int() for the string and integer objects, respectively – the list object has list().
Calling it without any arguments will create an empty list:
>>> mylist = list()
>>> len(mylist)
0
>>> print(mylist)
[]
Passing in an iterable object as an argument – i.e. any type of sequence or collection – will create a new list containing the members of the passed-in object:
>>> mystr = "hello"
>>> mylist = list(mystr)
>>> print(mylist)
['h', 'e', 'l', 'l', 'o']
The list() constructor is very handy for converting other collections into nice and simple lists. For example, do you have a tuple object that you want a copy of, but a copy that you can actually alter? Passing it into list() creates the same sequence of elements, except as a list object:
>>> mytuple = (1, 2)
>>> mylist = list(mytuple)
>>> print(mylist)
[1, 2]
>>> mylist.append(3)
>>> print(mylist)
[1, 2, 3]
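The same constructor also works when the argument is already a list, which gives us a separate list object to alter while leaving the original alone. A minimal sketch, with illustrative variable names:

```python
original = [1, 2, 3]
duplicate = list(original)     # a new list object with the same members

duplicate.append(4)            # only the duplicate is mutated
print(original)                # [1, 2, 3]
print(duplicate)               # [1, 2, 3, 4]
print(duplicate is original)   # False
```

One caveat: the copy is "shallow". If the original contains nested lists, both lists refer to the same nested list objects.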
When we instantiate a range object with the range() constructor, it acts similarly to a list, in that we can access individual members and slice it. But a range isn't exactly like a list of numbers:
>>> r = range(5)
>>> r[-1]
4
>>> print(r)
range(0, 5)
>>> r.pop()
AttributeError: 'range' object has no attribute 'pop'
That can be fixed by converting the range object into a list with list():
>>> r = range(5)
>>> rx = list(r)
>>> type(rx)
list
>>> print(rx)
[0, 1, 2, 3, 4]
Lists are a topic about which I could write endlessly – we haven't even looked at how lists are used when dealing with real-world data, for example. However, I think this is enough of a primer for now. I may update this guide with more examples and elaboration. But once you get the gist of lists, they feel very natural to use in your programs.
And if you can understand how to iterate (i.e. loop) through a list, you've pretty much understood how to iterate through all the kinds of data structures that we will work with.
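To illustrate, the same for-loop pattern is sketched below over a list, a tuple, a string, a range, and a dictionary. The sample values are invented; the seen list just collects whatever each loop yields.

```python
seen = []

for member in ["a", "list"]:
    seen.append(member)
for member in ("a", "tuple"):
    seen.append(member)
for character in "str":
    seen.append(character)
for number in range(2):
    seen.append(number)
for key in {"k1": "v1", "k2": "v2"}:   # looping over a dict yields its keys
    seen.append(key)

print(seen)
# ['a', 'list', 'a', 'tuple', 's', 't', 'r', 0, 1, 'k1', 'k2']
```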
The other equally ubiquitous Python data object is the dictionary, i.e. the dict object. Dictionaries are collections of objects whose members are looked up by "keys" rather than by position, and those keys, unlike list indexes, are not restricted to integers.
Dictionaries are important enough to deserve their own guide. But understanding lists is important to understanding dictionaries, and why we use both when dealing with real-world data.