The Python list stores a collection of objects in an ordered sequence. In contrast, the dictionary doesn't number its members by position (and before Python 3.7, it didn't even guarantee a predictable order). However, dictionaries allow a program to access any member of the collection using a key, which can be a human-readable string.
Both the dictionary and list are ubiquitous for representing real-world data.
Dictionaries, like lists, are collections of objects. Unlike lists, they are not indexed by sequential numbers, but by keys:
>>> mydict = {"apples": 42, "oranges": 999}
>>> mydict['oranges']
999
A dictionary, which has a type name of dict, is denoted by curly braces: {}.
An empty dictionary can be initialized either with the dict() constructor function or simply with a pair of curly braces:
newdict = dict()
newdict = {}
Each member of a dictionary is a key-value pair, and commas separate the members. A colon separates each key from its corresponding value:
{"a": 99, "hello": "world"}
Accessing a dictionary's values by key uses the same square-bracket notation that lists and other sequences use for numerical indexes:
>>> mydict = {"z": 92, "world": "hello"}
>>> print(mydict['world'])
hello
>>> print(mydict["z"])
92
The keys of a dictionary can be any immutable (more precisely, hashable) type, which includes strings, numbers, and tuples:
mydict = {"hello": "world",
          0: "a",
          1: "b",
          "2": "not a number",
          (1, 2, 3): "a tuple!"}
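A quick way to see the immutability rule in action is to try using a list as a key. The sketch below uses a made-up coords dictionary purely for illustration. It shows that a tuple key works, while a list key raises a TypeError because lists can't be hashed. The exact wording of the error message may differ slightly between Python versions.

```python
# Tuples are immutable (and hashable), so they can serve as keys
coords = {(40.7, -74.0): "New York"}
print(coords[(40.7, -74.0)])  # New York

# Lists are mutable (and unhashable), so Python refuses them as keys
error_message = None
try:
    bad = {[40.7, -74.0]: "New York"}
except TypeError as err:
    error_message = str(err)

print(error_message)  # mentions that 'list' is an unhashable type
```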
However, for the most part, we'll find ourselves using strings as keys when manually creating dictionary objects and converting real-world data into dictionary objects.
Just as a list can contain any kind of Python object, the values stored in a dictionary (the objects our keys point to) can be any kind of Python object, including lists or other dictionaries:
>>> mydict = {"message": {"hello": 123456}}
>>> print(mydict['message'])
{'hello': 123456}
>>> print(mydict['message']['hello'])
123456
In fact, get used to the idea of dictionaries nested within other dictionaries. If you use Instagram's API to fetch data about Snoop Dogg's account, the API returns text (in JSON format) that can be turned into a Python dictionary:
{
"data": {
"id": "1574083",
"username": "snoopdogg",
"full_name": "Snoop Dogg",
"profile_picture": "http://distillery.s3.amazonaws.com/profiles/profile_1574083_75sq_1295469061.jpg",
"bio": "This is my bio",
"website": "http://snoopdogg.com",
"counts": {
"media": 1320,
"follows": 420,
"followed_by": 3410
}
}
}
Note how the "outer" dictionary contains a single key, "data", which points to a dictionary object with key-value pairs that correspond to information about Snoop Dogg's account. And that dictionary itself contains another dictionary via the key, "counts".
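The text an API returns isn't a dictionary yet. Python's built-in json module can parse it into one. Here's a minimal sketch that uses a trimmed copy of the JSON above as a string literal. In a real program the text would come from the API's HTTP response instead.

```python
import json

# A trimmed copy of the JSON text shown above
raw_text = """
{
  "data": {
    "username": "snoopdogg",
    "counts": {"media": 1320, "follows": 420, "followed_by": 3410}
  }
}
"""

# json.loads() turns a JSON string into nested Python dicts (and lists)
account = json.loads(raw_text)

print(type(account))                             # <class 'dict'>
print(account["data"]["username"])               # snoopdogg
print(account["data"]["counts"]["followed_by"])  # 3410
```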
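Note: Unlike with a list, before Python 3.7 the language didn't guarantee any particular order when iterating through a dictionary. Since Python 3.7, dictionaries are guaranteed to preserve the order in which keys were inserted. I go into more detail in a later section.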
If we pass a dict object into a for-loop, by default, only the keys are yielded:
>>> mydict = {'a': 'hello', 'b': 'world'}
>>> for x in mydict:
...     print(x)
a
b
Sometimes we only need the keys. But having the keys also lets us look up each value, and even change what each key points to:
>>> mydict = {'a': 'hello', 'b': 'world'}
>>> for x in mydict:
...     val = mydict[x]
...     mydict[x] = val.upper()
...     print('Changed what', x, 'points to: from', val, 'to', mydict[x])
Changed what a points to: from hello to HELLO
Changed what b points to: from world to WORLD
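Changing the values during the loop is safe. Adding or removing keys while looping over the same dictionary is not, and Python raises a RuntimeError. A common workaround, sketched below with a made-up dictionary, is to loop over a snapshot of the keys made with list():

```python
mydict = {'a': 'hello', 'b': '', 'c': 'world'}

# Deleting keys while looping over the dictionary itself fails
raised = False
try:
    for key in mydict:
        if mydict[key] == '':
            del mydict[key]  # the dictionary shrinks mid-loop
except RuntimeError as err:
    raised = True
    print("RuntimeError:", err)  # dictionary changed size during iteration

# Workaround: loop over a list snapshot of the keys instead
mydict = {'a': 'hello', 'b': '', 'c': 'world'}
for key in list(mydict):
    if mydict[key] == '':
        del mydict[key]

print(mydict)  # {'a': 'hello', 'c': 'world'}
```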
If you intend to iterate through just a dictionary's keys, I recommend making that explicit by calling the dictionary's keys() method:
>>> for k in mydict.keys():
...     print(k)
a
b
The keys() method returns a dict_keys object, which I don't really use directly. Instead, I convert it to a list object with the list() constructor:
>>> mykeys = mydict.keys()
>>> type(mykeys)
<class 'dict_keys'>
>>> mylist = list(mykeys)
>>> type(mylist)
<class 'list'>
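One thing worth knowing about dict_keys (and likewise the objects returned by values() and items()) is that it is a live view of the dictionary. It reflects later changes, whereas a list made with list() is a snapshot taken at that moment. A small sketch:

```python
mydict = {'a': 'hello', 'b': 'world'}

keys_view = mydict.keys()            # live view of the keys
keys_snapshot = list(mydict.keys())  # copy of the keys taken right now

mydict['c'] = 'again'  # add a new key after creating both

print(list(keys_view))   # ['a', 'b', 'c']  (the view sees the new key)
print(keys_snapshot)     # ['a', 'b']       (the list does not)
print('c' in keys_view)  # True
```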
If we want to iterate only through a dictionary's values, or to get a list of its values, we can call its values() method:
>>> mydict = {'a': 'hello', 'b': 'world'}
>>> for v in mydict.values():
...     print(v)
hello
world
>>> myvals = mydict.values()
>>> type(myvals)
<class 'dict_values'>
>>> mylist = list(myvals)
>>> type(mylist)
<class 'list'>
Note that if we iterate through a dictionary's values with a loop, then inside the loop we have no direct way to access the dictionary's keys or to change what those keys point to. Sometimes that's just fine: as loops get complicated, it's good to know exactly what kind of access each loop has.
Oftentimes, we'd like to have access to both the key and the value for every key-value pair in a dictionary. We could use the keys() method and then look up each key's value inside the loop:
>>> mydict = {'a': 'hello', 'b': 'world'}
>>> for key in mydict.keys():
...     val = mydict[key]
...     print("Key", key, 'points to', val)
Key a points to hello
Key b points to world
Or, we could just use the items() method. First, I'll show what items() returns, and then convert it to a list with list():
>>> myitems = mydict.items()
>>> type(myitems)
<class 'dict_items'>
>>> mylist = list(myitems)
>>> type(mylist)
<class 'list'>
Note that the actual contents are the same, whether they're inside a dict_items collection or a list; it's just that a list object allows us to access each item individually by numerical index, among other functionality:
>>> print(mylist)
[('a', 'hello'), ('b', 'world')]
>>> print(mylist[0])
('a', 'hello')
>>> print(mylist[1])
('b', 'world')
Basically, items() returns a collection of (key, value) tuples.
So when we iterate through mydict.items() with a for-loop, we can take advantage of a feature referred to as tuple unpacking. Notice the change in the for-loop statement:
mydict = {'a': 'hello', 'b': 'world'}
for key, val in mydict.items():
    print("Key", key, 'points to', val)
We can learn the details of tuple unpacking some other time. For now, we can think of it as just a really convenient way to assign more than one variable in a single line. We could have also done this:
for x in mydict.items():
    key = x[0]
    val = x[1]
    print("Key", key, 'points to', val)
But…why be verbose when we don't need to be?
Compared to the list, the dictionary object doesn't really have a ton of methods or attributes:
mydict.clear mydict.get mydict.pop mydict.update
mydict.copy mydict.items mydict.popitem mydict.values
mydict.fromkeys mydict.keys mydict.setdefault
We've already covered values(), keys(), and items(). Of the remaining few, I personally use just get() and update(), and occasionally setdefault().
Just as accessing a list with too big of an index value raises an IndexError, accessing a non-existent key of a dictionary raises a KeyError:
>>> mydict = {'z': 999}
>>> x = mydict['a']
KeyError: 'a'
The get(k) method provides a safe way to test for a key, k. If the dictionary has key k, get(k) returns the corresponding value. If not, it returns None, the sole object of type NoneType:
>>> mydict = {'z': 999}
>>> mydict.get('z')
999
>>> mydict.get('oooga boooga!!!')
>>> type(mydict.get('heysadf'))
<class 'NoneType'>
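get() also accepts an optional second argument: a fallback value to return instead of None when the key is missing. The dictionary itself is left unchanged. A minimal sketch:

```python
mydict = {'z': 999}

found = mydict.get('z', 0)           # key exists: returns its value
fallback = mydict.get('missing', 0)  # key missing: returns the fallback

print(found)     # 999
print(fallback)  # 0
print(mydict)    # {'z': 999}  (no new key was added)
```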
This is especially useful when looping through a list of dictionaries in which not every dictionary has the same keys:
names = []
names.append({'first': 'Dan', 'last': 'Nguyen', 'suffix': 'III'})
names.append({'first': 'Jane'})
for name in names:
    x = name.get('first')
    y = name.get('last')
    z = name.get('suffix')
    print(x, y, z)
The output:
Dan Nguyen III
Jane None None
OK, that's not great, but at least the program didn't crash. We can modify the for-loop with some conditional branches to avoid those ugly None values, which is what Python prints to the screen for NoneType objects:
for name in names:
    x = name.get('first')
    y = name.get('last')
    z = name.get('suffix')
    if not x:
        x = ""
    if not y:
        y = "Doe"
    if not z:
        z = ""
    print(x, y, z)
The output:
Dan Nguyen III
Jane Doe
So those conditional branches worked. But let's be honest: nobody likes writing that kind of ugly forest of conditional branches. There are ways to mitigate this, including the dict object's setdefault() method, which allows us to specify a fallback value, other than None, for missing keys:
for name in names:
    x = name.setdefault('first', "")
    y = name.setdefault('last', "Doe")
    z = name.setdefault('suffix', "")
    print(x, y, z)
That's much nicer. The output is the same as before:
Dan Nguyen III
Jane Doe
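One caveat: as its name suggests, setdefault() doesn't just return the fallback. It also sets it, inserting the missing key into the dictionary. If you'd rather leave the data untouched, get(key, fallback) returns the same fallback without storing it. Here's the difference, using a "Jane" record like the one above:

```python
jane_a = {'first': 'Jane'}
jane_b = {'first': 'Jane'}

# setdefault() returns "Doe" AND stores it under 'last'
print(jane_a.setdefault('last', "Doe"))  # Doe
print(jane_a)  # {'first': 'Jane', 'last': 'Doe'}

# get() with a fallback returns "Doe" but leaves the dict alone
print(jane_b.get('last', "Doe"))  # Doe
print(jane_b)  # {'first': 'Jane'}
```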
The update(newdict) method takes a dictionary, newdict, as an argument (it can also take a sequence of key-value pairs, but let's keep it simple for now) and does an in-place update of the calling dictionary.
For keys in newdict that also exist in the calling dictionary, the corresponding values in the calling dictionary are replaced. For keys that aren't already in the calling dictionary, new key-value pairs are added:
>>> a = {'first': 'Dan', 'last': 'Nguyen'}
>>> b = {'last': 'Smith', 'suffix': 'Jr.'}
>>> a.update(b)
>>> print(a)
{'first': 'Dan', 'last': 'Smith', 'suffix': 'Jr.'}
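Besides another dictionary, update() also accepts a list (or other iterable) of (key, value) pairs, as well as keyword arguments. A brief sketch:

```python
a = {'first': 'Dan', 'last': 'Nguyen'}

# A list of (key, value) tuples works in place of a dictionary
a.update([('last', 'Smith'), ('suffix', 'Jr.')])

# Keyword arguments work too (each keyword becomes a string key)
result = a.update(title='Mr.')

print(result)  # None (update() changes the dictionary in place)
print(a)       # {'first': 'Dan', 'last': 'Smith', 'suffix': 'Jr.', 'title': 'Mr.'}
```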
In a list, members are stored one after another, in sequence:
>>> mylist = []
>>> mylist.append('a')
>>> mylist.append('b')
>>> mylist.append('c')
Python only allows us to set (i.e. change) the value at an existing index:
>>> mylist[0] = 'A'
However, Python raises an error if we try to set a value at an index that the list hasn't reached yet:
>>> mylist[999] = 'Hey'
IndexError: list assignment index out of range
In contrast, because dictionary members are looked up by key rather than by numerical position, we're allowed to set values with any key we like:
>>> mydict = {}
>>> mydict[99999999] = "hello"
A list is considered ordered because its members are arranged in the order in which they were inserted. Every time we iterate through a list, we can count on its members coming out in that same order.
>>> mylist = []
>>> mylist.append(0)
>>> mylist.append(1)
>>> mylist.append(2)
>>> for n in mylist:
...     print(n)
0
1
2
Among other operations, this allows us to slice the list in sequential chunks:
>>> mylist = [0, 1, 2, 3, 4]
>>> mylist[2:4]
[2, 3]
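In contrast, the members of a dictionary have no numerical positions. In Python versions before 3.7, dictionaries also made no guarantee about iteration order: no matter what order you added key-value pairs, you couldn't count on the order they'd come out in when iterating. The output below, from one such older version, shows the scrambling:
>>> mydict = {}
>>> mydict['a'] = 0
>>> mydict['b'] = 1
>>> mydict['c'] = 2
>>> for k in mydict:
...     print(k)
b
c
a
Since Python 3.7, dictionaries are guaranteed to preserve insertion order, so the same loop prints a, b, c. Even so, a dictionary can't be sliced or indexed by position the way a list can.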
This is not typically considered to be a huge drawback. For most use cases of a dictionary, we don't really care what order the key-value pairs are stored as.
However, for the times that we demand order (particularly on Python versions before 3.7), Python has a collections module which allows us to create an OrderedDict. I cover that in a later section.
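In Python 3.7 and later, you can see the insertion-order guarantee directly. In the sketch below, reassigning an existing key keeps its original position. Deleting a key and adding it back moves it to the end:

```python
mydict = {}
mydict['a'] = 0
mydict['b'] = 1
mydict['c'] = 2
first_order = list(mydict)
print(first_order)  # ['a', 'b', 'c']

# Assigning a new value to an existing key keeps the key in place
mydict['a'] = 100
after_update = list(mydict)
print(after_update)  # ['a', 'b', 'c']

# Deleting a key and adding it back moves it to the end
del mydict['a']
mydict['a'] = 0
after_reinsert = list(mydict)
print(after_reinsert)  # ['b', 'c', 'a']
```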
Anything that can be represented in a list can be represented as a dictionary, and vice versa. Why pick one over the other? It depends on what the programmer (or data provider) feels makes the most sense for representing data.
For example, here's a list of the components of David Bowie's birth name, and how we would access each part of his name:
>>> bowielist = ['David', 'Robert', 'Jones']
>>> print(bowielist[0])
David
>>> print(bowielist[2])
Jones
This is how we could represent it as a dictionary:
>>> bowiedict = {
... 'first': 'David',
... 'middle': 'Robert',
... 'last': 'Jones'
... }
>>> print(bowiedict['first'])
David
>>> print(bowiedict['last'])
Jones
The dictionary seems so much more verbose, doesn't it? However, the dictionary's verbosity pays off for humans: it's easier to remember that "last" points to the last name component than to remember the index value of 2, as in the case of the list.
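Because the two representations hold the same information, converting from one to the other takes only a line or two. One way to do it, sketched here with the Bowie example, is to pair a list of field names with the list of values using zip() and pass the pairs to dict(). The field names are our own choice and not part of the data.

```python
bowielist = ['David', 'Robert', 'Jones']
fields = ['first', 'middle', 'last']  # names we choose for each position

# zip() pairs each field name with the value at the same index
bowiedict = dict(zip(fields, bowielist))
print(bowiedict)  # {'first': 'David', 'middle': 'Robert', 'last': 'Jones'}

# Going back: pull the values out in the order of the field names
back_to_list = [bowiedict[f] for f in fields]
print(back_to_list)  # ['David', 'Robert', 'Jones']
```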
But the list implementation, as straightforward as it seems, is limited by its own simplicity, and is far more difficult to adapt to more complicated, real-world data.
For example, what if we wanted to add titles and suffixes to a name?
>>> lelandlist = ['Mr.', 'Leland', 'Dewitt', 'Stanford', 'Jr.']
Now, anyone expecting that index 0 points to the first name is going to be surprised (or confused).
The dictionary implementation, however, can handle additions to the data definition with ease:
>>> lelanddict = {
... "first": "Leland",
... "middle": "Dewitt",
... "last": "Stanford",
... "suffix": "Jr.",
... "title": "Mr."
... }
Anyone who accesses the "first" key can still expect it to return the first name; the addition of the new keys, "suffix" and "title", doesn't change how we work with the data.
At the cost of some verbosity, the dictionary allows us to represent real-world things more intuitively. What if we wanted the data object to represent a person's identity beyond their name? Modifying the object that lelanddict points to, we can use a nested structure:
lelanddict = {
    "name": {
        "first": "Leland",
        "middle": "Dewitt",
        "last": "Stanford",
        "suffix": "Jr.",
        "title": "Mr."
    },
    "birth": {
        "date": "1868-05-14",
        "place": {
            "city": "Sacramento",
            "state": "California",
            "country": "United States"
        }
    },
    "death": {
        "date": "1884-03-13",
        "place": {
            "city": "Florence",
            "country": "Italy"
        }
    }
}
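Reaching into a nested structure means chaining square brackets, one per level. When some records might lack a level (not every person in a dataset will have a death entry, for example), chaining get() calls with an empty dictionary as the fallback avoids a KeyError. Here's a sketch using a trimmed-down, illustrative record:

```python
person = {
    "name": {"first": "Leland", "last": "Stanford"},
    "birth": {"date": "1868-05-14", "place": {"city": "Sacramento"}},
}

# Chained brackets: one lookup per level of nesting
print(person["birth"]["place"]["city"])  # Sacramento

# person["death"]["place"]["city"] would raise KeyError: 'death'.
# get() with {} as the fallback lets each step continue safely.
death_city = person.get("death", {}).get("place", {}).get("city")
print(death_city)  # None
```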
Yeah, that looks pretty complicated. But life is complicated.
(Also, names are extremely complicated, which underscores why a simple list isn't enough to represent the components of someone's name.)
Here are examples of how various real-world concepts and objects are modeled as dictionaries.
I'll devote another guide to this, but dictionaries (and lists) can be represented as text files via the JSON format. For now, pretend we've already converted each of the examples from JSON into Python objects and assigned the result to the variable, datathing:
GitHub has a Status API that provides machine-readable messages answering questions such as "Is GitHub.com functioning?"
Here's what a status response looks like, as a JSON text file:
{
"status": "good",
"last_updated": "2012-12-07T18:11:55Z"
}
Looks like a dictionary, right? Pretend that it's been assigned to the variable datathing. This is how we might use the object:
datathing = {
"status": "good",
"last_updated": "2012-12-07T18:11:55Z"
}
if datathing['status'] == 'good':
    print("Things are good")
else:
    print("WTF!?!??!")
This next example is similar to the single status message, except that this one returns a list of messages, i.e. a list of dictionaries:
datathing = [{
"status": "good",
"body": "Everything operating normally.",
"created_on": "2016-01-21T20:28:31Z"
}, {
"status": "minor",
"body": "We're investigating issues serving GitHub pages.",
"created_on": "2016-01-21T20:25:03Z"
}, {
"status": "good",
"body": "Everything operating normally.",
"created_on": "2016-01-20T22:56:57Z"
}, {
"status": "minor",
"body": "We're investigating issues affecting a small number of repositories.",
"created_on": "2016-01-20T22:47:07Z"
}]
To print out the date, body, and status level of each message:
for d in datathing:  # remember that datathing is a list
    print(d['created_on'], '--',
          d['status'] + ':')
    print(d['body'])
    print("")
The output:
2016-01-21T20:28:31Z -- good:
Everything operating normally.
2016-01-21T20:25:03Z -- minor:
We're investigating issues serving GitHub pages.
2016-01-20T22:56:57Z -- good:
Everything operating normally.
2016-01-20T22:47:07Z -- minor:
We're investigating issues affecting a small number of repositories.
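Since datathing is just a list of dictionaries, ordinary list tools apply. For instance, a list comprehension can pull out only the messages whose status isn't "good". The sketch below uses two of the messages from above:

```python
datathing = [
    {"status": "good", "body": "Everything operating normally.",
     "created_on": "2016-01-21T20:28:31Z"},
    {"status": "minor", "body": "We're investigating issues serving GitHub pages.",
     "created_on": "2016-01-21T20:25:03Z"},
]

# Keep only the dictionaries whose 'status' value is not "good"
problems = [d for d in datathing if d['status'] != 'good']

for d in problems:
    print(d['created_on'], '--', d['body'])
```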
The Instagram API allows us to look up an [individual user](https://www.instagram.com/developer/endpoints/users/#get_users):
datathing = {
"data": {
"id": "1574083",
"username": "snoopdogg",
"full_name": "Snoop Dogg",
"profile_picture": "http://distillery.s3.amazonaws.com/profiles/profile_1574083_75sq_1295469061.jpg",
"bio": "This is my bio",
"website": "http://snoopdogg.com",
"counts": {
"media": 1320,
"follows": 420,
"followed_by": 3410
}
}
}
Here's how to access various values inside that nested dictionary:
>>> user = datathing['data']
>>> print("User name:", user['full_name'])
User name: Snoop Dogg
>>> print("Bio:", user['bio'])
Bio: This is my bio
>>> counts = user['counts']
>>> print("Number of posts:", counts['media'])
Number of posts: 1320
>>> print("Number of users followed:", counts['follows'])
Number of users followed: 420
Here's an excerpt of how Spotify's API (via the get-artist endpoint) represents Beyoncé (see the JSON file here):
beyonce = {
"name" : "Beyoncé",
"popularity" : 86,
"type" : "artist",
"uri" : "spotify:artist:6vWDO969PvNqNYHIOW5v0m",
"external_urls" : {
"spotify" : "https://open.spotify.com/artist/6vWDO969PvNqNYHIOW5v0m"
},
"followers" : {
"href" : None,
"total" : 3841151
},
"genres" : [ "dance pop", "pop", "r&b", "urban contemporary" ],
"href" : "https://api.spotify.com/v1/artists/6vWDO969PvNqNYHIOW5v0m",
"id" : "6vWDO969PvNqNYHIOW5v0m",
"images" : [ {
"height" : 1000,
"url" : "https://i.scdn.co/image/a370c003642050eeaec0bc604409aa585ca92297",
"width" : 1000
}, {
"height" : 640,
"url" : "https://i.scdn.co/image/79e91d3cd4a7c15e0c219f4e6c941d282fe87a3d",
"width" : 640
} ]
}
Note how the external_urls key points to a dictionary, which itself contains a single key named spotify that points to Beyoncé's page on the Spotify website. The genres key points to a list of string objects, as Beyoncé's oeuvre can't be contained in a single genre. The images key points to a list of dictionaries, as Spotify serves up multiple sizes of an artist's image, and each image has multiple properties, e.g. height, url, and width.
The followers key also points to a dictionary. To get Beyoncé's number of Spotify followers, which is associated with the total key, this is how we would access the nested value (assume the variable, beyonce, points to the dictionary above):
>>> print(beyonce['followers']['total'])
3841151
To print the attributes of each version of the images associated with Beyoncé, we can use a nested for-loop:
>>> for imgdict in beyonce['images']:
...     print('-----')
...     for key, val in imgdict.items():
...         print(key + ':', val)
-----
height: 1000
url: https://i.scdn.co/image/a370c003642050eeaec0bc604409aa585ca92297
width: 1000
-----
height: 640
url: https://i.scdn.co/image/79e91d3cd4a7c15e0c219f4e6c941d282fe87a3d
width: 640
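Because each image is a dictionary with the same keys, we can also compare the images against each other. For example, to pick the largest image, we can pass max() a key function that looks up each dictionary's "width" value. Here's a sketch using the two image entries from above:

```python
images = [
    {"height": 1000,
     "url": "https://i.scdn.co/image/a370c003642050eeaec0bc604409aa585ca92297",
     "width": 1000},
    {"height": 640,
     "url": "https://i.scdn.co/image/79e91d3cd4a7c15e0c219f4e6c941d282fe87a3d",
     "width": 640},
]

# max() calls the key function on each dict and compares the results
largest = max(images, key=lambda img: img["width"])
print(largest["url"])
```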
<a id="mark-ordered-dict"></a>
These can be found in [Python's collections module](https://docs.python.org/3/library/collections.html). I'll write about them later.