Deserializing the JSON text from the Google Maps and Mapzen geocoding APIs

Before programmatically using an API, we need to study its response.
This assignment is due on Tuesday, February 2
7 exercises
3.5 possible points
Create a subfolder named 0010-map-json-responses inside your compciv-2016/exercises folder.

Summary

Even though this uses real JSON-formatted data from the Google Maps and Mapzen geocoders, this exercise is meant to test our understanding of Python list and dictionary objects. Instead of contacting the live API servers, we practice on previously fetched responses, which have been saved as JSON-formatted text files.

We don’t even need to understand what the JSON format really means. It’s enough to know the methods used to deserialize a text file and return, for use in our programs, the appropriate type of Python data object, which for these files is always either a dict or a list.

The Checklist

In your compciv-2016 Git repository create a subfolder and name it:

     exercises/0010-map-json-responses

The folder structure will look like this (not including any subfolders such as `tempdata/`):

        compciv-2016
        └── exercises
            └── 0010-map-json-responses
                ├── a.py
                ├── b.py
                ├── c.py
                ├── d.py
                ├── e.py
                ├── f.py
                └── g.py
    
File  Points  Task
a.py  0.5     Download and save the JSON files; print the count of lines and characters
b.py  0.5     Deserialize the Google Maps geocoder's JSON file and read its status code
c.py  0.5     Print the formatted address of every result returned in the Google Maps geocoder's JSON response
d.py  0.5     Print the verbose address of every result in the Google Maps geocoder JSON file
e.py  0.5     Print the formatted address and longitude and latitude in the Google Maps geocoder JSON file
f.py  0.5     Print the parameters used to query the Mapzen Search API and the type of data it responded with, as found in its JSON response
g.py  0.5     Print the location, confidence score, and coordinates for each result in the Mapzen Search JSON response

Background information

About the APIs

If you're interested in the documentation for what these JSON responses actually contain, you can read the official Google Maps Geocoding API and Mapzen Search docs online.

However, you don't need to know all the specifics of what the APIs return, or how to contact the APIs yourself, as I've contacted the APIs myself and saved their responses exactly as I received them.

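Here is what I got in response to a query of simply "Stanford":

     http://www.compciv.org/files/datadumps/apis/googlemaps/geocode-stanford.json
     http://www.compciv.org/files/datadumps/apis/mapzen/search-stanford.json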

I don't know how your web browser will actually render those files, so visit this webpage for a more web-friendly rendering of them.

About the JSON data format

If these JSON files look just like text files to you, that's right, that's all they are. In fact, they should remind you of Python objects you've seen before, particularly the dictionary and the list.

The fact that the text content of a JSON file looks almost identical to various Python objects is a coincidence that we can deal with later. For now, all you have to know is that if you have a string object that contains purportedly JSON-formatted text, this is how you deserialize (a fancy word for "convert") that text string into whatever Python object it looks like, e.g. a list or a dictionary (usually the latter):

import json
# etc...
# assuming mystring points to a text string
mydata = json.loads(mystring)

That's it: you import the json module, and then you pass a string object into the json.loads() function.

You don't even need to know (yet) what it means to "serialize" something.

You can read about the json module if you'd like. But for now, just be assured that the Python docs promise us that json.loads(mystring) converts (i.e. deserializes) whatever mystring points to into some other kind of Python object. For the files in these exercises, that object is either of type dict or of type list. Well, as long as mystring points to a JSON-formatted text string.

And what is that exactly? Sure, you could read the information at www.json.org. But if I've kept my promise, that the URLs provided for the exercise are JSON-formatted text files, then that's all you need to know to do these exercises.
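If you'd like to see the idea without touching the network, here is a minimal sketch. The two JSON strings are made up for illustration and are not taken from either API; they only show that a JSON object comes back as a dict and a JSON array comes back as a list:

```python
import json

# Hypothetical JSON text, invented for illustration only
object_text = '{"status": "OK", "results": [{"name": "A"}, {"name": "B"}]}'
array_text = '["x", "y", "z"]'

as_dict = json.loads(object_text)   # JSON object -> Python dict
as_list = json.loads(array_text)    # JSON array  -> Python list

print(type(as_dict))            # <class 'dict'>
print(type(as_list))            # <class 'list'>
print(as_dict['status'])        # OK
print(len(as_dict['results']))  # 2
```

Once the text has been deserialized, everything you already know about dictionaries and lists applies.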

Test this out in interactive Python if you don't believe me:

>>> import requests
>>> import json
>>> URL = 'http://www.compciv.org/files/datadumps/apis/googlemaps/geocode-stanford.json'
>>> txt = requests.get(URL).text
>>> type(txt)
str
# you can run: print(txt) to see what it actually looks like
>>> mydata = json.loads(txt)
>>> type(mydata)
dict
# And here's how to derive the answer to b.py
>>> print(mydata['status'])
OK

However, while you don't have to know a lot of new specifics for this set of exercises, you will have to know all about for-loops, lists, and dictionaries.

The Exercises

0010-map-json-responses/a.py » Download and save the JSON files; print the count of lines and characters

0.5 points

Same routine as in past exercises:

1. Create a tempdata directory
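2. Download the two files at the given source URLs and save each one at its corresponding path in your working directory:

     http://www.compciv.org/files/datadumps/apis/googlemaps/geocode-stanford.json
         save as: tempdata/googlemaps/stanford.json
     http://www.compciv.org/files/datadumps/apis/mapzen/search-stanford.json
         save as: tempdata/mapzen/stanford.json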

Here’s what your file tree should look like:

      compciv-2016
      └── exercises
          └── 0010-map-json-responses
              ├── a.py
              └── tempdata
                  ├── googlemaps
                  │   └── stanford.json
                  └── mapzen
                      └── stanford.json

You should not have to import the json module for this, as we are not converting the text into data. The len() function is all you need.

The string object has a splitlines() method, which makes it easy to convert a text string into a list of strings (just in case you wanted to, you know, count the number of lines with len()).
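Here is a small sketch of how len() and splitlines() behave, using a made-up three-line string rather than either of the downloaded files:

```python
# A made-up three-line string, standing in for downloaded text
text = "first line\nsecond line\nthird line\n"

lines = text.splitlines()   # a list of strings, one per line

print(len(lines))   # number of lines: 3
print(len(text))    # number of characters, newlines included: 34
```

Note that len(text) counts every character, including the newline characters that splitlines() strips away.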

Please take note of exactly where the files are saved: in subdirectories of tempdata, not directly in tempdata. Notice, too, that the filenames differ from the ones on the website.

There’s no specific technical reason for this other than that it’s the requirement for the exercise. The bigger picture is to prep you for the reality of a big data project, in which you sometimes name files whatever makes sense to you, regardless of the URL they came from.
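As a rough sketch of saving text under a nested path of your own choosing, the snippet below writes into a throwaway directory created by tempfile. The subfolder name example and the one-line stand-in text are hypothetical; in the exercise itself the base would simply be your working directory:

```python
import os
import tempfile

# Work inside a throwaway directory so this sketch leaves nothing behind
# in your exercise folder
base_dir = tempfile.mkdtemp()

# The local path is our own choice; it need not resemble the source URL.
# 'example' is a hypothetical subfolder name.
dest_path = os.path.join(base_dir, 'tempdata', 'example', 'stanford.json')

# Create tempdata/example (and any missing parent folders) before writing
os.makedirs(os.path.dirname(dest_path), exist_ok=True)

text = '{"status": "OK"}'   # stand-in for the downloaded text
with open(dest_path, 'w') as f:
    f.write(text)

print(os.path.exists(dest_path))   # True
```

Passing exist_ok=True means the script can be run repeatedly without failing because the folders already exist.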

Expectations

When you run a.py from the command-line:

0010-map-json-responses $ python a.py
  • The program's output to screen should be:
    ---
    Downloading from: http://www.compciv.org/files/datadumps/apis/googlemaps/geocode-stanford.json
    Writing to: tempdata/googlemaps/stanford.json
    Wrote 59 lines and 1751 characters
    ---
    Downloading from: http://www.compciv.org/files/datadumps/apis/mapzen/search-stanford.json
    Writing to: tempdata/mapzen/stanford.json
    Wrote 273 lines and 6826 characters
    
  • The program creates this file path: tempdata/googlemaps/stanford.json
  • The program creates this file path: tempdata/mapzen/stanford.json
  • The program accesses this remote file: http://www.compciv.org/files/datadumps/apis/googlemaps/geocode-stanford.json
  • The program accesses this remote file: http://www.compciv.org/files/datadumps/apis/mapzen/search-stanford.json
Some takeaways from this exercise:
  • JSON text files, when opened and read, are just string objects. It’s not until we use the json.loads() function that anything special happens.

0010-map-json-responses/b.py » Deserialize the Google Maps geocoder's JSON file and read its status code

0.5 points

The Google Geocoding API, along with a set of results, returns metadata, including a top-level status object, so that the requesting program has an easy way to check if the API was able to fulfill the request.

For this exercise, simply print out the value that the status key points to.

This is a situation in which you could just look at the actual file and find the corresponding key-value pair. But please try to do this programmatically.

Here’s some sample code to open and read the file, and then deserialize it into a Python dictionary:

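import json
# Path where a.py saved the Google Maps response
MYFILENAME = 'tempdata/googlemaps/stanford.json'
# The with-statement closes the file automatically after reading
with open(MYFILENAME, 'r') as f:
    txt = f.read()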

mydict = json.loads(txt)

Now you just have to print the value of its status key.

Expectations

When you run b.py from the command-line:

0010-map-json-responses $ python b.py
  • The program's output to screen should be:
    OK

0010-map-json-responses/c.py » Print the formatted address of every result returned in the Google Maps geocoder's JSON response

0.5 points

The response object has a results key, which is a list of result objects. For each object, print to screen the formatted_address value.

If you eyeball our specific Google Maps geocoder’s JSON response, you’ll notice that the response’s results list actually only contains one item. Even so, you should write your program as if there could be more than 1 result (or even none at all), because when you actually use an API, you won’t be taking the time to manually eyeball the dense JSON text it returns.
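To make the "more than one result, or none at all" point concrete, here is a minimal sketch. Both JSON strings are hand-written stand-ins that only mimic the shape described above (a results list whose items have a formatted_address); the place names, the status value in the empty case, and the helper name formatted_addresses are all invented for illustration:

```python
import json

# Hypothetical responses, hand-written for this sketch
many_text = ('{"results": [{"formatted_address": "Place One"}, '
             '{"formatted_address": "Place Two"}], "status": "OK"}')
none_text = '{"results": [], "status": "ZERO_RESULTS"}'


def formatted_addresses(json_text):
    """Return the formatted_address of every result in the JSON text."""
    data = json.loads(json_text)
    addresses = []
    for result in data['results']:
        addresses.append(result['formatted_address'])
    return addresses


print(formatted_addresses(many_text))   # ['Place One', 'Place Two']
print(formatted_addresses(none_text))   # []
```

The same for-loop handles zero, one, or many results; with an empty list, the loop body simply never runs.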

Expectations

When you run c.py from the command-line:

0010-map-json-responses $ python c.py
  • The program's output to screen should be:
    Stanford, CA, USA

0010-map-json-responses/d.py » Print the verbose address of every result in the Google Maps geocoder JSON file

0.5 points

For every result in the Google Maps geocoder JSON file, print the text composed of the long_name form of each of the result’s address_components, delimited by a semicolon and a space.

This is similar to the previous exercise, except that extracting the long_name part of each of the address_components is slightly maddeningly complicated. However, if you take some time to think over the details, you might notice how having each address component be its own dictionary, with not just a long_name key-value pair, might be very useful when trying to determine if a given result has certain geopolitical boundaries.
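Here is a sketch of both ideas on a hand-written result. The place names are made up, and the short_name and types keys are my assumption about what else each component dictionary holds; check the actual file to see what is really there:

```python
# Hypothetical result, shaped like one item of the geocoder's results list
result = {
    'address_components': [
        {'long_name': 'Springfield', 'short_name': 'Springfield',
         'types': ['locality', 'political']},
        {'long_name': 'Example County', 'short_name': 'Example County',
         'types': ['administrative_area_level_2', 'political']},
        {'long_name': 'United States', 'short_name': 'US',
         'types': ['country', 'political']},
    ]
}

# Collect each component's long_name, then join with "; "
long_names = []
for component in result['address_components']:
    long_names.append(component['long_name'])
verbose_address = '; '.join(long_names)
print(verbose_address)
# Springfield; Example County; United States

# Because each component is its own dictionary, we can also ask
# questions such as "which component is the country?"
country_codes = []
for component in result['address_components']:
    if 'country' in component['types']:
        country_codes.append(component['short_name'])
print(country_codes)   # ['US']
```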

Expectations

When you run d.py from the command-line:

0010-map-json-responses $ python d.py
  • The program's output to screen should be:
    Stanford; Santa Clara County; California; United States
    

0010-map-json-responses/e.py » Print the formatted address and longitude and latitude in the Google Maps geocoder JSON file

0.5 points

For each result in the Google Maps geocoder JSON file, print a semicolon-delimited list of:

  • the formatted_address value
  • the lng value in the location object within the geometry object
  • the lat value in the location object within the geometry object
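Reaching a value that sits inside an object inside another object is just a matter of chaining square brackets. A minimal sketch, using a hand-written result with made-up coordinates:

```python
# Hypothetical result with made-up values
result = {
    'formatted_address': 'Springfield, XX, USA',
    'geometry': {'location': {'lat': 12.5, 'lng': -45.25}},
}

# Step down one dictionary at a time: result -> geometry -> location
location = result['geometry']['location']

# join() needs strings, so the numbers are converted with str()
fields = [result['formatted_address'], str(location['lng']), str(location['lat'])]
line = ';'.join(fields)
print(line)   # Springfield, XX, USA;-45.25;12.5
```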
Expectations

When you run e.py from the command-line:

0010-map-json-responses $ python e.py
  • The program's output to screen should be:
    Stanford, CA, USA;-122.1660756;37.42410599999999
    

0010-map-json-responses/f.py » Print the parameters used to query the Mapzen Search API and the type of data it responded with, as found in its JSON response

0.5 points

Like the Google Maps Geocoder, the Mapzen Search API returns metadata along with a set of geocoded results.

Print out the type of result returned, according to the JSON file.

Then read the geocoding object, which has a query object. From that query object, print the key-value pairs for the following keys, in this exact order:

  • text
  • size
  • boundary.country
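One detail worth sketching: a key such as boundary.country contains a dot, but to Python it is just a string, so it is looked up with a single pair of square brackets. The dictionary below is a hand-written stand-in with illustrative values, and it assumes the query object stores that key flat rather than as nested boundary and country objects; check the actual file to confirm:

```python
# Hand-written stand-in for the response; values are illustrative
response = {
    'type': 'FeatureCollection',
    'geocoding': {
        'query': {'text': 'Somewhere', 'size': 10, 'boundary.country': 'USA'},
    },
}

lines = ['type: ' + response['type']]

query = response['geocoding']['query']
# Looping over a list of keys guarantees the printing order
for key in ['text', 'size', 'boundary.country']:
    lines.append(key + ': ' + str(query[key]))

for line in lines:
    print(line)
# type: FeatureCollection
# text: Somewhere
# size: 10
# boundary.country: USA
```

Looping over an explicit list of keys, rather than over the dictionary itself, is one way to control the order of the output.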
Expectations

When you run f.py from the command-line:

0010-map-json-responses $ python f.py
  • The program's output to screen should be:
    type: FeatureCollection
    text: Stanford
    size: 10
    boundary.country: USA
    

0010-map-json-responses/g.py » Print the location, confidence score, and coordinates for each result in the Mapzen Search JSON response

0.5 points

For each Feature-type object in the Mapzen Search JSON file, print out the following values in a semicolon-delimited list:

  • label
  • confidence
  • longitude (as found in the Point object’s coordinates, within geometry)
  • latitude (as found in the Point object’s coordinates, within geometry)
  • The longitude is the first value inside each coordinates object.
  • The latitude is the second value inside each coordinates object.
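Here is a sketch of pulling those four values out of each feature. The stand-in data follows common GeoJSON conventions, so the features list and the properties dictionary holding label and confidence are my assumptions about the layout, and every value is made up; compare against the actual file before relying on it:

```python
# Hand-written stand-in following GeoJSON conventions; values are made up
response = {
    'type': 'FeatureCollection',
    'features': [
        {'type': 'Feature',
         'properties': {'label': 'Place One', 'confidence': 0.9},
         'geometry': {'type': 'Point', 'coordinates': [-100.5, 40.25]}},
        {'type': 'Feature',
         'properties': {'label': 'Place Two', 'confidence': 0.7},
         'geometry': {'type': 'Point', 'coordinates': [-80.0, 35.5]}},
    ],
}

rows = []
for feature in response['features']:
    props = feature['properties']
    # coordinates is a two-item list: longitude first, then latitude
    lng, lat = feature['geometry']['coordinates']
    fields = [props['label'], str(props['confidence']), str(lng), str(lat)]
    rows.append(';'.join(fields))

for row in rows:
    print(row)
# Place One;0.9;-100.5;40.25
# Place Two;0.7;-80.0;35.5
```

Unpacking the two-item coordinates list into lng and lat keeps the longitude-first ordering explicit in the code.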
Expectations

When you run g.py from the command-line:

0010-map-json-responses $ python g.py
  • The program's output to screen should be:
    Stanford, Santa Clara County, CA;0.949;-122.16608;37.42411
    Stanford, Lincoln County, KY;0.945;-84.66189;37.53119
    Stanford, Allin, IL;0.941;-89.21786;40.43476
    Stanford, Judith Basin County, MT;0.94;-110.21826;47.15358
    Stanford, Santa Clara County, CA;0.737;-122.167340615422;37.4251401412163
    Stanford, Oakland County, MI;0.731;-83.1792681531045;42.6751206714193
    Stanford, Clay County, IL;0.725;-88.4167904448051;38.6696905856532
    Stanford, Isanti County, MN;0.725;-93.407649642371;45.4442114012486
    Stanford, Dutchess County, NY;0.725;-73.6917318290657;41.8885062048017
    Stanford, Lincoln County, KY;0.725;-84.6605602220764;37.5349694856098