The Checklist

In your compciv-2016 Git repository create a subfolder and name it:

     exercises/0013-sorted-names

The folder structure will look like this (not including any subfolders such as `tempdata/`):

        compciv-2016
        └── exercises
            └── 0013-sorted-names
                ├── a.py
                ├── b.py
                ├── c.py
                ├── d.py
                └── e.py

`a.py`	0.5 points	Download the 2014 text file of babynames and count the number of characters
`b.py`	0.5 points	Print the 10 most popular names in 2014, regardless of gender
`c.py`	1.0 points	Print the 10 longest names, given to at least 2,000 babies in 2014
`d.py`	1.0 points	Print the 5 most popular female and male names in 2014 that contain at least one "x"
`e.py`	1.5 points	Print the percentage of babies in 2014 who had popular names.

Background information

Same dataset as the previous exercise, but now you get to practice the built-in sorted function.

The Exercises


          0013-sorted-names/a.py

Download the 2014 text file of babynames and count the number of characters

0.5 points

(Yes, this is virtually the same as 0012-got-babynames-2014/a.py)

Make a tempdata subdirectory inside your working directory, i.e.
```
  0013-sorted-names/tempdata
```
Download the list of Social Security babynames data for 2014:

http://stash.compciv.org/ssa_baby_names/ssa-babynames-nationwide-2014.txt

Save it to your tempdata folder at this path:
```
  0013-sorted-names/tempdata/ssa-babynames-nationwide-2014.txt
```
Count and print the number of characters in the file.
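
The steps above could be sketched roughly as follows. This is only one possible approach, using the standard library's `urllib.request.urlretrieve`; the helper names `fetch_file` and `count_characters` are hypothetical and not part of the assignment.

```python
import os
from urllib.request import urlretrieve

DATA_URL = 'http://stash.compciv.org/ssa_baby_names/ssa-babynames-nationwide-2014.txt'
DATA_DIR = 'tempdata'
DATA_PATH = os.path.join(DATA_DIR, 'ssa-babynames-nationwide-2014.txt')


def fetch_file(url, dest_path):
    """Download url to dest_path, creating the parent folder if needed."""
    folder = os.path.dirname(dest_path)
    if folder:
        os.makedirs(folder, exist_ok=True)
    urlretrieve(url, dest_path)


def count_characters(path):
    """Return the number of characters in the text file at path."""
    with open(path, 'r') as f:
        return len(f.read())
```

Calling `fetch_file(DATA_URL, DATA_PATH)` and then printing `count_characters(DATA_PATH)` along with `DATA_PATH` would give the pieces needed for the expected output line.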

Expectations

When you run a.py from the command-line:

0013-sorted-names $ python a.py

The program's output to screen should be:

There are 425485 characters in tempdata/ssa-babynames-nationwide-2014.txt


          0013-sorted-names/b.py

Print the 10 most popular names in 2014, regardless of gender

0.5 points

The Social Security Administration’s baby name data is ordered by gender, then by baby count in descending order. Rearrange the list so that it is just sorted by baby count in descending order. Then print the first 10 rows.

The easiest way to approach this (and the other exercises) is to iterate through each line in the file and create a list. Then, do the sorting:

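records_list = []
# yourfilename is a placeholder for the path to the downloaded data file
with open(yourfilename, 'r') as f:
    for line in f:
        # each line looks like "Emma,F,20799"
        name, sex, babies = line.strip().split(',')
        # convert the count to an integer so it sorts numerically
        row = [name, sex, int(babies)]
        records_list.append(row)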

Another way to approach this, if you’ve forgotten how a for-loop can iterate through a file object:

records_list = []
lines = open(yourfilename, 'r').readlines()
for line in lines:
    name, sex, babies = line.strip().split(',')
    row = [name, sex, int(babies)]
    records_list.append(row)

What does that for-loop do? It builds records_list as a list of lists, as opposed to just a list of strings.

In other words, the above for-loop turned each line (a string):

"Emma,F,20799"

Into a list object, containing 3 objects:

["Emma", "F", 20799]

Now, we just need to:

Sort records_list in reverse order of its third element, i.e. the baby count.
Then loop through just the first 10 elements, and print the results.
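
The two steps above can be sketched with the built-in `sorted` function and its `key` and `reverse` arguments. The sample rows below are copied from the expected output rather than read from the file, and the variable name `by_count` is just illustrative.

```python
# A few rows in the file's original order: all F rows first, then M rows
records_list = [
    ['Emma', 'F', 20799],
    ['Olivia', 'F', 19674],
    ['Sophia', 'F', 18490],
    ['Noah', 'M', 19144],
    ['Liam', 'M', 18342],
]

# key picks out the third element (the baby count); reverse=True puts
# the largest counts first
by_count = sorted(records_list, key=lambda row: row[2], reverse=True)

# enumerate(..., start=1) provides the rank printed before each row
for rank, (name, sex, babies) in enumerate(by_count[:10], start=1):
    print('{}. {},{},{}'.format(rank, name, sex, babies))
```

Note that `sorted` returns a new list and leaves `records_list` unchanged.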

Expectations

When you run b.py from the command-line:

0013-sorted-names $ python b.py

The program's output to screen should be:

1. Emma,F,20799
2. Olivia,F,19674
3. Noah,M,19144
4. Sophia,F,18490
5. Liam,M,18342
6. Mason,M,17092
7. Isabella,F,16950
8. Jacob,M,16712
9. William,M,16687
10. Ethan,M,15619


          0013-sorted-names/c.py

Print the 10 longest names, given to at least 2,000 babies in 2014

1.0 points

Of the names that were given to at least 2,000 babies (male and female combined) in 2014, print the top 10 in descending order of character length. In case of a tie (e.g. two names that both have 10 letters), sort by number of babies, largest first.

The 2,000 baby count is the combined number of boys and girls for a given name. So you’ll want to create a new list from the original data that aggregates both boy and girl babies into a single count per name.

A partial answer for c.py:

(You can also view it on Github)

from os.path import join

DATADIR = 'tempdata'
FPATH = join(DATADIR, 'ssa-babynames-nationwide-2014.txt')

Now we need to create a dictionary derived from the data in which every name is a key and points to the total number of babies (i.e. both “M” and “F”) e.g.

  {
      'Mackenzie': 4152,
      'Christopher': 10293
  }

namesdict = {}
with open(FPATH) as f:
    for line in f:
        name, sex, babies = line.strip().split(',')
        if namesdict.get(name):
            namesdict[name] += int(babies)
        else:
            namesdict[name] = int(babies)

This is necessary because the assignment asks for the longest names among those given to at least 2,000 babies, M and F combined, so we need to rebuild the data as a gender-agnostic collection of names and totals.

After namesdict is populated, we filter it to include only the key-value pairs in which the value (i.e. number of babies) is at least 2,000, as per the assignment requirements.

Finally, with that filtered list of “popular” names, you can sort by length of name, then by number of babies.
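
One way the filter-then-sort step might look is sketched below, using a tuple as the sort key so that name length is compared first and the baby total breaks ties. The names `MIN_BABIES`, `popular`, and `longest` are illustrative, and the small dictionary stands in for the full namesdict.

```python
# Totals for a handful of names; in c.py this would be the full namesdict.
# The first four totals come from the expected output; the Alexandria value
# is only an illustrative below-threshold number, not a verified total.
namesdict = {
    'Mackenzie': 4152,
    'Christopher': 10293,
    'Alexander': 15326,
    'Charlotte': 10055,
    'Alexandria': 1589,
}

MIN_BABIES = 2000

# Keep only (name, total) pairs that meet the threshold
popular = [(name, total) for name, total in namesdict.items()
           if total >= MIN_BABIES]

# Sort by a tuple: name length first, then total babies to break ties
longest = sorted(popular, key=lambda pair: (len(pair[0]), pair[1]),
                 reverse=True)

for name, total in longest[:10]:
    print(name.ljust(15), str(total).rjust(8))
```

Because `reverse=True` applies to the whole tuple, both the length and the tie-breaking count end up in descending order.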

Expectations

When you run c.py from the command-line:

0013-sorted-names $ python c.py

The program's output to screen should be:

Christopher        10293
Alexander          15326
Charlotte          10055
Elizabeth           9498
Sebastian           9246
Christian           8520
Gabriella           5051
Annabelle           4324
Nathaniel           4257
Mackenzie           4152


          0013-sorted-names/d.py

Print the 5 most popular female and male names in 2014 that contain at least one "x"

1.0 points

Iterate through the list of names in 2014 and print the 5 most popular names that contain at least one "x", for both females and males.

Follow the process in b.py, in which we wrote a for-loop to make a list of lists from the file, but with one twist: use an if-statement to append only the rows that meet a certain condition, i.e. the name contains at least one "x":

x_list = []
f = open(yourbabynamesfilename, 'r')
for line in f:
    name, sex, babies = line.strip().split(',')
    # placeholder: replace the string below with your own test on name
    if "SOMETHING SOMETHING SOMETHING":
        row = [name, sex, int(babies)]
        x_list.append(row)

Then you can write two for-loops to create two new lists from x_list, one in which the gender is F and one in which it is M, and sort each in descending order of count. Then iterate through each list for the top 5 names.

There are more graceful ways to do it, but use whatever makes sense to you with the least amount of typing.
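
As a rough sketch of the splitting and sorting step, the version below uses a list comprehension in place of the two for-loops. The helper name `top_by_sex` is hypothetical, and the sample rows are taken from the expected output and assumed to have already passed the "x" filter.

```python
# Sample rows that already passed the "x" filter; in d.py this would be
# the x_list built from the file
x_list = [
    ['Alexa', 'F', 4227],
    ['Alexis', 'F', 4188],
    ['Alexander', 'M', 15293],
    ['Jaxon', 'M', 7635],
    ['Xavier', 'M', 4726],
]


def top_by_sex(rows, sex_code, n=5):
    """Return up to n rows for one sex code, largest baby count first."""
    matching = [row for row in rows if row[1] == sex_code]
    return sorted(matching, key=lambda row: row[2], reverse=True)[:n]


for label, code in [('Female', 'F'), ('Male', 'M')]:
    print(label)
    for rank, (name, sex, babies) in enumerate(top_by_sex(x_list, code), start=1):
        print('{}. {} {}'.format(rank, name.ljust(15), str(babies).rjust(6)))
```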

Expectations

When you run d.py from the command-line:

0013-sorted-names $ python d.py

The program's output to screen should be:

Female
1. Alexa             4227
2. Alexis            4188
3. Alexandra         3288
4. Ximena            2323
5. Alexandria        1589
Male
1. Alexander        15293
2. Jaxon             7635
3. Jaxson            4900
4. Xavier            4726
5. Maxwell           3703


          0013-sorted-names/e.py

Print the percentage of babies in 2014 who had popular names.

1.5 points

Print the percentage of babies, rounded to one decimal place (as in the expected output below), whose names fall in each of these five brackets of popularity:

Top 10 most popular names
Top 11 to 100 most popular names
Top 101 to 1000 most popular names
Top 1,001 to 10,000 most popular names
All other names, 10,001 and so on
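
The bracket arithmetic could be sketched as below, assuming you already have a list of per-name baby totals sorted from largest to smallest. The function name `bracket_percentages` is hypothetical, and the toy numbers are made up for illustration rather than taken from the real file.

```python
def bracket_percentages(counts, brackets):
    """counts: baby totals per name, sorted largest first.
    brackets: (start_rank, end_rank) pairs, 1-based and inclusive.
    Returns one percentage per bracket, rounded to one decimal place."""
    grand_total = sum(counts)
    results = []
    for start, end in brackets:
        # ranks are 1-based; list slices are 0-based and end-exclusive
        bracket_total = sum(counts[start - 1:end])
        results.append(round(100 * bracket_total / grand_total, 1))
    return results


# Toy numbers, not taken from the real file
toy_counts = [50, 30, 10, 5, 5]
toy_brackets = [(1, 1), (2, 3), (4, 5)]
toy_results = bracket_percentages(toy_counts, toy_brackets)
for (start, end), pct in zip(toy_brackets, toy_results):
    print('Names {} to {}: {}'.format(start, end, pct))
```

For the real data, the brackets would be (1, 10), (11, 100), (101, 1000), (1001, 10000), and (10001, the rank of the last name).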

Expectations

When you run e.py from the command-line:

0013-sorted-names $ python e.py

The program's output to screen should be:

Names 1 to 10: 4.9
Names 11 to 100: 22.9
Names 101 to 1000: 43.0
Names 1001 to 10000: 23.9
Names 10001 to 30579: 5.3
