I've created landing pages for your convenience:
Tuesday and Thursday, 1:30 to 2:50PM
Building 160, Room 322 [CourseExplorer link]
Instructor: Dan Nguyen | @dancow | dun @ stanford
Office hours: Tuesdays and Thursdays, 3 to 6PM, or by appointment.
Slack Chat: stanfordcompciv.slack.com - Give this popular chat client a try if you haven't already. It might be an easier way to reach me and a much easier way for me to post code and so forth.
(The 2015 site is archived here)
A winter elective on programming and journalism for the Stanford Computational Journalism Lab, taught by Dan Nguyen. The programming part involves modestly-written, simple programs which are then executed with simple brute force. The journalism part involves finding something important about the world.
This does not work as elegantly as we want, but once you've eliminated the boring, whatever remains, no matter how improbable, might be less boring.
You aren't expected to have much if any experience with programming. So we'll take advantage of the years of time you've spent reading and writing. You will be writing some code. And you will eventually be reading more code than you write.
We start out with an introduction to the modern personal computer (PC), installing a plain text editor and a text-based programming language for working efficiently with text. Then we learn how to use the web browser and its development tools to see how webpages of text are generated by sending strings of text (including cats, but described in text) between computers.
Reading: Who Controls Your Facebook Feed
Reading: Why do Nigerian Scammers Say They are from Nigeria?
One of our goals in programming is to not have to be there when the program does its work. The concept of a "block" will have us writing code more deliberately, and almost exclusively in our text editors.
Hard to believe, but data formats such as JSON were designed to make data efficient for both machines and humans to consume. If you can read JSON, you can basically do the kinds of interesting data mishmashes that make startups and apps seem magical. Airbnb is a startup that uses Facebook data, which you give it, combined with the data of its customers. Tinder does that too, probably in a much easier way. And Tinder also uses JSON under its hood, apparently.
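To make "reading JSON" a little more concrete, here is a minimal sketch using Python's built-in json module. The restaurant data is made up for illustration (it does not come from any real API), but it has the nested shape that many web services return:

```python
import json

# A made-up JSON string, invented for this example. Real APIs return
# text shaped much like this: objects (curly braces) and lists (square brackets).
raw_text = """
{
  "name": "Example Cafe",
  "city": "Palo Alto",
  "reviews": [
    {"stars": 4, "text": "Good coffee"},
    {"stars": 5, "text": "Great wifi"}
  ]
}
"""

# json.loads turns the text into ordinary Python dictionaries and lists
data = json.loads(raw_text)

# Once parsed, we reach into the data with keys and indexes
star_values = [review["stars"] for review in data["reviews"]]
average_stars = sum(star_values) / len(star_values)

print(data["name"], "in", data["city"])
print("Average stars:", average_stars)
```

The point is that the same few lines work whether the text was typed by hand, read from a file, or downloaded from a service.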
Learning programming without being connected to other computers is like learning a foreign language from a book.
Learning programming through APIs is like learning a foreign language by visiting a foreign country with its permission, and sometimes its hospitality.
Something to think about:
APIs are a technical thing. But their existence, their design, and their availability reflect things about their owners and the data that they distribute.
Another way to put it: The Dallas Police Department has an API (via the Socrata portal) of police-involved shootings. Virtually no other police department does. Why?
An overview of HTTP and the Python Requests library, and how the URL query string is used to define data resources, e.g. Google Static Maps and Google Street View.
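As a rough sketch of how a query string defines a data resource, the snippet below builds a Google Static Maps URL from a dictionary of parameters. It uses only the standard library so that it runs without a network connection. The parameter values are arbitrary examples, and the real service will most likely also want an API key, which is left out here:

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# The endpoint for Google Static Maps; everything after the "?" is the query string
BASE_URL = "https://maps.googleapis.com/maps/api/staticmap"

# Example values chosen for illustration; an API key is probably required
# by the real service and is deliberately omitted
params = {"center": "Stanford, CA", "zoom": 14, "size": "600x400"}

# urlencode joins the key=value pairs with "&" and escapes spaces and commas
url = BASE_URL + "?" + urlencode(params)
print(url)

# Going the other way: split a URL apart to see which parameters it carries
query = parse_qs(urlsplit(url).query)
print(query["center"][0])
```

With the Requests library, the same URL can be built for you by passing the dictionary along with the request, e.g. requests.get(BASE_URL, params=params).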
HTML is a form of structured text and provides a convenient way for us to build user-facing data files and analysis. Designing command-line applications allows us to build scripts that provide more direct interaction with data functions.
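One possible shape for a small command-line application is sketched below, using the standard argparse module. The word-counting function, its name, and the --min-length option are all invented for this example; the idea is only that a script's arguments become the inputs to a data function:

```python
import argparse


def count_words(text, min_length=1):
    """Count the words in text that are at least min_length characters long."""
    return len([word for word in text.split() if len(word) >= min_length])


def build_parser():
    # Hypothetical options, made up for this sketch
    parser = argparse.ArgumentParser(description="Count the words in a piece of text.")
    parser.add_argument("text", help="the text to analyze")
    parser.add_argument("--min-length", type=int, default=1,
                        help="ignore words shorter than this many characters")
    return parser


def main(argv):
    # argparse turns the list of argument strings into named attributes
    args = build_parser().parse_args(argv)
    total = count_words(args.text, args.min_length)
    print(total)
    return total


# Simulate what a user would type after the script name on the command line
main(["always bet on text", "--min-length", "3"])
```

In a real script you would pass sys.argv[1:] to main, so that the arguments come from whatever the user types in the terminal.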
This week is reserved for class time to work on projects.
Give notice several days in advance. If necessary, we can arrange for you to do a short-term project.
I want you to install some things. Then ask me questions. There's not much else to do, so don't do anything you don't know how to do.
You may already have Python on your system. If you already really, really know what you're doing (i.e. you used pyenv to set things up, or you think you could manage that and would ask me for help if needed), then you can ignore this part.
Otherwise, this is like installing any other program on your computer, though keep in mind the file size is quite hefty. Email me if you have hard drive space limitations, e.g. fewer than a few gigabytes free.
The instructions are here. However, to make things consistent, don't download the most recent file. Download the most appropriate installer from the archive:
If you are unsure of anything, just email me.
Let's wait till Thursday before we try to install other packages. Although if you really know what you're doing, you can try to look up and install, in this order:
If you can do this, then you can try out the face detection script, which will be the most complicated kind of script we copy and execute. It's actually not that important; it's just an example of what we can do with just text.
Sublime Text 3 is a plaintext editor. It will be your primary, and possibly only, tool for writing and organizing programs.
Always bet on text by Graydon Hoare:
Text is the most socially useful communication technology. It works well in 1:1, 1:N, and M:N modes. It can be indexed and searched efficiently, even by hand. It can be translated. It can be produced and consumed at variable speeds. It is asynchronous. It can be compared, diffed, clustered, corrected, summarized and filtered algorithmically. It permits multiparty editing. It permits branching conversations, lurking, annotation, quoting, reviewing, summarizing, structured responses, exegesis, even fan fic. The breadth, scale and depth of ways people use text is unmatched by anything. There is no equivalent in any other communication technology for the social, communicative, cognitive and reflective complexity of a library full of books or an internet full of postings. Nothing else comes close.
So this is my stance on text: always pick text first.
We'll actually be reading a lot about Facebook over the course, so including it here is a bit of overkill. Though this recent development (as of yesterday) is worth talking about. Most of the news about Facebook's research concerns their data science, but check out their publications page for a long list of scientific papers. It's OK if the math escapes you, as long as you can find the bigger picture.
Unicode: A story of corruption, connection, and smiling poo
Unicode is traditionally something programmers hate, because of the bugs it causes in programs that read text (i.e. basically all of them). It took me a while to realize that Unicode is one of the most amazing creations of
Yeah, those graphs seem intimidating…I don't know if I could easily explain them in English. But try to pick out the "bigger picture" reason – what is it about email, the cost of email, the number of scammers vs. number of victims, and most importantly, the type of victims that makes, "Hey, I'm a prince from an exotic place, send me money" seemingly effective?
Related:
Text is a valuable programming interface for communicating with computers. It can lead to many fundamental calculations and algorithmic computations, including: