Resource List

Listed here are some ebooks, articles, software, and other online resources that might be helpful for science writers who want to work with code and data. Also included are some examples of data-journalism stories in the sciences, and illustrations or interactive web apps made with various coding tools. Except as noted, all of these materials are freely available on the web.

General

Stack Overflow. Where to get your questions answered. Usually without even asking: someone else has already done it for you.

Software Carpentry and Data Carpentry. The focus here is on getting scientists to write better code, but most of the suggestions apply to science writers too.

Code.org. Lots of tutorials and other introductory material. The intended audience is schoolkids, but grownups can play too.

Data Journalism Handbook. “This book is intended to be a useful resource for anyone who thinks that they might be interested in becoming a data journalist, or dabbling in data journalism.” (But there’s not much about science stories.)

Boston University workshop on storytelling with data. Three five-day sessions in June. [Not free.]

Geojournalism.org. Tutorials and case studies on maps, with roots in Brazil.

Python

Continuum Analytics/Anaconda Python. The recommended Python distribution. Includes 250+ packages, with a system for managing and updating them. If you have another Python already installed, this one won’t interfere with it.

Python Software Foundation. Official HQ, with documentation, downloads, etc. Includes a thorough though somewhat dated tutorial.

A Crash Course in Python for Scientists. Rick Muller. Online book published as an IPython notebook. Works fine for crashing science writers too.

IPython and Jupyter. Interactive computing = instant gratification. We’ve come a long way since the days of punchcards.

Python Quick Reference and Another Python Quick Reference. Cheat sheets for when you forget the difference between .sort() and sorted().
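For anyone who has already forgotten, here is a minimal sketch of that difference. The sample values and variable names are made up for illustration; only the built-in sorted() function and the list method .sort() come from Python itself.

```python
readings = [3.2, 1.5, 2.8]   # hypothetical sample data

# sorted() builds and returns a new list; the original is left untouched.
ordered = sorted(readings)

# .sort() is a list method: it reorders the list in place and returns None.
in_place = list(readings)    # work on a copy so `readings` stays as it was
returned = in_place.sort()

# sorted() accepts any iterable, not just lists; here, the characters of a string.
letters = sorted("dna")

print(ordered)
print(readings)
print(in_place)
print(returned)
print(letters)
```

The usual stumble is writing something like data = data.sort(), which replaces the list with None. Reach for sorted() when you need to keep the original order around, and .sort() when you don't.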

Exploratory Computing with Python. Mark Bakker. Online book made up of IPython notebooks and videos.

Programming-for-beginners MOOC. Introduction to Computer Science and Programming, an MIT course taught by John Guttag; Python is the language of instruction.

A few notable Python packages and modules (thousands more here):

JavaScript

Eloquent JavaScript: A Modern Introduction to Programming. Marijn Haverbeke. Second edition. No Starch Press. Free ebook. The online version has an embedded editor for completing exercises and writing and running your own code. Worth a look.

Data Visualization with JavaScript. Stephen A. Thomas. 2012. Another free online book.

Interactive Data Visualization for the Web. Scott Murray. 2013. Yet another online book, this one emphasizing the d3 data-graphics library.

Mozilla Developer Network. Mozilla, the maker of the Firefox browser, has tutorials on JavaScript, HTML, and CSS. (Python is coming soon.) Also, their JavaScript Reference is a good place to turn when you’re in the heat of a coding session.

Developer Tools in Chrome. The Chrome browser has an extensive toolkit for inspecting and debugging web pages, including the JavaScript console. See also a helpful blog post by Mitch Robb.

Developer Tools in Firefox. Not to be outdone, Firefox also offers a complete developer’s kit, including not just a JavaScript console but also a JavaScript scratchpad, where you can write and run small programs.

Codepen. “CodePen is a playground for the front end side of the web. It’s all about inspiration, education, and sharing.”

JavaScript for Cats. So cute.

ECMAScript. It’s not a skin disease; it’s the official name of the language.

Notable JavaScript packages:

Program Editors and Other Tools

Brackets. (See also the GitHub repository.) An editor for HTML/CSS/JavaScript projects that offers live preview. Brand new; pretty slick; recommended for the workshop.

Sublime Text. A popular editor among Pythonistas and others. Not actually free, but the only penalty for not paying the $70 price is bad karma.

Vim and Emacs. Old school, but they still have passionate adherents.

Git and GitHub. Git is a version-control system; GitHub is..., well, a Hub for Gits. "Fork Me on GitHub!" is not a rude command. (Note that the website you’re reading now is hosted on GitHub.)

Don’t forget the tools you already know. Excel or another spreadsheet program can be very useful for previewing a data set and selecting the parts you want. The search-and-replace functions of a text editor or word processor can also come in handy.

Data Sets and Sources

Amazon Web Services public data. More than 50 freely available data sets, including climate data, genomics (and other -omics), astronomical surveys, a crawl of the entire WWW, satellite imagery, and the ever-popular social graph of characters in Marvel Comics.

Data.gov. “124,653 datasets.” Billions and billions served.

A Few Other Programming Languages

Programming languages are much simpler than human ones, and it’s easy to be a polyglot programmer. Knowing one language is a help when you go to pick up a second or third.

Ruby. Large and enthusiastic community, but not much uptake in the sciences.

R. First choice among statisticians.

Julia. The new kid on the block; still has a 0.x version number. Can be run from within IPython (or Jupyter).

Sage. For serious mathematics, built atop Python.

Racket. An offshoot of Lisp, with a self-contained workbench for building programs.

Code and Data in the Wild

Collected here are some examples of published science stories in which code and data contribute to the reporting or the presentation. Also included are a few notable works of data journalism outside the sciences, where the techniques used might well be useful to science writers. And miscellaneous other sources of inspiration.

Data-Driven Science Stories

Losing Ground. Bob Marshall and others. ProPublica, 2014. Maps, satellite imagery, and data overlays to show risks of inundation in Louisiana.

Winter 2015: Boston’s new normal? Ben Letham, 2015. Ingenious analysis (done with the R statistics language) by an MIT student.

PNAS membership/authorship study. Peter Aldhous, Nature, 2014. Analysis of the “inside track” to PNAS publication available only to Academy members.

Warming ocean illustration. Mark Fischetti, Scientific American, 2013. What makes this particularly interesting is that the IPython notebook behind the illustration has also been published, by Roberto De Almeida.

Extent and Consequences of P-Hacking in Science. Megan L. Head et al., PLOS Biology, 2015. Text-mining the entire corpus of open-access articles on Pubmed to measure bias in favor of publishing non-null results. Python program for the whole process is available.

NSA funding of mathematicians. John Bohannon, Science, 2015. Analysis of grant acknowledgments in papers published since the 1980s, done with the scholar.py Python package.

Roadkill. Joseph Stromberg, Vox, 2015. A geographic database of 29,777 critters that didn’t make it to the other side.

Data Journalism

Horace Greeley, data journalist. Scott Klein, The ProPublica Nerd Blog, 2015. In 1848 the publisher of the New York Tribune crunched half a million numbers to write an exposé of excess travel-expense reimbursements for congressmen. (At the time, Greeley was himself a congressman.)

Women at the Box Office. This is Walt Hickey’s analysis of gender vs. budget in Hollywood films, published on the 538 website. Perhaps even more interesting is Brian Keegan’s deconstruction and reconstruction of the analysis, published as an IPython notebook.

Segregation in Ferguson, MO. Jeremy Singer-Vine, BuzzFeed News, 2014. The IPython notebook supporting the story is published on GitHub.

Data Viz

Water’s Edge. Ryan McNeill, Deborah J. Nelson, and Duff Wilson, Reuters, 2014. A series of Reuters articles on sea-level rise (and land-level fall) that includes a number of interactive graphics and maps.

135 years of climate change. Peter Aldhous, 2014. Animated map of global temperature by year.

Visualizing MBTA data. Mike Barry and Brian Card, 2014. Tour de force on the T, done by two grad students at WPI.

Health and wealth of nations. Mike Bostock, 2012, based on earlier Gapminder work by Tom Carden.

FlowingData. Thoughtful, clever graphics and tutorials, but some are available only to paying subscribers.

Simulations and Interactive Presentations

The vaccination game. Ellsworth Campbell and colleagues, 2014. Try to prevent the spread of infection by strategic vaccination or quarantine. Not only informative but fun. Get the source code on GitHub.

A JavaScript Black Hole. You can push it around the sky with your mouse to see how the gravitational field distorts the images of background stars. Code on GitHub.

Mathematical visualizations. Jason Davies, 2004–2014. Dozens of gorgeous, entrancing mathematical magic shows.

Student seating habits. Ali Almossawi, 2011. The dance of the classroom.

Thomas Schelling's model of residential segregation. Simulation by Jerome Cucker.

The U.S. health map. University of Washington, 2014. County-level data on smoking, obesity, etc.

Climbing Everest. Richard Johnson, Bonnie Berkowitz, and Lazaro Gamio, The Washington Post, 2015. From sea level to the summit in one long scroll. The height of the mountain, it turns out, is 48,155 pixels.

The Limits to Growth. Brian Hayes, bit-player.org, 2012. The famous prophecy of doom was a big computing job on a mainframe. Now it's a JavaScript program that runs in the browser.

Population pyramid, 1950–2100. Brian Hayes, bit-player.org, 2012. Built as a demo of technologies for animated online illustration.

A climate model in a browser window. Brian Hayes, 2014.

Miscellaneous Fun

Koalas to the max. No science here, but a fabulous demonstration of what the medium is capable of doing. For an explanation of how it works, see Vadim Ogievetsky’s case study.

CSS of Myself. No science here either, but a JS tour de force.