Code and Data in the Wild

Collected here are some examples of published science stories in which code and data contribute to the reporting or the presentation. Also included are a few notable works of data journalism outside the sciences, where the techniques used might well be useful to science writers, along with miscellaneous other sources of inspiration.

Data-Driven Science Stories

Losing Ground. Bob Marshall and others. ProPublica, 2014. Maps, satellite imagery, and data overlays to show risks of inundation in Louisiana.

Winter 2015: Boston’s new normal? Ben Letham, 2015. Ingenious analysis (done with the R statistics language) by an MIT student.

PNAS membership/authorship study. Peter Aldhous, Nature, 2014. Analysis of the “inside track” to PNAS publication available only to Academy members.

Warming ocean illustration. Mark Fischetti, Scientific American, 2013. What makes this particularly interesting is that the IPython notebook behind the illustration has also been published, by Roberto De Almeida.

Extent and Consequences of P-Hacking in Science. Megan L. Head et al., PLOS Biology, 2015. Text-mining the entire corpus of open-access articles on PubMed to measure bias in favor of publishing non-null results. The Python program for the whole process is available.

NSA funding of mathematicians. John Bohannon, Science, 2015. Analysis of grant acknowledgments in papers published since the 1980s, done with the Python package.

Roadkill. Joseph Stromberg, Vox, 2015. A geographic database of 29,777 critters that didn’t make it to the other side.

Data Journalism

Horace Greeley, data journalist. Scott Klein, The ProPublica Nerd Blog, 2015. In 1848 the publisher of the New York Tribune crunched half a million numbers to write an exposé of excess travel-expense reimbursements for congressmen. (At the time, Greeley was himself a congressman.)

Women at the Box Office. This is Walt Hickey’s analysis of gender vs. budget in Hollywood films, published on the 538 website. Perhaps even more interesting is Brian Keegan’s deconstruction and reconstruction of the analysis, published as an IPython notebook.

Segregation in Ferguson, MO. Jeremy Singer-Vine, BuzzFeed News, 2014. The IPython notebook supporting the story is published on GitHub.

Data Viz

Water’s Edge. Ryan McNeill, Deborah J. Nelson, and Duff Wilson, Reuters, 2014. A series of Reuters articles on sea-level rise (and land-level fall) that includes a number of interactive graphics and maps.

135 years of climate change. Peter Aldhous, 2014. Animated map of global temperature by year.

Visualizing MBTA data. Mike Barry and Brian Card, 2014. Tour de force on the T, done by two grad students at WPI.

Health and wealth of nations. Mike Bostock, 2012, based on earlier Gapminder work by Tom Carden.

FlowingData. Thoughtful, clever graphics and tutorials, though some are available only to paying subscribers.

Simulations and Interactive Presentations

The vaccination game. Ellsworth Campbell and colleagues, 2014. Try to prevent the spread of infection by strategic vaccination or quarantine. Not only informative but fun. Get the source code on GitHub.

A JavaScript Black Hole. You can push it around the sky with your mouse to see how the gravitational field distorts the images of background stars. Code on GitHub.

Mathematical visualizations. Jason Davies, 2004–2014. Dozens of gorgeous, entrancing mathematical magic shows.

Student seating habits. Ali Almossawi, 2011. The dance of the classroom.

Thomas Schelling's model of residential segregation. Simulation by Jerome Cucker.

The U.S. health map. University of Washington, 2014. County-level data on smoking, obesity, etc.

Climbing Everest. Richard Johnson, Bonnie Berkowitz, and Lazaro Gamio, The Washington Post, 2015. From sea level to the summit in one long scroll. The height of the mountain, it turns out, is 48,155 pixels.

The Limits to Growth. Brian Hayes, 2012. The famous prophecy of doom was a big computing job on a mainframe. Now it's a JavaScript program that runs in the browser.

Population pyramid, 1950–2100. Brian Hayes, 2012. Built as a demo of technologies for animated online illustration.

A climate model in a browser window. Brian Hayes, 2014.

Miscellaneous Fun

Koalas to the max. No science here, but a fabulous demonstration of what the medium is capable of doing. For an explanation of how it works, see Vadim Ogievetsky’s case study.

CSS of Myself. No science here either, but a JS tour de force.