A Python Primer

by Rahul Dave

Why learn Python? It is a hugely popular language for data analysis and scientific computing. It is simple to learn yet extremely powerful, largely because so many scientists use it that there are libraries to do almost anything in the language: simulation, statistical analysis, and natural language processing, for example. It will help you make fast computations with random numbers, make quick exploratory graphs for a dataset, and munge and clean text and data.

The first thing you ought to do is to install the Anaconda Distribution, and bring up the Jupyter Notebook interface, as detailed on the Software page. Here are a few notebooks to help you get started.

If you are completely new to Python, you ought to check out the first notebook of the Exploratory Computing with Python primer listed on our resources page. You can download the notebook by pressing the download icon at upper right in the browser window. Then navigate to your download folder within Jupyter and open the notebook locally. It moves through the basics rather nicely. After you have worked through it, you might wish to turn your attention to the next two notebooks.

All the notebooks and data files developed for the Python part of the NESW workshop can be downloaded here. Double-click the zip archive to decompress it, then store the contents in any convenient place on your hard disk. Among the files inside are three IPython notebooks: elnino.ipynb, first_this.ipynb, and Pandas_the_Spreadsheet.ipynb. You can open them from the Jupyter main page in your web browser. Just follow the instructions near the top of that page: “To import a notebook, drag the file onto the listing below or click here.”

If you have programmed before, the fast-moving first_this.ipynb notebook introduces some of the basic syntax, data types, control structures, and flow in the language through a series of examples. Run it cell by cell, change things, and play around! (The data file hamlet.txt referred to in this notebook is also included in the zip archive.)
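To give a flavor of the syntax, data types, and control structures mentioned above, here is a minimal sketch that counts words in a piece of text. It is not taken from first_this.ipynb; the function name count_words and the sample sentence are made up for illustration, and the inline string merely stands in for a text file such as hamlet.txt.

```python
from collections import Counter

# A made-up snippet standing in for the contents of a text file.
text = "To be, or not to be, that is the question."


def count_words(text):
    """Return a Counter mapping each lowercased word to how often it occurs."""
    words = []
    for raw in text.split():                        # split on whitespace
        word = raw.strip(".,;:!?\"'").lower()       # drop surrounding punctuation
        if word:                                    # skip tokens that were only punctuation
            words.append(word)
    return Counter(words)


counts = count_words(text)

# Print the three most frequent words with their counts.
for word, n in counts.most_common(3):
    print(word, n)
```

Try changing the sentence, or the number passed to most_common, and rerun the cell to see how the output changes.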

One of the great things about Python is that people have created toolkits, or libraries, that give Python users superpowers. The Pandas_the_Spreadsheet.ipynb notebook introduces Pandas, a Python library for managing, analyzing and modeling spreadsheet-formatted data. You can also familiarize yourself with Pandas by watching Wes McKinney’s 10-minute video tour.

The data set used in this notebook is one we scraped from the Goodreads web site for another workshop. We walk through loading, manipulating, and saving data from it. The data is in comma-separated-value (csv) format: each row represents one item, and the features of that item are separated by commas. The first line is usually taken to hold the column headings. When playing with your own data, you can save it in csv format from Numbers or Excel.
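To make the csv layout concrete, here is a small sketch of the load, manipulate, and save cycle using only Python's built-in csv module. The rows and the column names (title, author, rating) are invented for illustration and are not the Goodreads data; the notebook itself uses Pandas, which wraps these steps in higher-level calls such as pandas.read_csv.

```python
import csv
import io

# Made-up rows standing in for a real csv file; the column names are hypothetical.
raw = """title,author,rating
Book A,Author One,4.2
Book B,Author Two,3.9
Book C,Author One,4.6
"""

# Load: DictReader treats the first line as the column headings.
rows = list(csv.DictReader(io.StringIO(raw)))

# Manipulate: convert ratings from text to numbers, then keep the higher-rated rows.
for row in rows:
    row["rating"] = float(row["rating"])
high = [row for row in rows if row["rating"] > 4.0]

# Save: write the filtered rows back out as csv (to an in-memory buffer here).
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["title", "author", "rating"])
writer.writeheader()
writer.writerows(high)
saved = out.getvalue()
print(saved)
```

To work with a file on disk instead, one would replace the io.StringIO objects with files opened via open(..., newline="").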

Finally, elnino.ipynb is the presentation-day notebook.