
  The Very Basics of Python

On to the good stuff! We'll start off by going over the very very very basics of using Python, including basic math, data structures, and for loops.

At the bottom of this lesson (and all our coding lessons, for that matter) is a file version of the code displayed in this class. You can download it and open it on your own computer for execution. I recommend not going that route unless you get stuck for some reason. Typing each command out on your own is better practice.

The first few lessons will be documented with excruciating detail so you aren't left with any questions or assumptions. The pace of the course will pick up after we lay the foundation.

Intro to basic Python

Welcome to the first official lesson on Python! First things first: whitespace, punctuation, and capitalization all matter in Python. Vertical whitespace (blank lines) is fine. Horizontal whitespace at the start of a line (indentation) is how Python groups lines of code that belong together into a block. If you are having trouble executing code, check for typos and stray indentation, and make sure your capitalization matches the examples.

When looking at my example code in these lessons, pay attention to the In and Out markers right above the gray code boxes. The In boxes contain the code that I submitted, and the Out boxes contain the output of that code.

In the following examples, you'll also see that I use # to write little notes about what I'm doing. Python ignores everything after a # on a line, so you can use this character to write reminders to yourself and your readers without interfering with your code.

Let's get started. The easiest bit of code is using Python as a regular calculator.

In [1]:
1 + 2  # addition
In [2]:
2 * 2.5  # multiplication
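Python handles the other everyday math operators the same way. Here is a small sketch; the numbers are arbitrary examples of my own:

```python
difference = 10 - 4       # subtraction
power = 2 ** 3            # exponent: 2 to the power of 3
grouped = (1 + 2) * 4     # parentheses are evaluated first
mixed = 2 * 2.5           # a whole number times a decimal gives a decimal (float)
print(difference, power, grouped, mixed)  # 6 8 12 5.0
```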

A note about division: in older versions of Python (Python 2), / between two whole numbers throws away the remainder, so 1 / 2 gives 0. If you try the line of code below and get a 0 out instead of 2, scroll up to the top of your notebook and copy/paste: from __future__ import division. The word future there has two underscores before and after.

In [3]:
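In Python 3, / always performs ordinary division and returns a decimal (float). Whole-number division has its own operator, //. The sketch below assumes Python 3 and uses arbitrary numbers. It shows both operators, plus % for the remainder:

```python
half = 1 / 2          # 0.5 in Python 3 (Python 2 gives 0 for two whole numbers)
whole = 4 / 2         # 2.0: / returns a float even when the answer is whole
floored = 7 // 2      # 3: // drops the remainder (floor division)
leftover = 7 % 2      # 1: % gives the remainder
print(half, whole, floored, leftover)  # 0.5 2.0 3 1
```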

If you want to store words (including numbers that are not really quantities but labels for categories), put quotes around them. These are known as strings. Both single and double quotes work. You can even multiply a string by a whole number, which repeats it, as seen below.

In [4]:
'cat' * 4
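Here is a short sketch of the points above. The ZIP code is my own illustrative value, not from the lesson. It is an example of a "number" that is really a category label:

```python
animal = 'cat'            # single quotes
same_animal = "cat"       # double quotes make an identical string
repeated = animal * 4     # multiplying a string by a whole number repeats it
zip_code = '02115'        # stored as a string, so the leading 0 is kept
print(repeated)                # catcatcatcat
print(animal == same_animal)   # True
```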

The lines above were not stored anywhere; they were just displayed and forgotten. If we want the computer to remember a value (the answer to a multiplication problem, for example), we can assign it to a variable name using an = sign. When you execute that code, nothing will be printed. To access the stored value, type the variable name again, either by itself or inside print().

In [5]:
price_of_milk = 3.25
In [6]:
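As a sketch of how a stored value can be displayed and reused, here is an example. The two_cartons calculation is my own illustration:

```python
price_of_milk = 3.25              # assignment stores the value; nothing is printed
print(price_of_milk)              # 3.25
two_cartons = price_of_milk * 2   # a stored value can be reused in a new calculation
print(two_cartons)                # 6.5
```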

If you want to store more than one thing under the same variable name, use a list. A list is just a pair of square brackets with a series of items inside, each separated by a comma. Once again, when you execute the code nothing will be printed. To see the list, you must type the variable name.

In [7]:
pets = ['cat', 'dog', 'ferret', 'canary']
In [8]:
pets
Out[8]:
['cat', 'dog', 'ferret', 'canary']

Lists can contain any kind of data, including a mix of different data types. Numbers, strings, or some of each are fine. You can even make nested lists.

In [9]:
reps = [1, 2, 3, 'ant', 32, 'elephants', ['coffee', 'milk']]
In [10]:
reps
Out[10]:
[1, 2, 3, 'ant', 32, 'elephants', ['coffee', 'milk']]

One important thing to note: Python starts counting positions at 0, not 1. We will use this to access the first element in our list by putting a 0 in square brackets. Accessing elements by their position is known as indexing, and Python's habit of starting the count at zero is known as zero-indexing.

In [11]:

In the example below, the second element in the list is selected using 1, because the first position in Python is numbered zero. This will trip you up for a long time.

In [12]:
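A minimal sketch of zero-indexing with the pets list from above:

```python
pets = ['cat', 'dog', 'ferret', 'canary']
first = pets[0]     # 'cat': positions start at 0
second = pets[1]    # 'dog': index 1 is the second element
print(first, second)  # cat dog
```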

It is also possible to get the last element (or the nth element from the end) by using a negative index: -1 is the last element, -2 is the second-to-last, and so on.

In [13]:
In [14]:
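A minimal sketch of negative indexing, again using the pets list:

```python
pets = ['cat', 'dog', 'ferret', 'canary']
last = pets[-1]              # 'canary': -1 counts back from the end
second_to_last = pets[-2]    # 'ferret'
print(last, second_to_last)  # canary ferret
```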

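If you would like to access a range of the list (called a slice), put the start and end indices inside the square brackets, separated by a colon. The slice includes the element at the start index but stops just before the end index.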

In [15]:
['ferret', 'canary']
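The next sketch shows a few slices of the pets list and how the start and end indices behave. These particular slices are my own examples:

```python
pets = ['cat', 'dog', 'ferret', 'canary']
middle = pets[1:3]       # ['dog', 'ferret']: start is included, end is not
last_two = pets[2:4]     # ['ferret', 'canary']
from_start = pets[:2]    # ['cat', 'dog']: no start means "from the beginning"
to_end = pets[2:]        # ['ferret', 'canary']: no end means "through the last element"
print(middle, last_two, from_start, to_end)
```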

For loops

You can iterate (go one by one) through lists using a for loop. The plain-text structure is "for [one thing] in [a list of things]: do something." The devil is in the details here. The first line must end in a colon, and every line in the body of the loop must be indented, conventionally by four spaces. A tab also works, but don't mix tabs and spaces in the same block.

You can call the [one thing] whatever you want. I like to be descriptive with what I call each item (below I used animal to describe kinds of pets), but you will often see it written as i (as in for i in pets). You can think of the i as short for item. Don't worry if this doesn't click yet; we'll talk about for loops more in a future lesson.

In [16]:
for animal in pets:  # note: this line ends in a colon
    print(animal)    # this line is indented exactly four spaces
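To see how indentation decides what belongs to the loop, here is a small sketch. The count variable and the 'Done!' message are my own additions for illustration:

```python
pets = ['cat', 'dog', 'ferret', 'canary']
count = 0
for i in pets:            # i is just a name; any valid variable name works
    count = count + 1     # indented: runs once for each item
    print(i)              # indented: also part of the loop body
print('Done!')            # not indented: runs once, after the loop finishes
print(count)              # 4
```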

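When iterating, you can perform operations on each element, one by one. Here we are printing each animal's name doubled (catcat, dogdog, and so on). You'll notice I'm including print on the second line. Without it, nothing visible happens. A notebook automatically displays the value of a cell's last line only when that line is a plain expression, not when it sits inside a loop, so you have to be explicit in for loops.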

In [17]:
for animal in pets:
    print(animal * 2)

What if we want to keep the results of the operation performed inside the loop? The solution is to create an empty list before the loop and add each new result to it using the list's append() method. You'll notice that this code block does not print out anything. That's because we didn't ask it to! If we want to see each doubled animal as it is created, we have to add a print() call inside the loop.

In [18]:
doubled_pets = []  # an empty list

for animal in pets:
    animal_twice = animal * 2
    doubled_pets.append(animal_twice)  # append the animal_twice string to the list
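As a sketch of the extra print statement described above, here is the same loop with one added line that shows each result as it is created:

```python
pets = ['cat', 'dog', 'ferret', 'canary']
doubled_pets = []                        # an empty list, created before the loop
for animal in pets:
    animal_twice = animal * 2
    print(animal_twice)                  # the extra print: shows each result as it is made
    doubled_pets.append(animal_twice)    # store the result in the outer list
```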

Now let's check that the external list captured everything that happened inside the loop. It worked!

In [19]:
doubled_pets
Out[19]:
['catcat', 'dogdog', 'ferretferret', 'canarycanary']

A more useful and interesting operation than doubling is capitalizing each word with str.capitalize(item), which can also be written item.capitalize(). Similarly, you can lowercase each word with str.lower(item), which is handy when working with proper nouns.

Lowercasing every word in your dataset is a good way to make sure words match when you want them to. For example, if you are comparing a list of exposed people to a list of infected people and want to identify names that appear in both, 'Caitlin' will not match with 'caitlin'. By converting everything to lower case, you can avoid that quirk.

In [20]:
for animal in pets:
    print(str.capitalize(animal))  # Python 3 requires parentheses around print's arguments
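Here is a sketch of the name-matching idea from the paragraph above. The two name lists are hypothetical, invented for this example. The check for names in both lists uses an if statement, which we haven't formally covered yet:

```python
# Hypothetical name lists, invented for this example
exposed = ['Caitlin', 'Marcus', 'priya']
infected = ['caitlin', 'Priya', 'Tom']

# Lowercase every name so 'Caitlin' and 'caitlin' compare as equal
exposed_lower = []
for name in exposed:
    exposed_lower.append(name.lower())    # same as str.lower(name)

infected_lower = []
for name in infected:
    infected_lower.append(name.lower())

# Keep the names that appear in both lists
in_both = []
for name in exposed_lower:
    if name in infected_lower:            # 'in' checks whether the list contains the name
        in_both.append(name)

print(in_both)  # ['caitlin', 'priya']
```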

A list is just one example of a data structure. Another example is called a dictionary. Dictionaries store pairs of keys and values - just like words and definitions in a dictionary. One basic use of dictionaries is to label your data, so you remember what is what.

Dictionaries are bookended by curly brackets. Each entry must have a key, which is linked to its value using a colon (:). Entries are separated by commas. The only other rule is that keys must be unchangeable (immutable) types such as strings or numbers, so a list, for example, cannot be a key. Values, however, can be anything, including lists. Do not use duplicate keys within the same dictionary. Python won't complain, but it silently keeps only the last value, which defeats the purpose of using a key as an identifier.

In the examples below, I'll begin to build a rudimentary line list. (A line list, if you are new to epidemiology, is a collection of data about people who were infected or exposed to an infectious disease. Traditionally, each patient gets one line in a spreadsheet.)

In [21]:
epi = {'Name':'John Snow', 'Age':56, 'Sex':'Male'}
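The dictionary rules above can be sketched like this. The patient record is hypothetical, made up for illustration:

```python
# A hypothetical patient record: values can be any type, including lists
patient = {'Name': 'Patient A', 'Age': 34, 'Symptoms': ['fever', 'cough']}
print(patient['Symptoms'])   # ['fever', 'cough']

# Repeating a key raises no error; the last value silently wins
duplicated = {'Name': 'First entry', 'Name': 'Second entry'}
print(duplicated)            # {'Name': 'Second entry'}

# A list cannot be a key, so Python raises a TypeError
try:
    bad = {['cat', 'dog']: 'pets'}
except TypeError as err:
    print('TypeError:', err)  # the message mentions the unhashable list
```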

Accessing elements in a dictionary is very similar to accessing elements in a list, but instead of putting an index in the square brackets, you put the name of the key.

In [22]:
epi['Name']
Out[22]:
'John Snow'
In [23]:
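A minimal sketch of looking up, adding, and counting dictionary entries. The 'Occupation' key is my own illustrative addition to the lesson's epi record:

```python
epi = {'Name': 'John Snow', 'Age': 56, 'Sex': 'Male'}
name = epi['Name']      # look up a value by its key
age = epi['Age']
print(name, age)        # John Snow 56

# Assigning to a new key adds an entry; assigning to an existing key replaces its value
epi['Occupation'] = 'Physician'
print(len(epi))         # 4
```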

Another good use of dictionaries is to keep track of things that you would normally put in a spreadsheet: records that share the same keys but have different values. Here we will use a list of dictionaries to keep track of our favorite epidemiologists. Notice how, instead of retyping our John Snow example, I just passed the existing variable into our new structure.

In [24]:
famous_epis = [epi, {'Name':'Jonas Salk','Age':45, 'Sex':'Male'}, {'Name':'Margaret Chan', 'Age':40, 'Sex':'Female'}]
In [25]:
famous_epis
Out[25]:
[{'Age': 56, 'Name': 'John Snow', 'Sex': 'Male'},
 {'Age': 45, 'Name': 'Jonas Salk', 'Sex': 'Male'},
 {'Age': 40, 'Name': 'Margaret Chan', 'Sex': 'Female'}]

We can access elements of the list by combining our list indexing and dictionary indexing.

In [26]:
famous_epis[0]['Name']
Out[26]:
'John Snow'
In [27]:
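A sketch of combined indexing on the lesson's famous_epis data. The first index picks a dictionary from the list, and the key then picks a value from it:

```python
famous_epis = [
    {'Name': 'John Snow', 'Age': 56, 'Sex': 'Male'},
    {'Name': 'Jonas Salk', 'Age': 45, 'Sex': 'Male'},
    {'Name': 'Margaret Chan', 'Age': 40, 'Sex': 'Female'},
]
first_name = famous_epis[0]['Name']   # 'John Snow': [0] picks the record, ['Name'] the value
salk_age = famous_epis[1]['Age']      # 45
last_sex = famous_epis[-1]['Sex']     # 'Female': negative indexes work here too
print(first_name, salk_age, last_sex)
```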

And if we want to search for particular records, we can use a for loop to visit each entry in the list. Here we simply print each name:

In [28]:
for person in famous_epis:
    print(person['Name'])
John Snow
Jonas Salk
Margaret Chan
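To actually pick out particular records, the loop needs a condition. Here is a sketch using an if statement, which a later lesson covers in more depth. The choice of searching for female epidemiologists is my own example:

```python
famous_epis = [
    {'Name': 'John Snow', 'Age': 56, 'Sex': 'Male'},
    {'Name': 'Jonas Salk', 'Age': 45, 'Sex': 'Male'},
    {'Name': 'Margaret Chan', 'Age': 40, 'Sex': 'Female'},
]
women = []
for person in famous_epis:
    if person['Sex'] == 'Female':       # keep only records that match the condition
        women.append(person['Name'])
print(women)  # ['Margaret Chan']
```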

This is kind of cumbersome though. I wouldn't want to do too many analyses in this style.

This is exactly the kind of problem where pandas, a Python package for working with data, is useful. In order to use pandas, we need to import it into our notebook. The import has to run before any pandas code, so by convention it goes at the top of every notebook where you intend to use the package. Nothing visible will happen when you import the package, but rest assured that it loaded pandas functionality into your notebook.

The word pandas is shortened to pd by convention, in order to make it quicker to type.

In [29]:
import pandas as pd

Now let's use the pandas package to convert our list of dictionaries into a nice dataframe (a table, basically) that is easier to work with. We start by calling the pd.DataFrame() function. The pd prefix tells the computer to look inside the pandas package. The DataFrame() function converts our list of dictionaries into the dataframe structure. We put the name of our list of dictionaries inside the parentheses to tell the function what we want converted.

In [30]:
famous_epis2 = pd.DataFrame(famous_epis)
In [31]:
famous_epis2
Out[31]:
   Age           Name     Sex
0   56      John Snow    Male
1   45     Jonas Salk    Male
2   40  Margaret Chan  Female
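For comparison with the for-loop search, here is a sketch of the same kind of search done on the dataframe. Column selection and boolean filtering are standard pandas operations, and the next lesson covers them properly. In recent pandas versions the columns usually follow the dictionaries' key order (Name, Age, Sex) rather than the alphabetical order shown above:

```python
import pandas as pd

famous_epis = [
    {'Name': 'John Snow', 'Age': 56, 'Sex': 'Male'},
    {'Name': 'Jonas Salk', 'Age': 45, 'Sex': 'Male'},
    {'Name': 'Margaret Chan', 'Age': 40, 'Sex': 'Female'},
]
famous_epis2 = pd.DataFrame(famous_epis)

names = famous_epis2['Name']                            # one column, selected by its name
women = famous_epis2[famous_epis2['Sex'] == 'Female']   # only rows where the condition is True
print(women)
```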

Isn't that better? The next lesson will go into more depth about how to work with dataframes effectively.