Key Points
- Running and Quitting
- Variables and Assignment
- Data Types and Type Conversion
- Built-in Functions and Help
- Libraries
- Reading Tabular Data into DataFrames
- Pandas DataFrames
- Plotting
- Lists
- For Loops
- Looping Over Data Sets
- Conditionals
- Writing Functions
- Programming Style
- Wrap-Up
- Feedback
Reference
Running and Quitting
- Python files have the `.py` extension.
- Python code can be written in a text file or a Jupyter Notebook.
  - Jupyter notebooks have the extension `.ipynb`.
  - Jupyter notebooks can be opened from Anaconda or through the command line by entering `$ jupyter notebook`.
  - Markdown and HTML are allowed in markdown cells for documenting code.
Variables and Assignment
- Values are assigned to variables using `=`.
  - Strings are defined in quotations `'...'`.
  - Integers and floating point numbers are defined without quotations.
- Variable names can contain letters, digits, and underscores `_`.
  - They cannot start with a digit.
  - Names that start with underscores should be avoided.
- Use `print(...)` to display values as text.
- Can use indexing on strings.
  - Indexing starts at 0.
  - Position is given in square brackets `[position]` following the variable name.
  - Take a slice using `[start:stop]`. This makes a copy of part of the original string.
    - `start` is the index of the first element.
    - `stop` is the index of the element after the last desired element.
- Use `len(...)` to find the length of a string, including one stored in a variable.
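The points above can be tried out with a minimal sketch like the following. The variable names and values are illustrative only.

```python
# Assign values to variables with "=" (names and values are illustrative).
element = 'helium'   # a string, written in quotation marks
atomic_number = 2    # an integer, written without quotation marks

first_letter = element[0]    # indexing starts at 0
first_three = element[0:3]   # slice from index 0 up to, but not including, index 3

print(first_letter)
print(first_three)
print(len(element))          # number of characters in the string
```

Because a slice makes a copy, `element` itself is unchanged after slicing.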
Data Types and Type Conversion
- Each value has a type. This controls what can be done with it.
  - `int` represents an integer.
  - `float` represents a floating point number.
  - `str` represents a string.
- To determine a variable's type, use the built-in function `type(...)`, including the variable name in the parentheses.
- Modifying strings:
  - Use `+` to concatenate strings.
  - Use `*` to repeat a string.
  - Numbers and strings cannot be added to one another.
    - Convert string to integer: `int(...)`.
    - Convert integer to string: `str(...)`.
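As a rough illustration of types and conversion, the sketch below uses made-up values. The names `count`, `label`, `total`, and `message` are assumptions for the example.

```python
count = 3       # an int
label = 'tab'   # a str

print(type(count))    # <class 'int'>
print(type(1.5))      # <class 'float'>
print(label + 'le')   # "+" concatenates strings
print(label * 2)      # "*" repeats a string

# Adding a number to a string raises a TypeError, so convert one of them first.
total = count + int('4')        # string converted to integer
message = label + str(count)    # integer converted to string
print(total, message)
```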
Built-in Functions and Help
- To add a comment, place `#` before the text you do not wish to be executed.
- Commonly used built-in functions:
  - `min()` finds the smallest value.
  - `max()` finds the largest value.
  - `round()` rounds off a floating point number.
  - `help()` displays documentation for the function in the parentheses.
    - Other ways to get help include holding down `shift` and pressing `tab` in Jupyter Notebooks.
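A short sketch of these built-in functions, using made-up numbers. The optional second argument to `round()` (the number of decimal places) is not mentioned above but is part of the standard function.

```python
# Everything after "#" on a line is a comment and is not executed.
values = [3.2, 1.5, 4.8]      # made-up numbers

smallest = min(values)        # smallest value
largest = max(values)         # largest value
rounded = round(3.14159, 2)   # round to two decimal places

print(smallest, largest, rounded)
# help(round)   # uncomment to display the documentation for round()
```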
Libraries
- Importing a library:
  - Use `import ...` to load a library.
  - Refer to this library by using `module_name.thing_name`.
    - `.` indicates 'part of'.
- To import a specific item from a library: `from ... import ...`
- To import a library using an alias: `import ... as ...`
- Importing the math library: `import math`
  - Example of referring to an item with the module's name: `math.cos(math.pi)`.
- Importing the plotting library as an alias: `import matplotlib as mpl`
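The three import forms can be compared side by side in a small sketch using the standard `math` library. The alias `m` is chosen here only for illustration.

```python
import math            # load the whole library
from math import pi    # import one specific item
import math as m       # import under an alias (the alias "m" is illustrative)

print(math.cos(math.pi))   # refer to items as module_name.thing_name
print(pi)                  # the imported item can be used without a prefix
print(m.sqrt(16))          # the alias stands in for the module name
```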
Reading Tabular Data into DataFrames
- Use the pandas library to do statistics on tabular data. Load with `import pandas as pd`.
  - To read in a csv: `pd.read_csv()`, including the path name in the parentheses.
    - To specify that a column's values should be used as row headings: `pd.read_csv('path', index_col='column name')`, where path and column name should be replaced with the relevant values.
- To get more information about a DataFrame, use `DataFrame.info()`, replacing `DataFrame` with the variable name of your DataFrame.
- Use `DataFrame.columns` to view the column names.
- Use `DataFrame.T` to transpose a DataFrame.
- Use `DataFrame.describe()` to get summary statistics about your data.
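The calls above can be sketched without a file on disk by reading CSV text from memory. The table contents, column names, and the use of `io.StringIO` in place of a path are all assumptions made for this example.

```python
import io
import pandas as pd

# Hypothetical CSV text standing in for a file on disk.
csv_text = "country,pop_1952,pop_2007\nA,10,20\nB,30,50\n"

# index_col makes the 'country' column the row headings.
data = pd.read_csv(io.StringIO(csv_text), index_col='country')

data.info()               # prints a summary of the columns and their types
print(data.columns)       # column names
print(data.T)             # transposed table
print(data.describe())    # summary statistics for each numeric column
```

With a real file, the first argument to `pd.read_csv` would be the path to that file instead.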
Pandas DataFrames
- Select data using `[i,j]`
  - To select by entry position: `DataFrame.iloc[..., ...]`
    - A slice includes everything except the final index.
  - To select by entry label: `DataFrame.loc[..., ...]`
    - Can select multiple rows or columns by listing labels.
    - A slice includes both ends.
  - Use `:` to select all rows or columns.
- Can also select data based on values using `True` and `False`. This is a Boolean mask.
  - `mask = subset > 10000`
  - We can then use this to select values.
- To use a select-apply-combine operation we use `data.apply(lambda x: x > x.mean())`, where `mean()` can be replaced by any operation the user would like to apply to `x`.
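A minimal sketch of these selection styles on a small, made-up table. The row labels, column names, and numbers are hypothetical.

```python
import pandas as pd

# Hypothetical table: three rows labelled A, B, C and two columns.
data = pd.DataFrame(
    {'y1952': [8000, 12000, 15000], 'y2007': [9000, 20000, 30000]},
    index=['A', 'B', 'C'],
)

by_position = data.iloc[0:2, 0]          # rows 0 and 1 of column 0; the stop is excluded
by_label = data.loc['A':'B', 'y1952']    # rows A through B; both ends are included
all_rows = data.loc[:, 'y2007']          # ":" selects every row

mask = data > 10000                      # Boolean mask of True and False
large_only = data[mask]                  # entries where the mask is False become NaN

# Compare each entry with the mean of its own column.
above_mean = data.apply(lambda x: x > x.mean())
print(above_mean)
```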
Plotting
- The most widely used plotting library is `matplotlib`.
  - Usually imported using `import matplotlib.pyplot as plt`.
  - To plot we use the command `plt.plot(time, position)`.
  - To create a legend use `plt.legend(['label1', 'label2'], loc='upper left')`.
    - Can also define labels within the plot statements by using `plt.plot(time, position, label='label')`. To make the legend show up, use `plt.legend()`.
  - To label the x and y axes, use `plt.xlabel('label')` and `plt.ylabel('label')`.
- Pandas DataFrames can be used to plot by using `DataFrame.plot()`. Any operations that can be used on a DataFrame can be applied while plotting.
  - To plot a bar plot: `data.plot(kind='bar')`
import matplotlib.pyplot as plt
plt.plot(time, position, label='label')
plt.xlabel('x axis label')
plt.ylabel('y axis label')
plt.legend()
Lists
- Defined within `[...]` and separated by `,`.
  - An empty list can be created by using `[]`.
- Can use `len(...)` to determine how many values are in a list.
- Can index just as done in previous lessons.
  - Indexing can be used to reassign values: `list_name[0] = newvalue`.
- To add an item to a list use `list_name.append()`, with the item to append in the parentheses.
- To combine two lists use `list_name_1.extend(list_name_2)`.
- To remove an item from a list use `del list_name[index]`.
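The list operations above can be followed step by step in a small sketch. The list name and values are made up for illustration.

```python
values = [10, 20, 30]      # a list of three made-up values
empty = []                 # an empty list

values[0] = 15             # reassign by index  -> [15, 20, 30]
values.append(40)          # add one item       -> [15, 20, 30, 40]
values.extend([50, 60])    # combine two lists  -> [15, 20, 30, 40, 50, 60]
del values[1]              # remove by index    -> [15, 30, 40, 50, 60]

print(values)
print(len(values), len(empty))
```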
For Loops
- Start a for loop with `for number in [1, 2, 3]:`, with the following lines indented.
  - `[1, 2, 3]` is considered the collection.
  - `number` is the loop variable.
  - The action following the collection is the body.
- To iterate over a sequence of numbers use `range(start, end)`. The sequence runs from `start` up to, but not including, `end`.
for number in range(0,5):
print(number)
Conditionals
- Defined similarly to a loop, using `if variable conditional value:`.
  - For example, `if variable > 5:`.
- Use `elif` followed by another condition for additional tests.
- Use `else:` for when none of the preceding conditions are true.
- Can combine more than one conditional by using `and` or `or`.
- Often used in combination with for loops.
- Conditions that can be used:
  - `==` equal to.
  - `>=` greater than or equal to.
  - `<=` less than or equal to.
  - `>` greater than.
  - `<` less than.
for m in [3, 6, 7, 2, 8]:
if m > 5:
print(m, 'is large')
elif m == 5:
print(m, 'is 5')
else:
print(m, 'is small')
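Combining conditions with `and` and `or` might look like the sketch below. The thresholds and the labels `'middle'`, `'extreme'`, and `'other'` are arbitrary choices for illustration.

```python
labels = []
for value in [3, 6, 7, 2, 8]:
    if value > 2 and value < 7:       # both tests must be true
        labels.append('middle')
    elif value <= 2 or value >= 8:    # at least one test must be true
        labels.append('extreme')
    else:                             # none of the tests above were true
        labels.append('other')

print(labels)
```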
Looping Over Data Sets
- Use a for loop: `for filename in [file1, file2]:`
- To find a set of files using a pattern use `glob.glob`.
  - Must import first using `import glob`.
  - `*` indicates "match zero or more characters".
  - `?` indicates "match exactly one character".
    - For example: `glob.glob('*.txt')` will find all files that end with `.txt` in the current directory.
- Combine these by writing a loop using: `for filename in glob.glob('*.txt'):`
import glob
import pandas as pd
for filename in glob.glob('*.txt'):
    data = pd.read_csv(filename)
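To see how the `*` and `?` patterns behave without depending on files that already exist, the sketch below creates a few throwaway files in a temporary folder. The file names and the use of `tempfile` are assumptions made only for this example.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    # Create three small, hypothetical files to search through.
    for name in ['a.txt', 'b.txt', 'notes.md']:
        with open(os.path.join(folder, name), 'w') as handle:
            handle.write('example\n')

    # "*" matches zero or more characters; sort for a predictable order.
    names = []
    for filename in sorted(glob.glob(os.path.join(folder, '*.txt'))):
        names.append(os.path.basename(filename))

    # "?" matches exactly one character.
    single = glob.glob(os.path.join(folder, '?.txt'))

print(names)
print(len(single))
```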
Writing Functions
- Define a function using `def function_name(parameters):`. Replace `parameters` with the variables to use when the function is executed.
- Run by using `function_name(parameters)`.
- To return a result to the caller use `return ...` in the function.
def add_numbers(a, b):
result = a + b
return result
add_numbers(1, 4)
Variable Scope
- A local variable is defined in a function and can only be seen and used within that function.
- A global variable is defined outside of a function and can be seen or used anywhere after definition.
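The difference between the two kinds of variable can be seen in a small sketch. The names `scale`, `double`, and `result` are illustrative.

```python
scale = 2   # global variable: defined outside any function

def double(value):
    result = value * scale   # "result" is local; "scale" is read from outside
    return result

print(double(5))

# The local variable is not visible outside the function.
try:
    result
    visible_outside = True
except NameError:
    visible_outside = False

print(visible_outside)
```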
Programming Style
- Document your code.
- Use clear and meaningful variable names.
- Follow the PEP8 style guide when setting up your code.
- Use assertions to check for internal errors.
- Use docstrings to provide help.
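As one possible illustration of the last two points, the sketch below gives a function a docstring and an assertion. The function `mean_of` is a hypothetical example, not part of any library.

```python
def mean_of(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    # The assertion stops the program early if the input is unusable.
    assert len(values) > 0, 'values must not be empty'
    return sum(values) / len(values)

help(mean_of)                # displays the docstring as help text
print(mean_of([1, 2, 3]))
```

Calling `mean_of([])` would raise an `AssertionError` with the message given in the assertion.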
Glossary
- Arguments
- Values passed to functions.
- Array
- A container holding elements of the same type.
- Boolean
- An object composed of `True` and `False` values.
- DataFrame
- The way Pandas represents a table; a collection of series.
- Element
- An item in a list or an array. For a string, these are the individual characters.
- Function
- A block of code that can be called and re-used elsewhere.
- Global variable
- A variable defined outside of a function that can be used anywhere.
- Index
- The position of a given element.
- Jupyter Notebook
- Interactive coding environment allowing a combination of code and markdown.
- Library
- A collection of files containing functions used by other programs.
- Local Variable
- A variable defined inside of a function that can only be used inside of that function.
- Mask
- A boolean object used for selecting data from another object.
- Method
- An action tied to a particular object. Called by using `object.method`.
- Modules
- The files within a library containing functions used by other programs.
- Parameters
- Variables used when executing a function.
- Series
- A Pandas data structure to represent a column.
- Substring
- A part of a string.
- Variables
- Names for values.