Udacity Data Scientist Nanodegree : Prerequisite — Python(L2, L3, L4)
Lesson 2: Data Types and Operators / Lesson 3: Control Flow / Lesson 4: Functions
Lesson 2: Data Types and Operators
Cautions!
- Python is case sensitive.
- Spacing is important.
- Use error messages to help you learn.
4.445e8 = 4.445 * 10 ** 8
Arithmetic Operators
- ** : Exponentiation (note that ^ does not do this operation, as you might have seen in other languages)
- // : Divides and rounds down to the nearest integer
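A quick sketch of the two operators above. The numbers are arbitrary and only chosen to make the behaviour visible, including the negative case, which often surprises people.

```python
# Exponentiation uses **, not ^ (in Python, ^ is bitwise XOR).
power = 2 ** 3         # 8
xor = 2 ^ 3            # 1, because 0b10 XOR 0b11 == 0b01

# Floor division rounds down, i.e. toward negative infinity.
floor_pos = 7 // 2     # 3
floor_neg = -7 // 2    # -4, not -3

print(power, xor, floor_pos, floor_neg)
```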
Variables
Assign: x, y, z = 3, 4, 5
> This works, but it is usually not a great way to assign variables.
1. Only use ordinary letters, numbers and underscores in your variable names. They can’t have spaces, and need to start with a letter or underscore.
2. You can’t use reserved words or built-in identifiers.
3. The pythonic way to name variables is to use all lowercase letters and underscores to separate words. (Class names start with a capital letter; constants use all uppercase letters.)
Integers and Floats
x = int(4.7) > x == 4 (int() cuts off the decimal part; it does not round)
Because the float, or approximation, for 0.1 is actually slightly more than 0.1, when we add several of them together we can see the difference between the mathematically correct answer and the one that Python creates.
>>> print(.1 + .1 + .1 == .3)
False
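One common way to deal with this is to compare floats with a tolerance instead of ==. The sketch below uses math.isclose from the standard library with its default tolerance; whether that tolerance is right depends on your data.

```python
import math

total = 0.1 + 0.1 + 0.1

# Exact comparison fails because of the tiny approximation error.
exact_match = (total == 0.3)

# Comparing within a tolerance succeeds.
close_enough = math.isclose(total, 0.3)

print(exact_match, close_enough)
```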
String
You can include a backslash (\) in your string to escape a quote character that would otherwise end the string:
this_string = 'Simon\'s skateboard is in the garage.'
>>> Simon's skateboard is in the garage.
Split()
- A basic split method:
new_str = "The cow jumped over the moon."
new_str.split()
>>> ['The', 'cow', 'jumped', 'over', 'the', 'moon.']
- Here the separator is space, and the max split argument is set to 3.
new_str.split(' ', 3)
>>> ['The', 'cow', 'jumped', 'over the moon.']
- Using ‘.’ or period as a separator.
new_str.split('.')
>>> ['The cow jumped over the moon', '']
Slice and Dice with Lists
You saw that we can pull more than one value from a list at a time by using slicing. When using slicing, it is important to remember that the lower index is inclusive and the upper index is exclusive.
For example:
>>> list_of_random_things = [1, 3.4, 'a string', True]
>>> list_of_random_things[1:2]
[3.4]
>>> list_of_random_things[:2]
[1, 3.4]
>>> list_of_random_things[1:]
[3.4, 'a string', True]
Mutability and Order
Mutability is about whether or not we can change an object once it has been created. If an object can be changed (like a list), it is called mutable. However, if an object cannot be changed without creating a completely new object (like a string), it is considered immutable.
>>> my_list = [1, 2, 3, 4, 5]
>>> my_list[0] = 'one'
>>> print(my_list)
['one', 2, 3, 4, 5]
As shown above, you are able to replace 1 with ‘one’ in the above list. This is because lists are mutable.
However, the following does not work:
>>> greeting = "Hello there"
>>> greeting[0] = 'M'
This is because strings are immutable. This means to change this string, you will need to create a completely new string.
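A small sketch of what happens in practice: the item assignment raises a TypeError, and the usual workaround is to build a new string from pieces of the old one. The name new_greeting is just for illustration.

```python
greeting = "Hello there"

try:
    greeting[0] = "M"          # strings do not support item assignment
except TypeError as err:
    error_name = type(err).__name__

# Build a completely new string instead of modifying the old one.
new_greeting = "M" + greeting[1:]

print(error_name)      # TypeError
print(greeting)        # the original string is unchanged
print(new_greeting)
```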
There are two things to keep in mind for each of the data types you are using:
- Are they mutable?
- Are they ordered?
Order is about whether the position of an element in the object can be used to access the element. Both strings and lists are ordered. We can use the order to access parts of a list and string.
However, you will see some data types in the next sections that will be unordered. For each of the upcoming data structures you see, it is useful to understand how you index, are they mutable, and are they ordered. Knowing this about the data structure is really useful!
Useful Functions for Lists
- sorted(): returns a copy of a list in order from smallest to largest, leaving the list unchanged.
- join(): a string method that takes a list of strings as an argument, and returns a string consisting of the list elements joined by a separator string.
name = "-".join(["García", "O'Kelly"])
print(name)
>>> García-O'Kelly
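For sorted(), a comparable sketch with made-up numbers. The reverse=True argument is a standard option of sorted() and is shown only as an extra.

```python
sizes = [15, 6, 7, 8, 35, 12]   # illustrative data

ordered = sorted(sizes)                       # new list, smallest to largest
largest_first = sorted(sizes, reverse=True)   # new list, largest to smallest

print(ordered)
print(largest_first)
print(sizes)   # the original list is left unchanged
```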
Tuple
A tuple is another useful container. It’s a data type for immutable ordered sequences of elements (you can’t add or remove items from a tuple, or sort it in place). They are often used to store related pieces of information.
Tuples can also be used to assign multiple variables in a compact way.
dimensions = 52, 40, 100
length, width, height = dimensions
print("The dimensions are {} x {} x {}".format(length, width, height))
The parentheses are optional when defining tuples, and programmers frequently omit them if parentheses don’t clarify the code.
In the second line, three variables are assigned from the content of the tuple dimensions. This is called tuple unpacking. You can use tuple unpacking to assign the information from a tuple into multiple variables without having to access them one by one and make multiple assignment statements.
Sets
A set is a data type for mutable unordered collections of unique elements. One application of a set is to quickly remove duplicates from a list.
numbers = [1, 2, 6, 3, 1, 1, 6]
unique_nums = set(numbers)
print(unique_nums)
>>> {1, 2, 3, 6}
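Since sets are mutable, you can also add and remove elements. A minimal sketch with an illustrative fruit set; note that pop() on a set removes an arbitrary element, because sets have no order.

```python
fruit = {"apple", "banana", "orange"}   # illustrative data

has_apple = "apple" in fruit   # membership test

fruit.add("watermelon")        # adds a new element
fruit.add("apple")             # no effect: "apple" is already in the set
count = len(fruit)

removed = fruit.pop()          # removes and returns an arbitrary element

print(has_apple, count, removed)
```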
Dictionaries
A dictionary is a mutable data type that stores mappings of unique keys to values. Here’s a dictionary that stores elements and their atomic numbers.
elements = {"hydrogen": 1, "helium": 2, "carbon": 6}
We can check whether a key is in a dictionary the same way we check whether a value is in a list or set, with the in keyword. Dicts have a related method that's also useful, get. get looks up values in a dictionary, but unlike square brackets, get returns None (or a default value of your choice) if the key isn't found.
print("carbon" in elements)
print(elements.get("dilithium"))
This would output:
True
None
Carbon is in the dictionary, so True is printed. Dilithium isn’t in our dictionary so None is returned by get and then printed. If you expect lookups to sometimes fail, get might be a better tool than normal square bracket lookups, because a failed square bracket lookup raises a KeyError that can crash your program.
You can check whether get returned None with the is operator. You can check for the opposite using is not.
n = elements.get("dilithium")
print(n is None)
print(n is not None)
This would output:
True
False
Dictionary keys must be immutable, that is, they must be of a type that is not modifiable (for example strings, numbers, or tuples of these; a list cannot be a key).
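The points about get, square brackets, and key types can be sketched together. The fallback text and the grid dictionary are made up for illustration.

```python
elements = {"hydrogen": 1, "helium": 2, "carbon": 6}

# get with a default value of your choice instead of None.
fallback = elements.get("dilithium", "no such element")

# Square brackets raise KeyError for a missing key.
missing = False
try:
    elements["dilithium"]
except KeyError:
    missing = True

# A tuple works as a key; a list does not (TypeError: unhashable type).
grid = {(0, 0): "origin"}
list_key_rejected = False
try:
    grid[[1, 2]] = "bad key"
except TypeError:
    list_key_rejected = True

print(fallback, missing, list_key_rejected)
```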
is — It depends on the object.
Lesson 3: Control Flow
Indentation
Spaces or Tabs? — The Python Style Guide recommends using 4 spaces to indent, rather than using a tab. Whichever you use, be aware that “Python 3 disallows mixing the use of tabs and spaces for indentation.”
If — Good and Bad Examples
- Don’t use True or False as conditions (if True:)
- Be careful writing expressions that use logical operators
- Don’t compare a boolean variable with == True or == False
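A sketch of the second and third points. The variables weather and is_cold are hypothetical; the "buggy" line shows a typical mistake with the or operator.

```python
weather = "sun"     # hypothetical value
is_cold = False     # hypothetical value

# Bad: this reads as (weather == "snow") or ("rain"). The non-empty
# string "rain" is always truthy, so the condition is always true.
buggy = bool(weather == "snow" or "rain")

# Good: write out both comparisons.
wet = weather == "snow" or weather == "rain"

# Good: use the boolean directly instead of `is_cold == False`.
if not is_cold:
    outfit = "t-shirt"
else:
    outfit = "coat"

print(buggy, wet, outfit)
```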
For Loops
A for loop is used to "iterate", or do something repeatedly, over an iterable.
An iterable is an object that can return one of its elements at a time. This can include sequence types, such as strings, lists, and tuples, as well as non-sequence types, such as dictionaries and files.
Iterating Through Dictionaries with For Loops
cast = {
    "Jerry Seinfeld": "Jerry Seinfeld",
    "Julia Louis-Dreyfus": "Elaine Benes",
    "Jason Alexander": "George Costanza",
    "Michael Richards": "Cosmo Kramer"
}
If you wish to iterate through both keys and values, you can use the built-in method items like this:
for key, value in cast.items():
    print("Actor: {} Role: {}".format(key, value))
This outputs:
Actor: Jerry Seinfeld Role: Jerry Seinfeld
Actor: Julia Louis-Dreyfus Role: Elaine Benes
Actor: Jason Alexander Role: George Costanza
Actor: Michael Richards Role: Cosmo Kramer
While Loops
For loops are an example of "definite iteration", meaning that the loop's body is run a predefined number of times. This differs from "indefinite iteration", which is when a loop repeats an unknown number of times and ends when some condition is met, which is what happens in a while loop.
pop is a list method that removes the last element from a list and returns it.
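A minimal sketch combining a while loop with pop. The card values and the threshold of 17 are illustrative assumptions; the loop keeps drawing from the end of the list until the hand is large enough.

```python
card_deck = [4, 11, 8, 5, 13, 2, 8, 10]   # illustrative data
hand = []

# Keep drawing cards until the hand adds up to 17 or more.
while sum(hand) < 17:
    hand.append(card_deck.pop())   # pop removes and returns the last card

print(hand)
print(card_deck)
```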
Break, Continue
Sometimes we need more control over when a loop should end, or skip an iteration. In these cases, we use the break and continue keywords, which can be used in both for and while loops.
- break terminates a loop
- continue skips one iteration of a loop
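Both keywords in one small loop, as a sketch with made-up numbers: even numbers are skipped with continue, and the loop stops completely at the first negative number with break.

```python
numbers = [3, 8, 5, -1, 7]   # illustrative data
odds = []

for n in numbers:
    if n < 0:
        break        # stop the whole loop at the first negative number
    if n % 2 == 0:
        continue     # skip even numbers and move to the next iteration
    odds.append(n)

print(odds)
```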
Zip and Enumerate
zip and enumerate are useful built-in functions that can come in handy when dealing with loops.
Zip
zip returns an iterator that combines multiple iterables into one sequence of tuples. Each tuple contains the elements in that position from all the iterables. For example, printing list(zip(['a', 'b', 'c'], [1, 2, 3])) would output [('a', 1), ('b', 2), ('c', 3)].
Like we did for range(), we need to convert it to a list or iterate through it with a loop to see the elements.
You could unpack each tuple in a for loop like this.
letters = ['a', 'b', 'c']
nums = [1, 2, 3]

for letter, num in zip(letters, nums):
    print("{}: {}".format(letter, num))
In addition to zipping two lists together, you can also unzip a list into tuples using an asterisk.
some_list = [('a', 1), ('b', 2), ('c', 3)]
letters, nums = zip(*some_list)
This would create the same letters and nums tuples we saw earlier.
Enumerate
enumerate is a built-in function that returns an iterator of tuples containing indices and values of a list. You'll often use this when you want the index along with each element of an iterable in a loop.
letters = ['a', 'b', 'c', 'd', 'e']
for i, letter in enumerate(letters):
    print(i, letter)
This code would output:
0 a
1 b
2 c
3 d
4 e
Some examples
x_coord = [23, 53, 2, -12, 95, 103, 14, -5]
y_coord = [677, 233, 405, 433, 905, 376, 432, 445]
z_coord = [4, 16, -6, -42, 3, -6, 23, -1]
labels = ["F", "J", "A", "Q", "Y", "B", "W", "X"]
points = []
for point in zip(labels, x_coord, y_coord, z_coord):
    points.append("{}: {}, {}, {}".format(*point))

for point in points:
    print(point)
Output:
F: 23, 677, 4
J: 53, 233, 16
A: 2, 405, -6
Q: -12, 433, -42
Y: 95, 905, 3
B: 103, 376, -6
W: 14, 432, 23
X: -5, 445, -1
Notice here, the tuple was unpacked using * in the format method. This can help make your code cleaner!
Transpose with Zip
data = ((0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11))
data_transpose = tuple(zip(*data))
print(data_transpose)
Output:
((0, 3, 6, 9), (1, 4, 7, 10), (2, 5, 8, 11))
List Comprehensions
In Python, you can create lists really quickly and concisely with list comprehensions. This example from earlier:
capitalized_cities = []
for city in cities:
    capitalized_cities.append(city.title())
can be reduced to:
capitalized_cities = [city.title() for city in cities]
Conditionals in List Comprehensions
You can also add conditionals to list comprehensions (listcomps). After the iterable, you can use the if keyword to check a condition in each iteration.
squares = [x**2 for x in range(9) if x % 2 == 0]
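To see what the filter does, here is the same comprehension with its result, plus a variant that uses if/else. In the variant the conditional expression goes before the for, because it chooses a value rather than filtering; the x + 3 branch is an arbitrary choice for illustration.

```python
# Filter: keep only even x, then square them.
squares = [x**2 for x in range(9) if x % 2 == 0]

# if/else: every x produces a value, so the condition comes first.
adjusted = [x**2 if x % 2 == 0 else x + 3 for x in range(9)]

print(squares)
print(adjusted)
```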
Examples:
Extract First Names
names = ["Rick Sanchez", "Morty Smith", "Summer Smith", "Jerry Smith", "Beth Smith"]
first_names = [name.split()[0].lower() for name in names]
print(first_names)
>>> ['rick', 'morty', 'summer', 'jerry', 'beth']
Lesson 4: Functions
- Arguments, or parameters, are values that are passed in as inputs when the function is called, and are used in the function body. If a function doesn’t take arguments, the parentheses after the function name are left empty.
- Within this function body, we can refer to the argument variables and define new variables, which can only be used within these indented lines.
Function — Default Arguments
We can add default arguments in a function to have default values for parameters that are unspecified in a function call.
def cylinder_volume(height, radius=5):
    pi = 3.14159
    return height * pi * radius ** 2
It is possible to pass values in two ways: by position and by name. Each of these function calls is evaluated the same way.
cylinder_volume(10, 7) # pass in arguments by position
cylinder_volume(height=10, radius=7) # pass in arguments by name
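A short sketch of the default value in action, using the same function. Leaving out radius falls back to 5, and positional and named arguments can be mixed as long as the positional ones come first.

```python
def cylinder_volume(height, radius=5):
    pi = 3.14159
    return height * pi * radius ** 2

default_radius = cylinder_volume(10)     # radius falls back to 5
mixed = cylinder_volume(10, radius=7)    # positional height, named radius

print(default_radius)
print(mixed)
```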
Variable Scope
Variable scope refers to which parts of a program a variable can be referenced, or used, from.
If a variable is created inside a function, it can only be used within that function. Accessing it outside that function is not possible.
# This will result in an error
def some_function():
    word = "hello"

print(word)
In the example above and the example below, word is said to have scope that is only local to each function. This means you can use the same name for different variables that are used in different functions.
# This works fine
def some_function():
    word = "hello"

def another_function():
    word = "goodbye"
Variables defined outside functions, as in the example below, can still be accessed within a function. Here, word is said to have a global scope.
# This works fine
word = "hello"

def some_function():
    print(word)

some_function()
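Notice that we can still access the value of the global variable word within this function. However, the global variable cannot be modified inside the function simply by assigning to its name: Python treats a name that is assigned inside a function as a new local variable, so a statement that reads it before assigning to it raises an UnboundLocalError. If you want to modify that variable's value inside this function, it should be passed in as an argument.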
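A sketch of the error and of the argument-passing fix. The names egg_count, buy_eggs and buy_eggs_fixed are hypothetical and only serve the example.

```python
egg_count = 0

def buy_eggs():
    # egg_count is assigned here, so Python treats it as local;
    # reading it before the assignment raises UnboundLocalError.
    egg_count += 12

failed = False
try:
    buy_eggs()
except UnboundLocalError:
    failed = True

def buy_eggs_fixed(count):
    # Take the value as an argument and return the new value.
    return count + 12

egg_count = buy_eggs_fixed(egg_count)

print(failed, egg_count)
```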
Documentation
Functions are especially readable because they often use documentation strings, or docstrings. Docstrings are a type of comment used to explain the purpose of a function, and how it should be used. Here’s a function for population density with a docstring.
def population_density(population, land_area):
    """Calculate the population density of an area. """
    return population / land_area
Docstrings are surrounded by triple quotes. The first line of the docstring is a brief explanation of the function’s purpose. If you feel that this is sufficient documentation you can end the docstring at this point; single line docstrings are perfectly acceptable, as in the example above.
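If one line is not enough, the docstring can continue after a blank line. The sketch below describes the arguments and the return value; the layout and the argument descriptions are one possible convention, not the only accepted one.

```python
def population_density(population, land_area):
    """Calculate the population density of an area.

    Args:
        population (int): The population of the area.
        land_area (int or float): The size of the area, in a
            consistent unit such as square km.

    Returns:
        float: population / land_area
    """
    return population / land_area

density = population_density(10, 2)
print(density)
print(population_density.__doc__)   # the docstring is stored on the function
```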
Lambda Expressions
You can use lambda expressions to create anonymous functions, that is, functions that don’t have a name. They are helpful for creating quick functions that aren’t needed later in your code. This can be especially useful for higher order functions, or functions that take in other functions as arguments.
With a lambda expression, this function:
def multiply(x, y):
    return x * y
can be reduced to:
multiply = lambda x, y: x * y
Both of these functions are used in the same way. In either case, we can call multiply like this:
multiply(4, 7)
This returns 28.
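Lambdas are mostly used inline with higher order functions. A sketch with the built-ins filter and sorted; the city list and the length threshold of 10 are illustrative.

```python
cities = ["New York City", "Los Angeles", "Chicago",
          "Mountain View", "Denver", "Boston"]   # illustrative data

# filter keeps the items for which the lambda returns True.
short_cities = list(filter(lambda name: len(name) < 10, cities))

# sorted uses the lambda's result as the sort key.
by_length = sorted(cities, key=lambda name: len(name))

print(short_cities)
print(by_length)
```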
Iterators And Generators
Iterables are objects that can return one of their elements at a time, such as a list. Many of the built-in functions we’ve used so far, like ‘enumerate,’ return an iterator.
An iterator is an object that represents a stream of data. This is different from a list, which is also an iterable, but is not an iterator because it is not a stream of data.
Generators are a simple way to create iterators using functions. You can also define iterators using classes.
Here is an example of a generator function called my_range, which produces an iterator that is a stream of numbers from 0 to (x - 1).
def my_range(x):
    i = 0
    while i < x:
        yield i
        i += 1
Notice that instead of using the return keyword, it uses yield. This allows the function to return values one at a time, and start where it left off each time it’s called. This yield keyword is what differentiates a generator from a typical function.
Remember, since this returns an iterator, we can convert it to a list or iterate through it in a loop to view its contents. For example, this code:
for x in my_range(5):
    print(x)
outputs:
0
1
2
3
4
Why Generators?
You may be wondering why we’d use generators over lists. Here’s an excerpt from a Stack Overflow page that addresses this:
Generators are a lazy way to build iterables. They are useful when the fully realized list would not fit in memory, or when the cost to calculate each list element is high and you want to do it as late as possible. But they can only be iterated over once.
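The "only once" part is easy to check with the my_range generator from above. The sketch also shows a generator expression feeding sum, so the squares are produced one at a time without building a full list first.

```python
def my_range(x):
    i = 0
    while i < x:
        yield i
        i += 1

stream = my_range(3)
first_pass = list(stream)    # consumes every value
second_pass = list(stream)   # the iterator is already exhausted

# Generator expression: values are computed lazily as sum asks for them.
total = sum(n ** 2 for n in my_range(4))

print(first_pass, second_pass, total)
```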