Udacity Data Scientist Nanodegree : Prerequisite — Python(L5, L6)

Try Statement

We can use try statements to handle exceptions. There are four clauses you can use (one more in addition to those shown in the video).

  • try: This is the only mandatory clause in a try statement. The code in this block is the first thing that Python runs in a try statement.
  • except: If Python runs into an exception while running the try block, it will jump to the except block that handles that exception.
  • else: If Python runs into no exceptions while running the try block, it will run the code in this block after running the try block.
  • finally: Before Python leaves this try statement, it will run the code in this finally block under any conditions, even if it's ending the program. E.g., if Python ran into an error while running code in the except or else block, this finally block will still be executed before stopping the program.
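
The four clauses above can be sketched together as follows. This is only a minimal illustration, not code from the course: the helper name parse_int and the list ran are hypothetical, and exist solely to record which clauses execute.

```python
def parse_int(text):
    """Try to convert text to an int and record which clauses ran."""
    ran = []          # hypothetical log of the clauses that executed
    result = None
    try:
        ran.append('try')
        result = int(text)      # may raise ValueError
    except ValueError:
        ran.append('except')    # runs only if int() raised ValueError
    else:
        ran.append('else')      # runs only if the try block raised nothing
    finally:
        ran.append('finally')   # runs in both cases
    return result, ran

print(parse_int('42'))
print(parse_int('abc'))
```

With a valid number the else clause runs and the except clause is skipped; with invalid text it is the other way around. The finally clause runs either way.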

Specifying Exceptions

We can actually specify which error we want to handle in an except block like this:

try:
    pass  # some code
except ValueError:
    pass  # some code

Now, it catches the ValueError exception, but not other exceptions. If we want this handler to address more than one type of exception, we can include a parenthesized tuple after the except with the exceptions.

try:
    pass  # some code
except (ValueError, KeyboardInterrupt):
    pass  # some code

Or, if we want to execute different blocks of code depending on the exception, you can have multiple except blocks.

try:
    pass  # some code
except ValueError:
    pass  # some code
except KeyboardInterrupt:
    pass  # some code
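
As a concrete sketch of both patterns, the two hypothetical helpers below (safe_lookup and to_number are illustrative names, and the exception types were chosen only because they are easy to trigger) use separate except blocks in one case and a tuple of exceptions in the other.

```python
def safe_lookup(table, key):
    """Return 1 / table[key], handling each failure in its own except block."""
    try:
        return 1 / table[key]
    except KeyError:
        return 'missing key'          # key is not in the dictionary
    except ZeroDivisionError:
        return 'division by zero'     # stored value is 0

def to_number(text):
    """Return float(text), or None when the conversion fails."""
    try:
        return float(text)
    except (ValueError, TypeError):   # one handler for both exception types
        return None

print(safe_lookup({'a': 2}, 'a'))
print(safe_lookup({'a': 0}, 'a'))
print(safe_lookup({}, 'a'))
print(to_number('3.5'), to_number('abc'), to_number(None))
```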

Introduction to NumPy

NumPy stands for Numerical Python and it’s a fundamental package for scientific computing in Python. NumPy provides Python with an extensive math library capable of performing numerical computations effectively and efficiently.

Creating NumPy ndarrays

ndarray — nd stands for n-dimensional. An ndarray is a multidimensional array of elements all of the same type.

# We create a 1D ndarray that contains only integers
# it is important to remember that np.array() is NOT a class, it is just a function that returns an ndarray.
import numpy as np
x = np.array([1, 2, 3, 4, 5])
print('x = ', x)
>>> x = [1 2 3 4 5]

Rank of an Array (numpy.ndarray.ndim)

  • It returns the number of array dimensions.
# 1-D array
x = np.array([1, 2, 3])
x.ndim
>>> 1
# 2-D array
Y = np.array([[1,2,3],[4,5,6],[7,8,9], [10,11,12]])
Y.ndim
>>> 2

# The tuple (2, 3, 4) passed as an argument represents the shape of the ndarray
y = np.zeros((2, 3, 4))
y.ndim
>>> 3

numpy.ndarray.shape

  • It returns a tuple representing the array dimensions.

numpy.dtype

The type tells us the data-type of the elements. Remember, a NumPy array is homogeneous, meaning all elements will have the same data-type. In the example below, we will create a rank 1 array and learn how to obtain its shape, its type, and the data-type (dtype) of its elements.

Example 1

x = np.array([1, 2, 3, 4, 5])
print('x = ', x)
print('x has dimensions:', x.shape)
print('x is an object of type:', type(x))
print('The elements in x are of type:', x.dtype)

x = [1 2 3 4 5]

x has dimensions: (5,) — (5,) telling us that x is of rank 1 (i.e. x only has 1 dimension) and it has 5 elements
x is an object of type: <class 'numpy.ndarray'>
The elements in x are of type: int64

Example 2

Y = np.array([[1,2,3],[4,5,6],[7,8,9], [10,11,12]])

print('Y = \n', Y)

# We print information about Y
print('Y has dimensions:', Y.shape)
print('Y has a total of', Y.size, 'elements')
print('Y is an object of type:', type(Y))
print('The elements in Y are of type:', Y.dtype)

Y =
[[ 1 2 3]
[ 4 5 6]
[ 7 8 9]
[10 11 12]]

Y has dimensions: (4, 3)
Y has a total of 12 elements
Y is an object of type: <class 'numpy.ndarray'>
The elements in Y are of type: int64

Example 3 — Save the NumPy array to a File

# We create a rank 1 ndarray
x = np.array([1, 2, 3, 4, 5])
# We save x into the current directory as my_array.npy
np.save('my_array', x)

The above saves the x ndarray into a file named my_array.npy. You can load the saved ndarray into a variable by using the load() function.

# We load the saved array from our current directory into variable y
y = np.load('my_array.npy')
>>> y = [1 2 3 4 5]

When loading an array from a file, make sure you include the name of the file together with the extension .npy, otherwise you will get an error.

Using Built-in Functions to Create ndarrays

  • np.zeros(shape) creates an ndarray full of zeros with the given shape, e.g. (rows, columns) for a rank 2 array. By default, np.zeros() creates an array with dtype float64. If desired, the data type can be changed by using the keyword dtype. np.ones() works the same way but fills the array with ones.
  • np.full(shape, constant value) takes two arguments. The first argument is the shape of the ndarray you want to make and the second is the constant value you want to populate the array with.
  • np.eye(N) creates a square N x N ndarray corresponding to the identity matrix. Since all identity matrices are square, np.eye() needs only a single integer as an argument.
  • np.diag() creates an ndarray corresponding to a diagonal matrix (a matrix whose elements are all zero except those on the main diagonal).
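
The functions listed above can be tried out with a short sketch like the one below. The variable names and the chosen shapes and values are illustrative only.

```python
import numpy as np

zeros = np.zeros((3, 4))                 # 3 x 4 array of zeros, float64 by default
int_ones = np.ones((2, 3), dtype=int)    # 2 x 3 array of integer ones
fives = np.full((2, 3), 5)               # 2 x 3 array filled with the value 5
identity = np.eye(3)                     # 3 x 3 identity matrix
diagonal = np.diag([10, 20, 30])         # 3 x 3 matrix with 10, 20, 30 on the main diagonal

print('zeros:', zeros.shape, zeros.dtype)
print('int_ones =\n', int_ones)
print('fives =\n', fives)
print('identity =\n', identity)
print('diagonal =\n', diagonal)
```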

numpy.arange

numpy.arange([start, ]stop, [step, ]dtype=None)
  • np.arange() function is very versatile and can be used with either one, two, or three arguments

When used with only one argument, np.arange(N) will create a rank 1 ndarray with consecutive integers between 0 and N - 1. Therefore, notice that if I want an array to have integers between 0 and 9, I have to use N = 10, NOT N = 9, as in the example below:

Example 4— np.arange(start,stop,step)

# We create a rank 1 ndarray that has sequential integers from 0 to 9
x = np.arange(10)
>>> x = [0 1 2 3 4 5 6 7 8 9]
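# We create a rank 1 ndarray that has sequential integers from 4 to 9
# using np.arange(start,stop)
x = np.arange(4,10)
>>> x = [4 5 6 7 8 9]
# We create a rank 1 ndarray with every third integer from 1 up to (but not including) 14
# using np.arange(start,stop,step)
x = np.arange(1,14,3)
>>> x = [1 4 7 10 13]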

The evenly spaced numbers will include start but exclude stop.

numpy.linspace

numpy.linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None, axis=0)
  • It returns num evenly spaced values calculated over the interval [start, stop].
  • Even though the np.arange() function allows for non-integer steps, such as 0.3, the output can be inconsistent because of finite floating point precision (for example, the number of elements may be off by one). For this reason, in the cases where non-integer steps are required, it is usually better to use the function np.linspace().
  • The np.linspace(start, stop, N) function returns N evenly spaced numbers over the closed interval [start, stop]. This means that both the start and the stop values are included. We should also note the np.linspace() function needs to be called with at least two arguments in the form np.linspace(start,stop). In this case, the default number of elements in the specified interval will be N = 50.
  • The reason np.linspace() works better than the np.arange() function, is that np.linspace() uses the number of elements we want in a particular interval, instead of the step between values. Let's see some examples:

Example 5 — np.linspace(start, stop, n)

x = np.linspace(0,25,10)
>>> x = [ 0. 2.77777778 5.55555556 8.33333333 11.11111111 13.88888889 16.66666667 19.44444444 22.22222222 25. ]
# We create a rank 1 ndarray that has 10 numbers evenly spaced between 0 and 25,
# with 25 excluded.
x = np.linspace(0,25,10, endpoint = False)
>>> x = [ 0. 2.5 5. 7.5 10. 12.5 15. 17.5 20. 22.5]

As we can see from the above example, the function np.linspace(0,25,10) returns an ndarray with 10 evenly spaced numbers in the closed interval [0, 25]. We can also see that both the start and end points, 0 and 25 in this case, are included. However, you can let the endpoint of the interval be excluded by setting the keyword endpoint = False in the np.linspace() function.
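
The difference between specifying a step and specifying a number of elements can be seen in a small sketch. The interval 1.0 to 1.3 is an illustrative choice; no expected length is stated for the np.arange() result because it depends on floating point rounding.

```python
import numpy as np

# With a float step, the number of elements depends on floating point rounding.
a = np.arange(1.0, 1.3, 0.1)
print('arange:  ', a, 'length:', len(a))

# With linspace we state the number of elements, so the length is fixed.
b = np.linspace(1.0, 1.3, 4)
print('linspace:', b, 'length:', len(b))
```

Because np.linspace() is told how many values to produce, the length of b is always 4 and both endpoints are included.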

numpy.reshape — This is a Function.

numpy.reshape(array, newshape, order='C')
  • It gives a new shape to an array without changing its data.
  • So far, we have only used the built-in functions np.arange() and np.linspace() to create rank 1 ndarrays. However, we can use these functions to create rank 2 ndarrays of any shape by combining them with the np.reshape() function.
  • The np.reshape(ndarray, new_shape) function converts the given ndarray into the specified new_shape. It is important to note that the new_shape should be compatible with the number of elements in the given ndarray.
  • For example, you can convert a rank 1 ndarray with 6 elements, into a 3 x 2 rank 2 ndarray, or a 2 x 3 rank 2 ndarray, since both of these rank 2 arrays will have a total of 6 elements. However, you can't reshape the rank 1 ndarray with 6 elements into a 3 x 3 rank 2 ndarray, since this rank 2 array will have 9 elements, which is greater than the number of elements in the original ndarray. Let's see some examples:

Example 6 —reshape() function.

# We create a rank 1 ndarray with sequential integers from 0 to 19
x = np.arange(20)
>>> Original x = [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19]
# We reshape x into a 4 x 5 ndarray
x = np.reshape(x, (4,5))
>>>
Reshaped x =
[[ 0 1 2 3 4]
[ 5 6 7 8 9]
[10 11 12 13 14]
[15 16 17 18 19]]
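
The compatibility rule described earlier can be checked with a short sketch. The flag error_raised is a hypothetical name used only to record what happened.

```python
import numpy as np

x = np.arange(6)             # rank 1 ndarray with 6 elements

a = np.reshape(x, (3, 2))    # works: 3 * 2 == 6
b = np.reshape(x, (2, 3))    # works: 2 * 3 == 6

try:
    np.reshape(x, (3, 3))    # 3 * 3 == 9, but x has only 6 elements
    error_raised = False
except ValueError as err:
    error_raised = True
    print('Cannot reshape:', err)

print('a has shape', a.shape, 'and b has shape', b.shape)
```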

numpy.ndarray.reshape — This one is a Method.

ndarray.reshape(shape, order='C')
  • It returns an array containing the same data with a new shape.

Method vs. Function

A function is a piece of code that is called by name. It can be passed data to operate on (i.e. the parameters) and can optionally return data (the return value). All data that is passed to a function is explicitly passed.

A method is a piece of code that is called by a name that is associated with an object. In most respects it is identical to a function except for two key differences:

  • A method is implicitly passed the object on which it was called.
  • A method is able to operate on data that is contained within the class (remembering that an object is an instance of a class; the class is the definition, the object is an instance of that data).

  • One great feature of NumPy is that some functions can also be applied as methods. This allows us to apply different functions in sequence in just one line of code.
  • ndarray methods are similar to ndarray attributes in that they are both applied using dot notation (.). Let's see how we can accomplish the same result as in the above example, but in just one line of code:

Example 7 — Create a Numpy array by calling the reshape() function from the output of arange() function.

# We create a rank 1 ndarray with sequential integers from 0 to 19 and
# reshape it to a 4 x 5 array
Y = np.arange(20).reshape(4, 5)
>>> Y =
[[ 0 1 2 3 4]
[ 5 6 7 8 9]
[10 11 12 13 14]
[15 16 17 18 19]]

As we can see, we get the exact same result as before. Notice that when we use reshape() as a method, it's applied as ndarray.reshape(new_shape). This converts the ndarray into the specified shape new_shape. As before, it is important to note that the new_shape should be compatible with the number of elements in ndarray. In the example above, the function np.arange(20) creates an ndarray and serves as the ndarray to be reshaped by the reshape() method. Therefore, when using reshape() as a method, we don't need to pass the ndarray as an argument to the reshape() function, instead we only need to pass the new_shape argument.

Let’s start by using the np.random.random(shape) function to create an ndarray of the given shape with random floats in the half-open interval [0.0, 1.0).

Example 8 — Create a Numpy array using the numpy.random.random() function.

# We create a 3 x 3 ndarray with random floats in the half-open interval [0.0, 1.0).
X = np.random.random((3,3))
>>> X =
[[ 0.12379926 0.52943854 0.3443525 ]
[ 0.11169547 0.82123909 0.52864397]
[ 0.58244133 0.21980803 0.69026858]]

NumPy also allows us to create ndarrays with random integers within a particular interval. The function np.random.randint(start, stop, size = shape) creates an ndarray of the given shape with random integers in the half-open interval [start, stop). Let's see an example:

Example 9 — Create a Numpy array using the numpy.random.randint() function.

# We create a 3 x 2 ndarray with random integers in the half-open interval [4, 15).
X = np.random.randint(4,15,size=(3,2))
>>> X =
[[ 7 11]
[ 9 11]
[ 6 7]]

In some cases, you may need to create ndarrays with random numbers that satisfy certain statistical properties. For example, you may want the random numbers in the ndarray to have an average of 0. NumPy allows you to create random ndarrays with numbers drawn from various probability distributions. The function np.random.normal(mean, standard deviation, size=shape), for example, creates an ndarray with the given shape that contains random numbers picked from a normal (Gaussian) distribution with the given mean and standard deviation. Let's create a 1,000 x 1,000 ndarray of random floating point numbers drawn from a normal distribution with a mean (average) of zero and a standard deviation of 0.1.

Example 10 — Create a Numpy array of “Normal” distributed random numbers, using the numpy.random.normal() function.

# We create a 1000 x 1000 ndarray of random floats drawn from normal (Gaussian) distribution
# with a mean of zero and a standard deviation of 0.1.
X = np.random.normal(0, 0.1, size=(1000,1000))
# We print X
print()
print('X = \n', X)
print()
# We print information about X
print('X has dimensions:', X.shape)
print('X is an object of type:', type(X))
print('The elements in X are of type:', X.dtype)
print('The elements in X have a mean of:', X.mean())
print('The maximum value in X is:', X.max())
print('The minimum value in X is:', X.min())
print('X has', (X < 0).sum(), 'negative numbers')
print('X has', (X > 0).sum(), 'positive numbers')

X =
[[ 0.04218614 0.03247225 -0.02936003 …, 0.01586796 -0.05599115 -0.03630946]
[ 0.13879995 -0.01583122 -0.16599967 …, 0.01859617 -0.08241612 0.09684025]
[ 0.14422252 -0.11635985 -0.04550231 …, -0.09748604 -0.09350044 0.02514799]
…,
[-0.10472516 -0.04643974 0.08856722 …, -0.02096011 -0.02946155 0.12930844]
[-0.26596955 0.0829783 0.11032549 …, -0.14492074 -0.00113646 -0.03566034]
[-0.12044482 0.20355356 0.13637195 …, 0.06047196 -0.04170031 -0.04957684]]

X has dimensions: (1000, 1000)
X is an object of type: <class 'numpy.ndarray'>
The elements in X are of type: float64
The elements in X have a mean of: -0.000121576684405
The maximum value in X is: 0.476673923106
The minimum value in X is: -0.499114224706
X has 500562 negative numbers
X has 499438 positive numbers

As we can see, the average of the random numbers in the ndarray is close to zero, the maximum and minimum values in X are roughly symmetric about zero (the average), and we have about the same number of positive and negative numbers.

Accessing, Deleting, and Inserting Elements Into ndarrays

  • NumPy ndarrays are mutable, meaning that the elements in ndarrays can be changed after the ndarray has been created. NumPy ndarrays can also be sliced.
  • Elements can be accessed using indices inside square brackets, [ ]. NumPy allows you to use both positive and negative indices to access elements in the ndarray.
  • We can also access and modify specific elements of rank 2 ndarrays. To access elements in rank 2 ndarrays we need to provide 2 indices in the form [row, column]. Let's see some examples.

Example 1 — Access individual elements of 2-D array

# We create a 3 x 3 rank 2 ndarray that contains integers from 1 to 9
X = np.array([[1,2,3],[4,5,6],[7,8,9]])
# Let's access some elements in X
print('This is (0,0) Element in X:', X[0,0])
print('This is (2,2) Element in X:', X[2,2])

X =
[[1 2 3]
[4 5 6]
[7 8 9]]

This is (0,0) Element in X: 1
This is (2,2) Element in X: 9
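
The following sketch shows negative indices and in-place modification for rank 1 and rank 2 ndarrays. The variable names first, last and bottom_right are illustrative.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
first = x[0]               # first element, using a positive index
last = x[-1]               # last element, using a negative index
x[3] = 20                  # ndarrays are mutable: x is now [1 2 3 20 5]

X = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
X[0, 0] = 20               # change the (0,0) element
bottom_right = X[-1, -1]   # last row, last column

print('x =', x)
print('first =', first, 'last =', last)
print('X =\n', X)
print('bottom_right =', bottom_right)
```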

We can delete elements using the np.delete(ndarray, elements, axis) function. This function deletes the elements at the given list of indices from the given ndarray along the specified axis. For rank 1 ndarrays the axis keyword is not required. For rank 2 ndarrays, axis = 0 is used to select rows, and axis = 1 is used to select columns. Let's see some examples:

Example 2 — Delete elements

x = np.array([1, 2, 3, 4, 5])
Y = np.array([[1,2,3],[4,5,6],[7,8,9]])
# We delete the first and last element of x
x = np.delete(x, [0,4])
# We delete the first row of Y
w = np.delete(Y, 0, axis=0)
# We delete the first and last column of Y
v = np.delete(Y, [0,2], axis=1)

Original x = [1 2 3 4 5]

Modified x = [2 3 4]

Original Y =
[[1 2 3]
[4 5 6]
[7 8 9]]

w =
[[4 5 6]
[7 8 9]]

v =
[[2]
[5]
[8]]

numpy.append

numpy.append(array, values, axis=None)

It appends values to the end of an array.

Now, let’s see how we can append values to ndarrays. We can append values to ndarrays using the np.append(ndarray, elements, axis) function. This function appends the given list of elements to ndarray along the specified axis. Let's see some examples:

Example 3 — Append elements

x = np.array([1, 2, 3, 4, 5])
Y = np.array([[1,2,3],[4,5,6]])
# We append the integer 6 to x
x = np.append(x, 6)
# We append the integer 7 and 8 to x
x = np.append(x, [7,8])
# We append a new row containing 7,8,9 to Y
v = np.append(Y, [[7,8,9]], axis=0)
# We append a new column containing 9 and 10 to Y
q = np.append(Y,[[9],[10]], axis=1)

Original x = [1 2 3 4 5]

x = [1 2 3 4 5 6]

x = [1 2 3 4 5 6 7 8]

Original Y =
[[1 2 3]
[4 5 6]]

v =
[[1 2 3]
[4 5 6]
[7 8 9]]

q =
[[ 1 2 3 9]
[ 4 5 6 10]]

np.insert(ndarray, index, elements, axis)

This function inserts the given list of elements into the ndarray right before the given index along the specified axis. Let's see some examples:

Example 4— Insert elements

x = np.array([1, 2, 5, 6, 7])
Y = np.array([[1,2,3],[7,8,9]])
# We insert the integer 3 and 4 between 2 and 5 in x.
x = np.insert(x,2,[3,4])
# We insert a row between the first and last row of Y
w = np.insert(Y,1,[4,5,6],axis=0)
# We insert a column full of 5s between the first and second column of Y
v = np.insert(Y,1,5, axis=1)

Original x = [1 2 5 6 7]

x = [1 2 3 4 5 6 7]

Original Y =
[[1 2 3]
[7 8 9]]

w =
[[1 2 3]
[4 5 6]
[7 8 9]]

v =
[[1 5 2 3]
[7 5 8 9]]
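
One point worth checking is that np.delete(), np.append() and np.insert() each return a new ndarray and leave the array passed to them unchanged. The sketch below uses illustrative names and values to show this.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])

deleted = np.delete(x, [0, 4])    # new array without the first and last element
appended = np.append(x, 6)        # new array with 6 added at the end
inserted = np.insert(x, 2, 99)    # new array with 99 placed before index 2

print('deleted  =', deleted)
print('appended =', appended)
print('inserted =', inserted)
print('x        =', x)            # x itself has not been modified
```

This is why the earlier examples assign the result back to a variable, as in x = np.delete(x, [0,4]).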

numpy.hstack and numpy.vstack

numpy.hstack(sequence_of_ndarray)

It returns a stacked array formed by stacking the given arrays in sequence horizontally (column-wise).

numpy.vstack(sequence_of_ndarray)

It returns an array, which will be at least 2-D, formed by stacking the given arrays in sequence vertically (row-wise).

NumPy also allows us to stack ndarrays on top of each other, or to stack them side by side. The stacking is done using either the np.vstack() function for vertical stacking, or the np.hstack() function for horizontal stacking. It is important to note that in order to stack ndarrays, their shapes must match along every axis except the one being stacked: vertical stacking requires the same number of columns, and horizontal stacking requires the same number of rows. Let's see some examples:

Example 5 — Stack arrays

x = np.array([1,2])
Y = np.array([[3,4],[5,6]])
# We stack x on top of Y
z = np.vstack((x,Y))
# We stack x on the right of Y. We need to reshape x in order to stack it on the right of Y.
w = np.hstack((Y,x.reshape(2,1)))

x = [1 2]

Y =
[[3 4]
[5 6]]

z =
[[1 2]
[3 4]
[5 6]]

w =
[[3 4 1]
[5 6 2]]
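
To see what happens when the shapes do not match, consider the sketch below. The array wide and the flag stacked are hypothetical names introduced only for this illustration.

```python
import numpy as np

x = np.array([1, 2])
Y = np.array([[3, 4], [5, 6]])
wide = np.array([1, 2, 3])       # 3 elements, but Y has only 2 columns

z = np.vstack((x, Y))            # works: x is treated as a 1 x 2 row

try:
    np.vstack((wide, Y))         # fails: 3 columns cannot sit on top of 2 columns
    stacked = True
except ValueError as err:
    stacked = False
    print('Cannot stack:', err)

print('z has shape', z.shape)
```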

Slicing ndarrays

NumPy provides a way to access subsets of ndarrays. This is known as slicing. Slicing is performed by combining indices with the colon : symbol inside the square brackets. The basic form is ndarray[start:end], which selects the elements from index start up to, but not including, index end.

Example 1. Slicing in a 2-D ndarray

# We create a 4 x 5 ndarray that contains integers from 0 to 19
X = np.arange(20).reshape(4, 5)
# Indexing is [rows, columns]: rows run horizontally, columns run vertically
W = X[1:,2:5] # rows from index 1 through the last row
Y = X[:3,2:5]
v = X[2,:]
q = X[:,2]
R = X[:,2:3]

X =
[[ 0 1 2 3 4]
[ 5 6 7 8 9]
[10 11 12 13 14]
[15 16 17 18 19]]

W =
[[ 7 8 9]
[12 13 14]
[17 18 19]]

Y =
[[ 2 3 4]
[ 7 8 9]
[12 13 14]]

v = [10 11 12 13 14]

q = [ 2 7 12 17]

R =
[[ 2]
[ 7]
[12]
[17]]

It is important to note that when we perform slices on ndarrays and save them into new variables, as we did above, the data is not copied into the new variable. This is one feature that often causes confusion for beginners. Therefore, we will look at this in a bit more detail.

In the above examples, when we make assignments, such as:

Z = X[1:4,2:5]

the slice of the original array X is not copied into the variable Z. Rather, Z refers to the same underlying data as the selected part of X. We say that slicing only creates a view of the original array. This means that if you make changes in Z, you will in effect be changing the corresponding elements in X as well.

numpy.ndarray.copy

ndarray.copy(order='C')

It returns a copy of the array.

However, if we want to create a new ndarray that contains a copy of the values in the slice we need to use the np.copy() function. The np.copy(ndarray) function creates a copy of the given ndarray. This function can also be used as a method.
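
The difference between a view and a copy can be checked with a small sketch. The value 555 and the name W are illustrative, and np.shares_memory() is used here only to confirm which arrays share data.

```python
import numpy as np

X = np.arange(20).reshape(4, 5)

Z = X[1:4, 2:5]             # a view: Z shares its data with X
Z[2, 2] = 555               # this also changes X[3, 4]

W = X[1:4, 2:5].copy()      # an independent copy; np.copy(X[1:4, 2:5]) is equivalent
W[0, 0] = -1                # this does not change X[1, 2]

print('X =\n', X)
print('Z shares memory with X:', np.shares_memory(X, Z))
print('W shares memory with X:', np.shares_memory(X, W))
```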

Example 2a — Use an array as indices to either make slices, select, or change elements

# We create a 4 x 5 ndarray that contains integers from 0 to 19
X = np.arange(20).reshape(4, 5)
# We create a rank 1 ndarray that will serve as indices to select elements from X
indices = np.array([1,3])
# We use the indices ndarray to select the 2nd and 4th row of X
Y = X[indices,:]
# We use the indices ndarray to select the 2nd and 4th column of X
Z = X[:, indices]

X =
[[ 0 1 2 3 4]
[ 5 6 7 8 9]
[10 11 12 13 14]
[15 16 17 18 19]]

indices = [1 3]

Y =
[[ 5 6 7 8 9]
[15 16 17 18 19]]

Z =
[[ 1 3]
[ 6 8]
[11 13]
[16 18]]

Example 2b — Use an array as indices to extract specific rows from a rank 2 ndarray.

X = np.random.randint(1,20, size=(50,5))
>>> Shape of X is: (50, 5)
# Create a rank 1 ndarray that contains a randomly chosen 10 values between '0' to 'len(X)' (50)
# The row_indices would represent the indices of rows of X
row_indices = np.random.randint(0,50, size=10)
>>> Random 10 indices are: [1 38 31 45 44 21 6 24 19 33]
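
One way to finish this example is to use row_indices to pull the chosen rows out of X. This is a sketch and the name X_subset is hypothetical. Since np.random.randint() samples with replacement, the same row may be selected more than once.

```python
import numpy as np

X = np.random.randint(1, 20, size=(50, 5))
row_indices = np.random.randint(0, 50, size=10)

# X_subset (hypothetical name) holds the 10 rows of X selected by row_indices
X_subset = X[row_indices, :]

print('Shape of X_subset is:', X_subset.shape)
```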

numpy.diag

numpy.diag(array, k=0)

It extracts or constructs the diagonal elements.

NumPy also offers built-in functions to select specific elements within ndarrays. For example, the np.diag(ndarray, k=N) function extracts the elements along the diagonal defined by N. The default is k=0, which refers to the main diagonal. Values of k > 0 are used to select elements in diagonals above the main diagonal, and values of k < 0 are used to select elements in diagonals below the main diagonal. Let's see an example:

Example 5. Demonstrate the diag() function

# We create a 5 x 5 ndarray that contains integers from 0 to 24
X = np.arange(25).reshape(5, 5)
# We print the elements in the main diagonal of X
print('z =', np.diag(X)) # default k=0
# We print the elements above the main diagonal of X
print('y =', np.diag(X, k=1))
# We print the elements below the main diagonal of X
print('w = ', np.diag(X, k=-1))

X =
[[ 0 1 2 3 4]
[ 5 6 7 8 9]
[10 11 12 13 14]
[15 16 17 18 19]
[20 21 22 23 24]]

z = [ 0 6 12 18 24]

y = [ 1 7 13 19]

w = [ 5 11 17 23]

numpy.unique

numpy.unique(array, return_index=False, return_inverse=False, return_counts=False, axis=None)

  • It returns the sorted unique elements of an array.

It is often useful to extract only the unique elements in an ndarray. We can find the unique elements in an ndarray by using the np.unique() function. The np.unique(ndarray) function returns the unique elements in the given ndarray, as in the example below:

Example 6. Demonstrate the unique() function

# Create 3 x 3 ndarray with repeated values
X = np.array([[1,2,3],[5,2,8],[1,2,3]])
# We print the unique elements of X
print('The unique elements in X are:',np.unique(X))

X =
[[1 2 3]
[5 2 8]
[1 2 3]]

The unique elements in X are: [1 2 3 5 8]
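
The return_counts keyword shown in the signature above can be used to find out how often each unique value occurs. The sketch below reuses the array from the example; the names values and counts are illustrative.

```python
import numpy as np

X = np.array([[1, 2, 3], [5, 2, 8], [1, 2, 3]])

# With return_counts=True, np.unique() also returns how many times each value occurs
values, counts = np.unique(X, return_counts=True)

for value, count in zip(values, counts):
    print(value, 'appears', count, 'time(s)')
```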

Boolean Indexing, Set Operations, and Sorting

In many situations we don't know the indices of the elements we want to select. For example, suppose we have a 10,000 x 10,000 ndarray of random integers ranging from 1 to 15,000 and we only want to select those integers that are less than 20. Boolean indexing can help us in these cases by allowing us to select elements using logical arguments instead of explicit indices. Let’s see some examples:

Example 1. Boolean indexing

# We create a 5 x 5 ndarray that contains integers from 0 to 24
X = np.arange(25).reshape(5, 5)
# We use Boolean indexing to select elements in X:
print('The elements in X that are greater than 10:', X[X > 10])
print('The elements in X that are less than or equal to 7:', X[X <= 7])
print('The elements in X that are between 10 and 17:', X[(X > 10) & (X < 17)])
# We use Boolean indexing to assign the elements that are between 10 and 17 the value of -1
X[(X > 10) & (X < 17)] = -1

Original X =
[[ 0 1 2 3 4]
[ 5 6 7 8 9]
[10 11 12 13 14]
[15 16 17 18 19]
[20 21 22 23 24]]

The elements in X that are greater than 10: [11 12 13 14 15 16 17 18 19 20 21 22 23 24]
The elements in X that are less than or equal to 7: [0 1 2 3 4 5 6 7]
The elements in X that are between 10 and 17: [11 12 13 14 15 16]

X =
[[ 0 1 2 3 4]
[ 5 6 7 8 9]
[10 -1 -1 -1 -1]
[-1 -1 17 18 19]
[20 21 22 23 24]]

Example 2. Set operations

x = np.array([1,2,3,4,5])
y = np.array([6,7,2,8,4])
# We use set operations to compare x and y:
print('The elements that are both in x and y:', np.intersect1d(x,y))
print('The elements that are in x that are not in y:', np.setdiff1d(x,y))
print('All the elements of x and y:',np.union1d(x,y))

x = [1 2 3 4 5]

y = [6 7 2 8 4]

The elements that are both in x and y: [2 4]
The elements that are in x that are not in y: [1 3 5]
All the elements of x and y: [1 2 3 4 5 6 7 8]

numpy.ndarray.sort method

ndarray.sort(axis=-1, kind=None, order=None)
  • The method above sorts an array in-place.

Like with other functions we saw before, the sort can be used as a method as well as a function. The difference lies in how the data is stored in memory in this case.

  • When numpy.sort() is used as a function, it sorts the ndarray out of place, meaning that it doesn't change the original ndarray being sorted.
  • On the other hand, when you use numpy.ndarray.sort() as a method, ndarray.sort() sorts the ndarray in place, meaning, that the original array will be changed to the sorted one.

Example 3. Sort arrays using sort() function

# We create an unsorted rank 1 ndarray
x = np.random.randint(1,11,size=(10,))
# We sort x and print the sorted array using sort as a function.
print('Sorted x (out of place):', np.sort(x))

Original x = [9 6 4 4 9 4 8 4 4 7]

Sorted x (out of place): [4 4 4 4 4 6 7 8 9 9]

x after sorting: [9 6 4 4 9 4 8 4 4 7]

Notice that np.sort() sorts the array but, if the ndarray being sorted has repeated values, np.sort() leaves those values in the sorted array. However, if desired, we can use the unique() function. Let's see how we can sort the unique elements of x above:

# Returns the sorted unique elements of an array
print(np.unique(x))

[4 6 7 8 9]

Example 4. Sort rank-1 arrays using sort() method

# We create an unsorted rank 1 ndarray
x = np.random.randint(1,11,size=(10,))
# We sort x and print the sorted array using sort as a method.
x.sort()
# When we sort in place the original array is changed to the sorted array. To see this we print x again
print()
print('x after sorting:', x)

Original x = [9 9 8 1 1 4 3 7 2 8]

x after sorting: [1 1 2 3 4 7 8 8 9 9]

numpy.sort function

numpy.sort(array, axis=-1, kind=None, order=None)

It returns a sorted copy of an array. The axis denotes the axis along which to sort. It can be an integer from -ndim to (ndim-1), or None. For a given 2-D ndarray, the most common choices are:

  • If nothing is specified, the default value is axis = -1, which sorts along the last axis. In the case of a given 2-D ndarray, the last axis value is 1.
  • If explicitly axis = None is specified, the array is flattened before sorting. It will return a 1-D array.
  • If axis = 0 is specified for a given 2-D array, the function sorts down the rows, one column at a time, without moving elements between columns. In the final output, you will see that each column has been sorted individually.
  • If axis = 1 is specified for a given 2-D array, the function does the opposite and sorts across the columns, one row at a time. In the final output, you will see that each row has been sorted individually.

Tip: As mentioned in this discussion, you can read axis = 0 as "down" and axis = 1 as "across" the given 2-D array, to have a correct usage of axis in your methods/functions.

When sorting rank 2 ndarrays, we need to specify to the np.sort() function whether we are sorting by rows or columns. This is done by using the axis keyword. Let's see some examples:

Example 5. Sort rank-2 arrays by specific axis.

# We create an unsorted rank 2 ndarray
X = np.random.randint(1,11,size=(5,5))
# We sort the columns of X and print the sorted array
print('X with sorted columns :\n', np.sort(X, axis = 0))
# We sort the rows of X and print the sorted array
print('X with sorted rows :\n', np.sort(X, axis = 1))

Original X =
[[6 1 7 6 3]
[3 9 8 3 5]
[6 5 8 9 3]
[2 1 5 7 7]
[9 8 1 9 8]]

X with sorted columns :
[[2 1 1 3 3]
[3 1 5 6 3]
[6 5 7 7 5]
[6 8 8 9 7]
[9 9 8 9 8]]

X with sorted rows :
[[1 3 6 6 7]
[3 3 5 8 9]
[3 5 6 8 9]
[1 2 5 7 7]
[1 8 8 9 9]]
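
The remaining axis options, the default axis = -1 and axis = None, can be compared on a small fixed array. The values and variable names below are illustrative.

```python
import numpy as np

X = np.array([[9, 1, 8],
              [3, 7, 2]])

last_axis = np.sort(X)               # default axis=-1: each row is sorted
flattened = np.sort(X, axis=None)    # X is flattened first, result is rank 1
columns = np.sort(X, axis=0)         # each column is sorted

print('Default (axis = -1):\n', last_axis)
print('axis = None:', flattened)
print('axis = 0:\n', columns)
```

For a 2-D array, the default gives the same result as axis = 1.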

Arithmetic operations and Broadcasting

In order to do element-wise operations, NumPy sometimes uses something called Broadcasting. Broadcasting is the term used to describe how NumPy handles element-wise arithmetic operations with ndarrays of different shapes. For example, broadcasting is used implicitly when doing arithmetic operations between scalars and ndarrays.

It is important to note that when performing element-wise operations, the shapes of the ndarrays being operated on, must have the same shape or be broadcastable. We'll explain more about this later in this lesson. Let's start by performing element-wise arithmetic operations on rank 1 ndarrays:

Example 1. Element-wise arithmetic operations on 1-D arrays

x = np.array([1,2,3,4])
y = np.array([5.5,6.5,7.5,8.5])
# We perform basic element-wise operations using arithmetic symbols and functions
print('x + y = ', x + y)
print('add(x,y) = ', np.add(x,y))
print('x - y = ', x - y)
print('subtract(x,y) = ', np.subtract(x,y))
print('x * y = ', x * y)
print('multiply(x,y) = ', np.multiply(x,y))
print('x / y = ', x / y)
print('divide(x,y) = ', np.divide(x,y))

x = [1 2 3 4]

y = [ 5.5 6.5 7.5 8.5]

x + y = [ 6.5 8.5 10.5 12.5]
add(x,y) = [ 6.5 8.5 10.5 12.5]

x - y = [-4.5 -4.5 -4.5 -4.5]
subtract(x,y) = [-4.5 -4.5 -4.5 -4.5]

x * y = [ 5.5 13. 22.5 34. ]
multiply(x,y) = [ 5.5 13. 22.5 34. ]

x / y = [ 0.18181818 0.30769231 0.4 0.47058824]
divide(x,y) = [ 0.18181818 0.30769231 0.4 0.47058824]

We can also perform the same element-wise arithmetic operations on rank 2 ndarrays. Again, remember that in order to do these operations the shapes of the ndarrays being operated on, must have the same shape or be broadcastable.

Example 2. Element-wise arithmetic operations on a 2-D array (Same shape)

X = np.array([1,2,3,4]).reshape(2,2)
Y = np.array([5.5,6.5,7.5,8.5]).reshape(2,2)
# We perform basic element-wise operations using arithmetic symbols and functions
print('X + Y = \n', X + Y)
print('add(X,Y) = \n', np.add(X,Y))

print('X - Y = \n', X - Y)
print('subtract(X,Y) = \n', np.subtract(X,Y))

print('X * Y = \n', X * Y)
print('multiply(X,Y) = \n', np.multiply(X,Y))

print('X / Y = \n', X / Y)
print('divide(X,Y) = \n', np.divide(X,Y))

X =
[[1 2]
[3 4]]

Y =
[[ 5.5 6.5]
[ 7.5 8.5]]

X + Y =
[[ 6.5 8.5]
[ 10.5 12.5]]

add(X,Y) =
[[ 6.5 8.5]
[ 10.5 12.5]]

X - Y =
[[-4.5 -4.5]
[-4.5 -4.5]]

subtract(X,Y) =
[[-4.5 -4.5]
[-4.5 -4.5]]

X * Y =
[[ 5.5 13. ]
[ 22.5 34. ]]

multiply(X,Y) =
[[ 5.5 13. ]
[ 22.5 34. ]]

X / Y =
[[ 0.18181818 0.30769231]
[ 0.4 0.47058824]]

divide(X,Y) =
[[ 0.18181818 0.30769231]
[ 0.4 0.47058824]]
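
Broadcasting, mentioned at the start of this section, can be sketched with arrays of different shapes. The arrays row and col and the flag compatible are illustrative names, and the example assumes NumPy's standard broadcasting behavior.

```python
import numpy as np

X = np.array([[1, 2, 3],
              [4, 5, 6]])           # shape (2, 3)
row = np.array([10, 20, 30])        # shape (3,)
col = np.array([[100], [200]])      # shape (2, 1)

scaled = 2 * X          # the scalar is broadcast to every element
plus_row = X + row      # row is added to each row of X
plus_col = X + col      # col is added to each column of X

try:
    X + np.array([1, 2])            # shape (2,) cannot be broadcast to (2, 3)
    compatible = True
except ValueError:
    compatible = False

print('2 * X =\n', scaled)
print('X + row =\n', plus_row)
print('X + col =\n', plus_col)
print('Shapes (2, 3) and (2,) are broadcastable:', compatible)
```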

We can also apply mathematical functions, such as sqrt(x), to all elements of an ndarray at once.

Example 3. Additional mathematical functions

x = np.array([1,2,3,4])
# We apply different mathematical functions to all elements of x
print('EXP(x) =', np.exp(x))
print('SQRT(x) =',np.sqrt(x))
print('POW(x,2) =',np.power(x,2)) # We raise all elements to the power of 2

x = [1 2 3 4]

EXP(x) = [ 2.71828183 7.3890561 20.08553692 54.59815003]

SQRT(x) = [ 1. 1.41421356 1.73205081 2. ]

POW(x,2) = [ 1 4 9 16]
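
To make "all elements at once" concrete, here is a minimal sketch that compares the NumPy functions from Example 3 with an explicit element-by-element loop using Python's built-in math module. The variable names (`sqrt_loop`, `exp_loop`, and so on) are made up for this illustration.

```python
import math
import numpy as np

x = np.array([1, 2, 3, 4])

# Element-by-element version using Python's math module
sqrt_loop = np.array([math.sqrt(v) for v in x])
exp_loop = np.array([math.exp(v) for v in x])

# NumPy versions operate on the whole ndarray in one call
sqrt_vec = np.sqrt(x)
exp_vec = np.exp(x)

print('sqrt matches loop:', np.allclose(sqrt_vec, sqrt_loop))
print('exp matches loop:', np.allclose(exp_vec, exp_loop))
print('power(x, 2) matches x ** 2:', np.array_equal(np.power(x, 2), x ** 2))
```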

Note: Most statistical operations are available both as a function and as an equivalent method. For example, both the numpy.mean function and the numpy.ndarray.mean method return the arithmetic mean of the array elements along the given axis.
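
The note above can be illustrated with a small sketch that calls the function form and the method form side by side. It also shows why the note says "most": the median is available as np.median, but ndarrays do not have a median method. The variable names are chosen only for this illustration.

```python
import numpy as np

X = np.array([[1, 2], [3, 4]])

# Function form and method form of the same operation
mean_func = np.mean(X, axis=0)
mean_method = X.mean(axis=0)
print('np.mean(X, axis=0) =', mean_func)
print('X.mean(axis=0)     =', mean_method)

sum_func = np.sum(X, axis=1)
sum_method = X.sum(axis=1)
print('np.sum(X, axis=1)  =', sum_func)
print('X.sum(axis=1)      =', sum_method)

# The median is an exception: it exists only as a function
print('X has a median method:', hasattr(X, 'median'))
print('np.median(X) =', np.median(X))
```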

Example 4. Statistical functions

X = np.array([[1,2], [3,4]])
print('X = \n', X)
print('Average of all elements in X:', X.mean())
print('Average of all elements in the columns of X:', X.mean(axis=0))
print('Average of all elements in the rows of X:', X.mean(axis=1))
print('Sum of all elements in X:', X.sum())
print('Sum of all elements in the columns of X:', X.sum(axis=0))
print('Sum of all elements in the rows of X:', X.sum(axis=1))
print('Standard Deviation of all elements in X:', X.std())
print('Standard Deviation of all elements in the columns of X:', X.std(axis=0))
print('Standard Deviation of all elements in the rows of X:', X.std(axis=1))
print('Median of all elements in X:', np.median(X))
print('Median of all elements in the columns of X:', np.median(X,axis=0))
print('Median of all elements in the rows of X:', np.median(X,axis=1))
print('Maximum value of all elements in X:', X.max())
print('Maximum value of all elements in the columns of X:', X.max(axis=0))
print('Maximum value of all elements in the rows of X:', X.max(axis=1))
print('Minimum value of all elements in X:', X.min())
print('Minimum value of all elements in the columns of X:', X.min(axis=0))
print('Minimum value of all elements in the rows of X:', X.min(axis=1))

X =
[[1 2]
 [3 4]]

Average of all elements in X: 2.5
Average of all elements in the columns of X: [ 2. 3.]
Average of all elements in the rows of X: [ 1.5 3.5]

Sum of all elements in X: 10
Sum of all elements in the columns of X: [4 6]
Sum of all elements in the rows of X: [3 7]

Standard Deviation of all elements in X: 1.11803398875
Standard Deviation of all elements in the columns of X: [ 1. 1.]
Standard Deviation of all elements in the rows of X: [ 0.5 0.5]

Median of all elements in X: 2.5
Median of all elements in the columns of X: [ 2. 3.]
Median of all elements in the rows of X: [ 1.5 3.5]

Maximum value of all elements in X: 4
Maximum value of all elements in the columns of X: [3 4]
Maximum value of all elements in the rows of X: [2 4]

Minimum value of all elements in X: 1
Minimum value of all elements in the columns of X: [1 2]
Minimum value of all elements in the rows of X: [1 3]
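
One detail worth checking in the output above is the standard deviation of 1.118. By default, the std method divides by the number of elements N (the population standard deviation). The sketch below reproduces that value by hand and, for comparison, shows the ddof=1 option, which divides by N - 1 instead. The name `manual_std` is introduced here only for illustration.

```python
import numpy as np

X = np.array([[1, 2], [3, 4]])

# Population standard deviation computed by hand (divide by N)
manual_std = np.sqrt(np.mean((X - X.mean()) ** 2))

print('X.std()       =', X.std())
print('manual (N)    =', manual_std)

# Sample standard deviation (divide by N - 1)
print('X.std(ddof=1) =', X.std(ddof=1))
```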

Example 5. Change value of all elements of an array

X = np.array([[1,2], [3,4]])
print('X = \n', X)
print()
print('3 * X = \n', 3 * X)
print()
print('3 + X = \n', 3 + X)
print()
print('X - 3 = \n', X - 3)
print()
print('X / 3 = \n', X / 3)

X =
[[1 2]
 [3 4]]

3 * X =
[[ 3 6]
 [ 9 12]]

3 + X =
[[4 5]
 [6 7]]

X - 3 =
[[-2 -1]
 [ 0 1]]

X / 3 =
[[ 0.33333333 0.66666667]
 [ 1. 1.33333333]]

In the examples above, NumPy is working behind the scenes to broadcast 3 along the ndarray so that they have the same shape. This allows us to add 3 to each element of X with just one line of code.
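
As a rough picture of what broadcasting the number 3 means, the sketch below uses np.broadcast_to to stretch the scalar to the shape of X explicitly and checks that adding the stretched ndarray gives the same result as adding the scalar. This is only a conceptual illustration: NumPy does not need to build the stretched ndarray to compute 3 + X. The name `threes` is made up for this example.

```python
import numpy as np

X = np.array([[1, 2], [3, 4]])

# Conceptually, the scalar 3 is stretched to the shape of X
threes = np.broadcast_to(3, X.shape)
print('3 broadcast to the shape of X = \n', threes)

# Adding the scalar matches adding the stretched ndarray
print('3 + X = \n', 3 + X)
print('Same result:', np.array_equal(3 + X, threes + X))
```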

Subject to certain constraints, NumPy can do the same for two ndarrays of different shapes, as we can see below.

Example 6. Arithmetic operations on 2-D arrays (Compatible shape)

x = np.array([1,2,3])
Y = np.array([[1,2,3],[4,5,6],[7,8,9]])
Z = np.array([1,2,3]).reshape(3,1)
print('x + Y = \n', x + Y)
print()
print('Z + Y = \n',Z + Y)

x = [1 2 3]

Y =
[[1 2 3]
 [4 5 6]
 [7 8 9]]

Z =
[[1]
 [2]
 [3]]

x + Y =
[[ 2 4 6]
 [ 5 7 9]
 [ 8 10 12]]

Z + Y =
[[ 2 3 4]
 [ 6 7 8]
 [10 11 12]]

As before, NumPy is able to add the 1 x 3 and 3 x 1 ndarrays to the 3 x 3 ndarray by broadcasting the smaller ndarray along the larger one so that the two have compatible shapes. In general, NumPy can do this provided that the smaller ndarray, such as the 1 x 3 ndarray in our example, can be expanded to the shape of the larger ndarray in such a way that the resulting broadcast is unambiguous. Concretely, the shapes are compared dimension by dimension starting from the last one, and each pair of dimensions must either be equal or contain a 1.
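
To see where the compatibility constraint bites, the sketch below repeats the two additions from Example 6 and then tries an ndarray whose shape cannot be expanded to 3 x 3. The ndarray `w` and the flag `compatible` are hypothetical names added for this illustration; the failure is caught with the try statement covered earlier.

```python
import numpy as np

Y = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])  # shape (3, 3)
x = np.array([1, 2, 3])                           # shape (3,)
Z = x.reshape(3, 1)                               # shape (3, 1)
w = np.array([1, 2])                              # shape (2,), hypothetical incompatible ndarray

# Both of these broadcast to the shape of Y
print('Shape of x + Y:', (x + Y).shape)
print('Shape of Z + Y:', (Z + Y).shape)

# Two elements cannot be stretched to match three columns
try:
    w + Y
    compatible = True
except ValueError as err:
    compatible = False
    print('Cannot broadcast w with Y:', err)
```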

Summary

An accountant whose soul is woven from science and art; loves theater and photography, and also enjoys data science.

Joe Chao
