
NumPy for Data Analysis: the Ultimate Guide

NumPy

This article is still W.I.P.

NumPy is a foundational package for scientific computing, so a thorough understanding of this powerful Python library is important.

It provides foundational tools for mathematical, scientific, engineering, and data science programming within the Python ecosystem.

NumPy proves valuable because:

  • It is a linear algebra library.
  • It is powerful and remarkably fast.
  • It integrates C/C++ and Fortran code.

In this article, we shall delve into the fundamental principles of NumPy, which we will use frequently throughout this page.

Let's commence with NumPy arrays, together with some significant built-in methods and attributes associated with these arrays.

The primary object in NumPy is a homogeneous multidimensional array. It acts as a foundational component for most of the PyData ecosystem libraries. After installing NumPy, we must import it. Let's import NumPy and verify its version in the code provided below:

# importing NumPy
import numpy as np

# check numpy's version
print( np.__version__ )

Output:

1.14.0

NumPy arrays

In this article, we will extensively look into NumPy arrays. These arrays primarily come in two forms:

  1. Vectors: Vectors are strictly one-dimensional arrays.
  2. Matrices: Matrices are two-dimensional arrays.

Creating NumPy arrays from Python data types, such as lists and tuples, is straightforward. Let's begin by creating a Python list and subsequently using it to create a NumPy array.

NumPy arrays from Python list

# Creating a Python list "n_list".
n_list = [-1,0,1]

# Let's confirm items in "n_list" and its type! Shall we?
print(n_list, type(n_list))

Output:

[-1, 0, 1] <class 'list'>

To create a NumPy array from a Python data structure, such as a list, we utilise the array function provided by NumPy.

This array function is accessible using the dot operator (.) on np, an alias for the NumPy library. To achieve this, we type np.array. To create the NumPy array, we pass our Python data structure, n_list, as an argument to the array function.

# NumPy array from a Python list

import numpy as np

# Creating a Python list "n_list".
n_list = [-1,0,1]

# Creating a NumPy array from n_list
n_array = np.array(n_list)

print( n_array, type(n_array) )

Output:

[-1  0  1] <class 'numpy.ndarray'>

In the code above, n_array is our NumPy array. However, if we wish to create a two-dimensional array (or matrix), we can create a list of lists and then pass that nested list to the array function.

# NumPy two-dimensional array example

import numpy as np

# Create a list of lists to use as the source of the 2-D array
n_matrix = [[1,2,3],[4,5,6],[7,8,9]]
print(n_matrix)

# Two-dimensional array from the nested list
matrix_2d = np.array(n_matrix)
print(matrix_2d)

Output:

[[1, 2, 3], [4, 5, 6], [7, 8, 9]]

[[1 2 3]
 [4 5 6]
 [7 8 9]]
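To confirm whether an array is a vector or a matrix, we can inspect its ndim and shape attributes. The short sketch below uses illustrative variable names (vector, matrix) that are not part of the examples above:

```python
import numpy as np

vector = np.array([-1, 0, 1])                          # 1-D array (vector)
matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])   # 2-D array (matrix)

# ndim is the number of dimensions, shape is the size along each dimension
print(vector.ndim, vector.shape)   # 1 (3,)
print(matrix.ndim, matrix.shape)   # 2 (3, 3)
```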

Initializing NumPy arrays from a Python tuple

The following code illustrates how we can initialise a NumPy array from a Python tuple:

# NumPy array from tuple

import numpy as np

# Here we are using Tuple instead of a list.
n_tuple = (-1,0,1)
n_array = np.array(n_tuple)
print( n_array, type(n_array) )

Output:

[-1  0  1] <class 'numpy.ndarray'>

Creating arrays using NumPy’s built-in methods

It is standard to utilise NumPy's built-in methods to create arrays, as they are more straightforward and faster. Let's explore how effortless it is!

The arange() method

The arange() method strongly resembles the Python function range(). It yields evenly spaced values within a specified interval.

The syntax of the arange() method is as follows:

arange([start,] stop[, step,], dtype=None)

Let’s see it in action in the code below:

# arange() method from NumPy

import numpy as np

# similar to range() in Python, up to but not including 10
print(np.arange(0,10))

# We can give a step (2 in this case)
print(np.arange(0,11,2))

# We can give the step and dtype
print(np.arange(0,10,2, dtype=float))

Output:

[0 1 2 3 4 5 6 7 8 9]
[ 0  2  4  6  8 10]
[0. 2. 4. 6. 8.]

The linspace() method

The linspace() method returns evenly spaced numbers over a specified interval.

# linspace from NumPy

import numpy as np

# start at 1 and end at 15, with 15 evenly spaced points (both ends included).
print( np.linspace(1, 15, 15))

# Find the step size with "retstep" which returns the array and the step size
n_linspace = np.linspace(5, 15, 9, retstep=True)
print( n_linspace )

# We can grab array and step size separately.
print('array is: ',n_linspace[0])
print('stepsize is: ', n_linspace[1])

Output:

[ 1.  2.  3.  4.  5.  6.  7.  8.  9. 10. 11. 12. 13. 14. 15.]

(array([ 5.  ,  6.25,  7.5 ,  8.75, 10.  , 11.25, 12.5 , 13.75, 15.  ]), 1.25)

array is:  [ 5.    6.25  7.5   8.75 10.   11.25 12.5  13.75 15.  ]
stepsize is:  1.25

Let's create another 1-dimensional (1-D) array using linspace() with 30 evenly spaced numbers between 0 and 15.

# 1-D array using linspace

import numpy as np

# 1-D array
print(np.linspace(0,15,30))

Output:

[ 0.  0.51724138  1.03448276  1.55172414  2.06896552  2.5862069
  3.10344828  3.62068966  4.13793103  4.65517241  5.17241379  5.68965517
  6.20689655  6.72413793  7.24137931  7.75862069  8.27586207  8.79310345
  9.31034483  9.82758621 10.34482759 10.86206897 11.37931034 11.89655172
 12.4137931  12.93103448 13.44827586 13.96551724 14.48275862 15.        ]

Did you notice the differences between the arange() and linspace() methods?

If you paid attention to the code above, you will have seen the following distinctions between the arange() and linspace() approaches:

  • The arange() method accepts the third argument as the step size.

  • The linspace() method accepts the third argument as the number of points we desire.

  • In the arange() method, the second argument is not included in the array, whereas the second argument is included in the case of linspace().
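These differences can be seen side by side in a small sketch. The variable names are illustrative, and the last line uses linspace()'s optional endpoint parameter, which this article does not otherwise cover, to drop the final value:

```python
import numpy as np

# arange: third argument is the step, the stop value (10) is excluded
by_step = np.arange(0, 10, 2)
print(by_step)        # [0 2 4 6 8]

# linspace: third argument is the number of points, the stop value (10) is included
by_count = np.linspace(0, 10, 6)
print(by_count)       # [ 0.  2.  4.  6.  8. 10.]

# endpoint=False makes linspace exclude the stop value as well
no_endpoint = np.linspace(0, 10, 5, endpoint=False)
print(no_endpoint)    # [0. 2. 4. 6. 8.]
```

Note that arange() returns integers here because all of its arguments are integers, while linspace() returns floats by default.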

The zeros() method

We can use the zeros() method to create an array with all zeros.

# zeros from NumPy

import numpy as np

# 1-D array with 3 elements (zeros)
print(np.zeros(3))

# Creating 2-D array, (no_row, no_col) passing a tuple
print(np.zeros((4,6)))

Output:

[0. 0. 0.]
[[0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]]

The ones() method

We can create an array with all ones using the ones() method.

# ones from NumPy

import numpy as np

# 1-D array with 3 elements (ones)
print( np.ones(3) )

# Creating 2-D array, (no_row, no_col) passing a tuple
print( np.ones((4,6)) )

Output:

[1. 1. 1.]
[[1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1.]]
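The outputs above show floating-point values (0. and 1.) because both functions default to a float data type. As a minimal sketch, the optional dtype argument can request integers instead; the variable names are illustrative:

```python
import numpy as np

default_zeros = np.zeros(3)            # default dtype is float64
int_zeros = np.zeros(3, dtype=int)     # request integers explicitly
int_ones = np.ones((2, 3), dtype=int)  # the same argument works for ones()

print(default_zeros.dtype)   # float64
print(int_zeros)             # [0 0 0]
print(int_ones)
```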

The eye() method

This built-in function generates an identity matrix (a square matrix), which proves useful in many linear algebra problems. Called with a single argument, the eye() method returns a 2-D array with ones on the main diagonal and zeros everywhere else.

# eye from NumPy

import numpy as np

print(np.eye(5))

Output:

[[1. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0.]
 [0. 0. 1. 0. 0.]
 [0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 1.]]
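One reason the identity matrix is useful: multiplying any compatible matrix by it leaves that matrix unchanged. A small sketch, assuming a 3x3 example matrix chosen purely for illustration:

```python
import numpy as np

identity = np.eye(3)
m = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# The @ operator performs matrix multiplication
product = m @ identity

# The values are unchanged (the result is float because eye() returns floats)
print(np.array_equal(product, m))   # True
```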

We can also generate arrays of random numbers using the built-in functions in NumPy's random module (np.random).

The rand() method

This function generates an array of the specified shape and fills it with random samples from a uniform distribution over [0, 1), that is, 0 is included and 1 is excluded.

# rand method from NumPy

import numpy as np

# 1-D array with three elements
print (np.random.rand(3))

# row, col, Note: we are not passing a tuple here
# each dimension (num_of_rows, num_of_columns) is a separate argument
print(np.random.rand(3,2))

Output:

[0.48957475 0.07449757 0.81513258]

[[0.75718432 0.06644407]
 [0.69464871 0.46689534]
 [0.05741213 0.14085858]]
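Because the values are random, your output will differ from the one shown above on every run. The sketch below fixes the seed with np.random.seed() so a run is repeatable (the seed value 42 is an arbitrary choice) and checks that every sample falls in the expected range:

```python
import numpy as np

np.random.seed(42)                 # arbitrary seed, only for repeatability
samples = np.random.rand(1000)     # 1000 samples from the uniform distribution

# Every value is at least 0 and strictly less than 1
print(samples.min() >= 0.0)   # True
print(samples.max() < 1.0)    # True
```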

The randn() method

This function provides a sample (or samples) drawn from the standard normal or Gaussian distribution. It differs from the rand method, which generates values from a uniform distribution.

# randn from NumPy

import numpy as np

# 1-D array of two elements.
print( np.random.randn(2) )

# 2-D array (4x4), 16 elements.
# no tuple, each dimension as a separate argument
print( np.random.randn(4,4) )

Output:

[-1.66526215 -0.0911361 ]
[[ 0.16846152  1.24966075 -0.15827118  0.17069098]
 [ 1.27932729 -1.25886287  1.00681978  0.34769069]
 [ 0.55019237 -0.3963704  -0.34419026 -1.87290798]
 [ 1.42104882  1.77977939  0.00658841  0.21983459]]
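To see the difference from rand() in practice, we can draw a large sample and look at its mean and standard deviation, which should be close to 0 and 1 for the standard normal distribution. This is a rough sketch: the sample size and seed are arbitrary, and the printed values will only approximate 0 and 1.

```python
import numpy as np

np.random.seed(0)                      # arbitrary seed, only for repeatability
samples = np.random.randn(100000)      # large sample from the standard normal

print('mean:', samples.mean())         # close to 0
print('std: ', samples.std())          # close to 1
print('has negatives:', (samples < 0).any())   # unlike rand(), values can be negative
```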

The randint() method

When we need integer values only, we can use randint() to produce random integers between a specified lower bound (inclusive) and upper bound (exclusive).

Let's give it a try.

# randint from NumPy

import numpy as np

# returns one random int, 1 inclusive, 100 exclusive
print( np.random.randint(1,100) )

# returns ten random ints from the same range
print( np.random.randint(1,100,10) )

Output:

25
[77 26 12 89 97 59 70 74 40 35]
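The third argument of randint() is the output size, and it can also be a tuple to produce a multidimensional array. The following sketch (with an arbitrary seed and illustrative names) builds a 2-D array and verifies the inclusive lower bound and exclusive upper bound:

```python
import numpy as np

np.random.seed(1)                                # arbitrary seed
grid = np.random.randint(1, 100, size=(3, 4))    # 3 rows, 4 columns

print(grid.shape)          # (3, 4)
print(grid.min() >= 1)     # True: lower bound is inclusive
print(grid.max() < 100)    # True: upper bound is exclusive
```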

NumPy array methods and attributes

Let's dive deep into important methods and attributes commonly used in NumPy arrays.

Array methods

The following are some methods and attributes that we can employ with NumPy arrays:

  1. shape: This attribute (accessed without parentheses) holds the dimensions of the array as a tuple, indicating the size along each axis.

  2. reshape: The reshape() method allows us to modify the shape of the array, such as converting a one-dimensional array into a two-dimensional one.

  3. size: The size attribute (accessed without parentheses) holds the total number of elements in the array.

  4. max: This method returns the maximum value in the array.

  5. min: The min() method returns the minimum value in the array.

  6. argmax: This function returns the index of the maximum value in the array.

  7. argmin: The argmin() function returns the index of the minimum value in the array.

  8. sum: The sum() method calculates the sum of all elements in the array.

  9. mean: This function calculates the arithmetic mean of the array's elements.

  10. std: The std() method computes the standard deviation of the array's elements.

  11. transpose: The transpose() method allows us to transpose the array, swapping rows and columns.

  12. dot: This method computes the dot product of two arrays.

  13. concatenate: The np.concatenate() function (called on the NumPy module rather than on an array) combines two or more arrays along a specified axis.

These are just a few of the many methods available in NumPy that make it a powerful tool for scientific computing and data manipulation in Python.
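As a quick tour, the sketch below applies most of the items in the list to one small array. The array arr and its values are made up for illustration; note that shape and size are attributes, so they take no parentheses.

```python
import numpy as np

arr = np.array([[3, 7, 1], [9, 4, 6]])

print(arr.shape)                    # (2, 3)
print(arr.size)                     # 6
print(arr.max(), arr.min())         # 9 1
# Without an axis argument, argmax/argmin index into the flattened array
print(arr.argmax(), arr.argmin())   # 3 2
print(arr.sum())                    # 30
print(arr.mean())                   # 5.0
print(arr.std())                    # square root of 7, about 2.65
print(arr.transpose().shape)        # (3, 2)

# dot product of the (2, 3) array with its (3, 2) transpose gives a (2, 2) result
gram = arr.dot(arr.transpose())
print(gram)

# stack two copies on top of each other (axis=0 means along the rows)
stacked = np.concatenate([arr, arr], axis=0)
print(stacked.shape)                # (4, 3)
```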

We will discuss the following 5 important methods below:

reshape()
max()
min()
argmax()
argmin()

Let's create two arrays using arange() and randint() methods for hands-on training.

# NumPy arrays

import numpy as np
array_arange = np.arange(16) # using arange()
array_randint = np.random.randint(0,100,10) # using randint()

print('array using arange:', array_arange)
print('array using randint:', array_randint)

Output:

array using arange: [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15]
array using randint: [90 95 94 78 12  1 41 54 76 58]

The reshape() method

This method returns an array containing the same data as the original but arranged in a new shape. The original array is not modified.

# Reshape of array from NumPy

# the new shape must hold exactly 16 elements (4 * 4 = 16), because
# array_arange has 16 elements; a mismatched shape raises an error
print(array_arange.reshape(4,4))

Output:

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]]
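Any shape whose dimensions multiply to 16 works for this array, while a mismatched shape raises a ValueError. A minimal sketch that recreates array_arange so it runs on its own (the reshape_failed flag is only there for illustration):

```python
import numpy as np

array_arange = np.arange(16)

# Valid: 2 * 8 = 16 and 8 * 2 = 16
print(array_arange.reshape(2, 8).shape)   # (2, 8)
print(array_arange.reshape(8, 2).shape)   # (8, 2)

# Invalid: 3 * 5 = 15, which does not match the 16 elements
reshape_failed = False
try:
    array_arange.reshape(3, 5)
except ValueError as err:
    reshape_failed = True
    print('reshape failed:', err)

# reshape() returned new arrays; the original is still 1-D
print(array_arange.shape)                 # (16,)
```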
