MANUEL RADOVANOVIĆ : NumPy For Machine Learning, An Introduction to the Python Library for Manipulating Arrays and Matrices

NumPy is a Python library that provides efficient operations on multi-dimensional arrays and matrices. It is a cornerstone of scientific computing in Python and is widely used in data analysis, machine learning, signal processing, visualization, and many other fields. Its popularity has surged alongside the rapid advances in AI. The library began as an extension to Python, first developed by software engineer Jim Hugunin, who later left Microsoft to join Google. However, the NumPy we know today is largely the work of Travis Oliphant, often considered its primary creator, who is also a founder of Anaconda and of the SciPy package.

This open-source library became significant primarily because it addresses the slowness of interpreted Python code, which is most noticeable in loops over individual numbers. NumPy provides multi-dimensional arrays together with functions and operators that work efficiently on them. When using NumPy, you write code with fewer inner loops, so any algorithm that can be expressed as operations on arrays and matrices can run nearly as fast as equivalent C code.
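As a rough illustration of writing code with fewer inner loops, the sketch below computes a sum of squares twice: once with an explicit Python loop and once as a single array expression. The variable names are made up for this example, and the actual speed difference depends on the array size and the machine.

```python
import numpy as np

values = np.arange(1_000, dtype=np.int64)  # 0, 1, ..., 999

# explicit Python loop: one interpreted iteration per element
loop_total = 0
for v in values:
    loop_total += int(v) * int(v)

# vectorized form: the loop runs inside NumPy's compiled code
vector_total = int(np.sum(values * values))

print(loop_total == vector_total)
```

Both versions give the same number; the second one simply moves the loop out of the interpreter.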

NumPy is often compared to MATLAB: both are interpreted, and both let users write fast computations as long as most operations work on arrays and matrices rather than on scalar values. Unlike MATLAB, which originated in the late 1970s, NumPy is integrated into Python, a modern, complete, general-purpose programming language. Both rely on BLAS and LAPACK for efficient linear algebra computations.

The moment a programmer discovers the true power of the NumPy library

The central element of NumPy is the ndarray, an n-dimensional homogeneous array whose elements all share the same data type. NumPy provides efficient mathematical, logical, statistical, and linear algebra operations on these arrays. It also offers a large number of built-in functions for working with arrays and can easily read and write data in various formats. This might sound complex in theory, but the library is straightforward to use in practice. Python was not originally designed for numerical computing, but it attracted the attention of the scientific and engineering community early on.

As a result, a special interest group called Matrix-SIG was founded in 1995 with the goal of defining a set of computational packages for numerical computing. Thanks to NumPy, Python has become a powerful language for numerical computing, data analysis, machine learning, and other areas of scientific research. NumPy does have limitations: it is designed for homogeneous data, arrays have a fixed size once created, many array operations need additional memory, there is no out-of-the-box parallelization, and support for non-numerical operations is limited. Despite these limitations, NumPy still provides exceptional value and efficiency in numerical computing, and many of them can be overcome by using other libraries or by adapting the code to specific needs.
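Two of these limitations, homogeneous data and fixed array size, can be seen in a short sketch. The variable names are illustrative only.

```python
import numpy as np

# mixing integers and a float: every element ends up stored as a float
mixed = np.array([1, 2, 3.5])
print(mixed.dtype)

# np.append does not grow an array in place; it builds and returns a new one
base = np.array([1, 2, 3])
longer = np.append(base, 4)
print(base.size, longer.size)
```

Because every append creates a new array, it is usually better to collect values in a Python list first and convert once at the end.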

NumPy in Action: A Practical Guide for Beginners

Start Jupyter Notebook from Anaconda Navigator. Create a new notebook named lesson2.ipynb in the AI_tutorial folder. Change the first cell to Markdown and add this title:

The Simple Basic Python NumPy Tutorial

To use the NumPy library, we first need to import it. Let's give it a shorter name, np, to make it easier to type.

# import numpy and make the alias
import numpy as np

Type a plain NumPy array.

np.array([1,2,3])

Then press CTRL + Enter. You should get the following result:

array([1, 2, 3])

That means everything is working. If you get an error instead, read it carefully; NumPy may not be installed. In that case, you can install NumPy either from the terminal or directly from a Jupyter cell.

! pip install numpy

If everything is fine, you can start the tutorial. First, create a simple array and display it through a variable.

# simple array
a = np.array([[1,2,3],[4,5,6]])
print(a)

The Result:
[[1 2 3]
 [4 5 6]]

Uses of NumPy - Python library for numerical and scientific computing

While an array can be one-dimensional or multidimensional, a matrix is strictly two-dimensional, composed of rows and columns. Create a simple matrix.

# simple matrix (np.mat was removed in NumPy 2.0; np.asmatrix works in old and new versions)
b = np.asmatrix([[1,2,3],[4,5,6],[7,8,9]])
print(b)

The Result:
[[1 2 3]
 [4 5 6]
 [7 8 9]]
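One practical difference between a matrix object and a regular array is how the * operator behaves. The sketch below uses small made-up values to show it. Note that the NumPy documentation recommends regular arrays together with the @ operator over the matrix class for new code.

```python
import numpy as np

arr = np.array([[1, 2], [3, 4]])
mat = np.asmatrix(arr)

elementwise = arr * arr    # arrays: * multiplies element by element
array_product = arr @ arr  # arrays: @ is the matrix product
matrix_product = mat * mat # matrix objects: * is already the matrix product

print(elementwise)
print(array_product)
print(matrix_product)
```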

How can you see how many dimensions a NumPy array has? Type the following.

# get dimensions
a.ndim

The Result:
2

Sometimes you won't know the shape of a NumPy array. There is an attribute for that as well.

# get shape
a.shape

The Result:
(2, 3)

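By default, integers in a NumPy array get the platform's default integer type. In the run shown here that is int32 (typical for Windows with NumPy versions before 2.0); on Linux, macOS, and newer NumPy versions you will usually see int64 instead.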

# get type
a.dtype

The Result:
dtype('int32')

However, you can change the data type of the NumPy array as you see fit.

# specify int16 instead of the default integer type
c = np.array([[1,2,3],[4,5,6]], dtype='int16')
c.dtype

The Result:
dtype('int16')
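The data type can also be changed after an array has been created. A minimal sketch using the astype method, with illustrative variable names:

```python
import numpy as np

ints = np.array([[1, 2, 3], [4, 5, 6]])

# astype returns a new array with the requested type; ints itself is unchanged
floats = ints.astype('float64')

print(floats.dtype)
print(floats)
```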

Now let's look at the size in bytes of a single element of a NumPy array.

# get length of one array element in bytes
a.itemsize

The Result:
4

If you look at the array a that we created, you will see that the array has 6 elements. How can you check it?

# get number of elements
a.size

The Result:
6

How can you determine the total number of bytes occupied by all elements in a NumPy array? Check out the following attribute.

# get total bytes consumed by the elements of the array
a.nbytes

The Result:
24
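The three values are related: the total number of bytes is the number of elements multiplied by the size of one element. The sketch below fixes the data type to int32 explicitly so that the numbers do not depend on the platform.

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6]], dtype='int32')

print(x.size, x.itemsize, x.nbytes)       # 6 elements, 4 bytes each, 24 in total
print(x.nbytes == x.size * x.itemsize)
```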

Let's go back to basics. How can you access the value of a specific element in a NumPy array? Try something like this.

# get a specific element
a[1,1]

The Result:
5

Can we reach the same element with a negative index? Yes: select the second row and count two elements from the right.

# get a specific element using negative number
a[1,-2]

The Result:
5

What if you only want to see a specific row in a NumPy array? Try doing it this way.

# get a specific row
a[0, :]

The Result:
array([1, 2, 3])

Or maybe you need a specific column more?

# get a specific column
a[:, 1]

The Result:
array([2, 5])

How can you view specific elements of a NumPy array by defining the starting and ending positions, as well as a specific interval? Check out this example.

# get some data from an array using startindex, endindex and stepsize
d = np.array([[1,2,3,4,5,6,7,8], [9,10,11,12,13,14,15,16]])
d[0, 1:8:2]

The Result:
array([2, 4, 6, 8])
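The same start, end, and step notation works on every axis and also with negative values. A few more variations, as a sketch on an array with the same contents (the name grid is made up for this example):

```python
import numpy as np

grid = np.array([[1, 2, 3, 4, 5, 6, 7, 8],
                 [9, 10, 11, 12, 13, 14, 15, 16]])

every_second_column = grid[:, ::2]   # every second column of both rows
last_three = grid[1, -3:]            # last three elements of the second row
reversed_row = grid[0, ::-1]         # first row in reverse order

print(every_second_column)
print(last_three)
print(reversed_row)
```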

Displaying specific elements in a NumPy array is useful, but a good question is also how to change a particular element.

# change an element in the array
d[1,4] = 23
print(d)

The Result:
[[ 1  2  3  4  5  6  7  8]
 [ 9 10 11 12 23 14 15 16]]

How do we create a NumPy array filled with zeros? See the following example.

# get all 0s
np.zeros((2,8))

The Result:
array([[0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0.]])

This is getting interesting. How can we do the same with ones, this time specifying the data type?

# get all 1s to be int64
np.ones((2,8), dtype='int64')

The Result:
array([[1, 1, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1, 1]], dtype=int64)

That's excellent, but how can we customize the value that populates the NumPy array? Let's examine an example where we fill the array with 88.

# fill with any number
np.full((2,8), 88)

The Result:
array([[88, 88, 88, 88, 88, 88, 88, 88],
       [88, 88, 88, 88, 88, 88, 88, 88]])

What if you want a new array with the same shape and data type as an existing one, but with every element set to a specific number? The following function is suited to that task.

# fill with any number, copying shape and type from an existing array
e = np.array([[1,2,3],[4,5,6],[7,8,9]])
np.full_like(e, 8)

The Result:
array([[8, 8, 8],
       [8, 8, 8],
       [8, 8, 8]])
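Note that np.full_like returns a new array and leaves the original untouched. If the goal is to overwrite the existing array itself, slice assignment or the fill method can be used instead. A small sketch with an illustrative array name:

```python
import numpy as np

e2 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

filled = np.full_like(e2, 8)   # new array; e2 is untouched
first_before = int(e2[0, 0])   # still the original value

e2[:] = 8                      # overwrite every element of e2 itself
after_slice = e2.copy()

e2.fill(0)                     # same idea with the ndarray.fill method
print(after_slice)
print(e2)
```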

Random number generation is a fundamental technique in game development and artificial intelligence. For instance, AI applications often rely on randomness to simulate spontaneous behavior in robots. Let's begin by populating a NumPy array with random floating-point values drawn uniformly from the interval [0, 1).

# random decimal numbers
np.random.rand(4,3)

The Result:
array([[0.04871196, 0.35117614, 0.20647421],
       [0.62571628, 0.96698987, 0.18247642],
       [0.01029132, 0.56817888, 0.08467444],
       [0.01443212, 0.78087052, 0.26281676]])

How can you generate random integers in the range of 0 to 8 and store them in a NumPy array? Here is one way to do it.

# random integer numbers
np.random.randint(9, size=(4,3))

The Result:
array([[7, 3, 5],
       [4, 5, 4],
       [1, 5, 1],
       [2, 4, 1]])
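Random results differ on every run, which makes examples hard to reproduce. One option, sketched below, is NumPy's newer Generator interface created with np.random.default_rng and a fixed seed. The seed value 42 is arbitrary and chosen only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=42)   # arbitrary seed for repeatable results

floats = rng.random((4, 3))               # floats in [0, 1)
ints = rng.integers(0, 9, size=(4, 3))    # integers from 0 to 8

# a second generator with the same seed produces the same numbers
rng_again = np.random.default_rng(seed=42)
floats_again = rng_again.random((4, 3))

print(floats.shape, ints.min() >= 0, ints.max() <= 8)
print(np.array_equal(floats, floats_again))
```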

Python NumPy multidimensional arrays

Let's try a more challenging task. How can you construct a square NumPy array filled with zeros and ones, where the diagonal elements are ones? Surprisingly, this can be achieved in a single line of code.

# the identity array is a square array with ones on the main diagonal
np.identity(8)

The Result:
array([[1., 0., 0., 0., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0., 0., 0., 0.],
       [0., 0., 1., 0., 0., 0., 0., 0.],
       [0., 0., 0., 1., 0., 0., 0., 0.],
       [0., 0., 0., 0., 1., 0., 0., 0.],
       [0., 0., 0., 0., 0., 1., 0., 0.],
       [0., 0., 0., 0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 0., 0., 0., 1.]])
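A closely related function is np.eye, which can also shift the diagonal or build a non-square result. The sketch below uses small sizes and an integer data type to keep the output compact.

```python
import numpy as np

main_diag = np.eye(3, dtype=int)         # same pattern as np.identity(3)
upper_diag = np.eye(3, k=1, dtype=int)   # ones on the diagonal above the main one
rectangular = np.eye(2, 4, dtype=int)    # 2 rows, 4 columns

print(main_diag)
print(upper_diag)
print(rectangular)
```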

Suppose you have an existing NumPy array. How would you create a new array that contains multiple copies of the original array? This task can be accomplished with a straightforward approach.

# repeat an array
f = np.array([[1,2,3,4]])
g = np.repeat(f,4, axis=0)
print(g)

The Result:
[[1 2 3 4]
 [1 2 3 4]
 [1 2 3 4]
 [1 2 3 4]]
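Along the other axis, np.repeat duplicates each element in turn, while np.tile repeats the array as a whole. A short sketch of the difference, using an illustrative array name:

```python
import numpy as np

row = np.array([[1, 2, 3, 4]])

repeated = np.repeat(row, 2, axis=1)   # each element is doubled in place
tiled = np.tile(row, (1, 2))           # the whole row is laid out twice

print(repeated)
print(tiled)
```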

When dealing with NumPy arrays, it's important to understand the difference between copying an array by reference and copying it by value. Let's explore this concept further.

# copying arrays
h = np.array([1,2,3,4,5])
i = h          # i is just another name for the same array
j = h.copy()   # j is an independent copy
print(i)
i[3] = 10 # changing an element in i changes it in h, too!
print(h)
print(j)
j[4] = 11 # this does not change h, because j was made with copy()
print(h)
print(j)

The Result:
[1 2 3 4 5]
[ 1  2  3 10  5]
[1 2 3 4 5]
[ 1  2  3 10  5]
[ 1  2  3  4 11]
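The same distinction applies to slices: a basic slice is a view on the original data, not a copy. The sketch below uses np.shares_memory to check this; the variable names are illustrative.

```python
import numpy as np

h2 = np.array([1, 2, 3, 4, 5])

view = h2[1:4]                 # a slice shares memory with h2
independent = h2[1:4].copy()   # an explicit copy does not

print(np.shares_memory(h2, view))
print(np.shares_memory(h2, independent))

view[0] = 99                   # writes through to h2
independent[1] = -1            # does not affect h2
print(h2)
```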

NumPy arrays offer a significant advantage over Python lists when it comes to arithmetic operations. You can perform these operations element-wise in two ways: with the standard operators or with the equivalent NumPy functions.

# the arithmetic operations
k = np.array([5,6,7,8])
l = np.array([1,2,3,4])

m1 = k + l
print(m1)
m2 = np.add(k,l)
print(m2)

m1 = k - l
print(m1)
m2 = np.subtract(k,l)
print(m2)

m1 = k * l
print(m1)
m2 = np.multiply(k,l)
print(m2)

m1 = k / l
print(m1)
m2 = np.divide(k,l)
print(m2)

m1 = k ** l
print(m1)
m2 = np.power(k,l)
print(m2)

m1 = k % l
print(m1)
m2 = np.mod(k,l)
print(m2)

The Result:
[ 6  8 10 12]
[ 6  8 10 12]
[4 4 4 4]
[4 4 4 4]
[ 5 12 21 32]
[ 5 12 21 32]
[5.         3.         2.33333333 2.        ]
[5.         3.         2.33333333 2.        ]
[   5   36  343 4096]
[   5   36  343 4096]
[0 0 1 0]
[0 0 1 0]
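The operands do not have to be the same shape. NumPy can stretch a scalar or a smaller array to match the larger one, a mechanism known as broadcasting. A minimal sketch with made-up values:

```python
import numpy as np

k2 = np.array([5, 6, 7, 8])

doubled = k2 * 2        # the scalar is applied to every element
shifted = k2 + 0.5      # the result becomes a float array

table = np.array([[1, 2, 3, 4], [10, 20, 30, 40]])
row_sum = table + k2    # k2 is added to each row of table

print(doubled)
print(shifted)
print(row_sum)
```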

NumPy is a powerful tool for working with trigonometric functions. To discover its full capabilities, check out the official documentation. Let's start with three simple examples demonstrating the use of sine, cosine, and tangent. Note that these functions expect angles in radians.

# Trigonometric functions
n = np.array([1,2,3])

o1 = np.sin(n)
print(o1)

o2 = np.cos(n)
print(o2)

o3 = np.tan(n)
print(o3)

The Result:
[0.84147098 0.90929743 0.14112001]
[ 0.54030231 -0.41614684 -0.9899925]
[ 1.55740772 -2.18503986 -0.14254654]
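The inputs 1, 2, and 3 above are interpreted as radians. If the angles are given in degrees, they can be converted first, for example with np.deg2rad. A small sketch with illustrative angle values:

```python
import numpy as np

degrees = np.array([0, 30, 90])
radians = np.deg2rad(degrees)   # convert degrees to radians

sines = np.sin(radians)
print(np.round(sines, 3))
```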

Let's go back to basics. For instance, how can we reorganize a NumPy array to have completely different dimensions, ensuring that all elements from the original array are included?

# Reorganizing arrays
p = np.array([[1,2,3,4,5],[1,2,3,4,5]])
print(p)
r = p.reshape((5,2))
print(r)

The Result:
[[1 2 3 4 5]
 [1 2 3 4 5]]
[[1 2]
 [3 4]
 [5 1]
 [2 3]
 [4 5]]
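A reshape only works when the new shape holds exactly the same number of elements. One dimension can be given as -1 so that NumPy works it out. The sketch below shows both cases with an illustrative array.

```python
import numpy as np

p2 = np.arange(1, 11)              # ten elements: 1 to 10

inferred = p2.reshape((2, -1))     # -1 is inferred from the element count
print(inferred.shape)

try:
    p2.reshape((3, 4))             # 12 slots cannot hold 10 elements
    failed = False
except ValueError as err:
    failed = True
    print("reshape failed:", err)
```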

Here is how to stack NumPy arrays vertically.

# Vertically stacking vectors
s1 = np.array([1,2,3])
s2 = np.array([4,5,6])
np.vstack([s1,s2,s2,s1])

The Result:
array([[1, 2, 3],
       [4, 5, 6],
       [4, 5, 6],
       [1, 2, 3]])

And finally, here is how to stack NumPy arrays horizontally.

# Horizontally stacking vectors
np.hstack((s1,s2))

The Result:
array([1, 2, 3, 4, 5, 6])
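Stacking also works with two-dimensional arrays, where vertical stacking adds rows and horizontal stacking adds columns. The sketch below also shows np.concatenate, which does the same along an explicitly chosen axis. The array names are made up for this example.

```python
import numpy as np

top = np.array([[1, 2], [3, 4]])
bottom = np.array([[5, 6], [7, 8]])

stacked_rows = np.vstack([top, bottom])             # shape (4, 2)
stacked_cols = np.hstack([top, bottom])             # shape (2, 4)
joined = np.concatenate([top, bottom], axis=1)      # same result as hstack here

print(stacked_rows.shape)
print(stacked_cols.shape)
print(joined)
```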

This is just the tip of the iceberg when it comes to NumPy. The library offers a wide range of algebraic operations and capabilities, and exploring them takes more time and effort. The official NumPy documentation at numpy.org is an excellent place to start.