Introduction to Array Operations in Python#

Meenal Jhajharia

Meenal Jhajharia. she/her.

CS and Math undergrad, University of Delhi
PyMC core contributor | GSoC student
Contact: meenal@mjhajharia.com | mjhajharia.com

This banner is generated from this code, a trivial customization of the original code by Colin Carroll, who designed a similar banner for PyMCon’20. Colin is amazing at visualization and even has a couple of talks about it!

Overview

Introduction
Python Objects
List Comprehension
Basics of NumPy

Why Python?#

Useful for quick prototyping
Dynamically typed, interpreted, with high-level data types
Large ecosystem of open-source scientific software

Best place to learn more: the official Python Tutorial

Let’s get started!#

All the code that is shown in this webinar can be executed from its website. Therefore you have two ways to follow along:

Click on the run code button and execute the code straight from this page

Clone the GitHub repo: pymc-devs/pymc-data-umbrella and follow along locally using Jupyter

Python Data Types#


Numbers#

Certain numeric modules ship with Python

import random
random.random()

0.962693373774243

Strings#

Sequence Operations

X = 'Data'
len(X)

X[0:-2]

'Da'

Immutability#

Immutable objects cannot be changed

X = 'Data'
X + 'Umbrella'

'DataUmbrella'

X[0] = 'P'

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/var/folders/sd/pc07b3wn65nflgx8wpkkr7wr0000gn/T/ipykernel_2451/1913827336.py in <module>
----> 1 X[0] = 'P'

TypeError: 'str' object does not support item assignment
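
Since a string cannot be modified in place, the usual workaround is to build a new string and bind it to a name. A minimal sketch (the name Y is only for illustration):

```python
X = 'Data'

# Slicing and concatenation create a brand-new string; X itself is untouched.
Y = 'P' + X[1:]

print(Y)  # Pata
print(X)  # Data
```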

Polymorphism#

Operators or functions mean different things for different objects

1+2

'Py'+'MC'

'PyMC'

Length or size means different things for different datatypes

len("Python")

len(["Python", "Java", "C"])

len({"Language": "Python", "IDE": "VSCode"})

Related: Class Polymorphism, Method Overriding and Inheritance
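
User-defined classes can take part in the same polymorphism by implementing special methods such as `__len__` and `__add__`. The sketch below uses a hypothetical `Playlist` class that exists only for this illustration:

```python
class Playlist:
    """Hypothetical container used only to illustrate operator polymorphism."""

    def __init__(self, songs):
        self.songs = list(songs)

    def __len__(self):
        # Called by len(playlist)
        return len(self.songs)

    def __add__(self, other):
        # Called by playlist + other_playlist
        return Playlist(self.songs + other.songs)


first = Playlist(['intro', 'verse'])
second = Playlist(['outro'])
combined = first + second

print(len(combined))   # 3
print(combined.songs)  # ['intro', 'verse', 'outro']
```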

Lists#

Positionally ordered collections of arbitrarily typed objects (mutable, no fixed size)

L = ['Python', 45, 1.23]
len(L)

L + [4, 5, 6]

['Python', 45, 1.23, 4, 5, 6]

L[-1]

1.23

List-specific operations

L.append('Aesara');L

['Python', 45, 1.23, 'Aesara']

L.pop(2); L

['Python', 45, 'Aesara']

More: sort(), reverse()
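
A short sketch of `sort()` and `reverse()`, using a small list of numbers chosen for illustration. Both methods change the list in place and return `None`, while the built-in `sorted()` returns a new list. Note that `sort()` needs mutually comparable items, so it would raise a `TypeError` on a list that mixes strings and numbers.

```python
nums = [3, 1, 2]

# Both methods modify the list in place and return None.
result = nums.sort()
print(result)  # None
print(nums)    # [1, 2, 3]

nums.reverse()
print(nums)    # [3, 2, 1]

# sorted() leaves the original alone and returns a new list.
print(sorted(nums))  # [1, 2, 3]
print(nums)          # [3, 2, 1]
```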

List indexing and slicing

L[99]

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
/var/folders/sd/pc07b3wn65nflgx8wpkkr7wr0000gn/T/ipykernel_2451/2456052074.py in <module>
----> 1 L[99]

IndexError: list index out of range

X = [[1,2],[2,1]]
print(len(X), len(X[0]))

2 2

X[0][0]

L[:]

['Python', 45, 'Aesara']

L[-3:]

['Python', 45, 'Aesara']

L = [1,2,3,4,5,6,7,8,9,10]
L[1::2] #L[start:end:step_size]

[2, 4, 6, 8, 10]

L[::-1]

[10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
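
One difference between indexing and slicing that is easy to miss: an out-of-range index raises `IndexError`, but an out-of-range slice is silently clipped to the list's bounds. A small sketch:

```python
L = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Slice bounds past the end are clipped instead of raising.
print(L[8:99])  # [9, 10]
print(L[99:])   # []

# A single out-of-range index still fails.
try:
    L[99]
except IndexError as err:
    print(err)  # list index out of range
```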

List Comprehension#

List = []
 
for character in 'Python':
    List.append(character)

List = [character for character in 'Python']

M = [['OS','Percentage of Users'],['Linux', '40'],['Windows', '20'], ['OSX','40']]

[row[0] for row in M][1:]

['Linux', 'Windows', 'OSX']

[row[0] + '*' for row in M][1:]

['Linux*', 'Windows*', 'OSX*']

[row[0] for row in M if row[0][0]!='O']

['Linux', 'Windows']

Nested List Comprehension

n = 3; [[ 1 if i==j else 0 for i in range(n) ] for j in range(n)]

[[1, 0, 0], [0, 1, 0], [0, 0, 1]]

[x for x in range(21) if x%2==0 if x%3==0] 

[0, 6, 12, 18]
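
Chaining two `if` clauses in a comprehension behaves like joining the conditions with `and`, or like nesting the `if` statements in an ordinary loop. The sketch below checks that the three spellings agree (the names `chained`, `combined` and `looped` are only for illustration):

```python
chained = [x for x in range(21) if x % 2 == 0 if x % 3 == 0]
combined = [x for x in range(21) if x % 2 == 0 and x % 3 == 0]

# The same filter written as an explicit loop.
looped = []
for x in range(21):
    if x % 2 == 0:
        if x % 3 == 0:
            looped.append(x)

print(chained)                        # [0, 6, 12, 18]
print(chained == combined == looped)  # True
```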

Lambda Function

[i*10 for i in range(10)]

[0, 10, 20, 30, 40, 50, 60, 70, 80, 90]

list(map(lambda i: i*10, [i for i in range(10)]))

[0, 10, 20, 30, 40, 50, 60, 70, 80, 90]

NumPy#

NumPy’s array class is called ndarray (also known by the alias array)

ndarray.ndim
ndarray.shape
ndarray.size

import numpy as np

a = np.arange(16).reshape(4, 4);a

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])

Simple array operation

2*a

array([[ 0,  2,  4,  6],
       [ 8, 10, 12, 14],
       [16, 18, 20, 22],
       [24, 26, 28, 30]])

General Properties of ndarrays#

a.shape

(4, 4)

a.ndim

a.size
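
The three attributes are related: `ndim` is the length of `shape`, and `size` is the product of the entries of `shape`. A quick check on the same 4x4 array (`math.prod` assumes Python 3.8 or newer):

```python
import math

import numpy as np

a = np.arange(16).reshape(4, 4)

print(a.ndim)   # 2
print(a.shape)  # (4, 4)
print(a.size)   # 16

print(a.ndim == len(a.shape))        # True
print(a.size == math.prod(a.shape))  # True
```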

Ways to create new arrays#

a = np.array(['PyMC', 'Arviz', 'Aesara'])

np.zeros((4, 4))

array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])

np.ones((4, 4))

array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])

Generate values in a certain range#

np.arange(1, 100, 10)

array([ 1, 11, 21, 31, 41, 51, 61, 71, 81, 91])

Random Number Generator#

rg = np.random.default_rng(1)
x = rg.random(3);x

array([0.51182162, 0.9504637 , 0.14415961])

Cumulative sum along the specified axis (in this case the array has only one axis)

x.cumsum()

array([0.51182162, 1.46228532, 1.60644493])
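
With more than one axis, the `axis` argument decides the direction in which the running total accumulates. A small sketch on a 2x3 array (the array `m` is made up for illustration):

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])

print(m.cumsum(axis=0))  # running total down each column
# [[1 2 3]
#  [5 7 9]]

print(m.cumsum(axis=1))  # running total along each row
# [[ 1  3  6]
#  [ 4  9 15]]

print(m.cumsum())        # no axis: the array is flattened first
# [ 1  3  6 10 15 21]
```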

Multi-dimensional arrays#

c = np.array([[[0,  1,  2],[ 10, 12, 13]],
[[100, 101, 102],[110, 112, 113]]])

c.shape

(2, 2, 3)

for row in c:
    print(row,'-')

[[ 0  1  2]
 [10 12 13]] -
[[100 101 102]
 [110 112 113]] -

Element-wise printing#

for element in c.flat:
    print(element)

Transpose#

c.T

array([[[  0, 100],
        [ 10, 110]],

       [[  1, 101],
        [ 12, 112]],

       [[  2, 102],
        [ 13, 113]]])
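
For an array with more than two dimensions, `.T` reverses the order of all axes, so a shape of (2, 2, 3) becomes (3, 2, 2). A sketch that checks this on the same array:

```python
import numpy as np

c = np.array([[[0, 1, 2], [10, 12, 13]],
              [[100, 101, 102], [110, 112, 113]]])

print(c.shape)    # (2, 2, 3)
print(c.T.shape)  # (3, 2, 2)

# Element (i, j, k) of c ends up at (k, j, i) of c.T.
print(c[1, 0, 2], c.T[2, 0, 1])  # 102 102

# .T is the same as transposing with the axes listed in reverse order.
print(np.array_equal(c.T, np.transpose(c, (2, 1, 0))))  # True
```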

Reshape#

c.reshape((12,1))

array([[  0],
       [  1],
       [  2],
       [ 10],
       [ 12],
       [ 13],
       [100],
       [101],
       [102],
       [110],
       [112],
       [113]])

Stacking#

a = np.ones((2,2))
b = np.zeros((2,2))

np.vstack((a, b))

array([[1., 1.],
       [1., 1.],
       [0., 0.],
       [0., 0.]])

np.hstack((a, b))

array([[1., 1., 0., 0.],
       [1., 1., 0., 0.]])
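
For 2-D inputs like these, `vstack` and `hstack` can be read as `np.concatenate` along axis 0 and axis 1 respectively. A sketch of that equivalence (the names `v` and `h` are only for illustration):

```python
import numpy as np

a = np.ones((2, 2))
b = np.zeros((2, 2))

v = np.concatenate((a, b), axis=0)  # stack rows on top of each other
h = np.concatenate((a, b), axis=1)  # place columns side by side

print(v.shape, h.shape)                      # (4, 2) (2, 4)
print(np.array_equal(v, np.vstack((a, b))))  # True
print(np.array_equal(h, np.hstack((a, b))))  # True
```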

Broadcasting#

Used to deal with inputs that do not have exactly the same shape

If all input arrays do not have the same number of dimensions, a “1” will be repeatedly prepended to the shapes of the smaller arrays until all the arrays have the same number of dimensions.
Arrays with a size of 1 along a particular dimension act as if they had the size of the array with the largest shape along that dimension. The value of the array element is assumed to be the same along that dimension for the “broadcast” array.
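
The two rules above can be sketched in plain Python. `broadcast_shape` is a hypothetical helper written for this illustration, not a NumPy function; its result is compared with the shape NumPy itself reports through `np.broadcast`.

```python
import numpy as np


def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: apply the two rules above to a pair of shapes."""
    ndim = max(len(shape_a), len(shape_b))
    # Rule 1: prepend 1s to the shorter shape.
    padded_a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    padded_b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)

    result = []
    for da, db in zip(padded_a, padded_b):
        # Rule 2: a size-1 dimension stretches to match the other one.
        if da == db or db == 1:
            result.append(da)
        elif da == 1:
            result.append(db)
        else:
            raise ValueError(f"incompatible dimensions {da} and {db}")
    return tuple(result)


print(broadcast_shape((5, 2, 1), (2, 3)))                      # (5, 2, 3)
print(np.broadcast(np.ones((5, 2, 1)), np.ones((2, 3))).shape)  # (5, 2, 3)
```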

Arrays with the same shape

a = np.array([1, 2, 3])
b = np.array([3, 3, 3])
a*b

array([3, 6, 9])

1-d Array and a Scalar

a = np.array([1, 2, 3])
b = 3
a*b

array([3, 6, 9])

Intuitively: the scalar b is “stretched” to the same shape as a

Reality: no copies of b are made, so broadcasting moves less memory around (computationally efficient)
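
One way to see that nothing is copied is `np.broadcast_to`, which returns a read-only view of the stretched value. In the sketch below the view has a stride of 0 bytes, meaning every position reads the same element in memory (the name `stretched` is only for illustration):

```python
import numpy as np

b = np.array(3)
stretched = np.broadcast_to(b, (3,))

print(stretched)          # [3 3 3]
print(stretched.strides)  # (0,)

# The view reuses the memory of b instead of holding three copies.
print(np.shares_memory(stretched, b))  # True
```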

Arrays whose shapes aren’t exactly the same, but match along the trailing dimensions

a = np.ones((5,2,3))
b = np.ones((2,3))
a*b

array([[[1., 1., 1.],
        [1., 1., 1.]],

       [[1., 1., 1.],
        [1., 1., 1.]],

       [[1., 1., 1.],
        [1., 1., 1.]],

       [[1., 1., 1.],
        [1., 1., 1.]],

       [[1., 1., 1.],
        [1., 1., 1.]]])

Arrays whose shapes aren’t exactly the same, but the mismatched trailing dimension of a is 1, so it works

a = np.ones((5,2,1))
b = np.ones((2,3))
a*b

array([[[1., 1., 1.],
        [1., 1., 1.]],

       [[1., 1., 1.],
        [1., 1., 1.]],

       [[1., 1., 1.],
        [1., 1., 1.]],

       [[1., 1., 1.],
        [1., 1., 1.]],

       [[1., 1., 1.],
        [1., 1., 1.]]])

Broadcasting fails!

a = np.ones((5,2,2))
b = np.ones((2,3))
a*b

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/var/folders/sd/pc07b3wn65nflgx8wpkkr7wr0000gn/T/ipykernel_2451/2083074354.py in <module>
      1 a = np.ones((5,2,2))
      2 b = np.ones((2,3))
----> 3 a*b

ValueError: operands could not be broadcast together with shapes (5,2,2) (2,3) 

NumPy compares shapes element-wise for two given arrays

It starts with the trailing (i.e. rightmost) dimensions and works its way left. Two dimensions are compatible when

they are equal, or
one of them is 1

Arrays do not need to have the same exact number of dimensions to be compatible. Broadcasting is a convenient way of taking the outer product (or any outer operation)
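
As a sketch of the "outer operation" idea: adding a new axis to one 1-D array lets broadcasting pair every element of it with every element of the other. The arrays `x` and `y` are made up for illustration, and the results are compared with NumPy's own `np.add.outer` and `np.outer`.

```python
import numpy as np

x = np.array([1, 2, 3, 4])
y = np.array([10, 20, 30])

# Shape (4, 1) against shape (3,) broadcasts to (4, 3).
outer_sum = x[:, np.newaxis] + y
print(outer_sum.shape)                                # (4, 3)
print(np.array_equal(outer_sum, np.add.outer(x, y)))  # True

outer_prod = x[:, np.newaxis] * y
print(np.array_equal(outer_prod, np.outer(x, y)))     # True
```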

Here broadcasting fails because the trailing (and only) dimensions do not match: 4 versus 3

a = np.array([1,2,3,4])
b = np.array([1,2,3])
a*b

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/var/folders/sd/pc07b3wn65nflgx8wpkkr7wr0000gn/T/ipykernel_2451/257425980.py in <module>
      1 a = np.array([1,2,3,4])
      2 b = np.array([1,2,3])
----> 3 a*b

ValueError: operands could not be broadcast together with shapes (4,) (3,) 

We wrap a in an extra axis and transpose it, so that it becomes a column of shape (4, 1)

a = np.asarray([a]).T #a[:, np.newaxis]
a.shape

(4, 1)

Now it works!

a*b

array([[ 1,  2,  3],
       [ 2,  4,  6],
       [ 3,  6,  9],
       [ 4,  8, 12]])

Indexing#

a = np.array([0, 6, 9, 8, 8, 6, 2, 7, 2, 8, 1, 0, 4, 6, 9, 0])
i = np.array([1, 1, 2, 3])
a[i]

array([6, 6, 9, 8])

j = np.array([[3, 0], [2, 1]])
a[j]

array([[8, 0],
       [9, 6]])

print(a.shape, i.shape, j.shape)
a[i,j]

(16,) (4,) (2, 2)

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
/var/folders/sd/pc07b3wn65nflgx8wpkkr7wr0000gn/T/ipykernel_2451/3150026016.py in <module>
      1 print(a.shape, i.shape, j.shape)
----> 2 a[i,j]

IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed

a = a.reshape((4,4))
print(a.shape, i.shape, j.shape)
a[i,j]

(4, 4) (4,) (2, 2)

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
/var/folders/sd/pc07b3wn65nflgx8wpkkr7wr0000gn/T/ipykernel_2451/214778350.py in <module>
      1 a = a.reshape((4,4))
      2 print(a.shape, i.shape, j.shape)
----> 3 a[i,j]

IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (4,) (2,2) 

i = i.reshape((2,2))
print(a.shape, i.shape, j.shape)
a[i,j]

(4, 4) (2, 2) (2, 2)

array([[7, 8],
       [1, 6]])
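
When two index arrays are used together, they are paired element by element: the result has the shape of the index arrays, and entry `[r, s]` is `a[i[r, s], j[r, s]]`. The explicit loop below is a sketch of that reading, checked against `a[i, j]` (the names `result`, `r` and `s` are only for illustration):

```python
import numpy as np

a = np.array([0, 6, 9, 8, 8, 6, 2, 7, 2, 8, 1, 0, 4, 6, 9, 0]).reshape((4, 4))
i = np.array([[1, 1], [2, 3]])  # row indices
j = np.array([[3, 0], [2, 1]])  # column indices

# Pick a[row, column] for each matching pair of entries in i and j.
result = np.empty(i.shape, dtype=a.dtype)
for r in range(i.shape[0]):
    for s in range(i.shape[1]):
        result[r, s] = a[i[r, s], j[r, s]]

print(result)
# [[7 8]
#  [1 6]]
print(np.array_equal(result, a[i, j]))  # True
```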

Next thing to look at: https://numpy.org/doc/stable/user/basics.html

Note / Reference: Many of the examples here are original or modified versions of examples from the official Python and NumPy documentation, which remain the best sources for learning comprehensively. This is meant to be an accessible introduction!