Why NumPy?


NumPy is a popular Python package for creating and operating on n-dimensional arrays, and it underpins much of scientific computing in Python. Its major advantages over standard Python sequence objects such as lists are:

  1. Ease of Use
  2. Speed

The code examples below demonstrate both points.

Let’s start with the procedure for multiplying lists and arrays element by element.

In [ ]:
#import packages
import numpy as np
In [ ]:
#create a list

a = list(range(100))
b = list(range(100, 200))

#create arrays with the same values as lists a and b

a_arr = np.arange(100)
b_arr = np.arange(100, 200)
In [ ]:
#check data type of a and b
print(type(a), type(b))
print(type(a_arr), type(b_arr))
<class 'list'> <class 'list'>
<class 'numpy.ndarray'> <class 'numpy.ndarray'>
In [ ]:
#Try multiplication of a and b
a * b
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_368740/2627605228.py in <module>
      1 #let's do multiplication of a and b
----> 2 a * b

TypeError: can't multiply sequence by non-int of type 'list'

As we can see, the * operator does not perform element-wise multiplication on lists. We have to loop through the list values instead.

In [ ]:
#list multiplication

c = []
for i in range(100):
    c.append(a[i] * b[i])
print(c)
[0, 101, 204, 309, 416, 525, 636, 749, 864, 981, 1100, 1221, 1344, 1469, 1596, 1725, 1856, 1989, 2124, 2261, 2400, 2541, 2684, 2829, 2976, 3125, 3276, 3429, 3584, 3741, 3900, 4061, 4224, 4389, 4556, 4725, 4896, 5069, 5244, 5421, 5600, 5781, 5964, 6149, 6336, 6525, 6716, 6909, 7104, 7301, 7500, 7701, 7904, 8109, 8316, 8525, 8736, 8949, 9164, 9381, 9600, 9821, 10044, 10269, 10496, 10725, 10956, 11189, 11424, 11661, 11900, 12141, 12384, 12629, 12876, 13125, 13376, 13629, 13884, 14141, 14400, 14661, 14924, 15189, 15456, 15725, 15996, 16269, 16544, 16821, 17100, 17381, 17664, 17949, 18236, 18525, 18816, 19109, 19404, 19701]
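
The index-based loop above is not the only way to do this in plain Python. As a minimal sketch, the same result can be built with zip and a list comprehension; the variable names simply mirror the lists defined earlier:

```python
# Same data as the lists a and b above
a = list(range(100))
b = list(range(100, 200))

# zip pairs a[i] with b[i]; the comprehension multiplies each pair
c = [x * y for x, y in zip(a, b)]

print(c[:5])  # [0, 101, 204, 309, 416]
```

This is shorter than the explicit loop, but it is still a Python-level loop over every element.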

What happens if we use * for element-wise multiplication of the values in an array?

In [ ]:
c = a_arr * b_arr
print(c)
[    0   101   204   309   416   525   636   749   864   981  1100  1221
  1344  1469  1596  1725  1856  1989  2124  2261  2400  2541  2684  2829
  2976  3125  3276  3429  3584  3741  3900  4061  4224  4389  4556  4725
  4896  5069  5244  5421  5600  5781  5964  6149  6336  6525  6716  6909
  7104  7301  7500  7701  7904  8109  8316  8525  8736  8949  9164  9381
  9600  9821 10044 10269 10496 10725 10956 11189 11424 11661 11900 12141
 12384 12629 12876 13125 13376 13629 13884 14141 14400 14661 14924 15189
 15456 15725 15996 16269 16544 16821 17100 17381 17664 17949 18236 18525
 18816 19109 19404 19701]
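
If the data already lives in Python lists, one option is to convert the lists to arrays first and then multiply. The sketch below assumes the same lists a and b as before and uses np.array for the conversion:

```python
import numpy as np

a = list(range(100))
b = list(range(100, 200))

# np.array copies the list values into a new ndarray
c = np.array(a) * np.array(b)

print(type(c))
print(c[:5])
```

The conversion itself takes time, so it tends to pay off when several array operations follow it.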

We have no problem using * on arrays. Comparing the two pieces of code above, it is evident that element-wise operations are simpler to write with arrays than with lists. In fact, NumPy has many functions for creating, manipulating, sorting, selecting and operating on multidimensional array objects. The element-wise operation above can also be achieved with NumPy's multiply function, and the dot product can be found with the dot function.

In [ ]:
np.multiply(a_arr, b_arr)
Out[ ]:
array([    0,   101,   204,   309,   416,   525,   636,   749,   864,
         981,  1100,  1221,  1344,  1469,  1596,  1725,  1856,  1989,
        2124,  2261,  2400,  2541,  2684,  2829,  2976,  3125,  3276,
        3429,  3584,  3741,  3900,  4061,  4224,  4389,  4556,  4725,
        4896,  5069,  5244,  5421,  5600,  5781,  5964,  6149,  6336,
        6525,  6716,  6909,  7104,  7301,  7500,  7701,  7904,  8109,
        8316,  8525,  8736,  8949,  9164,  9381,  9600,  9821, 10044,
       10269, 10496, 10725, 10956, 11189, 11424, 11661, 11900, 12141,
       12384, 12629, 12876, 13125, 13376, 13629, 13884, 14141, 14400,
       14661, 14924, 15189, 15456, 15725, 15996, 16269, 16544, 16821,
       17100, 17381, 17664, 17949, 18236, 18525, 18816, 19109, 19404,
       19701])
In [ ]:
np.dot(a_arr, b_arr)
Out[ ]:
823350
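
For one-dimensional arrays, the dot product is the sum of the element-wise products, so the two results above are directly related. A short check, using the same arrays as before:

```python
import numpy as np

a_arr = np.arange(100)
b_arr = np.arange(100, 200)

# Sum of the element-wise products
elementwise_sum = np.sum(a_arr * b_arr)

# Dot product of the two 1-D arrays
dot = np.dot(a_arr, b_arr)

print(elementwise_sum, dot)  # 823350 823350
```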

Besides the simplicity that NumPy provides, it is also faster. Let’s time the list multiplication and the NumPy array multiplication.

In [ ]:
%%time
c = []
for i in range(100):
    c.append(a[i] * b[i])
CPU times: user 97 µs, sys: 41 µs, total: 138 µs
Wall time: 148 µs
In [ ]:
%%time
c_arr = a_arr * b_arr
CPU times: user 21 µs, sys: 0 ns, total: 21 µs
Wall time: 25.5 µs

Wow! In this run, multiplying the NumPy arrays was roughly 5 to 6 times faster than the list loop (25.5 µs versus 148 µs wall time). A single measurement at the microsecond scale is noisy, but the gap matters a great deal in scientific computing involving large datasets with millions of data points. NumPy operations are fast because the Python code contains no explicit looping or indexing. This style is called vectorization. The looping, indexing and arithmetic take place behind the scenes in optimized, precompiled C code.
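
A single %%time run on 100 elements can vary a lot from one execution to the next. As a rough sketch of a steadier comparison, the standard-library timeit module can repeat each measurement on larger inputs. The size n, the helper names multiply_lists and multiply_arrays, and the explicit int64 dtype are choices made for this illustration, not part of the original notebook:

```python
import timeit

import numpy as np

n = 100_000  # illustrative size, larger than the 100 elements used above

a = list(range(n))
b = list(range(n, 2 * n))

# int64 is requested explicitly so the products cannot overflow
# on platforms where the default integer type is 32-bit
a_arr = np.arange(n, dtype=np.int64)
b_arr = np.arange(n, 2 * n, dtype=np.int64)


def multiply_lists():
    # Pure-Python element-wise multiplication
    return [x * y for x, y in zip(a, b)]


def multiply_arrays():
    # Vectorized element-wise multiplication
    return a_arr * b_arr


# Take the best of 5 runs to reduce noise from other processes
list_time = min(timeit.repeat(multiply_lists, number=1, repeat=5))
array_time = min(timeit.repeat(multiply_arrays, number=1, repeat=5))

print(f"list:  {list_time * 1e3:.3f} ms")
print(f"array: {array_time * 1e3:.3f} ms")
```

The exact numbers depend on the machine and the NumPy version, so no particular speedup is claimed here; the point is only that repeated timing on larger inputs gives a more reliable picture than one short run.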

 

 
