NumPy is a popular Python package for creating and performing operations on n-dimensional arrays. It facilitates scientific computing in Python. The major advantages of NumPy over standard Python sequence objects such as lists are:
- Ease of Use
- Speed
The code examples below demonstrate both points.
Let’s see the procedure for list and array multiplication.
#import packages
import numpy as np
#create a list
a = list(range(100))
b = list(range(100, 200))
#create arrays with the same values as lists a and b
a_arr = np.arange(100)
b_arr = np.arange(100, 200)
#check data type of a and b
print(type(a), type(b))
print(type(a_arr), type(b_arr))
<class 'list'> <class 'list'>
<class 'numpy.ndarray'> <class 'numpy.ndarray'>
#Try multiplication of a and b
a * b
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_368740/2627605228.py in <module>
      1 #let's do multiplication of a and b
----> 2 a * b

TypeError: can't multiply sequence by non-int of type 'list'
As we can see, we can’t use a simple * for element-wise multiplication of the values in two lists. We have to loop through the list values.
#list multiplication
c = []
for i in range(100):
    c.append(a[i] * b[i])
print(c)
[0, 101, 204, 309, 416, 525, 636, 749, 864, 981, 1100, 1221, 1344, 1469, 1596, 1725, 1856, 1989, 2124, 2261, 2400, 2541, 2684, 2829, 2976, 3125, 3276, 3429, 3584, 3741, 3900, 4061, 4224, 4389, 4556, 4725, 4896, 5069, 5244, 5421, 5600, 5781, 5964, 6149, 6336, 6525, 6716, 6909, 7104, 7301, 7500, 7701, 7904, 8109, 8316, 8525, 8736, 8949, 9164, 9381, 9600, 9821, 10044, 10269, 10496, 10725, 10956, 11189, 11424, 11661, 11900, 12141, 12384, 12629, 12876, 13125, 13376, 13629, 13884, 14141, 14400, 14661, 14924, 15189, 15456, 15725, 15996, 16269, 16544, 16821, 17100, 17381, 17664, 17949, 18236, 18525, 18816, 19109, 19404, 19701]
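The index-based loop is not the only way to multiply two lists element by element. As a minimal sketch, assuming the same lists a and b, the built-in zip pairs up the values so that a list comprehension can multiply them without indexing (the name c_zip is only illustrative):

```python
# Recreate the two lists from the example above
a = list(range(100))
b = list(range(100, 200))

# zip pairs a[0] with b[0], a[1] with b[1], and so on
c_zip = [x * y for x, y in zip(a, b)]

print(c_zip[:5])  # first five products: [0, 101, 204, 309, 416]
```

This is shorter than the explicit loop, but it is still a Python-level loop over every pair of values.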
How about if we try * for element-wise multiplication of the values in two arrays?
c = a_arr * b_arr
print(c)
[ 0 101 204 309 416 525 636 749 864 981 1100 1221 1344 1469 1596 1725 1856 1989 2124 2261 2400 2541 2684 2829 2976 3125 3276 3429 3584 3741 3900 4061 4224 4389 4556 4725 4896 5069 5244 5421 5600 5781 5964 6149 6336 6525 6716 6909 7104 7301 7500 7701 7904 8109 8316 8525 8736 8949 9164 9381 9600 9821 10044 10269 10496 10725 10956 11189 11424 11661 11900 12141 12384 12629 12876 13125 13376 13629 13884 14141 14400 14661 14924 15189 15456 15725 15996 16269 16544 16821 17100 17381 17664 17949 18236 18525 18816 19109 19404 19701]
We have no problem using * on arrays. Comparing the two pieces of code above, it is evident that performing element-wise operations on arrays is simpler than on lists. In fact, NumPy has a lot of functions for creating, manipulating, sorting, selecting, and operating on multidimensional array objects. The element-wise operation above can also be achieved using NumPy's multiply function, and we can find the dot product using the dot function.
np.multiply(a_arr, b_arr)
array([ 0, 101, 204, 309, 416, 525, 636, 749, 864, 981, 1100, 1221, 1344, 1469, 1596, 1725, 1856, 1989, 2124, 2261, 2400, 2541, 2684, 2829, 2976, 3125, 3276, 3429, 3584, 3741, 3900, 4061, 4224, 4389, 4556, 4725, 4896, 5069, 5244, 5421, 5600, 5781, 5964, 6149, 6336, 6525, 6716, 6909, 7104, 7301, 7500, 7701, 7904, 8109, 8316, 8525, 8736, 8949, 9164, 9381, 9600, 9821, 10044, 10269, 10496, 10725, 10956, 11189, 11424, 11661, 11900, 12141, 12384, 12629, 12876, 13125, 13376, 13629, 13884, 14141, 14400, 14661, 14924, 15189, 15456, 15725, 15996, 16269, 16544, 16821, 17100, 17381, 17664, 17949, 18236, 18525, 18816, 19109, 19404, 19701])
np.dot(a_arr, b_arr)
823350
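For one-dimensional arrays, the dot product is the sum of the element-wise products, so the two results are related. The short check below is a sketch that recreates a_arr and b_arr and compares the two computations. It also touches two of the other kinds of operation mentioned earlier (selecting and sorting); the names products and large and the threshold 19000 are chosen only for illustration.

```python
import numpy as np

a_arr = np.arange(100)
b_arr = np.arange(100, 200)

products = np.multiply(a_arr, b_arr)  # same result as a_arr * b_arr
dot = np.dot(a_arr, b_arr)

# For 1-D arrays, the dot product equals the sum of element-wise products
print(products.sum() == dot)  # True

# Selecting with a boolean mask: keep only the products larger than 19000
large = products[products > 19000]
print(large)

# Sorting: np.sort returns a sorted copy in ascending order;
# reversing the result with [::-1] gives descending order
print(np.sort(large)[::-1])
```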
Besides the simplicity that NumPy provides, it is also faster. Let’s time the list multiplication and the NumPy array multiplication.
%%time
c = []
for i in range(100):
    c.append(a[i] * b[i])
CPU times: user 97 µs, sys: 41 µs, total: 138 µs Wall time: 148 µs
%%time
c_arr = a_arr * b_arr
CPU times: user 21 µs, sys: 0 ns, total: 21 µs Wall time: 25.5 µs
In this run, multiplying the NumPy arrays is roughly 5 to 6 times faster than multiplying the lists (25.5 µs versus 148 µs wall time). This is a very important advantage in scientific computing involving large datasets with millions of data points. NumPy operations are fast because the code contains no explicit looping, indexing, etc. This property is called vectorization. The looping, indexing, and arithmetic still take place, but behind the scenes in optimized, precompiled C code.
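A single %%time run on 100 elements measures only a few microseconds, so the result is noisy. As a rough sketch of a more stable comparison that also works outside a notebook, the standard-library timeit module can repeat each operation many times on larger inputs. The size n, the repeat count, and the helper names multiply_lists and multiply_arrays are arbitrary choices for illustration, and the measured ratio will vary from machine to machine.

```python
import timeit
import numpy as np

n = 10_000  # illustrative size, not taken from the example above
a = list(range(n))
b = list(range(n, 2 * n))
a_arr = np.arange(n)
b_arr = np.arange(n, 2 * n)

def multiply_lists():
    # Python-level loop over every pair of values
    return [x * y for x, y in zip(a, b)]

def multiply_arrays():
    # Vectorized: the loop runs inside NumPy's compiled code
    return a_arr * b_arr

# Total time in seconds for 100 calls of each function
list_time = timeit.timeit(multiply_lists, number=100)
array_time = timeit.timeit(multiply_arrays, number=100)

print(f"lists:  {list_time:.4f} s")
print(f"arrays: {array_time:.4f} s")
```

Timing many repetitions and comparing the totals gives a steadier picture than one measurement, and increasing n shows how the gap changes with input size.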