flowchart TB
A["NumPy ndarray"] --> B["Creation<br/>array, arange, zeros, ones, linspace"]
A --> C["Attributes<br/>shape, dtype, ndim, size"]
A --> D["Access<br/>indexing, slicing, boolean masks"]
A --> E["Math<br/>+, -, *, /, sqrt, exp, log"]
A --> F["Statistics<br/>mean, median, std, var, min, max"]
A --> G["Shape ops<br/>reshape, concatenate, split"]
A --> H["Random<br/>rng.random, normal, choice"]
47 NumPy
What This Chapter Covers
NumPy (Charles R. Harris et al., 2020) is the numerical backbone of almost every Python analytics library. By the end of this chapter you will be able to:
- Create arrays from lists, ranges and special constructors (zeros, ones, linspace).
- Inspect an array’s shape, dtype and dimensions.
- Use indexing, slicing and boolean masking to pull out the values you need.
- Apply element-wise arithmetic and mathematical functions across whole arrays at once.
- Reshape, join and split arrays.
- Compute summary statistics (mean, median, std, var, min, max) along one or more axes.
- Understand broadcasting: how NumPy aligns arrays of different shapes.
- Generate reproducible random numbers for simulation and sampling.
47.1 A Map of NumPy
Everything in NumPy revolves around the ndarray, an n-dimensional array of homogeneous, typed values. The diagram groups the functionality you will actually use day to day.
NumPy (Numerical Python) is a fundamental package for numerical computing in Python. It provides multi-dimensional arrays, mathematical functions, linear algebra operations, and random number generation. Its fast array computations make it an essential tool for scientific computing, data analysis, and machine learning, and a must-learn for data science professionals.
47.1.1 Key Features of NumPy
- Efficient Array Handling: Supports ndarray, a powerful multi-dimensional array object that is more efficient than Python lists.
- Vectorized Operations: Eliminates the need for explicit loops by applying operations element-wise.
- Broadcasting: Allows arithmetic operations on arrays of different shapes without explicit looping.
- Mathematical Functions: Provides a wide range of functions for algebra, statistics, trigonometry, and more.
- Random Number Generation: Generates pseudo-random numbers for simulations and machine learning.
- Integration with Other Libraries: Works seamlessly with pandas, matplotlib, scikit-learn, and TensorFlow.
47.1.2 Installing NumPy
To install NumPy, use:
pip install numpy
Or, if using Anaconda:
conda install numpy
The convention in every Python project is to import NumPy as np:
import numpy as np
47.2 Array Creation
- np.array([1, 2, 3]): Create a NumPy array from a list or tuple.
- np.arange(10): Create an array of the integers 0 through 9 (the stop value is excluded).
- np.zeros((3, 4)): Create a 3-row, 4-column array of zeros.
- np.ones((2, 3)): Create a 2-row, 3-column array of ones.
- np.linspace(0, 1, 5): Five evenly spaced numbers from 0 to 1 (inclusive on both ends).
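The constructors listed above can be tried together in a short sketch. The variable names are illustrative only and are not part of NumPy.

```python
import numpy as np

from_list = np.array([1, 2, 3])   # 1-D array built from a Python list
counting = np.arange(10)          # integers 0..9; the stop value is excluded
zeros = np.zeros((3, 4))          # 3 rows x 4 columns, filled with 0.0
ones = np.ones((2, 3))            # 2 rows x 3 columns, filled with 1.0
grid = np.linspace(0, 1, 5)       # 0.0, 0.25, 0.5, 0.75, 1.0

print(from_list)
print(counting)
print(zeros.shape, ones.shape)    # (3, 4) (2, 3)
print(grid)
```

Note that the shape is passed to zeros and ones as a single tuple, which is why the calls have double parentheses.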
47.3 Array Attributes
Every ndarray carries metadata that describes it. Knowing these attributes makes debugging much easier:
- a.shape: the size along each axis, returned as a tuple.
- a.ndim: number of dimensions (axes).
- a.size: total number of elements.
- a.dtype: the element type (e.g. int64, float64, bool).
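As a quick illustration of these attributes, the following sketch inspects a small 2-D array. The array a is made up for the example.

```python
import numpy as np

# A 2-row, 3-column array of floats
a = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

print(a.shape)  # (2, 3)
print(a.ndim)   # 2
print(a.size)   # 6
print(a.dtype)  # float64
```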
47.4 Indexing and Slicing
Indexing pulls out individual elements; slicing pulls out sub-arrays. For 2-D arrays, use a comma between row and column indexes.
- a[0]: first row (for a 2-D array) or first element (for a 1-D array).
- a[0, 1]: element at row 0, column 1.
- a[:, 0]: every row, column 0, giving a single column.
- a[1:3]: rows 1 and 2 (slice is exclusive at the end).
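The indexing forms above can be checked against a small example array. This is a minimal sketch; the 4×3 array is chosen only so that each value shows its own position.

```python
import numpy as np

# 4 rows x 3 columns holding 0..11, filled row by row
a = np.arange(12).reshape(4, 3)

first_row = a[0]       # [0 1 2]
element = a[0, 1]      # 1
first_col = a[:, 0]    # [0 3 6 9]
middle_rows = a[1:3]   # rows 1 and 2: [[3 4 5], [6 7 8]]

print(first_row, element, first_col)
print(middle_rows)
```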
47.4.1 Boolean Masking
Comparing a NumPy array against a value returns a Boolean array of the same shape. Using that Boolean array as an index keeps only the elements where the condition is True; this is the standard way to filter data in NumPy and pandas.
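A minimal sketch of masking, using a hypothetical array of test scores:

```python
import numpy as np

scores = np.array([55, 72, 88, 43, 91])

# The comparison produces one Boolean per element
passed = scores >= 60
print(passed)            # [False  True  True False  True]

# Indexing with the mask keeps only the True positions
print(scores[passed])    # [72 88 91]

# Combine conditions with & and parenthesise each side
mid_band = scores[(scores >= 60) & (scores < 90)]
print(mid_band)          # [72 88]
```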
47.5 Array Manipulation
- np.concatenate((a1, a2), axis=0): Join a sequence of arrays along an existing axis.
- np.split(array, indices_or_sections): Split an array into multiple sub-arrays.
- a.reshape(rows, cols): Return a with a new shape (the total size must match). The result is a view of the same data whenever possible, otherwise a copy.
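The three operations above can be sketched as follows. The arrays are made up for illustration; the last lines show that, for a contiguous array like this one, reshape returns a view that shares memory with the original.

```python
import numpy as np

a1 = np.array([[1, 2], [3, 4]])   # shape (2, 2)
a2 = np.array([[5, 6]])           # shape (1, 2)

# Stack along axis 0 (rows); the other dimensions must match
stacked = np.concatenate((a1, a2), axis=0)   # shape (3, 2)

# Split a 6-element array into 3 equal sections
parts = np.split(np.arange(6), 3)            # [0 1], [2 3], [4 5]

flat = np.arange(6)
grid = flat.reshape(2, 3)        # same 6 values, new shape
inferred = flat.reshape(-1, 2)   # -1 lets NumPy infer the row count (3)

grid[0, 0] = 99                  # grid is a view here, so flat changes too
print(stacked.shape, inferred.shape)   # (3, 2) (3, 2)
print(flat[0])                         # 99
```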
47.6 Mathematical Operations
- np.add(a, b), np.subtract(a, b), np.multiply(a, b), np.divide(a, b): Perform element-wise addition, subtraction, multiplication, and division (the same as a + b, a - b, a * b, a / b).
- np.sqrt(a): Square root of each element in the array.
- np.exp(a): Exponential of each element in the array.
- np.log(a): Natural logarithm of each element in the array.
- np.power(a, b): Elements of a raised to the powers from b, element-wise.
All of these are vectorised: they run a compiled inner loop in C rather than a Python loop, which is why NumPy is often 10–100× faster than plain Python for numerical work.
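A short sketch of the element-wise functions, using two small made-up arrays of equal length:

```python
import numpy as np

a = np.array([1.0, 4.0, 9.0])
b = np.array([2.0, 2.0, 2.0])

total = np.add(a, b)        # same as a + b -> [ 3.  6. 11.]
ratio = np.divide(a, b)     # same as a / b -> [0.5 2.  4.5]
roots = np.sqrt(a)          # [1. 2. 3.]
powers = np.power(a, b)     # a squared     -> [ 1. 16. 81.]
round_trip = np.log(np.exp(a))   # log undoes exp, up to rounding error

print(total, ratio, roots, powers)
print(np.allclose(round_trip, a))   # True
```

No loop appears anywhere in the code: each call processes the whole array at once.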
47.6.1 Broadcasting
Broadcasting is how NumPy handles arithmetic between arrays whose shapes don’t match exactly. Instead of requiring you to replicate the smaller array by hand, NumPy virtually stretches it to fit — without actually copying memory.
The simplest case: array + scalar applies the scalar to every element. More generally, two arrays are compatible if, comparing their shapes right-to-left, each pair of dimensions is either equal or one of them is 1; a dimension missing from the shorter shape is treated as 1.
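The compatibility rule can be seen in a small sketch. The names matrix, row and col are illustrative; the shapes are chosen so that each case exercises a different part of the rule.

```python
import numpy as np

matrix = np.arange(6).reshape(2, 3)   # shape (2, 3): [[0 1 2], [3 4 5]]
row = np.array([10, 20, 30])          # shape (3,)
col = np.array([[100], [200]])        # shape (2, 1)

plus_scalar = matrix + 1   # scalar applied to every element
plus_row = matrix + row    # (2, 3) with (3,): row is reused for each row
plus_col = matrix + col    # (2, 3) with (2, 1): col is reused for each column
outer = row + col          # (3,) with (2, 1): both stretch to (2, 3)

print(plus_row)      # [[10 21 32], [13 24 35]]
print(plus_col)      # [[100 101 102], [203 204 205]]
print(outer.shape)   # (2, 3)

# Trailing dimensions 3 and 2 are neither equal nor 1, so this fails
try:
    matrix + np.array([1, 2])
    incompatible = False
except ValueError:
    incompatible = True
print(incompatible)  # True
```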
47.7 Statistical Functions
- np.mean(a): Compute the arithmetic mean of all elements, or along an axis if one is given.
- np.median(a): Compute the median of all elements, or along an axis if one is given.
- np.std(a): Compute the standard deviation of all elements, or along an axis if one is given.
- np.var(a): Compute the variance of all elements, or along an axis if one is given.
- np.min(a), np.max(a): Find the minimum or maximum values.
- np.argmin(a), np.argmax(a): Find the indices of the minimum or maximum values.
- Passing axis=0 collapses rows (giving one value per column); axis=1 collapses columns (one value per row).
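The effect of the axis argument is easiest to see on a small 2-D array. This is a minimal sketch with made-up data; note that np.std and np.var divide by n by default (ddof=0), so pass ddof=1 if you want the sample versions.

```python
import numpy as np

data = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0]])

overall_mean = np.mean(data)          # 3.5, over all six values
col_means = np.mean(data, axis=0)     # one value per column: [2.5 3.5 4.5]
row_means = np.mean(data, axis=1)     # one value per row:    [2. 5.]
row_max = np.max(data, axis=1)        # [3. 6.]
flat_argmax = np.argmax(data)         # 5, an index into the flattened array

population_var = np.var(data)         # divides by n (ddof=0)
sample_var = np.var(data, ddof=1)     # divides by n - 1

print(overall_mean, col_means, row_means)
print(row_max, flat_argmax)
print(population_var, sample_var)
```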
47.8 Random Number Generation
NumPy’s modern random API uses a generator object. Create one once with a seed for reproducibility, then draw samples from it.
- rng.random(size): uniform samples in [0, 1).
- rng.integers(low, high, size): random integers in [low, high); the upper bound is excluded by default.
- rng.normal(loc, scale, size): samples from a normal distribution with mean loc and standard deviation scale.
- rng.choice(a, size): random selection from an array.
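A minimal sketch of the generator workflow. The seed value 42 and the variable names are arbitrary; the final check shows that a second generator created with the same seed reproduces the same draws.

```python
import numpy as np

# Create the generator once, with a seed for reproducibility
rng = np.random.default_rng(seed=42)

uniform = rng.random(3)                           # 3 floats in [0, 1)
dice = rng.integers(1, 7, size=5)                 # 5 integers from 1 to 6
noise = rng.normal(loc=0.0, scale=1.0, size=4)    # 4 standard-normal draws
picks = rng.choice(np.array([10, 20, 30]), size=2)

print(uniform, dice, noise, picks, sep="\n")

# Same seed, same sequence of calls: identical results
rng_again = np.random.default_rng(seed=42)
same = np.array_equal(uniform, rng_again.random(3))
print(same)   # True
```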
47.9 Common Pitfalls with NumPy
- Mixing dtypes unintentionally → np.array([1, 2, 3.0]) is promoted to float64. Check a.dtype if results look off.
- Integer overflow → Fixed-width integer dtypes can wrap around silently. Use np.int64 or a float if the values could get large.
- Slices are views, not copies → Modifying a slice modifies the original array. Use .copy() when you need an independent array.
- Shape mismatches in broadcasting → If you get a ValueError: operands could not be broadcast, print .shape on both operands first.
- axis=0 vs axis=1 → axis=0 collapses down the rows (per column); axis=1 collapses across the columns (per row). The axis you name is the one that disappears.
- Using and/or on arrays → Raises ValueError. Use &, | (and parenthesise each side) for element-wise logic.
- np.random.seed(42) (the old API) → Prefer np.random.default_rng(seed=42). The generator API avoids hidden global state, so each function or thread can own an independent, reproducible generator.
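Several of these pitfalls can be reproduced in a few lines. The sketch below uses made-up arrays, and the int8 dtype is chosen only to make the overflow easy to trigger.

```python
import numpy as np

# 1. dtype promotion: a single float makes the whole array float64
mixed = np.array([1, 2, 3.0])
print(mixed.dtype)      # float64

# 2. Integer overflow: int8 holds -128..127, so 127 + 1 wraps around
small = np.array([127], dtype=np.int8)
wrapped = small + np.array([1], dtype=np.int8)
print(wrapped)          # [-128]

# 3. Slices are views: writing through the slice changes the original
original = np.arange(5)
view = original[1:4]
view[0] = 99            # original[1] is now 99 as well
independent = original[1:4].copy()
independent[1] = -1     # original is untouched this time
print(original)         # [ 0 99  2  3  4]

# 4. and/or on arrays raises ValueError; use & and | instead
values = np.array([1, 2, 3, 4, 5])
try:
    (values > 1) and (values < 5)
    raised = False
except ValueError:
    raised = True
in_range = values[(values > 1) & (values < 5)]
print(raised, in_range)  # True [2 3 4]
```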
Summary
| Concept | Description |
|---|---|
| Foundations | |
| NumPy | The core scientific-computing library for Python, providing efficient multi-dimensional arrays and numerical functions |
| ndarray | NumPy's n-dimensional array, a contiguous block of typed data that is far more efficient than a Python list for numerical work |
| Why NumPy is Fast | |
| Vectorized Operations | Operations are applied element-wise across whole arrays, eliminating explicit Python loops and running at compiled speed |
| Broadcasting | Arithmetic between arrays of different shapes is aligned automatically using broadcasting rules, avoiding manual reshaping |
| Integration with Other Libraries | NumPy interoperates seamlessly with pandas, matplotlib, scikit-learn and TensorFlow as the common numerical foundation |
| Creating and Inspecting Arrays | |
| np.array() | Construct an array from a list or tuple, such as np.array([1, 2, 3]) |
| np.arange() | Create an array of evenly spaced integers or floats across a range, such as np.arange(10) |
| np.zeros / np.ones / np.linspace | Build pre-filled arrays of zeros, ones, or evenly spaced floats for initialisation and plotting |
| shape, ndim, size, dtype | Attributes that describe an array's size in each direction, its number of dimensions, total elements and element type |
| Accessing Elements | |
| Indexing and Slicing | Access elements with a[i, j] and sub-arrays with slices like a[:, 0] or a[1:3] |
| Boolean Masking | Filter an array by passing a Boolean array of the same shape as the index, keeping only True elements |
| Reshaping and Joining | |
| np.concatenate() | Join two or more arrays along an existing axis into a single larger array |
| np.split() | Cut a single array into a list of sub-arrays at specified indices or equal sections |
| reshape() | Return the array with a new shape (a view when possible); -1 lets NumPy infer one dimension from the total size |
| Mathematical Operations | |
| Element-wise Arithmetic | np.add, np.subtract, np.multiply and np.divide apply their operations element by element across arrays |
| np.sqrt(), np.exp(), np.log() | Apply square root, exponential or natural log to every element, enabling fast transformations of large datasets |
| np.power() | Raise each element of one array to the corresponding power in another, element-wise |
| Broadcasting Rules | Arrays with compatible shapes are aligned automatically; dimensions of 1 are stretched virtually without copying memory |
| Statistical Functions | |
| np.mean() and np.median() | Compute the arithmetic mean and median of array values along a given axis |
| np.std() and np.var() | Compute standard deviation and variance as measures of spread across the array |
| np.min(), np.max(), argmin, argmax | Locate the smallest or largest value in an array, or the index at which each occurs |
| axis=0 vs axis=1 | axis=0 collapses down the rows (one value per column); axis=1 collapses across columns (one value per row) |
| Random Numbers and Gotchas | |
| default_rng() | The modern NumPy random API; create one seeded generator for reproducible simulations and sampling |
| Common Pitfalls | Watch out for dtype promotion, views vs copies, shape mismatches, axis confusion, and using and/or on arrays |