NumPy Fundamentals for Numerical Data (with Finance Applications)
Welcome to post #11 in our Python learning journey! If you’ve been following along, you’re starting to build a solid foundation in Python. Now it’s time to explore NumPy, the powerhouse library that makes Python a serious contender for numerical computing and data analysis.
As a finance professional myself, I’ve found NumPy particularly useful for financial calculations, portfolio analysis, and working with large datasets. Let’s dive in and see how this library can level up your Python skills.
What is NumPy and Why Should You Care?
NumPy (Numerical Python) is a fundamental library that provides support for large, multi-dimensional arrays and matrices, along with a collection of mathematical functions to operate on these arrays efficiently.
Why is this important, especially for finance work?
- Performance: NumPy operations are executed in optimized C code, making them much faster than equivalent Python loops
- Memory efficiency: NumPy arrays use less memory than Python lists for numerical data
- Mathematical operations: Built-in functions for statistical analysis, linear algebra, and other operations common in finance
- Foundation for other tools: Libraries like pandas, matplotlib, and scikit-learn are built on NumPy
Simply put, if you’re doing any kind of numerical analysis in Python—whether that’s calculating portfolio returns, analyzing balance sheets, or running financial simulations—NumPy will make your life easier.
Installing NumPy
Before we get started, let’s make sure NumPy is installed in your environment:
pip install numpy
If you’re using Anaconda (which we covered in post #10), NumPy is already included!
To confirm the installation worked, open a Python shell and try:
import numpy as np
print(np.__version__)
You should see the version number printed (like 1.24.3
or similar). The convention is to import NumPy with the alias np
to save typing.
The NumPy ndarray: Python Lists on Steroids
The core of NumPy is the ndarray
(n-dimensional array) object. Think of it as a more powerful version of Python lists, specifically designed for numerical data.
Creating Arrays
Let’s start by creating some simple arrays:
import numpy as np
# Create an array from a Python list
revenue_data = np.array([10500, 15200, 12100, 18900])
print(revenue_data)
# Output: [10500 15200 12100 18900]
# Create an array of zeros (useful as a placeholder)
empty_balance_sheet = np.zeros((3, 4)) # 3 rows, 4 columns
print(empty_balance_sheet)
# Output:
# [[0. 0. 0. 0.]
# [0. 0. 0. 0.]
# [0. 0. 0. 0.]]
# Create an array of ones
unit_values = np.ones(5)
print(unit_values)
# Output: [1. 1. 1. 1. 1.]
# Create an array with a range of values (like Python's range)
quarters = np.arange(1, 5) # 1 to 4
print(quarters)
# Output: [1 2 3 4]
# Create evenly spaced values (useful for time series)
interest_rates = np.linspace(2.5, 3.5, 5) # 5 values from 2.5 to 3.5
print(interest_rates)
# Output: [2.5 2.75 3. 3.25 3.5 ]
# Create a 2D array (matrix) for financial data
# Rows: Q1, Q2, Q3, Q4
# Columns: Revenue, Expenses, Profit
financial_data = np.array([
[10500, 8200, 2300], # Q1
[15200, 9500, 5700], # Q2
[12100, 7800, 4300], # Q3
[18900, 10200, 8700] # Q4
])
print(financial_data)
# Output:
# [[10500 8200 2300]
# [15200 9500 5700]
# [12100 7800 4300]
# [18900 10200 8700]]
Array Attributes
NumPy arrays come with useful attributes that tell you about their shape and contents:
# Using the financial_data from above
# Shape (dimensions) of the array
print(financial_data.shape) # Output: (4, 3)
# Number of dimensions
print(financial_data.ndim) # Output: 2
# Total number of elements
print(financial_data.size) # Output: 12
# Data type of the elements
print(financial_data.dtype) # Output: int64
The shape
attribute is particularly useful—it tells us we have 4 rows (quarters) and 3 columns (financial metrics).
Indexing and Slicing: Accessing Your Data
Getting to specific elements in your arrays is similar to Python lists, but with more power.
Basic Indexing
# Using our financial_data array from above
# Get Q1 revenue (first row, first column)
q1_revenue = financial_data[0, 0]
print(q1_revenue) # Output: 10500
# Get Q3 profit (third row, third column)
q3_profit = financial_data[2, 2]
print(q3_profit) # Output: 4300
# Get all expenses (second column)
all_expenses = financial_data[:, 1]
print(all_expenses) # Output: [8200 9500 7800 10200]
# Get data for Q2 and Q3 (second and third rows)
mid_year_data = financial_data[1:3, :]
print(mid_year_data)
# Output:
# [[15200 9500 5700]
# [12100 7800 4300]]
Fancy Indexing
NumPy also lets you select elements that meet certain conditions—incredibly useful for financial analysis:
# Find quarters where revenue exceeded 12000
high_revenue_mask = financial_data[:, 0] > 12000
print(high_revenue_mask) # Output: [False True True True]
# Get the financial data for those high-revenue quarters
high_revenue_quarters = financial_data[high_revenue_mask]
print(high_revenue_quarters)
# Output:
# [[15200 9500 5700]
# [12100 7800 4300]
# [18900 10200 8700]]
# Find quarters where profit margin (profit/revenue) exceeded 30%
revenue = financial_data[:, 0]
profit = financial_data[:, 2]
profit_margin = profit / revenue
print(profit_margin) # Output: [0.21904762 0.375 0.35537191 0.46031746]
high_margin_quarters = financial_data[profit_margin > 0.3]
print(high_margin_quarters)
# Output:
# [[15200 9500 5700]
# [12100 7800 4300]
# [18900 10200 8700]]
This ability to filter data based on conditions is one of NumPy’s most powerful features for financial analysis.
Vectorized Operations: NumPy’s Superpower
In traditional Python, if you want to perform calculations across arrays, you’d need to write loops. NumPy eliminates this need with vectorized operations—allowing you to express calculations directly on entire arrays.
Element-wise Operations
# Revenue growth calculation for quarters
revenue = financial_data[:, 0]
revenue_prev = np.roll(revenue, 1) # Shift values to calculate previous quarter
revenue_prev[0] = revenue[0] # Set first value (no previous quarter)
revenue_growth = (revenue - revenue_prev) / revenue_prev * 100
print(revenue_growth) # Output: [ 0. 44.76190476 -20.39473684 56.19834711]
# Apply 10% discount to all revenue figures
discounted_revenue = revenue * 0.9
print(discounted_revenue) # Output: [ 9450. 13680. 10890. 17010.]
# Add a 5% bonus to all profit figures
profit = financial_data[:, 2]
bonus_adjusted_profit = profit * 1.05
print(bonus_adjusted_profit) # Output: [2415. 5985. 4515. 9135.]
# Calculate profit margin for each quarter
profit_margin = profit / revenue * 100 # as percentage
print(profit_margin) # Output: [21.9047619 37.5 35.53719008 46.03174603]
Broadcasting
One of NumPy’s most powerful features is broadcasting, which allows operations between arrays of different shapes. This is especially useful in finance for scenarios like applying different tax rates or calculating weighted averages.
# Applying different tax rates to quarterly profits
tax_rates = np.array([0.21, 0.22, 0.21, 0.20]) # Different rates each quarter
profit = financial_data[:, 2]
taxes_paid = profit * tax_rates
after_tax_profit = profit - taxes_paid
print(taxes_paid) # Output: [ 483. 1254. 903. 1740.]
print(after_tax_profit) # Output: [1817. 4446. 3397. 6960.]
# Creating a weighted average of financial metrics
# Giving more weight to recent quarters
weights = np.array([0.1, 0.2, 0.3, 0.4]) # Weights sum to 1
weighted_revenue = np.sum(revenue * weights)
print(weighted_revenue) # Output: 14970.0
Statistical Methods: Financial Analysis Made Easy
NumPy provides built-in methods for common statistical operations, which are invaluable for financial analysis:
revenue = financial_data[:, 0]
expenses = financial_data[:, 1]
profit = financial_data[:, 2]
# Basic statistics
print(f"Average Quarterly Revenue: {np.mean(revenue)}") # Output: 14175.0
print(f"Total Annual Revenue: {np.sum(revenue)}") # Output: 56700
print(f"Lowest Quarterly Profit: {np.min(profit)}") # Output: 2300
print(f"Highest Quarterly Expenses: {np.max(expenses)}") # Output: 10200
print(f"Revenue Variance: {np.var(revenue)}") # Output: 12577500.0
print(f"Profit Standard Deviation: {np.std(profit)}") # Output: 2326.53...
# Finding best and worst performing quarters
best_quarter_idx = np.argmax(profit)
worst_quarter_idx = np.argmin(profit)
print(f"Best Quarter: Q{best_quarter_idx + 1} with ${profit[best_quarter_idx]} profit")
# Output: Best Quarter: Q4 with $8700 profit
print(f"Worst Quarter: Q{worst_quarter_idx + 1} with ${profit[worst_quarter_idx]} profit")
# Output: Worst Quarter: Q1 with $2300 profit
# Cumulative sums (running totals) - useful for YTD calculations
ytd_revenue = np.cumsum(revenue)
print(f"Revenue YTD: {ytd_revenue}") # Output: [10500 25700 37800 56700]
# Moving averages (e.g., 2-quarter moving average)
def moving_average(a, window_size):
return np.convolve(a, np.ones(window_size)/window_size, mode='valid')
revenue_2q_avg = moving_average(revenue, 2)
print(f"2-Quarter Moving Average Revenue: {revenue_2q_avg}")
# Output: [12850. 13650. 15500.]
Performance Comparison: Why NumPy is Faster
To really appreciate NumPy, let’s compare it with pure Python for a common financial calculation: computing the cumulative returns from a series of daily returns.
import time
import random
# Generate sample daily returns (+/- 2%)
daily_returns_list = [random.uniform(-0.02, 0.02) for _ in range(10000)]
daily_returns_np = np.array(daily_returns_list)
# Pure Python approach
start_time = time.time()
cumulative_return = 1
for daily_return in daily_returns_list:
cumulative_return *= (1 + daily_return)
final_value_python = cumulative_return
python_time = time.time() - start_time
# NumPy approach
start_time = time.time()
final_value_numpy = np.prod(1 + daily_returns_np)
numpy_time = time.time() - start_time
print(f"Pure Python result: {final_value_python}, time: {python_time:.6f} seconds")
print(f"NumPy result: {final_value_numpy}, time: {numpy_time:.6f} seconds")
print(f"NumPy is {python_time/numpy_time:.1f}x faster")
On my machine, the NumPy version typically runs about 100x faster! This performance advantage becomes even more significant as your datasets grow, which is why NumPy is essential for serious financial analysis.
Finance Case Study: Portfolio Analysis
Let’s put everything together in a simple portfolio analysis example:
# Portfolio stocks data (price at end of each quarter)
# Rows: Quarters, Columns: Stocks (AAPL, MSFT, GOOGL, AMZN)
stock_prices = np.array([
[150.10, 225.75, 2720.30, 3110.50], # Q1 prices
[168.30, 265.90, 2781.25, 3450.75], # Q2 prices
[155.80, 285.40, 2950.60, 3320.20], # Q3 prices
[182.40, 305.65, 3050.40, 3680.35] # Q4 prices
])
# Initial portfolio allocation (number of shares)
holdings = np.array([10, 5, 1, 2])
# Calculate portfolio value each quarter
portfolio_values = np.sum(stock_prices * holdings, axis=1)
print(f"Portfolio value by quarter: {portfolio_values}")
# Output: [9807.8 11207.15 11612.4 12687.05]
# Calculate quarterly returns
portfolio_returns = np.diff(portfolio_values) / portfolio_values[:-1] * 100
print(f"Quarterly returns (%): {portfolio_returns}")
# Output: [14.26731123 3.61594237 9.25456674]
# Calculate annualized return (geometric mean)
annual_return = (portfolio_values[-1] / portfolio_values[0]) ** (4/4) - 1
print(f"Annualized return: {annual_return*100:.2f}%")
# Output: Annualized return: 29.36%
# Calculate volatility (standard deviation of returns)
volatility = np.std(portfolio_returns)
print(f"Return volatility: {volatility:.2f}%")
# Output: Return volatility: 5.33%
# Calculate Sharpe ratio (assuming risk-free rate of 2%)
risk_free_rate = 2 # 2% annual
sharpe_ratio = (annual_return*100 - risk_free_rate) / volatility
print(f"Sharpe ratio: {sharpe_ratio:.2f}")
# Output: Sharpe ratio: 5.13
# Analyze correlation between stocks
# Calculate returns for each stock
stock_returns = np.diff(stock_prices, axis=0) / stock_prices[:-1] * 100
# Calculate correlation matrix
corr_matrix = np.corrcoef(stock_returns.T)
print("Stock correlation matrix:")
print(corr_matrix)
# Output:
# [[ 1. -0.71294839 -0.99819453 0.77866361]
# [-0.71294839 1. 0.75844502 -0.17973701]
# [-0.99819453 0.75844502 1. -0.80727807]
# [ 0.77866361 -0.17973701 -0.80727807 1. ]]
This example shows how NumPy makes it straightforward to perform complex financial analyses with just a few lines of code!
Key Takeaways
NumPy transforms Python from a general-purpose language into a powerful numerical computing tool. For finance professionals, it offers:
- Speed and efficiency for working with large datasets
- Intuitive syntax for numerical operations that maps well to financial calculations
- Built-in statistical functions that eliminate the need for custom implementations
- Vectorized operations that make mathematical formulas easier to express
- The foundation for more advanced financial analysis libraries like pandas
Next Steps
Now that you have a solid understanding of NumPy basics, you’re ready to move on to pandas, which builds on NumPy to provide even more powerful data analysis capabilities specifically designed for tabular data like financial statements, price histories, and transaction records.
In our next post, we’ll explore how pandas can make your financial data analysis workflow even more efficient and insightful.
Exercise: Calculate Financial Metrics
Before you go, try this exercise to practice your NumPy skills:
Create an array with monthly sales data for a year, then:
- Calculate the average monthly sales
- Find the highest and lowest sales months
- Calculate the quarter-by-quarter growth rate
- Determine if any month had sales 20% above the yearly average
Good luck, and feel free to share your solutions in the comments!
This post is part of my journey learning Python. I’m a chartered accountant exploring programming to enhance my financial analysis toolkit. Follow along as we discover together how Python can transform our work in finance and accounting!