Post 4: Core Data Structures

Welcome to the fourth post in my Python learning journey. So far, we’ve installed Python, set up a development environment, and explored the basic syntax. Now it’s time to dive deeper into Python’s core data structures; the building blocks you’ll use to organise and manipulate data in your programs.

In this post, we’ll cover:

Lists: Python’s versatile sequence type
Tuples: Immutable collections
Dictionaries: Key-value mapping
Sets: Unique value collections
Choosing the right data structure

I’ve found these data structures similar to concepts we use every day; lists are like columns in spreadsheets, dictionaries resemble lookup tables, and sets are perfect for tracking unique items like account codes.

1. Lists

Lists are ordered, mutable (changeable) collections that can contain items of different types. They’re perhaps the most commonly used data structure in Python.

1.1 Creating Lists

# Empty list
empty_list = []

# List with initial values
numbers = [1, 2, 3, 4, 5]
mixed_data = [42, "hello", True, 3.14]

# Creating a list with the list() constructor
chars = list("Python")  # ['P', 'y', 't', 'h', 'o', 'n']

# List of repeated values
zeros = [0] * 5  # [0, 0, 0, 0, 0]

1.2 Accessing List Elements

expenses = [1200, 450, 700, 95, 800]

# Indexing (zero-based)
first_expense = expenses[0]  # 1200
last_expense = expenses[-1]  # 800

# Slicing [start:end:step] - end index is exclusive
first_three = expenses[0:3]  # [1200, 450, 700]
# Shorthand for starting from beginning
first_three = expenses[:3]   # [1200, 450, 700]
# Shorthand for going to the end
last_three = expenses[2:]    # [700, 95, 800]
# Negative indices count from the end
last_two = expenses[-2:]     # [95, 800]
# Step value
every_other = expenses[::2]  # [1200, 700, 800]
# Reverse a list
reversed_expenses = expenses[::-1]  # [800, 95, 700, 450, 1200]

1.3 Modifying Lists

departments = ["Finance", "Marketing", "IT", "Operations"]

# Changing an element
departments[1] = "Digital Marketing"

# Adding elements
departments.append("HR")  # Adds to the end
departments.insert(2, "Sales")  # Inserts at specific position

# Removing elements
departments.remove("IT")  # Removes first occurrence of value
popped_item = departments.pop()  # Removes and returns last item
popped_item = departments.pop(1)  # Removes item at index 1
del departments[0]  # Removes item at index 0

# Extending lists
dept1 = ["Finance", "HR"]
dept2 = ["IT", "Operations"]
dept1.extend(dept2)  # dept1 now contains ["Finance", "HR", "IT", "Operations"]
# Alternative: concatenation
all_depts = dept1 + dept2  # Creates a new list

# Finding elements
position = departments.index("HR")  # Raises ValueError if not found
count = departments.count("Finance")  # Counts occurrences of value

1.4 Useful List Operations

numbers = [5, 2, 8, 1, 9]

# Sorting
numbers.sort()  # Modifies the list in-place: [1, 2, 5, 8, 9]
numbers.sort(reverse=True)  # Descending order: [9, 8, 5, 2, 1]

# If you don't want to modify the original list
sorted_numbers = sorted(numbers)  # Returns a new sorted list

# Reversing
numbers.reverse()  # Modifies the list in-place

# Finding min/max
minimum = min(numbers)
maximum = max(numbers)

# Sum of all elements
total = sum(numbers)

# Checking membership
if 5 in numbers:
    print("Found 5 in the list!")

# List comprehension (filtering and transforming)
even_numbers = [x for x in numbers if x % 2 == 0]
doubled = [x * 2 for x in numbers]

1.5 Nested Lists (2D Lists)

# Quarterly expenses by department
quarterly_expenses = [
    [1200, 1500, 1100, 1800],  # Finance
    [900, 950, 1025, 1150],    # Marketing
    [850, 880, 920, 980]       # IT
]

# Accessing elements
finance_q1 = quarterly_expenses[0][0]  # 1200
marketing_q3 = quarterly_expenses[1][2]  # 1025

# Looping through a 2D list
for department in quarterly_expenses:
    for expense in department:
        print(expense, end=" ")
    print()  # New line after each department

2. Tuples

Tuples are similar to lists but are immutable (cannot be changed after creation). They’re commonly used for fixed collections of items.

2.1 Creating Tuples

# Empty tuple
empty_tuple = ()

# Tuple with values
coordinates = (10, 20)
person = ("John", 30, "Developer")

# Single-item tuple needs a comma
single_item = (42,)  # Without comma, it's just a number in parentheses

# Tuple packing (no parentheses needed)
employee = "Jane", 35, "Manager"

# Creating with tuple() constructor
letters = tuple("abc")  # ('a', 'b', 'c')

2.2 Accessing Tuple Elements

coordinates = (10, 20, 30, 40, 50)

# Similar to list indexing and slicing
x = coordinates[0]  # 10
last = coordinates[-1]  # 50
subset = coordinates[1:4]  # (20, 30, 40)

2.3 Tuple Operations

employee = ("Jane", 35, "Manager", "HR")

# Count and index (like lists)
age_pos = employee.index(35)
count = employee.count("Manager")

# Concatenation
more_info = employee + ("Full-time",)

# Unpacking (very useful feature!)
name, age, role, department = employee

# Returning multiple values from a function (using tuple unpacking)
def get_employee_stats():
    return "Jane", 35, 85000

name, age, salary = get_employee_stats()

2.4 Why Use Tuples?

Immutability - Values can’t be changed accidentally
Hashable - Can be used as dictionary keys (lists cannot)
Slightly more efficient than lists for fixed data
Signal intent - Using a tuple indicates the data shouldn’t change

# Using tuples as dictionary keys (not possible with lists)
locations = {
    (40.7128, -74.0060): "New York",
    (34.0522, -118.2437): "Los Angeles"
}

3. Dictionaries

Dictionaries store data as key-value pairs, providing fast lookups by key. They’re unordered in Python versions before 3.7 and preserve insertion order in 3.7+.

3.1 Creating Dictionaries

# Empty dictionary
empty_dict = {}

# Dictionary with initial key-value pairs
employee = {
    "name": "Jane Smith",
    "age": 35,
    "department": "Finance",
    "salary": 75000
}

# Alternative creation with dict() constructor
employee = dict(name="Jane Smith", age=35, department="Finance")

# Creating from sequences of pairs
items = [("name", "Jane"), ("age", 35)]
employee = dict(items)

3.2 Accessing Dictionary Values

employee = {
    "name": "Jane Smith",
    "age": 35,
    "department": "Finance",
    "salary": 75000
}

# Access by key
name = employee["name"]  # "Jane Smith"

# KeyError if key doesn't exist
# salary = employee["bonus"]  # Raises KeyError

# Using get() method (safer, returns None or default value if key not found)
bonus = employee.get("bonus")  # None
bonus = employee.get("bonus", 0)  # Returns 0 if key not found

3.3 Modifying Dictionaries

employee = {"name": "Jane", "department": "Finance"}

# Adding or updating values
employee["age"] = 35  # Add new key-value pair
employee["name"] = "Jane Smith"  # Update existing value

# Removing items
removed_value = employee.pop("age")  # Removes and returns value
del employee["department"]  # Removes key-value pair

# Clearing all items
employee.clear()  # Empty dictionary {}

3.4 Useful Dictionary Operations

expenses = {
    "rent": 1200,
    "utilities": 250,
    "groceries": 400,
    "entertainment": 150
}

# Get all keys
keys = expenses.keys()  # dict_keys(['rent', 'utilities', 'groceries', 'entertainment'])

# Get all values
values = expenses.values()  # dict_values([1200, 250, 400, 150])

# Get all key-value pairs as tuples
items = expenses.items()  # dict_items([('rent', 1200), ('utilities', 250), ...])

# Iterating over a dictionary
for key in expenses:
    print(f"{key}: ${expenses[key]}")

# Better way to iterate over keys and values
for category, amount in expenses.items():
    print(f"{category}: ${amount}")

# Check if key exists
if "rent" in expenses:
    print("Rent is accounted for")

# Merging dictionaries (Python 3.9+)
monthly = {"rent": 1200, "utilities": 250}
occasional = {"repairs": 100, "insurance": 80}
all_expenses = monthly | occasional  # Python 3.9+

# Merging dictionaries (earlier versions)
all_expenses = {**monthly, **occasional}  # Unpacking syntax

# Dictionary comprehension
doubled_expenses = {k: v * 2 for k, v in expenses.items()}
large_expenses = {k: v for k, v in expenses.items() if v > 200}

3.5 Nested Dictionaries

# Department budget by quarter and category
department_budget = {
    "Finance": {
        "Q1": {"salaries": 50000, "equipment": 10000, "travel": 5000},
        "Q2": {"salaries": 52000, "equipment": 8000, "travel": 6000}
    },
    "IT": {
        "Q1": {"salaries": 60000, "equipment": 20000, "travel": 3000},
        "Q2": {"salaries": 65000, "equipment": 15000, "travel": 2000}
    }
}

# Accessing nested values
finance_q1_salaries = department_budget["Finance"]["Q1"]["salaries"]

# Safely accessing nested values
import pprint  # Pretty print module for better display of nested structures

# Loop through nested dictionary
for dept, quarters in department_budget.items():
    print(f"\n{dept} Department:")
    for quarter, categories in quarters.items():
        print(f"  {quarter}:")
        for category, amount in categories.items():
            print(f"    {category}: ${amount}")

4. Sets

Sets are unordered collections of unique elements. They’re perfect for removing duplicates and performing mathematical set operations.

4.1 Creating Sets

# Empty set (can't use {} as that creates an empty dictionary)
empty_set = set()

# Set with initial values
fruits = {"apple", "banana", "cherry"}

# Creating a set from a list (removes duplicates)
numbers = set([1, 2, 2, 3, 4, 4, 5])  # {1, 2, 3, 4, 5}
unique_chars = set("mississippi")  # {'m', 'i', 's', 'p'}

4.2 Set Operations

employees_dept_a = {"Jess", "Bob", "Charlie", "David"}
employees_dept_b = {"Charlie", "David", "Eve", "Frank"}
candidates = {"Eve", "Grace", "Heidi"}

# Add and remove elements
employees_dept_a.add("Grace")
employees_dept_a.remove("Bob")  # Raises KeyError if not found
employees_dept_a.discard("Bob")  # No error if not found

# Set operations
# Union (all employees)
all_employees = employees_dept_a | employees_dept_b
# or
all_employees = employees_dept_a.union(employees_dept_b)

# Intersection (employees in both departments)
in_both_depts = employees_dept_a & employees_dept_b
# or
in_both_depts = employees_dept_a.intersection(employees_dept_b)

# Difference (employees in A but not in B)
only_in_a = employees_dept_a - employees_dept_b
# or
only_in_a = employees_dept_a.difference(employees_dept_b)

# Symmetric difference (employees in either dept but not both)
in_one_dept = employees_dept_a ^ employees_dept_b
# or
in_one_dept = employees_dept_a.symmetric_difference(employees_dept_b)

# Subset and superset
is_subset = employees_dept_a <= employees_dept_b  # False
is_proper_subset = candidates < all_employees  # True (candidates is a proper subset of all_employees)

4.3 Common Set Uses

# Removing duplicates from a list
transactions = [1001, 1002, 1001, 1003, 1002, 1004]
unique_transactions = list(set(transactions))

# Membership testing (very efficient for large datasets)
if "Jess" in employees_dept_a:
    print("Jess works in Department A")

# Finding common elements
customer_ids = {101, 102, 103, 104, 105}
premium_ids = {102, 104, 106}
common_ids = customer_ids & premium_ids

# Set comprehensions
even_numbers = {x for x in range(10) if x % 2 == 0}  # {0, 2, 4, 6, 8}

5. Choosing the Right Data Structure

Selecting the appropriate data structure can make your code more efficient and readable. Here’s a quick guide:

5.1 When to Use Each Structure

Use Lists when:

You need an ordered collection
Items might need to be changed, added, or removed
You need to store duplicate values
You need to maintain insertion order

Use Tuples when:

You have a fixed collection that shouldn’t change
You want to return multiple values from a function
You need elements that can serve as dictionary keys
You want to ensure data integrity (immutability)

Use Dictionaries when:

You need key-value mapping (lookups by key)
You want fast lookups by a specific identifier
You’re working with named attributes or properties
You need to count occurrences of items

Use Sets when:

You only care about unique values (no duplicates)
You need to perform set operations (union, intersection)
You want to quickly check if an item exists
You’re removing duplicates from a collection

5.2 Performance Considerations

Data structure choice affects performance. In general:

Lists: O(1) for append/pop at end, O(n) for insert/delete elsewhere
Dictionaries: O(1) average for key lookups, insertions, and deletions
Sets: O(1) average for membership testing, adding, removing

# Example: Different approaches to counting word frequencies

text = "to be or not to be that is the question"
words = text.split()

# Using a list (inefficient for counting)
def count_with_list(words):
    counts = []
    for word in words:
        found = False
        for item in counts:
            if item[0] == word:
                item[1] += 1
                found = True
                break
        if not found:
            counts.append([word, 1])
    return counts

# Using a dictionary (efficient)
def count_with_dict(words):
    counts = {}
    for word in words:
        if word in counts:
            counts[word] += 1
        else:
            counts[word] = 1
    return counts

# Even more concise with collections.Counter
from collections import Counter
def count_with_counter(words):
    return Counter(words)

# Compare results
print(count_with_list(words))
print(count_with_dict(words))
print(dict(count_with_counter(words)))

Practice Exercise: Financial Portfolio Tracker

Let’s apply these data structures by creating a simple portfolio tracker:

def portfolio_tracker():
    # Initialise portfolio using a dictionary of dictionaries
    portfolio = {}
    transactions = []
    
    while True:
        print("\nPortfolio Tracker")
        print("1. Add stock")
        print("2. Record transaction")
        print("3. View portfolio")
        print("4. View transaction history")
        print("5. Top holdings")
        print("6. Exit")
        
        choice = input("Enter your choice (1-6): ")
        
        if choice == "1":
            ticker = input("Enter stock ticker symbol: ").upper()
            company = input("Enter company name: ")
            sector = input("Enter sector: ")
            
            portfolio[ticker] = {
                "company": company,
                "sector": sector,
                "shares": 0,
                "cost_basis": 0,
                "current_price": float(input("Enter current price: $"))
            }
            print(f"Added {ticker} to portfolio.")
            
        elif choice == "2":
            ticker = input("Enter stock ticker symbol: ").upper()
            if ticker not in portfolio:
                print(f"Error: {ticker} not in portfolio. Add it first.")
                continue
                
            transaction_type = input("Buy or sell? ").lower()
            shares = int(input("Number of shares: "))
            price = float(input("Price per share: $"))
            date = input("Date (YYYY-MM-DD): ")
            
            # Record transaction
            transaction = (date, ticker, transaction_type, shares, price)
            transactions.append(transaction)
            
            # Update portfolio
            if transaction_type == "buy":
                # Calculate new cost basis
                current_shares = portfolio[ticker]["shares"]
                current_basis = portfolio[ticker]["cost_basis"]
                new_shares = current_shares + shares
                
                if new_shares > 0:  # Avoid division by zero
                    new_basis = (current_shares * current_basis + shares * price) / new_shares
                else:
                    new_basis = 0
                
                portfolio[ticker]["shares"] += shares
                portfolio[ticker]["cost_basis"] = new_basis
            
            elif transaction_type == "sell":
                if portfolio[ticker]["shares"] < shares:
                    print("Error: Not enough shares.")
                    transactions.pop()  # Remove the last transaction
                    continue
                portfolio[ticker]["shares"] -= shares
            
            print(f"Transaction recorded. You now have {portfolio[ticker]['shares']} shares of {ticker}.")
            
        elif choice == "3":
            if not portfolio:
                print("Portfolio is empty.")
                continue
                
            print("\nCurrent Portfolio:")
            print(f"{'Ticker':<6} {'Company':<20} {'Sector':<15} {'Shares':<8} {'Cost Basis':<12} {'Current':<9} {'Value':<12} {'Gain/Loss':<10}")
            print("-" * 100)
            
            total_value = 0
            for ticker, data in portfolio.items():
                shares = data["shares"]
                if shares > 0:  # Only show stocks we still own
                    cost_basis = data["cost_basis"]
                    current = data["current_price"]
                    value = shares * current
                    gain_loss = value - (shares * cost_basis)
                    
                    print(f"{ticker:<6} {data['company'][:20]:<20} {data['sector'][:15]:<15} {shares:<8} ${cost_basis:<10.2f} ${current:<7.2f} ${value:<10.2f} ${gain_loss:<8.2f}")
                    total_value += value
            
            print("-" * 100)
            print(f"Total Portfolio Value: ${total_value:.2f}")
            
        elif choice == "4":
            if not transactions:
                print("No transactions recorded.")
                continue
                
            print("\nTransaction History:")
            print(f"{'Date':<12} {'Ticker':<6} {'Type':<6} {'Shares':<8} {'Price':<8} {'Total':<12}")
            print("-" * 55)
            
            for date, ticker, trans_type, shares, price in transactions:
                total = shares * price
                print(f"{date:<12} {ticker:<6} {trans_type:<6} {shares:<8} ${price:<6.2f} ${total:<10.2f}")
            
        elif choice == "5":
            if not portfolio:
                print("Portfolio is empty.")
                continue
                
            # Use a list to sort holdings by value
            holdings = []
            for ticker, data in portfolio.items():
                if data["shares"] > 0:
                    value = data["shares"] * data["current_price"]
                    holdings.append((ticker, data["company"], value))
            
            # Sort by value (descending)
            holdings.sort(key=lambda x: x[2], reverse=True)
            
            print("\nTop Holdings:")
            print(f"{'Rank':<5} {'Ticker':<6} {'Company':<20} {'Value':<12}")
            print("-" * 45)
            
            for i, (ticker, company, value) in enumerate(holdings[:5], 1):
                print(f"{i:<5} {ticker:<6} {company[:20]:<20} ${value:<10.2f}")
            
        elif choice == "6":
            print("Thank you for using Portfolio Tracker!")
            break
        
        else:
            print("Invalid choice. Please try again.")

# Run the application
if __name__ == "__main__":
    portfolio_tracker()

This example demonstrates:

Dictionaries for storing portfolio data
Lists for transaction history
Tuples for individual transactions
Sorting and filtering data
Calculating values based on stored data

What’s Next?

Now that we’ve covered Python’s core data structures, the next post will explore functions, modules, and file I/O—essential tools for organising your code and working with external data.

Stay tuned for Post 5: Functions, Modules & File I/O!

This post is part of my journey learning Python. I’m a chartered accountant exploring programming to enhance my analytical toolkit. If you have questions or spot any errors, please reach out.

Post 4: Core Data Structures#

1. Lists#

1.1 Creating Lists#

1.2 Accessing List Elements#

1.3 Modifying Lists#

1.4 Useful List Operations#

1.5 Nested Lists (2D Lists)#

2. Tuples#

2.1 Creating Tuples#

2.2 Accessing Tuple Elements#

2.3 Tuple Operations#

2.4 Why Use Tuples?#

3. Dictionaries#

3.1 Creating Dictionaries#

3.2 Accessing Dictionary Values#

3.3 Modifying Dictionaries#

3.4 Useful Dictionary Operations#

3.5 Nested Dictionaries#

4. Sets#

4.1 Creating Sets#

4.2 Set Operations#

4.3 Common Set Uses#

5. Choosing the Right Data Structure#

5.1 When to Use Each Structure#

5.2 Performance Considerations#

Practice Exercise: Financial Portfolio Tracker#

What’s Next?#

Related Posts

Part 9: Command-Line Tools & Automation with Python

Part 8: Testing & Debugging Python Code

Part 7: Code Quality & Collaboration in Python