Data Structures — Lists, Dicts, Sets & Comprehensions
Choosing the Right Container
Every time you store a collection of values, you are making a performance and semantic decision. Python's four built-in containers — list, tuple, dict, and set — each have distinct time complexities, mutability characteristics, and intended use cases. Choosing the wrong one does not just slow your code; it obscures your intent.
This lesson covers each container deeply: how it works internally, when to use it, and the comprehension syntax that makes Python code read like well-written prose.
Lists
A list is a mutable, ordered, dynamically resized array of references. Under the hood, CPython stores a contiguous array of pointers to PyObject structs. This means random access by index is O(1) but inserting at the front is O(n) — the entire array shifts.
Mutation Methods
Sorting with key=
The key parameter is one of Python's most powerful features. It lets you sort by any function of the element without transforming the list itself.
List as Stack and Queue
Big-O Complexity of List Operations
| Operation | Time | Notes |
|-----------|------|-------|
| lst[i] | O(1) | Random access |
| len(lst) | O(1) | Cached |
| lst.append(x) | O(1) amortized | Occasional resize |
| lst.pop() | O(1) | From the end |
| lst.pop(0) | O(n) | From the front — use deque |
| lst.insert(i, x) | O(n) | Shifts elements |
| x in lst | O(n) | Linear scan |
| lst.sort() | O(n log n) | Timsort, stable |
| lst[a:b] | O(k) | k = b - a |
Tuples
A tuple is an immutable, ordered sequence. Once created, you cannot change its elements. This makes tuples suitable for fixed collections: coordinates, RGB values, database rows, function return values, dict keys.
Dictionaries
A dictionary is a hash table — a mutable mapping from keys to values. Key lookup, insertion, and deletion are all O(1) average. Since Python 3.7, dicts preserve insertion order.
For a key to be used in a dict, it must be hashable (immutable and implements __hash__). Strings, numbers, and tuples of hashables work. Lists and dicts do not.
defaultdict and Counter
Sets
A set is an unordered collection of unique hashable elements, implemented as a hash table without values. Membership testing (in) is O(1) — far faster than a list's O(n).
Comprehensions
Comprehensions are syntactic sugar for creating collections by transforming or filtering an iterable. They are faster than equivalent for loops (the loop runs in C), and — when kept simple — far more readable.
List Comprehensions
Dict and Set Comprehensions
Generator Expressions
A generator expression looks like a list comprehension but uses parentheses. It returns a generator — a lazy iterator that produces one value at a time without storing the entire sequence in memory. For large datasets, this is the difference between crashing and running fine.
When to Use Each Data Structure
| Structure | Ordered | Mutable | Duplicates | Key Lookup | Best For |
|-----------|---------|---------|-----------|------------|----------|
| list | Yes | Yes | Yes | O(n) | Ordered sequences, stacks |
| tuple | Yes | No | Yes | O(n) | Fixed records, dict keys |
| dict | Yes (3.7+) | Yes | Keys: No | O(1) | Key-value mapping |
| set | No | Yes | No | O(1) | Deduplication, membership |
| frozenset | No | No | No | O(1) | Immutable sets, dict keys |
| deque | Yes | Yes | Yes | O(n) | FIFO queues, sliding windows |
| Counter | No | Yes | — | O(1) | Frequency counting |
Decision guide:
- Need fast lookup by a key? → dict
- Need to check "is X in collection" frequently? → set
- Need ordered, indexable, and will mutate? → list
- Returning multiple values from a function? → tuple
- Processing a huge stream without loading into memory? → generator
PROJECT: Contact Management System
A full CLI-style contact book demonstrating dicts, lists, sets, and comprehensions working together.
Challenge
Push your skills further with these exercises:
-
LRU Cache — implement an
LRUCache(capacity)class using adictandcollections.deque. It should supportget(key)andput(key, value)in O(1) time, evicting the least recently used item when at capacity. -
Frequency analysis — given a list of strings, use a
defaultdict(Counter)to build a map offirst_letter -> Counter of words. Find which starting letter has the most lexical diversity. -
Set operations for analytics — you have two sets of user IDs:
users_week1andusers_week2. Compute: retained users (in both), churned users (only in week1), new users (only in week2), and the retention rate. -
Comprehension golf — write these as single-line comprehensions:
- All prime numbers below 100
- A dict mapping each word in a sentence to its reverse
- The transpose of a 4x4 matrix
-
Contact book extension — add a
merge(other_book)method that merges two ContactBooks, handling duplicates by keeping the one with more fields filled in.
Key Takeaways
- Lists are O(1) for indexed access and tail operations, but O(n) for arbitrary insertion/deletion — use
dequefor queue operations - Tuples signal immutability and are hashable — they work as dict keys and can represent fixed records
- Dict lookups are O(1) average thanks to hashing — for frequent "is X in collection" checks, always prefer a set or dict over a list
defaultdictandCountersolve 80% of grouping and counting tasks without boilerplate- Comprehensions are faster than
forloops for building collections and express intent more clearly - Generator expressions produce values lazily — always prefer them over list comprehensions when you only need to iterate once
- The
key=parameter insorted()is one of Python's most powerful features — it avoids DSU (Decorate-Sort-Undecorate) patterns - Choose your container by its O-complexity and semantics: the right choice is often obvious once you ask "what operations will I do most?"