File I/O, JSON, CSV & the pathlib API
The Context Manager: Always Use with
When you open a file, the OS gives you a file descriptor, which is a limited system resource. If an exception skips your f.close() call, or you simply forget it, the descriptor stays open and buffered data may not be written out when you expect. The with statement guarantees the file is closed even if an exception is raised.
The with statement works with any context manager — objects implementing __enter__ and __exit__. Files, database connections, locks, and temporary directories are all context managers.
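To see that guarantee in action, here is a minimal sketch. The file name notes.txt and the Resource class are made up for illustration. The point is only that the file ends up closed on both the normal and the failing path, and that any object with __enter__ and __exit__ can be used the same way.

```python
import tempfile
from pathlib import Path


class Resource:
    """Hypothetical context manager that records its own lifecycle."""

    def __init__(self):
        self.events = []

    def __enter__(self):
        self.events.append("enter")
        return self

    def __exit__(self, exc_type, exc, tb):
        self.events.append("exit")
        return False  # False means: do not swallow the exception


# Work in a throwaway directory so the example leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "notes.txt"  # hypothetical file name

    with open(path, "w", encoding="utf-8") as f:
        f.write("first line\n")
    closed_after_block = f.closed  # closed on normal exit

    try:
        with open(path, encoding="utf-8") as f:
            raise ValueError("simulated failure while the file is open")
    except ValueError:
        pass
    closed_after_error = f.closed  # closed even though the body raised

res = Resource()
try:
    with res:
        raise RuntimeError("boom")
except RuntimeError:
    pass
# __exit__ ran despite the exception, so both events were recorded.
```

Returning False from __exit__ lets the exception propagate after cleanup, which is the behaviour you usually want.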
File Modes
| Mode | Meaning | Creates file? | Truncates? |
|------|---------|--------------|-----------|
| r | Read text (default) | No | No |
| w | Write text | Yes | Yes |
| a | Append text | Yes | No |
| r+ | Read + write text | No | No |
| rb | Read binary | No | No |
| wb | Write binary | Yes | Yes |
| x | Exclusive create, write text | Yes (fails if the file exists) | N/A |
Use binary mode (rb, wb) for images, PDFs, archives, or any file that is not plain text. Use text mode with an explicit encoding="utf-8" for text files, because the default encoding can otherwise depend on the platform and locale.
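The differences between the modes are easiest to see side by side. The sketch below uses a hypothetical demo.txt in a temporary directory and walks through create, append, truncate, exclusive create, and a binary read.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "demo.txt"  # hypothetical file name

    with open(path, "w", encoding="utf-8") as f:  # "w" creates the file
        f.write("one\n")
    with open(path, "a", encoding="utf-8") as f:  # "a" keeps existing content
        f.write("two\n")
    after_append = path.read_text(encoding="utf-8")

    with open(path, "w", encoding="utf-8") as f:  # "w" truncates first
        f.write("three\n")
    after_rewrite = path.read_text(encoding="utf-8")

    try:
        with open(path, "x", encoding="utf-8"):  # "x" refuses to overwrite
            pass
        exclusive_failed = False
    except FileExistsError:
        exclusive_failed = True

    with open(path, "rb") as f:  # binary mode returns bytes, not str
        raw = f.read()
```

Note that raw holds the bytes exactly as stored, so on Windows it would end in a carriage return plus newline rather than a bare newline.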
Writing Files
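A minimal sketch of the common writing patterns follows. The file name report.txt and its contents are placeholders. The thing to notice is which calls add a newline for you and which do not.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "report.txt"  # hypothetical file name

    with open(path, "w", encoding="utf-8") as f:
        f.write("header\n")                   # write() adds no newline itself
        f.writelines(["row 1\n", "row 2\n"])  # neither does writelines()
        print("total:", 2, file=f)            # print() appends a newline

    with open(path, "a", encoding="utf-8") as f:
        f.write("appended\n")                 # "a" adds to the end of the file

    written = path.read_text(encoding="utf-8").splitlines()
```

After this runs, written holds the five lines in the order they were written: header, row 1, row 2, total: 2, appended.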
The pathlib API
pathlib.Path is the modern, object-oriented way to handle file system paths. It is cross-platform (handles Windows vs POSIX separators), composable with /, and replaces os.path for most use cases.
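As a rough illustration of that composability, the sketch below builds a small directory tree and inspects it. The directory and file names (data/raw, sales.2024.csv, notes.txt) are invented for the example.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    data_dir = base / "data" / "raw"  # "/" joins path segments
    data_dir.mkdir(parents=True, exist_ok=True)

    csv_path = data_dir / "sales.2024.csv"  # hypothetical file
    csv_path.write_text("id,amount\n1,9.50\n", encoding="utf-8")
    (data_dir / "notes.txt").write_text("scratch\n", encoding="utf-8")

    name = csv_path.name                # "sales.2024.csv"
    stem = csv_path.stem                # "sales.2024" (only the last suffix is dropped)
    suffix = csv_path.suffix            # ".csv"
    parent_name = csv_path.parent.name  # "raw"
    renamed = csv_path.with_suffix(".json").name  # "sales.2024.json"
    exists = csv_path.exists()

    # rglob() searches recursively below base.
    csv_files = sorted(p.name for p in base.rglob("*.csv"))
```

None of these calls require string concatenation or a hard-coded separator, which is what makes the same code work on Windows and POSIX systems.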
JSON: Encoding and Decoding
JSON is the lingua franca of web APIs and configuration files. Python's json module converts between JSON strings and Python objects.
| JSON | Python |
|------|--------|
| object | dict |
| array | list |
| string | str |
| number | int or float |
| true/false | True/False |
| null | None |
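The mapping above can be checked with a short round trip. The record contents and the DateTimeEncoder class are illustrative only. Subclassing json.JSONEncoder and overriding default() is one common way to handle types such as datetime that json cannot serialize on its own.

```python
import io
import json
from datetime import datetime


class DateTimeEncoder(json.JSONEncoder):
    """Hypothetical encoder that writes datetimes as ISO 8601 strings."""

    def default(self, o):
        if isinstance(o, datetime):
            return o.isoformat()
        return super().default(o)  # raises TypeError for unknown types


record = {
    "name": "Ada",
    "tags": ["admin", "dev"],
    "active": True,
    "manager": None,
    "score": 9.5,
    "created": datetime(2024, 1, 15, 9, 30),
}

# String round trip: dumps() and loads().
text = json.dumps(record, cls=DateTimeEncoder, indent=2, sort_keys=True)
decoded = json.loads(text)

# File round trip: dump() and load(), with StringIO standing in for a file.
buffer = io.StringIO()
json.dump(record, buffer, cls=DateTimeEncoder)
buffer.seek(0)
from_file = json.load(buffer)

# Tuples have no JSON equivalent, so they come back as lists.
round_trip_tuple = json.loads(json.dumps({"point": (1, 2)}))
```

The datetime comes back as a plain string, so decoding it into a datetime again is a separate step you would have to add yourself.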
CSV: Reading and Writing Tabular Data
CSV (comma-separated values) is everywhere — databases, spreadsheets, data pipelines. Python's csv module handles quoting, escaping, and different dialects correctly.
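The quoting and escaping mentioned above is the main reason to avoid splitting on commas by hand. In this sketch the product rows are invented, and io.StringIO stands in for a real file. With a real file you would pass newline="" to open() so the csv module controls line endings.

```python
import csv
import io

rows = [
    {"name": "Widget, large", "price": "9.50"},      # contains the delimiter
    {"name": 'The "best" gadget', "price": "12.00"}, # contains quote characters
]

buffer = io.StringIO(newline="")
writer = csv.DictWriter(buffer, fieldnames=["name", "price"])
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()

# Read the same text back; every value comes back as a string.
buffer.seek(0)
read_back = list(csv.DictReader(buffer))
```

The writer wraps the first name in quotes and doubles the embedded quotes in the second, and the reader undoes both, so read_back matches rows.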
configparser and tempfile
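A minimal sketch that combines the two modules: it writes an .ini file into a temporary directory, reads it back with typed getters, and then confirms the directory is gone. The section name, keys, and app.ini file name are assumptions made for the example.

```python
import configparser
import tempfile
from pathlib import Path

config = configparser.ConfigParser()
config["server"] = {"host": "localhost", "port": "8080", "debug": "true"}

with tempfile.TemporaryDirectory() as tmp:
    ini_path = Path(tmp) / "app.ini"  # hypothetical file name
    with open(ini_path, "w", encoding="utf-8") as f:
        config.write(f)

    loaded = configparser.ConfigParser()
    loaded.read(ini_path, encoding="utf-8")

    # Values are stored as strings; the typed getters convert them.
    host = loaded["server"]["host"]
    port = loaded.getint("server", "port")
    debug = loaded.getboolean("server", "debug")
    # fallback is returned when the option is missing.
    timeout = loaded.getfloat("server", "timeout", fallback=30.0)

# TemporaryDirectory removes the directory and its contents on exit.
dir_removed = not Path(tmp).exists()
```

Because TemporaryDirectory is itself a context manager, cleanup happens even if the code inside the block raises.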
PROJECT: Configuration Manager
A production-quality config manager with JSON storage, defaults, nested dot-notation access, and validation.
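One possible shape for such a manager is sketched below. This is not the project's reference implementation. The method names (get, set, validate, save), the shallow merge of stored values over defaults, and the "list of required keys" style of validation are all assumptions chosen to keep the example short.

```python
import copy
import json
import tempfile
from pathlib import Path


class ConfigManager:
    """Hypothetical sketch: JSON-backed config with defaults and dotted keys."""

    def __init__(self, path, defaults=None):
        self.path = Path(path)
        self._data = copy.deepcopy(defaults or {})
        if self.path.exists():
            with open(self.path, encoding="utf-8") as f:
                # Assumption: shallow update, stored top-level keys win.
                self._data.update(json.load(f))

    def get(self, key, default=None):
        node = self._data
        for part in key.split("."):
            if not isinstance(node, dict) or part not in node:
                return default
            node = node[part]
        return node

    def set(self, key, value):
        *parents, last = key.split(".")
        node = self._data
        for part in parents:
            if not isinstance(node.get(part), dict):
                node[part] = {}  # replace a missing or scalar value
            node = node[part]
        node[last] = value

    def validate(self, required):
        """Return the dotted keys from `required` that have no value."""
        return [key for key in required if self.get(key) is None]

    def save(self):
        with open(self.path, "w", encoding="utf-8") as f:
            json.dump(self._data, f, indent=2)


with tempfile.TemporaryDirectory() as tmp:
    defaults = {"db": {"host": "localhost", "port": 5432}}
    cfg = ConfigManager(Path(tmp) / "config.json", defaults=defaults)
    cfg.set("db.port", 6543)
    cfg.set("logging.level", "INFO")
    cfg.save()

    reloaded = ConfigManager(cfg.path)  # no defaults: values come from disk
    port = reloaded.get("db.port")
    level = reloaded.get("logging.level")
    missing = reloaded.validate(["db.host", "db.password"])
```

A fuller version would likely want schema-style validation (types and ranges, not just presence) and a deep merge of stored values over defaults.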
PROJECT: CSV Data Pipeline
A data pipeline that validates, transforms, and cleans CSV data — a pattern that appears in virtually every data engineering task.
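The validate, transform, and clean steps can be sketched as three small functions. The column names, the sample rows, and the validation rules below are hypothetical, and StringIO stands in for a real input file.

```python
import csv
import io

# Hypothetical input: one good row with messy formatting, two bad rows.
RAW = """name,age,email
 Alice ,30,ALICE@EXAMPLE.COM
Bob,not-a-number,bob@example.com
,25,nobody@example.com
Carol,41,carol@example.com
"""


def validate(row):
    """Return a list of problems found in one row (empty if it is valid)."""
    errors = []
    if not row["name"].strip():
        errors.append("missing name")
    if not row["age"].strip().isdecimal():
        errors.append("age is not a whole number")
    return errors


def transform(row):
    """Normalize whitespace, types, and letter case for a valid row."""
    return {
        "name": row["name"].strip(),
        "age": int(row["age"].strip()),
        "email": row["email"].strip().lower(),
    }


def run_pipeline(source):
    """Stream rows from `source`, splitting them into clean and rejected."""
    clean, rejected = [], []
    # start=2 because line 1 of the file is the header.
    for line_no, row in enumerate(csv.DictReader(source), start=2):
        errors = validate(row)
        if errors:
            rejected.append((line_no, errors))
        else:
            clean.append(transform(row))
    return clean, rejected


clean, rejected = run_pipeline(io.StringIO(RAW))
```

Keeping rejected rows together with their line numbers, rather than silently dropping them, makes it much easier to report problems back to whoever produced the file.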
Challenge
Push further with these exercises:
- Recursive directory scanner: write a function scan_directory(path, extensions=None) using pathlib.Path.rglob() that returns a dict mapping each extension to a list of matching file paths. Exclude hidden files (starting with .).
- JSON diff: write json_diff(a_str, b_str) that compares two JSON objects and returns a dict describing changes: added keys, removed keys, changed values (with old and new). Handle nested dicts recursively.
- CSV streaming aggregator: write a function that reads a very large CSV (simulate with a 10,000-row StringIO) without loading it all into memory. Compute: row count, sum and mean of a numeric column, value counts for a categorical column. Use a single pass.
- Config merger: extend ConfigManager with a merge(other: ConfigManager) method that deep-merges two configs, with the other config winning on conflicts. Write tests that verify nested dict merging.
- NDJSON writer/reader: implement NDJSONWriter and NDJSONReader classes for newline-delimited JSON (one JSON object per line, used in log streaming). The writer should accept a datetime encoder. The reader should yield decoded objects lazily.
Key Takeaways
- Always use the with statement for file I/O; it guarantees cleanup via __exit__ even when exceptions occur
- Text mode with encoding="utf-8" is the safe default; only use binary mode when dealing with non-text data
- pathlib.Path replaces os.path for most use cases; it is composable with /, self-documenting, and cross-platform
- json.dumps/loads (strings) and json.dump/load (files) are the two patterns that cover most everyday use; custom encoders handle non-serializable types like datetime
- csv.DictReader and csv.DictWriter are almost always better than the plain reader/writer, because named fields prevent index bugs
- Iterate over large files line by line (or row by row with DictReader); never load a multi-GB file with read() or readlines()
- io.StringIO and io.BytesIO let you test file I/O code without touching the real filesystem
- configparser is well suited to .ini-style human-editable configs; use JSON for programmatically generated or deeply nested configs