Use the src/ layout when publishing to PyPI; use a flat layout for web applications and scripts. The key difference: with the src/ layout, your package is importable only after it is installed, so missing files and broken packaging metadata surface as import errors during testing rather than after release.
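A minimal src/-layout skeleton can be sketched with a few commands (the names myproject and mypkg are placeholders for your own):

```shell
# Hypothetical names: "myproject" and "mypkg" stand in for your own.
mkdir -p myproject/src/mypkg myproject/tests
touch myproject/pyproject.toml
touch myproject/src/mypkg/__init__.py myproject/src/mypkg/core.py
touch myproject/tests/test_core.py
# Note: `import mypkg` fails from the repo root until `pip install -e .` —
# exactly the early failure the src/ layout is designed to force.
```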
pyproject.toml — The Modern Standard
pyproject.toml replaces setup.py, setup.cfg, requirements.txt, and MANIFEST.in:
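A minimal sketch of such a file — the project name mypkg, the version pins, and the tool choices are placeholders, assuming a setuptools build backend:

```toml
[build-system]
requires = ["setuptools>=68"]
build-backend = "setuptools.build_meta"

# Metadata that used to live in setup.py / setup.cfg
[project]
name = "mypkg"
version = "0.1.0"
requires-python = ">=3.10"
# Runtime dependencies that used to live in requirements.txt
dependencies = ["fastapi", "numpy"]

# Optional extras, installed with e.g. pip install ".[dev]"
[project.optional-dependencies]
dev = ["pytest", "ruff"]

# Tool configuration that used to need separate config files
[tool.ruff]
line-length = 100
```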
Install in development mode: pip install -e ".[dev]". The -e (editable) flag links the installed package back to your source tree instead of copying it, so edits take effect immediately without reinstalling; the [dev] extra pulls in the optional development dependencies.
FastAPI — Async ML Serving
FastAPI generates OpenAPI docs automatically, validates requests via Pydantic, and handles async operations natively:
Performance Profiler — 5 Implementations
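As a minimal sketch of the two standard-library profilers used later in this section — the sum_of_squares workload is a deliberately naive placeholder:

```python
import cProfile
import io
import pstats
import tracemalloc

def sum_of_squares(n: int) -> int:
    # Deliberately naive loop — gives the profiler something to measure
    total = 0
    for i in range(n):
        total += i * i
    return total

# CPU: cProfile counts calls and time spent per function
prof = cProfile.Profile()
prof.enable()
result = sum_of_squares(200_000)
prof.disable()
out = io.StringIO()
pstats.Stats(prof, stream=out).sort_stats("cumulative").print_stats(3)

# Memory: tracemalloc tracks allocations between start() and stop()
tracemalloc.start()
squares = [i * i for i in range(200_000)]
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()
print(f"peak allocation: {peak / 1024:.0f} KiB")
```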
Concurrency in FastAPI
```python
from fastapi import FastAPI, BackgroundTasks
import asyncio

app = FastAPI()

# Async endpoints: release event loop during I/O waits
@app.post("/predict/async")
async def predict_async(request: PredictRequest):
    # Non-blocking: event loop handles other requests while waiting
    result = await run_prediction_in_thread(request)
    return result

# CPU-bound: offload to thread pool to avoid blocking event loop
from fastapi.concurrency import run_in_threadpool

@app.post("/predict/cpu")
async def predict_cpu(request: PredictRequest):
    # run_in_threadpool runs blocking code in a thread pool
    result = await run_in_threadpool(blocking_predict, request)
    return result

# Background tasks: return immediately, work continues after response
@app.post("/train")
async def trigger_training(background_tasks: BackgroundTasks):
    background_tasks.add_task(retrain_model_job, dataset_path="data/new.csv")
    return {"status": "training started"}

async def retrain_model_job(dataset_path: str):
    """Runs after the response is sent."""
    await asyncio.sleep(0)  # Yield once to allow response to flush
    # ... long training job ...
```
Layer caching matters: put slow-changing content (dependencies) before fast-changing content (code). The COPY pyproject.toml + RUN pip install layer is reused from cache on every build until pyproject.toml itself changes.
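A sketch of that ordering — the base image, the package name mypkg, the uvicorn entrypoint, and the two-step install (dependencies from the metadata first, then the code with --no-deps) are assumptions, shown as one way to keep the dependency layer cacheable:

```dockerfile
FROM python:3.12-slim
WORKDIR /app

# Layer 1 — dependencies. Rebuilt only when pyproject.toml changes.
COPY pyproject.toml ./
RUN pip install --no-cache-dir .

# Layer 2 — source. Code edits invalidate only the layers from here on.
COPY src/ ./src/
RUN pip install --no-cache-dir --no-deps --force-reinstall .

CMD ["uvicorn", "mypkg.main:app", "--host", "0.0.0.0", "--port", "8000"]
```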
Key Takeaways
Use src/ layout for libraries, flat layout for applications — this prevents silent import bugs in packaging
pyproject.toml is the single source of truth for project metadata, dependencies, and tool configuration
FastAPI's lifespan context manager is the correct place to load ML models — not module-level globals
Pydantic v2 validators (@field_validator, @model_validator) are the first line of defense against bad input
Never use print() in production — structured JSON logs are queryable by log aggregation systems
Profile with cProfile for CPU, tracemalloc for memory — never optimize without measuring first
NumPy vectorization can be 100x faster than Python loops; use np.dot() for sum-of-products patterns
Docker layer order is a performance decision: COPY requirements before COPY source maximizes cache hits