Models degrade silently. The distribution of incoming requests shifts away from training data, labels change meaning, upstream systems change schema — and the model keeps returning confident wrong predictions. Production monitoring catches this before business metrics crater.
## Types of Drift
| Drift type | What changes | Detection method |
|---|---|---|
| Data drift (covariate shift) | Input feature distribution P(X) | Statistical tests on feature values |
| Label drift (prior shift) | Target distribution P(Y) | Monitor prediction distribution |
| Concept drift | P(Y|X) relationship | Monitor model accuracy on labeled windows |
| Schema drift | Feature names, types, nullability | Schema validation on every batch |
You need detection mechanisms for all four categories.
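As a concrete illustration of the first row, data drift on a single numerical feature can be checked with a two-sample Kolmogorov-Smirnov test. This is a minimal sketch using scipy with synthetic data standing in for a real feature column:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Reference window: feature values observed during the training period
reference = rng.normal(loc=0.0, scale=1.0, size=5_000)
# Production window: the same feature, but its mean has shifted
current = rng.normal(loc=0.5, scale=1.0, size=5_000)

# KS test compares the two empirical distributions
statistic, p_value = stats.ks_2samp(reference, current)
drifted = p_value < 0.05  # reject "same distribution" at the 5% level
```

With a 0.5 standard-deviation mean shift and 5,000 samples per window, the test rejects decisively; on identically distributed windows it does not. In production you would run this per feature over a sliding window rather than on a one-off sample.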
## Setting Up Evidently
```bash
pip install evidently
```
```python
import pandas as pd

from evidently import ColumnMapping
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset, ClassificationPreset

# Reference: data from training period
reference_df = pd.read_parquet("data/reference_window.parquet")

# Current: last 24h of production requests (logged by the serving layer)
current_df = pd.read_parquet("data/production_2024_03_27.parquet")

column_mapping = ColumnMapping(
    target="label",
    prediction="prediction",
    numerical_features=["age", "click_rate", "session_duration"],
    categorical_features=["country", "device_type"],
)

report = Report(metrics=[DataDriftPreset(), ClassificationPreset()])
report.run(reference_data=reference_df, current_data=current_df, column_mapping=column_mapping)
report.save_html("drift_report.html")
```
Evidently runs a statistical drift test per feature — by default Kolmogorov-Smirnov or Wasserstein distance for numerical columns and chi-squared or Jensen-Shannon divergence for categorical ones, chosen based on sample size — and reports the overall share of drifted columns. Alternatives such as the Population Stability Index (PSI) can be selected explicitly per column.
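For intuition about what PSI measures, it can be computed by hand: bin the reference feature, compute the share of observations per bin in both windows, and sum the weighted log-ratios. This is an illustrative sketch, not Evidently's implementation:

```python
import numpy as np

def psi(reference: np.ndarray, current: np.ndarray, n_bins: int = 10) -> float:
    """Population Stability Index of one feature between two windows."""
    # Quantile bin edges from the reference distribution
    edges = np.quantile(reference, np.linspace(0, 1, n_bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf  # capture out-of-range production values
    ref_share = np.histogram(reference, bins=edges)[0] / len(reference)
    cur_share = np.histogram(current, bins=edges)[0] / len(current)
    # Floor the shares to avoid log(0) for empty bins
    ref_share = np.clip(ref_share, 1e-6, None)
    cur_share = np.clip(cur_share, 1e-6, None)
    return float(np.sum((cur_share - ref_share) * np.log(cur_share / ref_share)))

rng = np.random.default_rng(0)
stable = psi(rng.normal(size=10_000), rng.normal(size=10_000))
shifted = psi(rng.normal(size=10_000), rng.normal(loc=1.0, size=10_000))
```

A common rule of thumb: PSI below 0.1 means stable, 0.1 to 0.25 moderate change, above 0.25 significant drift. Here `stable` lands near zero and `shifted` (a one-standard-deviation mean shift) lands well above the 0.25 threshold.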
## Extracting Drift Signals Programmatically
```python
from evidently.test_suite import TestSuite
from evidently.tests import TestShareOfDriftedColumns

suite = TestSuite(tests=[
    TestShareOfDriftedColumns(lt=0.3),  # fails if 30% or more of the columns drift
])
suite.run(reference_data=reference_df, current_data=current_df, column_mapping=column_mapping)

result = suite.as_dict()
if not result["summary"]["all_passed"]:
    print("DRIFT ALERT: retraining trigger fired")
    trigger_retraining_pipeline()
```
## Logging Predictions for Monitoring
The monitoring pipeline only works if the serving layer logs inputs and outputs:
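A minimal sketch of what such logging can look like, assuming a JSON-lines log that a batch job later loads into the `current_df` above (the function name and record fields are illustrative, not a fixed schema):

```python
import json
import time
import uuid

def log_prediction(features: dict, prediction: float,
                   log_path: str = "predictions.jsonl") -> None:
    """Append one request/response pair as a JSON line for later drift analysis."""
    record = {
        "request_id": str(uuid.uuid4()),  # lets you join ground truth later
        "timestamp": time.time(),
        "features": features,      # raw model inputs, pre-transformation
        "prediction": prediction,  # model output
        # the "label" column is joined in later, once ground truth arrives
    }
    with open(log_path, "a") as f:
        f.write(json.dumps(record) + "\n")

# Called from the serving handler after each prediction
log_prediction({"age": 34, "country": "DE", "click_rate": 0.12}, prediction=0.87)
```

Append-only JSON lines keep the serving path cheap; in higher-throughput systems the same record would typically go to a message queue or an async writer instead of a local file.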