Skip to main content

Deployment Patterns

Learning Objectives

By the end of this lesson, you will be able to:

  • Compare batch scoring, online APIs, streaming inference, embedded models, and staged rollout patterns.
  • Choose a deployment pattern based on latency, cost, freshness, privacy, and operational complexity.
  • Sketch a model-serving architecture that connects to CI/CD and monitoring.
  • Implement minimal batch and FastAPI-style serving examples.

Watch First

Deployment Decision Map

Training produces a model artifact. Deployment makes that artifact useful.

The best deployment pattern depends on the product constraint:

  • How fresh must predictions be?
  • How much latency can users tolerate?
  • Where does the data live?
  • How often does the model change?
  • Can the system fall back if the model is unavailable?
Launch Rule

Choose the simplest deployment pattern that satisfies the product need. Real-time serving is powerful, but it is not automatically better than batch scoring.

Pattern 1: Batch Scoring

Batch scoring runs predictions on many records at once and stores the results.

Use batch when:

  • latency is not urgent,
  • predictions feed dashboards or scheduled workflows,
  • cost matters more than real-time freshness,
  • you want simpler operations.

Examples:

  • nightly learner-risk list for mentors,
  • weekly protocol-health report,
  • daily recommendation refresh,
  • monthly contribution scoring.
import joblib
import pandas as pd

model = joblib.load("model_latest.joblib")

data = pd.read_csv("learners_to_score.csv")
X = data[["hours_studied", "quiz_score"]]

data["risk_score"] = model.predict_proba(X)[:, 1]
data.to_csv("scored_learners.csv", index=False)

Batch deployments still need monitoring:

  • Did the job run?
  • Were all records scored?
  • Did the input schema match?
  • Did the prediction distribution shift?

Pattern 2: Online API Serving

Online serving exposes the model through an API.

Use online serving when:

  • a product needs a prediction during a user request,
  • each request is small,
  • prediction freshness matters,
  • latency requirements are clear.

Architecture:

Minimal FastAPI example:

# app.py
import joblib
import numpy as np
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
model = joblib.load("model_latest.joblib")

class LearnerFeatures(BaseModel):
hours_studied: float
quiz_score: float

@app.post("/predict")
def predict(features: LearnerFeatures):
X = np.array([[features.hours_studied, features.quiz_score]])
score = model.predict_proba(X)[0, 1]
return {
"risk_score": float(score),
"needs_support": bool(score >= 0.7),
}

Run locally:

uvicorn app:app --reload

Online serving adds operational concerns:

  • request validation,
  • authentication,
  • rate limiting,
  • logging,
  • latency,
  • rollback,
  • dependency management.

Pattern 3: Streaming Inference

Streaming inference scores events as they arrive.

Use streaming when:

  • event freshness is important,
  • actions depend on near-real-time signals,
  • records arrive continuously.

Examples:

  • suspicious protocol event detection,
  • live recommendation updates,
  • real-time learner intervention triggers.

Streaming is more complex than batch. Start here only when the product really needs it.

Pattern 4: Embedded or On-Device Models

Embedded models run inside a mobile app, browser, desktop app, or edge device.

Use embedded deployment when:

  • connectivity is unreliable,
  • data should stay on-device,
  • latency must be very low,
  • the model is small enough to ship with the app.

Tradeoffs:

BenefitCost
Works offlineHarder to update
Low latencyHarder to monitor
Better privacyModel size constraints
Fewer server callsVersion fragmentation

Examples:

  • offline learner recommendations in a mobile app,
  • local content quality checks,
  • lightweight ranking in a browser.

Pattern 5: Canary, A/B, and Shadow Deployment

Staged rollouts reduce risk when replacing a model.

PatternHow it worksUse when
CanarySend a small percentage to the new modelYou want a cautious rollout
A/B testSplit users into variants and compare outcomesYou need product-level evidence
ShadowRun the new model silently without affecting usersYou want safety checks before exposure

Shadow deployment is especially useful when mistakes could affect people. The new model sees real traffic, but its predictions are logged, not acted on.

Choosing a Pattern

RequirementStrong candidate
Dashboard updated dailyBatch
User waits for prediction in product flowOnline API
Event must be handled immediatelyStreaming
Offline or privacy-sensitive appEmbedded
Replacing a risky modelCanary, A/B, or shadow

Latency targets should be written as service-level objectives.

For example:

P(latency200ms)0.95P(\text{latency} \leq 200ms) \geq 0.95

That means at least 95% of predictions should return within 200 milliseconds.

Connecting Deployment to CI/CD and Monitoring

A launch-ready deployment connects three systems.

Before launch, document:

  • model version,
  • deployment pattern,
  • input contract,
  • output contract,
  • fallback behavior,
  • monitoring metrics,
  • rollback procedure,
  • owner.

Practical Exercises

Exercise 1: Choose a Pattern

Pick one model idea and choose a deployment pattern. Explain why not the other patterns.

Exercise 2: Build Batch Scoring

Train a small classifier, save it with joblib, then write a batch script that scores a CSV.

Exercise 3: Build a Minimal API

Wrap the same model in FastAPI. Return both a probability and a decision threshold.

Exercise 4: Design a Rollout

Write a canary or shadow deployment plan for a model that influences learner support.

Self-Assessment

Rate yourself from 1 to 5:

  • I can compare batch, online, streaming, embedded, and staged rollout patterns.
  • I can choose a deployment pattern based on product constraints.
  • I can sketch a model-serving architecture.
  • I can connect deployment to CI/CD, monitoring, and rollback.

Further Reading

Next Steps

Next, combine the intermediate ideas: frame the problem, engineer features, tune the model, ship it through CI/CD, monitor it, and deploy it using the simplest pattern that serves the product well.