Regression and Classification

Learning Objectives

By the end of this lesson, you will be able to:

  • Distinguish regression, binary classification, and multiclass classification problems.
  • Choose metrics that match the product decision a model will support.
  • Train and evaluate baseline regression and classification models with scikit-learn.
  • Explain probability thresholds, error tradeoffs, and when the same product question can be framed more than one way.

Problem Framing Map

Regression and classification are not just algorithm categories. They are ways of translating a real product question into a target a model can learn.

In a Flow Research-style AI product, you might ask:

  • Which learners may need mentor support?
  • What completion score should we expect next week?
  • Which governance proposals need extra review?
  • Which protocol events look unusual?

The modeling decision starts with the target:

| Target shape | ML framing | Example |
| --- | --- | --- |
| Continuous number | Regression | Predict a learner's next quiz score |
| Yes/no label | Binary classification | Predict whether a learner may drop out |
| Multiple labels | Multiclass classification | Predict low, medium, or high engagement |
| Ranked probability | Classification with scores | Rank learners by intervention priority |

Launch Rule

Start with the decision the model supports, then choose the target and metric. Do not choose a model type first and force the product problem to fit it.
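
As a quick sanity check on the target shapes in the table above, scikit-learn's `type_of_target` reports how it would interpret a target array. The sketch below uses three hypothetical targets invented for illustration; the variable names are assumptions, not part of any real dataset.

```python
from sklearn.utils.multiclass import type_of_target

# Hypothetical targets, invented for illustration only.
next_quiz_score = [71.5, 88.2, 64.9, 93.4]      # continuous number
dropped_out = [0, 1, 0, 0]                      # yes/no label
engagement = ["low", "high", "medium", "low"]   # multiple labels

for name, target in [
    ("next_quiz_score", next_quiz_score),
    ("dropped_out", dropped_out),
    ("engagement", engagement),
]:
    # type_of_target returns a string such as "continuous", "binary", or "multiclass".
    print(name, "->", type_of_target(target))
```

This only inspects the values. Whole-number floats such as `[1.0, 2.0]` are reported as labels, so the check supports the framing decision rather than replacing it.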

Regression

Regression predicts a numeric value.

Examples:

  • expected quiz score,
  • estimated study time,
  • monthly active contributors,
  • projected protocol transaction volume,
  • expected reward amount.

The simplest regression model learns a function:

$$\hat{y} = f(x)$$

For linear regression:

$$\hat{y} = w_1x_1 + w_2x_2 + \dots + w_nx_n + b$$

The model learns weights that minimize prediction error. A common loss is mean squared error, averaged over the training examples (in this formula, n counts examples, not features):

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$$
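
To make the two formulas concrete, here is a minimal sketch that computes linear predictions and MSE by hand with NumPy, then checks the result against scikit-learn. The data, weights, and bias are hypothetical values chosen by hand, not learned from anything.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Hypothetical data: 4 learners, 2 features each (values invented).
X = np.array([[1.0, 2.0], [2.0, 0.5], [3.0, 1.5], [4.0, 3.0]])
y = np.array([8.0, 6.0, 11.0, 16.0])

# Hypothetical weights and bias, picked by hand rather than fitted.
w = np.array([2.0, 1.5])
b = 1.0

# Linear prediction: y_hat = w_1*x_1 + w_2*x_2 + b for every row.
y_hat = X @ w + b

# MSE: the mean of the squared residuals.
mse_manual = np.mean((y - y_hat) ** 2)
mse_sklearn = mean_squared_error(y, y_hat)

print("manual MSE:", mse_manual)
print("sklearn MSE:", mse_sklearn)
```

Fitting a model means searching for the `w` and `b` that make this number as small as possible on the training data.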

Baseline Regression Example

This example trains a small regression model on synthetic learner-style data.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic data: 300 rows, 3 numeric features, noise added to the target.
X, y = make_regression(
    n_samples=300,
    n_features=3,
    noise=12,
    random_state=42,
)

# Hold out 20% of the rows for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.2,
    random_state=42,
)

model = LinearRegression()
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
# RMSE is the square root of MSE, so it is in the same units as the target.
rmse = np.sqrt(mean_squared_error(y_test, y_pred))

print("MAE:", mean_absolute_error(y_test, y_pred))
print("RMSE:", rmse)
print("R2:", r2_score(y_test, y_pred))
```

Regression Metrics

Use metrics that match how people will use the prediction.

| Metric | Meaning | Use when |
| --- | --- | --- |
| MAE | Average absolute error in target units | Stakeholders need a readable error |
| RMSE | Squared-error penalty, then square root | Large mistakes are especially costly |
| R² | Share of variance explained | You want a high-level fit measure |

For example, "typical score error is about 4.8 points" (an RMSE of 4.8) is easier for a mentor to understand than "MSE is 23.04" (the same error expressed in squared points).
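
The difference between MAE and RMSE shows up when errors are uneven. The sketch below compares two hypothetical sets of predictions with the same MAE: one misses every learner by 4 points, the other is exact on four learners and misses one by 20 points. All values are invented for illustration.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Hypothetical true scores for five learners.
y_true = np.array([70.0, 80.0, 90.0, 60.0, 75.0])

# Two hypothetical prediction sets with the same total absolute error.
pred_even = y_true + np.array([4.0, -4.0, 4.0, -4.0, 4.0])   # every error is 4 points
pred_spiky = y_true + np.array([0.0, 0.0, 0.0, 0.0, 20.0])   # one 20-point miss

results = {}
for name, pred in [("even", pred_even), ("spiky", pred_spiky)]:
    mae = mean_absolute_error(y_true, pred)
    rmse = np.sqrt(mean_squared_error(y_true, pred))
    results[name] = (mae, rmse)
    print(name, "MAE:", mae, "RMSE:", rmse)
```

Both sets have an MAE of 4 points, but the RMSE of the second set is more than twice as large, because squaring gives the single 20-point miss far more weight.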

Classification

Classification predicts labels. A binary classifier often returns a probability rather than a hard label. Logistic regression, for example, models that probability as:

$$p(y=1 \mid x) = \sigma(w^Tx + b)$$

where the sigmoid function is:

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$

The model can convert that probability into a label using a threshold:

$$\hat{y} = \begin{cases} 1 & \text{if } p \geq t \\ 0 & \text{if } p < t \end{cases}$$

The threshold t is a product decision, not only a math detail.
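
The three formulas above can be sketched in a few lines of NumPy. The weights, bias, and feature rows below are hypothetical values chosen by hand so the arithmetic is easy to follow; a fitted model would learn them from data.

```python
import numpy as np

def sigmoid(z):
    """Map any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical weights, bias, and three feature rows (not learned).
w = np.array([0.8, -0.5])
b = -0.2
X = np.array([[2.0, 1.0], [0.5, 2.0], [1.0, 0.0]])

# p(y=1 | x) = sigmoid(w^T x + b), computed for every row at once.
p = sigmoid(X @ w + b)

# Convert probabilities to labels with threshold t.
t = 0.5
y_hat = (p >= t).astype(int)

print("probabilities:", p.round(3))
print("labels at t=0.5:", y_hat)
print("labels at t=0.7:", (p >= 0.7).astype(int))
```

The third row has a probability of about 0.65, so it is labeled 1 at t = 0.5 and 0 at t = 0.7. The model did not change; only the decision rule did.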

Baseline Classification Example

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic binary data: 300 rows, 4 features, 3 of them informative.
X, y = make_classification(
    n_samples=300,
    n_features=4,
    n_informative=3,
    n_redundant=0,
    class_sep=1.2,
    random_state=42,
)

# stratify=y keeps the class balance the same in the train and test splits.
X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.2,
    random_state=42,
    stratify=y,
)

classifier = LogisticRegression(max_iter=1000)
classifier.fit(X_train, y_train)

# predict() returns hard labels for the test rows.
y_pred = classifier.predict(X_test)

print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
```

Classification Error Map

In a learner-support model:

  • A false positive may waste mentor time.
  • A false negative may miss a learner who needed help.

That tradeoff decides whether precision, recall, or F1 matters most.

| Metric | Question it answers |
| --- | --- |
| Accuracy | How often is the model correct overall? |
| Precision | When the model raises an alert, how often is it right? |
| Recall | Of all real cases, how many did the model catch? |
| F1 | What is the balance between precision and recall? |
| ROC-AUC | How well does the model rank positives above negatives? |
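
The first four metrics in the table can all be derived from the four cells of a confusion matrix. The sketch below does that by hand on ten hypothetical labels and predictions, then compares against scikit-learn's functions. ROC-AUC is left out because it needs scores or probabilities, not hard labels.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)

# Hypothetical labels: 1 = learner needed support, 0 = did not.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

# For binary labels, ravel() yields the cells in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)   # of the alerts raised, how many were right
recall = tp / (tp + fn)      # of the real cases, how many were caught
f1 = 2 * precision * recall / (precision + recall)

print("by hand:", accuracy, precision, recall, f1)
print(
    "sklearn:",
    accuracy_score(y_true, y_pred),
    precision_score(y_true, y_pred),
    recall_score(y_true, y_pred),
    f1_score(y_true, y_pred),
)
```

Here one false positive (wasted mentor time) and one false negative (a missed learner) both appear, which is why precision and recall are each 0.75 while accuracy is 0.8.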

Same Problem, Different Framing

The same product idea can often be framed multiple ways.

| Product question | Regression framing | Classification framing |
| --- | --- | --- |
| Learner support | Predict dropout risk score from 0 to 100 | Predict support-needed yes/no |
| Governance quality | Predict proposal health score | Predict review-needed yes/no |
| Course progress | Predict next score | Predict low/medium/high progress |

Choose regression when:

  • the numeric value itself matters,
  • stakeholders can act on magnitude,
  • ranking or forecasting is the core use case.

Choose classification when:

  • the product needs a discrete decision,
  • actions are label-based,
  • precision/recall tradeoffs matter.
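
One way to move between the two framings is to bin a numeric target into labels. The sketch below turns hypothetical next-score values into low/medium/high progress classes with `np.digitize`. The cut points of 50 and 75 are assumptions for illustration; in practice they are a product decision.

```python
import numpy as np

# Hypothetical regression target: next quiz score for six learners.
scores = np.array([35.0, 58.0, 72.0, 91.0, 49.5, 80.0])

# Assumed cut points: below 50 is low, 50 up to 75 is medium, 75 and above is high.
bin_edges = [50.0, 75.0]
labels = np.array(["low", "medium", "high"])

# np.digitize returns 0, 1, or 2 depending on which bin each score falls in.
class_index = np.digitize(scores, bin_edges)
progress = labels[class_index]

for score, label in zip(scores, progress):
    print(score, "->", label)
```

Binning discards magnitude: 49.5 and 35.0 both become "low". That loss is acceptable when actions are label-based, and a reason to keep the regression framing when the size of the gap matters.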

Threshold Tuning

The default threshold for binary classification is often 0.5, but that is not always best.

```python
from sklearn.metrics import precision_score, recall_score

# Reuses classifier, X_test, and y_test from the baseline classification example.
# Column 1 of predict_proba holds the probability of the positive class.
probabilities = classifier.predict_proba(X_test)[:, 1]

for threshold in [0.3, 0.5, 0.7]:
    custom_pred = (probabilities >= threshold).astype(int)
    precision = precision_score(y_test, custom_pred)
    recall = recall_score(y_test, custom_pred)
    print(threshold, "precision:", precision, "recall:", recall)
```

Lowering the threshold usually catches more positives but creates more false alarms. Raising it usually reduces false alarms but misses more positives.

For public-good ML systems, threshold choice should be documented because it encodes a policy preference.
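
One way to make that policy preference explicit is to start from a stated recall target and pick the highest threshold that still meets it. The sketch below is one possible approach, not a prescribed method: the 0.90 recall target, the `threshold_record` dictionary, and its field names are all hypothetical. It selects the threshold on a held-out validation split so the choice is not tuned on the final test data.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(
    n_samples=300, n_features=4, n_informative=3,
    n_redundant=0, class_sep=1.2, random_state=42,
)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y,
)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
probabilities = clf.predict_proba(X_val)[:, 1]

# precision and recall have one more entry than thresholds;
# the final entry has no matching threshold, so it is dropped below.
precision, recall, thresholds = precision_recall_curve(y_val, probabilities)

target_recall = 0.90  # hypothetical policy: catch at least 90% of real cases
meets_target = recall[:-1] >= target_recall

# Highest threshold that still meets the target, i.e. the fewest alerts.
chosen_threshold = thresholds[meets_target].max()
chosen_pred = (probabilities >= chosen_threshold).astype(int)

# Hypothetical record to keep alongside the model for reviewers.
threshold_record = {
    "threshold": float(chosen_threshold),
    "target_recall": target_recall,
    "precision_at_threshold": float(precision[:-1][meets_target][-1]),
    "rationale": "Missing a learner who needs help costs more than an extra check-in.",
}
print(threshold_record)
```

Writing the target and rationale down next to the number lets stakeholders challenge the policy itself, not only the model.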

Common Mistakes

Optimizing the Wrong Metric

Accuracy can look good when the positive class is rare. If only 5% of learners drop out, a model that predicts "no dropout" for everyone is 95% accurate and still useless.
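
The 5% example can be reproduced directly. The sketch below builds 100 hypothetical labels with 5 dropouts and scores a "model" that always predicts no dropout.

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

# Hypothetical cohort: 5 learners drop out (1), 95 do not (0).
y_true = np.array([1] * 5 + [0] * 95)

# A "model" that predicts "no dropout" for everyone.
y_pred = np.zeros_like(y_true)

accuracy = accuracy_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)

print("accuracy:", accuracy)   # 95 of 100 predictions are correct
print("recall:", recall)       # none of the 5 real dropouts are caught
```

Recall exposes the problem that accuracy hides, which is why it is worth checking at least one class-aware metric whenever the positive class is rare.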

Treating Probabilities as Certainty

A probability of 0.72 is not a fact. It is a model estimate and should be used with uncertainty in mind.

Hiding the Product Tradeoff

False positives and false negatives have different costs. Make the tradeoff explicit with stakeholders before launch.

Practical Exercises

Exercise 1: Frame the Same Problem Twice

Choose a Flow Research-style problem and write:

  • one regression framing,
  • one classification framing,
  • one metric for each.

Exercise 2: Train Both Baselines

Use the two code examples above. Change the dataset size (n_samples), the noise level (noise), and the class separation (class_sep). Observe how the metrics change.

Exercise 3: Tune a Threshold

Train a classifier, compute probabilities, and compare precision and recall at thresholds 0.2, 0.5, and 0.8.

Self-Assessment

Rate yourself from 1 to 5:

  • I can identify whether a target is regression or classification.
  • I can choose a metric based on product consequences.
  • I can train baseline scikit-learn models for both problem types.
  • I can explain why threshold choice matters.

Next Steps

Next, study feature engineering. Better targets and metrics help you frame the problem; better features help the model learn the signal.