Skip to main content

Graph Neural Networks

Watch First

Learning Objectives

By the end of this lesson, you will be able to:

  • Represent relational ML problems as nodes, edges, and features.
  • Explain message passing and neighborhood aggregation.
  • Compare GCN, GAT, and GraphSAGE-style models.
  • Implement a small graph aggregation step and decide when a GNN is justified.

Graph Learning Map

Many ML problems are not just rows or sequences. They are networks.

A graph neural network learns from entities and the relationships between them. In a Flow Research-style system, this might mean learners connected to lessons, contributors connected to proposals, or services connected through dependencies.

Launch Rule

Reach for a GNN when the edges are meaningful. If the links are arbitrary or weak, a simpler tabular or sequence model may be better.

Graph Basics

A graph is usually written:

G=(V,E)G = (V, E)

where:

  • V is a set of nodes,
  • E is a set of edges.

Nodes can have feature vectors:

xvRdx_v \in \mathbb{R}^d

Edges can also have attributes:

  • type,
  • weight,
  • timestamp,
  • direction.

Example: Learning Path Graph

In this graph:

  • lessons are nodes,
  • prerequisite relationships are edges,
  • node features can include difficulty, topic, estimated time, or completion rate.

A GNN could learn lesson embeddings that understand both content and position in the curriculum.

Message Passing

The core GNN operation is message passing.

For node v, the model gathers information from its neighbors:

hv(k+1)=UPDATE(hv(k),AGGREGATE({hu(k):uN(v)}))h_v^{(k+1)} = UPDATE\left(h_v^{(k)}, AGGREGATE\left(\{h_u^{(k)} : u \in N(v)\}\right)\right)

Read it as:

  1. Look at the current node embedding.
  2. Collect neighbor embeddings.
  3. Aggregate them with mean, sum, max, or attention.
  4. Update the node embedding.

After one layer, each node knows about immediate neighbors. After two layers, it can include neighbors-of-neighbors.

Message Passing From Scratch

This pure NumPy example performs one neighborhood aggregation step.

import numpy as np

# Nodes: 0=Math, 1=Pipelines, 2=Lifecycle, 3=Regression
adjacency = np.array([
[1, 1, 0, 0],
[1, 1, 1, 0],
[0, 1, 1, 1],
[0, 0, 1, 1],
], dtype=float)

features = np.array([
[1.0, 0.0], # math-heavy
[0.8, 0.4], # data-heavy
[0.4, 0.8], # lifecycle-heavy
[0.9, 0.7], # modeling-heavy
])

degree = adjacency.sum(axis=1, keepdims=True)
neighbor_mean = adjacency @ features / degree

W = np.array([
[0.7, -0.2],
[0.3, 0.8],
])

new_embeddings = np.maximum(0, neighbor_mean @ W)

print(np.round(new_embeddings, 3))

This is not a full GNN training loop, but it shows the central operation: each node is updated from its neighborhood.

Common GNN Families

Graph Convolutional Networks

GCNs use normalized neighbor aggregation. A common update pattern is:

H(k+1)=σ(D~12A~D~12H(k)W(k))H^{(k+1)} = \sigma(\tilde{D}^{-\frac{1}{2}}\tilde{A}\tilde{D}^{-\frac{1}{2}}H^{(k)}W^{(k)})

where:

  • A is the adjacency matrix,
  • D is the degree matrix,
  • H is the node embedding matrix,
  • W is a learned weight matrix.

GCNs are a strong baseline for node classification and graph representation.

Graph Attention Networks

GATs learn how much attention each neighbor deserves.

hv=uN(v)αvuWhuh_v' = \sum_{u \in N(v)} \alpha_{vu}Wh_u

where alpha is a learned attention weight.

Use attention when some neighbors matter more than others.

GraphSAGE-Style Models

GraphSAGE samples neighborhoods and learns aggregators. This helps scale to large graphs where using every neighbor is expensive.

Use GraphSAGE-style approaches when:

  • graphs are large,
  • nodes have many neighbors,
  • new nodes arrive over time.

Task Types

TaskWhat the model predictsExample
Node classificationLabel for each nodelearner risk, lesson difficulty
Link predictionWhether an edge should existmentor match, related lesson
Graph classificationLabel for whole graphproposal quality, workflow risk
Node regressionNumeric value for each nodeexpected completion score

GNNs vs. Transformers

Transformers and GNNs both move information between elements, but they start from different assumptions.

ModelRelationship structureBest fit
TransformerDense attention over tokenstext, code, sequence context
GNNExplicit graph edgesnetworks, dependencies, relational data

A transformer can be seen as using attention over a mostly dense graph of tokens. A GNN usually respects explicit edges.

Practical Design Checklist

Before building a GNN, answer:

  • What are the nodes?
  • What are the edges?
  • Are edges directed or undirected?
  • Are there edge types or weights?
  • What features are available at prediction time?
  • Is the task node-level, edge-level, or graph-level?
  • What baseline will you compare against?

If you cannot define meaningful edges, the GNN probably has no useful graph to learn from.

Practical Exercises

Exercise 1: Build a Graph Schema

Choose a Flow Research-style system and define nodes, edges, features, and prediction target.

Exercise 2: Run the NumPy Aggregator

Modify the adjacency matrix and observe how node embeddings change.

Exercise 3: Compare Against a Baseline

For your graph problem, write down a non-GNN baseline and explain what the graph is expected to add.

Self-Assessment

Rate yourself from 1 to 5:

  • I can represent a domain as a graph.
  • I can explain message passing.
  • I can compare GCNs, GATs, and GraphSAGE.
  • I can decide whether graph structure is meaningful enough for a GNN.

Further Reading

Next Steps

Next, study reinforcement learning. GNNs learn from relationships; RL learns from interaction and feedback over time.