Steerer Project

1. Transformer Internals and Latent Spaces

You cannot steer a model if you do not understand its internal geometry. You must move beyond using LLMs via APIs and understand how tokens are converted into dense mathematical vectors (embeddings) and processed through layers.

Crucial Subtopics

  • Hidden State Extraction: How to run a forward pass in PyTorch and extract the intermediate activation vectors (the “latents”) from specific layers (e.g., extracting Layer 18 out of 36); a code sketch follows this list.

  • Attention Mechanisms: Understanding cross-attention and self-attention, as the student/steerer needs to attend to the base model’s outputs.

  • Logits and Decoding: Understanding the final layer where vectors are converted back into word probabilities (logits), and how to mathematically intervene during generation (e.g., contrastive decoding, logit biasing).
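A minimal sketch of the first and third subtopics, assuming the Hugging Face Transformers library with a small GPT-2 checkpoint standing in for a larger model; the layer index and the token being boosted are illustrative choices, not fixed parts of any method.

```python
# Sketch: extract an intermediate hidden state, then bias next-token logits.
# "gpt2" and layer 6 are illustrative; any causal LM checkpoint works the same way.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

inputs = tok("def add(a, b): return", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# out.hidden_states is a tuple: (embedding output, layer 1, ..., layer N)
latents = out.hidden_states[6]          # shape: (batch, seq_len, hidden_dim)
print("layer-6 latents:", latents.shape)

# Logit biasing: nudge the next-token distribution before decoding.
logits = out.logits[:, -1, :]           # next-token logits, shape: (batch, vocab)
bias = torch.zeros_like(logits)
bias[:, tok.encode(" a")[0]] += 2.0     # boost one chosen token (illustrative)
next_id = torch.argmax(logits + bias, dim=-1)
print("biased next token:", tok.decode(next_id))
```

Contrastive decoding intervenes at the same point, except that the adjustment comes from comparing the logits of two models (or two prompts) rather than from a hand-picked offset.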

Diving into the raw architecture and math of these models is a big leap, but it is the right move if you want to understand how they actually work under the hood.

Topic 1: Transformer Internals and Latent Spaces

To learn how transformers work, you need a mix of visual intuition (for how data flows through the network) and raw code (for how the math is applied).

  • The Coding Masterclass: “Let’s build GPT: from scratch, in code, spelled out” by Andrej Karpathy

    • Why it’s useful: Karpathy builds a working Transformer from a blank Python file. This teaches what an embedding is, how self-attention is calculated using matrix multiplication, and how to extract internal latent vectors.
    • Link: Watch on YouTube
  • The Visual Guide: “The Illustrated Transformer” by Jay Alammar

    • Why it’s useful: Alammar’s blog post uses step-by-step color-coded diagrams to show how Query, Key, and Value vectors interact.
    • Link: Read the Blog

Topic 2: Representation and Contrastive Learning

This topic shifts from “predicting the next word” to “understanding the geometry of concepts.” It requires more mathematical grounding.

  • The Theory and Architecture: Yann LeCun’s Deep Learning Course (NYU DS-GA 1008)

    • Why it’s useful: The course covers Energy-Based Models and Self-Supervised Learning, which form the foundation of contrastive learning.
    • Link: Access the Course Material and focus on the Self-Supervised Learning lectures.
  • The Mathematical Breakdown: “Contrastive Representation Learning” by Lilian Weng

    • Why it’s useful: This post breaks down the math behind forcing similar concepts together and pushing dissimilar concepts apart in vector space, including the InfoNCE loss formulation:
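In one standard form (the notation in the post may differ slightly), for a query $q$, its positive key $k^+$, a set of negatives $\{k_i^-\}$, a similarity function $\mathrm{sim}$ (typically cosine similarity), and a temperature $\tau$:

$$
\mathcal{L}_{\text{InfoNCE}} = -\log \frac{\exp\big(\mathrm{sim}(q, k^+)/\tau\big)}{\exp\big(\mathrm{sim}(q, k^+)/\tau\big) + \sum_{i} \exp\big(\mathrm{sim}(q, k_i^-)/\tau\big)}
$$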

2. Representation and Self-Supervised Learning

This is the math behind the JEPA (Joint-Embedding Predictive Architecture) part of the proposal. You need to learn how to train models to measure the distance or similarity between concepts in a continuous vector space, rather than just classifying exact text.

Crucial Subtopics

  • Distance Metrics: Cosine similarity, L1 (Manhattan), and L2 (Euclidean) distances (see the first sketch after this list).
  • Contrastive Learning (InfoNCE): The foundation of modern retrieval and representation learning. You need to understand how to pull matching pairs together in vector space while pushing negative examples apart.
  • Linear Probing: Training simple models, such as Ridge or Logistic Regression, on top of frozen embeddings to show that a specific concept, such as cyclomatic complexity, is linearly recoverable from the vector space (see the probing sketch after this list).
  • Vector Search: Using libraries like FAISS (Facebook AI Similarity Search) to index and retrieve millions of high-dimensional vectors efficiently (see the FAISS sketch after this list).
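A minimal sketch of the distance metrics and a linear probe, using NumPy and scikit-learn. The embeddings and the complexity labels here are random placeholders standing in for frozen model latents and real metric scores; with real data, a clearly positive held-out R² is evidence that the concept is linearly decodable.

```python
# Sketch: distance metrics plus a ridge-regression linear probe.
# Embeddings and labels are random placeholders for frozen latents / real metrics.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
emb = rng.normal(size=(500, 768))            # e.g. 500 code snippets, 768-dim latents
complexity = rng.uniform(1, 20, size=500)    # placeholder cyclomatic-complexity scores

# Distance metrics between two embeddings
a, b = emb[0], emb[1]
cosine = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
l1 = np.abs(a - b).sum()                     # Manhattan distance
l2 = np.linalg.norm(a - b)                   # Euclidean distance
print(f"cosine={cosine:.3f}  L1={l1:.1f}  L2={l2:.1f}")

# Linear probe: can a frozen representation predict the metric?
X_tr, X_te, y_tr, y_te = train_test_split(emb, complexity, random_state=0)
probe = Ridge(alpha=1.0).fit(X_tr, y_tr)
print("held-out R^2:", probe.score(X_te, y_te))   # ~0 here, since the data is random
```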
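For the vector-search subtopic, a short FAISS sketch. IndexFlatL2 does exact search; at the scale of millions of vectors you would typically switch to an approximate index (IVF or HNSW variants), but the add/search interface stays the same.

```python
# Sketch: index and query dense vectors with FAISS (exact L2 search).
import faiss
import numpy as np

d = 768                                                  # embedding dimension
xb = np.random.random((10_000, d)).astype("float32")     # database vectors
xq = np.random.random((5, d)).astype("float32")          # query vectors

index = faiss.IndexFlatL2(d)    # exact search; swap for an IVF/HNSW index at scale
index.add(xb)
distances, ids = index.search(xq, 4)                     # 4 nearest neighbours per query
print(ids)
```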

3. Inference-Time Guidance and Reinforcement Learning

Once you have a model that understands code quality (the student/evaluator), you need to use it to actively steer the generation process.

Crucial Subtopics

  • Outcome Reward Models (ORMs) vs. Process Reward Models (PRMs): Training models to score a final output (ORM) versus scoring every intermediate step or decision (PRM).
  • Guided Decoding / Steered Generation: Techniques like ThinkLogit, where you calculate a delta vector and inject it into the large model’s generation loop without altering its weights (see the sketch after this list).
  • Offline Reinforcement Learning: Using historical data (like accepted vs. rejected Pull Requests) to train a policy, rather than having the agent learn by blindly exploring an environment.
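ThinkLogit is only named above, so the sketch below shows the general pattern rather than that specific method: register a forward hook on one transformer block and add a delta vector to its hidden states during generation, leaving the weights untouched. The model name, the layer index, and the random delta are placeholders; in practice the delta would be learned or derived from contrasting activations.

```python
# Sketch: steer generation by adding a delta vector to one layer's hidden states.
# General activation-steering pattern; model, layer, and delta are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

delta = torch.randn(model.config.n_embd) * 0.05   # placeholder steering vector

def add_delta(module, inputs, output):
    # GPT-2 blocks return a tuple whose first element is the hidden states
    return (output[0] + delta,) + output[1:]

layer_idx = 6                                      # assumption: a mid-depth block
handle = model.transformer.h[layer_idx].register_forward_hook(add_delta)

prompt = tok("def fibonacci(n):", return_tensors="pt")
out = model.generate(**prompt, max_new_tokens=30, do_sample=False)
print(tok.decode(out[0]))

handle.remove()                                    # stop steering
```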

4. Program Analysis and Code Semantics

To train a model on software engineering concepts, you must treat code as structural data, not just flat text.

Crucial Subtopics

  • Abstract Syntax Trees (ASTs): Using tools like tree-sitter to parse code into trees so you can precisely mask out “function bodies” or “class methods” rather than random character strings (see the sketch after this list).
  • Software Metrics: Understanding cyclomatic complexity, inter-module dependencies, and codebase churn.
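tree-sitter is the right tool once you need many languages; as a dependency-free illustration of the same idea, the sketch below uses Python’s built-in ast module to locate function bodies precisely, which is exactly the structural information you need for masking, and adds a rough branch-counting proxy for cyclomatic complexity.

```python
# Sketch: locate function bodies and approximate cyclomatic complexity with ast.
# (tree-sitter gives the same structural access across many languages.)
import ast

source = """
def safe_divide(a, b):
    if b == 0:
        return None
    for _ in range(3):
        pass
    return a / b
"""

tree = ast.parse(source)
for node in ast.walk(tree):
    if isinstance(node, ast.FunctionDef):
        # The precise span of the function body -- this is what you would mask out
        print(node.name, "body spans lines", node.body[0].lineno, "to", node.end_lineno)
        # Rough proxy for cyclomatic complexity: 1 + number of branching constructs
        branches = sum(isinstance(n, (ast.If, ast.For, ast.While, ast.Try, ast.BoolOp))
                       for n in ast.walk(node))
        print("approximate cyclomatic complexity:", 1 + branches)
```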

5. Distributed Deep Learning Engineering

Implementing this requires managing millions of parameters and massive datasets. You will quickly outgrow a single GPU.

Crucial Subtopics

  • PyTorch DDP (Distributed Data Parallel): Synchronizing gradients across multiple GPUs (a combined sketch of all three techniques follows this list).
  • Mixed Precision Training: Using bf16 or fp16 to save memory.
  • Gradient Checkpointing: Trading compute for memory to fit large models onto GPUs.
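A compressed sketch of how the three pieces fit together in one training step, assuming a launch via torchrun on NVIDIA GPUs; the model, data, and sizes are placeholders.

```python
# Sketch: one training step combining DDP, bf16 mixed precision, and gradient
# checkpointing. Assumes launch via `torchrun --nproc_per_node=N train.py`;
# the model, data, and hyperparameters are placeholders.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.checkpoint import checkpoint

class Block(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.ff = torch.nn.Sequential(
            torch.nn.Linear(1024, 4096), torch.nn.GELU(), torch.nn.Linear(4096, 1024))

    def forward(self, x):
        # Gradient checkpointing: drop activations here, recompute them in backward
        return checkpoint(self.ff, x, use_reentrant=False)

def main():
    dist.init_process_group("nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = DDP(Block().cuda(local_rank), device_ids=[local_rank])
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

    x = torch.randn(8, 1024, device=local_rank)
    target = torch.randn(8, 1024, device=local_rank)

    with torch.autocast("cuda", dtype=torch.bfloat16):   # mixed precision forward
        loss = torch.nn.functional.mse_loss(model(x), target)
    loss.backward()                                       # DDP all-reduces gradients here
    opt.step()
    opt.zero_grad()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```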