Steerer Model

  • Goal: eliminate the need for RL training across thousands of environments.
    • Instead, train a small steerer model to guide a strong base coding LLM.