Steerer Model

Goal: make RL across thousands of environments unnecessary. Instead, train a small steerer model to guide a strong base coding LLM.