A short note on DeepSeek v3.2 Dec 10, 2025
Attention is a differentiable lookup Oct 26, 2025
A systems level understanding of LLM inference process Oct 24, 2025
Initial results from the Reward Hacking Benchmark Jul 20, 2025
Environments are everything Jun 20, 2025
The Human Evaluator's Goodbye Jun 13, 2025
Why I'm working on reward hacking research Jun 13, 2025