Hello

I spend my time thinking about how to make powerful AI systems honest and aligned to human values. I work on things like honesty training, character training, dangerous capability evaluations and control.

I’m currently an Anthropic Fellow working on alignment research with Jon Kutasov, Sara Price and Sam Marks. Previously, I was the Program Lead and TA of ARENA, an ML engineering program that upskills people to do technical AI safety work. Before that I was the director of Cambridge AI Safety Hub, where I founded the research program MARS and led ML upskilling programs like CaMLAB. I have an MSc in machine learning from UCL and a BA Hons in psychology & neuroscience from the University of Cambridge.

Publications & other work