I am a Postdoctoral Associate at Rutgers University-Newark, working with William Graves and Kimele Persaud in the Department of Psychology.

Originally trained to probe biological neural circuits, I have since pivoted to exploring artificial neural networks in the context of language modeling, work more commonly known as “mechanistic interpretability”.

Throughout my research career, I have always been interested in learning. That is: how do neural circuits (biological or otherwise) adapt to patterns of regularity in their inputs, whether that’s the variety of scents and sights in the world around us (my PhD work), or the statistics of language (my postdoctoral work)?

To make this broad question tractable, my work has focused on tracing how model components progressively specialize, with a particular emphasis on attention heads in Transformer-based language models. My approach to interpretability is grounded in the rich traditions of psycholinguistics and neurophysiology: carefully controlling stimuli; developing additional stress tests (Rivière & Trott, 2025; see also Shapira et al., 2024) to map the boundaries of an attention head’s functional scope; and then causally intervening on that head to test whether it is necessary.

Lastly, interpretability benefits substantially from the ability to probe a model at multiple snapshots (or checkpoints) taken over the course of pretraining. Learning dynamics provide valuable context for the final state of affairs: the point at which the model has been deemed “fully pretrained”. To date, I’ve made use of openly available language models, such as EleutherAI’s Pythia Suite (Biderman et al., 2023) and AI2’s OLMo 2 Suite, which release a variety of pretraining checkpoints that can be directly characterized and intervened on!

Contact information

Email: pr693@psychology.rutgers.edu