Nikhil Prakash

22nd floor, 177 Huntington Ave

Boston, MA 02115

I’m a third-year Ph.D. student at Northeastern University, advised by Prof. David Bau. I completed my Bachelor of Engineering at RV College of Engineering, Bangalore, India in fall 2020, with a focus on electrical engineering and computer science.

This summer, I’ll be interning at Apple, working on mechanistic interpretability of LLMs. During my Ph.D., I’ve interned at the Practical AI Alignment and Interpretability Research Group with Dr. Atticus Geiger and at SERI-MATS (first phase) with Neel Nanda. Prior to that, I was a visiting scholar at the Max Planck Institute for Security and Privacy and had stints at the Korea Advanced Institute of Science & Technology and the Indian Institute of Technology Ropar.

Broadly, my interests lie in understanding the internal mechanisms of deep neural networks to enhance human-AI collaboration and prevent misalignment. Currently, I’m investigating cognitive abilities, such as reasoning and theory of mind, in large language models.

I have received invaluable support from many people throughout my career, and as a result, I’m always happy to assist others and share insights from my experiences. Please feel free to reach out.

news

Apr, 2025 Oral presentation of our recent work on the belief tracking mechanism in LMs at New England NLP 2025.
Mar, 2025 Reviewing for ICML 2025, COLM 2025, TMLR.
Mar, 2025 Achieved Ph.D. candidacy!
Mar, 2025 Accepted research internship offer from Apple.
Jan, 2025 Our paper NNsight and NDIF: Democratizing Access to Foundation Model Internals got accepted to ICLR 2025! :tada:
Nov, 2024 Received a complimentary NeurIPS 2024 registration for my service as a reviewer.
Aug, 2024 Reviewing for ICLR 2025.
Jul, 2024 Our paper NNsight and NDIF: Democratizing Access to Foundation Model Internals is on ArXiv!
Jul, 2024 Interning at the Practical AI Alignment and Interpretability Research Group with Dr. Atticus Geiger.
Jun, 2024 Reviewing for NeurIPS 2024 (main conference and workshop proposals).
May, 2024 Invited talk at the Practical AI Alignment and Interpretability Research Group.
May, 2024 Invited talk at Computational Linguistics and Complex Social Networks at the Indian Institute of Technology Gandhinagar.
May, 2024 Attending ICLR 2024 in Vienna in person :beach_umbrella:!
Apr, 2024 Invited talk at New England NLP 2024.
Apr, 2024 Co-organizing Mechanistic Interpretability Social at ICLR 2024 with Gabriele Sarti.

selected publications

  1. ICLR
    Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity Tracking
    Prakash, Nikhil, Shaham, Tamar Rott, Haklay, Tal, Belinkov, Yonatan, and Bau, David
    In International Conference on Learning Representations (ICLR) 2024
  2. ICML
    Discovering Variable Binding Circuitry with Desiderata
    Davies, Xander, Nadeau, Max, Prakash, Nikhil, Shaham, Tamar Rott, and Bau, David
    In Challenges in Deployable Generative AI Workshop, International Conference on Machine Learning (ICML) 2023