Participated in the Stanford Existential Risks Initiative ML Alignment Theory Scholars (SERI-MATS) program, 2023.
SERI-MATS is a research program that provides support and mentorship to individuals interested in reducing existential risks from unaligned AI. It has three phases: 1) Training, 2) Research, and 3) Extension. The program has multiple mentors who are experts in various subdomains of AI alignment. Each mentor selects a handful of participants for the training phase, which lasts one month. During this phase, participants are expected to read up on and learn about the research they are interested in exploring during the next phase.
I was selected for the mechanistic interpretability stream, mentored by Neel Nanda. In addition to reading papers and covering the ARENA curriculum, I investigated superposition in the attention heads of GPT-2 XL that extract factual information from the subject token's residual stream. Although the results were not very satisfactory, it was still a great experience. Hopefully, next time I’ll get to the research phase…
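To give a rough idea of the kind of measurement involved, here is a minimal illustrative sketch using TransformerLens (Neel Nanda's library), not my original experiment code: it scores each attention head of GPT-2 XL by how much attention its final-token query pays to the subject tokens of a factual prompt. The prompt and subject-token indices are placeholders.

```python
# Illustrative sketch: find attention heads whose final-token query attends
# strongly to the subject token of a factual prompt in GPT-2 XL.
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2-xl")

prompt = "The Eiffel Tower is located in the city of"  # placeholder prompt
tokens = model.to_tokens(prompt)
_, cache = model.run_with_cache(tokens)

# Placeholder indices of the subject tokens; in practice these would be read
# off from model.to_str_tokens(prompt) for the prompt at hand.
subject_positions = [1, 2, 3]
final_pos = tokens.shape[1] - 1

# For every (layer, head), sum the attention weight that the final token
# places on the subject positions, then list the strongest heads.
scores = []
for layer in range(model.cfg.n_layers):
    pattern = cache["pattern", layer][0]  # [head, query_pos, key_pos]
    attn_to_subject = pattern[:, final_pos, subject_positions].sum(dim=-1)
    for head in range(model.cfg.n_heads):
        scores.append(((layer, head), attn_to_subject[head].item()))

for (layer, head), score in sorted(scores, key=lambda x: -x[1])[:10]:
    print(f"L{layer}H{head}: attention to subject = {score:.3f}")
```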