Disillusionment with mechanistic interpretability research [D]
r/MachineLearning
Hey all, apologies if this is the wrong place to post this. I'm currently an undergrad computer scientist who got swept up in the mechanistic interpretability wave around 2024 (sparse autoencoders, attribution graphs), and I found it generally promising (and still do). That being said, a lot of the new research out of Anthropic (which I understand to be the mech interp house) doesn't sit well with me. They recently published a blog post on so-called "natural language autoencoders