Mean field sequence: an introduction

LessWrong AI

This is the first post in a planned series about mean field theory by Dmitry and Lauren (this post was generated by Dmitry with lots of input from Lauren, and a second part should be coming soon). The posts combine an explainer with some original research and experiments. The goal of the series is to explain an approach to understanding and interpreting model internals that we informally call "mean field theory", or MFT. In the literature, the closest matching term is "adaptive mean field theory".