[P] ClaudeFormer: Building a Transformer Out of Claudes — Collaboration Request

I'm looking to work with people interested in math, machine learning, or agentic coding, on creating a multi-agent framework to do frontier math research. My core idea is to build the transformer architecture out of Claudes. The Architecture (Q/K/V Attention) concretely, the setup consists of a single claude controlling a team of claudes through the agent teams feature. The team leader is the “attention head” claude, and the team members are the mlp claudes. ClaudeFormer maps precisely to the transformer. Each component has a direct analog: Workers = MLPs.