I am not sure if I should be proud or not.

I managed to get working 4 sub-agents Qwen3.6 35b on dual rtx 3090, I am using deepseek as orchestrator. They all working locally in parallel! Each subagent has a max context of 131072, which is "good" for the task it needs to work with. Once the 4 builders the orchestrator is calling 4 local reviewers just to make sure the job was done correctly. And after everything is passed, the reviewer sub-agent (cloud model will review the whole thing) With this configuration I managed to have very low usage on APIs (I am now using deepseek for being cheap but $20 to chatgpt is also than enough.