Replaced an LLM's text-generation head with one that emits raw machine opcodes. Here are my findings

Follow-up to my previous post about why AI agents should not control machines through text. The idea: almost every AI agent today generates human-readable text, parses it, then executes it. That's like controlling a robot arm by dictating English to it. Tesla FSD avoids that pattern: cameras go in, steering commands come out, no text in between. Can we do the same for software, skipping the text and emitting machine instructions directly? To find out, I took a frozen Qwen 1.5B, ripped off its LM head (the output layer that turns hidden states into text tokens), and replaced it with a small cross-attention head (38M parameters) that emits raw CHIP-8 opcodes.
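The post doesn't say how the new head's outputs are tokenized, so here is a hedged sketch of what "raw CHIP-8 opcodes" can look like on the output side. It assumes (hypothetically) that the head predicts each instruction as two byte classes, high byte then low byte. `assemble` and `disassemble` are illustrative helpers, not the author's code: they pack the bytes into big-endian 16-bit opcodes (CHIP-8's instruction format) and map each one to a standard mnemonic, returning `INVALID` for bit patterns that are not CHIP-8 instructions. Counting those is one simple way to measure how often such a head produces well-formed output.

```python
# Hypothetical sketch: turn (high, low) byte predictions from an opcode head
# into 16-bit CHIP-8 opcodes and disassemble them for inspection.
# Assumption: the head emits one instruction as two byte classes; the original
# post does not specify its output tokenization.
from typing import List, Tuple

# 8XYN arithmetic/logic group, keyed by the low nibble N.
ALU_OPS = {0x0: "LD", 0x1: "OR", 0x2: "AND", 0x3: "XOR", 0x4: "ADD",
           0x5: "SUB", 0x6: "SHR", 0x7: "SUBN", 0xE: "SHL"}

# FXNN group, keyed by the low byte NN; {x} is filled with the register index.
F_OPS = {0x07: "LD V{x}, DT", 0x0A: "LD V{x}, K", 0x15: "LD DT, V{x}",
         0x18: "LD ST, V{x}", 0x1E: "ADD I, V{x}", 0x29: "LD F, V{x}",
         0x33: "LD B, V{x}", 0x55: "LD [I], V{x}", 0x65: "LD V{x}, [I]"}


def assemble(byte_pairs: List[Tuple[int, int]]) -> List[int]:
    """Combine (high, low) byte predictions into big-endian 16-bit opcodes."""
    opcodes = []
    for hi, lo in byte_pairs:
        if not (0 <= hi <= 0xFF and 0 <= lo <= 0xFF):
            raise ValueError(f"byte out of range: {hi}, {lo}")
        opcodes.append((hi << 8) | lo)
    return opcodes


def disassemble(op: int) -> str:
    """Return a mnemonic for a CHIP-8 opcode, or 'INVALID' if it is not one."""
    kind = op >> 12          # top nibble selects the instruction family
    x = (op >> 8) & 0xF      # first register index
    y = (op >> 4) & 0xF      # second register index
    n = op & 0xF             # low nibble
    nn = op & 0xFF           # low byte
    nnn = op & 0xFFF         # 12-bit address
    if op == 0x00E0:
        return "CLS"
    if op == 0x00EE:
        return "RET"
    if kind == 0x0:
        return f"SYS {nnn:03X}"
    if kind == 0x1:
        return f"JP {nnn:03X}"
    if kind == 0x2:
        return f"CALL {nnn:03X}"
    if kind == 0x3:
        return f"SE V{x:X}, {nn:02X}"
    if kind == 0x4:
        return f"SNE V{x:X}, {nn:02X}"
    if kind == 0x5 and n == 0:
        return f"SE V{x:X}, V{y:X}"
    if kind == 0x6:
        return f"LD V{x:X}, {nn:02X}"
    if kind == 0x7:
        return f"ADD V{x:X}, {nn:02X}"
    if kind == 0x8 and n in ALU_OPS:
        return f"{ALU_OPS[n]} V{x:X}, V{y:X}"
    if kind == 0x9 and n == 0:
        return f"SNE V{x:X}, V{y:X}"
    if kind == 0xA:
        return f"LD I, {nnn:03X}"
    if kind == 0xB:
        return f"JP V0, {nnn:03X}"
    if kind == 0xC:
        return f"RND V{x:X}, {nn:02X}"
    if kind == 0xD:
        return f"DRW V{x:X}, V{y:X}, {n:X}"
    if kind == 0xE and nn == 0x9E:
        return f"SKP V{x:X}"
    if kind == 0xE and nn == 0xA1:
        return f"SKNP V{x:X}"
    if kind == 0xF and nn in F_OPS:
        return F_OPS[nn].format(x=f"{x:X}")
    return "INVALID"


if __name__ == "__main__":
    program = assemble([(0x00, 0xE0), (0x6A, 0x05), (0x7A, 0x01), (0xF1, 0x23)])
    for op in program:
        print(f"{op:04X}  {disassemble(op)}")
```

Run as a script, this prints `00E0  CLS`, `6A05  LD VA, 05`, `7A01  ADD VA, 01` and `F123  INVALID`: the last pair of bytes decodes to a bit pattern with no CHIP-8 meaning, which is the kind of error a direct opcode head can make that a text-parsing pipeline would catch at a different stage.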