Fine-tuned Gemma 4 E4B for structured JSON extraction from regulatory docs - 75% to 94% accuracy, notebook + 432 examples included

r/LocalLLaMA

Gemma 4 dropped this week, so I fine-tuned E4B for a specific task: extracting structured JSON (doc type, obligations, key fields) from technical and regulatory documents.

Results on held-out test set:
- doc_type accuracy: 75% base → 94% fine-tuned
- Hallucinated obligations: 1.25/doc → 0.59/doc
- JSON validity: 100%
- Field coverage: 100%

Setup:
- QLoRA 4-bit, LoRA r=16 alpha=16, Unsloth + TRL
- 432