thinking about running Gemma4 E2B as a preprocessor before every Claude Code API call. anyone see obvious problems with this?
r/LocalLLaMA
Background: I write mostly in Korean and my Claude API bill is kind of embarrassing. Korean tokenizes really inefficiently compared to English for the same meaning, so a chunk of the cost is basically just encoding overhead.

The idea is a small proxy in Bun that sits in front of the Claude API. Claude Code talks to localhost, doesn't know anything changed. Before each request goes out, Gemma4 E2B (llama.cpp, local) would do:

- translate Korean input to English.
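To make the translate step concrete, here is a rough sketch of what the rewrite could look like inside the proxy. It is not a finished implementation. It assumes the request body follows the Anthropic Messages shape (`messages` with `content` as either a string or an array of blocks), and that the local model is served by llama.cpp's `llama-server` on its OpenAI-compatible `/v1/chat/completions` endpoint at `127.0.0.1:8080`. The names `containsHangul`, `translateUserText`, and `translateWithLlamaServer` are made up for this example, and so is the system prompt.

```typescript
// Only the parts of the request body that this sketch touches are typed.
type Block = { type: string; text?: string; [key: string]: unknown };
type Message = { role: "user" | "assistant"; content: string | Block[] };
type RequestBody = { messages: Message[]; [key: string]: unknown };
type Translate = (text: string) => Promise<string>;

// Hangul Jamo, Hangul Compatibility Jamo, and Hangul Syllables.
const HANGUL = /[\u1100-\u11FF\u3130-\u318F\uAC00-\uD7A3]/;

function containsHangul(text: string): boolean {
  return HANGUL.test(text);
}

// Returns a copy of the body in which Korean text in user messages has been
// passed through `translate`. Assistant turns and non-text blocks
// (tool results, images, ...) are left untouched.
async function translateUserText(
  body: RequestBody,
  translate: Translate,
): Promise<RequestBody> {
  const messages = await Promise.all(
    body.messages.map(async (msg): Promise<Message> => {
      if (msg.role !== "user") return msg;
      if (typeof msg.content === "string") {
        return containsHangul(msg.content)
          ? { ...msg, content: await translate(msg.content) }
          : msg;
      }
      const content = await Promise.all(
        msg.content.map(async (block): Promise<Block> =>
          block.type === "text" &&
          typeof block.text === "string" &&
          containsHangul(block.text)
            ? { ...block, text: await translate(block.text) }
            : block,
        ),
      );
      return { ...msg, content };
    }),
  );
  return { ...body, messages };
}

// Hypothetical translator backed by a local llama-server instance.
// The URL, port, and prompt are assumptions; adjust to your setup.
async function translateWithLlamaServer(text: string): Promise<string> {
  const res = await fetch("http://127.0.0.1:8080/v1/chat/completions", {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({
      messages: [
        {
          role: "system",
          content:
            "Translate the user's Korean text to English. Output only the translation.",
        },
        { role: "user", content: text },
      ],
      temperature: 0,
    }),
  });
  if (!res.ok) throw new Error(`llama-server returned ${res.status}`);
  const data = (await res.json()) as {
    choices: { message: { content: string } }[];
  };
  return data.choices[0].message.content.trim();
}
```

In the proxy itself, the request handler would parse the incoming JSON, call `translateUserText(body, translateWithLlamaServer)`, and forward the result upstream with the original headers. Taking the translator as a parameter keeps the rewrite testable without a model running. One thing this sketch does on purpose: it skips anything that isn't a plain text block, since translating tool output or file contents seems like an easy way to corrupt code.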