Use Prompt Caching to Reduce Input Tokens with Claude
Towards AI
Image by author

How to Save Time and Money on Repeated LLM Calls with Ephemeral Caching

The Problem

A large prompt can quickly become expensive because the model provider charges for both input and output tokens. Prompt development, or prompt engineering, is an iterative process of designing, refining, and optimizing a prompt to guide a Large Language Model toward desirable outputs. Costs tend to creep up during this phase, because every iteration resends the full prompt, and the effect is worse if the prompt also carries a large context or knowledge base.
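To make the cost concrete, here is a rough back-of-the-envelope sketch in Python. The function name `estimate_iteration_cost`, the token counts, and the prices (3.00 USD per million input tokens, 15.00 USD per million output tokens) are placeholder assumptions for illustration, not figures from this article, so check your provider's current pricing before relying on them.

```python
def estimate_iteration_cost(
    input_tokens: int,
    output_tokens: int,
    iterations: int,
    input_price_per_mtok: float,
    output_price_per_mtok: float,
) -> float:
    """Estimate the total cost in USD of resending the same prompt many times.

    Assumes every iteration sends the full prompt again (no caching) and
    that prices are quoted per million tokens. All inputs are hypothetical.
    """
    input_cost = input_tokens * iterations * input_price_per_mtok
    output_cost = output_tokens * iterations * output_price_per_mtok
    return (input_cost + output_cost) / 1_000_000


# Hypothetical scenario: a 100,000-token knowledge base in the prompt,
# short 500-token answers, and 50 rounds of prompt refinement.
total = estimate_iteration_cost(
    input_tokens=100_000,
    output_tokens=500,
    iterations=50,
    input_price_per_mtok=3.00,    # assumed placeholder price
    output_price_per_mtok=15.00,  # assumed placeholder price
)
print(f"Estimated cost without caching: ${total:.3f}")
```

Under these assumed numbers almost all of the spend comes from the input side, since the same large context is billed again on every call. That repeated input is the part caching is meant to reduce.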