LLM inference infrastructure for a systems audience

This is an opinionated discussion of the basics of LLM inference and of the ecosystem of serving runtimes that power inference today, from a systems design perspective. It was originally written internally at IOP Systems, but thanks to some gentle persuasion (arm-twisting, really), it has been posted in case it is useful to anyone. (Thanks for the push, Yao