Am I misunderstanding RAG? I thought it basically meant separate retrieval + generation

Disclaimer: sorry if this post comes out weirdly worded, English is not my first language.

I'm a bit confused by how people use the term RAG. I thought the basic idea was:

- use an embedding model / retriever to find relevant chunks
- maybe rerank them
- pass those chunks into the main LLM
- let the LLM generate the final answer

So in my head, RAG is mostly about having a retrieval component and a generator component, often with different models doing different jobs.
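To make that mental model concrete, here is a toy sketch of the pipeline as I understand it. Everything in it is a stand-in I made up for illustration: `embed` is just bag-of-words counts instead of a real embedding model, `rerank` is plain token overlap instead of a real reranker, and `generate` is a stub where the call to the main LLM would go. The function names and the tiny `CHUNKS` corpus are hypothetical too, not from any library.

```python
import math
import re
from collections import Counter

# Toy corpus (hypothetical); in a real system these would be chunks of your documents.
CHUNKS = [
    "RAG combines a retriever with a generator model.",
    "Embeddings map text to vectors so similar texts are close together.",
    "Bananas are rich in potassium.",
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text):
    """Stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k=2):
    """Step 1: return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def rerank(query, candidates):
    """Step 2 (optional): reorder candidates by number of shared distinct tokens."""
    q = set(tokenize(query))
    return sorted(candidates, key=lambda c: len(q & set(tokenize(c))), reverse=True)


def build_prompt(query, context):
    """Step 3: put the retrieved chunks into the prompt for the main LLM."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"


def generate(prompt):
    """Step 4: stub for the main LLM call; a real system would call a model here."""
    return f"[LLM answer based on a prompt of {len(prompt)} characters]"


def rag_answer(query, chunks=CHUNKS, k=2):
    """Run the full pipeline: retrieve, rerank, build the prompt, generate."""
    candidates = retrieve(query, chunks, k)
    context = rerank(query, candidates)
    return generate(build_prompt(query, context))


if __name__ == "__main__":
    print(rag_answer("What does a retriever do in RAG?"))
```

The point of the sketch is only the shape: the retriever and the generator are separate functions, so either one could be swapped for a different model without touching the other.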