RAG is 80% retrieval and 20% generation.
So if RAG isn’t working, it’s most likely a retrieval issue, and retrieval issues usually trace back to chunking and embedding.
Contextualized chunk embedding models solve this.
In this article, let’s dive into what they are and how they address the common issues with RAG setups.
The problem
In RAG:
- No chunking at all drives up token costs (entire documents get retrieved and fed to the LLM)
- Large chunks lose fine-grained context
- Small chunks lose global/neighbourhood context
In fact, chunking also involves choosing the chunk overlap, generating summaries, and so on, all of which is tedious.
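For example, here’s a minimal sketch of a common fixed-size chunking baseline; the `chunk_text` name and the `chunk_size`/`overlap` defaults are illustrative, not prescriptive:

```python
# A minimal fixed-size chunker with overlap (illustrative parameters).
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split `text` into overlapping character windows."""
    chunks = []
    step = chunk_size - overlap  # how far each window advances
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
    return chunks
```

Even this simple baseline forces a choice: every extra character of overlap duplicates tokens, while too little overlap splits sentences across chunk boundaries.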
There’s another problem!
Despite all this tuning and tradeoff-balancing, the final chunk embeddings are generated independently, with no interaction between chunks.
Real-world documents don’t work that way: they have long-range dependencies that span many chunks.
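To make the issue concrete, here’s a sketch of that standard pipeline, reusing the `chunk_text` helper above and using sentence-transformers as a stand-in embedding library (the model name and file path are illustrative assumptions):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model choice

document = open("report.txt").read()  # hypothetical long document
chunks = chunk_text(document)         # fixed-size chunking from the sketch above

# Each chunk is encoded in isolation: the embedding of chunks[i] sees
# nothing of chunks[i - 1] or chunks[i + 1], so pronouns and other
# cross-chunk references lose their referents.
embeddings = model.encode(chunks)
```

A chunk that reads “It reduced latency by 40%” embeds with no trace of what “it” refers to, even if an earlier chunk names the system explicitly.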
