From Naïve Retrieval to Sentence Window Retrieval in RAG Systems

RAG systems? Oh,

they not like us.

They don’t read between the lines.

Why RAG Systems Need Extra Help to Shine

RAG systems might seem magical, but their performance heavily depends on well-structured data and careful design choices.

Here are some of the key factors that affect a RAG system’s effectiveness:


  • Retrieval Quality – The accuracy and relevance of retrieved documents directly impact the final output. If the input is bad, so is the output.


  • Embedding Model Choice – Think of this as the system’s vocabulary. The better your model understands the meaning behind queries, the better your results. Choose wisely.


  • Chunking Strategy – Whether it’s sentence-level, paragraph-level, or sliding windows, how you split your data affects what context the system retrieves.


  • Prompt Design – The art of asking. Clear and well-structured prompts help the system provide more accurate responses.


  • Feedback Loops & Evaluation – Monitoring and improving them is part of the fun. Tune, test, repeat.

Why Naïve RAG Falls Short for Fine-Grained Retrieval


  • Loses Context Precision – Fixed-size chunks (e.g., 500 tokens) often mix unrelated content, which confuses the model and leads to less precise answers.


  • Information Overload – Pulling in large chunks can introduce unnecessary information to the model, making it harder to focus on what actually matters.


  • Lack of Granularity – Important details can get lost inside bulky text blocks, reducing accuracy.
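To make the problem concrete, here is a minimal, framework-free sketch (plain Python; the sample text and chunk size are illustrative) showing how naive fixed-size chunking cuts right through a sentence:

```python
def fixed_size_chunks(text: str, size: int) -> list[str]:
    # Naive chunking: cut every `size` characters, ignoring sentence boundaries
    return [text[i:i + size] for i in range(0, len(text), size)]

text = (
    "The M3 chip delivers unprecedented speed. "
    "The display is a Liquid Retina XDR panel. "
    "Battery life reaches up to 18 hours."
)

for i, chunk in enumerate(fixed_size_chunks(text, 60)):
    print(f"Chunk {i}: {chunk!r}")
# The word "Liquid" ends up split across the chunk boundary, so a query
# about the display may retrieve a chunk that no longer mentions it coherently.
```

Real pipelines split on tokens rather than characters, but the failure mode is the same: the boundary is blind to meaning.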

Sentence Window Retrieval: A Granular Approach

Sentence Window Retrieval enhances traditional retrieval by focusing on individual sentences and their surrounding context. Here's how it improves precision:


  • Fine-Grained Context: The SentenceWindowNodeParser parses documents into individual sentences and associates each with its surrounding context through metadata.


  • Post-Processing: The MetadataReplacementPostProcessor replaces each retrieved sentence’s text with the surrounding window stored in its metadata, so the LLM sees the full context rather than a single isolated sentence.

Here's a sample code snippet demonstrating how to implement Sentence Window Retrieval in your RAG pipeline using LlamaIndex with metadata configuration.


  • Set Up the Environment and Imports

    Import the required modules and configure your environment.

import os

from llama_index.llms.azure_openai import AzureOpenAI
from llama_index.embeddings.azure_openai import AzureOpenAIEmbedding
from llama_index.core import Document, Settings
from llama_index.core.node_parser import SentenceWindowNodeParser
from llama_index.core.postprocessor import MetadataReplacementPostProcessor
from llama_index.core import VectorStoreIndex

# Azure OpenAI credentials, read from environment variables
aoai_api_key = os.environ["AZURE_OPENAI_API_KEY"]
aoai_endpoint = os.environ["AZURE_OPENAI_ENDPOINT"]
aoai_api_version = os.environ["AZURE_OPENAI_API_VERSION"]

# LLM configuration
llm = AzureOpenAI(
    model="gpt-4.1",
    deployment_name="gpt-4.1",
    api_key=aoai_api_key,
    azure_endpoint=aoai_endpoint,
    api_version=aoai_api_version,
)

# Azure embedding configuration
embed_model = AzureOpenAIEmbedding(
    model="text-embedding-3-large",
    deployment_name="text-embedding-3-large",
    api_key=aoai_api_key,
    azure_endpoint=aoai_endpoint,
    api_version=aoai_api_version,
)

# Set global settings
Settings.llm = llm
Settings.embed_model = embed_model

  • Parse Documents into Sentence Windows

    Use the SentenceWindowNodeParser to parse your documents. This parser extracts individual sentences and stores their immediate context in metadata, alongside the original sentence.
    • window_size=1: include one sentence of surrounding context on each side of the target sentence.
    • window_metadata_key="window": the metadata key under which the window text is stored.
    • original_text_metadata_key="original_sentence": the metadata key under which the original sentence is preserved.

# Sample Product Description

product_description = """
Experience the power of the new MacBook Pro with M3 chip, delivering unprecedented speed and battery life.
With a stunning Liquid Retina XDR display and up to 18 hours of battery, it’s designed for professionals on the go.
Choose between 14-inch and 16-inch models with up to 96GB of unified memory.
It includes a 1080p FaceTime HD camera, a six-speaker sound system, and advanced thermal architecture.
Now available in Silver and Space Black. Free shipping and trade-in offers available.
"""


# Create a Document object
document = Document(text=product_description)



# Initialize the SentenceWindowNodeParser
parser = SentenceWindowNodeParser.from_defaults(
    window_size=1,
    window_metadata_key="window",
    original_text_metadata_key="original_sentence",
)


# Parse the document into nodes
nodes = parser.get_nodes_from_documents([document])

  • Build the Vector Index

Create a vector index from the parsed nodes.


# Build the VectorStoreIndex
index = VectorStoreIndex(nodes)

  • Set Up the Query Engine

Configure the query engine with a postprocessor to handle sentence windows.


# Initialize the MetadataReplacementPostProcessor

post_processor = MetadataReplacementPostProcessor(target_metadata_key="window")

# Create a query engine with the post-processor
query_engine = index.as_query_engine(
    similarity_top_k=3,
    node_postprocessors=[post_processor],
    llm=llm,
)

  • Query the Engine

Now, query your engine.



# Perform a query

query = "What are the display features of the MacBook?"
response = query_engine.query(query)

print(f"Response: {response}")
print(f"Length: {len(response.source_nodes)}")

for i, node in enumerate(response.source_nodes):
    print(f"Node {i+1}:")
    print("Text:", node.text)
    print("Metadata:", node.metadata)



Sample output:

Response: The MacBook features a stunning Liquid Retina XDR display.
Length: 3

Node 1:
Text: Choose between 14-inch and 16-inch models with up to 96GB of unified memory.
It includes a 1080p FaceTime HD camera, a six-speaker sound system, and advanced thermal architecture.
Now available in Silver and Space Black.
Metadata: {'window': 'Choose between 14-inch and 16-inch models with up to 96GB of unified memory.\n It includes a 1080p FaceTime HD camera, a six-speaker sound system, and advanced thermal architecture.\n Now available in Silver and Space Black. ', 'original_sentence': 'It includes a 1080p FaceTime HD camera, a six-speaker sound system, and advanced thermal architecture.\n'}

Node 2:
Text:
Experience the power of the new MacBook Pro with M3 chip, delivering unprecedented speed and battery life.
With a stunning Liquid Retina XDR display and up to 18 hours of battery, it’s designed for professionals on the go.
Choose between 14-inch and 16-inch models with up to 96GB of unified memory.

Metadata: {'window': '\nExperience the power of the new MacBook Pro with M3 chip, delivering unprecedented speed and battery life.\n With a stunning Liquid Retina XDR display and up to 18 hours of battery, it’s designed for professionals on the go.\n Choose
between 14-inch and 16-inch models with up to 96GB of unified memory.\n', 'original_sentence': 'With a stunning Liquid Retina XDR display and up to 18 hours of battery, it’s designed for professionals on the go.\n'}

Node 3:
Text: With a stunning Liquid Retina XDR display and up to 18 hours of battery, it’s designed for professionals on the go.
Choose between 14-inch and 16-inch models with up to 96GB of unified memory.
It includes a 1080p FaceTime HD camera, a six-speaker sound system, and advanced thermal architecture.

Metadata: {'window': 'With a stunning Liquid Retina XDR display and up to 18 hours of battery, it’s designed for professionals on the go.\n Choose between 14-inch and 16-inch models with up to 96GB of unified memory.\n It includes a 1080p FaceTime HD camera, a six-speaker sound system, and advanced thermal architecture.\n', 'original_sentence': 'Choose between 14-inch and 16-inch models with up to 96GB of unified memory.\n'}

Final Thoughts

While basic chunk-based retrieval works for simpler use cases, Sentence Window Retrieval shines in domains that require high precision, such as legal, medical, or technical fields.

It helps:


  • Reduce hallucinations.


  • Focus generation on the most relevant context.


  • Align responses more closely with user intent.

Of course, no retrieval strategy is a silver bullet. The key is selecting the right method for your specific project needs and continuously refining it through better chunking, smarter prompts, and robust feedback loops.

