Simple RAG Implementation with LangChain
1. Introduction to RAG
RAG (Retrieval-Augmented Generation) is an architectural pattern that combines a retrieval system with a large language model, grounding the model's responses in retrieved documents to improve their accuracy and reliability.
2. System Architecture
The RAG system consists of two main phases:
- Indexing Phase (offline processing): load, split, embed, and store documents
- Query Phase (online processing): retrieve relevant chunks and generate an answer
3. Detailed Implementation
3.1 Environment Setup
```python
# Install required packages (pypdf is needed by PyPDFLoader below)
!pip install langchain chromadb ollama sentence-transformers pypdf

# Note: a local Ollama server must be running, with the qwen model pulled
# (e.g. `ollama pull qwen`), for the embedding and LLM calls below to work.
# Import required libraries
from langchain.document_loaders import DirectoryLoader, PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import OllamaEmbeddings
from langchain.vectorstores import Chroma
from langchain.chains import RetrievalQA
from langchain.llms import Ollama
from langchain.prompts import PromptTemplate
```
3.2 Document Processing
3.2.1 Document Loading and Splitting
```python
# Configure document loader
loader = DirectoryLoader(
    "./docs",
    glob="**/*.pdf",
    loader_cls=PyPDFLoader
)
documents = loader.load()
# Configure text splitter
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
    length_function=len,
    separators=["\n\n", "\n", " ", ""]
)
splits = text_splitter.split_documents(documents)
```
Document processing flow: the loader walks `./docs` for PDFs and loads them page by page, then the splitter breaks each document into 1,000-character chunks with a 200-character overlap so context is preserved across chunk boundaries.
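As a quick sanity check (assuming `./docs` actually contains at least one PDF), you can inspect the split output before indexing it:

```python
# Inspect the split output (assumes ./docs contained at least one PDF)
print(f"Loaded {len(documents)} pages, produced {len(splits)} chunks")
print(splits[0].page_content[:200])  # preview of the first chunk
print(splits[0].metadata)            # source path and page number
```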
3.3 Vectorization and Storage
```python
# Initialize embedding model
embeddings = OllamaEmbeddings(model="qwen")
# Create vector store
vectorstore = Chroma.from_documents(
    documents=splits,
    embedding=embeddings,
    persist_directory="./chroma_db"
)
```
Vectorization flow: each chunk is embedded with the Ollama `qwen` model, and the resulting vectors are written to a Chroma collection persisted under `./chroma_db`.
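Because the store is persisted to disk, a later session can reopen it without re-embedding anything. A minimal sketch, assuming the same embedding model (the query string is only illustrative):

```python
# Reopen the persisted collection in a later session (no re-embedding needed)
vectorstore = Chroma(
    persist_directory="./chroma_db",
    embedding_function=embeddings
)

# A raw similarity search is useful for debugging retrieval quality
for doc in vectorstore.similarity_search("What is machine learning?", k=3):
    print(doc.metadata.get("source"), "->", doc.page_content[:80])
```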
3.4 Prompt Template Design
```python
# Create prompt template
prompt_template = """Use the following context to answer the question. If you don't know the answer, say you don't know.
Context: {context}
Question: {question}
Answer:"""
PROMPT = PromptTemplate(
    template=prompt_template,
    input_variables=["context", "question"]
)
```
Prompt template processing flow: at query time, the retrieved chunks are concatenated into `{context}` and the user's question fills `{question}` before the completed prompt is sent to the LLM.
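Before wiring the template into a chain, you can render it with dummy values to verify the layout (the values below are placeholders):

```python
# Render the template with placeholder values to check the final layout
print(PROMPT.format(
    context="Machine learning is a subfield of AI...",
    question="What is machine learning?"
))
```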
3.5 Retrieval Chain Configuration
```python
# Create retrieval QA chain
qa_chain = RetrievalQA.from_chain_type(
    llm=Ollama(model="qwen"),
    chain_type="stuff",
    retriever=vectorstore.as_retriever(
        search_type="similarity",
        search_kwargs={"k": 3}
    ),
    chain_type_kwargs={
        "prompt": PROMPT
    },
    return_source_documents=True
)
```
Retrieval chain execution flow: the retriever fetches the 3 most similar chunks, the `stuff` chain type packs them all into a single prompt, and the Ollama model generates the final answer along with the source documents.
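The chain can also be invoked directly, before it is wrapped in a class below; note the input key is `query` and the output keys are `result` and `source_documents` (the question is illustrative):

```python
# Invoke the raw chain: input key is "query"; output keys are
# "result" (the answer) and "source_documents" (retrieved chunks)
result = qa_chain({"query": "What is machine learning?"})
print(result["result"])
print(len(result["source_documents"]), "source chunks retrieved")
```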
3.6 Query Interface Implementation
```python
class RAGSystem:
    def __init__(self, qa_chain):
        self.qa_chain = qa_chain

    def query(self, question: str) -> dict:
        """
        Process a user query.

        Args:
            question: The user's question.

        Returns:
            dict: The answer and its source documents, or an error message.
        """
        try:
            result = self.qa_chain({"query": question})
            return {
                "answer": result["result"],
                "sources": [
                    {
                        "content": doc.page_content,
                        "metadata": doc.metadata
                    }
                    for doc in result["source_documents"]
                ]
            }
        except Exception as e:
            return {"error": f"Query processing failed: {str(e)}"}
```
4. Usage Example
```python
# Create RAG system instance
rag_system = RAGSystem(qa_chain)
# Example query
question = "What is machine learning?"
response = rag_system.query(question)
print("Answer:", response["answer"])
print("\nReference Source Documents:")
for source in response["sources"]:
print(f"- {source['content'][:100]}...")
5. Performance Optimization Tips
Document Splitting Optimization
- Adjust chunk size and overlap to the document's characteristics (a comparison sketch follows this list)
- Maintain enough overlap that context carries across chunk boundaries
- Choose separators that preserve semantic units such as paragraphs and sentences
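A simple way to choose these parameters is to compare a few configurations empirically; a sketch, where the size/overlap pairs are illustrative starting points rather than recommended values:

```python
# Compare chunk counts across candidate configurations (values are illustrative)
for chunk_size, chunk_overlap in [(500, 50), (1000, 200), (2000, 400)]:
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size,
        chunk_overlap=chunk_overlap
    )
    chunks = splitter.split_documents(documents)
    print(f"size={chunk_size}, overlap={chunk_overlap}: {len(chunks)} chunks")
```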
Vector Retrieval Optimization
- Tune the k value (number of chunks retrieved per query)
- Cache retrieval results for repeated queries
- Consider alternative retrieval strategies such as MMR or hybrid keyword/vector search (an MMR sketch follows this list)
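Adjusting k is a one-line change on the retriever. For more diverse results, Chroma's retriever also supports maximal marginal relevance (MMR) search, which fetches a larger candidate pool and selects a less redundant top k (the parameter values are illustrative):

```python
# MMR: fetch 20 candidates, then select a diverse top 3
retriever = vectorstore.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 3, "fetch_k": 20}
)
```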
Prompt Engineering Optimization
- Iterate on the prompt template wording (a stricter variant is sketched after this list)
- Add system-level instructions for tone and scope
- Handle edge cases, such as unanswerable questions, explicitly
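A common refinement is to make the refusal behavior explicit and constrain the model to the retrieved context. A sketch (the wording is illustrative, not a canonical template):

```python
# A stricter template: answer only from context, with an explicit refusal phrase
strict_template = """You are a helpful assistant. Answer ONLY from the context below.
If the context does not contain the answer, reply: "I don't know based on the provided documents."

Context: {context}
Question: {question}
Answer:"""

STRICT_PROMPT = PromptTemplate(
    template=strict_template,
    input_variables=["context", "question"]
)
```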
6. Complete System Flow
End to end: PDFs are loaded, split into chunks, embedded, and stored in Chroma (offline indexing); at query time the top-k chunks are retrieved, stuffed into the prompt template, and the Ollama model generates an answer together with its source documents (online querying).
Summary
Advantages of implementing a RAG system with LangChain:
- Modular design that is easy to extend
- A rich selection of interchangeable components (loaders, splitters, vector stores, LLMs)
- Simple, uniform interfaces across components
- Comprehensive documentation
Through proper configuration and optimization, you can build an efficient and reliable RAG system that enhances the accuracy and usability of AI applications.