Simple RAG Implementation with LangChain

1. Introduction to RAG

RAG (Retrieval-Augmented Generation) is an architectural pattern that combines a retrieval system with a large language model: relevant documents are fetched at query time and supplied to the model as context, grounding its responses and improving their accuracy and reliability.

2. System Architecture

The RAG system consists of two main phases:

  1. Indexing Phase (Offline Processing)
  2. Query Phase (Online Processing)
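
At a high level, the two phases reduce to two functions: one run ahead of time and one per user question. A rough outline (the function names here are illustrative, not LangChain API; the rest of this article fills in each step with concrete components):

```python
# Hypothetical outline of the two phases
def build_index(doc_dir: str):
    """Offline: load documents, split into chunks, embed, and persist."""
    ...

def answer(question: str, index) -> str:
    """Online: retrieve the chunks most relevant to the question and
    have the LLM generate an answer grounded in them."""
    ...
```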

3. Detailed Implementation

3.1 Environment Setup

```python
# Install required packages (pypdf is needed by PyPDFLoader)
!pip install langchain chromadb ollama sentence-transformers pypdf

# Import required libraries
from langchain.document_loaders import DirectoryLoader, PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import OllamaEmbeddings
from langchain.vectorstores import Chroma
from langchain.chains import RetrievalQA
from langchain.llms import Ollama
from langchain.prompts import PromptTemplate
```
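
Note: these import paths match the pre-0.1 `langchain` package used throughout this article. On newer releases the community integrations were split out into `langchain-community`; if you hit `ImportError`s, the equivalent imports (same class names, different modules, after `pip install langchain-community`) look like:

```python
# Equivalent imports for newer LangChain releases (langchain >= 0.1)
from langchain_community.document_loaders import DirectoryLoader, PyPDFLoader
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_community.llms import Ollama
```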

3.2 Document Processing

3.2.1 Document Loading and Splitting

```python
# Configure document loader (PyPDFLoader parses each PDF page-by-page)
loader = DirectoryLoader(
    "./docs",
    glob="**/*.pdf",
    loader_cls=PyPDFLoader
)
documents = loader.load()

# Configure text splitter
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
    length_function=len,
    separators=["\n\n", "\n", " ", ""]
)
splits = text_splitter.split_documents(documents)
```

(Diagram: document processing flow)
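
Before indexing, it is worth sanity-checking the splitter's output. A quick inspection, reusing the `documents` and `splits` variables from above:

```python
# Verify the split results before embedding
print(f"Loaded {len(documents)} pages, produced {len(splits)} chunks")
print(splits[0].page_content[:200])  # preview the first chunk
print(splits[0].metadata)            # source path and page number
```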

3.3 Vectorization and Storage

```python
# Initialize embedding model (the model must be available locally,
# e.g. via `ollama pull qwen`)
embeddings = OllamaEmbeddings(model="qwen")

# Create vector store, persisted to disk so indexing runs only once
vectorstore = Chroma.from_documents(
    documents=splits,
    embedding=embeddings,
    persist_directory="./chroma_db"
)
```

(Diagram: vectorization flow)
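
Because the store is persisted, later sessions can reload the existing index instead of re-embedding everything. A minimal sketch with the same `Chroma` wrapper:

```python
# Reload a previously persisted Chroma index (skips re-embedding)
vectorstore = Chroma(
    persist_directory="./chroma_db",
    embedding_function=embeddings
)
```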

3.4 Prompt Template Design

```python
# Create prompt template
prompt_template = """Use the following context to answer the question. If you don't know the answer, say you don't know.

Context: {context}

Question: {question}

Answer:"""

PROMPT = PromptTemplate(
    template=prompt_template,
    input_variables=["context", "question"]
)
```

(Diagram: prompt template processing flow)
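
To see exactly what the LLM will receive, the template can be rendered by hand with sample values:

```python
# Render the template with example values to preview the final prompt
print(PROMPT.format(
    context="Machine learning is a subfield of AI...",
    question="What is machine learning?"
))
```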

3.5 Retrieval Chain Configuration

```python
# Create retrieval QA chain; chain_type="stuff" concatenates all
# retrieved chunks into a single prompt
qa_chain = RetrievalQA.from_chain_type(
    llm=Ollama(model="qwen"),
    chain_type="stuff",
    retriever=vectorstore.as_retriever(
        search_type="similarity",
        search_kwargs={"k": 3}
    ),
    chain_type_kwargs={
        "prompt": PROMPT
    },
    return_source_documents=True
)
```

(Diagram: retrieval chain execution flow)
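
The retriever can also be exercised on its own, which helps when debugging relevance before the LLM is involved. With the retriever interface of the LangChain version used here:

```python
# Inspect what the retriever returns for a query, independent of the LLM
retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 3}
)
for doc in retriever.get_relevant_documents("What is machine learning?"):
    print(doc.metadata.get("source"), "->", doc.page_content[:80])
```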

3.6 Query Interface Implementation

python
class RAGSystem:
    def __init__(self, qa_chain):
        self.qa_chain = qa_chain
    
    def query(self, question: str) -> dict:
        """
        Process user query
        
        Args:
            question: User question
            
        Returns:
            dict: Dictionary containing answer and source documents
        """
        try:
            result = self.qa_chain({
                "query": question
            })
            
            return {
                "answer": result["result"],
                "sources": [
                    {
                        "content": doc.page_content,
                        "metadata": doc.metadata
                    } for doc in result["source_documents"]
                ]
            }
        except Exception as e:
            return {
                "error": f"Query processing failed: {str(e)}"
            }

4. Usage Example

```python
# Create RAG system instance
rag_system = RAGSystem(qa_chain)

# Example query
question = "What is machine learning?"
response = rag_system.query(question)

# query() returns an error dictionary on failure, so check for it first
if "error" in response:
    print(response["error"])
else:
    print("Answer:", response["answer"])
    print("\nReference Source Documents:")
    for source in response["sources"]:
        print(f"- {source['content'][:100]}...")
```

5. Performance Optimization Tips

  1. Document Splitting Optimization

    • Adjust chunk size to the document's characteristics (smaller for dense technical text, larger for narrative prose)
    • Maintain enough overlap that ideas are not cut off mid-sentence at chunk boundaries
    • Choose separators that preserve semantic completeness
  2. Vector Retrieval Optimization

    • Tune the k value to balance recall against prompt length
    • Cache retrieval results for repeated or similar queries
    • Consider hybrid retrieval strategies that combine dense vectors with keyword search (see the sketch after this list)
  3. Prompt Engineering Optimization

    • Iterate on prompt templates against a fixed set of test questions
    • Add system instructions that constrain tone and output format
    • Handle edge cases, such as questions with no relevant retrieved context
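
As an example of the hybrid strategy mentioned above, LangChain ships an `EnsembleRetriever` that can blend the dense Chroma retriever with a keyword-based `BM25Retriever` (which requires the `rank_bm25` package). A minimal sketch, assuming the `splits` and `vectorstore` objects from earlier:

```python
from langchain.retrievers import BM25Retriever, EnsembleRetriever

# Keyword-based retriever built over the same chunks
bm25_retriever = BM25Retriever.from_documents(splits)
bm25_retriever.k = 3

# Dense retriever over the Chroma index
vector_retriever = vectorstore.as_retriever(search_kwargs={"k": 3})

# Blend the two; the weights are a tuning knob, not canonical values
hybrid_retriever = EnsembleRetriever(
    retrievers=[bm25_retriever, vector_retriever],
    weights=[0.5, 0.5]
)
```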

6. Complete System Flow

End to end, the system runs the two phases from Section 2: offline, documents are loaded, split, embedded, and persisted to the Chroma store; online, each question is embedded, the top-k most similar chunks are retrieved, merged into the prompt template, and passed to the LLM to produce a grounded answer.

(Diagram: complete system flow)

Summary

Advantages of implementing a RAG system with LangChain:

  1. Modular design, easy to extend
  2. Rich component selection
  3. Simplified interface calls
  4. Comprehensive documentation support

Through proper configuration and optimization, you can build an efficient and reliable RAG system that enhances the accuracy and usability of AI applications.

License

This article is licensed under CC BY-NC-SA 4.0. You are free to:

  • Share — copy and redistribute the material in any medium or format
  • Adapt — remix, transform, and build upon the material

Under the following terms:

  • Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
  • NonCommercial — You may not use the material for commercial purposes.
  • ShareAlike — If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.
