Implementing RAG using LangChain, Ollama, and Chainlit on Windows using WSL

Plaban Nayak · Published in AI Planet · 15 min read · Nov 11, 2023


What is Ollama?

Ollama lets you download open-source models and run them locally. It automatically fetches models from the best available sources and, if your computer has a dedicated GPU, it seamlessly uses GPU acceleration without requiring manual configuration. Customizing a model is as easy as modifying its prompt, and LangChain is not a prerequisite for this. Additionally, Ollama is available as a Docker image, allowing you to deploy your personalized model as a Docker container.

How does Ollama work?

  • Ollama allows you to run open-source large language models, such as Llama 2 and Mistral, locally.
  • Ollama bundles model weights, configuration, and data into a single package, defined by a Modelfile (an example sketch follows this list).
  • It optimizes setup and configuration details, including GPU usage.
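As an illustration of Modelfile-based customization, the sketch below builds a variant of Mistral with a custom system prompt (the research-mistral name, the temperature value, and the system prompt are our own illustrative choices, not from the original article):

FROM mistral
# Lower temperature for more deterministic answers
PARAMETER temperature 0.2
# Steer the model toward concise, context-grounded answers
SYSTEM """You are a concise research assistant. Answer only from the provided context."""

Build and chat with the customized model:

ollama create research-mistral -f Modelfile
ollama run research-mistral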

How to install Ollama?

At present, Ollama is only available for macOS and Linux. For Windows, we can install Ollama using WSL2; WSL2 itself can be installed by following Microsoft's official instructions.

Install Ollama in WSL2:

curl https://ollama.ai/install.sh | sh

Start the Ollama Server

ollama serve

Run the desired model

All available models are listed in the Ollama model library. We use Mistral as the LLM in this illustration. Open another terminal and download the model of your choice by executing the command below.

ollama run mistral

After the model downloads successfully, we can interact with it for inference. With Ollama installed, we can now use it for the RAG implementation.

Note: All of our local models are automatically served on localhost:11434
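Because the server exposes a simple HTTP API on that port, we can sanity-check it without any framework. A minimal sketch (assumes ollama serve is running and the mistral model has been pulled):

import requests

# Ask the local Ollama server for a single, non-streamed completion
response = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "mistral", "prompt": "Why is the sky blue?", "stream": False},
)
print(response.json()["response"])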

What is Chainlit?

Chainlit is an open-source Python package that makes it incredibly fast to build ChatGPT-like applications with your own business logic and data.
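To give a feel for the programming model before the full RAG app later in this article, here is a minimal, self-contained sketch of a Chainlit app that simply echoes the user (save it as app.py, a hypothetical name, and launch it with chainlit run app.py):

import chainlit as cl

# Echo every incoming chat message back to the user
@cl.on_message
async def main(message):
    await cl.Message(content=f"You said: {message.content}").send()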

Primary characteristics:

  • Rapid Construction: Effortlessly incorporate into an existing code base swiftly or commence development from the ground up within minutes.
  • Data Continuity: Harness user-generated data and feedback for enhanced performance.
  • Visualize Complex Reasoning: Gain insight into the intermediate steps leading to a specific outcome with a quick overview.
  • Prompt Refinement: Delve deeply into prompts within the Prompt Playground to pinpoint issues and iterate for improvement.

Integrations available:

  • LangChain
  • Haystack
  • LlamaIndex
  • Langflow

Here we will use the LangChain framework for the RAG implementation.

What is LangChain?

LangChain is a freely available framework crafted to streamline the development of applications utilizing large language models (LLMs). It furnishes a standardized interface for chains, extensive integrations with various tools, and complete chains tailored for prevalent applications. This facilitates AI developers in creating applications that leverage the collective power of Large Language Models (LLMs), such as GPT-4, alongside external sources of computation and data. The framework is accompanied by packages for both Python and JavaScript.
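As a minimal sketch of the LangChain-Ollama integration used in this article (assumes the Ollama server is running and the Mistral model has been pulled):

from langchain.llms import Ollama

# Route a single prompt through the locally served Mistral model
llm = Ollama(model="mistral")
print(llm("Explain retrieval-augmented generation in one sentence."))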

What is a Vector Store?

Vector stores are databases specifically crafted to efficiently store and retrieve vector embeddings. They are essential because conventional databases such as SQL are not finely tuned for the storage and retrieval of extensive vector data.

Embeddings denote data, typically unstructured data like text, in numerical vector formats within a high-dimensional space. Traditional relational databases are not aptly designed for the storage and retrieval of these vector representations.

Vector stores have the capability to index and rapidly search for similar vectors using similarity algorithms. This functionality enables applications to identify related vectors based on a provided target vector query.
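To make the idea concrete, the sketch below implements cosine similarity, one common similarity measure behind vector search (the vectors are toy examples, not real embeddings):

import numpy as np

# Cosine similarity: how closely two vectors point in the same direction,
# independent of their magnitudes (near 1.0 = similar, near 0.0 = unrelated)
def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

query = np.array([0.9, 0.1, 0.0])
related_doc = np.array([0.8, 0.2, 0.1])
unrelated_doc = np.array([0.0, 0.1, 0.9])
print(cosine_similarity(query, related_doc))    # high score -> related
print(cosine_similarity(query, unrelated_doc))  # low score -> unrelated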

What is Chroma DB?

Chroma DB is an open-source vector store used for storing and retrieving vector embeddings. Its main use is to save embeddings along with metadata to be used later by large language models. Additionally, it can also be used for semantic search engines over text data.

Primary Characteristics:

  • Supports different underlying storage options like DuckDB for standalone or ClickHouse for scalability.
  • Provides SDKs for Python and JavaScript/TypeScript.
  • Focuses on simplicity, speed, and enabling analysis.
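Beyond the LangChain wrapper used later in this article, Chroma can also be used directly through its own Python client. A minimal in-memory sketch (collection name, document, and metadata are illustrative):

import chromadb

# Create an ephemeral, in-memory client and a demo collection
client = chromadb.Client()
collection = client.create_collection("demo")
# Add a document; Chroma embeds it with its default embedding function
collection.add(
    documents=["Ollama runs large language models locally."],
    metadatas=[{"source": "note-1"}],
    ids=["id1"],
)
# Query by text; Chroma returns the most similar stored documents
print(collection.query(query_texts=["local LLM runtime"], n_results=1))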

Embeddings

GPT4All supports generating high-quality embeddings of arbitrary-length text documents using a CPU-optimized, contrastively trained Sentence Transformer. For many tasks, these embeddings are comparable in quality to OpenAI's.
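A quick sketch of generating an embedding with the LangChain wrapper (the model weights are downloaded automatically on first use):

from langchain.embeddings import GPT4AllEmbeddings

# Embed a single query string into a numeric vector
embeddings = GPT4AllEmbeddings()
vector = embeddings.embed_query("What are ESOPs?")
print(len(vector))  # dimensionality of the embedding vector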

Speed of embedding generation

Embedding generation speed for text documents was benchmarked on an Intel i9-13900HX CPU with DDR5-5600 memory, running 8 threads under stable load. (The original benchmark table is omitted.)

What is RAG?

Retrieval-augmented generation (RAG) is an artificial intelligence framework designed to improve the accuracy of responses generated by large language models (LLMs) by integrating external sources of knowledge that complement the LLM's internal representation of information. Applied to a question-answering system built on an LLM, RAG offers two primary advantages. First, it gives the model access to the latest and most reliable facts. Second, it gives users visibility into the model's sources, enabling them to verify the accuracy of its claims and build trust in the generated responses. It consists of two stages:

  1. Retrieval: retrieve the context from the vector store that best matches the query
  2. Generation: formulate a response based on the retrieved context (see the sketch after this list)
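The sketch below shows the two stages in plain Python (the rag_answer helper is hypothetical; the actual implementation later in this article uses LangChain's RetrievalQA chain):

def rag_answer(query, vectorstore, llm, k=4):
    # Retrieval: fetch the k chunks most similar to the query
    docs = vectorstore.similarity_search(query, k=k)
    context = "\n\n".join(doc.page_content for doc in docs)
    # Generation: ask the LLM to answer strictly from that context
    prompt = (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
    return llm(prompt)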

RAG Pipeline Implementation Steps:

  1. Read the PDF file.
  2. Consider breaking down a large document into smaller parts or chunks to ensure that the model receives the pertinent information for a given question without overburdening it with excessive resources. This approach is akin to providing the model with only the necessary pieces instead of inundating it with the entire content all at once, considering that each model has a token limit.
  3. Convert these chunks into vector embeddings to enhance the model’s comprehension of the data. Then create a vector store to efficiently store and retrieve these embeddings as needed; here we use the Chroma vector store for this purpose.
  4. With the vector store in place, query the PDF file using RetrievalQA from LangChain. This involves using the vector store as a retriever, specifying the model to be employed, and adjusting other parameters to specific requirements (one such tweak is sketched after this list).
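For instance, one such adjustment is the number of chunks the retriever returns per query (a hypothetical tweak, not used in the code that follows):

# Retrieve the top 3 chunks per query instead of LangChain's default of 4
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})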

Code Implementation

Implementation Stack

  • chromadb — vector store
  • gpt4all — text embeddings
  • langchain — framework to facilitate application development using LLMs
  • chainlit — build a ChatGPT-like interface

Folder Structure

  • data folder: stores the required documents in .pdf format
  • vectorstores/db: stores the data embeddings

Install required dependencies:

pip install chromadb
pip install langchain
pip install BeautifulSoup4
pip install gpt4all
pip install langchainhub
pip install pypdf
pip install chainlit

Upload required Data and load into VectorStore

  • Execute the below script to convert the documents into embeddings and store them in ChromaDB
  • python3 load_data_vdb.py

from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders.pdf import PyPDFDirectoryLoader
from langchain.vectorstores import Chroma
from langchain.embeddings import GPT4AllEmbeddings

DATA_PATH = "data/"
DB_PATH = "vectorstores/db/"

def create_vector_db():
    # Load every PDF in the data folder (one document per page)
    loader = PyPDFDirectoryLoader(DATA_PATH)
    documents = loader.load()
    print(f"Processed {len(documents)} pdf pages")
    # Split the documents into overlapping chunks
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=50)
    texts = text_splitter.split_documents(documents)
    # Embed the chunks and persist them to the Chroma store
    vectorstore = Chroma.from_documents(documents=texts, embedding=GPT4AllEmbeddings(), persist_directory=DB_PATH)
    vectorstore.persist()

if __name__ == "__main__":
    create_vector_db()

The vector store is created in the vectorstores/db folder
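As an optional sanity check (a sketch, not part of the original scripts), the persisted store can be reloaded and queried directly:

from langchain.embeddings import GPT4AllEmbeddings
from langchain.vectorstores import Chroma

# Reload the persisted store and confirm a similarity search returns relevant chunks
db = Chroma(persist_directory="vectorstores/db/", embedding_function=GPT4AllEmbeddings())
for doc in db.similarity_search("What are ESOPs?", k=2):
    print(doc.metadata, doc.page_content[:80])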

Run the Chatbot Application to generate the response

  • Create the user interface using chainlit.
  • Accept the user query
  • Refer to the vectorstore
  • Retrieve matching context
  • Pass the matching context to the LLM
  • Generate the response

Execute the below command:

chainlit run RAG.py
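During development, adding the -w flag (chainlit run RAG.py -w) makes Chainlit watch the file and reload the app automatically when the source changes.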

Code: RAG.py

# import required dependencies
from langchain import hub
from langchain.embeddings import GPT4AllEmbeddings
from langchain.vectorstores import Chroma
from langchain.llms import Ollama
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.chains import RetrievalQA
import chainlit as cl

# Pull the RAG prompt tailored for Mistral from the LangChain hub
QA_CHAIN_PROMPT = hub.pull("rlm/rag-prompt-mistral")

DB_PATH = "vectorstores/db/"

# Load the LLM served locally by Ollama, streaming tokens to stdout
def load_llm():
    llm = Ollama(
        model="mistral",
        verbose=True,
        callback_manager=CallbackManager([StreamingStdOutCallbackHandler()]),
    )
    return llm

# Wire the LLM and the vector store into a RetrievalQA chain
def retrieval_qa_chain(llm, vectorstore):
    qa_chain = RetrievalQA.from_chain_type(
        llm,
        retriever=vectorstore.as_retriever(),
        chain_type_kwargs={"prompt": QA_CHAIN_PROMPT},
        return_source_documents=True,
    )
    return qa_chain

def qa_bot():
    llm = load_llm()
    vectorstore = Chroma(persist_directory=DB_PATH, embedding_function=GPT4AllEmbeddings())
    qa = retrieval_qa_chain(llm, vectorstore)
    return qa

@cl.on_chat_start
async def start():
    # Build the chain once per session and greet the user
    chain = qa_bot()
    msg = cl.Message(content="Firing up the research info bot...")
    await msg.send()
    msg.content = "Hi, welcome to the research info bot. What is your query?"
    await msg.update()
    cl.user_session.set("chain", chain)

@cl.on_message
async def main(message):
    chain = cl.user_session.get("chain")
    cb = cl.AsyncLangchainCallbackHandler(
        stream_final_answer=True,
        answer_prefix_tokens=["FINAL", "ANSWER"],
    )
    cb.answer_reached = True
    res = await chain.acall(message.content, callbacks=[cb])
    print(f"response: {res}")
    answer = res["result"]
    answer = answer.replace(".", ".\n")
    sources = res["source_documents"]

    if sources:
        answer += "\nSources: " + str(sources)
    else:
        answer += "\nNo sources found"

    await cl.Message(content=answer).send()

Sample Responses

Query 1

response: {'query': 'What is ESOPS?', 
'result': '\nESOPs are options given by a company to its employees to buy shares in their company at a predetermined price within a prescribed time period. The purchase price may or may not be at a discount to the ruling/ fair market value of the shares of the company. ESOPs are increasingly being accepted as a reward for employee productivity and have been recognised worldwide as an important tool for employee wealth sharing and motivation. They create a sense of ownership in employees and make them feel part of the organisation.', 'source_documents': [Document(page_content='16\nESOP S\n16.1 Meaning\nEmployee Stock Option Plans or ESOPs are increasingly being accepted as a\nreward for Employee Productivity. Earlier, the use of ESOPs was restricted to\nknowledge -based companies only but now they have spread across the entire\nspectrum o f industries. ESOPs have been recognised the world over as an important\ntool for Employee Wealth Sharing and Employee Motivation. ESOPs create a sense\nof ownership in the employees in relation to their employer company and they feel\nthat they are part of t he organisation. Companies such as Infosys, Wipro, etc., have\nmade several millionaires out of their employees due to ESOPs.\nSimply put, an ESOP is an option (but not an obligation) given by a company to its\nemployees to buy shares in their company at a predetermined price within a\nprescribed time period. The purchase price may or may not be at a discount to the\nruling/ fair market value of the shares of the Company.', metadata={'page': 0, 'source': 'data/PDFFile5b28ce3c2eb412.05300945.pdf'}), Document(page_content='ESOPs 81\noptions to the trust and the trust then allocates the options as per the\nguidelines of the ESOP.\n16.2 Regulatory framework\n16.2.1 The primary legislation which concerns an ESOP is the Companies Act.\nEvery company which wants to issue an ESOP would need to comply with the\nprovisions of the Act. Depending upon whether the company issuing the ESOPs is a\nlisted company or an unlist ed company the applicable laws would vary. Listed\ncompanies are subjected to the SEBI Guidelines on ESOPs. In addition, in order that\nthese ESOPs remain tax neutral in the hands of the employees, the ESOPs should\nalso comply with the Guidelines issued by t he Central Government u/s. 17(2)(iii) of\nthe Income -tax Act, 1961. Hence, any ESOP by a listed company would be\nsubjected to both these legislations. 
In case of an unlisted public limited company,\nthe Unlisted Public Companies (Preferential Allotment) Rule s would apply along with', metadata={'page': 1, 'source': 'data/PDFFile5b28ce3c2eb412.05300945.pdf'}), Document(page_content='ESOPs 85\nindirectly, is more than 51% may pur chase Equity shares of foreign company.\nThe ESOP may be offered directly by the issuing company or through a trust/\na Special Purpose Vehicle/ a subsidiary.\n(d) JV/ WOS abroad\nAn Indian company’s domestic employees and directors may purchase the\nshares of the promoter company’s Joint Venture or Wholly Owned Subsidiary\nabroa d in the field of software, if:\n\uf0b7the maximum purchase consideration does not exceed the ceiling\nstipulated by the RBI from time to time (US$10,000 per employee in a\nblock of five cale ndar years);\n\uf0b7the Shares do not exceed 5% of the paid -up capital of the issuing\nCompany; and\n\uf0b7the post -allotment holding of the Indian promoter company and the\nshares held by the employees must not be lower than the pre -allotment\nholding of the Indian promoter company .\n(e) ADR linked ESOPs\nResident employees and working directors of Indian Companies operating in\nknowledge based sectors can purchase foreign securities under the ADR/',
metadata={'page': 5, 'source': 'data/PDFFile5b28ce3c2eb412.05300945.pdf'}),
Document(page_content='disclosures.\n(e) The percentage of Equity Capital set aside for the ESOP depends entirely on\nthe company. There is no statutory requirement for this and it varies from\ncompany to company.',
metadata={'page': 2, 'source': 'data/PDFFile5b28ce3c2eb412.05300945.pdf'})]}

Query 2

response: {'query': 'What  has been mentioned about the companies act ?', 
'result': "The Companies Act is the primary legislation that concerns an Employee Stock Ownership Plan (ESOP). Listed companies are subjected to SEBI Guidelines on ESOPs and the Central Government's Income-tax Act, 1961. Unlisted public limited companies are governed by the Preferential Allotment Rules, 2003. Private limited companies are only subjected to the Income-tax Guidelines.",
'source_documents': [Document(page_content='ESOPs 81\noptions to the trust and the trust then allocates the options as per the\nguidelines of the ESOP.\n16.2 Regulatory framework\n16.2.1 The primary legislation which concerns an ESOP is the Companies Act.\nEvery company which wants to issue an ESOP would need to comply with the\nprovisions of the Act. Depending upon whether the company issuing the ESOPs is a\nlisted company or an unlist ed company the applicable laws would vary. Listed\ncompanies are subjected to the SEBI Guidelines on ESOPs. In addition, in order that\nthese ESOPs remain tax neutral in the hands of the employees, the ESOPs should\nalso comply with the Guidelines issued by t he Central Government u/s. 17(2)(iii) of\nthe Income -tax Act, 1961. Hence, any ESOP by a listed company would be\nsubjected to both these legislations. In case of an unlisted public limited company,\nthe Unlisted Public Companies (Preferential Allotment) Rule s would apply along with',
metadata={'page': 1, 'source': 'data/PDFFile5b28ce3c2eb412.05300945.pdf'}),
Document(page_content='whether the company is a listed company or an unlisted company. Further, s. 77 of\nthe Act prohibits a public company from directly or indirectly providing any financial\nassistance for the purpose of or in connection with a purchase of any shares in such\ncompany or in its holding company. However, this prohibition does not apply if, in\naccordance with any scheme for the time being in force, the company has provided\na loan to a Trust for purchase of such shares to be held for the benefit of employees\nincluding working directors. Thus, if the company provides a loan to an ESOP Trust\nfor acqui ring shares under the ESOP and distributing them to the employees, then\ns.77 would not apply. S. 77 also permits the company to give a loan directly to the\nemployees for the purchase/ subscription of its shares. However, such loan is\nrestricted to a maximu m of 6 months’ salary of the employee. It may be noted that',
metadata={'page': 1, 'source': 'data/PDFFile5b28ce3c2eb412.05300945.pdf'}),
Document(page_content='Hand Book o n Capital Market Regulations 84\n16.2.4 Unlisted Company Rules\nThe Preferential Allotment Ru les, 2003 are applicable to a preferential issue u/s.\n81(1A) of the Companies Act by unlisted public companies. The Rules do not\nprovide an exemption for an ESOP issue and hence, they would need to be\ncomplied with by an unlisted public company instituting and ESOP. The main\nrequirements of these Rules are as follows:\n\uf0b7A Special Resolution of the shareholders is required\n\uf0b7If the issue consists of convertible warrants, then the price of the resultant\nshare s must be determined beforehand\n\uf0b7Several discl osures are required in the Explanatory Statement to the Notice,\nincluding the price or price band at which the allotment is proposed\n\uf0b7The relevant date on the basis of which the price has been arrived at must be\nstated\n\uf0b7The shareholding pattern pre and p ost issue must be stated\n\uf0b7The Special Resolution must be acted upon within 12 months',
metadata={'page': 4, 'source': 'data/PDFFile5b28ce3c2eb412.05300945.pdf'}),
Document(page_content='the Income -tax Guidelines. In case a private limited company issues an ESOP, the\nonly applicable legislation would be the Income -tax Guidelines. However, even in\ncase of unlisted private/ public companies, it is advisable that the ESOPs comply\nwith the SEBI Guidelines, because only then can the shares issued under the ESOP\nprior to the IPO be listed. Further, any outstanding options under the ESOP at the\ntime of the IPO would be allowed to remain outstanding only if the scheme is in\naccordance with the SEBI Guidelines. Post -IPO, the company can issue fresh\noptions under its pre -IPO ESOP only if the ESOP is in accordance with the SEBI\nGuidelines.\n16.2.2 Companies Act\nEvery public company, whether listed or unlisted, would need to pass a special\nresolution u/s. 81(1A) of the Act. The Explanatory Statement to the Notice calling the\nmeeting would require certain information which would vary depending upon',
metadata={'page': 1, 'source': 'data/PDFFile5b28ce3c2eb412.05300945.pdf'})]}

Query 3


response: {'query': 'What is the class aware auto encoder ?', 
'result': 'A Class Aware Autoencoder is a modified version of an autoencoder that incorporates class labels into the reference data during training. This ensures that the features learned are representations of individual data points as well as the corresponding class, leading to better feature extraction for training classifiers. The effectiveness of this method is measured by comparing the accuracy of classifiers trained on features extracted by the Class Aware Autoencoder with those trained on features extracted by traditional autoencoders.', 'source_documents': [Document(page_content='Fig. 7. First Row Shows original training data of CIFAR10, second row\nshows the same images encoded and then decoded using the Class Aware\nAutoencoders\nFig. 8. First Row Shows original training data of CIFAR10, second row shows\nthe same images encoded and then decoded using simple Autoencoders\nSimilarly, Figure 9 shows the output of the class aware\nautoencoder on the training images of the UTKFace dataset\nwhile Figure 10 shows the output of traditional autoencoders\non the same images.\nFig. 9. Original image padded with class indicator in top row with images\nafter encoding and decoding in the bottom row. Class Aware Autoencoder\nwas used\nFig. 10. Training images and images after applying a traditional auto encoder\nC. Training Classifier\nIn order to check the efficacy of the Class Aware Autoen-\ncoders, we use the features extracted to train a classifier. In thiscase, we train two similar classifiers – one with the features\nextracted by the Class Aware Autoencoders and the other',
metadata={'page': 3, 'source': 'data/Class_Aware_Auto_Encoders_for_Better_Feature_Extraction.pdf'}), Document(page_content='Fig. 7. First Row Shows original training data of CIFAR10, second row\nshows the same images encoded and then decoded using the Class Aware\nAutoencoders\nFig. 8. First Row Shows original training data of CIFAR10, second row shows\nthe same images encoded and then decoded using simple Autoencoders\nSimilarly, Figure 9 shows the output of the class aware\nautoencoder on the training images of the UTKFace dataset\nwhile Figure 10 shows the output of traditional autoencoders\non the same images.\nFig. 9. Original image padded with class indicator in top row with images\nafter encoding and decoding in the bottom row. Class Aware Autoencoder\nwas used\nFig. 10. Training images and images after applying a traditional auto encoder\nC. Training Classifier\nIn order to check the efficacy of the Class Aware Autoen-\ncoders, we use the features extracted to train a classifier. In thiscase, we train two similar classifiers – one with the features\nextracted by the Class Aware Autoencoders and the other', metadata={'page': 3, 'source': 'data/Class_Aware_Auto_Encoders_for_Better_Feature_Extraction.pdf'}),
Document(page_content='Class Aware Auto Encoders for Better Feature\nExtraction\nAshhadul Islam\nCollege Of Science & Engineering\nHamad Bin Khalifa University\nDoha, Qatar\naislam@hbku.edu.qaSamir Brahim Belhaouari\nCollege Of Science & Engineering\nHamad Bin Khalifa University\nDoha, Qatar\nsbelhaouari@hbku.edu.qa\nAbstract —In this work, a modified operation of Auto Encoder\nhas been proposed to generate better features from the input\ndata. General autoencoders work unsupervised and learn features\nusing the input data as a reference for output. In our method\nof training autoencoders, we include the class labels into the\nreference data so as to gear the learning of the autoencoder\ntowards the reference data as well as the specific class it belongs\nto. This ensures that the features learned are representations of\nindividual data points as well as the corresponding class. The\nefficacy of our method is measured by comparing the accuracy\nof classifiers trained on features extracted by our models from',
metadata={'page': 0, 'source': 'data/Class_Aware_Auto_Encoders_for_Better_Feature_Extraction.pdf'}), Document(page_content='Class Aware Auto Encoders for Better Feature\nExtraction\nAshhadul Islam\nCollege Of Science & Engineering\nHamad Bin Khalifa University\nDoha, Qatar\naislam@hbku.edu.qaSamir Brahim Belhaouari\nCollege Of Science & Engineering\nHamad Bin Khalifa University\nDoha, Qatar\nsbelhaouari@hbku.edu.qa\nAbstract —In this work, a modified operation of Auto Encoder\nhas been proposed to generate better features from the input\ndata. General autoencoders work unsupervised and learn features\nusing the input data as a reference for output. In our method\nof training autoencoders, we include the class labels into the\nreference data so as to gear the learning of the autoencoder\ntowards the reference data as well as the specific class it belongs\nto. This ensures that the features learned are representations of\nindividual data points as well as the corresponding class. The\nefficacy of our method is measured by comparing the accuracy\nof classifiers trained on features extracted by our models from', metadata={'page': 0, 'source': 'data/Class_Aware_Auto_Encoders_for_Better_Feature_Extraction.pdf'})]}

Conclusion

Here we have illustrated how to perform RAG in a fully local environment using Ollama and LangChain. The speed of inference depends on CPU processing capacity and the data load, but all of the above responses were generated within seconds, well under a minute.
