Vision-language models can generate text from multimodal inputs, but their useful context windows are limited: documents often contain more information than can be processed in a single pass of a transformer. This leads to poor responses in the best case and hallucinated information in the worst. Retrieval-Augmented Generation (RAG) is a technique that equips a large language model with a large external knowledge base, feeding the model only the pieces relevant to each query. If you have tried to apply RAG to your own documents with GPT-4-Vision, you may have found it difficult with existing frameworks. Here, we show how to set it up on your documents using The Pipe and ChromaDB in about 40 lines of Python.
With RAG, documents are stored in a database indexed by their vector embeddings. Indexed this way, contextually relevant information can be supplied "on-the-fly" to the LLM, improving response quality for the user's exact query. This guide assumes you already understand what RAG is, how it works, and why it is useful. Here is a quick guide to implementing RAG for GPT-4-Vision on your documents using The Pipe and ChromaDB.
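The retrieval idea itself fits in a few lines. As a minimal sketch (not how ChromaDB works internally), the toy example below stands in a bag-of-words vector for a learned embedding model: each document is "embedded", and the document closest to the query by cosine similarity is the one handed to the LLM.

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy bag-of-words "embedding" for illustration only;
    # real RAG systems use learned embedding models.
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

documents = [
    "Turbulent flows exhibit chaotic velocity fluctuations.",
    "The recipe calls for two cups of flour.",
    "Vector databases store embeddings for fast similarity search.",
]
# "Index" each document by its embedding...
index = [(doc, embed(doc)) for doc in documents]

# ...then retrieve the closest document to the query.
query_vec = embed("tell me about turbulent flows")
best_doc, _ = max(index, key=lambda pair: cosine(query_vec, pair[1]))
print(best_doc)  # prints the turbulence sentence
```

A production vector database does the same thing at scale, with learned embeddings and approximate nearest-neighbor search instead of a brute-force `max` over every document.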
The following scripts are designed to be run independently. The first script adds new documents to the vector database, and the second script queries the database to retrieve relevant content and generates a response with GPT-4-Vision.
```python
from thepipe_api import thepipe
import chromadb
import json

def add_documents_to_collection(data_source, collection_name):
    # Initialize ChromaDB client
    chroma_client = chromadb.PersistentClient(path="/path/to/save/database")
    collection = chroma_client.get_or_create_collection(name=collection_name)
    # Prepare RAG-ready chunks from a data_source
    messages = thepipe.extract(data_source)
    chunks = thepipe.core.create_chunks_from_messages(messages)
    # Embed the text for each chunk, with the prompt message as metadata
    for i, (chunk, message) in enumerate(zip(chunks, messages)):
        if chunk.text:
            collection.add(
                ids=[data_source + str(i)],
                documents=[chunk.text],
                metadatas=[{"message": json.dumps(message)}]
            )

if __name__ == "__main__":
    data_source = "https://arxiv.org/pdf/0806.1525.pdf"
    collection_name = 'vectordb'
    add_documents_to_collection(data_source, collection_name)
```
```python
from openai import OpenAI
import chromadb
import json

def query_vector_db(collection_name, query):
    # Initialize ChromaDB client
    chroma_client = chromadb.PersistentClient(path="/path/to/save/database")
    collection = chroma_client.get_collection(name=collection_name)
    # Retrieve the metadata of the chunks most relevant to the user query
    retrieved_metadatas = collection.query(query_texts=[query], n_results=4)['metadatas'][0]
    retrieval_messages = [json.loads(md['message']) for md in retrieved_metadatas]
    # Prepare a prompt message for the user query in OpenAI format
    user_message = [{"role": "user", "content": [{"type": "text", "text": query}]}]
    # Generate a response from GPT-4-Vision using the retrieved context
    openai_client = OpenAI()
    response = openai_client.chat.completions.create(
        model="gpt-4-turbo",
        messages=retrieval_messages + user_message
    )
    print(response.choices[0].message.content)

if __name__ == "__main__":
    collection_name = 'vectordb'
    query = "What probability distributions do turbulent flows follow?"
    query_vector_db(collection_name, query)
```
For more details, see The Pipe documentation. Happy coding! 🚀