
Welcome to OpenRAG 🚀

OpenRAG is a complete, modular, and production-ready RAG (Retrieval-Augmented Generation) solution. It allows you to query your documents using advanced language models with precise and relevant context.

What is a RAG system?

A RAG system combines information retrieval with text generation to provide accurate answers based on your own documents:
  1. Retrieval: Finds relevant passages in your document base
  2. Augmentation: Enriches the query with the retrieved context
  3. Generation: Generates a coherent response with an LLM
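
In practice, these three steps are usually exposed as a single request to the REST API: the client sends a question, the orchestrator retrieves matching passages from Qdrant, builds an augmented prompt, and returns the generated answer. A minimal sketch, assuming a hypothetical /query route on the api service (the exact route and JSON fields are not documented on this page):

# Hypothetical request; adjust the route and fields to the actual API
curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -d '{"question": "What is our refund policy?"}'
# The response would typically contain the generated answer plus the source passages used as context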

[Diagram: RAG Workflow]

Main Features

  • Document Upload: PDF, DOCX, TXT, Markdown (automatic processing)
  • Semantic Search: Advanced vector search with Qdrant
  • Answer Generation: Ollama, OpenAI, Anthropic Claude
  • Modular Architecture: Decoupled microservices with Docker

Main Components

OpenRAG consists of 10 Docker services:
Service        | Port      | Role
frontend-user  | 8501      | User chat interface (Streamlit)
frontend-admin | 8502      | Admin panel, upload, stats (Streamlit)
api            | 8000      | REST API (FastAPI)
orchestrator   | 8001      | RAG workflow coordination
embedding      | 8002      | Embeddings generation (sentence-transformers)
ollama         | 11434     | Local LLM server
qdrant         | 6333      | Vector database
postgres       | 5432      | Metadata and history
redis          | 6379      | Cache and queues
minio          | 9000/9001 | File storage (S3-compatible)
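
Once the stack is running, several of these services can be probed directly on the ports above. The Ollama, Qdrant, and MinIO routes below are those services' standard REST endpoints; the interactive API docs URL assumes FastAPI's default /docs path, which this page does not confirm.

# Ollama: list locally available models
curl http://localhost:11434/api/tags
# Qdrant: list vector collections
curl http://localhost:6333/collections
# MinIO: liveness check
curl http://localhost:9000/minio/health/live
# OpenRAG REST API: interactive docs (FastAPI default path, if unchanged)
# -> open http://localhost:8000/docs in a browser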

Use Cases

  • Create an AI assistant that knows all your internal documents, procedures, and company policies.
  • Automatically answer questions based on your product documentation and FAQ.
  • Explore and synthesize large collections of scientific research papers.

Quick Start

1. Clone and Launch

git clone https://github.com/3ntrop1a/openrag.git
cd openrag
sudo docker-compose up -d
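
To confirm that all ten containers came up before moving on, list their states with the same docker-compose binary used above:

sudo docker-compose ps
# every service (api, orchestrator, qdrant, ollama, ...) should be listed as running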
2. Download LLM Model

docker exec -it openrag-ollama ollama pull llama3.1:8b
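
To check that the pull completed, list the models now available inside the Ollama container:

docker exec -it openrag-ollama ollama list
# llama3.1:8b should appear in the output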
3. Open Chat Interface

Navigate to http://localhost:8501 and ask your first question!
First time? Initial startup takes 10-15 minutes (downloading Docker images + 4.9 GB LLM model)

Next Steps

Technical Characteristics

Performance
  • With GPU: 1-3s per query
  • Without GPU: 5-15s per query (after warm-up)
  • Vector search: 100-200ms
  • Indexing: 10-30s per PDF document

Scalability
  • Horizontally scalable microservices architecture
  • Support for millions of documents
  • Redis for distributed cache
  • PostgreSQL for metadata
  • Qdrant for high-performance vector search

Privacy
  • Data stored locally (no third-party cloud)
  • Docker service isolation
  • Support for local LLMs (Ollama) for total privacy
  • Compatible with cloud LLMs (OpenAI, Claude) if desired

Flexibility
  • Easy integration of new document formats
  • Multi-LLM support (Ollama, OpenAI, Anthropic)
  • REST API for integration into your applications (see the sketch after this list)
  • Multiple collections for document segmentation
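
As a sketch of that REST integration, the request below uploads a document for indexing. The /documents route and the form field name are assumptions made for illustration; check the actual API schema (for example via the FastAPI docs page) before relying on them.

# Hypothetical upload route; adjust to the real OpenRAG API
curl -X POST http://localhost:8000/documents \
  -F "file=@employee_handbook.pdf"
# Once indexing finishes (typically 10-30s per PDF), the document is searchable from the chat interface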

Support and Contributions