Quick Start
Install and launch OpenRAG with all its web interfaces in 5 minutes flat.
Prerequisites
- Docker Compose 2.26+ (included with modern Docker)
- 16 GB RAM minimum (32 GB recommended with GPU)
- 50 GB storage (Docker images + LLM model, 4.9 GB)
Important: The system requires 16 GB RAM minimum to run the llama3.1:8b LLM. See Detailed Requirements for more information.
Installation in 4 Steps
1. Clone the Repository
git clone https://github.com/3ntrop1a/openrag.git
cd openrag
2. Launch All Services
# Start all microservices
sudo docker-compose up -d
What does the stack look like? (docker-compose.yml overview)
services:
  # Infrastructure
  postgres:          # PostgreSQL 16 — document metadata & query history
  redis:             # Redis 7 — cache & task queue
  minio:             # MinIO — S3-compatible file storage (ports 9000/9001)
  qdrant:            # Qdrant — vector database (port 6333)
  ollama:            # Ollama — local LLM server (port 11434)

  # Application
  embedding:         # Sentence-transformer embedding service (port 8002)
  orchestrator:      # RAG pipeline orchestration (port 8001)
  api:               # FastAPI REST gateway (port 8000)
  frontend-nextjs:   # Next.js chat + admin panel (port 3000)

  # Monitoring (optional — started with --profile monitoring)
  prometheus:        # Metrics scraping (port 9090)
  grafana:           # Pre-configured dashboards (port 3002)
All services are connected to the openrag-network Docker bridge. Only the ports above are exposed to your host — everything else is internal.
First startup: Downloading Docker images and LLM model (4.9 GB). Allow 10-15 minutes depending on your connection.
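While the images download, you can poll the gateway instead of guessing when the stack is ready. The sketch below is a hypothetical helper (not part of OpenRAG), assuming the `/health` endpoint on port 8000 shown later in this guide:

```python
import json
import time
import urllib.error
import urllib.request

def is_ready(payload):
    """True once the gateway and every sub-service report 'healthy'."""
    services = payload.get("services", {})
    return payload.get("status") == "healthy" and all(
        state == "healthy" for state in services.values()
    )

def wait_for_api(url="http://localhost:8000/health", timeout=900, interval=10):
    """Poll the /health endpoint until is_ready() passes or the timeout expires."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                payload = json.load(resp)
            if is_ready(payload):
                return payload
        except (urllib.error.URLError, OSError):
            pass  # gateway not up yet; keep polling
        time.sleep(interval)
    raise TimeoutError(f"{url} not healthy after {timeout}s")
```

Run `wait_for_api()` after `docker-compose up -d`; it returns the health payload once everything is up, so you know it is safe to continue.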
3. Verify Everything is Started
# View the status of the services
sudo docker-compose ps
You should see 8 services with Up status:
NAME STATUS
openrag-api Up
openrag-orchestrator Up
openrag-embedding Up
openrag-postgres Up
openrag-redis Up
openrag-minio Up
openrag-qdrant Up
openrag-ollama Up
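If you want to script this check rather than eyeball it, you can diff the `docker-compose ps` text against the expected container names. A minimal sketch (the names are the eight listed above; `missing_or_down` is a hypothetical helper):

```python
EXPECTED = {
    "openrag-api", "openrag-orchestrator", "openrag-embedding",
    "openrag-postgres", "openrag-redis", "openrag-minio",
    "openrag-qdrant", "openrag-ollama",
}

def missing_or_down(ps_output):
    """Return expected containers that are absent, or present but not 'Up',
    in the text output of `docker-compose ps`."""
    up = set()
    for line in ps_output.splitlines():
        parts = line.split()
        # STATUS usually reads "Up" or "Up 5 minutes"; any "Up" token counts.
        if parts and parts[0] in EXPECTED and "Up" in parts[1:]:
            up.add(parts[0])
    return EXPECTED - up
```

Feed it the output of `sudo docker-compose ps` (e.g. via `subprocess.run`); an empty result means everything is started.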
4. Download the LLM Model
If you’re using Ollama (default configuration):
docker exec -it openrag-ollama ollama pull llama3.1:8b
Lightweight alternatives: llama3.2:3b (2 GB), gemma:2b (1.5 GB), phi3:mini (2.3 GB)
The llama3.1:8b model is a 4.9 GB download. Allow 5-10 minutes depending on your connection.
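You can confirm the pull succeeded without entering the container by asking Ollama's REST API, which lists pulled models at `/api/tags` on port 11434. A small sketch (`installed_models` and `has_model` are hypothetical helpers):

```python
import json
import urllib.request

def installed_models(base_url="http://localhost:11434"):
    """Return the model tags known to the Ollama server via GET /api/tags."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        payload = json.load(resp)
    return [m["name"] for m in payload.get("models", [])]

def has_model(names, wanted="llama3.1:8b"):
    """True if the wanted tag appears in the list from installed_models()."""
    return wanted in names
```

`has_model(installed_models())` returning `False` means the pull failed or is still running; re-run the `ollama pull` command above.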
Access the Web Interfaces
Open your browser and test the interfaces (ports as exposed by the compose stack above):
- Chat + admin panel (Next.js): http://localhost:3000
- REST API (FastAPI): http://localhost:8000
- MinIO console: http://localhost:9001
- Grafana dashboards (with the monitoring profile): http://localhost:3002
First Test
Option 1: Via Chat Interface (Recommended)
Ask a test question
In the chat, type: What is OpenRAG and how does it work?
Click “Send” or press Enter.
Observe the response
The system will:
Search in documents (100-200 ms)
Generate a response with the LLM (5-15 s after first load)
Display sources below with relevance scores
Important: The first query takes 70-90 seconds or more (loading LLM model into RAM — CPU mode is always slow).
Option 2: Via REST API (curl)
Check API health
curl http://localhost:8000/health | jq
Expected response:
{
  "status": "healthy",
  "timestamp": "2026-02-18T...",
  "version": "1.1.0",
  "services": {
    "database": "healthy",
    "redis": "healthy",
    "vector_store": "healthy",
    "orchestrator": "healthy"
  }
}
Do a simple search (without LLM)
curl -X POST http://localhost:8000/query \
-H "Content-Type: application/json" \
-d '{
"query": "configuration settings",
"collection_id": "default",
"max_results": 3,
"use_llm": false
}' | jq
Returns similar documents with relevance scores.
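If you post-process the search response in code, a helper that ranks hits by score is handy. This is a sketch only: the `results`, `filename`, and `score` field names are assumptions about the response shape, so adjust them to what your `/query` endpoint actually returns.

```python
def top_sources(response, threshold=0.0):
    """Extract (filename, score) pairs from a search response, best first.
    Field names ('results', 'filename', 'score') are assumed, not confirmed."""
    hits = [
        (r.get("filename", "?"), r.get("score", 0.0))
        for r in response.get("results", [])
    ]
    return sorted((h for h in hits if h[1] >= threshold), key=lambda h: -h[1])
```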
Make a query with LLM
curl -X POST http://localhost:8000/query \
-H "Content-Type: application/json" \
-d '{
"query": "What are the main features described in the documentation?",
"collection_id": "default",
"max_results": 5,
"use_llm": true
}' | jq -r '.answer'
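The same calls translate directly to Python if you prefer scripting over curl. A minimal stdlib sketch (the `build_payload` and `query` helpers are hypothetical, not part of OpenRAG), using the request fields shown above:

```python
import json
import urllib.request

def build_payload(text, use_llm=True, collection_id="default", max_results=5):
    """Assemble the JSON body expected by POST /query."""
    return {
        "query": text,
        "collection_id": collection_id,
        "max_results": max_results,
        "use_llm": use_llm,
    }

def query(text, url="http://localhost:8000/query", **kwargs):
    """POST a question to the gateway and return the parsed JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(build_payload(text, **kwargs)).encode(),
        headers={"Content-Type": "application/json"},
    )
    # Generous timeout: LLM answers can take 70-90 s in CPU-only mode.
    with urllib.request.urlopen(req, timeout=180) as resp:
        return json.load(resp)
```

For example, `query("What are the main features?", max_results=5)["answer"]` mirrors the curl call above.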
Each LLM query takes 70-90 seconds or more (CPU-only mode, llama3.1:8b).
Upload Your Own Documents
Via Admin Interface (Recommended)
Go to Upload
Click “Upload” in the sidebar
Select a PDF file
Click “Browse files”
Choose a PDF
Fill in metadata (optional)
Click “Upload”
Verify processing
Go to “Documents” section
Check status (processing → processed)
Allow 10-30 seconds per document depending on size
Via API
curl -X POST http://localhost:8000/documents/upload \
-F "file=@my_document.pdf" \
-F "collection_id=default" \
  -F 'metadata={"category": "guide", "source": "documentation"}'
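The shell quoting around the `metadata` field is easy to get wrong; building the JSON with `json.dumps` sidesteps escaping mistakes entirely. Below is a stdlib-only sketch of the same multipart upload (`multipart_form` is a hypothetical helper; the `requests` library would do this for you if installed):

```python
import json
import uuid

def multipart_form(fields, file_field, filename, file_bytes,
                   content_type="application/pdf"):
    """Hand-build a multipart/form-data body for the upload endpoint.
    Returns (body_bytes, content_type_header)."""
    boundary = uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines += [f"--{boundary}",
                  f'Content-Disposition: form-data; name="{name}"', "", value]
    lines += [f"--{boundary}",
              f'Content-Disposition: form-data; name="{file_field}"; '
              f'filename="{filename}"',
              f"Content-Type: {content_type}", ""]
    body = "\r\n".join(lines).encode() + b"\r\n" + file_bytes
    body += f"\r\n--{boundary}--\r\n".encode()
    return body, f"multipart/form-data; boundary={boundary}"

metadata = json.dumps({"category": "guide", "source": "documentation"})
body, ctype = multipart_form(
    {"collection_id": "default", "metadata": metadata},
    "file", "my_document.pdf", b"%PDF-1.4 ...",
)
```

POST `body` to `http://localhost:8000/documents/upload` with the `Content-Type` header set to `ctype` (e.g. via `urllib.request.Request`).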
MinIO Access (File Storage)
URL: http://localhost:9001
Credentials: admin / admin123456
Important: Change this password before any production deployment!
Useful Commands
View Logs in Real-Time
# All services
sudo docker-compose logs -f
# A specific service
sudo docker-compose logs -f orchestrator
sudo docker-compose logs -f ollama
Restart a Service
sudo docker-compose restart orchestrator
Stop Everything
sudo docker-compose down
Clean Completely (Including Data)
sudo docker-compose down -v # Also removes volumes
The -v option removes all volumes, including your documents and indexed data!
Next Steps
System Architecture Understand OpenRAG’s internal workings
Detailed Requirements GPU configuration, optimizations, production
Tests & Validation Load tests, performance, quality
API Reference Complete REST API documentation
Quick Troubleshooting
Services won’t start
# Check logs
sudo docker-compose logs -f
# Check disk space (minimum 50 GB)
df -h
# Check RAM (minimum 16 GB)
free -h
Ollama not responding
# Check if model is downloaded
docker exec -it openrag-ollama ollama list
# If absent, download it
docker exec -it openrag-ollama ollama pull llama3.1:8b
Queries very slow (>75s)
This is expected in CPU-only mode: each LLM response takes 70-90 seconds, and the first query also loads the model into RAM. For faster responses, pull a lighter model (e.g. gemma:2b) or enable GPU acceleration (see Detailed Requirements).
No results for queries
# Check if documents are processed
curl http://localhost:8000/documents | jq '.documents[] | {filename, status}'
# Status "processed" = ready
# Status "processing" = in progress (wait 10-30s)
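The jq filter above translates to a one-liner in Python if you want to gate a script on document readiness. A small sketch (`pending_documents` is a hypothetical helper), assuming the `documents` / `filename` / `status` fields shown by `GET /documents`:

```python
def pending_documents(listing):
    """Given the JSON from GET /documents, return filenames not yet processed."""
    return [d["filename"] for d in listing.get("documents", [])
            if d.get("status") != "processed"]
```

An empty return value means every document has status `processed` and is ready to query.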