Process Query

Main endpoint for querying the RAG system. Performs a semantic search over your documents and generates a response grounded in the retrieved context.

Endpoint

POST /query

Request Body

  • query (string, required): The user's question or query
  • collection_id (string, optional): ID of the collection to query (default: all collections)
  • max_results (integer, default: 5): Maximum number of source documents to retrieve (1-20)
  • use_llm (boolean, default: true): Use the LLM to generate a response. If false, returns only the relevant sources.
  • metadata_filter (object, optional): Metadata filters to refine the search, e.g.:
{
  "document_type": "pdf",
  "category": "finance"
}
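
The same parameters combined in a Python request, as a sketch; the endpoint host matches the examples below, but the collection ID and filter values are placeholders, not real identifiers:

import requests

# Placeholder values for illustration; substitute your own collection and filters
payload = {
    "query": "What is the refund policy?",
    "collection_id": "col-finance",   # omit to search all collections
    "max_results": 5,
    "use_llm": True,
    "metadata_filter": {"document_type": "pdf", "category": "finance"},
}

response = requests.post("http://localhost:8000/query", json=payload, timeout=60)
response.raise_for_status()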

Response

  • query_id (string): Unique query identifier
  • answer (string): Response generated by the LLM (null if use_llm=false)
  • sources (array): List of the source documents used
  • execution_time_ms (integer): Execution time in milliseconds
  • timestamp (string): Query timestamp in ISO 8601 format

Examples

curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What is the refund policy?",
    "max_results": 3,
    "use_llm": true
  }'
{
  "query_id": "123e4567-e89b-12d3-a456-426614174000",
  "answer": "According to our documents, the refund policy allows returns within 30 days for all unused products. Refunds are processed within 7 business days to the original payment method. Return shipping costs are the customer's responsibility except for defective products.",
  "sources": [
    {
      "document_id": "doc-123",
      "filename": "refund_policy.pdf",
      "chunk_index": 2,
      "relevance_score": 0.94
    },
    {
      "document_id": "doc-456",
      "filename": "terms_conditions.pdf",
      "chunk_index": 8,
      "relevance_score": 0.87
    },
    {
      "document_id": "doc-123",
      "filename": "refund_policy.pdf",
      "chunk_index": 3,
      "relevance_score": 0.82
    }
  ],
  "execution_time_ms": 1234,
  "timestamp": "2024-02-17T10:30:45.123Z"
}
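
A minimal sketch of consuming this response in Python, using only the fields documented above:

import requests

result = requests.post(
    "http://localhost:8000/query",
    json={"query": "What is the refund policy?", "max_results": 3, "use_llm": True},
    timeout=60,
).json()

# answer is null when use_llm=false, so guard before printing
if result["answer"]:
    print(result["answer"])

# Each source identifies the document chunk that supported the answer
for src in result["sources"]:
    print(f'{src["filename"]} (chunk {src["chunk_index"]}): {src["relevance_score"]:.2f}')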

Error Codes

  • 400 Bad Request: Invalid request (missing or incorrect parameters)
  • 500 Internal Server Error: Server error (LLM unavailable, processing error)
  • 504 Gateway Timeout: Request timed out (>60 seconds)

Best Practices

  • Adjust max_results to your needs (fewer results = faster responses)
  • Use use_llm=false for pure search without generation (see the search-only sketch below)
  • Add metadata filters to narrow the search
  • Formulate clear, precise questions
  • Use terms specific to your domain
  • Increase max_results (5-10) when the answer needs more context

For example, a query filtered by metadata:

{
  "query": "What is the procedure?",
  "metadata_filter": {
    "department": "HR",
    "year": "2024"
  }
}
Handle timeouts and HTTP errors on the client side:

import requests

url = "http://localhost:8000/query"
data = {"query": "What is the refund policy?", "max_results": 5}

try:
    # Match the server's 60-second limit so the client gives up at the same point
    response = requests.post(url, json=data, timeout=60)
    response.raise_for_status()
    result = response.json()
except requests.Timeout:
    print("The request took too long")
except requests.HTTPError as e:
    print(f"HTTP Error: {e}")

Limitations

  • Timeout: 60 seconds maximum per query
  • Query length: 1000 characters maximum
  • LLM context: Limited by model’s context window (~2048-4096 tokens)
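
A sketch of enforcing these limits client-side before sending a request; the 1000-character cap and 60-second ceiling are the documented values, the helper name is illustrative:

import requests

MAX_QUERY_CHARS = 1000  # documented query-length limit

def safe_query(query: str) -> dict:
    if len(query) > MAX_QUERY_CHARS:
        raise ValueError(f"Query exceeds {MAX_QUERY_CHARS} characters")
    # Use the server's 60-second maximum as the client timeout as well
    resp = requests.post("http://localhost:8000/query", json={"query": query}, timeout=60)
    resp.raise_for_status()
    return resp.json()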

Technical Notes

Search Process

  1. Query embedding: Conversion to vector (384 dimensions)
  2. Vector search: K-nearest neighbors search in Qdrant (cosine similarity)
  3. Filtering: Application of metadata filters if provided
  4. Relevance threshold: Only results with score > 0.7 are kept
  5. Content retrieval: Getting full text of chunks
  6. LLM generation: Prompt construction and response generation
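
For orientation, a hedged sketch of steps 1-4 using sentence-transformers and qdrant-client; the embedding model and collection name are assumptions, since the document only specifies 384 dimensions, cosine similarity in Qdrant, and the 0.7 threshold:

from qdrant_client import QdrantClient
from qdrant_client.models import FieldCondition, Filter, MatchValue
from sentence_transformers import SentenceTransformer

# Assumption: all-MiniLM-L6-v2 is one model that produces the documented
# 384-dimension vectors; the service may use a different one.
model = SentenceTransformer("all-MiniLM-L6-v2")
client = QdrantClient("localhost", port=6333)

# Step 1: embed the query into a 384-dimension vector
query_vector = model.encode("What is the refund policy?").tolist()

# Steps 2-4: cosine k-NN search, optional metadata filter, 0.7 score threshold
hits = client.search(
    collection_name="documents",          # assumed collection name
    query_vector=query_vector,
    query_filter=Filter(must=[
        FieldCondition(key="category", match=MatchValue(value="finance")),
    ]),
    limit=5,                              # max_results
    score_threshold=0.7,                  # documented relevance threshold
)
for hit in hits:
    print(hit.id, hit.score)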

LLM Prompt Format

Provided context:
Document 1:
[Most relevant chunk content]

Document 2:
[2nd chunk content]

...

Question: [Your question]

Answer the question based ONLY on the context provided above.
If the context does not contain enough information to answer, say so clearly.
Cite the document numbers you use in your answer.
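
A sketch of assembling that prompt in Python; the chunk texts and question are placeholders, and chunks are assumed pre-sorted by relevance, most relevant first:

def build_prompt(chunks: list[str], question: str) -> str:
    # Number the chunks so the LLM can cite "Document N" in its answer
    context = "\n\n".join(
        f"Document {i}:\n{text}" for i, text in enumerate(chunks, start=1)
    )
    return (
        "Provided context:\n"
        f"{context}\n\n"
        f"Question: {question}\n\n"
        "Answer the question based ONLY on the context provided above.\n"
        "If the context does not contain enough information to answer, say so clearly.\n"
        "Cite the document numbers you use in your answer."
    )

print(build_prompt(["[Most relevant chunk]", "[2nd chunk]"], "What is the refund policy?"))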
