OpenAI RAG Pipeline – Intelligent Document Q&A System

Author: Manibala Sinha
Tech Stack: Python, FastAPI, OpenAI API, FAISS / ChromaDB, LangChain, Docker


Overview

A Retrieval-Augmented Generation (RAG) pipeline built with FastAPI and the OpenAI API.

It enables intelligent question-answering over custom document corpora, such as PDFs, text manuals, or engineering files — simulating how field operators or engineers can query technical data in real time.

Designed with scalable microservices principles, the system can be containerized with Docker and deployed on Kubernetes / EKS or OpenShift.


Architecture

          ┌─────────────────────┐
          │  Document Loader    │  ← PDF, text, or well files
          └─────────┬───────────┘
                    │
                    ▼
          ┌─────────────────────┐
          │  Embedding Model    │  ← OpenAI / SentenceTransformer
          └─────────┬───────────┘
                    │
                    ▼
          ┌─────────────────────┐
          │  Vector Database    │  ← FAISS / ChromaDB / Milvus
          └─────────┬───────────┘
                    │
                    ▼
          ┌─────────────────────┐
          │   Retriever Layer   │
          └─────────┬───────────┘
                    │
                    ▼
          ┌─────────────────────┐
          │  OpenAI GPT (LLM)   │  ← Generates final contextual answer
          └─────────────────────┘
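The flow above can be sketched end to end in a few lines. The snippet below is a minimal illustration only: it uses a toy bag-of-words embedding and brute-force cosine similarity as stand-ins for the real embedding model and vector database, and stops at prompt assembly rather than calling the LLM. All names (`embed`, `retrieve`, `build_prompt`) are illustrative, not the repository's actual code.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; the real pipeline would call an
    # embedding model (OpenAI or SentenceTransformers) here.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    # Brute-force nearest-neighbour search; FAISS/Chroma replaces this
    # with an indexed (often approximate) search at scale.
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, context: list[str]) -> str:
    # The assembled prompt is what the retriever hands to the LLM,
    # which then generates the final contextual answer.
    joined = "\n".join(context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"

chunks = [
    "Pump maintenance requires isolating the pump and draining the casing.",
    "Safety manuals must be reviewed annually.",
    "Well files record drilling depth and casing specifications.",
]
context = retrieve("How do I perform pump maintenance?", chunks)
prompt = build_prompt("How do I perform pump maintenance?", context)
```

Swapping `embed` for a real embedding call and `retrieve` for a FAISS/Chroma lookup preserves this exact structure.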

Features

  • Document Ingestion – Upload or load domain documents (PDF, TXT, CSV).
  • Vector Store Indexing – Store embeddings using FAISS or Chroma for fast retrieval.
  • Contextual Q&A – Ask domain-specific questions and get concise, source-aware answers.
  • API Endpoint – Expose a /query endpoint via FastAPI for external integration.
  • Configurable Models – Easily switch between OpenAI, Hugging Face, or local models.
  • Scalable & Deployable – Dockerized for deployment on any cloud (AWS, GCP, Azure).


Tech Stack

| Component | Technology |
|-----------|------------|
| Language | Python 3.10+ |
| Framework | FastAPI |
| LLM Integration | OpenAI GPT models |
| Vector DB | FAISS / ChromaDB (can extend to Milvus / OpenSearch) |
| Embeddings | OpenAI Embeddings / Sentence Transformers |
| Containerization | Docker |
| Infrastructure (optional) | Kubernetes, EKS, Terraform |
| CI/CD (optional) | GitHub Actions, ArgoCD |

Quick Start

1️⃣ Clone Repository

```bash
git clone https://github.com/ManibalaSinha/OpenAI.git
cd OpenAI
git checkout feature_branch
```

2️⃣ Create Virtual Environment

```bash
python -m venv venv
source venv/bin/activate   # or venv\Scripts\activate on Windows
pip install -r requirements.txt
```

3️⃣ Add Environment Variables

Create a .env file:

```env
OPENAI_API_KEY=your_api_key_here
VECTOR_DB=chroma   # or faiss
```
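Projects like this usually load the .env file with python-dotenv's `load_dotenv()`. As an illustration of what that does, here is a stdlib-only equivalent (the `load_env` helper is a sketch, not part of this repository):

```python
import os

def load_env(path: str = ".env") -> dict[str, str]:
    # Minimal .env parser: KEY=VALUE lines; '#' comments and blanks skipped.
    # python-dotenv's load_dotenv() is the usual, more robust choice.
    values: dict[str, str] = {}
    with open(path) as fh:
        for line in fh:
            line = line.split("#", 1)[0].strip()  # drop inline comments
            if not line or "=" not in line:
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip()
    os.environ.update(values)  # make the values visible to the app
    return values
```

After loading, the app can read `os.environ["OPENAI_API_KEY"]` and pick the vector store from `os.environ["VECTOR_DB"]`.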

4️⃣ Run the Server

```bash
uvicorn main:app --reload
```

5️⃣ Query the API

```bash
curl -X POST "http://127.0.0.1:8000/query" \
     -H "Content-Type: application/json" \
     -d '{"question": "What is the procedure for pump maintenance?"}'
```
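The same request can be made from Python with only the standard library. This is an illustrative client, assuming the local dev server started above; `build_payload` and `query_rag` are hypothetical helper names:

```python
import json
from urllib import request

def build_payload(question: str) -> bytes:
    # Encode the question as the JSON body the /query endpoint expects.
    return json.dumps({"question": question}).encode("utf-8")

def query_rag(question: str, base_url: str = "http://127.0.0.1:8000") -> dict:
    # POST the question to /query and decode the JSON answer.
    req = request.Request(
        f"{base_url}/query",
        data=build_payload(question),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```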

Example Use Cases

  • Energy Operations: Ask about well files, safety manuals, or regulatory filings.
  • Industrial Applications: Query process documents or equipment SOPs.
  • Corporate Knowledge Base: Enable semantic Q&A across internal wikis or handbooks.

Deployment

Docker Build

```bash
docker build -t openai-rag-pipeline .
docker run -p 8000:8000 openai-rag-pipeline
```

Kubernetes Example

```bash
kubectl apply -f k8s/deployment.yaml
```

Future Enhancements

  • Integrate Milvus / OpenSearch for enterprise-scale retrieval
  • Add GPU inference optimization via VLLM or TensorRT
  • Incorporate RLHF / Agentic workflows for adaptive reasoning
  • Support Azure ML / AWS SageMaker deployments

👤 Author

Manibala Sinha 🔗 LinkedIn | GitHub | Blog
