
PDF Pal (2024) — Chat with PDFs using RAG

A SaaS web app that lets users upload PDFs and ask questions in natural language using Retrieval-Augmented Generation (RAG) over document chunks.

Problem

PDFs are hard to search semantically, time-consuming to read, and don't support interactive Q&A grounded in the document.

Solution

A SaaS application where users: (1) Upload a PDF, (2) Ask questions in natural language, (3) Receive contextual answers grounded in the PDF using RAG.

Tech Stack

Frontend

Next.js 16 (App Router)
React 19
TypeScript
Tailwind CSS
Radix UI

Backend

Next.js API Routes
tRPC

Database

PostgreSQL + Prisma

Auth

Clerk

AI

OpenAI (GPT-3.5-turbo + embeddings)
LangChain
Pinecone

File storage

UploadThing

Payments

PayPal Subscriptions API

Core Data Flows

Upload & Processing Pipeline

  • UploadThing receives the PDF
  • onUploadComplete triggers: (1) Create file record in DB with PROCESSING status, (2) Extract text pages via PDFLoader, (3) Chunk via RecursiveCharacterTextSplitter, (4) Generate embeddings via OpenAI, (5) Store embeddings in Pinecone (namespace per file), (6) Update status to SUCCESS or FAILED
  • Each PDF uses its own Pinecone namespace for isolation
  • Overlapping chunks used for better retrieval context
  • Status tracking supports real-time UI feedback
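The overlapping-chunk step above can be sketched as a simplified splitter. This is not the production code (the real pipeline uses LangChain's RecursiveCharacterTextSplitter, which also respects separator boundaries); the sizes here are illustrative:

```typescript
// Simplified sketch of chunking with overlap. Each window steps forward by
// (chunkSize - overlap), so adjacent chunks share `overlap` characters and
// sentences spanning a boundary appear intact in at least one chunk.
function chunkWithOverlap(text: string, chunkSize = 1000, overlap = 200): string[] {
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // final chunk reached the end
    start += chunkSize - overlap; // step back by `overlap` to share context
  }
  return chunks;
}
```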

Chat / RAG Pipeline

  • User asks a question
  • POST /api/message: (1) Save user message to DB, (2) Embed question, (3) Similarity search Pinecone (top 4 chunks), (4) Build prompt including retrieved context + previous 6 messages (history) + current question, (5) Stream OpenAI response to client, (6) Save AI response to DB
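The prompt-assembly step (4) can be sketched as below. The function name, message shape, and prompt wording are illustrative assumptions, not the production template; only the inputs (top-4 chunks, previous 6 messages, current question) come from the pipeline description:

```typescript
interface Msg {
  isUserMessage: boolean;
  text: string;
}

// Sketch of step (4): combine retrieved chunks, recent chat history, and
// the current question into one grounded prompt.
function buildPrompt(chunks: string[], history: Msg[], question: string): string {
  const context = chunks.map((c, i) => `[${i + 1}] ${c}`).join("\n\n");
  const recent = history
    .slice(-6) // previous 6 messages, as in the pipeline above
    .map((m) => `${m.isUserMessage ? "User" : "Assistant"}: ${m.text}`)
    .join("\n");
  return [
    "Answer using only the context below. If the answer is not in the context, say so.",
    `CONTEXT:\n${context}`,
    `CONVERSATION:\n${recent}`,
    `QUESTION: ${question}`,
  ].join("\n\n");
}
```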

Database Design

Three core entities: User (email + PayPal subscription fields), File (uploadStatus + URL/key + relations), and Message (text + isUserMessage + relations). UploadStatus enum tracks the pipeline: PENDING, PROCESSING, SUCCESS, FAILED.
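A Prisma schema matching this description might look as follows; the enum values, `isUserMessage`, and the URL/key fields come from the text, while the remaining field names and defaults are assumptions, not the exact schema:

```prisma
enum UploadStatus {
  PENDING
  PROCESSING
  SUCCESS
  FAILED
}

model File {
  id           String       @id @default(cuid())
  name         String
  url          String
  key          String       // storage key from UploadThing
  uploadStatus UploadStatus @default(PENDING)
  userId       String       // owning User
  messages     Message[]
  createdAt    DateTime     @default(now())
}

model Message {
  id            String   @id @default(cuid())
  text          String
  isUserMessage Boolean
  fileId        String
  file          File     @relation(fields: [fileId], references: [id])
  createdAt     DateTime @default(now())
}
```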

Architecture Highlights

  • End-to-end type safety via tRPC
  • Vector isolation via Pinecone namespace per file
  • Streaming UX for responsiveness
  • Upload status polling with automatic UI updates
  • PostgreSQL for relational data + Pinecone for similarity search
  • Edge-ready Vercel deployment configuration
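The upload-status polling highlight can be sketched as a generic poll loop. In the app itself this is presumably handled by the tRPC query layer (e.g. React Query's `refetchInterval`); this standalone version just shows the control flow against the UploadStatus states:

```typescript
type Status = "PENDING" | "PROCESSING" | "SUCCESS" | "FAILED";

// Sketch of status polling: keep fetching until the ingestion pipeline
// reaches a terminal state, then let the UI react to the result.
async function pollStatus(
  fetchStatus: () => Promise<Status>,
  intervalMs = 500,
  maxAttempts = 60,
): Promise<Status> {
  for (let i = 0; i < maxAttempts; i++) {
    const status = await fetchStatus();
    if (status === "SUCCESS" || status === "FAILED") return status; // terminal
    await new Promise((r) => setTimeout(r, intervalMs));
  }
  throw new Error("Polling timed out");
}
```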

Key Features

PDF Viewer

  • Page navigation
  • Zoom (100%–300%)
  • Rotation
  • Fullscreen
  • Responsive split-pane layout

Chat

  • Streaming responses
  • Infinite scroll history
  • Markdown rendering
  • Optimistic updates
  • Context-aware answers

Subscription Tiers

  • Free: 5 files, 50 pages/PDF
  • Basic: $4.99/mo, 20 files, 100 pages/PDF
  • Standard: $9.99/mo, 50 files, 400 pages/PDF
  • Premium: $19.99/mo, 120 files, 1000 pages/PDF

Key Design Decisions

  • Used Pinecone namespaces per document instead of a shared index — eliminates cross-document leakage and simplifies deletion without index-wide operations
  • Chose tRPC over REST for the API layer — end-to-end type safety from database to UI eliminates an entire class of integration bugs
  • Implemented streaming responses from OpenAI rather than waiting for completion — improves perceived latency for long answers
  • Built a split-pane layout (chat + PDF viewer with zoom, rotate, fullscreen) — users need to verify AI answers against the source document
  • Used RecursiveCharacterTextSplitter with overlapping chunks over fixed-size chunking — maintains semantic coherence at chunk boundaries
  • Chose Clerk over custom auth — authentication is not a differentiator for this product
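The streaming decision can be illustrated with a minimal consumer. Streaming responses from the OpenAI SDK are async-iterable, so the pattern reduces to: render each token as it arrives, and persist the full message only after the stream closes. The function and callback names here are illustrative, not the production handler:

```typescript
// Sketch of the streaming pattern: progressive rendering plus
// persistence after completion. The token source stands in for an
// OpenAI streaming response.
async function consumeStream(
  tokens: AsyncIterable<string>,
  onToken: (t: string) => void,            // e.g. append to the chat UI
  onDone: (full: string) => Promise<void>, // e.g. save the AI message to the DB
): Promise<string> {
  let full = "";
  for await (const t of tokens) {
    full += t;
    onToken(t); // UI updates before the answer is complete
  }
  await onDone(full); // persistence only after the stream ends
  return full;
}
```

This is also where the tradeoff noted below shows up: if the stream fails midway, the handler must decide whether to persist a partial message or discard it.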

Tradeoffs

  • Per-document Pinecone namespaces limit cross-document querying but prevent the most common RAG failure mode: context bleed between unrelated documents
  • tRPC couples frontend and backend tightly, making it harder to expose a public API later, but the type safety benefits outweigh this for a SaaS product
  • Streaming adds complexity to error handling and message persistence, but the UX improvement justifies it
  • PayPal over Stripe limits payment method flexibility but provides broader international coverage for the target user base

Outcome

Production SaaS live at pdfpal.enkambale.com. Full ingestion-to-chat pipeline operating end-to-end with streaming AI responses, document isolation via vector namespaces, optimistic UI updates, infinite message scrolling, and subscription billing across four tiers.

Lessons Learned

  • RAG quality is determined by chunking and isolation strategy, not model selection
  • Streaming improves perceived UX quality significantly — the difference between responsive and broken is often just progressive rendering
  • Vector namespaces per document are non-negotiable for multi-tenant RAG — cross-document leakage destroys user trust
  • Shipping a full SaaS (auth, billing, file storage, AI, UX) reveals system constraints that no amount of prototyping can surface