
PDF Pal (2024) — Chat with PDFs using RAG

A SaaS web app that lets users upload PDFs and ask questions in natural language using Retrieval-Augmented Generation (RAG) over document chunks.

Problem

PDFs are hard to search semantically, time-consuming to read, and don't support interactive Q&A grounded in the document.

Solution

A SaaS application where users: (1) Upload a PDF, (2) Ask questions in natural language, (3) Receive contextual answers grounded in the PDF using RAG.

Tech Stack

Frontend

Next.js 16 (App Router)
React 19
TypeScript
Tailwind CSS
Radix UI

Backend

Next.js API Routes
tRPC

Database

PostgreSQL + Prisma

Auth

Clerk

AI

OpenAI (GPT-3.5-turbo + embeddings)
LangChain
Pinecone

File storage

UploadThing

Payments

PayPal Subscriptions API

Core Data Flows

Upload & Processing Pipeline

  • UploadThing receives the PDF
  • onUploadComplete triggers: (1) Create file record in DB with PROCESSING status, (2) Extract text pages via PDFLoader, (3) Chunk via RecursiveCharacterTextSplitter, (4) Generate embeddings via OpenAI, (5) Store embeddings in Pinecone (namespace per file), (6) Update status to SUCCESS or FAILED
  • Each PDF uses its own Pinecone namespace for isolation
  • Overlapping chunks used for better retrieval context
  • Status tracking supports real-time UI feedback
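The overlapping-chunk step above can be sketched as a simplified splitter. This is not the production code (the real pipeline uses LangChain's RecursiveCharacterTextSplitter, which also respects separator boundaries); the sizes here are illustrative:

```typescript
// Simplified sketch of chunking with overlap. Each window steps forward by
// (chunkSize - overlap), so adjacent chunks share `overlap` characters and
// sentences spanning a boundary appear intact in at least one chunk.
function chunkWithOverlap(text: string, chunkSize = 1000, overlap = 200): string[] {
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // final chunk reached the end
    start += chunkSize - overlap; // step back by `overlap` to share context
  }
  return chunks;
}
```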

Chat / RAG Pipeline

  • User asks a question
  • POST /api/message: (1) Save user message to DB, (2) Embed question, (3) Similarity search Pinecone (top 4 chunks), (4) Build prompt including retrieved context + previous 6 messages (history) + current question, (5) Stream OpenAI response to client, (6) Save AI response to DB
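The prompt-assembly step (4) can be sketched as below. The function name, message shape, and prompt wording are illustrative assumptions, not the production template; only the inputs (top-4 chunks, previous 6 messages, current question) come from the pipeline description:

```typescript
interface Msg {
  isUserMessage: boolean;
  text: string;
}

// Sketch of step (4): combine retrieved chunks, recent chat history, and
// the current question into one grounded prompt.
function buildPrompt(chunks: string[], history: Msg[], question: string): string {
  const context = chunks.map((c, i) => `[${i + 1}] ${c}`).join("\n\n");
  const recent = history
    .slice(-6) // previous 6 messages, as in the pipeline above
    .map((m) => `${m.isUserMessage ? "User" : "Assistant"}: ${m.text}`)
    .join("\n");
  return [
    "Answer using only the context below. If the answer is not in the context, say so.",
    `CONTEXT:\n${context}`,
    `CONVERSATION:\n${recent}`,
    `QUESTION: ${question}`,
  ].join("\n\n");
}
```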

Database Design

Three core entities: User (email + PayPal subscription fields), File (uploadStatus + URL/key + relations), and Message (text + isUserMessage + relations). UploadStatus enum tracks the pipeline: PENDING, PROCESSING, SUCCESS, FAILED.
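A Prisma schema matching this description might look as follows; the enum values, `isUserMessage`, and the URL/key fields come from the text, while the remaining field names and defaults are assumptions, not the exact schema:

```prisma
enum UploadStatus {
  PENDING
  PROCESSING
  SUCCESS
  FAILED
}

model File {
  id           String       @id @default(cuid())
  name         String
  url          String
  key          String       // storage key from UploadThing
  uploadStatus UploadStatus @default(PENDING)
  userId       String       // owning User
  messages     Message[]
  createdAt    DateTime     @default(now())
}

model Message {
  id            String   @id @default(cuid())
  text          String
  isUserMessage Boolean
  fileId        String
  file          File     @relation(fields: [fileId], references: [id])
  createdAt     DateTime @default(now())
}
```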

Architecture Highlights

  • End-to-end type safety via tRPC
  • Vector isolation via Pinecone namespace per file
  • Streaming UX for responsiveness
  • Upload status polling with automatic UI updates
  • PostgreSQL for relational data + Pinecone for similarity search
  • Edge-ready Vercel deployment configuration
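The upload-status polling highlight can be sketched as a generic poll loop. In the app itself this is presumably handled by the tRPC query layer (e.g. React Query's `refetchInterval`); this standalone version just shows the control flow against the UploadStatus states:

```typescript
type Status = "PENDING" | "PROCESSING" | "SUCCESS" | "FAILED";

// Sketch of status polling: keep fetching until the ingestion pipeline
// reaches a terminal state, then let the UI react to the result.
async function pollStatus(
  fetchStatus: () => Promise<Status>,
  intervalMs = 500,
  maxAttempts = 60,
): Promise<Status> {
  for (let i = 0; i < maxAttempts; i++) {
    const status = await fetchStatus();
    if (status === "SUCCESS" || status === "FAILED") return status; // terminal
    await new Promise((r) => setTimeout(r, intervalMs));
  }
  throw new Error("Polling timed out");
}
```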

Key Features

PDF Viewer

  • Page navigation
  • Zoom (100%–300%)
  • Rotation
  • Fullscreen
  • Responsive split-pane layout

Chat

  • Streaming responses
  • Infinite scroll history
  • Markdown rendering
  • Optimistic updates
  • Context-aware answers

Subscription Tiers

  • Free: 5 files, 50 pages/PDF
  • Basic: $4.99/mo, 20 files, 100 pages/PDF
  • Standard: $9.99/mo, 50 files, 400 pages/PDF
  • Premium: $19.99/mo, 120 files, 1000 pages/PDF

Key Design Decisions

  • Used Pinecone namespaces per document instead of a shared index — eliminates cross-document leakage and simplifies deletion without index-wide operations
  • Chose tRPC over REST for the API layer — end-to-end type safety from database to UI eliminates an entire class of integration bugs
  • Implemented streaming responses from OpenAI rather than waiting for completion — improves perceived latency for long answers
  • Built a split-pane layout (chat + PDF viewer with zoom, rotate, fullscreen) — users need to verify AI answers against the source document
  • Used RecursiveCharacterTextSplitter with overlapping chunks over fixed-size chunking — maintains semantic coherence at chunk boundaries
  • Chose Clerk over custom auth — authentication is not a differentiator for this product
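The streaming decision can be illustrated with a minimal consumer. Streaming responses from the OpenAI SDK are async-iterable, so the pattern reduces to: render each token as it arrives, and persist the full message only after the stream closes. The function and callback names here are illustrative, not the production handler:

```typescript
// Sketch of the streaming pattern: progressive rendering plus
// persistence after completion. The token source stands in for an
// OpenAI streaming response.
async function consumeStream(
  tokens: AsyncIterable<string>,
  onToken: (t: string) => void,            // e.g. append to the chat UI
  onDone: (full: string) => Promise<void>, // e.g. save the AI message to the DB
): Promise<string> {
  let full = "";
  for await (const t of tokens) {
    full += t;
    onToken(t); // UI updates before the answer is complete
  }
  await onDone(full); // persistence only after the stream ends
  return full;
}
```

This is also where the tradeoff noted below shows up: if the stream fails midway, the handler must decide whether to persist a partial message or discard it.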

Tradeoffs

  • Per-document Pinecone namespaces limit cross-document querying but prevent the most common RAG failure mode: context bleed between unrelated documents
  • tRPC couples frontend and backend tightly, making it harder to expose a public API later, but the type safety benefits outweigh this for a SaaS product
  • Streaming adds complexity to error handling and message persistence, but the UX improvement justifies it
  • PayPal over Stripe limits payment method flexibility but provides broader international coverage for the target user base

Outcome

Production SaaS live at pdfpal.enkambale.com. Full ingestion-to-chat pipeline operating end-to-end with streaming AI responses, document isolation via vector namespaces, optimistic UI updates, infinite message scrolling, and subscription billing across four tiers.

Lessons Learned

  • RAG quality is determined by chunking and isolation strategy, not model selection
  • Streaming improves perceived UX quality significantly — the difference between responsive and broken is often just progressive rendering
  • Vector namespaces per document are non-negotiable for multi-tenant RAG — cross-document leakage destroys user trust
  • Shipping a full SaaS (auth, billing, file storage, AI, UX) reveals system constraints that no amount of prototyping can surface