case study • ai tool

AI Handbook Generator

A Python AI tool that generates 20,000-word structured handbooks from uploaded PDFs through a conversational Gradio interface — using a custom RAG engine and the LongWriter technique.

Overview

Project Summary

AI Handbook Generator lets you upload PDF documents, ask questions about them, and generate 20,000+ word structured handbooks through a chat interface. I built it to work around LLM output limits: standard models cap out at a few thousand words per response, so I implemented the LongWriter technique, which breaks generation into sections and assembles them into a full document.

Stack

Stack & Architecture

  • Python — application layer
  • Gradio — chat UI with streaming output
  • Groq API — Llama 3.3 70B for generation
  • sentence-transformers — local text embeddings
  • numpy — cosine similarity vector search
  • pdfplumber / pypdf — PDF text extraction
  • Supabase — optional persistent storage
  • Hugging Face Spaces — deployment

Features

Key Features

  • PDF upload & indexing — extract and embed text from any PDF in seconds
  • RAG-grounded chat — answers pulled from uploaded documents, not hallucinated
  • 20,000-word handbook generation — triggered via chat, streamed live to the browser
  • LongWriter technique — Plan → Write per section → Assemble avoids output limits
  • Markdown export — download the finished handbook as a .md file
  • Local embeddings — no embedding API key required; runs sentence-transformers on-device

Challenges

Challenges & Learnings

  • LLM output limits — solved with LongWriter: split generation into 12–16 sections, each ~1,500 words, then assembled
  • Slow RAG indexing — original LightRAG approach made LLM calls during indexing (slow); replaced with local numpy cosine similarity search
  • Dependency conflicts — sentence-transformers 5.x broke PyTorch; pinned to <4
  • Gradio breaking changes — v6 removed several parameters and changed chat history format; updated all affected code
  • API pivots — switched from xAI to Groq after hitting credit limits; OpenAI-compatible SDK made this a one-line change
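The provider switch in the last bullet works because both services expose OpenAI-compatible endpoints, so only the base URL, API key, and model id differ. The following is a minimal sketch of that idea, not the project's actual code: the `PROVIDERS` table, the `client_settings` and `make_client` helpers, and the environment variable names are hypothetical, the xAI model id is left as a placeholder, and the base URLs and the Groq model id should be checked against each provider's current documentation.

```python
import os

# Hypothetical provider table. Base URLs are the OpenAI-compatible endpoints
# the providers document; the env var names are assumptions for this sketch.
PROVIDERS = {
    "xai": {
        "base_url": "https://api.x.ai/v1",
        "key_env": "XAI_API_KEY",
        "model": "<xai-model-id>",  # placeholder, fill in from the xAI docs
    },
    "groq": {
        "base_url": "https://api.groq.com/openai/v1",
        "key_env": "GROQ_API_KEY",
        "model": "llama-3.3-70b-versatile",
    },
}

ACTIVE_PROVIDER = "groq"  # the one line that changes when switching providers


def client_settings(provider: str = ACTIVE_PROVIDER) -> dict:
    """Return the base URL, API key, and model id for the chosen provider."""
    cfg = PROVIDERS[provider]
    return {
        "base_url": cfg["base_url"],
        # Hugging Face Spaces exposes secrets as environment variables.
        "api_key": os.environ.get(cfg["key_env"], ""),
        "model": cfg["model"],
    }


def make_client(provider: str = ACTIVE_PROVIDER):
    """Build an OpenAI-compatible client (requires `pip install openai`)."""
    from openai import OpenAI  # lazy import keeps client_settings dependency-free

    settings = client_settings(provider)
    client = OpenAI(base_url=settings["base_url"], api_key=settings["api_key"])
    return client, settings["model"]
```

Keeping the provider details in one table means the rest of the app only ever sees a client and a model id, which is what makes the swap cheap.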

Problem

Standard LLMs can't generate a full 20,000-word document in one API call: they hit output token limits and truncate. I wanted to build something that could take uploaded PDFs and produce a complete, structured handbook from them. The goal was not a summary but a full long-form document grounded in the source material.

Approach

I implemented the LongWriter / AgentWrite technique from the LongWriter research paper: first generate a full table of contents with word-count targets per section, then write each section in a separate API call using relevant RAG context, and finally assemble the sections into a single document. For the RAG layer, I built a custom vector search engine using sentence-transformers for local embeddings and numpy cosine similarity for retrieval, so no external vector database is required.
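The three stages described above can be sketched roughly as follows. This is an illustration under stated assumptions rather than the project's code: `llm` and `retrieve` are hypothetical callables standing in for the Groq call and the RAG lookup, and the prompt wording, the `Title | word count` outline format, and the Markdown headings are choices made for the example.

```python
from typing import Callable, List, Tuple

LLM = Callable[[str], str]              # prompt -> completion (hypothetical)
Retriever = Callable[[str], List[str]]  # query -> relevant chunks (hypothetical)
Outline = List[Tuple[str, int]]         # (section title, target word count)


def plan(topic: str, llm: LLM, n_sections: int = 14, total_words: int = 20_000) -> Outline:
    """Stage 1: ask for an outline and parse 'Title | word count' rows."""
    prompt = (
        f"Write a {n_sections}-section outline for a handbook on {topic}. "
        f"Target about {total_words} words in total. "
        "Return one line per section in the form: Title | word count"
    )
    outline: Outline = []
    for line in llm(prompt).splitlines():
        if "|" not in line:
            continue  # ignore anything that is not an outline row
        title, _, words = line.rpartition("|")
        digits = "".join(ch for ch in words if ch.isdigit())
        if title.strip() and digits:
            outline.append((title.strip(), int(digits)))
    return outline


def write_section(title: str, target: int, llm: LLM, retrieve: Retriever) -> str:
    """Stage 2: write one section in its own call, grounded in retrieved chunks."""
    context = "\n\n".join(retrieve(title))
    prompt = (
        f"Using only the context below, write the section '{title}' "
        f"in about {target} words.\n\nContext:\n{context}"
    )
    return f"## {title}\n\n{llm(prompt).strip()}"


def generate_handbook(topic: str, llm: LLM, retrieve: Retriever) -> str:
    """Stage 3: join the independently written sections into one document."""
    sections = [write_section(t, w, llm, retrieve) for t, w in plan(topic, llm)]
    return f"# {topic}\n\n" + "\n\n".join(sections)
```

Because each section is a separate call, no single response has to exceed the model's output limit; the total length comes from the number of sections rather than from one long completion.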

What I Built

Core

Custom RAG Engine

Built a vector similarity search engine from scratch using sentence-transformers for local embeddings and numpy for cosine similarity. Chunks are embedded and saved to disk on upload, so the index survives app restarts without any vector database.
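A minimal sketch of this kind of index, assuming the embeddings have already been computed (in the app they would come from a sentence-transformers model's `encode` call). The `VectorIndex` class, its method names, and the `.npz` file layout are hypothetical and chosen for the example; the source does not describe the actual storage format.

```python
from typing import List, Tuple

import numpy as np


class VectorIndex:
    """Hypothetical in-memory index: unit-normalised embeddings plus chunk texts."""

    def __init__(self, vectors: np.ndarray, chunks: List[str]):
        vectors = np.asarray(vectors, dtype=np.float32)
        norms = np.linalg.norm(vectors, axis=1, keepdims=True)
        # Normalise once at index time so search is a single matrix product.
        self.vectors = vectors / np.clip(norms, 1e-12, None)
        self.chunks = list(chunks)

    def search(self, query_vec: np.ndarray, k: int = 3) -> List[Tuple[str, float]]:
        """Return the k chunks most similar to the query, with their scores."""
        q = np.asarray(query_vec, dtype=np.float32)
        q = q / max(float(np.linalg.norm(q)), 1e-12)
        scores = self.vectors @ q  # dot product of unit vectors = cosine similarity
        top = np.argsort(-scores)[:k]
        return [(self.chunks[i], float(scores[i])) for i in top]

    def save(self, path: str) -> None:
        """Persist vectors and chunk texts; `path` should end in .npz."""
        np.savez(path, vectors=self.vectors, chunks=np.array(self.chunks, dtype=str))

    @classmethod
    def load(cls, path: str) -> "VectorIndex":
        """Rebuild the index from a file written by save()."""
        data = np.load(path)
        return cls(data["vectors"], data["chunks"].tolist())
```

A brute-force scan like this is linear in the number of chunks, which is usually fine for a handful of PDFs and avoids running a separate database.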

Core

LongWriter Generation

Implemented the Plan → Write → Assemble pipeline. The LLM first creates a 12–16 section outline with word targets, then writes each section individually using retrieved context, then all sections are joined into one document. Each generation step streams live to the UI.
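One way to get the live streaming described above is a Python generator that yields the document so far after every step, since Gradio treats a generator handler as a stream of successive outputs. The sketch below assumes section-level granularity and a hypothetical `write_section` callable; the real app may stream at a finer grain, such as token by token.

```python
from typing import Callable, Iterator, List, Tuple


def stream_handbook(
    outline: List[Tuple[str, int]],
    write_section: Callable[[str, int], str],
) -> Iterator[str]:
    """Yield the handbook-so-far before and after each section is written."""
    parts: List[str] = []
    total = len(outline)
    for i, (title, target) in enumerate(outline, start=1):
        # Status line first, so the UI shows progress while the section is generated.
        status = f"_Writing section {i} of {total}: {title}..._"
        yield "\n\n".join(parts + [status])
        parts.append(f"## {title}\n\n{write_section(title, target)}")
        # Then the document with the finished section appended.
        yield "\n\n".join(parts)
```

Each yielded value is the full text rather than a delta, so the UI can simply re-render the latest value and the final yield doubles as the finished document.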

UI

Gradio Chat Interface

Built a tabbed Gradio interface with PDF upload, a streaming chat for both Q&A and handbook generation, and a one-click markdown export. Deployed to Hugging Face Spaces with the Groq API key set as a Space secret.

Result

The app can generate a 20,000+ word structured handbook from uploaded PDFs in 5–15 minutes, streamed live. This project pushed me into Python backend work, RAG architecture, real dependency debugging, and practical LLM engineering — closer to production AI tooling than anything I'd built before.