Work public
industrial-doc-rag
RAG over industrial MOSFET datasheets. Part-number rerank, source-cited answers, ground-truth eval loop on Cloudflare Workers and Qdrant.
The problem
Industrial documentation is the opposite of marketing copy. A MOSFET datasheet runs forty pages of tables, characteristic curves, and parameters with cryptic names. Engineers do not read these documents end to end. They look up specific values for specific part numbers and need the source row attached to the answer.
Generic RAG fails on this content. Chunk strategies tuned for prose blow apart table rows. Embeddings tuned for natural language collapse part numbers that differ by one character. The user needs an answer with the cited row, not a confidently wrong paragraph.
How it works
Retrieval runs in two layers. A lexical pass scores part-number matches and document-section anchors. A semantic pass scores natural-language queries against embedded chunks. The rerank step lifts part-number hits above semantic neighbors because in this domain the literal token is the load-bearing signal.
Every answer ships with a citation to the source chunk. The eval loop runs a ground-truth set of part-number lookups and parameter queries against the deployed pipeline and reports recall at top-k.
Stack
Cloudflare Workers for the API. Qdrant for the vector layer. TypeScript end to end. The ground-truth eval is the deploy gate.