CASE STUDY
Novique Health
Private clinical LLM pipeline, evaluation-gated before launch

THE SITUATION
A thousand pages of internal guidelines, manually searched
Novique Health's clinical team had accumulated internal protocols, care pathways, and decision frameworks across a decade of practice. All of it existed as PDFs and internal documents. When a clinician needed to reference a specific guideline mid-shift, the answer was a keyword search across a network drive.
Off-the-shelf AI assistants were not an option: the data could not leave their environment, and the answers had to be defensible.
BEFORE
A decade of clinical guidelines, searched by keyword across a network drive.
AFTER
Grounded, evaluated answers on their own infrastructure, with the source passage on every reply.
WHAT I BUILT
A private retrieval pipeline, evaluated before it shipped
I built a retrieval-augmented system over the guideline corpus, deployed behind a private API. Hybrid retrieval, re-ranking tuned on a curated evaluation set, and a prompt pipeline that refuses to answer when retrieval does not reach a confidence threshold.
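The confidence gate can be sketched in a few lines. This is a minimal illustration, not the production code: the score blend, threshold, and names (`hybrid_score`, `answer_or_refuse`) are assumptions for the sketch, and the real system runs retrieval against pgvector.

```python
# Sketch of the confidence-gated retrieval step. Scores, weights, and the
# threshold value are illustrative, not the tuned production values.

REFUSAL = "No sufficiently supported passage found. Please consult the source guideline directly."

def hybrid_score(vector_score: float, keyword_score: float, alpha: float = 0.7) -> float:
    """Blend dense (vector) and sparse (keyword) relevance into one score.

    alpha is a tunable weight; here it favors the vector score.
    """
    return alpha * vector_score + (1 - alpha) * keyword_score

def answer_or_refuse(ranked_passages: list[tuple[str, float]], threshold: float = 0.55) -> dict:
    """Return grounding passages for the prompt pipeline, or refuse outright.

    ranked_passages: (passage, blended_score) pairs, sorted best-first.
    If the top score is below the threshold, the system refuses rather
    than letting the model answer without support.
    """
    if not ranked_passages or ranked_passages[0][1] < threshold:
        return {"refused": True, "message": REFUSAL, "sources": []}
    return {
        "refused": False,
        "sources": [passage for passage, score in ranked_passages if score >= threshold],
    }
```

The key design choice is that the gate sits before the model: a weak retrieval result never reaches the prompt, so there is nothing for the model to confabulate around.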
Before the clinical team used it, I wrote an eval suite against questions their own lead clinician had asked in the past year. I did not ship until recall on those questions was where it needed to be.
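The release gate itself reduces to a recall@k check over that curated set. A minimal sketch, assuming each question has one known gold passage; the function name and data shapes are illustrative:

```python
# Sketch of the per-release eval gate: what fraction of the clinical lead's
# questions retrieve their gold passage in the top k results.

def recall_at_k(gold: dict[str, str], retrieved: dict[str, list[str]], k: int = 5) -> float:
    """gold maps question -> gold passage id; retrieved maps question -> ranked ids."""
    hits = sum(
        1 for question, gold_id in gold.items()
        if gold_id in retrieved.get(question, [])[:k]
    )
    return hits / len(gold)

def release_gate(gold: dict[str, str], retrieved: dict[str, list[str]],
                 k: int = 5, minimum: float = 0.9) -> bool:
    """A release ships only if recall@k on the curated set clears the bar."""
    return recall_at_k(gold, retrieved, k) >= minimum
```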
Private deployment
Models hosted in their environment. Nothing about a patient question leaves their network.
Evaluations before launch
Curated question set from the clinical lead, recall tracked per release. No vibes-based rollouts.
Grounded answers only
Refusal when retrieval is weak. Clinicians see the source passages behind every answer.
THE OUTCOMES
Shipped, measured, trusted
100%
of inference stays on their infrastructure
Evaluated
releases gated on a curated clinical question set, not vibes
Live
with the clinical team during active shifts
STACK
- Python
- FastAPI
- pgvector
- Anthropic
- Railway
- Evals
