2026-04-14 · Stijn Servaes · 2 min read
What ships and what does not
Before starting any AI build, I run a short checklist. Nothing on it is surprising; the point is that skipping it is why most AI projects look strong in a demo and weak in production.
Is the task actually the task? Half the AI projects I have been asked to review were solving a slightly different problem than the one the business had. A readiness review catches this in two weeks.
Is there a ground-truth answer for most of the inputs? If nobody knows what the system is supposed to output, we cannot evaluate it, which means we cannot improve it, which means we are shipping vibes. I would rather spend a week building an evaluation set than skip it.
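A minimal sketch of what "an evaluation set" means in practice. Everything here is illustrative: `summarize` stands in for whatever the system under review actually does, and the metric is deliberately crude. The point is only that expected outputs exist at all, so a score can go up or down.

```python
def summarize(text: str) -> str:
    # Placeholder for the real system under evaluation.
    return text.split(".")[0] + "."

# A handful of (input, expected) pairs collected from the people who
# know what the right answer looks like. A week of this beats no baseline.
EVAL_SET = [
    ("The meeting is at noon. Bring slides.", "The meeting is at noon."),
    ("Revenue rose 4%. Costs were flat.", "Revenue rose 4%."),
]

def accuracy(cases) -> float:
    # Exact-match scoring: crude, but enough to detect regressions.
    hits = sum(1 for text, expected in cases if summarize(text) == expected)
    return hits / len(cases)

print(f"accuracy: {accuracy(EVAL_SET):.0%}")
```

Exact match is rarely the right metric for generative output, but even a crude score turns "shipping vibes" into something that can be tracked over time.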
Does failure have a floor? "The model gets it wrong sometimes" is fine for a search box and career-ending for a clinical summary. Name the floor before writing the first prompt.
Who maintains this after we ship? Most AI systems fail six months after launch because the model provider changed something and nobody was paying attention. If the answer is "nobody", we are not ready to build.
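"Paying attention" can be made concrete with a scheduled regression check: replay the evaluation set against the live model and fail loudly when the score drifts. This is a hedged sketch; the names (`run_eval`, `BASELINE`, `TOLERANCE`) and the hardcoded score are illustrative, not any particular tool's API.

```python
BASELINE = 0.90    # eval score recorded at launch
TOLERANCE = 0.05   # how far it may drift before someone must look

def run_eval() -> float:
    # Placeholder: in practice this replays the evaluation set
    # against the live model and returns the current score.
    return 0.82

def check_drift() -> bool:
    score = run_eval()
    drifted = score < BASELINE - TOLERANCE
    if drifted:
        print(f"ALERT: eval score {score:.2f} below baseline {BASELINE:.2f}")
    return drifted

check_drift()
```

Wire this to a cron job and an alert channel and "nobody was paying attention" becomes "somebody gets paged", which is the answer the checklist item is asking for.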
Does the business actually want the thing we are building? Sometimes the honest answer surfaces late in discovery. Better late than in production.
The point of the list is not that it is clever. It is that it takes twenty minutes and saves the engagement.