How to Pick Your First AI Automation Project (So It Actually Ships)

Most companies don't fail at AI because the technology isn't ready. They fail because the first project was wrong: too broad, too risky, or aimed at a problem nobody measured. The pilot stalls, the demo gathers dust, and "we tried AI" becomes the reason not to try again.

After shipping automations for teams in insurance collections, construction, media, and recruiting, we've seen a pattern in the projects that make it to production. This post is that pattern, written down.

The three filters

Run every candidate project through these in order. A project that fails any one of them is a second project, not a first one.

1. The work is repetitive and already documented

If a human does the task today by following steps they could explain to a new hire in an afternoon, it's a candidate. If the task requires judgment that even your best people disagree about, it isn't — not yet.

Good first targets we've seen work:

Meeting-to-task capture. Calls happen, action items get lost. An automation that turns recordings into assigned tasks with due dates is narrow, measurable, and immediately felt. We built exactly this for NSB, a collections firm, and it stuck because the before/after was visible in the first week.
Inbox triage and routing. Classify incoming email or form submissions, extract the structured fields, route to the right queue. Boring. Extremely valuable.
Document intake. Invoices, policies, jobsite photos, resumes — anything that arrives messy and needs to become a clean record.
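The triage-and-routing target above can be sketched in a few lines. This is a minimal illustration, not a production design: the queue names, the keyword rules, the `INV-` number format, and the `classify` and `route` functions are all hypothetical. In practice the classifier would likely be a model call; a keyword lookup stands in for it here so the sketch runs on its own.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical queues and keywords; a real system would call a model here.
QUEUE_KEYWORDS = {
    "billing": ("invoice", "payment", "refund"),
    "claims": ("claim", "policy"),
}
DEFAULT_QUEUE = "human_review"  # anything unrecognized goes to a person


@dataclass
class RoutedMessage:
    queue: str
    invoice_number: Optional[str]  # one example of an extracted field


def classify(text: str) -> str:
    """Stand-in classifier: first queue whose keyword appears in the text."""
    lowered = text.lower()
    for queue, keywords in QUEUE_KEYWORDS.items():
        if any(word in lowered for word in keywords):
            return queue
    return DEFAULT_QUEUE


def route(text: str) -> RoutedMessage:
    """Classify a message and pull out one structured field."""
    match = re.search(r"\bINV-(\d+)\b", text)  # assumed invoice format
    return RoutedMessage(
        queue=classify(text),
        invoice_number=match.group(1) if match else None,
    )
```

Note the default: a message the classifier does not recognize lands in a human queue rather than being forced into the nearest category.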

2. You can measure the baseline this week

"Save time" is not a metric. "Intake currently takes 22 minutes per document and we process 340 a week" is a metric. If you can't state the current cost in numbers within a few days, you won't be able to prove the automation worked — and unproven automations get cut at the next budget review.

Pick something where the baseline is either already in a dashboard or trivially countable. Tickets closed. Documents processed. Hours logged. Response time.
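To make the intake example concrete, here is the arithmetic as a small sketch. The function name is ours; only the 22 minutes and the 340 documents come from the example above. It converts a per-item time and a weekly volume into hours per week, which is a number a budget owner can react to.

```python
def weekly_hours(minutes_per_item: float, items_per_week: int) -> float:
    """Total hours per week spent on a task at the current baseline."""
    return minutes_per_item * items_per_week / 60


baseline = weekly_hours(22, 340)
print(f"{baseline:.1f} hours per week")  # 22 * 340 / 60 = 124.7
```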

3. A wrong answer is cheap

Every AI system will sometimes be wrong. The question is what a wrong answer costs and who catches it.

| Failure cost | Example | First project? |
| --- | --- | --- |
| A human reviews it anyway | Drafted email reply, suggested task list | Yes |
| Caught downstream within hours | Misrouted ticket, wrong tag | Yes, with monitoring |
| Reaches a customer unverified | Auto-sent pricing, contract terms | No |
| Regulatory or financial exposure | Claims decisions, compliance sign-off | No |

Design the first project so the model drafts and a human approves. You'll graduate to full automation once the error rate is measured, not assumed.
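One way to encode the failure-cost tiers and the draft-and-approve rule is a small gate that decides whether an output may be applied without a person seeing it. This is a sketch under assumptions: the tier names, the `needs_human_approval` function, and the 2% threshold are illustrative and not taken from any particular system. Keeping the two expensive tiers permanently behind a reviewer is also our assumption for the sketch.

```python
from enum import Enum
from typing import Optional


class FailureCost(Enum):
    HUMAN_REVIEWS_ANYWAY = 1
    CAUGHT_DOWNSTREAM = 2
    REACHES_CUSTOMER = 3
    REGULATORY_OR_FINANCIAL = 4


# Illustrative threshold: the measured error rate at or below which
# low-cost outputs may skip review. Pick your own from real data.
MAX_ERROR_RATE = 0.02


def needs_human_approval(
    cost: FailureCost, measured_error_rate: Optional[float]
) -> bool:
    """Return True when a person must approve the model's draft."""
    if measured_error_rate is None:
        return True  # nothing measured yet, so nothing is assumed
    if cost in (FailureCost.REACHES_CUSTOMER, FailureCost.REGULATORY_OR_FINANCIAL):
        return True  # expensive mistakes stay behind a reviewer
    return measured_error_rate > MAX_ERROR_RATE
```

With no measured error rate the gate always asks for approval, so the workflow starts as draft-and-approve by default.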

Scope it like a feature, not a transformation

The failure mode after picking the right problem is scoping it like a platform. You don't need an "AI strategy layer." You need one workflow, instrumented end to end:

One input source. Not "all our documents" — the intake inbox.
One output. A structured record, a draft reply, an assigned task.
One owner. A person who looks at the numbers weekly and can say "keep it" or "kill it."
A four-to-six-week window. If the first version can't ship inside that, the scope is wrong, not the timeline.

This is also where data quality stops being abstract. The automation is only as good as what it can see: if the source data lives in screenshots and tribal knowledge, fixing that is the first project. (We wrote more about that in "AI as an assistant.")

What "shipped" looks like

A first automation is done when:

It runs on real volume, not curated samples.
Someone non-technical can see its accuracy and throughput on a dashboard.
There's a documented way to correct it when it's wrong, and corrections feed back into the system.
The baseline metric moved, and you can say by how much.

That last point is what buys the second project. "Intake time dropped from 22 minutes to 4 per document" is an argument no slide deck can make.
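The accuracy and throughput figures on that dashboard can come straight from the review step. A minimal sketch, assuming each reviewed item is logged with a hypothetical `corrected` flag that is True when the reviewer had to change the draft. The share of drafts accepted unchanged is only a proxy for accuracy, but it is one a non-technical owner can read at a glance.

```python
from dataclasses import dataclass


@dataclass
class Review:
    item_id: str
    corrected: bool  # True if the reviewer changed the model's draft


def summarize(reviews: list[Review]) -> dict[str, float]:
    """Items processed and the share of drafts accepted without changes."""
    total = len(reviews)
    if total == 0:
        return {"processed": 0, "accuracy": 0.0}
    accepted = sum(1 for r in reviews if not r.corrected)
    return {"processed": total, "accuracy": accepted / total}
```

The corrected items are also the raw material for the feedback loop: each one is a labeled example of what the system got wrong.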

Where teams get stuck

Starting with a chatbot. A general-purpose internal chatbot is the most requested and least measurable first project. There's no baseline, no single owner, and no definition of done. Build it third, not first.

Automating the exception instead of the rule. Teams gravitate toward the hairy edge cases because they're memorable. Automate the boring 80% and keep routing the other 20% to humans, as before.

Skipping the human-review phase. Going straight to full automation means your first measured error happens in front of a customer. Draft-and-approve costs a little speed and saves the program.

A 30-minute exercise to find your project

List every task your team did more than twenty times last week. Score each from 1 to 5 on three things: documented steps, measurable baseline, and cost of a wrong answer (inverted, so a cheap mistake scores 5). Multiply the three scores. The top score is usually obvious in hindsight: it's the task everyone complains about and nobody has time to fix.
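If the list is long, the scoring fits in a short script. The sketch below assumes the three scores are multiplied as described; the class name, field names, and the sample tasks with their scores are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    documented: int      # 1-5: how well the steps are written down
    measurable: int      # 1-5: how easy the baseline is to count
    cheap_mistakes: int  # 1-5: inverted cost, 5 = a wrong answer is cheap

    def score(self) -> int:
        return self.documented * self.measurable * self.cheap_mistakes


def rank(candidates: list[Candidate]) -> list[Candidate]:
    """Sort candidates with the highest product first."""
    return sorted(candidates, key=lambda c: c.score(), reverse=True)


# Made-up scores for illustration only.
tasks = [
    Candidate("invoice intake", 5, 5, 4),
    Candidate("pricing emails", 4, 3, 1),
    Candidate("internal chatbot", 2, 1, 3),
]
for task in rank(tasks):
    print(task.score(), task.name)
```

Multiplying rather than adding is what makes this work: a 1 on any single filter drags the whole score down, which mirrors the rule that a project failing any one filter is not a first project.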

If you want a second set of eyes on that list, that's a conversation we have often: it's the first step of our AI consulting engagements. You can also try our free browser-based AI tools to see how we approach scoped, measurable automation in practice.

Ship the boring thing first. The ambitious thing gets funded by it.