10 AI Development Red Flags: How to Avoid Bad Agencies (2025)
Quick Answer: Bad AI agencies reveal themselves fast. Red flags: won't share pricing, push 6-month "strategies" before building, have zero production deployments, promise unrealistic timelines, or use only one LLM. These predict failure 90%+ of the time. Here's how to spot them—and what to look for instead.
Published October 13, 2025 by Paul Gosnell
What This Guide Covers
I've evaluated 200+ AI agencies over 20 years. Seen amazing work—and catastrophic failures. The bad ones follow patterns. This guide shows you:
- 10 red flags that predict project failure (90%+ accuracy from real data)
- Why each red flag matters (the disaster scenarios that follow)
- What good agencies do instead (flip side of each red flag)
- How to test for red flags (questions that expose them)
- Warning combinations (3+ red flags = run immediately)
- Real failure stories (anonymized but painfully real)
Save yourself $50k-200k and 6-12 months of pain. Here's what to watch for.
🚩 Red Flag #1: "Contact Us for Pricing"
What It Looks Like
- Website has no pricing information
- "Every project is unique, we need to discuss"
- Won't give even a ballpark range upfront
- Requires sales call before mentioning numbers
Why It's a Problem
Translation: "We charge whatever we think you'll pay."
- No standard pricing = no repeatable process
- They price based on your perceived budget, not actual work
- Often overcharge 2-5x market rates
- Indicates they don't know their costs (amateur hour)
The Disaster Scenario
Real Example: SaaS company asked for AI chatbot. Agency quoted $180k after learning their Series A size. Same agency quoted $35k to another founder with less runway. Identical scope.
What Good Agencies Do Instead
- Show pricing ranges on website ($5k pilots, $25k-50k production, etc.)
- Give ballpark in first conversation (before sales pitch)
- Explain what impacts cost (complexity, integrations, compliance)
- Transparent pricing model (fixed, T&M, or hybrid explained)
How to Test
Ask: "What's a typical AI agent cost for [your use case]?"
- ✅ Good Answer: "$8k-15k depending on integrations, typically 2-3 weeks"
- ❌ Bad Answer: "We'd need a discovery call to determine that"
🚩 Red Flag #2: "Let's Do a 6-Month AI Strategy First"
What It Looks Like
- "We need to understand your AI readiness"
- "Phase 1 is a comprehensive AI assessment"
- "Let's map your AI transformation roadmap first"
- Deliverable: PowerPoints, not working code
Why It's a Problem
Translation: "We're consultants who don't code."
- Strategy without execution = expensive PDFs
- By month 6, the tech landscape has changed
- Your competitors shipped while you strategized
- No skin in the game: they get paid either way
The Disaster Scenario
Real Example: Healthcare company spent $250k on 8-month "AI strategy." Got beautiful slide deck with recommendations. Hired different agency to actually build it—original consultants ghosted. No code, no production system, just recommendations that were outdated by completion.
What Good Agencies Do Instead
- Start with pilot/MVP (4-8 weeks max)
- Strategy happens DURING building, not before
- Show working code within week 1-2
- Iterate based on real usage, not theoretical frameworks
- "Learn by shipping" approach
How to Test
Ask: "When will I see working code?"
- ✅ Good Answer: "Week 1-2, we'll have a basic version you can test"
- ❌ Bad Answer: "After we complete the strategy phase in month 3-6"
🚩 Red Flag #3: Zero Production Deployments
What It Looks Like
- "We've built many AI prototypes"
- "Can't share due to NDAs" (for every single project)
- Only show demos or POCs
- No live URLs, no real user metrics
Why It's a Problem
Translation: "We build demos that die in Slack, not production systems."
- Demos ignore hard stuff: scale, errors, edge cases, monitoring
- Production is 10x harder than demo
- No battle scars = will learn on your dime
- Can't handle real users, real data, real problems
The Disaster Scenario
Real Example: E-commerce company hired agency with impressive AI demo. Demo worked perfectly in controlled environment. In production: crashed under load, gave wrong product info 30% of the time, cost $8k/month in API fees (vs projected $800). Agency had never dealt with production issues—founder had to hire someone else to fix it.
What Good Agencies Do Instead
- Show 3+ live production systems handling real users
- Share metrics (uptime, calls handled, cost per interaction; see the cost sketch after this list)
- Discuss production problems solved (latency, hallucination, scale)
- Give you live URLs to test right now
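One reason demo costs balloon in production is simple arithmetic: real users bring long conversation histories, retries, and follow-ups. Here is a minimal back-of-the-envelope sketch for estimating monthly spend; every number in it (volumes, tokens per interaction, price) is an illustrative assumption, not a provider quote.

```python
# Sketch: back-of-the-envelope cost-per-interaction estimate.
# All prices and token counts below are illustrative assumptions;
# check your provider's current pricing before relying on any number.

def monthly_api_cost(interactions_per_month: int,
                     tokens_per_interaction: int,
                     usd_per_million_tokens: float) -> float:
    """Estimate monthly LLM API spend for a chat-style agent."""
    total_tokens = interactions_per_month * tokens_per_interaction
    return total_tokens / 1_000_000 * usd_per_million_tokens

if __name__ == "__main__":
    # Example: 50,000 interactions/month, ~4,000 tokens each (prompt + history
    # + response), at an assumed blended rate of $5 per million tokens.
    print(f"${monthly_api_cost(50_000, 4_000, 5.0):,.0f} per month")
```

An agency with real production experience will have run this math with your actual volumes before quoting you.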
How to Test
Ask: "Show me a live AI agent handling real users right now"
- ✅ Good Answer: *Shares live URL with real traffic* + explains metrics
- ❌ Bad Answer: "All our work is under NDA" or only shows local demos
🚩 Red Flag #4: One-LLM-Only Approach
What It Looks Like
- "We're an OpenAI partner" (or Anthropic, or Google)
- "GPT-4 is best for everything"
- Can't articulate strengths/weaknesses of different models
- Default answer: use their partnership model
Why It's a Problem
Translation: "We have a partnership deal, so you get what's convenient for us."
- No model is best for everything
- Partnership bias over your best outcome
- Missing 30-40% performance gains from multi-model approach
- Lack of deep AI understanding
The Disaster Scenario
Real Example: Financial services company needed fast, accurate data lookups. Agency used GPT-4 for everything (their partnership model). System was slow (2-3s responses) and hallucinated numbers. Different agency used Gemini for lookups (0.4s, accurate), Claude for complex analysis, GPT-4 only for creative explanations. Performance improved 5x, cost dropped 40%.
What Good Agencies Do Instead
- Multi-model approach (Claude for reasoning, GPT-4 for creativity, Gemini for speed; see the routing sketch after this list)
- Choose based on your use case, not their deals
- Explain model tradeoffs clearly (cost, latency, quality)
- Can pivot if one model doesn't work
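To make the multi-model point concrete, here is a minimal routing sketch. The task categories, model names, and reasons are illustrative assumptions for this example, not vendor recommendations; a real router would also weigh cost ceilings, fallbacks, and compliance constraints.

```python
# Minimal sketch of task-based model routing (illustrative only).
# Model names and task categories are assumptions for this example;
# a production router would also handle cost limits and fallbacks.

ROUTING_TABLE = {
    # task type         -> (model, reason)
    "data_lookup":        ("gemini-flash", "lowest latency for short, factual answers"),
    "complex_analysis":   ("claude-sonnet", "stronger multi-step reasoning"),
    "creative_copy":      ("gpt-4o", "better open-ended generation"),
}

DEFAULT_MODEL = ("claude-sonnet", "safe general-purpose default")

def pick_model(task_type: str) -> tuple[str, str]:
    """Return (model, reason) for a task type; fall back to a default."""
    return ROUTING_TABLE.get(task_type, DEFAULT_MODEL)

if __name__ == "__main__":
    for task in ("data_lookup", "complex_analysis", "unknown_task"):
        model, reason = pick_model(task)
        print(f"{task}: route to {model} ({reason})")
```

Ask a prospective agency to walk you through their version of this table; if they can't explain why each task maps to each model, that's Red Flag #4 in action.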
How to Test
Ask: "Which LLM would you use for my use case and why? What are the alternatives?"
- ✅ Good Answer: Compares 2-3 models with specific reasoning + tradeoffs
- ❌ Bad Answer: Defaults to one model without comparing options
🚩 Red Flag #5: Unrealistic Timeline Promises
What It Looks Like
- "We can build that in 3 days"
- "Production-ready AI agent in one week, guaranteed"
- Promises delivery 3-5x faster than the market standard
- No mention of what's included in timeline
Why It's a Problem
Translation: "We'll rush it, cut corners, or massively under-scope."
- Fast timelines = missing critical features (security, error handling, monitoring)
- Technical debt piles up (expensive to fix later)
- Or they just miss the deadline and blame "unforeseen complexity"
- Quality suffers—you get what you pay for
The Disaster Scenario
Real Example: Real estate company was promised a "voice agent in 3 days." Got something on day 3, but: no CRM integration, couldn't handle multiple callers, no error handling, responses took 8-12 seconds. Required another $15k and 3 weeks to make production-ready—original agency disappeared after getting paid.
What Good Agencies Do Instead
- Give realistic ranges (simple: 6-10 days, complex: 3-6 weeks)
- Break down what's included in each phase
- Explain why it takes that long (not hand-waving)
- Under-promise, over-deliver (ship in 8 days when quoted 10)
How to Test
Ask: "What's your typical timeline for [describe your project briefly]?"
- ✅ Good Answer: "10-14 days for MVP including X, Y, Z. Production adds A, B, C for 3 weeks total"
- ❌ Bad Answer: "We can do that in 2-3 days" (for complex system)
🚩 Red Flag #6: "We Do Everything"
What It Looks Like
- Claim expertise in AI, blockchain, AR/VR, quantum, IoT...
- "Full-service agency" doing design, dev, marketing, legal, HR...
- No clear specialization
- "Yes" to every technology or service you mention
Why It's a Problem
Translation: "We're mediocre at many things, excellent at none."
- No one is world-class at everything
- Jack of all trades, master of none
- Likely outsourcing to cheaper contractors (quality varies wildly)
- No deep expertise where it matters
The Disaster Scenario
Real Example: SaaS company hired "full-service" agency for AI + design + marketing + devops. AI was amateur (junior outsourced dev), design was stock templates, marketing was generic. Ended up hiring specialists for each—the original agency was just a project manager taking a 50% margin.
What Good Agencies Do Instead
- Clear focus ("We build AI agents, that's it")
- Honest about what they DON'T do ("We don't do design, here's who we recommend")
- Deep expertise in their lane
- Partner network for adjacent needs (not pretend to do it)
How to Test
Ask: "What do you NOT do? What do you outsource or recommend partners for?"
- ✅ Good Answer: Specific gaps + trusted partner recommendations
- ❌ Bad Answer: "We handle everything in-house"
🚩 Red Flag #7: No Code to Show
What It Looks Like
- "We can't share code due to IP concerns"
- Won't discuss technical implementation details
- No GitHub, no code samples, no architecture diagrams
- Sales team does all talking (no engineers present)
Why It's a Problem
Translation: "We don't actually write code, or it's so bad we're embarrassed."
- Real engineers want to talk tech
- Good code is shareable (anonymize if needed)
- No code review = no quality assurance
- Likely using no-code tools and charging development rates
The Disaster Scenario
Real Example: Company hired agency, never saw code during development. At handoff: spaghetti code, no tests, hardcoded API keys, no documentation. Cost $45k to refactor what should have been $15k to build correctly. Agency refused to share code during project—now obvious why.
What Good Agencies Do Instead
- Show code samples (even if anonymized)
- Discuss technical architecture openly
- Engineers present in sales process
- Code review process explained
- GitHub or code repository you can access during project
How to Test
Ask: "Can you show me a code sample from a recent AI project?"
- ✅ Good Answer: *Shows code* (anonymized if needed) + explains approach
- ❌ Bad Answer: "All code is proprietary" or changes subject
🚩 Red Flag #8: Vague Deliverables
What It Looks Like
- "We'll build a working AI agent"
- "Deliverable: AI-powered chatbot"
- No specifics on features, performance, or acceptance criteria
- "We'll figure it out as we go"
Why It's a Problem
Translation: "We'll deliver whatever we want and call it done."
- No clear success criteria = constant disputes
- You expected X, they delivered Y (both call it "AI agent")
- Scope creep nightmare (everything is extra)
- No accountability to quality standards
The Disaster Scenario
Real Example: Contract said "AI chatbot for customer support." Agency delivered: answered 40% of questions correctly, responded in 8 seconds, crashed under >10 concurrent users. Claimed it met contract ("it's a chatbot"). Founder expected 80% accuracy, <2s response, 100+ concurrent users. Legal battle ensued.
What Good Agencies Do Instead
- Detailed scope document (features, integrations, performance targets)
- Acceptance criteria defined upfront (80% accuracy, <2s response, etc.; see the sketch after this list)
- Phased deliverables with checkpoints
- Clear definition of "done"
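As a minimal sketch of what "acceptance criteria defined upfront" can look like, the thresholds below mirror the examples in this guide (80% accuracy, sub-2-second responses, 100+ concurrent users); they are illustrative assumptions to be negotiated per project, not universal targets.

```python
# Sketch: acceptance criteria as explicit, testable thresholds.
# The numbers are illustrative; set them in the contract before work starts.

ACCEPTANCE_CRITERIA = {
    "answer_accuracy_pct":  80,    # at least 80% correct on an agreed test set
    "p95_response_seconds": 2.0,   # 95th-percentile latency under 2 seconds
    "concurrent_users":     100,   # sustained load without errors
}

def meets_criteria(measured: dict) -> list[str]:
    """Return a list of failed criteria (empty list means acceptance passes)."""
    failures = []
    if measured["answer_accuracy_pct"] < ACCEPTANCE_CRITERIA["answer_accuracy_pct"]:
        failures.append("accuracy below target")
    if measured["p95_response_seconds"] > ACCEPTANCE_CRITERIA["p95_response_seconds"]:
        failures.append("latency above target")
    if measured["concurrent_users"] < ACCEPTANCE_CRITERIA["concurrent_users"]:
        failures.append("concurrency below target")
    return failures

if __name__ == "__main__":
    # Numbers from the disaster example above: 40% accuracy, 8s responses, 10 users.
    print(meets_criteria({"answer_accuracy_pct": 40,
                          "p95_response_seconds": 8.0,
                          "concurrent_users": 10}))
```

If a criterion can't be written down and measured like this, it isn't an acceptance criterion; it's a hope.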
How to Test
Ask: "What exactly will I get? What are the acceptance criteria?"
- ✅ Good Answer: Detailed spec with measurable criteria
- ❌ Bad Answer: "A working AI system" (no specifics)
🚩 Red Flag #9: No Post-Launch Support Plan
What It Looks Like
- "We deliver, then you own it"
- "Support? We can discuss that later"
- No SLA, no maintenance plan, no bug fix guarantee
- "Call us if something breaks" (but no formal agreement)
Why It's a Problem
Translation: "We're ghosting you the moment we get final payment."
- AI systems need ongoing refinement (it's not set-and-forget)
- Bugs always emerge in production (who fixes them?)
- Model updates, API changes, integration breaks happen
- No support = you're stuck
The Disaster Scenario
Real Example: Agency built voice agent, no support contract. Week 3 post-launch: OpenAI deprecated the API version used, agent stopped working. Agency quoted $12k for an "upgrade" that amounted to 2 hours of work. Founder had no recourse, no SLA, no support agreement. Lost a week of revenue while scrambling.
What Good Agencies Do Instead
- 30-90 day bug fix warranty (included)
- Clear SLA for critical issues (4-hour response for downtime)
- Monthly retainer options for ongoing improvements
- Monitoring + alerts included
- Support terms in contract upfront
How to Test
Ask: "What happens if something breaks in month 2 post-launch?"
- ✅ Good Answer: "90-day warranty covers bugs, here's our SLA, here's retainer pricing"
- ❌ Bad Answer: "We can look at it, but that's a separate engagement"
🚩 Red Flag #10: "Trust Us, We're Experts"
What It Looks Like
- "We know best, just let us work"
- Dismissive of your input or questions
- "You wouldn't understand the technical details"
- Pushback when you want to see progress
Why It's a Problem
Translation: "We're hiding something or don't want accountability."
- Good builders welcome questions (shows you care)
- "Trust me" = hiding incompetence or cutting corners
- Your input is valuable (you know your business)
- Collaboration beats dictatorship
The Disaster Scenario
Real Example: Founder kept asking to see in-progress work. Agency said "trust our process, you'll see it when it's done." After 6 weeks and $40k: completely wrong product built. They built what they thought was needed, ignored all the founder's input. Had to scrap and start over.
What Good Agencies Do Instead
- Weekly progress demos (show working software)
- Welcome your questions and input
- Explain technical decisions in plain English
- Collaborative approach (you're a partner, not a wallet)
- Transparent about challenges and blockers
How to Test
Ask: "How often will I see progress? Can I give input during development?"
- ✅ Good Answer: "Weekly demos, your input is critical, we're building this together"
- ❌ Bad Answer: "We'll show you when it's done" or "Just trust our process"
Warning Combinations (When to Run Immediately)
3+ Red Flags = Near-Certain Disaster
If an agency hits multiple red flags, walk away. Here are the toxic combinations:
The Consultant Trap:
- 🚩 6-month strategy + 🚩 No production track record + 🚩 Vague deliverables
- Result: $100k-300k spent on PowerPoints, zero working code
The Overpriced Amateur:
- 🚩 No pricing transparency + 🚩 One-LLM-only + 🚩 No code to show
- Result: Overpay 3-5x for mediocre work from junior outsourced devs
The Build-and-Ghost:
- 🚩 Unrealistic timelines + 🚩 No post-launch support + 🚩 "Trust us"
- Result: Rushed, broken system delivered, then they disappear
The Jack of All Trades:
- 🚩 "We do everything" + 🚩 No production deployments + 🚩 Vague deliverables
- Result: Outsourced mess, no expertise, constant issues
The Green Flags (What to Look For Instead)
Signs of a Great AI Agency
✓ Transparent Pricing (ranges on website, ballpark in first call)
✓ Production Track Record (3+ live systems you can test now)
✓ Multi-Model Expertise (Claude, GPT-4, Gemini used strategically)
✓ Realistic Timelines (with detailed breakdown of phases)
✓ Clear Specialization (focused on what they're best at)
✓ Code Quality Standards (shows samples, discusses testing/reviews)
✓ Detailed Deliverables (acceptance criteria defined upfront)
✓ Support Plan (warranty + ongoing options clear)
✓ Collaborative Approach (welcomes your input, weekly demos)
✓ Challenges Your Brief (improves your idea rather than just executing it)
The Perfect Agency Response
You: "We need an AI customer support agent"
Bad Agency: "Great! That'll be around $150k, let's start with a 6-month discovery phase to build your AI roadmap."
Good Agency: "Nice. Before we jump in—what's your biggest support pain point? Ticket volume? After-hours coverage? Complex troubleshooting? Typical AI support agent is $12k-20k for production-ready system, 2-3 weeks. But let's make sure we solve the right problem first. Walk me through your current support flow?"
How to Score Agencies (Simple System)
Red Flag Scoring
| Red Flags | Score | Action |
|---|---|---|
| 0 red flags | Excellent | Strong candidate, proceed with confidence |
| 1-2 red flags | Caution | Dig deeper, ask clarifying questions, address concerns |
| 3-4 red flags | High Risk | Only proceed if no other options, protect yourself legally |
| 5+ red flags | Disaster | Run. Seriously, just walk away. |
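The same scoring rule as a short sketch, in case you want to fold it into your own vendor-evaluation checklist; the thresholds simply mirror the table above.

```python
# Sketch: map a red-flag count to the action in the table above.

def agency_action(red_flag_count: int) -> str:
    """Return the recommended action for a given number of red flags."""
    if red_flag_count == 0:
        return "Excellent: strong candidate, proceed with confidence"
    if red_flag_count <= 2:
        return "Caution: dig deeper and address concerns before signing"
    if red_flag_count <= 4:
        return "High risk: only proceed with strong contractual protection"
    return "Disaster: walk away"

if __name__ == "__main__":
    for count in (0, 2, 3, 6):
        print(count, "->", agency_action(count))
```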
Key Takeaways
- 10 Red Flags: No pricing, 6-month strategies, zero production, one-LLM-only, unrealistic timelines, "we do everything," no code, vague deliverables, no support, "trust us"
- Failure Prediction: 3+ red flags = 90%+ chance of disaster
- Most Dangerous: "6-month strategy" + "no production" + "vague deliverables" = consultant trap ($100k-300k wasted)
- Instant Dealbreakers: Won't show live production systems, won't discuss pricing, dismissive of your input
- Green Flags: Transparent pricing, production track record, multi-model approach, challenges your brief, collaborative
- Test Questions: Ask for pricing, timelines, live production examples, code samples, support plan—watch how they respond
- Warning Combinations: Multiple red flags compound risk exponentially
- Trust Your Gut: If something feels off, it usually is