TrueAdvertize
May 27, 2026 · 23 min read

How to Evaluate a B2B Outbound Agency Before You Sign (2026 Founder's Checklist)

B2B outbound agencies fail B2B SaaS founders the same way every time. Here are the 8 questions, 6 red flags, and the ownership test that separates a real partner from another template farm.

Samuel Roa
Founder, TrueAdvertize

If you're a B2B SaaS founder running 50 to 300 customer accounts, you've probably had a version of the same conversation three times this quarter. An agency partner gets a warm intro from someone in your network. Their first call sounds smart. Their second call sounds tailored. By the third call you're being asked to sign a 6 to 12 month retainer at $8K, $12K, sometimes $15K per month, with the promise that pipeline will start moving by week 4.

You've been burned before, so you ask the questions you remember being burned by last time. They have good answers. You sign.

Ninety days later, the sequences look like sequences. The pipeline is up by a few meetings. The reply rate is below 2%. Your team can't tell you what's working without the agency present. The agency wants to extend.

This is the failure mode for almost every B2B SaaS founder who hires an outbound agency between $1M and $5M ARR. And the reason it keeps happening is that the questions you remember to ask are the ones the agency has rehearsed answers to. The questions you need to ask are different.

I run TrueAdvertize. We build GTM systems for B2B SaaS founders who've outgrown hustle, on a fixed-timeline engagement instead of a retainer. I sit on the other side of these evaluations almost every week. Some founders sign with us. Some sign with someone else. Some decide to build in-house. The framework below works regardless of who you pick. If we lose a deal because a founder used it on us and decided we weren't the fit, that's a result we can live with. The point is that you stop signing with template farms.

This is what I'd want a founder to read before getting on a call with me, with an agency, or with anyone selling outbound services.

Why most B2B outbound agencies fail (three failure modes)

Before the questions, the categories. Almost every agency that disappoints a B2B SaaS founder falls into one of three buckets. Knowing which bucket you're looking at on the first call is more useful than any specific question.

The template farm. Sends similar sequences across every client. The "ICP research" is a 30-minute call where they ask you who your customer is, then they write copy that sounds like every other agency's copy because it is every other agency's copy with names swapped. The giveaway: their pitch deck has 12 logos of companies in different industries, and the case studies all show the same three metrics. You're paying for distribution of generic copy at scale, with reply rates in the 1 to 2% range, which is what the industry baseline produces.

The retainer-dependency model. Charges $5K to $15K a month with no compounding asset on the client side. The agency owns the Clay workflows. The agency owns the sequences inside their Smartlead or Instantly account. The agency owns the data enrichment subscriptions. The day you cancel, you have nothing. The "system" lived in their tooling under their logins, and they're not handing you the keys because the keys are the only thing keeping you on the retainer. This is the model the industry runs on because it's the most profitable, not because it's the most honest.

The strategy deck. Charges $20K to $80K upfront for a 40-page Notion doc that describes the GTM motion you should build. Then they walk. The deck is rigorous, the slides are pretty, and at no point did anyone build anything. You now have an excellent description of the system that you still have to build yourself, with a third of your runway gone.

Each of these models has its place somewhere in the market. None of them is the right fit for a B2B SaaS founder at $1 to $5M ARR who needs a working outbound motion in the next quarter. Spot the bucket on call one, and you've already filtered out most of the noise.

The eight questions to ask before signing

These are the questions that separate the agency that's actually building a system from the three failure modes above. Each one has a clean answer if you're talking to the right firm. If the answer is hedged, vague, or "let me get back to you on that," treat it as a red flag.

1. "Who owns the Clay workflows, sequences, and data on day 90?"

The single highest-leverage question on this list. If the answer is anything other than "you do, 100%, in your own accounts, with documented SOPs and admin access," you're signing for a dependency, not a system. Ask to see the standard handoff document. Real partners have one. Template farms don't.

2. "What's your reply rate floor, and what list do you measure it against?"

Any agency can claim "8 to 12% reply rate" because the number is decoupled from list quality. The honest version of this answer specifies two things: the reply rate floor they design toward (8% is reasonable; under 5% is barely above the 1 to 2% industry baseline and not interesting; over 20% is either suspicious or from a tiny test list), and the list construction (Apollo scrape with no filters versus a hand-curated TAM with multi-provider enrichment). If they can't articulate list discipline, the reply rate number is theater.

3. "What does week 1 of the build actually look like?"

You want to hear specifics. "Day 1 we kick off and do ICP discovery. Day 2 we map your current funnel and identify gaps. Day 3 we draft the system architecture in a shared doc. Day 5 you get the blueprint." If the answer is "we'll send you a project plan after kickoff," they don't have a repeatable process, which means they're inventing it on your engagement and they'll miss timelines.

4. "Can I talk to a client whose engagement just ended in the last 90 days?"

Not a logo. A reference call with a founder whose engagement is recent enough that the system is still running. Two flags on this one. First, agencies that only show you 18-month-old case studies are showing you the only ones that worked. Second, the question filters for whether the agency lets clients leave with a working system in the first place. Retainer-dependency shops can't produce this reference because their clients either churned (bad) or are still on retainer (which means they don't really own anything).

5. "What's your refund policy if the build doesn't ship on time or doesn't hit the floor reply rate?"

The answer should be concrete. "30-day money-back on the build phase, keep all artifacts shipped to that point" or similar. If the answer is "we don't offer refunds because every engagement is different," the firm doesn't believe in its own process enough to put real money on it. The risk reversal isn't about the money. It's about whether the firm has skin in the game.

6. "What tools are you using on this engagement, and whose account do they live in?"

If the agency runs the campaign through their Clay seat, their Instantly campaign, their Inboxkit setup, you're renting infrastructure. The handoff at engagement end requires migrating all of that into your accounts. The honest agency builds inside your tools from day 1, with their team as collaborators on your seats, so the migration on day 90 is a transfer of admin rights, not a rebuild.

7. "Who will I actually be working with day to day?"

The pitch usually involves a senior partner. The work usually gets done by an account manager and an offshore junior. The disconnect is where most engagements quietly die. Ask for the name and LinkedIn of the person who will run your weekly call in week 3. If they can't tell you, the org chart isn't built around accountability.

8. "What happens at the end of the engagement? What's the post-handoff structure?"

The retainer-dependency agency will pitch you on "continued optimization" at month 4. The build agency will hand you a 30-page SOP library, recorded training videos, a documented playbook, and offer optional monthly check-ins that are scoped and paid hourly, not on retainer. The post-handoff structure is where the agency's actual model reveals itself.

Six red flags that should kill the deal on the spot

Some answers are not just hedged. They're disqualifying. If you hear any of these on the first or second call, end the evaluation.

Red flag 1: They can't define your ICP back to you by the second call. If they're still asking who your customer is after a 90-minute discovery call, they're going to spend your first three weeks doing what should have been done before they pitched you. The agencies worth hiring have either done in-niche work before or built their process around fast, rigorous ICP definition. Either way, by call two, they should be reflecting your ICP back to you in language sharper than the way you usually describe it.

Red flag 2: The case studies are all logos with no numbers. A logo wall says "this company once paid us." A real case study says "this founder at this stage ran this play for this period with this reply rate against this list size." If the deck is logos, ask for the case studies in narrative form. If they can't produce them, the engagement didn't go well enough to write down.

Red flag 3: The pricing model is a fixed monthly retainer with no scope cap. "We charge $8K per month for outbound services" means the scope is whatever they choose to ship each month, and your incentive to keep paying is decoupled from outcomes. Pricing tied to the build (one number for the build phase, one number for the optimize phase, then optional check-ins) creates the right alignment.

Red flag 4: They want to start sending in week 1. Real systems take 2 to 4 weeks to build before the first send. Anyone offering to start sending in week 1 is sending generic copy against a generic list, which is the industry baseline (1 to 2% reply rate) by design. Speed-to-send is not a feature.

Red flag 5: They can't tell you which CRM, sending tool, or enrichment provider they prefer and why. "We work with whatever you have" sounds flexible but usually means they don't have opinions because they haven't built enough engagements to develop them. The agencies that work have a default stack they recommend (Clay, Instantly or Smartlead, Apollo, HubSpot or Salesforce) and a defensible reason for each tool's role.

Red flag 6: They downplay your team's involvement during the build. "Don't worry, we'll take care of everything" is a marketing line that means you'll get a system you don't understand and can't run when they leave. The honest pitch is the opposite: "we need 3 to 5 hours from you each week during the 4 to 8 week build, including team training before handoff, because if we hand you a system your team can't operate, the engagement failed."

The ownership question (and why it's the only one that matters in 12 months)

If I had to compress this entire framework into one question, it would be this:

"On day 91, what do I own that I didn't own on day 0?"

The answer separates every real partner from every retainer. Here's what good looks like in concrete artifacts:

  • The full GTM Blueprint document, with ICP definition, vertical splits, and rebuild plan
  • Working Clay tables, enrichment waterfall, and scoring logic in your Clay seat
  • Sequences live in your Instantly or Smartlead account, under your admin
  • A 30+ page SOP library (Notion, Confluence, whatever your team uses)
  • Recorded training videos walking your team through how to run the system
  • Documented attribution: how the CRM tracks pipeline back to source
  • Direct admin access to every credential the system uses

Notice what's not on this list: a deliverable that lives inside the agency's tooling, requires their account to run, or comes with an "ongoing support contract" without which it stops working. The ownership test is binary. Either you can run the system without the agency on day 91, or you can't. There's no middle path. Most retainer-dependency engagements fail this test, which is why so many clients churn at the 12-month mark feeling like they paid $96K (12 months at $8K) for nothing.
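If it helps to make the binary test concrete, here is a minimal Python sketch. The `Artifact` fields and the sample `handoff` list are hypothetical and purely illustrative; the only rule taken from the text above is that every deliverable has to live in your accounts and run without the agency.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Artifact:
    name: str
    in_client_account: bool    # lives in your seat or workspace, under your admin
    runs_without_agency: bool  # needs no agency login or support contract


def failing_artifacts(artifacts: list[Artifact]) -> list[str]:
    """Names of deliverables you would lose the day the agency leaves."""
    return [
        a.name
        for a in artifacts
        if not (a.in_client_account and a.runs_without_agency)
    ]


def passes_ownership_test(artifacts: list[Artifact]) -> bool:
    """Binary test: an empty handoff fails, and so does any single dependency."""
    return bool(artifacts) and not failing_artifacts(artifacts)


# Hypothetical day-91 handoff where the sequences never left the agency's account.
handoff = [
    Artifact("GTM Blueprint document", True, True),
    Artifact("Clay tables and scoring logic", True, True),
    Artifact("Sequences in the sending tool", False, False),
    Artifact("SOP library", True, True),
]

print(passes_ownership_test(handoff))  # False
print(failing_artifacts(handoff))      # ['Sequences in the sending tool']
```

One failing artifact is enough to fail the whole handoff, which is the point of calling the test binary.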

The pricing models you'll encounter (and what each one signals)

Five pricing models dominate the B2B outbound agency market in 2026. Each one tells you something about how the firm runs.

Monthly retainer, no scope cap. $5K to $15K per month, scope defined by the agency. Highest profit margin for the firm. Lowest accountability for outcomes. Usually 6 to 12 month minimum commitment, with cancellation clauses requiring 90 days' notice. Signal: the firm makes money whether you grow or not.

Monthly retainer with scope of work. $5K to $15K per month, deliverables specified per month. Better than the previous, because the SOW gives you something to enforce. Still aligned to billable hours, not outcomes. Signal: the firm has been burned by clients pushing scope, so they've defended the model. Reasonable for ongoing-management engagements, not for builds.

Performance-based with monthly minimum. $3K to $8K monthly retainer plus per-meeting or per-qualified-lead bonus. Aligned to outcomes on paper. In practice, the "per-meeting" definition gets debated every month, and the agency optimizes for whatever the metric counts, not for the metric you actually care about. Signal: alignment in theory, friction in practice.

Fixed-fee build, no retainer. $15K to $80K for a defined build phase (4 to 8 weeks), then an optional optimization phase priced separately. The client owns everything at the end. Signal: the firm has a repeatable process, knows the unit economics, and is confident enough in delivery to fixed-fee it. This is the TrueAdvertize model.

Pure performance, % of pipeline or revenue share. No upfront. Agency takes 10 to 25% of attributed revenue or pipeline. Sounds founder-friendly. Reality: works only when the firm has unilateral control over the funnel, which means they're not going to teach your team or hand you the system. Signal: the firm is building their own asset on your customer base, not yours.

There is no universally correct model. There is a correct model for your stage. At 50 to 300 customers and $1 to $5M ARR, fixed-fee build is almost always the right answer, because you're at the stage where you need a working system you own, not ongoing management of someone else's.
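To put the retainer and fixed-fee numbers side by side, a rough Python sketch follows. It uses only the dollar ranges quoted in this article; the function names are illustrative, and the optional optimization phase is left out because no price is given for it here.

```python
# Dollar ranges below are the ones quoted in this article (USD).
RETAINER_MONTHLY = (5_000, 15_000)  # monthly retainer, low and high
FIXED_FEE_BUILD = (15_000, 80_000)  # one-time build fee, low and high


def retainer_total(monthly_fee: int, months: int) -> int:
    """Cumulative retainer spend after a number of months."""
    return monthly_fee * months


def months_to_match(build_fee: int, monthly_fee: int) -> int:
    """First month in which cumulative retainer spend reaches the build fee."""
    return -(-build_fee // monthly_fee)  # ceiling division on integers


print(retainer_total(8_000, 12))                          # 96000
print([retainer_total(m, 12) for m in RETAINER_MONTHLY])  # [60000, 180000]
print(months_to_match(FIXED_FEE_BUILD[1], 8_000))         # 10
```

At $8K per month, a retainer passes even the top of the fixed-fee range in month 10, and the comparison says nothing yet about what you own at the end of either path.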

How to evaluate case studies (the questions behind the numbers)

Every agency shows you the same case study format: company X, problem Y, our team did Z, results were W. The format hides almost everything that matters. Here are the questions that turn a case study into a real signal.

What was the list size and how was it built? A 12% reply rate on a 400-lead hand-curated list of in-ICP accounts is a real result. A 12% reply rate on a 40-lead pilot list from the founder's personal network is a vanity number. The list construction is most of the story.

How long did the engagement run before the reported metric? Reply rates in week 2 are not reply rates in week 12. Pipeline volume in month 3 is not pipeline volume in month 9. Ask for the time-series, not the headline.

What was the client's baseline before the engagement? A company that came in at 0.8% reply rate and ended at 9% is a real story. A company that came in at 6% and ended at 9% is a smaller intervention. Without baseline, the lift is unreadable.

What did the client retain after the engagement ended? If the answer is "we still run their outbound for them," the case study is a retention story, not a build story. The interesting version is: client X took over the system on day 91, has run it independently for the last 14 months, and is now at Y reply rate without us in the room.

Can I talk to the named contact at company X? If yes, schedule it. If no, ask why. The agencies worth hiring have clients who will take a 20-minute reference call. The agencies that aren't worth hiring have clients who won't.
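The list-size and baseline questions above come down to two small calculations, sketched here in Python. The function names are illustrative; the inputs are the example figures from this section.

```python
def absolute_replies(list_size: int, reply_rate_pct: float) -> float:
    """Number of replies implied by a reply rate on a list of a given size."""
    return list_size * reply_rate_pct / 100


def lift_multiple(baseline_pct: float, final_pct: float) -> float:
    """How many times higher the final reply rate is than the baseline."""
    if baseline_pct <= 0:
        raise ValueError("baseline must be positive to compute a multiple")
    return final_pct / baseline_pct


# Same 12% headline, very different evidence behind it.
print(absolute_replies(400, 12))  # 48.0
print(absolute_replies(40, 12))   # 4.8

# Same 9% end state, very different intervention.
print(round(lift_multiple(0.8, 9), 2))  # 11.25
print(round(lift_multiple(6, 9), 2))    # 1.5
```

Forty-eight replies is a pattern. Fewer than five is an anecdote, and a 1.5x lift is a much smaller story than an 11x one.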

The reply rate question (and how to spot the lies)

Reply rate is the most reported and most misreported metric in B2B outbound. A few quick filters.

Industry baseline is 1 to 2% on Apollo-style cold lists without enrichment, without personalization, without segmentation. This is what the average B2B SaaS company sees from in-house SDRs running default plays. If an agency is quoting numbers below 3%, they're at parity with what you could do yourself, and you're paying for distribution, not optimization.

The 8 to 12% range is achievable with rigorous ICP definition, multi-provider enrichment, segment-specific copy, and multi-touch sequences. This is the design target for any serious agency engagement. If an agency is quoting numbers in this range, they're plausible. Verify with the case study and reference questions above.

Numbers above 20% reply rate should make you skeptical. Either it's a tiny test list (20% of 30 is 6 replies), or the reply definition is loose (counting auto-replies, out-of-office, unsubscribe-with-message as a reply), or the engagement was a single hand-crafted A/B test, not a sustained motion. Ask for the cohort definition.

The most useful question: "what's your reply-to-meeting conversion rate?" Reply rates can be juiced. Meeting conversion is harder to fake. If the agency can answer "we typically see 25 to 40% of replies convert to meetings," they're tracking the funnel. If they can't, they're tracking vanity.
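As a rough sketch, the filters above can be written as a small Python function, with the reply-to-meeting math next to it. The band labels are mine, the thresholds are the ones in this section, and rates that fall between the named bands are deliberately left unjudged because the article gives no verdict on them. The 1,000-send volume is an assumed figure for illustration only.

```python
def classify_reply_rate(rate_pct: float) -> str:
    """Map a quoted reply rate onto the bands described in this section."""
    if rate_pct < 3:
        return "baseline parity"
    if 8 <= rate_pct <= 12:
        return "design target"
    if rate_pct > 20:
        return "skeptical: ask for the cohort definition"
    return "between bands: verify with case studies and references"


def expected_meetings(
    sends: int, reply_rate_pct: float, reply_to_meeting_pct: float
) -> float:
    """Meetings implied by a send volume, a reply rate, and a conversion rate."""
    return sends * reply_rate_pct * reply_to_meeting_pct / 10_000


print(classify_reply_rate(1.5))  # baseline parity
print(classify_reply_rate(10))   # design target
print(classify_reply_rate(25))   # skeptical: ask for the cohort definition

# Assumed volume: 1,000 sends at a 10% reply rate and 30% reply-to-meeting.
print(expected_meetings(1_000, 10, 30))  # 30.0
```

The second function is the one to press on: an agency that can fill in both percentages from its own data is tracking the funnel, not just the headline.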

What money-back guarantees actually mean

Most agencies don't offer them. For the ones that do, the guarantee usually has more fine print than a credit card agreement. A few questions cut through this.

What triggers the refund? "Build doesn't ship on time" is a clean trigger. "We didn't hit the agreed outcome" is more subjective and usually litigated. The trigger should be measurable from the outside.

How much is refundable? "100% of the build fee" is meaningful. "Up to 50% pro-rated against unbilled work" is mostly a marketing line.

What do you keep if you trigger it? "Keep all artifacts shipped to that point" is the answer that matters. If the answer is "you keep nothing," the guarantee is theater because you'll have to start the rebuild from scratch anyway.

When does the window close? A 30-day window from kickoff is honest. A 7-day window is performative. A window that runs until first send is built to expire before anything can go wrong.

The reason this matters isn't that you're planning to invoke the refund. It's that an agency willing to put real money behind a clean trigger has confidence in its own delivery, which is a structural signal about whether the firm will treat your engagement seriously.

When NOT to hire a B2B outbound agency

This article is from someone who runs one. So the failure mode of this whole frame is that the answer is always "hire a good agency." It isn't.

There are at least three situations where hiring an agency is the wrong move, and you should know which one you're in before you start evaluating.

You don't yet have product-market fit. If you're under 50 customers and you're still iterating on positioning, ICP, and core message, outbound at scale is premature. You'll spend $40K to $80K running plays that need to be rewritten in three months because your ICP shifted. The right move is to keep selling founder-led until the message is sharp, then bring in a build team once the ICP is locked.

You have an in-house team that just needs better playbooks. If you've got 2+ SDRs, an SDR manager, and a RevOps person, and they're producing 1 to 2% reply rates, you don't need an agency. You need a 4 to 6 week consultancy engagement to ship the playbooks and train the team. The pricing is closer to $20K to $40K, and the ownership question is moot because your team is the one running the system from day 1. This is a different product than a full agency build.

Your problem is closing, not generating. If your pipeline is healthy but your close rate is broken, an outbound agency will produce more top-of-funnel that gets lost in the same broken close motion. The right intervention is sales enablement, deal review, and probably hiring an AE, not pouring more leads into a leaky funnel.

If you're not in one of those three situations, and you've got 50 to 300 customers, $1 to $5M ARR, founder still in every demo, and a pipeline that's stuck, then yes, the build-with-a-partner model is probably the right move. Just run the framework above on whoever you're considering.

The reference call playbook

When you get a reference call with a former client, you have 20 to 30 minutes to extract the real story. Most founders waste it on small talk and surface questions. Three categories of question that actually surface signal:

Operational. "Walk me through what your team was doing in week 4." "Who on their team was on your weekly calls, and were they the same people from week 1 to week 12?" "What's something you had to push back on the agency about, and how did they respond?"

Outcome. "What was your reply rate at week 4 vs. week 12?" "How many meetings were on the calendar at handoff?" "What's the reply rate now, six months after the engagement ended?"

Structural. "On day 91, what did you own that you didn't own on day 0?" "Has the system kept running without them in the room?" "If you had to do it again, would you sign with them?"

The last question is the one that breaks through. If the answer is yes, you've got real social proof. If the answer is "I think so, but..." you've got a signal worth probing. If the answer is "honestly, no," you just saved yourself a year.

FAQ

How long should the evaluation process take?

Three to four weeks is reasonable. Call one is intro and qualification. Call two is process and team. Call three (after they've sent a draft scope of work) is structure, pricing, and reference calls. Anyone pushing you to sign in under two weeks is selling, not evaluating.

How many agencies should I evaluate in parallel?

Three is the sweet spot. One forces you to compare against your own assumptions. Two creates a coin flip. Three lets you triangulate. More than four spreads your attention across too many discovery calls, and you end up with surface knowledge of seven firms instead of deep knowledge of three.

What's a fair budget for a B2B SaaS outbound build at our stage?

For 50 to 300 customer companies in the $1 to $5M ARR range, fixed-fee builds typically run $15K to $80K depending on the complexity of the ICP, the number of vertical splits, the CRM integration scope, and whether enterprise motions are included. Anything under $15K is a template farm. Anything over $80K should come with a co-CRO arrangement or it's overpriced.

Should I ask for a paid pilot before the full engagement?

Yes, if the agency offers one. A 2 to 3 week paid discovery and architecture phase (often $5K to $10K) lets you see how they work without committing the full build budget. Agencies that refuse paid pilots usually do so because their process front-loads sales and back-loads delivery, which is the inverse of what you want.

What's the single biggest mistake founders make when hiring outbound agencies?

Optimizing for speed-to-pipeline instead of ownership-at-handoff. The agencies that promise pipeline by week 4 are running templates that produce 1 to 2% reply rates. The agencies that take 4 to 8 weeks to build before sending are building systems you'll still own at month 12.

How do I know if an agency has actually run their own GTM before, or if they're just selling consulting?

Ask them what their own outbound stack looks like, how they source their own leads, and what their inbound-to-outbound mix is. Agencies that have actually run a GTM motion can answer with specifics in 30 seconds. Agencies that have only consulted on GTM hedge or pivot.

Key takeaways

If you skim everything above and only remember six things, remember these:

  • Most outbound agencies fail B2B SaaS founders the same way: templates dressed as systems, retainer-dependency, or strategy decks that never get built. Spot the bucket on call one.
  • The ownership question is the single highest-leverage filter. On day 91, do you own the system or are you still renting it? Binary test, no middle path.
  • Fixed-fee builds align incentives better than retainers at the $1 to $5M ARR stage. Retainers make sense for ongoing management once a system exists. Builds make sense for getting to that system in the first place.
  • Case studies without baselines, list sizes, and time series are vanity metrics. Push for the questions behind the numbers.
  • Real money-back guarantees signal real confidence in delivery. The 30-day window with full refund and artifact retention is the version that means something.
  • You don't always need an agency. Pre-PMF, in-house with playbooks, and broken-close-rate scenarios all require different interventions. Make sure you're in the right bucket before you start evaluating firms.

The framework above doesn't tell you which agency to hire. It tells you how to filter the ones that aren't worth hiring at all. The agencies that survive this framework are the ones whose model is built around your outcomes, not theirs. That's a small fraction of the market. Once you have your three finalists, the rest is reference calls and gut.

If you want to use this framework on TrueAdvertize, we'd actually prefer it. Send the eight questions in advance. We'll send back the answers in writing, schedule a reference call with a client whose engagement ended last quarter, and you can run the same process on two other agencies in parallel. Whichever firm passes the framework most cleanly is the one you should sign with. We're happy to compete on that basis.

If you'd rather skip the evaluation and just see how we'd build the system for your company, book a Blueprint Call. Thirty minutes, founder-led, no sales pitch. We'll show you the framework above applied to your stage, and if we're not the right fit, we'll tell you who is.