How to Set Up a Clay Enrichment Waterfall for Outbound
A Clay enrichment waterfall calls multiple data providers in sequence, each a fallback for the last, to maximize match rate and cut cost. Full setup inside.
If your outbound is underperforming and you've already rewritten the copy twice, the problem is almost never the copy. It's the list. And the list is only as good as the data behind it. A Clay enrichment waterfall is the single highest-impact fix here: instead of betting your whole list on one data provider, you chain several so that where one is blind, another can see. I'm a former data scientist, and I've spent three years building these systems for B2B SaaS founders with 50 to 300 customers. The waterfall is the piece that turns a mediocre list into a clean one, at lower cost than the single-provider approach most people default to.
This is an operator-level walkthrough. By the end you'll know what a waterfall is, why multi-provider beats a single source on both coverage and cost, and the exact seven-step build: source accounts, enrich the company, find the contact, run the email waterfall, verify, score, sync. No fabricated prices, no magic. Just the engineering.
Clay is the orchestration layer. It is not itself a data provider. Think of it as the conductor that decides which provider to call, in what order, and what to do with each answer. The waterfall is a pattern you build inside Clay: a sequence of provider calls where each one is a fallback for the previous one. Clay teaches this directly in their own waterfalls lesson, and it's the pattern every serious operator builds on first.
Here's the mechanic that makes it work. You line up providers in a row. Clay calls the first one. If it returns the field you wanted, say a work email, the waterfall stops there and the remaining providers never run. If the first provider comes back empty, Clay calls the second. Empty again, it calls the third. And so on until something hits or you run out of providers.
That single rule, run the next provider only if the previous one missed, is the whole game. It's why this is called a waterfall and not a fan-out. Water flows down one level at a time and stops where it pools. You're not calling five providers on every record and paying for five. You're calling them in priority order and paying for exactly as many as it takes to get the answer, then stopping.
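Stripped of the Clay UI, the rule is a loop with an early return. Here's a minimal sketch in Python. The provider functions and their hard-coded answers are stand-ins I made up to show the control flow. They are not Clay's API or any vendor's.

```python
from typing import Callable, Optional

# A provider is any function that takes a record and returns an email or None.
# These are illustrative stand-ins, not real Clay or vendor APIs.
Provider = Callable[[dict], Optional[str]]


def run_waterfall(record: dict, providers: list[tuple[str, Provider]]):
    """Call providers in priority order and stop at the first non-empty answer.

    Returns (email, name of the provider that hit, number of providers called).
    """
    calls = 0
    for name, lookup in providers:
        calls += 1
        email = lookup(record)
        if email:  # a hit: later providers never run, so they cost nothing
            return email, name, calls
    return None, None, calls  # every provider missed


# Hypothetical providers with hard-coded coverage.
def cheap_broad(record: dict) -> Optional[str]:
    return {"acme.com": "jo@acme.com"}.get(record["domain"])


def premium(record: dict) -> Optional[str]:
    known = {"acme.com": "jo@acme.com", "hardco.io": "sam@hardco.io"}
    return known.get(record["domain"])


providers = [("cheap_broad", cheap_broad), ("premium", premium)]

easy = run_waterfall({"domain": "acme.com"}, providers)     # stops at step 1
hard = run_waterfall({"domain": "hardco.io"}, providers)    # falls through to premium
miss = run_waterfall({"domain": "nowhere.dev"}, providers)  # runs out of providers
```

The third value in each result is the number of paid calls. The easy record costs one call, and only the hard record ever touches the premium provider.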
Two properties fall out of that design, and both matter for outbound:
- Coverage compounds. Provider A covers some slice of your list. Provider B covers a different, overlapping slice. The union of three or four providers covers far more of your list than any one of them alone. Misses become rare because every provider gets a shot at the records the previous ones couldn't crack.
- Cost stays controlled. You sequence cheap-and-broad first. Those providers clear most of the list for almost nothing. The expensive, accurate provider sits at the bottom and only fires on the residual set, the hard records nobody else found. You pay premium pricing on a fraction of your list instead of all of it.
If you want the broader context on why founders should be building data systems like this in-house rather than renting them, I wrote that up in GTM engineering for B2B SaaS. The waterfall is one concrete instance of that thesis.
The honest reason is coverage. No B2B data provider has the whole market. Every one of them is strong in some dimension and weak in others. One has great coverage of US tech but thin data in EMEA. Another nails senior titles but misses individual contributors. Another is strong on companies above 200 headcount and sparse below it. A fourth has fresh mobile numbers but stale emails.
A single provider against a cold list misses in a biased way. The records it can't match aren't randomly distributed: they cluster in whatever region, seniority band, or company-size bucket that provider is weak in. So a single source doesn't just lose part of your list, it quietly cuts out entire segments of your ICP.
A waterfall fixes both problems. By stacking providers with different strengths, the gaps in one get filled by the others. In practice a well-ordered three-to-five-provider waterfall pushes coverage from the 40 to 70% a single source gives you toward 80 to 90% on a clean list. Treat those as targets, not guarantees: your real numbers depend on your ICP and the providers you pick. But the direction is reliable. More diverse sources, more coverage, less segment bias. Clay's own breakdown of why data waterfalls beat single sources lands on the same logic from the vendor side.
The cost argument is the part people miss. Intuitively, calling more providers sounds more expensive. It's the opposite, because of the run-if gating. If your cheapest provider matches 55% of the list on its own, the next provider only runs on the remaining 45%. The one after that only runs on whatever's left after that. Your most expensive provider, the one you'd never want to run on 100% of records, ends up firing on maybe 20 to 30% of the list, the genuinely hard records. You get the accuracy of the premium source exactly where you need it and you don't pay for it where you don't.
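The arithmetic is easy to check. The sketch below uses made-up credit costs and made-up per-position hit rates purely to show the mechanics; none of these numbers come from a real provider.

```python
# Hypothetical numbers for illustration only. Credit costs and hit rates
# are invented; plug in your own calibration data.
steps = [
    # (role, credits per call, hit rate on the records that reach this step)
    ("cheap_broad", 1.0, 0.55),
    ("mid_tier", 2.0, 0.40),
    ("premium", 8.0, 0.50),
]


def blended_cost(steps):
    """Expected credits per record and final coverage under run-if gating."""
    reaching = 1.0  # share of the list that reaches the current step
    cost = 0.0
    shares = []
    for role, credits, hit_rate in steps:
        cost += reaching * credits    # only records that got this far are billed
        shares.append((role, reaching))
        reaching *= 1.0 - hit_rate    # misses fall through to the next step
    return cost, 1.0 - reaching, shares


cost, coverage, shares = blended_cost(steps)
premium_everywhere = 8.0  # the same premium provider run on 100% of records
```

With these invented inputs the premium step fires on 27% of records, coverage lands at 86.5%, and the blended cost is about 4.06 credits per record against 8.0 for running the premium provider on everything.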
Here's the full build, in order. Each step is self-contained and feeds the next. I'll go deep on the email waterfall in step four because that's where the real engineering lives, but every step matters.
1. Source or import your accounts. Get a list of target companies into a Clay table, one row per company.
2. Enrich the company. Fill in firmographics and signals: domain, headcount, industry, funding, tech stack.
3. Find the right contact. Resolve the specific person you want at each account, by title and seniority.
4. Run the email waterfall. Chain email providers in priority order with run-if gating to get a work email.
5. Verify the email. Run a dedicated verification step and treat it as the source of truth.
6. Score and qualify. Apply a fit score so you send to the best-fit records first.
7. Sync to CRM. Push clean, deduplicated, verified, scored rows to your CRM and outreach tool.
Start with companies, not people. You want a table where each row is a target account and the key column is the company domain. Domain is your join key for everything downstream, so get it clean and normalized early. Strip www, lowercase everything, drop trailing slashes.
You can source accounts a few ways. Import a list you already have as a CSV. Pull from a company-search provider inside Clay using your ICP filters: industry, headcount range, region, funding stage. Or feed in a list built from a buying signal, which is always the strongest input: companies that just raised, just hired for a relevant role, or just shipped something that maps to your product.
Whatever the source, the moment the rows land you have a dedup problem. The same company shows up as acme.com, acme.io, and Acme, Inc. across different sources. Normalize the domain first, then dedup on the normalized domain. Doing this now, before you spend a single enrichment credit, is the cheapest dedup you'll ever do. Dedup after enrichment and you've paid to enrich the same company twice.
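If you prep the list outside Clay before import, normalize-then-dedup can be sketched like this. The field names (`name`, `domain`) are assumptions about your CSV, and the keep-the-first-row policy is one choice among several.

```python
from urllib.parse import urlparse


def normalize_domain(raw: str) -> str:
    """Lowercase, then strip scheme, port, www, path, and trailing slash."""
    value = raw.strip().lower()
    if "//" not in value:
        value = "//" + value  # lets urlparse treat a bare domain as a host
    host = urlparse(value).netloc.split(":")[0]
    if host.startswith("www."):
        host = host[4:]
    return host.rstrip(".")


def dedup_companies(rows: list[dict]) -> list[dict]:
    """Keep the first row seen for each normalized domain."""
    seen = set()
    unique = []
    for row in rows:
        key = normalize_domain(row["domain"])
        if key and key not in seen:
            seen.add(key)
            unique.append({**row, "domain": key})
    return unique


rows = [
    {"name": "Acme, Inc.", "domain": "https://www.Acme.com/"},
    {"name": "Acme", "domain": "acme.com"},
    {"name": "Acme EU", "domain": "acme.io"},
]
companies = dedup_companies(rows)
```

Note what this does not catch: acme.com and acme.io stay separate rows. Merging different domains that belong to the same company needs an alias mapping you maintain yourself, which this sketch leaves out.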
Now run company enrichment off the domain. This is itself a small waterfall if you want it to be, but company-level data is cheaper and more available than contact data, so a single strong provider often suffices here. Pull the fields that drive both targeting and personalization:
- Headcount and headcount growth
- Industry and sub-industry
- Headquarters location and operating regions
- Funding stage, last round, and date
- Tech stack signals relevant to your product
- Revenue band where available
Two reasons this comes before contact-finding. First, you qualify at the company level cheaply before you spend money finding people. If a company falls out of your ICP on headcount or region, you drop it here and never pay to enrich a contact you'd never email. Second, the company data is the raw material for personalization later. The funding date, the recent hire, the tech signal, those are the hooks that make outbound land instead of bounce. Thin personalization is one of the top reasons reply rates stay low, which I broke down in why B2B SaaS cold email reply rates are low.
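The company-level gate can be as simple as a predicate over the enriched fields. This is a sketch with hypothetical thresholds and field names (`headcount`, `hq_region`); treating missing data as "not qualified" is my assumption, and you may prefer to route those rows to manual review instead.

```python
# Hypothetical ICP thresholds; replace with your own.
ICP = {"min_headcount": 50, "max_headcount": 1000, "regions": {"US", "CA", "UK"}}


def qualifies(company: dict, icp: dict = ICP) -> bool:
    """True only if the enriched company data puts it inside the ICP."""
    headcount = company.get("headcount")
    region = company.get("hq_region")
    if headcount is None or region is None:
        return False  # assumption: missing data means not qualified yet
    in_size_band = icp["min_headcount"] <= headcount <= icp["max_headcount"]
    return in_size_band and region in icp["regions"]


companies = [
    {"domain": "acme.com", "headcount": 120, "hq_region": "US"},
    {"domain": "tinyco.io", "headcount": 8, "hq_region": "US"},
    {"domain": "farco.de", "headcount": 300, "hq_region": "DE"},
    {"domain": "unknown.dev", "headcount": None, "hq_region": "US"},
]
qualified = [c for c in companies if qualifies(c)]
```

Only the rows in `qualified` move on to contact-finding, so the three rejected companies never cost a contact credit.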
With qualified companies in hand, resolve the specific person. This is a find-contacts step keyed on the company plus a title or seniority filter. You're telling Clay: at this company, find me the VP of Sales, or the Head of Growth, or whoever your buyer actually is.
Be specific about seniority and function. A vague "anyone in marketing" returns noise. A tight "Director-or-above in Revenue, Sales, or GTM" returns the person who can actually buy. If your ICP has more than one buyer persona, run this step once per persona rather than trying to catch them all in one query.
This step usually returns the person's name and LinkedIn URL but not always a reliable email. That's expected and it's exactly why the next step exists. Treat the contact-finding step as identity resolution: who is the human. Treat the email waterfall as the separate problem of how to reach them.
This is the core. You have a person, identified by name plus LinkedIn URL plus company domain. You want their work email. You're going to ask several email providers in sequence, stopping at the first one that returns a result.
Set it up as a row of provider columns in Clay. The first column calls your cheapest broad-coverage email provider. The second column has a run-if condition: only execute if the first column came back empty. The third column runs only if the second is empty. And so on. Each column inherits the same inputs (name, LinkedIn URL, domain), so every provider gets a fair shot at the same record.
Here's the ordering logic laid out as a table. This is illustrative, not a ranking of specific vendors. Your calibration determines the actual order:
| Waterfall step | Provider role | When it fires | What it returns |
|---|---|---|---|
| 1 | Cheap, broad coverage | Always, on every record | Work email for the easy majority |
| 2 | Mid-tier, different coverage | Only if step 1 returned empty | Emails for records the broad source missed |
| 3 | Specialist (region or seniority) | Only if steps 1 and 2 missed | Emails in the niche the earlier steps are weak in |
| 4 | Premium, high accuracy | Only on the residual hard records | Emails nobody else could find |
The ordering principle is cost-adjusted hit rate. You want, at each position, the provider that returns the most emails per dollar on the records that reached that position. The cheapest broad provider goes first because it clears the bulk of the list for almost nothing. Your most accurate and most expensive provider goes last because by the time a record reaches it, the easy answers are gone and you genuinely need the premium source, but only on what's left.
One caveat the cheap-first rule hides: run-if gating fires the next provider only when the previous one returns empty, not when it returns something wrong. A cheap provider that hands back a plausible but incorrect email stops the waterfall early, and that bad address rides through to the end. So order on cost-adjusted accuracy, not cost alone, and for high-value accounts verify between steps rather than only at the end. A confident wrong answer is worse than a blank, because a blank lets the next provider try.
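Verifying between steps changes the gate from "is it empty" to "is it empty or unverified". A minimal sketch of that variant follows. The providers and the `toy_verify` function are hypothetical stand-ins; a real verifier is a paid call of its own, which is why I'd reserve this for high-value accounts.

```python
from typing import Callable, Optional


def run_verified_waterfall(record: dict, providers, verify: Callable[[str], bool]):
    """Like a plain waterfall, but a hit only stops the chain if it verifies."""
    for name, lookup in providers:
        email: Optional[str] = lookup(record)
        if email and verify(email):
            return email, name
        # empty OR failed verification: fall through to the next provider
    return None, None


# Hypothetical providers and verifier, for illustration only.
def cheap_guess(record: dict) -> str:
    return f"info@{record['domain']}"  # plausible-looking but wrong


def premium(record: dict) -> str:
    return f"{record['first'].lower()}@{record['domain']}"


def toy_verify(email: str) -> bool:
    return not email.startswith("info@")  # stand-in for a real verifier call


record = {"first": "Sam", "domain": "hardco.io"}
providers = [("cheap_guess", cheap_guess), ("premium", premium)]
result = run_verified_waterfall(record, providers, toy_verify)
```

With empty-only gating, the cheap provider's `info@` guess would have stopped the chain. Here it fails verification, so the premium provider still gets its turn.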
A few operator details that make this reliable:
- Consolidate the output into one column. After the waterfall runs, you have an email scattered across four provider columns depending on which one hit. Add a single "final email" column with a coalesce: take the first non-empty result in priority order. Downstream steps read only this column, never the individual provider columns.
- Capture which provider hit. Add a "source" column that records which step found the email. This is your calibration data. After a few hundred records you can see exactly what share each provider is carrying and reorder if the economics have shifted.
- Mind the credits. Every provider call costs credits. The run-if gating is what keeps that bill sane, so check your conditions carefully. A broken run-if that fires the premium provider on every record instead of only on misses is the single most expensive mistake in this build. Test on a small batch and watch the credit burn before you run the full list.
- Recalibrate quarterly. Provider coverage drifts. The cheap one that carried 55% last quarter might carry 45% now. Run a calibration batch of a few hundred known-good records once a quarter, look at the per-step hit rates, and reorder if needed. The waterfall is a system you tune, not a thing you set and forget.
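The coalesce and the source column from the first two points are one small function. In this sketch the column names `provider_1` through `provider_4` are hypothetical; in Clay you'd express the same logic as a formula column.

```python
from collections import Counter

# Hypothetical column names, listed in waterfall priority order.
PRIORITY = ["provider_1", "provider_2", "provider_3", "provider_4"]


def coalesce_email(row: dict) -> dict:
    """Add final_email and source: the first non-empty provider column."""
    for column in PRIORITY:
        value = (row.get(column) or "").strip()
        if value:
            return {**row, "final_email": value.lower(), "source": column}
    return {**row, "final_email": None, "source": None}


rows = [
    {"provider_1": "jo@acme.com"},
    {"provider_1": "", "provider_2": None, "provider_3": "Sam@Hardco.io"},
    {"provider_1": None},
]
resolved = [coalesce_email(r) for r in rows]

# Calibration data: how many emails each step is carrying.
share_by_source = Counter(r["source"] for r in resolved if r["source"])
```

`share_by_source` is the number to watch from one quarter to the next. If a step's share drops, that's the signal to rerun the calibration batch and reconsider the order.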
A provider returning an email is not proof that email is deliverable. Providers guess, especially on the harder records, and a confident-looking guess that bounces will hurt your sender reputation just as much as a typo. So you run a dedicated verification step on the final email column and you treat the verifier, not the email provider, as the source of truth on deliverability.
Verification returns a status, usually some version of valid, invalid, catch-all, or risky. Handle them like this:
- Valid: keep, this is your sendable list.
- Invalid: drop, do not send, no exceptions. Sending to known-invalid addresses is how you tank deliverability.
- Catch-all: hold in a separate bucket. The domain accepts everything so the verifier can't confirm the specific mailbox. These aren't worthless, but they carry bounce risk, so segment them and send to them carefully or not at all on a sensitive domain.
- Risky: treat like catch-all, separate bucket, lower priority.
Your outreach tool should only ever ingest the valid bucket. The discipline of send-to-valid-only is one of the cheapest, highest-impact deliverability moves there is, and the waterfall makes it easy because verification is just one more column at the end of the chain.
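The handling rules above reduce to a three-way routing function. The status strings below are assumptions, since every verifier names them a little differently, and dropping anything unrecognized is my conservative default rather than a rule from any vendor.

```python
def bucket(status) -> str:
    """Map a verifier status to a send decision."""
    normalized = (status or "").strip().lower()
    if normalized == "valid":
        return "send"
    if normalized in {"catch-all", "catch_all", "risky"}:
        return "hold"  # separate bucket, lower priority
    return "drop"      # invalid, unknown, or missing: never send


contacts = [
    {"email": "jo@acme.com", "status": "valid"},
    {"email": "sam@hardco.io", "status": "Catch-All"},
    {"email": "old@gone.com", "status": "invalid"},
    {"email": "who@mystery.dev", "status": None},
]

buckets = {"send": [], "hold": [], "drop": []}
for contact in contacts:
    buckets[bucket(contact["status"])].append(contact["email"])
```

Only `buckets["send"]` goes to the outreach tool. The hold bucket stays in the table for a separate, more careful send decision.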
You now have verified emails attached to identified people at qualified companies. Last filter before you spend reputation: score for fit so you send to the best records first.
Build a simple score from the data you already enriched. Points for being in the core ICP industry. Points for headcount in your sweet spot. Points for a recent buying signal: funding, a relevant hire, or a tech-stack match. Points for the contact being the right seniority. Sum it, bucket it into tiers, and sequence the highest tier first.
This isn't about excluding people for the sake of it. It's about order of operations on a finite sending capacity. You can only send so many emails a day without hurting deliverability, so the records you send first should be the ones most likely to convert. A score turns "email everyone" into "email the best-fit accounts first," which both protects your domain and front-loads your pipeline.
Keep the scoring transparent and rule-based, not a black box. You want to be able to look at a record and understand why it scored what it did, because you'll tune the weights as you learn which signals actually predict replies. Data-driven means the weights come from outcomes, not vibes.
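A transparent score is a weight table plus a list of the rules that fired. The weights, rule names, and tier cutoffs below are hypothetical starting points, meant to be tuned against reply data rather than taken as recommendations.

```python
# Hypothetical weights and cutoffs; tune them against actual reply outcomes.
WEIGHTS = {
    "core_industry": 30,
    "headcount_fit": 25,
    "recent_signal": 25,
    "right_seniority": 20,
}


def fit_score(record: dict):
    """Return (score, reasons) so every score can be explained."""
    reasons = [rule for rule in WEIGHTS if record.get(rule)]
    return sum(WEIGHTS[rule] for rule in reasons), reasons


def tier(score: int) -> str:
    if score >= 75:
        return "A"
    if score >= 50:
        return "B"
    return "C"


record = {"core_industry": True, "headcount_fit": True, "right_seniority": True}
score, reasons = fit_score(record)
record_tier = tier(score)
```

Returning `reasons` alongside the number is the point: you can look at any record and see which rules put it in its tier.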
The last step pushes clean rows out of Clay into your CRM and outreach tool. One row per contact, carrying: company context, verified email, fit score, and the source provider for traceability.
The thing to get right here is dedup on write. Your CRM probably already holds some of these people from past campaigns or inbound. If you sync naively you create duplicate contacts, which corrupts reporting and can double-email someone. Match on a stable key (normalized email first, then company domain plus name as a fallback) and update existing records instead of creating new ones when there's a match.
Decide the contract between Clay and the CRM explicitly. Which system owns which field. Usually Clay owns the enrichment and verification fields and the CRM owns the relationship and activity history. Write that down so a re-run of the waterfall updates the enrichment data without clobbering the notes and stage your team maintains. You own this system, so you get to define those rules, and defining them clearly is what keeps the system trustworthy three months in.
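Here's a sketch of dedup-on-write with an explicit ownership contract. The CRM is modeled as a plain list of dicts and the field names are hypothetical; a real sync would go through your CRM's own API or Clay's integration, which I'm not reproducing here.

```python
# Hypothetical contract: fields the enrichment side is allowed to overwrite.
CLAY_OWNED = {"email", "email_status", "fit_score", "source", "headcount"}


def norm(value) -> str:
    return (value or "").strip().lower()


def find_existing(crm: list[dict], incoming: dict):
    """Match on normalized email first, then company domain plus name."""
    email = norm(incoming.get("email"))
    if email:
        for rec in crm:
            if norm(rec.get("email")) == email:
                return rec
    key = (norm(incoming.get("name")), norm(incoming.get("domain")))
    if all(key):
        for rec in crm:
            if (norm(rec.get("name")), norm(rec.get("domain"))) == key:
                return rec
    return None


def upsert(crm: list[dict], incoming: dict) -> str:
    existing = find_existing(crm, incoming)
    if existing is None:
        crm.append(dict(incoming))
        return "created"
    for field in CLAY_OWNED & incoming.keys():
        existing[field] = incoming[field]  # CRM-owned fields are left untouched
    return "updated"


crm = [{"name": "Jo Lee", "domain": "acme.com", "email": "old@acme.com",
        "stage": "Demo", "notes": "met at conference"}]
first = upsert(crm, {"name": "jo lee", "domain": "ACME.com",
                     "email": "jo@acme.com", "fit_score": 80, "stage": "New"})
second = upsert(crm, {"name": "Sam Roe", "domain": "hardco.io",
                      "email": "sam@hardco.io", "fit_score": 60})
```

The first write matches on name plus domain, refreshes the email and score, and leaves the team's `stage` and `notes` alone. The second has no match and creates a new record.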
The ordering question deserves its own pass because it's where the savings live and where people get it wrong.
The wrong way to order providers is by brand reputation or by accuracy alone. If you put your most accurate, most expensive provider first because it's "the best," it runs on 100% of your records and you pay premium pricing on the easy ones a cheap provider would have caught for a fraction of the cost. You've turned a cost-control mechanism into a cost-amplifier.
The right way is by cost-adjusted hit rate at each position. Ask: of the records that reach this slot, which provider returns the most valid emails per dollar? That answer changes by position. The broad cheap provider has a great cost-adjusted rate on the full list, so it goes first. By the time you're at slot four, the cheap providers have already failed on these records, so cheapness is irrelevant, and what matters is raw ability to find a hard email, which is where the premium provider earns its price.
This is why the calibration batch matters. You can't reason your way to the right order from vendor marketing. You run a few hundred records you already have verified emails for, send them through each provider independently, and measure each provider's true hit rate and the overlap between them. A provider that only finds emails the cheap provider already found adds nothing to the waterfall and should be cut. A provider that finds emails no one else finds earns a slot even if its overall hit rate is mediocre, because its contribution is the records it uniquely covers.
Order by marginal contribution, not by standalone hit rate. The best second provider is the one whose hits overlap least with the first provider's hits, which means it finds the most emails among the records the first provider missed. That's the engineering insight most people skip, and it's the difference between a waterfall that genuinely lifts coverage and one that's three providers all finding the same easy half of your list.
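One way to turn calibration results into an order is a greedy pass: at each slot, pick the provider that adds the most new records per unit of cost, and cut anything that adds nothing. This is a heuristic of my own for illustration, with made-up provider names and costs. It ignores accuracy, so pair it with a false-positive check before trusting the result.

```python
def order_by_marginal_contribution(found: dict, cost: dict) -> list:
    """Greedy ordering: most newly covered records per unit cost at each slot.

    found maps provider name -> set of record ids it returned an email for.
    cost maps provider name -> cost per call (any consistent unit).
    """
    covered = set()
    remaining = dict(found)
    order = []
    while remaining:
        best = max(remaining, key=lambda p: len(remaining[p] - covered) / cost[p])
        if not remaining[best] - covered:
            break  # everything left is redundant, so it gets cut
        order.append(best)
        covered |= remaining.pop(best)
    return order


# Hypothetical calibration results: record ids each provider found.
found = {
    "cheap": {1, 2, 3, 4, 5, 6},
    "clone": {1, 2, 3, 4, 5},           # only re-finds what cheap already has
    "specialist": {7, 8},               # low hit rate, but unique coverage
    "premium": {1, 2, 3, 4, 5, 6, 7, 8, 9},
}
cost = {"cheap": 1.0, "clone": 1.0, "specialist": 2.0, "premium": 8.0}

order = order_by_marginal_contribution(found, cost)
```

On this toy data the redundant `clone` provider drops out entirely, and the low-hit-rate `specialist` earns slot two ahead of `premium` because its two finds are cheap and unique.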
Dedup isn't a single step. It's a discipline you apply at three points, and each catch is cheaper than the one after it.
- At import (step 1): dedup companies on normalized domain before any enrichment. Cheapest possible catch. You haven't spent a credit yet.
- At contact resolution (step 3): dedup people. The same person can surface from two account rows if your company list had near-duplicates, or one company can return the same contact twice. Dedup on LinkedIn URL where you have it, since it's the most stable person identifier. Where you don't, fall back to email plus name, or to name plus company domain for rows that have no email yet at this stage.
- At CRM sync (step 7): dedup against records already in the CRM so you update rather than duplicate.
The general principle is dedup as early as the data allows, because every downstream step on a duplicate is wasted spend, and a duplicate that reaches your outreach tool means someone gets emailed twice, which reads as spam and burns the relationship. Normalize aggressively, pick stable join keys, and treat dedup as a property of the whole system, not a one-time cleanup.
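For the person-level catch, the stable join key can be built with a fallback chain. The field names are hypothetical, and treating the last path segment of a LinkedIn profile URL as the identifier is an assumption about how your URLs are shaped.

```python
def person_key(contact: dict) -> tuple:
    """Stable dedup key: LinkedIn slug, else email + name, else name + domain."""
    url = (contact.get("linkedin_url") or "").strip().lower()
    if url:
        slug = url.split("?")[0].rstrip("/").rsplit("/", 1)[-1]
        return ("linkedin", slug)
    name = (contact.get("name") or "").strip().lower()
    email = (contact.get("email") or "").strip().lower()
    if email:
        return ("email_name", email, name)
    domain = (contact.get("domain") or "").strip().lower()
    return ("name_domain", name, domain)


def dedup_people(contacts: list[dict]) -> list[dict]:
    """Keep the first contact seen for each key."""
    seen = set()
    unique = []
    for contact in contacts:
        key = person_key(contact)
        if key not in seen:
            seen.add(key)
            unique.append(contact)
    return unique


contacts = [
    {"name": "Jo Lee", "linkedin_url": "https://www.linkedin.com/in/jo-lee/"},
    {"name": "Jo Lee", "linkedin_url": "linkedin.com/in/Jo-Lee?utm_source=x"},
    {"name": "Sam Roe", "domain": "hardco.io"},
    {"name": "sam roe", "domain": "HardCo.io"},
]
people = dedup_people(contacts)
```

Both Jo Lee rows collapse onto the same LinkedIn slug, and the two Sam Roe rows collapse on normalized name plus domain, leaving two people.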
I'll say it plainly because it's the step people cut to save credits: verification is not optional, and cutting it is false economy. The credits you save skipping verification are trivial next to the deliverability damage from sending to unverified addresses. A list that's 85% matched but unverified is more dangerous than a list that's 60% matched and verified, because the unverified list will bounce, and bounces compound into spam folder placement, and spam folder placement makes your entire program invisible regardless of how good the copy is.
A waterfall you haven't measured is a guess with extra steps. Before you point it at a real list and start burning sender reputation, run it against a calibration batch: a few hundred records where you already know the correct, verified email. These are your ground truth. Pull them from closed deals, past replies, or any segment you've manually confirmed, so you can grade every provider against an answer key instead of against each other.
Run the batch and measure three things per provider.
First, hit rate at position: of the records that actually reached this provider, what share did it return an email for. This tells you whether each slot is earning its place or just adding latency.
Second, false-positive rate: of the emails a provider returned, what share were wrong against your known-good answer. This is the number cheap-first ordering hides. A provider with an 80% hit rate and a 15% false-positive rate is quietly poisoning your list, because every wrong answer it returns stops the waterfall early and rides through to send. Verify the calibration batch end to end and compare each provider's output to truth, not to "the verifier said valid."
Third, marginal contribution: of the emails a provider found, how many did no earlier provider find. A provider that only re-finds what the cheap broad source already had adds cost and zero coverage. Cut it. A provider that uniquely cracks a slice of your ICP earns its slot even on a mediocre standalone rate.
Stack those three numbers next to each provider's price and the ordering writes itself: cost-adjusted accuracy first, unique coverage last, anything redundant gone. Re-run this calibration once a quarter, because provider coverage drifts and the order that was right in Q1 is rarely right in Q3. Testing isn't a one-time gate. It's the loop that keeps the waterfall honest.
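The three measurements can be computed in one pass over the calibration batch. In this sketch `truth` maps a record id to its known-good email and `outputs` holds what each provider returned when run independently on the full batch; the provider names and data are invented, and every provider is assumed to have been run on the same record ids.

```python
def calibrate(truth: dict, outputs: dict, order: list) -> dict:
    """Per-provider hit rate at position, false-positive rate, and unique finds."""
    report = {}
    answered = set()  # records an earlier provider already returned an email for
    for name in order:
        returned = {rid: email for rid, email in outputs[name].items() if email}
        reached = [rid for rid in truth if rid not in answered]
        hits = [rid for rid in reached if rid in returned]
        wrong = [rid for rid, email in returned.items()
                 if email.lower() != truth[rid].lower()]
        report[name] = {
            "hit_rate_at_position": len(hits) / len(reached) if reached else 0.0,
            "false_positive_rate": len(wrong) / len(returned) if returned else 0.0,
            "unique_finds": len(hits),  # emails no earlier provider returned
        }
        answered |= set(returned)  # wrong answers also stop the waterfall
    return report


# Hypothetical calibration batch.
truth = {1: "a@x.com", 2: "b@x.com", 3: "c@x.com", 4: "d@x.com"}
outputs = {
    "cheap": {1: "a@x.com", 2: "wrong@x.com", 3: None, 4: None},
    "clone": {1: "a@x.com", 2: None, 3: None, 4: None},
    "premium": {1: "a@x.com", 2: "b@x.com", 3: "c@x.com", 4: None},
}
report = calibrate(truth, outputs, ["cheap", "clone", "premium"])
```

On this toy batch, `cheap` looks fine on hit rate but half of what it returns is wrong, `clone` contributes nothing at its position, and `premium` picks up one record nobody else found. Record 2 is the failure mode to notice: `premium` knew the right answer, but the wrong one from `cheap` would have stopped the waterfall first.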
A Clay enrichment waterfall is a sequence of data providers where each one is a fallback for the last, gated so the next provider only fires when the previous one misses. That single rule gives you two things a single source can't: coverage that climbs from 40 to 70% toward 80 to 90% because every provider gets a shot at the records the others missed, and cost control because your expensive provider only runs on the hard residual instead of the whole list. The blended cost per usable record comes in below what you'd pay running your premium provider on everything, and the list your outreach tool ingests is one you can trust: real people, verified emails, company context for personalization, ranked by fit.
None of this is exotic. It's data engineering applied to your list, and your list is almost certainly the real reason your outbound is underperforming, not your copy. You can stand up version one yourself in an afternoon. Making it reliable enough to run unattended every week, with provider ordering tuned to your ICP and verification thresholds that protect deliverability, is the harder part, and it's exactly the kind of system worth getting right once. Build it once, calibrate it quarterly, and it stops being the thing holding your pipeline back. You own it. That's the point.
- A Clay waterfall calls providers in sequence and stops at the first hit, so the next provider fires only when the previous one returns empty.
- Order by cost-adjusted hit rate: cheap-and-broad first to clear the easy majority, premium-and-accurate last to crack the residual 20 to 30%.
- Stacking three to five providers pushes coverage from the 40 to 70% a single source gives you toward 80 to 90% on a clean list.
- Gate on empty, not on wrong: a cheap provider's confident bad email stops the waterfall early, so order on accuracy and verify between steps for high-value accounts.
- Verification is not optional. Send to valid only, calibrate the order quarterly against known-good records, and dedup before you spend a credit.
If you want the waterfall built and tuned to your ICP, you can book a Blueprint Call: 30 minutes, founder-led, no pitch.