05Enrichment

Custom Claygent Research: Per-Lead Skill for Claude Code

Custom claygent research prompts that return typed JSON per lead. Recent funding, tech stack, CEO background, anything. Free Claude Code skill.

The arbitrary-research layer of the Clay Alt Platform. Takes one lead plus a natural-language research prompt (e.g. "recent funding round", "what tech stack"), runs web research via Exa, and returns strict JSON that slots into the lead's enrichment field. Output shape is defined per call.

Arbitrary per-lead research, defined by your prompt
Output shape is yours to decide per call
Stashes JSON in the lead's enrichment field

Install

$ mkdir -p ~/.claude/skills/clay-alt-claygent && curl -sSL https://trueadvertize.com/skill-files/clay-alt-claygent/SKILL.md -o ~/.claude/skills/clay-alt-claygent/SKILL.md

Drops SKILL.md into ~/.claude/skills/clay-alt-claygent/. Reload Claude Code and the skill auto-activates on its triggers.

Source

SKILL.md

---
name: clay-alt-claygent
description: Clay Alt Platform's answer to Clay's Claygent. A prompt-configurable research agent, takes one lead row plus a natural-language research prompt, performs web research via Exa, and returns a strict JSON result that gets stashed in leads.enrichment JSONB under a caller-chosen key. Unlike the fixed-shape waterfall enrichers, the output shape is defined by the caller per prompt. Use when a campaign needs arbitrary per-lead research columns (e.g. "recent funding", "tech stack", "CEO background").
allowed-tools: mcp__exa__web_search_exa mcp__exa__crawling_exa Bash
---

# clay-alt-claygent

Prompt-configurable research agent. The arbitrary-research layer that differentiates Clay from a dumb enrichment waterfall. Each call answers ONE research question for ONE lead and returns a typed JSON result that slots into the lead's `enrichment` JSONB.

## When to use

- Campaign config lists `claygent_prompts`: run once per lead per prompt, after waterfall enrichment succeeds
- User asks "find X about each lead" / "research Y for every company" / "add a column that says whether Z"
- A reply came in and you need to research the sender before drafting the follow-up

## When NOT to use

- The data is a fixed-shape field that a dedicated enricher already returns (company, domain, LinkedIn URL, title): use `clay-alt-enrich-via-exa` instead, cheaper and more deterministic
- Lead has `status='scraped'` (no company context yet): run waterfall first so research is grounded
- The research prompt requires real-time data (stock prices, today's weather): Exa indexes are minutes-to-hours old
- More than 5 prompts per lead: that's a signal you should build a dedicated skill with a fixed schema

## Input contract

Caller passes one JSON object on stdin:

```json
{
  "lead": {
    "company": "Stripe",
    "domain": "stripe.com",
    "first_name": "Patrick",
    "title": "CEO",
    "enrichment": { ... }
  },
  "prompt": {
    "key": "recent_funding",
    "question": "Find this company's most recent funding round. Return the amount in USD (number), date (YYYY-MM-DD), lead investor, and source URL. Null any field you cannot verify.",
    "output_schema": {
      "amount_usd": "number|null",
      "date": "string|null",
      "lead_investor": "string|null",
      "source_url": "string|null"
    },
    "max_searches": 3
  }
}
```

- `key`: where the result will be stored (`enrichment.claygent.<key>`)
- `question`: the actual research prompt. Write it like a precise Slack message to a smart intern.
- `output_schema`: informal JSON shape hint. Keep it flat; nested objects make failures harder to debug.
- `max_searches`: cap on Exa calls. Default 3 if omitted. Keeps cost predictable.

## Output contract

**Print exactly one JSON object to stdout.** Schema:

```json
{
  "key": "recent_funding",
  "value": {
    "amount_usd": 6500000000,
    "date": "2023-03-15",
    "lead_investor": "GIC",
    "source_url": "https://..."
  },
  "confidence": 0.82,
  "sources": ["https://...", "https://..."],
  "searches_used": 2,
  "notes": "Amount and lead investor cross-referenced across two sources. Date is approximate (announcement vs close)."
}
```

- `value`: matches `output_schema`. Nullable fields use `null`, not empty string.
- `confidence`: 0.0 to 1.0, self-reported. Calibrate: 0.9+ = multiple independent sources agreed, 0.6–0.8 = single source but authoritative, 0.3–0.5 = inferred from weak signals, <0.3 = unreliable, probably set `value` to nulls.
- `sources`: URLs actually consulted to answer the question. Caller may audit these.
- `searches_used`: actual Exa calls spent. Never exceeds `max_searches`.
- `notes`: 1–2 sentences on caveats, ambiguities, or why fields are null.

If the question cannot be answered at all, emit:
```json
{"key": "<key>", "value": null, "confidence": 0, "sources": [], "searches_used": <n>, "notes": "<why>"}
```

No markdown fences, no preamble. The routine pipes this directly into `jq` and writes to Supabase.

## How

### Step 1: Ground the question

Read the lead's `company`, `domain`, and any relevant enrichment fields. The research query should reference the specific company: generic searches return garbage.

### Step 2: Plan the searches

Before calling Exa, decide which 1–3 searches will most efficiently answer the question. Examples of good search phrasings:
- `"Stripe funding round 2024"` (for funding)
- `"stripe.com tech stack engineering blog"` (for stack)
- `"Patrick Collison biography education"` (for CEO background)

Bad search phrasings: the raw question itself, overly generic terms, queries without the company name.

### Step 3: Execute searches in parallel

Call `mcp__exa__web_search_exa` in one parallel tool call for all planned queries (up to `max_searches`). If a crawl is needed for deep content, use `mcp__exa__crawling_exa` on one specific URL: count it toward `searches_used`.

### Step 4: Synthesize

Extract only the fields named in `output_schema`. If a field cannot be found, set it to `null`: **never** fabricate. If two sources disagree, pick the more authoritative one (official announcement > blog post > secondary coverage) and note the discrepancy in `notes`.

### Step 5: Calibrate confidence

Be honest. If you only saw one source, confidence is at most 0.7. If you inferred a field from context rather than seeing it stated, confidence drops to 0.4 and you should flag it in `notes`.

### Step 6: Emit

One JSON object to stdout. Exit 0 even on "couldn't find anything": the routine decides how to handle nulls.

## Cost awareness

Default `max_searches=3` caps the Exa and Claude usage of a single claygent call. Cost follows both providers' current usage-based pricing; check current rates before running at scale. Cost scales linearly with leads × prompts: watch out with 100+ leads and 5+ prompts.

Cache hit: if `lead.enrichment.claygent.<key>` is already populated and fresher than 30 days, the routine should skip this call entirely (enforced at the routine level, not here).

## Anti-patterns

- **Don't search for the raw question.** Rephrase to company-specific keywords.
- **Don't hallucinate dates or numbers.** If Exa doesn't surface it, null it.
- **Don't exceed `max_searches`.** Even if the answer is tantalizingly close, the caller set a budget.
- **Don't output anything but the JSON object.** No explanations, no trailing text, no markdown fences. The routine is piping stdout directly into `jq`.
- **Don't cache results inside the skill.** The routine handles cache-check before invoking you: you're always called fresh.

## Comparison to Clay's Claygent

| Aspect | Clay's Claygent | This skill |
|---|---|---|
| Orchestration | Clay's UI, one row at a time | Bash loop in a routine, any row count |
| Model | GPT-4 or Claude | Claude (whatever Claude Code is using) |
| Web search | Proprietary + Google-style | Exa (neural search) |
| Cost per call | Clay credits (opaque bundle pricing) | Direct provider usage rates (transparent, itemized) |
| Prompt configuration | Clay's spreadsheet columns | `campaigns/<name>/config.yml` → `claygent_prompts` |
| Output schema | Free-text by default, typed optional | Typed by caller-provided schema, enforced in `notes` |
| Caching | Per-row, Clay-managed | Per-`(lead_id, key)`, routine-managed (`enrichment` JSONB) |

Trigger phrases

Claude Code auto-activates this skill when prompts contain phrases like:

research [topic] for every leadadd a column that says [X]find out [Y] about each companycustom research per lead