KnownByLLM

How-to · 6 min read

Has AI search picked up your site?

A month-one checklist.

You published llms.txt, ran the validator, made sure your robots.txt isn’t blocking the crawlers. Now what? The honest answer is that the first month is mostly about verifying the plumbing works — not celebrating citation share.

This is the checklist we use ourselves. It works without paid tools, doesn’t require coding, and surfaces the issues that actually break AI search for small sites.

The four-check pattern

Run all four checks once a week for the first month. After that, monthly is enough. Each check answers one specific question.

  1. Is the file reachable?

    Confirms /llms.txt loads correctly and is being served with the right headers. The most common cause of zero AI traffic is a broken file, not bad content.

  2. Are AI bots fetching it?

    Confirms ChatGPT's GPTBot, Anthropic's ClaudeBot, and Perplexity's PerplexityBot are actually visiting. If they aren't, fix that before anything else.

  3. Are you cited?

    A direct manual check: ask the AI assistants the questions your customers ask, and look at the source list.

  4. Is anyone arriving from AI?

    Referral traffic from chat.openai.com, claude.ai, perplexity.ai, and similar — the proof that citation translated into a visit.

Check 1. Is the file reachable?

What to do: Open https://yoursite.com/llms.txt in a private/incognito browser window. Open developer tools (F12), go to the Network tab, and reload.

What to verify:

  • Status code 200 — not 301, not 404, not a redirect chain.
  • Content-Type header is text/plain or text/markdown. If you see application/octet-stream, some AI crawlers will skip the file.
  • The Markdown renders as plain text in the window — not as HTML, not as a download.
  • The validator on this site reports zero spec violations. Run it before declaring the file fixed.

Common breakages: a 301 redirect from http:// to https:// that AI crawlers sometimes don’t follow; a misconfigured CDN that strips the Content-Type; an SPA that intercepts the request and returns the HTML shell instead of the file.
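
The browser check above can be scripted. A minimal POSIX-shell sketch — check_llms_headers is our helper name, not a standard tool, and the yoursite.com domain in the usage line is a placeholder:

```shell
# check_llms_headers: read raw HTTP response headers on stdin and flag the
# two failure modes above — a non-200 status and a wrong Content-Type.
check_llms_headers() {
  hdrs=$(cat | tr -d '\r')                      # strip CRLF line endings
  status=$(printf '%s\n' "$hdrs" | awk 'NR==1 {print $2}')
  ctype=$(printf '%s\n' "$hdrs" \
    | awk -F': *' 'tolower($1)=="content-type" {print tolower($2)}')
  case "$status" in
    200) echo "status: OK (200)" ;;
    *)   echo "status: PROBLEM ($status)" ;;
  esac
  case "$ctype" in
    text/plain*|text/markdown*) echo "content-type: OK ($ctype)" ;;
    *)                          echo "content-type: PROBLEM ($ctype)" ;;
  esac
}

# Usage (replace the domain with your own):
#   curl -sI https://yoursite.com/llms.txt | check_llms_headers
```

Note that curl -sI sends a HEAD request; if the output looks wrong, repeat with -sIL to see whether a redirect chain is involved.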

Check 2. Are AI bots fetching it?

What to do: Look at your server access logs (or CDN logs) and grep for these User-Agent strings.

GPTBot/         (OpenAI / ChatGPT browsing)
ChatGPT-User/   (logged-in ChatGPT users browsing through the assistant)
ClaudeBot/      (Anthropic / Claude)
PerplexityBot/  (Perplexity)
Googlebot       (Google — Gemini and AI Overviews draw on the ordinary crawl)
Applebot        (Apple Intelligence)

One caveat: Google-Extended and Applebot-Extended are robots.txt tokens, not crawlers — they never appear as User-Agent strings in your logs. They control whether Google's and Apple's AI products may use your content; the actual fetching is done by Googlebot and Applebot.
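
To turn the grep into a quick weekly report, something like the sketch below works against a standard combined-format access log — bot_hits is our helper name, and the log path in the usage comment is an assumption to adjust for your server or CDN export:

```shell
# bot_hits: count access-log lines per AI-crawler User-Agent substring.
# Usage: bot_hits /var/log/nginx/access.log
bot_hits() {
  for bot in GPTBot ChatGPT-User ClaudeBot PerplexityBot; do
    # grep -c prints 0 (and exits non-zero) when there are no matches,
    # so capture the count rather than relying on the exit status.
    count=$(grep -c "$bot" "$1") || true
    printf '%-16s %s\n' "$bot" "$count"
  done
}
```

Run it weekly and keep the output; a bot that fetched in week one and then vanished is worth investigating as much as one that never arrived.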

What to verify:

  • At least one fetch from each major bot within 7–14 days of publishing. GPTBot tends to be first; ClaudeBot and PerplexityBot follow.
  • The bots are also fetching the URLs inside your llms.txt, not just the file itself. This is the signal that AI is consuming your curated list.
  • 2xx responses. 4xx or 5xx to AI bots quietly destroy your citation funnel.

If no bots are fetching after two weeks:

  • Check robots.txt for accidental Disallow: / rules under User-agent: GPTBot or User-agent: ClaudeBot.
  • Check Cloudflare or your firewall for “Block AI Scrapers” rules — many WAFs ship one enabled by default.
  • Check whether you’re behind a paywall, geofence, or login wall that AI bots can’t bypass.
  • Check the served Content-Type one more time. AI bots are stricter than browsers about MIME types.
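
If the culprit turns out to be robots.txt, the fix is a stanza that explicitly allows the bots — a minimal example:

```
# robots.txt — explicitly allow the major AI crawlers
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /
```

A named User-agent group takes precedence over the User-agent: * group, so this works even if * is restricted. But if the Disallow sits inside the bot's own group, remove it there — adding an Allow elsewhere won't help.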

Check 3. Are you cited?

What to do: Pick 5–10 of your real customer queries. In a fresh logged-out browser, run each one against:

  • ChatGPT (with browsing on)
  • Claude (with web search on)
  • Perplexity
  • Google AI Overviews (just search Google as normal)
  • Optional: Copilot, Brave AI, You.com

For each query × assistant, record:

  • Whether your site appears in the citation list (yes/no).
  • Position in the citation list (1st, 2nd, etc.).
  • Whose words got picked up — yours, a competitor’s, or a paraphrase.
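
A spreadsheet or CSV is enough to track this over the month. One possible layout — the columns and sample rows are illustrative, not a standard format, and the queries are the placeholder examples from this checklist:

```
date,query,assistant,cited,position,wording
2026-02-01,"accountants in Austin",Perplexity,yes,2,paraphrase
2026-02-01,"accountants in Austin",ChatGPT,no,,
2026-02-01,"<your business name> pricing",Claude,yes,1,quote
```

The value is in the deltas: re-running the same sheet in week four tells you far more than any single snapshot.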

What “normal” looks like in month one:

  • For brand-name queries (“<your business name>”, “<your business name> pricing”), you should appear immediately. If you don’t, your llms.txt or your page titles are probably the problem, not citation share.
  • For mid-tail queries (“accountants in Austin”, “CRM for landscaping companies”), expect 0–3 citations out of 4–5 assistants in month one. Citation builds with content depth and time.
  • For broad head queries (“best CRM”, “tax preparation software”), don’t expect citation in month one. Established brands dominate these slots; mid-tail is your battleground.

Tip: Use a private/incognito window and a different IP if possible. Logged-in sessions personalise answers and skew the audit.

Check 4. Is anyone arriving from AI?

What to do: In your analytics tool (GA4, Plausible, etc.), look at sessions by referrer. Filter for these domains:

chat.openai.com    chatgpt.com
claude.ai
perplexity.ai      www.perplexity.ai
copilot.microsoft.com
gemini.google.com
search.brave.com
you.com
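
If your analytics tool makes this filter awkward, raw access logs can answer the same question. The sketch below tallies hits per AI referrer domain — ai_referrals is our helper name, and it assumes the standard combined log format, where the Referer header is logged as a full URL:

```shell
# ai_referrals: tally access-log hits whose Referer is an AI assistant domain.
# Usage: ai_referrals /var/log/nginx/access.log
ai_referrals() {
  grep -oE 'https?://(chat\.openai\.com|chatgpt\.com|claude\.ai|(www\.)?perplexity\.ai|copilot\.microsoft\.com|gemini\.google\.com|search\.brave\.com|you\.com)' "$1" \
    | sed -E 's#https?://##' \
    | sort | uniq -c | sort -rn
}
```

One caveat: assistants increasingly strip or generalise referrers, so treat the log count as a floor, not an exact figure.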

What to verify:

  • Trend, not absolute number. Five sessions in week one and 25 in week four is a strong signal. 25 sessions in week one and 25 in week four is flat.
  • Landing pages. AI traffic almost never lands on your home page — it lands deep, on the specific page that answered the query. The pages that win are the ones you curated in llms.txt. If they’re not getting the traffic, your descriptions are likely too generic.
  • Behaviour. AI traffic is short-session, high-intent. Around 1.2 pages per session and roughly 60 seconds of time on page are normal — these visitors already had context from the AI’s answer.

Diagnosing common patterns

Bots are fetching, but no citations

Most often a content problem, not a plumbing problem. The pages you’ve listed in llms.txt are technically reachable but don’t answer the questions your audience is asking. Compare the citation snippets on the assistants that did cite a competitor — what do those pages have that yours don’t? Specific numbers? Named comparisons? A clear location and audience? Add what’s missing.

Citations, but no traffic

Two possibilities. (1) The AI is answering the question inline and users don’t need to click — in which case citation is still valuable for brand awareness, even without the click. (2) The citation is buried in a long source list and users only click the top result. In (2), the path forward is to be cited at position 1 or 2, which usually comes from being the most specific page on the internet for that query.

Traffic, but no conversions

Almost always a landing-page mismatch. The AI is sending visitors to a page that answers the question they asked the assistant, but doesn’t lead them to a next step. Add a clear call to action on the pages getting AI traffic — booking link, signup form, contact details — and re-measure.

Nothing is moving at all

Re-run check 1. The most common cause of total silence is that /llms.txt isn’t actually being served, or is being served with the wrong Content-Type. We’ve seen it many times: the file exists in the repo, the Cloudflare cache reports 200, yet the bots receive a 404 because of a path conflict with a Next.js dynamic route. Always verify from the AI bot’s perspective — fetch the file with curl -A "GPTBot" and look at the actual response.

FAQ

How do I tell if ChatGPT has actually fetched my site?

Look at your server access logs for the User-Agent string 'GPTBot' (the OpenAI crawler) and 'ChatGPT-User' (used when a logged-in user explicitly browses to your site through ChatGPT). Most sites see at least one GPTBot fetch within a week of publishing llms.txt. If you don't, check that GPTBot isn't blocked in robots.txt and that /llms.txt returns HTTP 200 with Content-Type text/plain or text/markdown.

Why is my site cited on Perplexity but not on ChatGPT (or vice versa)?

Each assistant uses a different retrieval pipeline. Perplexity behaves like a real-time AI search engine and tends to cite recently published, well-structured content. ChatGPT's browsing path is more conservative and favours sites with established authority signals. Differences across assistants are normal — track each separately and don't expect parity in the first six months.

I'm cited but the AI is paraphrasing, not quoting. Is that OK?

Yes — that's the most common form of citation. The AI ingested your page, summarised it in its own words, and added you to the source list. You still get the link-out and the citation credit. Direct quotes happen but are rarer; they tend to occur when your wording is unusually specific (statistics, definitions, named approaches).

Will Google Search Console show whether AI Overviews cited my page?

Not directly as of 2026. AI Overview impressions and clicks are bundled into ordinary Search Console data for the underlying query. Cited pages typically show a higher impressions count without a proportional clicks increase (the answer is rendered inline). Compare your trend lines for queries you know AI Overviews answers, and you can usually back out the impact.

How often should I run a citation check?

Monthly is the right cadence for almost every site. AI answers shift more slowly than Google rankings but are noisier to measure, so weekly checks mostly capture noise rather than signal. If you make a substantial content change — new pricing page, new product launch, restructured docs — run an extra check 2–3 weeks after publishing.

What if my site never appears in AI answers?

First, verify the basics: llms.txt is reachable, AI bots aren't blocked in robots.txt, and the relevant pages are crawlable. Then question the queries you're testing — if you're searching for very generic terms ("best CRM"), citation against entrenched competitors is tough. Move down the funnel to specific, mid-tail queries ("CRM for landscaping companies in Texas") where the citation slot is more winnable. Citation share is built one mid-tail query at a time.