
What is llms.txt?

The complete guide for site owners (2026).

llms.txt is a plain-text file you place on your website so that AI tools like ChatGPT, Claude, and Perplexity can read and understand your business quickly. The spec was published in 2024, and major SaaS companies — Stripe, Anthropic, Cloudflare, Vercel — have already adopted it.

This guide is written for site owners who manage their own website and want to handle this themselves, without hiring a developer. We’ll walk through every step, and we’ll also answer the honest questions: “Does this actually do anything?” and “Will AI just scrape my content without permission?”

1. Why a new file, and why now

For the past 20 years, getting found online meant one thing: ranking on Google. You optimized your pages, built links, and tried to land in those 10 blue links on the first page. That model is being disrupted fast.

When someone asks ChatGPT or Perplexity a question, they don’t get a list of links — they get an answer. That answer typically cites 2–7 sources, and the sites that get cited see real traffic from it. This is what people mean by “AI search.”

For a site owner, two things have changed.

  • Some of your potential customers are now arriving via AI answers, not search results pages.
  • Whether AI cites you or ignores you now directly affects how many people find your business.

The problem is that AI tools read websites differently than Google does. An AI has to load your entire HTML, then wade through navigation menus, JavaScript, cookie banners, ads, and footers just to extract the useful content. That’s expensive for the AI, and the result is that sites with cluttered structure get cited less often.

llms.txt solves that from the website’s side. It was proposed by Jeremy Howard (co-founder of Answer.AI, creator of fast.ai) and published in September 2024. Adoption among high-traffic sites has been growing steadily since.

2. The jargon decoded — GEO, AEO, and llms.txt

If you’ve been reading marketing blogs lately, you’ve probably seen a cluster of new acronyms. Here’s what they actually mean.

GEO (Generative Engine Optimization)
The most widely used term in English-speaking SEO circles. It refers to optimizing your web content so it gets picked up and cited by generative AI tools — ChatGPT, Perplexity, Google AI Overviews, and similar. Think of it as SEO, but for AI answers instead of search rankings.
AEO (Answer Engine Optimization)
A synonym for GEO used by some agencies and publications. The emphasis is on being the source an AI chooses when it constructs a direct answer to a user's question. The two terms are interchangeable in practice.
llms.txt
The concrete file format that sits at the foundation of any GEO or AEO strategy. It's a Markdown file you place at the root of your website that tells AI tools: here's who we are, here's what we do, and here are the pages worth reading. Without it, AI has to figure all of that out by crawling hundreds of pages.

Put simply: GEO and AEO are the strategy, and llms.txt is one of the most practical first steps in executing that strategy. For the rest of this guide we’ll just say “AI optimization” to keep things clear.

3. The specific problem llms.txt solves

Let’s make this concrete. Say your business is a small accounting firm based in Austin, Texas.

A potential client types into ChatGPT: “Who are the best accountants in Austin?” ChatGPT tries to read several local accounting websites to build its answer. For each site, it has to process:

  • The full HTML of multiple pages (often hundreds of KB)
  • JavaScript-rendered content (which sometimes fails to load entirely)
  • Navigation, footers, cookie banners, and ad slots
  • Any gated pages that require a login (which it can’t access)

That overhead is real, and the outcome is that clean, readable sites get cited more often. A site with llms.txt in place gives the AI:

  • A complete picture of the site in a single request
  • Roughly one-tenth the tokens compared to crawling the full HTML (based on reported figures)
  • A higher chance of being cited as a result

The simplest way to think about llms.txt is that it’s like handing a well-designed business card to someone you just met. You could give them a stack of brochures instead, but the card is what they’ll actually remember.

4. Anatomy of the file — a real example

llms.txt has just four building blocks. Here’s what a complete file looks like for our example accounting firm.

# Lone Star Accounting

> Independent accounting and tax practice serving clients across central Texas since 2008.

## Services
- [Bookkeeping](https://example.com/services/bookkeeping): Monthly and quarterly bookkeeping, starting at $200/month.
- [Tax prep](https://example.com/services/tax): Federal and state tax preparation for LLCs, S-corps, and sole proprietors.
- [Payroll](https://example.com/services/payroll): Full-service payroll processing and compliance for businesses with 1–50 employees.
- [Advisory consulting](https://example.com/services/advisory): Ongoing CFO-style advisory on a monthly retainer.

## Company info
- [About us](https://example.com/about): Firm history, team bios, and credentials.
- [Pricing](https://example.com/pricing): Fee schedules for all services.
- [Contact](https://example.com/contact): Book a free 30-minute consultation.

## Optional
- [Blog](https://example.com/blog): Tax tips and finance guidance, updated monthly.
- [Client stories](https://example.com/cases): Real outcomes from clients across Austin and the Hill Country.

Each of the four building blocks has a specific job.

  1. # H1 title

     Your business name on line one. This is the single place where the AI learns who owns the site. It must be the very first line of the file.

  2. > Summary (blockquote)

     A one- or two-sentence description of your whole business. When an AI cites you, this often becomes the brief label it attaches to your name. Pack in concrete details: industry, location, who you serve, and how long you've been doing it. "Independent accounting and tax practice serving clients across central Texas since 2008" is far more useful than "Trusted professionals dedicated to your success."

  3. ## Sections (one or more)

     Group your most important pages by topic. "Services," "Company info," and "Optional" are natural groupings for most sites. "Optional" is a special heading defined in the spec — it signals to AI that these pages are lower priority but fair to consult if relevant.

  4. - [Link text](URL): description

     The entries within each section. The description after the colon is the most important part. The AI reads it to decide which page answers a given user question. Be specific: include prices, service details, or target audience — whatever makes this page different from the others.

The spec also defines rules around line breaks, blank lines, encoding, and file size — but you don’t need to memorize any of that. The validator tool (covered in Step 5) will flag anything that’s off.
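If you're curious how a machine actually reads those four building blocks, here's a minimal Python sketch that splits an llms.txt file into its title, summary, and link sections. It uses the example firm from above; this is an illustration of the structure, not an official parser.

```python
import re

def parse_llms_txt(text):
    """Split an llms.txt file into title, summary, and link sections."""
    result = {"title": None, "summary": None, "sections": {}}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# ") and result["title"] is None:
            result["title"] = line[2:]          # the H1: who owns the site
        elif line.startswith("> "):
            result["summary"] = line[2:]        # the blockquote summary
        elif line.startswith("## "):
            current = line[3:]                  # a new section heading
            result["sections"][current] = []
        elif line.startswith("- ") and current:
            # - [Link text](URL): description
            m = re.match(r"- \[(.+?)\]\((.+?)\)(?::\s*(.*))?", line)
            if m:
                result["sections"][current].append(
                    {"text": m.group(1), "url": m.group(2), "desc": m.group(3) or ""}
                )
    return result

sample = """# Lone Star Accounting

> Independent accounting and tax practice serving clients across central Texas since 2008.

## Services
- [Bookkeeping](https://example.com/services/bookkeeping): Monthly and quarterly bookkeeping, starting at $200/month.
"""
parsed = parse_llms_txt(sample)
```

Notice how little work the AI has to do here compared to crawling HTML: one request, four simple rules, and it knows who you are and which page covers what.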

5. Common misconceptions answered

Here are the questions we hear most often from site owners thinking about adding llms.txt.

Q1. Will one file really make a difference?

Honestly, dropping the file alone won’t flood your inbox with leads overnight. llms.txt is to AI what sitemap.xml is to Google — it’s the foundation, not the whole building. Without it, even great content may not reach AI tools efficiently.

The case for doing it is the effort-to-reward ratio: setup takes 30–60 minutes and costs nothing, while AI-driven traffic is only going to grow. The cost of waiting is higher than the cost of acting.

The real driver of citations is content quality. llms.txt builds the road; you still need something worth arriving at.

Q2. Will AI use this to train on my content without permission?

This is worth separating carefully. llms.txt is for retrieval — helping AI cite your site in real-time answers — not for training.

  • Training: The process of feeding text into an AI model when it’s being built. If you want to block that, use robots.txt to Disallow GPTBot, ClaudeBot, and similar crawlers.
  • Retrieval: When a user asks a question, the AI reads your site in real time to construct its answer. That’s the use case llms.txt is optimizing for.

Adding llms.txt does not grant training permission. If you want to say “citation welcome, training not welcome,” the right place to say it is robots.txt. Here’s an example:

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: PerplexityBot
Allow: /

One caveat: some newer AI systems don’t separate their training crawler from their retrieval crawler. Perfect “retrieval only” control isn’t guaranteed by any current standard. It’s worth knowing that going in.
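If you want to double-check that a robots.txt policy like the one above actually says what you intend, Python's built-in robotparser can evaluate it for you. This sketch uses the same three user-agent names from this section:

```python
import urllib.robotparser

# The same policy shown above: block GPTBot and ClaudeBot, allow PerplexityBot
policy = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: PerplexityBot
Allow: /
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(policy.splitlines())

print(rp.can_fetch("GPTBot", "https://example.com/about"))         # blocked
print(rp.can_fetch("PerplexityBot", "https://example.com/about"))  # allowed
```

Whether a given crawler honors the policy is up to that crawler, but at least you'll know the file itself is unambiguous.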

Q3. How long before I see results?

AI citation doesn’t follow the Google crawl cycle of “indexed in a few days.” Many AI tools read your site on demand — the moment someone asks a relevant question. The day after you publish, your file could already be influencing answers.

That said, there’s no industry-standard dashboard for tracking AI citations yet. In the meantime, watch these signals:

  • Check your server access logs for visits from GPTBot, ClaudeBot, and PerplexityBot. Log the monthly counts.
  • Once a month, search ChatGPT, Claude, and Perplexity for queries your customers would ask (“accountants in Austin”). See if you appear as a cited source.
  • Watch Google Search Console referrals for traffic from AI-related domains.
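If you're comfortable with a little scripting, the first check (counting AI-crawler visits in your access logs) can be as simple as matching user-agent strings. The log lines below are made up for illustration:

```python
from collections import Counter

AI_BOTS = ("GPTBot", "ClaudeBot", "PerplexityBot")

def count_ai_bot_hits(log_lines):
    """Tally access-log lines whose user-agent mentions a known AI crawler."""
    counts = Counter()
    for line in log_lines:
        for bot in AI_BOTS:
            if bot in line:
                counts[bot] += 1
    return counts

# Hypothetical access-log excerpt, not real traffic
sample_log = [
    '66.249.1.1 - - [02/May/2026] "GET /llms.txt" 200 "-" "GPTBot/1.0"',
    '66.249.1.2 - - [02/May/2026] "GET /about" 200 "-" "Mozilla/5.0"',
    '66.249.1.3 - - [03/May/2026] "GET /llms.txt" 200 "-" "PerplexityBot/1.0"',
]
hits = count_ai_bot_hits(sample_log)
```

Run something like this against a month of logs and you have your baseline; any upward trend after publishing llms.txt is a signal worth noting.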

Q4. Does this affect my Google rankings?

Not directly. Google’s ranking algorithm does not use llms.txt (as of May 2026).

Indirectly, though, there’s a positive side effect. Writing a good llms.txt forces you to clarify your site structure, sharpen your page titles, and tighten your summaries. All of that is good for regular SEO too — for the same reason that a clean sitemap.xml helps Google find your pages.

Q5. How is this different from robots.txt or sitemap.xml?

The three files do three different things.

  • robots.txt: Tells crawlers where not to go. Access control.
  • sitemap.xml: Tells search engines every URL that exists on your site. Exhaustive.
  • llms.txt: Tells AI what your site is about and which pages matter most. A curated table of contents.

They don’t compete — they complement each other. Ideally you have all three. For a deeper comparison, see our separate article “llms.txt vs robots.txt vs sitemap.xml”.

Q6. My site is small. Is this even worth my time?

Small sites often benefit more, not less. Here’s why:

  • Large corporate sites are structurally complex. A small site with a clean llms.txt is actually easier for AI to parse than a Fortune 500 site with no file at all.
  • In a local or niche query (“accountants in Austin”), there are fewer competitors, so the bar for being cited is lower.
  • Early movers have an advantage. As of 2026, an SE Ranking study of roughly 300,000 domains found that about 10% of websites have adopted llms.txt, while adoption among top-traffic sites remains under 1%.

Q7. Do I need to hire someone, or can I do this myself?

You can absolutely do this yourself. It’s a single text file placed in a specific folder on your server — easier than editing a page of HTML.

The next two sections walk through the exact steps to write the file and then publish it on WordPress, Shopify, Wix, Squarespace, Webflow, and static HTML setups. If you already update your site yourself, you have everything you need.

6. Build and publish your own — step by step

This is the practical section. Budget 30–60 minutes.

Step 1. Pick your 5–15 most important pages

Start by writing a list. If an AI were going to introduce your business to a potential customer, which pages would you most want it to reference?

  • Your services or products list, plus individual service pages
  • Your About page (who you are, where you’re located, credentials)
  • A pricing or fees page
  • Your contact or booking page
  • Key case studies or testimonials (if you have them)
  • Your two or three most-read blog posts (if applicable)

If the list tops 15 pages, cut it down. llms.txt is not meant to list every URL on your site — that’s what sitemap.xml is for. This file is a curated shortlist of the pages you most want AI to understand.

Step 2. Write a one-sentence description for each page

This is the most important part of the whole exercise. The AI reads these descriptions to decide which of your pages is relevant to a given user question. Good descriptions:

  • Use specific language — not “world-class service” but “monthly bookkeeping for Austin small businesses, starting at $200/month”
  • Fit in one sentence (aim for under 60 words)
  • Mention something unique to that page — differentiate it from your other pages
  • Lead with facts, not taglines
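If you want a quick sanity check on your drafts, the guidelines above are easy to turn into a tiny script. The 60-word limit and the sample "vague word" list here come from this guide's advice, not from the spec:

```python
def check_description(desc, max_words=60):
    """Return a list of warnings for a page description."""
    warnings = []
    words = desc.split()
    if len(words) > max_words:
        warnings.append(f"too long: {len(words)} words (aim for under {max_words})")
    # A small, illustrative set of marketing filler words to flag
    vague = {"world-class", "trusted", "dedicated", "best-in-class"}
    if any(w.lower().strip(".,") in vague for w in words):
        warnings.append("contains vague marketing language; lead with facts")
    return warnings

print(check_description(
    "Monthly bookkeeping for Austin small businesses, starting at $200/month."
))  # []
```

An empty list means the description passes both checks; anything flagged is worth a rewrite before it goes in the file.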

Step 3. Write a one-line summary of your whole business

This goes in the > blockquote at the top. It’s the single sentence the AI is most likely to use when it mentions your business by name. Aim to include your industry, location, who you serve, and tenure.

Weak example: “Committed to helping our clients succeed.”

Strong example: “Independent accounting and tax practice serving small businesses across central Texas since 2008.”

Step 4. Save it as a plain-text file

Any plain-text editor works — Notepad, VS Code, TextEdit (in plain text mode), or even Google Docs (export as .txt). Use the structure from the example above and save the file as llms.txt. Make sure the encoding is set to UTF-8 — on Windows Notepad, you can choose this in the Save dialog.
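If you'd rather script this step, writing the file with an explicit encoding sidesteps the editor pitfalls entirely. This sketch uses a trimmed version of the example file from Section 4:

```python
content = """# Lone Star Accounting

> Independent accounting and tax practice serving clients across central Texas since 2008.

## Services
- [Bookkeeping](https://example.com/services/bookkeeping): Monthly and quarterly bookkeeping, starting at $200/month.
"""

# encoding="utf-8" guarantees UTF-8 output regardless of OS defaults;
# newline="\n" avoids Windows-style line endings
with open("llms.txt", "w", encoding="utf-8", newline="\n") as f:
    f.write(content)
```

Either way you produce the file, the next step is the same: validate it before uploading.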

Step 5. Validate before you upload

Before uploading, run the file through a validator to catch any formatting mistakes — missing H1, wrong link syntax, encoding issues, and so on. Errors like these can prevent AI tools from reading the file at all.

llms.txt validator

Paste your file content and run a 12-point spec compliance check. Any issues are flagged with clear instructions on how to fix them.

Open the validator →

Step 6. Upload it to your site’s root

The file must live at your site’s root — meaning it’s accessible at https://yoursite.com/llms.txt. Subdirectory paths like /docs/llms.txt or /.well-known/llms.txt are not recognized by the spec.

Platform-specific instructions are in the next section.

7. How to add it on the major platforms

Below are the key steps for the most common platforms. For screenshots and more detailed walkthroughs, see our platform-by-platform guides.

WordPress (self-hosted)

  1. Open an FTP client (FileZilla is free) or use your host's file manager — most cPanel and Plesk hosts have one built in.
  2. Navigate to your site's document root (usually public_html or htdocs).
  3. Upload your llms.txt file there.
  4. Visit https://yoursite.com/llms.txt in a browser to confirm it loads as plain text.

Note: WordPress.com (the managed hosted version) does not allow root file uploads. This method only works on self-hosted WordPress.org installs.

Shopify

  1. In your Shopify admin, go to Content → Files and upload llms.txt.
  2. Shopify stores uploaded files under /cdn/shop/files/, not the site root — so a direct file upload won't work on its own.
  3. The practical workaround is to use a dedicated Shopify app (search "LLMs.txt" in the Shopify App Store) that handles the routing automatically.

Note: Shopify doesn't support arbitrary root-level files natively. A dedicated app is the most reliable solution.

Wix

  1. Wix does not currently offer a native way to serve a custom file from your domain root (as of May 2026).
  2. A common workaround is to create a dedicated page at a path like /llms-txt and paste your content there — it's not full spec compliance, but some AI crawlers will pick it up.
  3. For full compliance, consider pointing a subdomain or secondary domain to a static host (Netlify, Vercel) where you can place the file at the root.

Note: Wix is currently the most difficult major platform for llms.txt deployment. If this is a dealbreaker, migrating to Wix Studio or a different platform may be worth considering.

Squarespace

  1. Squarespace does not support uploading arbitrary files to the site root.
  2. One workaround: use Squarespace's URL forwarding feature to redirect /llms.txt to a raw file hosted on a service like GitHub (raw.githubusercontent.com) or Pastebin.
  3. Another option is to inject a custom script via Settings → Advanced → Code Injection that serves the content — but this approach depends on the AI crawler executing JavaScript, which is not guaranteed.

Note: Neither workaround is a perfect solution. Full spec compliance on Squarespace currently requires a third-party hosting layer.

Webflow

  1. In your Webflow project, go to the Hosting tab.
  2. Find the "Custom files" section and upload your llms.txt file.
  3. Webflow will serve it from your domain root automatically.
  4. Republish your site and verify at https://yoursite.com/llms.txt.

Note: Webflow's custom files feature handles this cleanly — no workarounds needed.

Static HTML / GitHub Pages / Netlify / Vercel / S3

  1. Drop llms.txt into the public root of your project (the same folder as your index.html).
  2. Commit and push (for GitHub Pages, Netlify, or Vercel), or upload to your S3 bucket root.
  3. Verify at https://yoursite.com/llms.txt.

Note: This is the simplest case. Five minutes start to finish.

8. Checking that everything works after you publish

Right after publishing, run through these three checks. Don’t skip them — a file that exists but isn’t served correctly is the same as no file at all.

  1. The URL loads correctly

     Open https://yoursite.com/llms.txt in your browser. The raw Markdown content should appear as plain text in the window. If you get a 404, the file is in the wrong location. If a download starts, the Content-Type header needs fixing.

  2. The Content-Type header is correct

     Open your browser's developer tools (F12), go to the Network tab, reload the page, and click the llms.txt request. The Content-Type response header should be text/plain or text/markdown. If it says application/octet-stream, some AI crawlers will skip the file. You'll need to add a MIME type rule in your server or .htaccess config.

  3. The file passes the spec validator

     Run it through our validator once a month, and again any time you update your site structure. The 12-point check catches issues that are easy to miss by eye.
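The Content-Type check is also easy to automate. This sketch validates a header value the way this section describes, accepting the two good types and ignoring any charset parameter:

```python
def content_type_ok(header_value):
    """Accept text/plain or text/markdown, ignoring charset parameters."""
    if not header_value:
        return False
    # "text/plain; charset=utf-8" -> "text/plain"
    mime = header_value.split(";")[0].strip().lower()
    return mime in ("text/plain", "text/markdown")

print(content_type_ok("text/plain; charset=utf-8"))  # True
print(content_type_ok("application/octet-stream"))   # False
```

Pair it with whatever you use to fetch headers (your browser's Network tab works fine) and you can confirm the file is served correctly in seconds.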

For ongoing monitoring:

  • Check your server logs monthly for visits from GPTBot, ClaudeBot, and PerplexityBot.
  • Once a month, ask ChatGPT, Claude, and Perplexity the kinds of questions your customers would ask. Check whether your site appears as a cited source. If it doesn’t, revisit your content rather than your llms.txt — the file is the path, but the content is what earns the citation.

9. What to keep doing — and what to skip

Keep doing

  • Review the file quarterly: Update it when you add or discontinue services, change pricing, or restructure your site.
  • Run a citation check monthly: Ask the major AI tools the questions your customers ask, and see where you show up.
  • Add new important pages when they go live: Keep the total under 15 — swap out a lower-priority page rather than just appending.

Skip

  • Daily or weekly updates. llms.txt describes your site’s structure, not today’s news. It doesn’t need to change often.
  • Listing every page on your site. That’s sitemap.xml’s job. Keep these two files separate.
  • Producing AI-specific content in bulk. AI cites content that is genuinely useful to humans. Write for your customers first, then use llms.txt to point AI toward the best of what you’ve already written.

10. Your next step

You’ve got everything you need. The fastest path from here is two steps.

  1. Generate a draft in 30 seconds

     Enter your site's URL in our generator. It crawls your site and produces a spec-compliant draft. It'll get you 60–80% of the way there instantly.

  2. Spend 5–15 minutes refining it

     Open the draft and apply what you learned in Steps 2 and 3 above: sharpen the page descriptions, rewrite the summary in your own words. That last 20% is what makes the difference.

Generate your llms.txt draft

Paste in your site URL and we'll crawl it and produce a spec-compliant llms.txt on the spot. Free, no account required.

Open the generator →

We hope this guide helps your business get the visibility it deserves in the AI search era.
