How to validate an llms.txt file
The llms.txt spec was published by Jeremy Howard in 2024 as a way for websites to give large language models a short, structured pointer to their most important content. Like robots.txt or sitemap.xml, it lives at the root of your site — except it’s written in Markdown for AI to read, not crawlers to follow.
A valid file isn’t complicated, but small mistakes (missing H1, malformed link bullets, wrong Content-Type) cause tools and crawlers to silently skip it. This validator runs twelve checks against the spec and tells you exactly what, if anything, to fix.
What a valid llms.txt looks like
    # Acme Corp

    > Open-source database for full-text search across structured documents.

    ## Docs

    - [Quickstart](https://acme.example/docs/quickstart): Get a cluster running in 5 minutes.
    - [API reference](https://acme.example/docs/api): Full HTTP API.
    - [SDKs](https://acme.example/docs/sdks): Official Python, Go, and JS clients.

    ## Optional

    - [Architecture](https://acme.example/blog/architecture): How the index is sharded.
    - [Benchmarks](https://acme.example/blog/benchmarks): Performance vs. competitors.
Four ingredients: an H1 with the site name, a one-line blockquote summary, one or more ## sections, and link list items in the form - [name](url): optional description.
The 12 checks, explained
1. Served at /llms.txt
The file must live at the root of your domain — so https://yoursite.com/llms.txt, not /docs/llms.txt or /.well-known/llms.txt. Subdirectories break discoverability because crawlers look at the root by convention.
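As a rough illustration, the root-level location can be derived from any page URL with Python’s standard urllib.parse. The helper name llms_txt_url is made up for this sketch and is not part of any official tooling.

```python
from urllib.parse import urlsplit, urlunsplit


def llms_txt_url(page_url: str) -> str:
    """Return the root-level llms.txt URL for the site hosting page_url.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(page_url)
    # Keep the scheme and host; drop the path, query string, and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/llms.txt", "", ""))
```

Whatever page you start from, the result always points at the domain root, which is where crawlers look by convention.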
2. Correct Content-Type
Serve the file as text/plain or text/markdown. Some hosts and misconfigured servers send .txt files as application/octet-stream, which can trip up parsers and make browsers download the file instead of rendering it.
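A minimal sketch of this check, assuming you already have the raw Content-Type header value as a string. The function name content_type_ok and the ACCEPTED_TYPES set are illustrative, not taken from any validator’s source.

```python
# Media types treated as acceptable for llms.txt in this sketch.
ACCEPTED_TYPES = {"text/plain", "text/markdown"}


def content_type_ok(header_value: str) -> bool:
    """Return True if a Content-Type header names an accepted media type."""
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = header_value.split(";", 1)[0].strip().lower()
    return media_type in ACCEPTED_TYPES
```

The header often carries a charset parameter, so the comparison uses only the part before the first semicolon.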
3. Non-empty file
An empty file is treated the same as no file at all. Even a five-line stub beats nothing.
4 & 5. Has exactly one H1 title
The first line must be # Your Site Name. The spec uses this as the canonical site title. Multiple H1s confuse the structure; zero H1s mean the file fails validation against the official llms.txt parser.
6. H1 is the first line
It looks cosmetic, but it matters: most parsers expect the H1 at the very top, with no preceding blank lines or comments.
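The H1 checks (4, 5, and 6) can be approximated in a few lines of Python. This is a simplified sketch, not the validator’s actual code: it treats any line starting with "# " as an H1 and does not account for headings inside fenced code blocks. The name h1_checks is hypothetical.

```python
def h1_checks(text: str) -> dict:
    """Count H1 lines and test whether the first line is an H1."""
    lines = text.splitlines()
    # An H1 is a line starting with a single "#" followed by a space,
    # so "## Docs" does not match.
    h1_lines = [i for i, line in enumerate(lines) if line.startswith("# ")]
    return {
        "h1_count": len(h1_lines),
        "exactly_one_h1": len(h1_lines) == 1,
        "h1_is_first_line": bool(lines) and lines[0].startswith("# "),
    }
```

A leading blank line is enough to fail the first-line test even when the file contains exactly one H1.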
7. Has a summary blockquote
Place a one-line summary in a > blockquote directly under the H1. This is the line AI systems are most likely to draw on when they describe your site. Make it declarative and concrete: “Open-source database for full-text search” beats “Welcome to our site.”
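One way to locate the summary, sketched in Python under the assumption that the blockquote is the first non-blank content after the H1. The function name find_summary is made up for this example.

```python
from typing import Optional


def find_summary(text: str) -> Optional[str]:
    """Return the blockquote text directly under the H1, or None."""
    seen_h1 = False
    for line in text.splitlines():
        stripped = line.strip()
        if not seen_h1:
            # Skip everything up to and including the H1 line.
            seen_h1 = stripped.startswith("# ")
            continue
        if not stripped:
            continue  # blank lines between the H1 and the summary are fine
        if stripped.startswith(">"):
            return stripped.lstrip(">").strip()
        return None  # first content after the H1 is not a blockquote
    return None
```

If a section heading or a paragraph comes before the blockquote, the sketch reports no summary.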
8. Has section headings
Group your links under ## sections like ## Docs, ## Pages, or ## Optional. The ## Optional section in particular is a spec convention for content the LLM can fetch on demand but skip if context budget is tight.
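To show how a consumer might use these headings, here is a small Python sketch that splits a file into sections and drops ## Optional when context is tight. Both function names are hypothetical, and the parsing is deliberately naive (it only recognises lines that start with "## ").

```python
def split_sections(text: str) -> dict[str, list[str]]:
    """Map each "## " heading to the non-blank lines that follow it."""
    sections: dict[str, list[str]] = {}
    current = None
    for line in text.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif current is not None and line.strip():
            sections[current].append(line)
    return sections


def without_optional(sections: dict[str, list[str]]) -> dict[str, list[str]]:
    """Drop the "Optional" section, as a reader short on context might."""
    return {name: body for name, body in sections.items() if name != "Optional"}
```

Anything before the first ## heading (the H1 and the summary) is ignored by split_sections.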
9, 10, 11. Link list items
The spec is strict about link format: - [Display name](https://url): optional description. Use absolute URLs (relative URLs break when the file is consumed out of context). The validator counts how many bullets follow this format, how many don’t, and whether the URLs parse as valid HTTP(S).
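The three link checks can be sketched with a regular expression and urllib.parse. This is an approximation under stated assumptions: only bullets starting with "- " are considered, the pattern LINK_ITEM is our own reading of the format above, and check_link_items is a hypothetical name.

```python
import re
from urllib.parse import urlsplit

# "- [name](url)" with an optional ": description" tail.
LINK_ITEM = re.compile(
    r"^- \[(?P<name>[^\]]+)\]\((?P<url>[^)\s]+)\)(?:: (?P<desc>.+))?$"
)


def check_link_items(text: str) -> dict:
    """Count well-formed and malformed bullets and collect non-HTTP(S) URLs."""
    well_formed, malformed, bad_urls = 0, 0, []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped.startswith("- "):
            continue  # not a bullet
        match = LINK_ITEM.match(stripped)
        if match is None:
            malformed += 1
            continue
        well_formed += 1
        parts = urlsplit(match["url"])
        # Relative URLs have no scheme or host, so they land here too.
        if parts.scheme not in ("http", "https") or not parts.netloc:
            bad_urls.append(match["url"])
    return {"well_formed": well_formed, "malformed": malformed, "bad_urls": bad_urls}
```

A relative link such as /docs/api matches the bullet format but is still reported in bad_urls, because it has no scheme or host.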
12. Reasonable size & trailing newline
Keep llms.txt small — under 100 KB is a reasonable cap. If you have a lot of detail to expose, put it in a separate llms-full.txt instead. Ending the file with a newline is a Unix convention some parsers rely on; it’s a one-keystroke fix.
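Both parts of this check are easy to reproduce on the raw bytes of the file. In the Python sketch below, MAX_BYTES encodes the 100 KB cap suggested above (a rule of thumb, not a limit from the spec), and size_and_newline is an illustrative name.

```python
MAX_BYTES = 100 * 1024  # the 100 KB rule of thumb; not a spec limit


def size_and_newline(raw: bytes) -> dict:
    """Report the file size and whether the file ends with a newline."""
    return {
        "size_bytes": len(raw),
        "within_cap": len(raw) <= MAX_BYTES,
        "ends_with_newline": raw.endswith(b"\n"),
    }
```

Working on bytes rather than decoded text keeps the size measurement consistent with what the server actually sends.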
FAQ
Is llms.txt the same as robots.txt?
No. robots.txt tells crawlers what they may not crawl. llms.txt actively tells AI systems where the canonical version of your content lives, in a Markdown format optimized for them to read. They’re complementary.
What happens if my llms.txt fails validation?
Some AI systems will silently ignore it; others will fall back to crawling your full HTML, navigation and all, which is what you wanted to avoid. There’s no error log — that’s why a validator matters.
Do I need llms-full.txt as well?
Optional. llms-full.txt is the full Markdown of your most important pages concatenated together, served at /llms-full.txt. Big documentation sites (Stripe, Anthropic, Cloudflare) ship both. For most sites, just llms.txt is enough.
How often should I revalidate?
Whenever you change site structure (rename pages, drop sections) or move hosts. Otherwise quarterly is fine. A passing result holds until you edit the file or change how your host serves it, so the periodic check is mostly there to catch changes you forgot about.
Why is my Content-Type wrong even though the file is fine?
Almost always a host config issue. Netlify and Cloudflare Pages let you set Content-Type per file in a _headers file, Vercel does it through the headers section of vercel.json, and some hosts also expose the setting in their dashboard. Check our setup guides for platform-specific snippets.