What exactly is llms.txt and what does the file look like?

llms.txt is a plain text file, written in Markdown, that a website publishes to give large language models a clean, curated map of its most important content. Instead of forcing an AI crawler to wade through navigation menus, cookie banners, ads, and JavaScript, llms.txt hands it a concise, machine-readable summary of what the site is and where the good stuff lives. It was proposed by Jeremy Howard of Answer.AI in September 2024 as a way to make sites easier for LLMs to read and reason over.

The format is deliberately simple. It opens with a single H1 that names the project or site. Below that sits an optional blockquote that summarizes, in a sentence or two, what the site offers and who it serves. After the summary you list curated links grouped under H2 headings. Each link is a Markdown list item in the form [Title](URL), followed by a colon and a short description of what that page covers. A common convention is an H2 named Optional that holds secondary links an LLM can skip when it needs to save tokens or stay focused.
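To make the layout concrete, here is a minimal sketch in Python that holds a small example file as a string and reads it back into its parts. The site name, sections, and URLs are invented for illustration, and parse_llms_txt is a hypothetical helper rather than part of any official tooling.

```python
import re

# Hypothetical example file; the site name, sections, and URLs are invented.
SAMPLE = """\
# Example Co

> Example Co builds invoicing software for freelancers and small studios.

## Guides

- [Getting started](https://example.com/guides/start): Set up an account and send a first invoice.
- [Pricing](https://example.com/pricing): Plans, limits, and billing terms.

## Optional

- [Company history](https://example.com/about/history): Background on the founding team.
"""

# Matches "- [Title](URL): description"; the description part is optional here.
LINK = re.compile(r"^- \[(?P<title>[^\]]+)\]\((?P<url>[^)]+)\)(?::\s*(?P<desc>.*))?$")


def parse_llms_txt(text):
    """Return the H1 title, blockquote summary, and links grouped by H2 section."""
    doc = {"title": None, "summary": None, "sections": {}}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# ") and doc["title"] is None:
            doc["title"] = line[2:].strip()
        elif line.startswith("> ") and doc["summary"] is None:
            doc["summary"] = line[2:].strip()
        elif line.startswith("## "):
            current = line[3:].strip()
            doc["sections"][current] = []
        elif current is not None:
            match = LINK.match(line)
            if match:
                doc["sections"][current].append(
                    (match["title"], match["url"], match["desc"] or "")
                )
    return doc


parsed = parse_llms_txt(SAMPLE)
print(parsed["title"])
print(list(parsed["sections"]))
```

A consumer that wants to save tokens could simply drop the "Optional" key from the parsed sections before reading further.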

How is llms.txt like robots.txt but different?

The easiest mental model is robots.txt for LLMs, with one crucial difference. robots.txt tells crawlers what they are not allowed to access, so it is a file about restriction and exclusion. llms.txt does the opposite. It is a file about invitation and curation, telling AI systems, in effect, "here is exactly what matters, and here is how to find it fast."

The two files also serve different audiences. robots.txt is a decades-old standard read mainly by search engine bots and governed by crawl directives. llms.txt is aimed at the models behind ChatGPT, Perplexity, Claude, Gemini, and AI Overviews, which try to understand and summarize your content rather than simply index it. This is the territory of generative engine optimization. The two files coexist happily. You keep robots.txt to manage crawl access and add llms.txt to guide comprehension.
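The restriction side of that contrast can be seen with the robots.txt parser in the Python standard library. The sketch below assumes two made-up rules and two made-up URLs; the only question robots.txt can answer is whether a given agent may fetch a given path.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules, invented for illustration.
ROBOTS_LINES = [
    "User-agent: *",
    "Disallow: /private/",
]

parser = RobotFileParser()
parser.parse(ROBOTS_LINES)

# robots.txt is a yes/no gate on access; it says nothing about what a page means.
private_ok = parser.can_fetch("*", "https://example.com/private/report")
guide_ok = parser.can_fetch("*", "https://example.com/guides/start")
print(private_ok, guide_ok)
```

Nothing in these rules tells a model which allowed pages are worth reading, which is the gap llms.txt is meant to fill.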

Where do you host llms.txt and what is llms-full.txt?

Host llms.txt at the root of your domain so it resolves at https://yourdomain.com/llms.txt. The root location is part of the convention, the same way robots.txt lives at the root, so AI crawlers and tools know exactly where to look without guessing. Serve it as plain text and keep the links absolute so a model can follow them without resolving relative paths.
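As a small illustration of the root convention and the absolute-link advice, the sketch below derives the expected llms.txt address from any page URL and checks whether a link is absolute. Both helper names are hypothetical, and the URLs are placeholders.

```python
from urllib.parse import urlsplit, urlunsplit


def llms_txt_url(page_url):
    """Return the root-level llms.txt URL for the site that serves page_url."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/llms.txt", "", ""))


def is_absolute(link):
    """True when the link carries an http(s) scheme and a host."""
    parts = urlsplit(link)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


print(llms_txt_url("https://yourdomain.com/blog/some-post?ref=home"))
print(is_absolute("/guides/start"), is_absolute("https://yourdomain.com/guides/start"))
```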

There is a companion file called llms-full.txt. Where llms.txt is a lean index of links and descriptions, llms-full.txt inlines the actual content of those pages into one long Markdown document. The idea is that a model can ingest your full documentation or knowledge base in a single fetch rather than crawling page by page. It is popular with software and documentation sites. For most marketing sites the lighter llms.txt is the practical starting point, and you add llms-full.txt only when you genuinely want the entire corpus available in one file.
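One way to picture llms-full.txt is as a concatenation step. The sketch below assumes you already have each page's content as Markdown. The build_llms_full helper, the "Source:" line, and the one-H2-per-page layout are assumptions made for illustration, not a prescribed format.

```python
def build_llms_full(site_name, summary, pages):
    """Inline each page's Markdown under its own H2 heading in one document.

    pages is a list of (title, url, markdown_body) tuples. The layout is an
    assumption for this sketch, not a format defined by the proposal.
    """
    parts = [f"# {site_name}", "", f"> {summary}", ""]
    for title, url, body in pages:
        parts += [f"## {title}", "", f"Source: {url}", "", body.strip(), ""]
    return "\n".join(parts)


full_text = build_llms_full(
    "Example Co",
    "Invoicing software for freelancers.",
    [
        ("Getting started", "https://example.com/guides/start", "Create an account, then add a client."),
        ("Pricing", "https://example.com/pricing", "Plans are billed monthly."),
    ],
)
print(full_text)
```

The trade-off is size: the index stays a few kilobytes, while the full file grows with every page you inline.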

Does llms.txt actually help, and is it a real standard yet?

Honest answer first. llms.txt is an emerging proposal, not an official standard, and as of 2025 the major AI providers have not publicly confirmed that their crawlers read it. Google's John Mueller has been openly skeptical, comparing it to the long-ignored keywords meta tag and noting that no AI system he was aware of used it. So you should treat it as a low-cost bet rather than a guaranteed ranking lever.

That said, adoption is climbing. Documentation platforms like Mintlify generate llms.txt automatically, and a growing list of developer-focused companies including Anthropic, Cloudflare, Stripe, and many others now publish one. Some open-source agents and developer tools already look for it, even though the major providers have not confirmed that their production crawlers do. The realistic view is that llms.txt costs little to add, signals that your site is AI-aware, and positions you for a convention that may harden over the next year or two. It will not rescue thin content, but it removes friction for any model that chooses to use it.

How does llms.txt fit with schema and strong content?

llms.txt is one signal in a stack, not a replacement for the fundamentals. Structured data, meaning JSON-LD schema like Article, FAQPage, Organization, and Product, still does the heavy lifting of telling machines what each entity on a page means. Schema lives inside individual pages and describes them in detail. llms.txt sits above all of that and describes the site as a whole, pointing models toward the pages worth reading.
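For comparison with the site-level index, here is a minimal sketch of page-level structured data: a schema.org FAQPage object built in Python and serialized as JSON-LD. The faq_jsonld helper and the sample question are illustrative assumptions, and a real page would embed the output in its own HTML template.

```python
import json


def faq_jsonld(pairs):
    """Build schema.org FAQPage markup from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


markup = faq_jsonld(
    [("What is llms.txt?", "A Markdown file at the site root that maps key pages for language models.")]
)
# JSON-LD is typically embedded in the page inside a script tag of this type.
script_tag = '<script type="application/ld+json">' + json.dumps(markup) + "</script>"
print(script_tag)
```

This markup describes one page in detail, whereas llms.txt only points to that page and says in a line what it covers.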

Underneath both, content quality remains the deciding factor. AI engines cite sources that answer a question clearly, lead with a direct answer, and back claims with specifics. llms.txt and schema make that strong content easier for a machine to find and parse, but they cannot manufacture authority on their own. The winning combination is genuinely useful, answer-first pages that are marked up with accurate schema and surfaced through a clean llms.txt index.

How do you create an llms.txt file?

You can write one by hand in a few minutes or generate it. The manual route is straightforward and gives you the most control over what models see first.

  • Start with an H1 that names your site or brand, for example a single line that reads "# Your Company Name".
  • Add a blockquote summary directly below it that states what you do and who you serve in one or two plain sentences.
  • Create H2 sections for your most important content groups, such as Services, Guides, Docs, or About.
  • Under each H2, list links as Markdown list items in the form [Title](URL), each followed by a colon and a short description, using absolute URLs that a crawler can follow directly.
  • Add an H2 named Optional for secondary links a model can safely skip when it needs to conserve tokens.
  • Save the file as llms.txt in plain text and upload it to the root of your domain so it resolves at /llms.txt.
  • Optionally generate the file with a tool or a docs platform like Mintlify, then review the output so the curation reflects your real priorities.
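The manual steps above can also be sketched as a small generator. This is an illustrative script, not the output of any particular tool; render_llms_txt, the site name, and the URLs are all hypothetical.

```python
def render_llms_txt(name, summary, sections):
    """Render llms.txt text from a site name, a summary, and ordered sections.

    sections maps an H2 title to a list of (title, url, description) tuples.
    """
    lines = [f"# {name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url, description in links:
            lines.append(f"- [{title}]({url}): {description}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


text = render_llms_txt(
    "Example Co",
    "Invoicing software for freelancers and small studios.",
    {
        "Guides": [
            ("Getting started", "https://example.com/guides/start", "Set up an account."),
            ("Pricing", "https://example.com/pricing", "Plans and billing terms."),
        ],
        "Optional": [
            ("Company history", "https://example.com/about/history", "Founding background."),
        ],
    },
)
print(text)
```

Saving the returned text as llms.txt in the web root completes the last manual step. Keeping the sections in a dictionary also makes the curation order explicit, since the most important group is simply listed first.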

What are the most common llms.txt mistakes?

Most problems come from treating llms.txt like a sitemap dump or an afterthought rather than a curated guide.

  • Listing every URL on the site, which buries your best pages and defeats the point of curation.
  • Placing the file in a subfolder instead of the root, so tools that expect /llms.txt never find it.
  • Using relative links that a model cannot resolve, rather than full absolute URLs.
  • Writing vague or missing descriptions, which strips away the context that helps a model decide what to read.
  • Letting it go stale, so the links point to retired pages and the summary no longer matches the business.
  • Stuffing it with keywords, which adds noise and signals manipulation rather than clarity.
  • Assuming it replaces robots.txt, schema, or quality content, when it only complements them.
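Several of these mistakes can be caught mechanically. The sketch below is a hypothetical linter that flags a missing H1, relative links, missing descriptions, and an oversized link list. The 50-link threshold is an arbitrary assumption, and stale links or keyword stuffing would still need a human review or a separate link checker.

```python
import re
from urllib.parse import urlsplit

# Matches "- [Title](URL)" with an optional ": description" suffix.
LINK = re.compile(r"^\s*[-*] \[([^\]]*)\]\(([^)]*)\)(?::\s*(.*))?$")
MAX_LINKS = 50  # arbitrary threshold chosen for this sketch


def lint_llms_txt(text):
    """Return a list of warning strings for common llms.txt mistakes."""
    warnings = []
    lines = text.splitlines()
    if not lines or not lines[0].startswith("# "):
        warnings.append("first line should be an H1 naming the site")
    link_count = 0
    for number, line in enumerate(lines, start=1):
        match = LINK.match(line)
        if not match:
            continue
        link_count += 1
        _title, url, description = match.groups()
        parts = urlsplit(url)
        if not (parts.scheme and parts.netloc):
            warnings.append(f"line {number}: relative link {url!r}")
        if not (description or "").strip():
            warnings.append(f"line {number}: missing description")
    if link_count > MAX_LINKS:
        warnings.append(f"{link_count} links: looks like a sitemap dump")
    return warnings


draft = "# Example Co\n\n## Guides\n\n- [Start](/guides/start)\n- [Pricing](https://example.com/pricing): Plans.\n"
for warning in lint_llms_txt(draft):
    print(warning)
```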