SEO · High

Publish a robots.txt file

Checks if robots.txt exists at the root, is accessible, and contains valid directives.

Quick take
Typical fix time: 10 min
  • Serve a valid `robots.txt` at `/robots.txt` on the production domain, returning HTTP 200
  • Include a `Sitemap:` directive pointing to your XML sitemap
  • Never disallow crawling of CSS/JS assets that render your pages
  • Avoid blocking all crawlers with `Disallow: /` on a live site
Why it matters: robots.txt is the first file crawlers fetch; misconfigured directives can silently block search engines from crawling your entire site, killing organic visibility.

Rule Details

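robots.txt is a plain-text file at the root of your domain that tells crawlers which paths they may fetch. Major search engine crawlers request it before crawling other resources on the host and follow its rules, but compliance is voluntary: the file controls crawling, not indexing, and it is not an access-control mechanism.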

Code Example

User-agent: *
Disallow: /admin/
Disallow: /private/
Allow: /
 
Sitemap: https://www.example.com/sitemap.xml
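
As a rough sanity check, the example above can be fed to Python's standard-library `urllib.robotparser`. This is only a sketch and not how any search engine evaluates the file: the module applies the first matching rule in file order, whereas RFC 9309 specifies longest-match precedence. For a file this simple both approaches agree. The test URLs under `/admin/` and `/blog/` are illustrative assumptions.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /private/
Allow: /

Sitemap: https://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Illustrative URLs (assumptions, not taken from a real site).
admin_allowed = parser.can_fetch("*", "https://www.example.com/admin/users")
blog_allowed = parser.can_fetch("*", "https://www.example.com/blog/hello")
sitemaps = parser.site_maps()  # site_maps() requires Python 3.8+

print(admin_allowed, blog_allowed, sitemaps)
# → False True ['https://www.example.com/sitemap.xml']
```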

Common Mistakes

❌ Blocking the entire site

User-agent: *
Disallow: /

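This stops compliant crawlers from fetching any page. URLs that are linked from elsewhere can still appear in the index, but without their content. Remove the rule or replace it with specific paths.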
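
A deploy-time guard can catch this mistake before it reaches production. The sketch below is a deliberately small line scanner, not a full RFC 9309 parser; the function name `agents_blocking_everything` is hypothetical. It flags any group containing a bare `Disallow: /`, even when the same group also has `Allow` rules, and it does not catch wildcard variants such as `Disallow: /*`.

```python
def agents_blocking_everything(robots_txt: str) -> list[str]:
    """Return the user-agents whose group contains a bare `Disallow: /`."""
    blocked: list[str] = []
    current_agents: list[str] = []
    in_rules = False  # True once the current group has seen an Allow/Disallow

    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()

        if field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                current_agents = []
                in_rules = False
            current_agents.append(value)
        elif field in ("allow", "disallow"):
            in_rules = True
            if field == "disallow" and value == "/":
                for agent in current_agents:
                    if agent not in blocked:
                        blocked.append(agent)
    return blocked


print(agents_blocking_everything("User-agent: *\nDisallow: /\n"))
# → ['*']
```

A non-empty result on a production build is a reasonable trigger for failing the pipeline or requesting manual review.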

❌ Blocking CSS and JavaScript

User-agent: Googlebot
Disallow: /assets/

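Googlebot renders pages much as a browser does, executing JavaScript and applying CSS. If /assets/ holds the scripts and stylesheets that build your pages, blocking it prevents Googlebot from seeing your actual content.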
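
One way to check for this is to test the render-critical asset URLs of a page against the rules. The sketch below uses Python's standard-library `urllib.robotparser`; the helper name `blocked_assets` and the asset URLs are assumptions for illustration, and the module's matching is simpler than Google's, so treat the result as a hint rather than a verdict.

```python
from urllib.robotparser import RobotFileParser


def blocked_assets(robots_txt, user_agent, asset_urls):
    """Return the asset URLs that user_agent may not fetch under robots_txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in asset_urls if not parser.can_fetch(user_agent, url)]


rules = "User-agent: Googlebot\nDisallow: /assets/\n"
assets = [  # hypothetical asset URLs referenced by a page
    "https://www.example.com/assets/app.js",
    "https://www.example.com/assets/site.css",
    "https://www.example.com/favicon.ico",
]
print(blocked_assets(rules, "Googlebot", assets))
# → ['https://www.example.com/assets/app.js', 'https://www.example.com/assets/site.css']
```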

✅ Blocking admin and internal paths only

User-agent: *
Disallow: /admin/
Disallow: /cart/
Disallow: /checkout/
Allow: /
 
User-agent: Googlebot-Image
Disallow: /images/internal/
 
Sitemap: https://www.example.com/sitemap.xml
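
One subtlety in a file like this: a crawler that matches a specific group follows only that group and ignores the `*` group. The sketch below illustrates this with Python's standard-library `urllib.robotparser`, which happens to behave the same way for this file; the test paths are illustrative.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /cart/
Disallow: /checkout/
Allow: /

User-agent: Googlebot-Image
Disallow: /images/internal/

Sitemap: https://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Googlebot has no group of its own, so it falls back to the * group.
googlebot_cart = parser.can_fetch("Googlebot", "/cart/")
# Googlebot-Image matches its own group and follows only those rules.
image_internal = parser.can_fetch("Googlebot-Image", "/images/internal/logo.png")
image_cart = parser.can_fetch("Googlebot-Image", "/cart/")

print(googlebot_cart, image_internal, image_cart)
# → False False True
```

If the `*` restrictions should also bind Googlebot-Image, they need to be repeated inside its group.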

Rules

  • The file must be served at exactly /robots.txt on the production domain
  • Return HTTP 200. Google treats a 4xx response (except 429) as "no restrictions" and crawls everything; a 5xx or 429 response is treated as a temporary full disallow, which pauses crawling of the site
  • User-agent: * applies to all crawlers; use specific bot names for targeted rules
  • Disallow: with an empty value means "allow everything", which is also the default when no rule matches
  • The Sitemap: directive must be an absolute URL
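
The status-code rule above can be summarized as a small lookup. This is a sketch based on RFC 9309 and Google's documented handling; the function name and the returned labels are invented for illustration, and other crawlers may behave differently.

```python
def crawl_policy_for_status(status: int) -> str:
    """Map the HTTP status of /robots.txt to a crawler's likely behavior."""
    if 200 <= status < 300:
        return "parse-rules"          # file fetched: obey its directives
    if 300 <= status < 400:
        return "follow-redirect"      # crawlers follow a limited number of hops
    if status == 429 or 500 <= status < 600:
        return "assume-disallow-all"  # unreachable: crawling pauses temporarily
    if 400 <= status < 500:
        return "assume-allow-all"     # unavailable: treated as no restrictions
    return "unknown"


for code in (200, 404, 429, 503):
    print(code, crawl_policy_for_status(code))
# → 200 parse-rules
# → 404 assume-allow-all
# → 429 assume-disallow-all
# → 503 assume-disallow-all
```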

Exceptions

  • Staging, utility, login, account, or internal search pages may intentionally use different crawl or index signals if they are not meant to rank.
  • Temporary migration states can produce noisy intermediate signals; flag the live production URL pattern, not one-off transition artifacts.
  • When redirects, canonicals, robots directives, or indexability signals conflict, fix the strongest final signal first instead of reporting every downstream symptom as a separate blocker.

Standards

  • Use these references as the standard for the final search-facing HTML, metadata, and crawl behavior.
  • Check the implementation against Google's "robots.txt introduction" documentation before treating the rule as satisfied.
  • Check the implementation against the Robots Exclusion Protocol (RFC 9309) before treating the rule as satisfied.

Verification

Automated Checks

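  • Open Google Search Console → Settings → robots.txt report (the standalone robots.txt tester has been retired) to see the fetch status and any parse issues for the file Google last crawled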
  • Fetch /robots.txt directly in a browser on the production domain
  • Use curl -I "$ORIGIN/robots.txt" against the live host to confirm the production file returns HTTP 200
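
The curl check can also be scripted, for example in a post-deploy job. The sketch below uses only the Python standard library; the function names and the User-Agent string are hypothetical. `urlopen` follows redirects, so the value returned is the status of the final response.

```python
import urllib.error
import urllib.request


def robots_url(origin: str) -> str:
    """Build the robots.txt URL for an origin such as https://www.example.com."""
    return origin.rstrip("/") + "/robots.txt"


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Send a HEAD request and return the final HTTP status code."""
    request = urllib.request.Request(
        url,
        method="HEAD",
        headers={"User-Agent": "robots-check-sketch/0.1"},  # hypothetical UA
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # 4xx and 5xx responses raise HTTPError; the code is still the answer.
        return error.code


# Example call (performs a network request, so it is left commented out):
# print(fetch_status(robots_url("https://www.example.com")))
```

Anything other than 200 from the production origin is worth investigating before relying on the file's directives.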

Manual Checks

  • Review representative live pages manually and confirm there is no stronger conflicting signal that changes the intended SEO outcome.

Use with AI

Copy these prompts to use with your AI assistant, or install the MCP server to use directly from Claude, Cursor, or Windsurf.

Check

Verify implementation

Fetch `/robots.txt` on the live domain and verify it returns HTTP 200, uses correct `User-agent` / `Disallow` / `Allow` syntax, and includes a `Sitemap:` directive pointing to the XML sitemap. Check for accidental `Disallow: /` directives.

Fix

Auto-fix issues

Create or update `robots.txt` at the web root with valid directives. Add a live `Sitemap:` line for the production sitemap URL. Remove any `Disallow: /` rules that block the whole site or resources needed for rendering.

Explain

Learn more

Explain how robots.txt controls crawler access, why an accidental `Disallow: /` can delist a site, and why CSS/JS must remain accessible for rendering-based indexing.

Review

Code review

Review the code, configuration, and response headers that generate or serve `robots.txt`. Flag the exact routes, templates, or build steps where the served file violates this rule, and describe how to verify the final output on the live domain.

Further Reading

Tools and supplementary material for exploring the topic in more depth.

Google Search Console (search.google.com), tool

Rules that often go hand-in-hand with this one.

  • Create and submit an XML sitemap (SEO): An XML sitemap is available at /sitemap.xml and includes all important pages.
  • Avoid conflicting indexability signals (SEO): Detects conflicting signals between robots.txt, meta robots, X-Robots-Tag headers, and canonical tags.
  • Noindex in Sitemap (SEO): Checks for noindexed pages listed in the sitemap.
  • Robots Meta Conflict (SEO): Detects pages blocked by robots.txt that also carry noindex meta tags, creating a paradox where the directive is never read.
