SEO · Medium

Avoid conflicting indexability signals

Detects conflicting signals between robots.txt, meta robots, X-Robots-Tag headers, and canonical tags

Utilities
Quick take
Typical fix time 10 min
  • robots.txt blocks crawling; `noindex` blocks indexing — they are different mechanisms and should not be applied together
  • A page blocked in robots.txt cannot receive a `noindex` directive because crawlers never read the page
  • Canonical tags pointing to a `noindex` page create an unresolvable conflict — canonicalise to an indexable URL instead
Why it matters: Conflicting indexability directives create unpredictable crawling and indexing behaviour. The most dangerous combination is robots.txt blocking a page that also has a `noindex` tag: the `noindex` is never read, yet the URL is still known to Google and can remain in the index (usually without a description) if other pages link to it.

Rule Details

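Indexability signals such as robots.txt, meta robots, X-Robots-Tag headers, and canonicals each control a different part of crawling and indexing: robots.txt governs whether a URL may be fetched, meta robots and X-Robots-Tag govern whether fetched content may be indexed, and canonicals indicate which URL should represent a set of duplicates. Google's robots meta documentation is explicit that these signals do different jobs, so when they disagree, crawlers have to guess which instruction reflects your intent.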

Code Examples

❌ Avoid — robots.txt blocking a page that has noindex

# robots.txt
User-agent: *
Disallow: /private/   # Blocks crawling

<!-- /private/page.html: never read because robots.txt blocked it -->
<meta name="robots" content="noindex">

✅ Correct — use only noindex (remove robots.txt block)

# robots.txt — no Disallow for /private/
User-agent: *
Disallow: /admin/   # Only block what truly must not be crawled

<!-- /private/page.html: crawler reads noindex correctly -->
<meta name="robots" content="noindex, follow">

❌ Avoid — canonical pointing to a noindex page

<!-- /product?color=red (the page declaring the canonical) -->
<link rel="canonical" href="/product">

<!-- /product (the canonical target is noindex!) -->
<meta name="robots" content="noindex">

✅ Correct: canonical points to an indexable page

<!-- /product?color=red -->
<link rel="canonical" href="/product">
 
<!-- /product — indexable, no noindex -->
<!-- (meta robots either absent or set to "index, follow") -->

✅ Consistent signal for pages to exclude

<!-- For pages you want excluded from search: -->
<!-- Option A: noindex only (preferred — lets Google read the tag) -->
<meta name="robots" content="noindex, follow">
 
<!-- Option B: robots.txt block only (if the page must not be fetched) -->
<!-- Use this for pages with sensitive data or high crawl cost -->

Why It Matters

  • Blocked + noindex = unresolvable: If robots.txt blocks a URL, the noindex tag on that page is never read. Google knows the URL but can neither confirm nor deny it should be excluded.
  • Canonical to noindex: Canonicalising to a noindex page tells Google "this is the preferred URL" while also saying "don't index it" — a contradiction.
  • Crawl budget waste: Conflicting signals can lead Google to re-crawl URLs repeatedly without reaching a stable indexing decision, which is why the robots.txt specification should be reviewed alongside page-level directives.

Common Conflict Types

| Conflict | Effect |
| --- | --- |
| robots.txt blocks + noindex on page | noindex is never read |
| noindex in meta + index in X-Robots-Tag | Most restrictive wins (noindex) |
| Canonical → noindex page | Undefined behaviour; canonical may be ignored |
| Sitemap includes noindex URLs | Conflicting inclusion/exclusion signals |
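
The "most restrictive wins" case for meta robots versus X-Robots-Tag can be illustrated with a short Python sketch. The function names `parse_directives`, `is_indexable`, and `has_index_conflict` are hypothetical, and the sketch assumes plain comma-separated directive values; it does not handle user-agent-scoped headers such as `X-Robots-Tag: googlebot: noindex`.

```python
from __future__ import annotations

BLOCKING = {"noindex", "none"}   # "none" is shorthand for "noindex, nofollow"
ALLOWING = {"index", "all"}      # explicit (and default) permissive values


def parse_directives(value: str) -> set[str]:
    """Split a value such as "noindex, follow" into a lowercase set."""
    return {token.strip().lower() for token in value.split(",") if token.strip()}


def is_indexable(meta_robots: str = "", x_robots_tag: str = "") -> bool:
    """Return False if either source carries a blocking directive."""
    combined = parse_directives(meta_robots) | parse_directives(x_robots_tag)
    return not (BLOCKING & combined)


def has_index_conflict(meta_robots: str = "", x_robots_tag: str = "") -> bool:
    """Flag pages where one source allows indexing and the other blocks it."""
    meta = parse_directives(meta_robots)
    header = parse_directives(x_robots_tag)
    meta_blocks_header_allows = bool(meta & BLOCKING) and bool(header & ALLOWING)
    header_blocks_meta_allows = bool(header & BLOCKING) and bool(meta & ALLOWING)
    return meta_blocks_header_allows or header_blocks_meta_allows
```

Keeping the two checks separate is a deliberate choice in this sketch: `is_indexable` reports the effective outcome, while `has_index_conflict` reports that the sources disagree and the intent should be made explicit.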

How to Audit

  1. Parse your robots.txt to extract all Disallow patterns.
  2. Crawl your site and, for each URL, check whether it matches a Disallow rule.
  3. For matching URLs, fetch the page directly, ignoring robots.txt (to see what a crawler would find if the block were lifted), and check for noindex in the HTML or X-Robots-Tag header.
  4. Use Google Search Console's Page indexing report (formerly Coverage) to find URLs listed as "Blocked by robots.txt" or "Indexed, though blocked by robots.txt", and cross-check them against the noindex signals found in step 3.
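
Steps 1 to 3 could be sketched in Python roughly as follows. This is a minimal illustration, not a full crawler: it assumes the robots.txt text and each page's HTML and `X-Robots-Tag` value have already been fetched, and the names `RobotsMetaParser`, `has_noindex`, and `find_blocked_noindex` are hypothetical. Note that `urllib.robotparser` does simple prefix matching, so it may not mirror how Google treats `*` and `$` wildcards.

```python
from __future__ import annotations

from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from every <meta name="robots"> tag in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            content = attr.get("content", "")
            self.directives |= {d.strip().lower() for d in content.split(",") if d.strip()}


def has_noindex(html: str, x_robots_tag: str = "") -> bool:
    """True if the HTML or the header value carries noindex (or "none")."""
    parser = RobotsMetaParser()
    parser.feed(html)
    header = {d.strip().lower() for d in x_robots_tag.split(",") if d.strip()}
    return bool({"noindex", "none"} & (parser.directives | header))


def find_blocked_noindex(robots_txt: str, pages: dict, user_agent: str = "*") -> list:
    """pages maps URL -> (html, x_robots_tag); returns URLs that are
    both disallowed by robots.txt and marked noindex."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    flagged = []
    for url, (html, header) in pages.items():
        if not rules.can_fetch(user_agent, url) and has_noindex(html, header):
            flagged.append(url)
    return flagged
```

A possible usage, with made-up example.com URLs:

```python
robots = "User-agent: *\nDisallow: /private/\n"
pages = {
    "https://example.com/private/page.html": ('<meta name="robots" content="noindex">', ""),
    "https://example.com/public/page.html": ('<meta name="robots" content="noindex">', ""),
}
print(find_blocked_noindex(robots, pages))
```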

Exceptions

  • Staging, utility, login, account, or internal search pages may intentionally use different crawl or index signals if they are not meant to rank.
  • Temporary migration states can produce noisy intermediate signals; flag the live production URL pattern, not one-off transition artifacts.
  • When redirects, canonicals, robots directives, or indexability signals conflict, fix the strongest final signal first instead of reporting every downstream symptom as a separate blocker.

Standards

  • Use these references as the standard for the final search-facing HTML, metadata, and crawl behavior.
  • Check the implementation against Google Search Central: Robots meta tag, data-nosnippet, and X-Robots-Tag before treating the rule as satisfied.
  • Check the implementation against Google Search Central: Robots.txt specification before treating the rule as satisfied.

Verification

Automated Checks

  • Inspect rendered HTML and HTTP headers to confirm the expected metadata or crawlability signal is present.
  • Test the affected URL with Google Search Console or equivalent tooling where relevant.
  • Re-crawl a representative page set after deployment.

Manual Checks

  • Confirm the change does not create conflicting canonical, robots, or structured-data signals.

Use with AI

Copy these prompts to use with your AI assistant, or install the MCP server to use directly from Claude, Cursor, or Windsurf.

Check

Verify implementation

For each page, collect four signals: (1) Is the URL path blocked by robots.txt? (2) Does the page HTML contain `<meta name='robots' content='noindex'>`? (3) Does the HTTP response include an `X-Robots-Tag: noindex` header? (4) Does the page's `<link rel='canonical'>` point to a different URL? Flag: pages blocked in robots.txt that also have noindex directives, pages with canonical pointing to a noindex URL, and pages with conflicting index/noindex signals from meta and HTTP header.
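
As one possible way to implement the "canonical pointing to a noindex URL" flag from this prompt, the following Python sketch works on signals that have already been collected per page. The `PageSignals` record and the `canonical_to_noindex` function are hypothetical names, and the sketch assumes every canonical target that matters is present in the crawled set.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional
from urllib.parse import urljoin


@dataclass
class PageSignals:
    url: str                          # absolute URL of the crawled page
    noindex: bool = False             # from meta robots or X-Robots-Tag
    canonical: Optional[str] = None   # raw href of <link rel="canonical">, may be relative


def canonical_to_noindex(pages: list[PageSignals]) -> list[tuple[str, str]]:
    """Return (source URL, canonical target) pairs where the target is noindex."""
    by_url = {page.url: page for page in pages}
    conflicts = []
    for page in pages:
        if not page.canonical:
            continue
        # Resolve relative hrefs such as "/product" against the page URL.
        target = urljoin(page.url, page.canonical)
        if target == page.url:
            continue  # self-referencing canonical, nothing to compare
        target_page = by_url.get(target)
        if target_page is not None and target_page.noindex:
            conflicts.append((page.url, target))
    return conflicts
```

Targets that were never crawled are skipped silently here; a stricter audit might report them separately, since their indexability is unknown.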

Fix

Auto-fix issues

1. Identify all pages where robots.txt blocks crawling AND the page also has `noindex`:
   - If you want the page excluded from the index: remove the robots.txt rule; keep the `noindex` so crawlers can read it.
   - If you want to block all crawling: remove `noindex` (irrelevant if not crawled); keep the robots.txt block.
2. Identify canonical tags pointing to `noindex` pages:
   - The canonical destination must be an indexable page.
   - Change the canonical to point to an indexable URL, or remove `noindex` from the destination.
3. Identify pages with both `index` and `noindex` in meta robots (from different tags or sources):
   - Google uses the most restrictive directive; resolve to a single clear intent.
4. Verify after fixing using Google Search Console URL Inspection for each affected page.

Explain

Learn more

robots.txt is a crawl directive; `noindex` is an indexing directive. They operate at different stages of Google's pipeline. A page blocked in robots.txt is never fetched, so its `noindex` tag is never read, yet the URL is still known from sitemaps or links and can stay indexed without Google ever seeing the directive. Google's documentation explicitly warns against blocking pages in robots.txt that you also want to declare as `noindex`.

Review

Code review

Programmatically fetch robots.txt and parse its Disallow rules. For each page URL, determine if it matches a Disallow pattern. If yes, fetch the page HTML and check for `<meta name='robots'>` tags — flag if noindex is present. Also check the `X-Robots-Tag` HTTP response header for conflicts with the meta tag. Report the specific conflict type for each flagged URL.

Sources

References used to support the guidance in this rule.

Further Reading

Tools and supplementary material for exploring the topic in more depth.

Robots meta tag specifications | Google Search Central  |  Documentation  |  Google for Developers

Learn how to use robots meta tags and page-level or text-level settings to control how your content appears in Google Search results.

Google for Developers · Guide
Create and Submit a robots.txt File | Google Crawling Infrastructure  |  Crawling infrastructure  |  Google for Developers

A robots.txt file lives at the root of your site. Learn how to create a robots.txt file, see examples, and explore robots.txt rules.

Google for Developers · Guide

Rules that often go hand-in-hand with this one.

Make important pages indexable

Identifies important pages blocked from search engine indexing by noindex, robots.txt, or other directives

SEO
Set robots meta directives correctly

Checks robots meta tag for valid indexing directives in the page head.

SEO
Robots Meta Conflict

Detects pages blocked by robots.txt that also carry noindex meta tags, creating a paradox where the directive is never read.

SEO
Schema + Noindex Conflict

Detects pages that carry rich result schema markup but are blocked from indexing via noindex or robots.txt.

SEO
