Avoid conflicting indexability signals
Detects conflicting signals between robots.txt, meta robots, X-Robots-Tag headers, and canonical tags
- robots.txt blocks crawling; `noindex` blocks indexing — they are different mechanisms and should not be applied together
- A page blocked in robots.txt cannot receive a `noindex` directive because crawlers never read the page
- Canonical tags pointing to a `noindex` page create an unresolvable conflict — canonicalise to an indexable URL instead
Rule Details
Robots.txt, meta robots, X-Robots-Tag headers, and canonical tags each control a different part of crawling and indexing. Google's robots-meta documentation is explicit that these signals do different jobs, so conflicts between them leave a page's intended indexability ambiguous.
Code Examples
❌ Avoid: robots.txt blocking a page that has noindex
# robots.txt
User-agent: *
Disallow: /private/ # Blocks crawling

<!-- /private/page.html: never read because robots.txt blocked it -->
<meta name="robots" content="noindex">

✅ Correct: use only noindex (remove the robots.txt block)
# robots.txt: no Disallow for /private/
User-agent: *
Disallow: /admin/ # Only block what truly must not be crawled

<!-- /private/page.html: crawler reads noindex correctly -->
<meta name="robots" content="noindex, follow">

❌ Avoid: canonical pointing to a noindex page
<!-- /product?color=red: the page declaring the canonical -->
<link rel="canonical" href="/product">
<!-- /product: the canonical target is noindex! -->
<meta name="robots" content="noindex">

✅ Correct: canonical points to an indexable page
<!-- /product?color=red -->
<link rel="canonical" href="/product">
<!-- /product: indexable, no noindex -->
<!-- (meta robots either absent or set to "index, follow") -->

✅ Consistent signal for pages to exclude
<!-- For pages you want excluded from search: -->
<!-- Option A: noindex only (preferred, because it lets Google read the tag) -->
<meta name="robots" content="noindex, follow">
<!-- Option B: robots.txt block only (if the page must not be fetched) -->
<!-- Use this for pages with sensitive data or high crawl cost -->

Why It Matters
- Blocked + noindex = unresolvable: If robots.txt blocks a URL, the `noindex` tag on that page is never read. Google knows the URL exists but cannot see that it should be excluded.
- Canonical to noindex: Canonicalising to a `noindex` page tells Google "this is the preferred URL" while also saying "don't index it", which is a contradiction.
- Crawl budget waste: Conflicting signals can lead Google to re-crawl the page repeatedly while trying to resolve the conflict, which is why the robots.txt specification should be reviewed alongside page-level directives.
Common Conflict Types
| Conflict | Effect |
|---|---|
| robots.txt blocks + `noindex` on page | `noindex` is never read |
| `noindex` in meta + `index` in X-Robots-Tag | Most restrictive wins (`noindex`) |
| Canonical → `noindex` page | Undefined behaviour; the canonical may be ignored |
| Sitemap includes `noindex` URLs | Conflicting inclusion/exclusion signals |
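The "most restrictive wins" row can be illustrated with a small sketch. The function names `parse_directives`, `effective_indexing`, and `signals_disagree` are hypothetical, and the sketch assumes plain comma-separated directive strings with no user-agent prefix in the header value (for example, it does not handle `googlebot: noindex`). It treats `none` as equivalent to `noindex, nofollow`.

```python
from typing import Optional

# Directives that prevent indexing; "none" is shorthand for "noindex, nofollow".
BLOCKING = {"noindex", "none"}


def parse_directives(value: Optional[str]) -> set:
    """Split a comma-separated robots value into lowercase tokens."""
    if not value:
        return set()
    return {token.strip().lower() for token in value.split(",") if token.strip()}


def effective_indexing(meta_content: Optional[str], header_value: Optional[str]) -> str:
    """Combine both sources; any blocking directive wins."""
    combined = parse_directives(meta_content) | parse_directives(header_value)
    return "noindex" if combined & BLOCKING else "index"


def signals_disagree(meta_content: Optional[str], header_value: Optional[str]) -> bool:
    """True when both sources are present and reach different verdicts."""
    verdicts = {
        bool(parse_directives(value) & BLOCKING)
        for value in (meta_content, header_value)
        if value
    }
    return len(verdicts) > 1


print(effective_indexing("noindex", "index"))
print(signals_disagree("noindex", "index"))
```

Even though the outcome is predictable, a disagreement like this is worth flagging because it usually means two systems (templates and server configuration, for instance) are setting directives independently.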
How to Audit
- Parse your robots.txt to extract all Disallow patterns.
- Crawl your site and, for each URL, check whether it matches a Disallow rule.
- For matching URLs, fetch the page with your own audit tool (which is not bound by the block) and check for `noindex` in the HTML or the `X-Robots-Tag` header.
- Use Google Search Console's Page indexing report (formerly Coverage) to find URLs listed under "Blocked by robots.txt", then cross-check them against the `noindex` URLs found in the previous step.
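The first two audit steps can be sketched with Python's standard-library `urllib.robotparser`. The robots.txt content, the URLs, and the helper name `blocked_urls` are illustrative assumptions. Note that the standard-library parser uses simple prefix matching and may not interpret wildcard patterns such as `*` and `$` the way Google does, so treat the result as a first pass.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content for illustration; in practice, download the live file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /admin/
"""


def blocked_urls(robots_txt: str, urls: list, user_agent: str = "Googlebot") -> list:
    """Return the URLs that the given robots.txt disallows for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


urls = [
    "https://example.com/private/page.html",
    "https://example.com/products/widget",
    "https://example.com/admin/login",
]
print(blocked_urls(ROBOTS_TXT, urls))
```

The URLs returned here are the ones to fetch in the third step and inspect for `noindex`.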
Exceptions
- Staging, utility, login, account, or internal search pages may intentionally use different crawl or index signals if they are not meant to rank.
- Temporary migration states can produce noisy intermediate signals; flag the live production URL pattern, not one-off transition artifacts.
- When redirects, canonicals, robots directives, or indexability signals conflict, fix the strongest final signal first instead of reporting every downstream symptom as a separate blocker.
Standards
- Use these references as the standard for the final search-facing HTML, metadata, and crawl behavior.
- Check the implementation against Google Search Central: Robots meta tag, data-nosnippet, and X-Robots-Tag before treating the rule as satisfied.
- Check the implementation against Google Search Central: Robots.txt specification before treating the rule as satisfied.
Verification
Automated Checks
- Inspect rendered HTML and HTTP headers to confirm the expected metadata or crawlability signal is present.
- Test the affected URL with Google Search Console or equivalent tooling where relevant.
- Re-crawl a representative page set after deployment.
Manual Checks
- Confirm the change does not create conflicting canonical-url, robots, or structured-data signals.
Use with AI
Copy these prompts to use with your AI assistant, or install the MCP server to use directly from Claude, Cursor, or Windsurf.
Check
Verify implementation
For each page, collect four signals: (1) Is the URL path blocked by robots.txt? (2) Does the page HTML contain `<meta name='robots' content='noindex'>`? (3) Does the HTTP response include an `X-Robots-Tag: noindex` header? (4) Does the page's `<link rel='canonical'>` point to a different URL? Flag: pages blocked in robots.txt that also have noindex directives, pages with canonical pointing to a noindex URL, and pages with conflicting index/noindex signals from meta and HTTP header.
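A minimal sketch of collecting these signals for a single response, using only the standard library. The names `HeadSignals` and `collect_signals`, the flag strings, and the sample HTML are all hypothetical. The sketch assumes you have already fetched the HTML and headers and already know whether robots.txt blocks the URL; it only looks for the substring `noindex` and ignores user-agent-specific meta tags such as `googlebot`.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin


class HeadSignals(HTMLParser):
    """Collects meta robots values and the canonical href from a page."""

    def __init__(self) -> None:
        super().__init__()
        self.meta_robots = []
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        attr = {name: (value or "") for name, value in attrs}
        if tag == "meta" and attr.get("name", "").lower() == "robots":
            self.meta_robots.append(attr.get("content", "").lower())
        elif tag == "link" and "canonical" in attr.get("rel", "").lower().split():
            self.canonical = attr.get("href") or None


def collect_signals(url: str, html: str, headers: dict, blocked_by_robots: bool) -> dict:
    """Gather the four signals for one page and flag per-page conflicts."""
    parser = HeadSignals()
    parser.feed(html)
    # Header names are case-insensitive, so normalise before the lookup.
    header = {k.lower(): v for k, v in headers.items()}.get("x-robots-tag", "").lower()
    meta_noindex = any("noindex" in content for content in parser.meta_robots)
    header_noindex = "noindex" in header
    canonical = urljoin(url, parser.canonical) if parser.canonical else None

    flags = []
    if blocked_by_robots and (meta_noindex or header_noindex):
        flags.append("blocked-in-robots-txt-with-noindex")
    if parser.meta_robots and header and meta_noindex != header_noindex:
        flags.append("meta-and-header-disagree")
    return {
        "url": url,
        "noindex": meta_noindex or header_noindex,
        "canonical": canonical,
        "canonical_differs": canonical is not None and canonical != url,
        "flags": flags,
    }


sample_html = (
    '<html><head><meta name="robots" content="noindex">'
    '<link rel="canonical" href="/product"></head></html>'
)
report = collect_signals(
    "https://example.com/product?color=red",
    sample_html,
    {"X-Robots-Tag": "index"},
    blocked_by_robots=True,
)
print(report["flags"])
```

Checking whether a canonical points to a `noindex` URL needs the signals of the target page as well, so that comparison has to happen after every page has been collected.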
Fix
Auto-fix issues
1. Identify all pages where robots.txt blocks crawling AND the page also has `noindex`:
   - If you want the page excluded from the index: remove the robots.txt rule; keep the `noindex` so crawlers can read it.
   - If you want to block all crawling: remove `noindex` (irrelevant if not crawled); keep the robots.txt block.
2. Identify canonical tags pointing to `noindex` pages:
   - The canonical target must be an indexable page.
   - Change the canonical to point to an indexable URL, or remove `noindex` from the target.
3. Identify pages with both `index` and `noindex` in meta robots (from different tags or sources):
   - Google uses the most restrictive directive; resolve to a single clear intent.
4. Verify after fixing using Google Search Console URL Inspection for each affected page.
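Finding canonicals that point to `noindex` pages requires a cross-page lookup. The sketch below assumes a hypothetical `pages` mapping from absolute URL to a record with `canonical` (the raw `href`, possibly relative) and `noindex` (a boolean), built by an earlier crawl; the function name `canonicals_to_noindex` is also an assumption. Canonical targets outside the crawled set are skipped because their indexability is unknown.

```python
from urllib.parse import urljoin


def canonicals_to_noindex(pages: dict) -> list:
    """Return (source_url, canonical_target) pairs where the target is noindex."""
    problems = []
    for url, info in pages.items():
        href = info.get("canonical")
        if not href:
            continue
        target = urljoin(url, href)  # resolve relative hrefs against the page URL
        if target == url:
            continue  # self-referencing canonical, out of scope for this check
        target_info = pages.get(target)
        if target_info is not None and target_info.get("noindex"):
            problems.append((url, target))
    return problems


pages = {
    "https://example.com/product?color=red": {"canonical": "/product", "noindex": False},
    "https://example.com/product": {"canonical": None, "noindex": True},
    "https://example.com/about": {"canonical": "/about", "noindex": False},
}
print(canonicals_to_noindex(pages))
```

Each reported pair can then be resolved either by repointing the canonical or by removing `noindex` from the target, depending on which URL should appear in search.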
Explain
Learn more
robots.txt is a crawl directive; `noindex` is an indexing directive. They operate at different stages of Google's pipeline. A page blocked in robots.txt is never fetched, so its `noindex` tag is never read. The URL can still be discovered through sitemaps or links, so it may remain indexed without Google ever seeing the directive. Google's documentation explicitly warns against blocking pages in robots.txt that you also want to declare as `noindex`.
Review
Code review
Programmatically fetch robots.txt and parse its Disallow rules. For each page URL, determine if it matches a Disallow pattern. If yes, fetch the page HTML and check for `<meta name='robots'>` tags — flag if noindex is present. Also check the `X-Robots-Tag` HTTP response header for conflicts with the meta tag. Report the specific conflict type for each flagged URL.

