Schema + Noindex Conflict
Detects pages that carry rich result schema markup but are blocked from indexing via noindex or robots.txt.
- Rich result schema (Review, Product, FAQ, etc.) has no effect on pages Google cannot index
- A page with `noindex` will not earn rich results even if it has valid structured data
- Pages blocked in robots.txt are never fetched, so their schema is never processed
- Audit all pages with schema markup to confirm they are crawlable and indexable
Rule Details
Structured data markup only influences search appearance on pages that Google can crawl and index. A noindex directive or a robots.txt block renders schema markup ineffective, which is why schema work should be reviewed together with indexability before investing more effort in validation details.
Code Example
<!-- Page: /product/headphones-pro -->
<head>
<meta name="robots" content="noindex"> <!-- ← Prevents indexing -->
</head>
<script type="application/ld+json">
{
"@context": "https://schema.org",
"@type": "Product",
"name": "Headphones Pro",
"aggregateRating": {
"@type": "AggregateRating",
"ratingValue": "4.8",
"reviewCount": "203"
}
}
</script>
<!-- ↑ AggregateRating schema has zero effect because the page is noindexed -->
Why It Matters
Investing in rich result schema on pages that are blocked from indexing wastes development effort. Google's structured data policies make it clear that rich results depend on pages being eligible for indexing in the first place.
Three Types of Conflict
1. Meta Robots Noindex + Schema
<meta name="robots" content="noindex">
<script type="application/ld+json">{ "@type": "Recipe", ... }</script>
Google fetches and processes the page but does not index it. Schema markup is parsed but no rich result is generated.
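As a rough illustration of how this first conflict could be detected, the sketch below parses a page's HTML with Python's standard library and returns the JSON-LD `@type` values found on a page whose robots meta tag contains `noindex` (or `none`, which implies `noindex`). The names `SchemaNoindexParser` and `find_noindex_schema_conflict` are hypothetical, and the sketch assumes the rendered HTML is already available as a string; it does not look at Microdata or HTTP headers.

```python
import json
from html.parser import HTMLParser


class SchemaNoindexParser(HTMLParser):
    """Collects robots meta directives and JSON-LD @type values from one page."""

    def __init__(self):
        super().__init__()
        self.robots_directives = set()
        self.schema_types = []
        self._in_jsonld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attr = {k: (v or "") for k, v in attrs}
        if tag == "meta" and attr.get("name", "").lower() in ("robots", "googlebot"):
            # content is a comma-separated list such as "noindex, follow"
            for token in attr.get("content", "").split(","):
                self.robots_directives.add(token.strip().lower())
        elif tag == "script" and attr.get("type", "").lower() == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                doc = json.loads("".join(self._buffer))
            except json.JSONDecodeError:
                return  # malformed JSON-LD is a separate problem; skip it here
            for item in doc if isinstance(doc, list) else [doc]:
                if isinstance(item, dict) and "@type" in item:
                    types = item["@type"]
                    self.schema_types.extend(types if isinstance(types, list) else [types])


def find_noindex_schema_conflict(html):
    """Return the schema types on the page if it is noindexed, else an empty list."""
    parser = SchemaNoindexParser()
    parser.feed(html)
    blocked = bool({"noindex", "none"} & parser.robots_directives)
    return parser.schema_types if blocked else []
```

A non-empty return value means the page carries schema that cannot produce a rich result while the `noindex` directive stays in place.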
2. robots.txt Block + Schema
# robots.txt
User-agent: *
Disallow: /products/
Google never fetches /products/ URLs, so schema inside those pages is never seen at all.
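One way to test a URL against robots.txt rules like the ones above is Python's built-in `urllib.robotparser`. The helper name `is_blocked_by_robots` is hypothetical, and the sketch assumes the robots.txt body has already been downloaded as text. Treat the result as an approximation: the standard-library parser does not reproduce every detail of Google's matching (wildcard patterns in particular).

```python
from urllib.robotparser import RobotFileParser


def is_blocked_by_robots(robots_txt, url, user_agent="Googlebot"):
    """Return True if the given robots.txt text disallows `url` for `user_agent`."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


robots_txt = "User-agent: *\nDisallow: /products/\n"
print(is_blocked_by_robots(robots_txt, "https://example.com/products/headphones"))
print(is_blocked_by_robots(robots_txt, "https://example.com/about"))
```

With the sample rules, the first URL is reported as blocked and the second is not.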
3. Non-Self Canonical + Schema
<!-- /products/headphones?color=red -->
<link rel="canonical" href="https://example.com/products/headphones">
<script type="application/ld+json">{ "@type": "Product", ... }</script>
Google treats this page as a duplicate. Rich results are attributed to the canonical URL, which must have its own valid schema.
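A minimal sketch of a self-canonical check follows, assuming the page URL and its HTML are already known. `CanonicalParser`, `_normalize`, and `is_self_canonical` are hypothetical names. The normalization is deliberately simple (lowercased scheme and host, empty path treated as `/`, query string compared as-is), and a page with no canonical tag is treated as non-conflicting, which is a choice made for this sketch rather than a stated rule.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class CanonicalParser(HTMLParser):
    """Captures the href of the first <link rel="canonical"> on the page."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attr = {k: (v or "") for k, v in attrs}
        rels = attr.get("rel", "").lower().split()
        if tag == "link" and "canonical" in rels and self.canonical is None:
            self.canonical = attr.get("href") or None


def _normalize(url):
    """Reduce a URL to the parts compared here; the fragment is ignored."""
    parts = urlsplit(url)
    return (parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", parts.query)


def is_self_canonical(page_url, html):
    """Return True if the canonical tag is absent or points back at page_url."""
    parser = CanonicalParser()
    parser.feed(html)
    if parser.canonical is None:
        return True  # assumption: no canonical declared means no conflict
    target = urljoin(page_url, parser.canonical)  # resolves relative hrefs
    return _normalize(target) == _normalize(page_url)
```

For the example above, `is_self_canonical("https://example.com/products/headphones?color=red", html)` returns `False`, because the query string differs from the canonical target.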
✅ Correct Pattern
<!-- Page is indexable: no noindex, not blocked, self-referencing canonical -->
<head>
<link rel="canonical" href="https://example.com/products/headphones">
<!-- No noindex meta tag -->
</head>
<script type="application/ld+json">
{
"@context": "https://schema.org",
"@type": "Product",
"name": "Headphones",
"aggregateRating": {
"@type": "AggregateRating",
"ratingValue": "4.7",
"reviewCount": "150"
}
}
</script>
Audit Approach
- Crawl the site and collect all pages with JSON-LD or Microdata schema
- For each schema page, check `meta[name=robots]` for `noindex`
- Check whether the URL path matches any `robots.txt` Disallow rules
- Check the `link[rel=canonical]` href matches the current URL
- Flag any page where schema exists but one of the above conditions blocks indexing
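The audit steps above can be sketched as a small classifier over signals that a crawler has already extracted. Everything here is illustrative: the `PageSignals` record, its field names, and the conflict labels (`noindex`, `robots_txt`, `non_self_canonical`) are assumptions for this sketch, and the canonical comparison is a plain string match.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PageSignals:
    """Signals collected for one crawled URL (hypothetical structure)."""
    url: str
    schema_types: List[str]
    robots_meta: str = ""                # raw content of meta[name=robots]
    blocked_by_robots_txt: bool = False  # result of a robots.txt rule check
    canonical: Optional[str] = None      # href of link[rel=canonical], if any


def audit_page(page):
    """Return the indexing conflicts for a page that carries schema."""
    if not page.schema_types:
        return []  # no schema on the page, so nothing to flag under this rule
    conflicts = []
    directives = {d.strip().lower() for d in page.robots_meta.split(",")}
    if directives & {"noindex", "none"}:
        conflicts.append("noindex")
    if page.blocked_by_robots_txt:
        conflicts.append("robots_txt")
    if page.canonical and page.canonical != page.url:
        conflicts.append("non_self_canonical")
    return conflicts


def audit_site(pages):
    """Map each conflicting URL to its list of conflicts."""
    report = {}
    for page in pages:
        conflicts = audit_page(page)
        if conflicts:
            report[page.url] = conflicts
    return report
```

Keeping extraction separate from classification means the same report logic can run on data from any crawler, as long as it fills in the assumed fields.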
Exceptions
- Staging, utility, login, account, or internal search pages may intentionally use different crawl or index signals if they are not meant to rank.
- Temporary migration states can produce noisy intermediate signals; flag the live production URL pattern, not one-off transition artifacts.
- When redirects, canonicals, robots directives, or indexability signals conflict, fix the strongest final signal first instead of reporting every downstream symptom as a separate blocker.
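To report one blocker per page rather than every downstream symptom, an audit could rank the conflicts it finds and surface only the first. The ordering below is an assumption made for this sketch, not a documented precedence: a robots.txt block is ranked first because Google cannot see a `noindex` tag or canonical link on a page it never fetches. The labels and the `primary_blocker` name are hypothetical.

```python
# Assumed precedence, strongest first; adjust to match your own audit policy.
PRECEDENCE = ("robots_txt", "noindex", "non_self_canonical")


def primary_blocker(conflicts):
    """Return the highest-precedence conflict label, or None if there is none."""
    for label in PRECEDENCE:
        if label in conflicts:
            return label
    return None


print(primary_blocker(["noindex", "robots_txt"]))  # robots_txt
```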
Standards
- Use these references as the standard for the final search-facing HTML, metadata, and crawl behavior.
- Check the implementation against Google: Structured data general guidelines before treating the rule as satisfied.
- Check the implementation against Google: robots meta tag before treating the rule as satisfied.
Verification
Automated Checks
- Google Search Console → Rich Results: schema-carrying pages should appear here if indexed
- Google Rich Results Test: run it on the live URL. If the tool reports "not indexable," no rich result will be generated
Manual Checks
- Review representative live pages manually and confirm there is no stronger conflicting signal that changes the intended SEO outcome.
Use with AI
Copy these prompts to use with your AI assistant, or install the MCP server to use directly from Claude, Cursor, or Windsurf.
Check
Verify implementation
For every page containing a `<script type='application/ld+json'>` block with a rich result schema type, check: (1) Is the page blocked by robots.txt? (2) Does `<meta name='robots'>` contain `noindex`? (3) Does the canonical tag point to a different URL? Flag any conflicts.
Fix
Auto-fix issues
For pages where rich results are desired: remove the `noindex` directive, unblock the URL in robots.txt, and ensure the canonical tag is self-referencing. If the page must remain noindexed, remove the schema markup — it serves no purpose.
Explain
Learn more
Explain why Google does not process structured data on pages it cannot index, how to identify schema+noindex conflicts in a large site, and how to prioritize which pages need indexing to unlock rich results.
Review
Code review
Review metadata generation, rendered HTML, structured data, and response headers related to Schema + Noindex Conflict. Flag exact routes or templates where search-facing output violates the rule, and describe how to verify the final page output.