SEO · High

Schema + Noindex Conflict

Detects pages that carry rich result schema markup but are blocked from indexing via noindex or robots.txt.

Quick take
Typical fix time: 10 min
  • Rich result schema (Review, Product, FAQ, etc.) has no effect on pages Google cannot index
  • A page with `noindex` will not earn rich results even if it has valid structured data
  • Pages blocked in robots.txt are never fetched, so their schema is never processed
  • Audit all pages with schema markup to confirm they are crawlable and indexable
Why it matters: Rich result schema on a page that cannot be indexed is wasted development effort, because Google only shows rich results for pages that are eligible to appear in search.

Rule Details

Structured data markup only influences search appearance on pages that Google can crawl and index. A noindex directive or a robots.txt block makes schema markup ineffective, so review schema work together with indexability before investing more effort in validation details.

Code Example

<!-- Page: /product/headphones-pro -->
<head>
  <meta name="robots" content="noindex">  <!-- ← Prevents indexing -->
</head>

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Headphones Pro",
  "aggregateRating": {
    "@type": "AggregateRating",
    "ratingValue": "4.8",
    "reviewCount": "203"
  }
}
</script>
<!-- ↑ AggregateRating schema earns no rich result: the page is noindexed -->

Why It Matters

Investing in rich result schema on pages that are blocked from indexing wastes development effort. Google's structured data policies make it clear that rich results depend on pages being eligible for indexing in the first place.

Three Types of Conflict

1. Meta Robots Noindex + Schema

<meta name="robots" content="noindex">
<script type="application/ld+json">{ "@type": "Recipe", ... }</script>

Google fetches and processes the page but does not index it. Schema markup is parsed but no rich result is generated.
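As a minimal sketch of detecting this first conflict, the Python below scans one page's HTML for a robots meta tag containing noindex alongside at least one parseable JSON-LD block. The names SchemaNoindexScanner and has_schema_noindex_conflict are hypothetical, the check covers only meta tags (not X-Robots-Tag response headers), and it assumes the JSON-LD is complete, valid JSON rather than the abbreviated `...` form shown above.

```python
import json
from html.parser import HTMLParser


class SchemaNoindexScanner(HTMLParser):
    """Collects robots meta directives and JSON-LD @type values from one page."""

    def __init__(self):
        super().__init__()
        self.robots_directives = set()
        self.schema_types = []
        self._in_jsonld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = (attrs.get("name") or "").lower()
        # "googlebot" is the Google-specific variant of the robots meta tag.
        if tag == "meta" and name in ("robots", "googlebot"):
            content = attrs.get("content") or ""
            self.robots_directives.update(
                token.strip().lower() for token in content.split(",") if token.strip()
            )
        elif tag == "script" and (attrs.get("type") or "").lower() == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                doc = json.loads("".join(self._buffer))
            except json.JSONDecodeError:
                return  # Invalid JSON-LD is a separate problem; ignore it here.
            items = doc if isinstance(doc, list) else [doc]
            for item in items:
                if isinstance(item, dict) and "@type" in item:
                    self.schema_types.append(item["@type"])


def has_schema_noindex_conflict(html):
    """True when the page carries JSON-LD schema and a noindex robots meta tag."""
    scanner = SchemaNoindexScanner()
    scanner.feed(html)
    scanner.close()
    # "none" is shorthand for "noindex, nofollow".
    noindexed = bool(scanner.robots_directives & {"noindex", "none"})
    return noindexed and bool(scanner.schema_types)
```

Microdata and RDFa are not covered by this sketch; a fuller audit would need to detect those formats as well.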

2. robots.txt Block + Schema

# robots.txt
User-agent: *
Disallow: /products/

Google never fetches /products/ URLs, so schema inside those pages is never seen at all.
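A rough way to test a URL against rules like these is Python's standard-library urllib.robotparser, as sketched below. The helper name is_blocked_by_robots is hypothetical. Treat the result as an approximation: the standard-library parser does not implement Google's wildcard (`*`, `$`) or longest-match semantics, so complex rule sets may evaluate differently from Googlebot.

```python
from urllib.robotparser import RobotFileParser


def is_blocked_by_robots(robots_txt, url, user_agent="Googlebot"):
    """Return True when the given robots.txt text disallows fetching the URL."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


rules = "User-agent: *\nDisallow: /products/\n"
blocked = is_blocked_by_robots(rules, "https://example.com/products/headphones")
allowed = not is_blocked_by_robots(rules, "https://example.com/blog/post")
```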

3. Non-Self Canonical + Schema

<!-- /products/headphones?color=red -->
<link rel="canonical" href="https://example.com/products/headphones">
<script type="application/ld+json">{ "@type": "Product", ... }</script>

Google will usually treat this URL as a duplicate of the canonical. Rich results are attributed to the canonical URL, which must carry its own valid schema.
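The comparison between a page URL and its canonical can be sketched as follows. The function name is_self_canonical and the normalization choices are assumptions: scheme and host are compared case-insensitively, fragments are ignored, relative canonical values are resolved against the page URL, and paths and query strings must match exactly (so a trailing-slash difference counts as a mismatch).

```python
from urllib.parse import urljoin, urlsplit


def is_self_canonical(page_url, canonical_href):
    """Return True when the canonical href resolves to the page's own URL."""

    def normalize(url):
        parts = urlsplit(url)
        # Fragments never reach the server, so they are dropped from the comparison.
        return (parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", parts.query)

    # Resolve relative hrefs such as "/products/headphones" against the page URL.
    resolved = urljoin(page_url, canonical_href)
    return normalize(page_url) == normalize(resolved)
```

With the example above, the `?color=red` variant is not self-canonical, while the clean product URL is.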

✅ Correct Pattern

<!-- Page is indexable: no noindex, not blocked, self-referencing canonical -->
<head>
  <link rel="canonical" href="https://example.com/products/headphones">
  <!-- No noindex meta tag -->
</head>
 
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Headphones",
  "aggregateRating": {
    "@type": "AggregateRating",
    "ratingValue": "4.7",
    "reviewCount": "150"
  }
}
</script>

Audit Approach

  1. Crawl the site and collect all pages with JSON-LD or Microdata schema
  2. For each schema page, check meta[name=robots] (and the X-Robots-Tag response header) for noindex
  3. Check whether the URL path matches any robots.txt Disallow rules
  4. Check that the link[rel=canonical] href matches the current URL
  5. Flag any page where schema exists but one of the above conditions blocks indexing
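The flagging step could look roughly like the sketch below, which assumes the crawl has already produced one record per page. The PageSignals structure, its field names, and find_conflicts are all hypothetical; a real crawler would populate these fields from its own fetch, robots.txt, and HTML parsing results.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PageSignals:
    """Signals collected for one crawled URL (hypothetical structure)."""
    url: str
    schema_types: list = field(default_factory=list)  # e.g. ["Product"]
    meta_robots: str = ""                              # raw content attribute
    robots_txt_blocked: bool = False
    canonical: Optional[str] = None                    # absolute canonical URL, if any


def find_conflicts(pages):
    """Map each schema-carrying URL to the reasons its rich results are at risk."""
    findings = {}
    for page in pages:
        if not page.schema_types:
            continue  # No schema means nothing to conflict with.
        reasons = []
        if page.robots_txt_blocked:
            reasons.append("robots.txt disallow")
        directives = {d.strip().lower() for d in page.meta_robots.split(",")}
        if directives & {"noindex", "none"}:
            reasons.append("meta robots noindex")
        if page.canonical and page.canonical != page.url:
            reasons.append("non-self canonical")
        if reasons:
            findings[page.url] = reasons
    return findings


report = find_conflicts([
    PageSignals(url="https://example.com/a", schema_types=["Product"], meta_robots="noindex"),
    PageSignals(url="https://example.com/b", schema_types=["Product"],
                canonical="https://example.com/b"),
])
for url, reasons in report.items():
    print(url, "->", ", ".join(reasons))
```

Reasons are listed with the robots.txt block first, since a blocked page is never fetched and its other signals are never read.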

Exceptions

  • Staging, utility, login, account, or internal search pages may intentionally use different crawl or index signals if they are not meant to rank.
  • Temporary migration states can produce noisy intermediate signals; flag the live production URL pattern, not one-off transition artifacts.
  • When redirects, canonicals, robots directives, or indexability signals conflict, fix the strongest final signal first instead of reporting every downstream symptom as a separate blocker.

Standards

  • Use these references as the standard for the final search-facing HTML, metadata, and crawl behavior.
  • Check the implementation against Google: Structured data general guidelines before treating the rule as satisfied.
  • Check the implementation against Google: robots meta tag before treating the rule as satisfied.

Verification

Automated Checks

  • Google Search Console → Rich Results: schema-carrying pages should appear here if indexed
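  • Google Rich Results Test: run it on the live URL to validate the markup. A passing result does not guarantee a rich result; confirm indexability separately (for example with the URL Inspection tool in Search Console), because a page that is not indexable will not generate one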

Manual Checks

  • Review representative live pages manually and confirm there is no stronger conflicting signal that changes the intended SEO outcome.

Use with AI

Copy these prompts to use with your AI assistant, or install the MCP server to use directly from Claude, Cursor, or Windsurf.

Check

Verify implementation

For every page containing a `<script type='application/ld+json'>` block with a rich result schema type, check: (1) Is the page blocked by robots.txt? (2) Does `<meta name='robots'>` contain `noindex`? (3) Does the canonical tag point to a different URL? Flag any conflicts.

Fix

Auto-fix issues

For pages where rich results are desired: remove the `noindex` directive, unblock the URL in robots.txt, and ensure the canonical tag is self-referencing. If the page must remain noindexed, remove the schema markup — it serves no purpose.

Explain

Learn more

Explain why Google does not process structured data on pages it cannot index, how to identify schema+noindex conflicts in a large site, and how to prioritize which pages need indexing to unlock rich results.

Review

Code review

Review metadata generation, rendered HTML, structured data, and response headers related to Schema + Noindex Conflict. Flag exact routes or templates where search-facing output violates the rule, and describe how to verify the final page output.

Sources

References used to support the guidance in this rule.

Further Reading

Tools and supplementary material for exploring the topic in more depth.

Google Search Console
search.google.com (Tool)

Rules that often go hand-in-hand with this one.

Robots Meta Conflict

Detects pages blocked by robots.txt that also carry noindex meta tags, creating a paradox where the directive is never read.

SEO
Avoid conflicting indexability signals

Detects conflicting signals between robots.txt, meta robots, X-Robots-Tag headers, and canonical tags

SEO
Make important pages indexable

Identifies important pages blocked from search engine indexing by noindex, robots.txt, or other directives

SEO
Add structured data markup

Schema.org structured data (JSON-LD) is implemented for rich search results.

SEO
