SEO · High

Noindex in Sitemap

Checks for noindexed pages listed in sitemap

Utilities
Quick take
Typical fix time: 15 min
  • Never include noindexed pages in your XML sitemap
  • The sitemap and noindex directive send contradictory signals to crawlers
  • Sitemaps should only list canonical, indexable, 200-status URLs
  • Remove noindexed URLs from the sitemap or remove the noindex directive; pick one
Why it matters: Listing noindexed pages in your sitemap sends contradictory signals to Googlebot. Google honours the noindex directive, so the page stays out of the index, but the URL is still crawled, which spends crawl budget on pages you don't want indexed.

Rule Details

An XML sitemap tells search engines which pages to prioritise for crawling and indexing. A noindex directive tells them not to index a page. A sitemap entry and a page-level robots meta control should never point in opposite directions on the same URL; see Google's sitemap guidance for what a sitemap should contain.

Code Examples

<!-- sitemap.xml: says "please index this" -->
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/thank-you</loc>
  </url>
</urlset>

<!-- /thank-you page: says "do NOT index this" -->
<head>
  <meta name="robots" content="noindex, nofollow" />
</head>

Google will follow the noindex directive, but the URL still gets crawled — wasting crawl budget.

Why It Matters

Listing noindexed pages in your sitemap sends contradictory signals to Googlebot. It wastes crawl budget and usually appears alongside the contradictions flagged by the "Avoid conflicting indexability signals" rule.

Decision Tree

Is this URL listed in the sitemap?
         ↓ YES
Does it have a noindex directive?
         ↓ YES
Do you WANT it indexed?
   ↓ YES                    ↓ NO
Remove noindex          Remove from sitemap
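
The decision tree can be sketched as a small function. This is only an illustration: `UrlState`, `SitemapAction`, and `resolveConflict` are hypothetical names, and `shouldBeIndexed` stands in for an editorial decision that no tool can make for you.

```typescript
// Hypothetical types for illustration; not part of any library.
type SitemapAction = 'none' | 'remove-noindex' | 'remove-from-sitemap'

interface UrlState {
  inSitemap: boolean
  hasNoindex: boolean
  // Editorial decision: do you want this URL in search results?
  shouldBeIndexed: boolean
}

function resolveConflict(state: UrlState): SitemapAction {
  // There is no conflict unless the URL is both listed and noindexed.
  if (!state.inSitemap || !state.hasNoindex) return 'none'
  return state.shouldBeIndexed ? 'remove-noindex' : 'remove-from-sitemap'
}
```

For a thank-you page that is listed, noindexed, and not meant to rank, `resolveConflict` returns `'remove-from-sitemap'`.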

What Belongs in a Sitemap

<!-- ✅ Good: only indexable, canonical, 200-status URLs -->
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/products/shoes</loc>
    <!-- Optional tags: Google ignores <changefreq> and <priority> -->
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>

Exclude from sitemap:

  • Pages with noindex meta tag or X-Robots-Tag: noindex
  • Pages that return non-200 status codes (301, 404, 410)
  • Duplicate pages (point their canonical URL at the original and list only the original)
  • Paginated pages beyond page 1 (optional, depends on strategy)
  • Admin, login, checkout, and thank-you pages
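
The exclusion list above could be expressed as a single eligibility check. This is a minimal sketch under stated assumptions: the `PageRecord` fields and the `EXCLUDED_PREFIXES` values are hypothetical and would need to match your own content model and routes.

```typescript
// Hypothetical page record; the field names are assumptions for illustration.
interface PageRecord {
  path: string
  status: number
  noindex: boolean
  // Path that the page's canonical URL points to.
  canonicalPath: string
}

// Hypothetical private path prefixes that should never be listed.
const EXCLUDED_PREFIXES = ['/admin', '/login', '/checkout', '/thank-you']

function isSitemapEligible(page: PageRecord): boolean {
  const isPrivate = EXCLUDED_PREFIXES.some(prefix => page.path.startsWith(prefix))
  return (
    page.status === 200 && // no redirects or error pages
    !page.noindex && // no meta robots or X-Robots-Tag noindex
    page.canonicalPath === page.path && // self-canonical, not a duplicate
    !isPrivate
  )
}
```

Pagination is left out on purpose, since whether to list pages beyond page 1 depends on strategy.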

Automated Detection

Use Screaming Frog or a similar crawler:

  1. Crawl the site
  2. Export the sitemap URL list
  3. Filter for URLs with noindex in the meta robots or X-Robots-Tag columns
  4. Remove each match from the sitemap or remove the noindex directive
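
If you prefer to script the cross-check, the steps above might look like the sketch below. It assumes you already have the sitemap XML as a string and a crawl export reduced to `CrawlRow` objects; both function names and the row shape are hypothetical. The regex is a shortcut for simple sitemaps, not a full XML parser, and it does not decode entities such as `&amp;`.

```typescript
// Hypothetical shape of one row from a crawl export.
interface CrawlRow {
  url: string
  noindex: boolean
}

// Pull every <loc> value out of a sitemap document.
function extractSitemapLocs(xml: string): string[] {
  const locs: string[] = []
  const pattern = /<loc>\s*([^<]+?)\s*<\/loc>/g
  let match: RegExpExecArray | null
  while ((match = pattern.exec(xml)) !== null) {
    locs.push(match[1])
  }
  return locs
}

// Return the sitemap URLs that the crawl marked as noindex.
function findSitemapConflicts(sitemapXml: string, rows: CrawlRow[]): string[] {
  const noindexed = new Set(rows.filter(row => row.noindex).map(row => row.url))
  return extractSitemapLocs(sitemapXml).filter(url => noindexed.has(url))
}
```

URLs are compared as exact strings, so normalise trailing slashes and protocol before matching if your crawl export and sitemap differ on them.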

Next.js Sitemap Example

// app/sitemap.ts
import type { MetadataRoute } from 'next'
// getAllPages is this project's own helper; swap in your data source
import { getAllPages } from '@/lib/pages'

export default async function sitemap(): Promise<MetadataRoute.Sitemap> {
  const pages = await getAllPages()

  // Only include indexable pages
  return pages
    .filter(page => !page.noindex)
    .map(page => ({
      url: `https://example.com${page.path}`,
      lastModified: page.updatedAt,
    }))
}

Google prioritises noindex over sitemap

Per Google's documentation, if a URL appears in both the sitemap and has a noindex directive, Google will follow the noindex and not index the page. However, it will still crawl it — removing it from the sitemap reduces unnecessary crawl budget consumption.

Exceptions

  • Staging, utility, login, account, or internal search pages may intentionally use different crawl or index signals if they are not meant to rank.
  • Temporary migration states can produce noisy intermediate signals; flag the live production URL pattern, not one-off transition artifacts.
  • When redirects, canonicals, robots directives, or indexability signals conflict, fix the strongest final signal first instead of reporting every downstream symptom as a separate blocker.

Verification

Automated Checks

  • Inspect rendered HTML and HTTP headers to confirm the expected metadata or crawlability signal is present.
  • Test the affected URL with Google Search Console or equivalent tooling where relevant.
  • Re-crawl a representative page set after deployment.
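
To make the header and HTML inspection concrete, here is a rough sketch of the two checks. The function names are hypothetical, and the HTML check is a regex heuristic over a raw HTML string rather than a real DOM parser, so treat it as a starting point. It treats `none` as equivalent to `noindex, nofollow` and accepts a user-agent prefix such as `googlebot:` in the header value.

```typescript
// True if a comma-separated directive list contains noindex (or none).
function includesNoindex(directives: string): boolean {
  return directives
    .toLowerCase()
    .split(',')
    // Strip an optional "user-agent:" prefix, e.g. "googlebot: noindex".
    .map(token => token.trim().replace(/^[a-z0-9_-]+:\s*/, ''))
    .some(token => token === 'noindex' || token === 'none')
}

// Check the value of an X-Robots-Tag response header (null if absent).
function headerHasNoindex(headerValue: string | null): boolean {
  return headerValue !== null && includesNoindex(headerValue)
}

// Check <meta name="robots"> and <meta name="googlebot"> tags in an HTML string.
function metaHasNoindex(html: string): boolean {
  const metaTags = html.match(/<meta\b[^>]*>/gi) ?? []
  return metaTags.some(tag => {
    const name = /\bname\s*=\s*["']?([^"'\s>]+)/i.exec(tag)
    const content = /\bcontent\s*=\s*["']([^"']*)["']/i.exec(tag)
    if (!name || !content) return false
    const target = name[1].toLowerCase()
    return (target === 'robots' || target === 'googlebot') && includesNoindex(content[1])
  })
}
```

With `fetch`, you could pass `response.headers.get('x-robots-tag')` to `headerHasNoindex` and `await response.text()` to `metaHasNoindex`. Tags injected by client-side JavaScript only show up if you check the rendered HTML.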

Manual Checks

  • Confirm the change does not create conflicting canonical, robots, or structured-data signals.

Use with AI

Copy these prompts to use with your AI assistant, or install the MCP server to use directly from Claude, Cursor, or Windsurf.

Check

Verify implementation

Fetch the XML sitemap(s) and check each listed URL. For each URL, retrieve the page and check for <meta name='robots' content='noindex'> in the <head> or an X-Robots-Tag: noindex HTTP header. Report any URLs present in the sitemap that also carry a noindex directive.

Fix

Auto-fix issues

For each URL that has a noindex directive AND appears in the sitemap: decide whether the page should be indexed. If yes — remove the noindex directive. If no — remove the URL from the sitemap. Never leave both in place.

Explain

Learn more

The XML sitemap is a recommendation to search engines: 'please crawl and index these pages'. A noindex directive is an instruction: 'do not index this page'. Including noindexed pages in the sitemap creates a contradiction—Google will resolve it by following noindex, but the URL still gets crawled, wasting crawl budget.

Review

Code review

Fetch each URL in the XML sitemap. For each URL, check the HTTP response for X-Robots-Tag: noindex header, and check the rendered <head> for <meta name='robots' content='noindex'>. Report any URL that appears in the sitemap AND carries a noindex directive. Also check for sitemap entries returning non-200 status codes.

Sources

References used to support the guidance in this rule.

Further Reading

Tools and supplementary material for exploring the topic in more depth.

Google Search Console
search.google.com · Guide
Screaming Frog SEO Spider Website Crawler

The industry leading website crawler for Windows, macOS and Ubuntu, trusted by thousands of SEOs and agencies worldwide for technical SEO site audits.

Screaming Frog · Guide

Rules that often go hand-in-hand with this one.

Set robots meta directives correctly

Checks robots meta tag for valid indexing directives in the page head.

SEO

Create and submit an XML sitemap

An XML sitemap is available at /sitemap.xml and includes all important pages.

SEO

Set canonical URLs for all pages

A canonical URL tag is present to prevent duplicate content issues.

SEO

Avoid conflicting indexability signals

Detects conflicting signals between robots.txt, meta robots, X-Robots-Tag headers, and canonical tags.

SEO
