An XML sitemap tells search engines which URLs you want crawled and indexed. A noindex directive tells them not to index a page. Google's sitemap guidance and page-level robots meta controls should never point in opposite directions on the same URL.

Code Examples

<!-- sitemap.xml — says "please index this" -->
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/thank-you</loc>
  </url>
</urlset>

<!-- /thank-you page — says "do NOT index this" -->
<head>
  <meta name="robots" content="noindex, nofollow" />
</head>

Google will honour the noindex directive, but it still has to crawl the URL to see it, which wastes crawl budget.

Why It Matters

Listing noindexed pages in your sitemap sends contradictory signals to Googlebot. It wastes crawl budget and usually appears alongside the contradictions flagged in indexability-conflicts.

Decision Tree

Is this URL listed in the sitemap?
         ↓
Does it have a noindex directive?
         ↓ YES
Do you WANT it indexed?
   ↓ YES                    ↓ NO
Remove noindex          Remove from sitemap
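
The decision tree above can be sketched as a small function. This is only an illustration: the `UrlState` shape and the `resolveSitemapConflict` name are hypothetical, and the "no-conflict" result for the branches the tree does not draw is an assumption.

```typescript
// Hypothetical shape describing one URL; not part of any library.
interface UrlState {
  inSitemap: boolean   // is the URL listed in the sitemap?
  hasNoindex: boolean  // meta robots or X-Robots-Tag contains noindex
  wantIndexed: boolean // editorial decision: should this page rank?
}

type SitemapAction = 'no-conflict' | 'remove-noindex' | 'remove-from-sitemap'

function resolveSitemapConflict(state: UrlState): SitemapAction {
  // Assumption: a conflict only exists when both signals are present.
  if (!state.inSitemap || !state.hasNoindex) return 'no-conflict'
  // Both signals present: keep whichever matches the intent.
  return state.wantIndexed ? 'remove-noindex' : 'remove-from-sitemap'
}

// Example: a thank-you page that should stay out of the index.
const action = resolveSitemapConflict({
  inSitemap: true,
  hasNoindex: true,
  wantIndexed: false,
})
console.log(action)
```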

What Belongs in a Sitemap

<!-- ✅ Good: only indexable, canonical URLs that return 200 -->
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/products/shoes</loc>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>

Exclude from sitemap:

Pages with a noindex meta tag or an X-Robots-Tag: noindex header
Pages that return non-200 status codes (301, 404, 410)
Duplicate pages (list only the original, and point the duplicates' canonical URL at it)
Paginated pages beyond page 1 (optional, depends on strategy)
Admin, login, and checkout thank-you pages
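
One way to apply the exclusion list above is a single predicate that a sitemap generator runs over every page. The sketch below assumes a hypothetical `PageRecord` shape and a hypothetical `PRIVATE_PREFIXES` list; adjust both to whatever your CMS or crawler actually exposes. Pagination is left out because the list treats it as optional.

```typescript
// Hypothetical page record; field names are assumptions for illustration.
interface PageRecord {
  path: string          // e.g. "/products/shoes"
  status: number        // HTTP status the URL returns
  noindex: boolean      // true if meta robots or X-Robots-Tag says noindex
  canonicalPath: string // path the page declares as its canonical
}

// Hypothetical prefixes for utility pages that should never be listed.
const PRIVATE_PREFIXES = ['/admin', '/login', '/checkout/thank-you']

function belongsInSitemap(page: PageRecord): boolean {
  if (page.noindex) return false                     // noindexed pages
  if (page.status !== 200) return false              // redirects, 404, 410
  if (page.canonicalPath !== page.path) return false // duplicates of another URL
  // Match the prefix itself or anything nested under it.
  return !PRIVATE_PREFIXES.some(
    prefix => page.path === prefix || page.path.startsWith(prefix + '/'),
  )
}
```

A generator can then call `pages.filter(belongsInSitemap)` before mapping pages to sitemap entries.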

Automated Detection

Use Screaming Frog or a similar crawler:

Crawl the site
Export the sitemap URL list
Filter for URLs with noindex in the meta robots column
Remove each match from the sitemap or remove the noindex directive
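
If you prefer to script the cross-reference instead of filtering in a spreadsheet, the steps above might look roughly like this. The `CrawlRow` fields are hypothetical and do not mirror any crawler's real export columns; the regex-based `<loc>` extraction is a shortcut that assumes a simple, well-formed sitemap rather than a full XML parse.

```typescript
// Hypothetical row from a crawl export; rename fields to match your tool.
interface CrawlRow {
  url: string
  metaRobots: string // e.g. "noindex, nofollow"
  xRobotsTag: string // value of the X-Robots-Tag header, or ""
}

// Pull every <loc> value out of a sitemap string.
function extractSitemapUrls(sitemapXml: string): string[] {
  const urls: string[] = []
  const locPattern = /<loc>\s*([^<\s]+)\s*<\/loc>/g
  let match: RegExpExecArray | null
  while ((match = locPattern.exec(sitemapXml)) !== null) {
    const raw = match[1]
    // Sitemaps escape "&" as "&amp;"; other entities are not handled here.
    if (raw) urls.push(raw.replace(/&amp;/g, '&'))
  }
  return urls
}

// Return sitemap URLs whose crawl data carries noindex (or "none",
// which Google treats as noindex plus nofollow).
function findNoindexedSitemapUrls(sitemapXml: string, rows: CrawlRow[]): string[] {
  const listed = new Set(extractSitemapUrls(sitemapXml))
  return rows
    .filter(row => listed.has(row.url))
    .filter(row => /\b(noindex|none)\b/i.test(`${row.metaRobots},${row.xRobotsTag}`))
    .map(row => row.url)
}
```

Each URL the function returns needs one of the two fixes from the last step: drop it from the sitemap or drop the noindex.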

Next.js Sitemap Example

// app/sitemap.ts
import type { MetadataRoute } from 'next'
import { getAllPages } from '@/lib/pages'
 
export default async function sitemap(): Promise<MetadataRoute.Sitemap> {
  const pages = await getAllPages()
 
  // Only include indexable pages
  return pages
    .filter(page => !page.noindex)
    .map(page => ({
      url: `https://example.com${page.path}`,
      lastModified: page.updatedAt,
    }))
}

Google prioritises noindex over sitemap

Per Google's documentation, if a URL is listed in the sitemap and also carries a noindex directive, Google follows the noindex and does not index the page. It still crawls the URL, though, so removing it from the sitemap reduces unnecessary crawl budget consumption.

Exceptions

Staging, utility, login, account, or internal search pages may intentionally use different crawl or index signals if they are not meant to rank.
Temporary migration states can produce noisy intermediate signals; flag the live production URL pattern, not one-off transition artifacts.
When redirects, canonicals, robots directives, or indexability signals conflict, fix the strongest final signal first instead of reporting every downstream symptom as a separate blocker.

Verification

Automated Checks

Inspect rendered HTML and HTTP headers to confirm the expected metadata or crawlability signal is present.
Test the affected URL with Google Search Console or equivalent tooling where relevant.
Re-crawl a representative page set after deployment.
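
The first check can be approximated in code once you have a page's rendered HTML and response headers in hand. The helper names below are hypothetical, and the regex-based tag matching is a rough sketch rather than a real HTML parser. It also ignores user-agent-scoped header values such as "googlebot: noindex", so treat a negative result as a hint, not proof.

```typescript
// Read one quoted attribute from a single tag string; null if absent.
function readAttr(tag: string, name: string): string | null {
  const m = new RegExp(`\\b${name}\\s*=\\s*["']([^"']*)["']`, 'i').exec(tag)
  return m ? (m[1] ?? '') : null
}

// Collect robots directives from <meta name="robots|googlebot"> tags
// and from any X-Robots-Tag response header.
function readRobotsDirectives(html: string, headers: Record<string, string>): string[] {
  const values: string[] = []
  for (const tag of html.match(/<meta\b[^>]*>/gi) ?? []) {
    const name = (readAttr(tag, 'name') ?? '').toLowerCase()
    if (name === 'robots' || name === 'googlebot') {
      values.push(readAttr(tag, 'content') ?? '')
    }
  }
  for (const [key, value] of Object.entries(headers)) {
    // Header names are case-insensitive.
    if (key.toLowerCase() === 'x-robots-tag') values.push(value)
  }
  return values
    .flatMap(value => value.split(','))
    .map(token => token.trim().toLowerCase())
    .filter(token => token.length > 0)
}

// "none" is shorthand for "noindex, nofollow".
function isNoindexed(html: string, headers: Record<string, string>): boolean {
  const directives = readRobotsDirectives(html, headers)
  return directives.includes('noindex') || directives.includes('none')
}
```

Running `isNoindexed` over every URL that remains in the sitemap after deployment should return false for all of them.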

Manual Checks

Confirm the change does not create conflicting canonical, robots, or structured-data signals.
