- Never include noindexed pages in your XML sitemap
- The sitemap and noindex directive send contradictory signals to crawlers
- Sitemaps should list only canonical, indexable URLs that return a 200 status
- Remove noindexed URLs from the sitemap or remove the noindex directive; pick one
Rule Details
An XML sitemap tells search engines which pages to prioritise for crawling and indexing. A noindex directive tells them not to index a page. Google's sitemap guidance and page-level robots meta controls should never point in opposite directions on the same URL.
Code Examples
<!-- sitemap.xml: says "please index this" -->
<urlset>
  <url>
    <loc>https://example.com/thank-you</loc>
  </url>
</urlset>
<!-- /thank-you page: says "do NOT index this" -->
<head>
  <meta name="robots" content="noindex, nofollow" />
</head>
Google will follow the noindex directive, but the URL still gets crawled, wasting crawl budget.
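Because a noindex directive can arrive either in a robots meta tag or in an X-Robots-Tag response header, a check has to look at both. The following is a minimal sketch rather than a complete parser: `hasNoindex` is a hypothetical helper name, the regex-based tag matching is an assumption that holds only for simple markup, and `none` is treated as implying noindex.

```typescript
// Hypothetical helper (not from any library): returns true when the page HTML
// or the X-Robots-Tag header value carries a noindex directive.
function hasNoindex(html: string, xRobotsTag?: string | null): boolean {
  const values: string[] = [];

  // Collect the content of <meta name="robots"> and <meta name="googlebot"> tags.
  const metaTags = html.match(/<meta\b[^>]*>/gi) ?? [];
  for (const tag of metaTags) {
    const name = /\bname\s*=\s*["']?([^"'\s>]+)/i.exec(tag)?.[1]?.toLowerCase();
    const content = /\bcontent\s*=\s*["']([^"']*)["']/i.exec(tag)?.[1];
    if ((name === 'robots' || name === 'googlebot') && content) {
      values.push(content);
    }
  }

  // The header may be absent; it may also carry a user-agent prefix
  // such as "googlebot: noindex".
  if (xRobotsTag) values.push(xRobotsTag);

  return values.some(value =>
    value
      .toLowerCase()
      .split(',')
      .map(token => (token.split(':').pop() ?? '').trim())
      .some(directive => directive === 'noindex' || directive === 'none'),
  );
}
```

Passing the `/thank-you` markup above to `hasNoindex` reports a noindex directive, which is the condition that conflicts with the sitemap entry.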
Why It Matters
Listing noindexed pages in your sitemap sends contradictory signals to Googlebot. It wastes crawl budget and usually shows up next to the same contradictions flagged in indexability-conflicts.
Decision Tree
Is this URL listed in the sitemap?
↓
Does it have a noindex directive?
↓ YES
Do you WANT it indexed?
↓ YES              ↓ NO
Remove noindex     Remove from sitemap
What Belongs in a Sitemap
<!-- ✅ Good: Only indexable, canonical, 200 URLs -->
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/products/shoes</loc>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
Exclude from sitemap:
- Pages with a noindex meta tag or X-Robots-Tag: noindex
- Pages that return non-200 status codes (301, 404, 410)
- Duplicate pages (list only the original, and point each duplicate's canonical URL at it)
- Paginated pages beyond page 1 (optional, depends on strategy)
- Admin, login, checkout thank-you pages
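The exclusion list above can be expressed as a single predicate. This is a sketch under stated assumptions: the `PageRecord` shape and the `isSitemapEligible` name are hypothetical, the record is assumed to come from your own crawl or CMS data, and a page is treated as a duplicate whenever its declared canonical URL differs from its own URL.

```typescript
// Hypothetical record shape; field names are assumptions for illustration.
interface PageRecord {
  url: string;        // absolute URL of the page
  status: number;     // HTTP status code returned for the URL
  noindex: boolean;   // true if a meta tag or X-Robots-Tag says noindex
  canonical?: string; // absolute canonical URL declared by the page, if any
}

// A page belongs in the sitemap only if it is a 200, indexable, canonical URL.
function isSitemapEligible(page: PageRecord): boolean {
  if (page.status !== 200) return false; // redirects, 404s, 410s
  if (page.noindex) return false;        // contradicts the sitemap signal
  if (page.canonical !== undefined) {
    // Compare parsed URLs so trivial formatting differences do not matter.
    // new URL() throws on invalid or relative input.
    if (new URL(page.canonical).href !== new URL(page.url).href) return false;
  }
  return true;
}
```

Optional exclusions such as deep pagination or admin paths are strategy decisions and are deliberately left out of this predicate.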
Automated Detection
Use Screaming Frog or a similar crawler:
- Crawl the site
- Export the sitemap URL list
- Filter for URLs with noindex in the meta robots column
- Remove each match from the sitemap or remove the noindex directive
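If you would rather script the cross-reference than filter by hand, the steps above can be sketched as follows. The `CrawlRow` fields are hypothetical and do not mirror any particular crawler's export format, so map your export's columns onto them. The `<loc>` extraction uses a simple regex instead of a full XML parser, which is an assumption that holds for typical sitemaps.

```typescript
// Hypothetical row shape for a crawl export; adapt to your crawler's columns.
interface CrawlRow {
  url: string;
  metaRobots: string; // e.g. "noindex, nofollow" or ""
  xRobotsTag: string; // e.g. "noindex" or ""
}

// Pull every <loc> value out of a sitemap document.
function extractLocs(sitemapXml: string): string[] {
  const locs: string[] = [];
  const pattern = /<loc>\s*([^<\s]+)\s*<\/loc>/gi;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(sitemapXml)) !== null) {
    // Sitemaps escape "&" as "&amp;"; undo that so URLs compare correctly.
    locs.push(match[1].replace(/&amp;/g, '&'));
  }
  return locs;
}

// Return the URLs that are listed in the sitemap AND carry a noindex directive.
function findSitemapNoindexConflicts(sitemapUrls: string[], crawl: CrawlRow[]): string[] {
  const listed = new Set(sitemapUrls);
  return crawl
    .filter(row => listed.has(row.url))
    .filter(row => /\bnoindex\b/i.test(`${row.metaRobots},${row.xRobotsTag}`))
    .map(row => row.url);
}
```

Each URL returned is a conflict to resolve in one of the two ways listed in the final step.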
Next.js Sitemap Example
// app/sitemap.ts
import type { MetadataRoute } from 'next'
import { getAllPages } from '@/lib/pages'

export default async function sitemap(): Promise<MetadataRoute.Sitemap> {
  const pages = await getAllPages()
  // Only include indexable pages
  return pages
    .filter(page => !page.noindex)
    .map(page => ({
      url: `https://example.com${page.path}`,
      lastModified: page.updatedAt,
    }))
}
Per Google's documentation, if a URL is listed in the sitemap and also carries a noindex directive, Google will follow the noindex and not index the page. However, it will still crawl the URL, so removing it from the sitemap reduces unnecessary crawl budget consumption.
Exceptions
- Staging, utility, login, account, or internal search pages may intentionally use different crawl or index signals if they are not meant to rank.
- Temporary migration states can produce noisy intermediate signals; flag the live production URL pattern, not one-off transition artifacts.
- When redirects, canonicals, robots directives, or indexability signals conflict, fix the strongest final signal first instead of reporting every downstream symptom as a separate blocker.
Verification
Automated Checks
- Inspect rendered HTML and HTTP headers to confirm the expected metadata or crawlability signal is present.
- Test the affected URL with Google Search Console or equivalent tooling where relevant.
- Re-crawl a representative page set after deployment.
Manual Checks
- Confirm the change does not create conflicting canonical URL, robots, or structured-data signals.
Use with AI
Copy these prompts to use with your AI assistant, or install the MCP server to use directly from Claude, Cursor, or Windsurf.
Check
Verify implementation
Fetch the XML sitemap(s) and check each listed URL. For each URL, retrieve the page and check for <meta name='robots' content='noindex'> in the <head> or an X-Robots-Tag: noindex HTTP header. Report any URLs present in the sitemap that also carry a noindex directive.
Fix
Auto-fix issues
For each URL that has a noindex directive AND appears in the sitemap: decide whether the page should be indexed. If yes — remove the noindex directive. If no — remove the URL from the sitemap. Never leave both in place.
Explain
Learn more
The XML sitemap is a recommendation to search engines: 'please crawl and index these pages'. A noindex directive is an instruction: 'do not index this page'. Including noindexed pages in the sitemap creates a contradiction—Google will resolve it by following noindex, but the URL still gets crawled, wasting crawl budget.
Review
Code review
Fetch each URL in the XML sitemap. For each URL, check the HTTP response for X-Robots-Tag: noindex header, and check the rendered <head> for <meta name='robots' content='noindex'>. Report any URL that appears in the sitemap AND carries a noindex directive. Also check for sitemap entries returning non-200 status codes.
