SEO · Medium

Include indexable pages in your sitemap

Checks for canonical, indexable pages that are missing from the XML sitemap.

Quick take
Typical fix time: 10 min
  • Every canonical, indexable page should appear in the sitemap
  • Pages with `noindex` or non-self-referencing canonical tags must NOT be included in the sitemap
  • Paginated pages, filtered variants, and faceted URLs are usually excluded unless they have unique content
  • Use the Google Search Console Page indexing report (formerly the Coverage report) to find indexable pages that are not in the sitemap
Why it matters: Pages absent from the sitemap rely entirely on crawl discovery via links, which can delay indexing of new content, especially on large sites or pages with few inbound links.

Rule Details

Sitemap coverage measures how well your sitemap represents the pages you want indexed. Follow Google's sitemap best practices and keep the sitemap consistent with your noindex policy, so search engines are not left to guess which URLs matter.

Code Examples

<!-- sitemap.xml includes a noindex page — wrong -->
<url>
  <loc>https://example.com/thank-you</loc>
</url>
<!-- /thank-you carries noindex — contradicts sitemap inclusion -->
<meta name="robots" content="noindex">

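Listing a noindex page in the sitemap sends conflicting signals: the sitemap presents the URL as worth crawling and indexing, while the page itself asks not to be indexed. Google's guidance is that a sitemap should list only the URLs you want indexed.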
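A quick way to catch this contradiction is to cross-check the sitemap against the pages your crawler found to be noindexed. The TypeScript sketch below assumes you already have that set of noindex URLs; `extractSitemapLocs` and `findNoindexConflicts` are hypothetical helper names, and the regex-based `<loc>` extraction is a simplification that only handles a plain `<urlset>` sitemap, not a full XML parser.

```typescript
// Hypothetical helpers: list the <loc> URLs in a sitemap and flag any that are noindexed.
// Assumes `noindexUrls` comes from your own crawl (meta robots or X-Robots-Tag).

// Decode the five predefined XML entities that can appear inside <loc>.
function decodeXmlEntities(s: string): string {
  return s
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&apos;/g, "'")
    .replace(/&amp;/g, "&"); // decode &amp; last so "&amp;lt;" becomes "&lt;", not "<"
}

// Simplified regex extraction for a plain <urlset> sitemap; not a full XML parser.
function extractSitemapLocs(xml: string): string[] {
  const locs: string[] = [];
  const re = /<loc>\s*([^<]+?)\s*<\/loc>/g;
  let match: RegExpExecArray | null;
  while ((match = re.exec(xml)) !== null) {
    locs.push(decodeXmlEntities(match[1]));
  }
  return locs;
}

// Sitemap URLs that also carry noindex: each one is a conflicting signal to fix.
function findNoindexConflicts(sitemapXml: string, noindexUrls: Set<string>): string[] {
  return extractSitemapLocs(sitemapXml).filter((url) => noindexUrls.has(url));
}
```

Each URL returned needs one of two fixes: remove it from the sitemap, or remove the `noindex` if the page should rank after all.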

Why It Matters

Without a sitemap entry, a page can only be found by following links to it, which can delay indexing of new content, especially on large sites or for pages with few inbound links. Google Search Console is usually the quickest place to confirm whether a delay comes from sitemap coverage or from broader crawl issues.

Include in Sitemap

  • ✅ Canonical pages returning HTTP 200
  • ✅ Pages with <meta name="robots" content="index, follow"> (or no robots tag)
  • ✅ Pages with self-referencing canonical tags (<link rel="canonical" href="[same URL]">)
  • ✅ Newly published pages, added as soon as they go live (new pages have few inbound links, so the sitemap matters most for timely indexing)

Exclude from Sitemap

  • ❌ Pages with <meta name="robots" content="noindex"> or an X-Robots-Tag: noindex response header
  • ❌ Pages blocked by robots.txt (they cannot be crawled anyway)
  • ❌ Pages with canonical tags pointing to a different URL
  • ❌ URLs that redirect (3XX responses); list the final destination URL instead
  • ❌ Paginated subpages (e.g., /category/?page=2) unless they have unique, indexable content
  • ❌ Faceted/filtered URLs that duplicate canonical category pages
  • ❌ Login, checkout, and other private pages
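The include and exclude lists above can be expressed as a single predicate. This is a minimal TypeScript sketch, assuming a hypothetical `PageInfo` record produced by your own crawler; the field names are illustrative, and the pagination and faceted-URL rules are left out because they depend on whether those pages carry unique content.

```typescript
// Hypothetical page record from your own crawler; field names are illustrative.
interface PageInfo {
  url: string;
  status: number;              // HTTP status of this URL itself, without following redirects
  robotsMeta?: string;         // content of <meta name="robots">, if present
  xRobotsTag?: string;         // X-Robots-Tag response header, if present
  canonical?: string;          // href of <link rel="canonical">, if present
  blockedByRobotsTxt: boolean;
  isPrivate: boolean;          // login, checkout, account, and similar pages
}

// True if a robots directive list contains noindex (or "none", which implies noindex).
function hasNoindex(directives?: string): boolean {
  if (!directives) return false;
  return directives
    .toLowerCase()
    .split(",")
    .map((d) => d.split(":").pop()!.trim()) // drop an optional "googlebot:"-style prefix
    .some((d) => d === "noindex" || d === "none");
}

function shouldIncludeInSitemap(page: PageInfo): boolean {
  if (page.status !== 200) return false; // excludes 3xx redirects as well as 4xx/5xx
  if (page.blockedByRobotsTxt || page.isPrivate) return false;
  if (hasNoindex(page.robotsMeta) || hasNoindex(page.xRobotsTag)) return false;
  if (page.canonical) {
    // Resolve relative hrefs, then require the canonical to point back at this URL.
    const canonical = new URL(page.canonical, page.url).href;
    if (canonical !== new URL(page.url).href) return false;
  }
  return true; // 200, indexable, and self-canonical (or no canonical tag)
}
```

Pages without any canonical tag are treated as includable here; if your site requires explicit self-referencing canonicals, tighten that last branch.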

✅ Automated Coverage

Generate sitemaps from your CMS or database by querying only published, indexable content:

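// app/sitemap.ts (Next.js App Router metadata route)
import type { MetadataRoute } from 'next'
// fetchPublishedPosts is your own data-access helper (hypothetical path), not a Next.js API
import { fetchPublishedPosts } from '@/lib/posts'

export default async function sitemap(): Promise<MetadataRoute.Sitemap> {
  const posts = await fetchPublishedPosts() // Only published posts
  return posts
    .filter((post) => !post.noindex) // assumes a hypothetical per-post noindex flag
    .map((post) => ({
      url: `https://example.com/blog/${post.slug}`,
      lastModified: post.updatedAt,
    }))
}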
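Outside Next.js, the same approach works with any generator: filter first, then serialize. Below is a minimal, framework-agnostic serializer sketch, assuming you already have a filtered list of absolute, canonical URLs; `SitemapEntry` and `buildSitemapXml` are hypothetical names. It escapes XML special characters in `<loc>` and enforces the sitemaps.org limit of 50,000 URLs per file.

```typescript
// Hypothetical entry type: one absolute, canonical URL per indexable page.
interface SitemapEntry {
  url: string;
  lastModified?: Date; // when the page content last changed
}

// Per-file limit from the sitemaps.org protocol (larger sites need a sitemap index).
const MAX_URLS_PER_SITEMAP = 50000;

// Escape the XML special characters; & must go first to avoid double-escaping.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

function buildSitemapXml(entries: SitemapEntry[]): string {
  if (entries.length > MAX_URLS_PER_SITEMAP) {
    throw new Error("Too many URLs for one sitemap file; split them and use a sitemap index");
  }
  const urls = entries.map((e) => {
    // toISOString() yields a W3C Datetime value, which <lastmod> accepts.
    const lastmod = e.lastModified ? `<lastmod>${e.lastModified.toISOString()}</lastmod>` : "";
    return `  <url><loc>${escapeXml(e.url)}</loc>${lastmod}</url>`;
  });
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...urls,
    "</urlset>",
  ].join("\n");
}
```

Feeding this the same filtered query result your CMS uses for publishing keeps the sitemap and the indexable page set in lockstep.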

Finding Gaps

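  1. Google Search Console → Pages (Page indexing report, formerly Coverage): switch the filter to "Unsubmitted pages only" to list URLs Google knows about that are not in a submitted sitemap. Pages marked "Discovered – currently not indexed" may also need a sitemap entry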
  2. Crawl your site: Compare all crawled 200-OK pages against sitemap URLs
  3. Log analysis: Check server logs for URLs Googlebot is visiting that are not in your sitemap
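Step 2 amounts to a set difference in both directions. A minimal sketch, assuming a hypothetical `CrawledPage` record whose `indexable` flag your crawler sets (no `noindex`, and a self-referencing or absent canonical); URLs are normalized with the standard `URL` class so that trivial differences, such as a missing trailing slash on the root, are not reported as gaps.

```typescript
// Hypothetical crawl record; `indexable` is set by your crawler
// (no noindex, and a self-referencing or absent canonical).
interface CrawledPage {
  url: string;
  status: number;
  indexable: boolean;
}

// Normalize with the standard URL class: lowercases the host, drops default ports,
// adds "/" to an empty root path, and strips fragments.
function normalizeUrl(raw: string): string {
  const u = new URL(raw);
  u.hash = "";
  return u.href;
}

function findSitemapGaps(crawled: CrawledPage[], sitemapUrls: string[]) {
  const inSitemap = new Set(sitemapUrls.map(normalizeUrl));
  const indexable = new Set(
    crawled.filter((p) => p.status === 200 && p.indexable).map((p) => normalizeUrl(p.url)),
  );
  return {
    // Indexable pages the sitemap does not list: add these.
    missing: Array.from(indexable).filter((u) => !inSitemap.has(u)),
    // Sitemap URLs the crawl did not confirm as indexable: review these.
    unexpected: Array.from(inSitemap).filter((u) => !indexable.has(u)),
  };
}
```

URLs in `unexpected` are not automatically wrong: some may be orphan pages the crawl never reached, so review them before removing anything.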

Exceptions

  • Staging, utility, login, account, or internal search pages may intentionally use different crawl or index signals if they are not meant to rank.
  • Temporary migration states can produce noisy intermediate signals; flag the live production URL pattern, not one-off transition artifacts.
  • When redirects, canonicals, robots directives, or indexability signals conflict, fix the strongest final signal first instead of reporting every downstream symptom as a separate blocker.

Standards

Check the final search-facing HTML, metadata, and crawl behavior against these references before treating the rule as satisfied:
  • Google: Sitemap best practices
  • Google: Build and submit a sitemap

Verification

Automated Checks

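  • For each sitemap URL, inspect the rendered HTML and HTTP headers to confirm a 200 status, no `noindex` (meta robots or X-Robots-Tag), and a self-referencing canonical.
  • Test individual URLs with the URL Inspection tool in Google Search Console, or equivalent tooling.
  • After deployment, re-crawl a representative page set and confirm the sitemap still lists every indexable URL and nothing else.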
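For the first automated check, the relevant signals can be pulled from a fetched response. This sketch assumes you have already retrieved the status code, response headers, and rendered HTML (for example, with a headless browser); `extractIndexSignals` and `getAttr` are hypothetical helpers, and the regex-based tag matching is a simplification of proper DOM parsing.

```typescript
interface IndexSignals {
  status: number;
  noindex: boolean;
  canonical?: string;
}

// Read one attribute from a raw tag string (double-quoted, single-quoted, or unquoted).
function getAttr(tag: string, name: string): string | undefined {
  const re = new RegExp(`\\s${name}\\s*=\\s*(?:"([^"]*)"|'([^']*)'|([^\\s>]+))`, "i");
  const m = re.exec(tag);
  return m ? (m[1] ?? m[2] ?? m[3]) : undefined;
}

// Matches the noindex or none directive as a whole word (no g flag, so test() is stateless).
const NOINDEX = /\b(noindex|none)\b/i;

// Simplified regex-based extraction; a production audit should parse the rendered DOM.
function extractIndexSignals(status: number, headers: Record<string, string>, html: string): IndexSignals {
  let noindex = false;
  let canonical: string | undefined;
  // HTTP header names are case-insensitive.
  for (const [key, value] of Object.entries(headers)) {
    if (key.toLowerCase() === "x-robots-tag" && NOINDEX.test(value)) noindex = true;
  }
  for (const tag of html.match(/<meta\b[^>]*>/gi) ?? []) {
    const name = getAttr(tag, "name")?.toLowerCase();
    if ((name === "robots" || name === "googlebot") && NOINDEX.test(getAttr(tag, "content") ?? "")) {
      noindex = true;
    }
  }
  for (const tag of html.match(/<link\b[^>]*>/gi) ?? []) {
    const rel = (getAttr(tag, "rel") ?? "").toLowerCase().split(/\s+/);
    if (rel.includes("canonical")) canonical = getAttr(tag, "href");
  }
  return { status, noindex, canonical };
}
```

A URL passes when `status` is 200, `noindex` is false, and `canonical` (resolved against the page URL) is absent or points back at the page itself.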

Manual Checks

  • Confirm the change does not create conflicting canonical, robots, or structured-data signals.

Use with AI

Copy these prompts to use with your AI assistant, or install the MCP server to use directly from Claude, Cursor, or Windsurf.

Check

Verify implementation

Compare the list of all canonical, indexable URLs on the site against the URLs in the XML sitemap. Flag any page that returns HTTP 200, has no `noindex` directive, and has a self-referencing canonical, but is absent from the sitemap.

Fix

Auto-fix issues

Add missing indexable pages to the sitemap. Remove URLs that carry `noindex`, that redirect, or whose canonical tag points to a different URL. Automate sitemap generation so newly published content is included immediately.

Explain

Learn more

Explain how missing sitemap coverage slows crawl discovery, which types of pages should and should not be included, and how to use Google Search Console to identify gaps.

Review

Code review

Review sitemap generation, metadata generation, rendered HTML, structured data, and response headers related to the rule "Include indexable pages in your sitemap". Flag the exact routes or templates where search-facing output violates the rule, and describe how to verify the final page output.


Further Reading

Tools and supplementary material for exploring the topic in more depth.

Google Search Console
search.google.com · Tool

Rules that often go hand-in-hand with this one.

4XX Pages in Sitemap

Checks for sitemap URLs that return 4XX HTTP status codes, indicating broken or removed pages.

SEO
Noindex in Sitemap

Checks for noindexed pages listed in the sitemap.

SEO
Keep sitemap URLs on the correct domain

Checks that all URLs in the sitemap belong to the same domain and protocol as the sitemap itself.

SEO
Create and submit an XML sitemap

An XML sitemap is available at /sitemap.xml and includes all important pages.

SEO
