SEO · High

Keep XML sitemaps valid

Validates sitemap XML structure against the sitemaps.org protocol, URL limits, and encoding requirements.

Quick take
Typical fix time 10 min
  • The root element must be `<urlset>` with the correct `xmlns` namespace
  • Each `<url>` must contain exactly one `<loc>` element with a fully qualified, percent-encoded URL
  • Maximum 50,000 URLs and 50 MB (uncompressed) per sitemap file; use a sitemap index for larger sites
  • Encode XML special characters: `&` → `&amp;`, `<` → `&lt;`, `>` → `&gt;`, `"` → `&quot;`, `'` → `&apos;`
Why it matters: Search engines may reject or ignore an invalid or malformed sitemap, so newly published or orphaned pages can go undiscovered by crawlers.

Rule Details

A valid XML sitemap must conform to the sitemaps.org protocol. Invalid sitemaps cause submission errors in Google Search Console and prevent crawlers from processing the included URLs.
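
As one illustration of a conformance check, the URL-count and file-size limits can be verified before a sitemap is submitted. This is a minimal sketch using Python's standard library; the function name `limit_problems` is hypothetical, and 50 MB is assumed to mean 50 × 1024 × 1024 bytes of uncompressed XML.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000
MAX_BYTES = 50 * 1024 * 1024  # assumption: 50 MB measured on the uncompressed file


def limit_problems(xml_bytes):
    """Return a list of limit violations for one uncompressed sitemap file."""
    problems = []
    if len(xml_bytes) > MAX_BYTES:
        problems.append(f"file is {len(xml_bytes)} bytes, over the 50 MB limit")
    root = ET.fromstring(xml_bytes)  # raises ET.ParseError if the XML is malformed
    url_count = len(root.findall(f"{{{SITEMAP_NS}}}url"))
    if url_count > MAX_URLS:
        problems.append(f"file lists {url_count} URLs, over the 50,000 limit")
    return problems
```

An empty list means both limits are respected. The sketch does not check anything else about the file.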

Code Example

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/page/</loc>
    <lastmod>2025-03-01</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
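
A file like the one above is usually generated rather than written by hand. The following is a minimal sketch, assuming Python's standard `xml.etree.ElementTree` module is acceptable for the job. The function name `build_sitemap` and the `(loc, lastmod)` input shape are hypothetical choices for illustration.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Serialize (loc, lastmod) pairs into sitemap XML bytes."""
    # Register the namespace with an empty prefix so the output uses xmlns="...".
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        # The serializer escapes &, < and > in text, so pass the raw URL here.
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
        if lastmod:
            ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)
```

For example, `build_sitemap([("https://www.example.com/page/", "2025-03-01")])` returns the bytes of a one-entry sitemap. Letting a serializer produce the markup avoids most hand-escaping mistakes.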

Element Reference

| Element | Required | Values |
|---|---|---|
| `<loc>` | Yes | Absolute URL, fewer than 2,048 characters |
| `<lastmod>` | No | W3C Datetime format (e.g., 2025-03-01) |
| `<changefreq>` | No | always, hourly, daily, weekly, monthly, yearly, never |
| `<priority>` | No | 0.0 to 1.0 (default: 0.5) |

Note: Google ignores `<changefreq>` and `<priority>`. Include `<lastmod>`, which Google uses for crawl scheduling when the value is consistently and verifiably accurate.
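
The optional values in the element reference can be checked mechanically. Below is a rough sketch, not a full schema validator: the function name `check_optional_fields` is hypothetical, and the regular expression checks only the shape of a W3C Datetime value (it does not reject impossible dates such as month 13).

```python
import re

CHANGEFREQ = {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}

# Shape of W3C Datetime: YYYY, YYYY-MM, YYYY-MM-DD, or a timestamp that
# carries a time zone designator (Z or +hh:mm / -hh:mm).
W3C_DATETIME = re.compile(
    r"\d{4}(-\d{2}(-\d{2}(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2}))?)?)?"
)


def check_optional_fields(lastmod=None, changefreq=None, priority=None):
    """Return a list of error strings for the optional <url> children."""
    errors = []
    if lastmod is not None and not W3C_DATETIME.fullmatch(lastmod):
        errors.append(f"lastmod is not in W3C Datetime format: {lastmod!r}")
    if changefreq is not None and changefreq not in CHANGEFREQ:
        errors.append(f"changefreq is not an allowed value: {changefreq!r}")
    if priority is not None:
        try:
            in_range = 0.0 <= float(priority) <= 1.0
        except ValueError:
            in_range = False
        if not in_range:
            errors.append(f"priority is not between 0.0 and 1.0: {priority!r}")
    return errors
```

Calling `check_optional_fields("2025-03-01", "monthly", "0.8")` returns an empty list, while a timestamp without a time zone designator, such as `2025-03-01T12:30:00`, is reported as an error.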

Common Validation Errors

❌ Missing or wrong namespace

<!-- Wrong -->
<urlset>
 
<!-- Correct -->
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
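
A namespace check like this is easy to automate. The sketch below assumes Python's standard `xml.etree.ElementTree` parser, which reports a namespaced tag as `{namespace}name`; the function name `root_problem` is hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def root_problem(xml_data):
    """Describe what is wrong with the root element, or return None if it looks right."""
    root = ET.fromstring(xml_data)
    expected = {f"{{{SITEMAP_NS}}}urlset", f"{{{SITEMAP_NS}}}sitemapindex"}
    if root.tag in expected:
        return None
    if not root.tag.startswith("{"):
        # ElementTree leaves the tag bare when no namespace applies to it.
        return f"<{root.tag}> has no xmlns namespace"
    return f"unexpected root element or namespace: {root.tag}"
```

The namespace comparison is an exact string match, so a variant such as an `https://` namespace URI is reported as unexpected.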

❌ Unencoded ampersand in URL

<!-- Wrong — XML parse error -->
<loc>https://example.com/search?q=shoes&color=red</loc>
 
<!-- Correct -->
<loc>https://example.com/search?q=shoes&amp;color=red</loc>
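
When sitemap markup is assembled from strings, the escaping has to be applied explicitly. The following sketch uses `xml.sax.saxutils.escape` from Python's standard library, which replaces `&`, `<`, and `>`; the helper names `loc_element` and `is_well_formed` are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def loc_element(url):
    """Build a <loc> element by hand, escaping &, < and > in the raw URL."""
    return f"<loc>{escape(url)}</loc>"


def is_well_formed(fragment):
    """Return True if the fragment parses as XML."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False
```

Pass only raw URLs to `loc_element`: escaping a value that already contains `&amp;` would double-encode it.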

❌ Relative URLs

<!-- Wrong -->
<loc>/about</loc>
 
<!-- Correct -->
<loc>https://www.example.com/about</loc>
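
Relative and over-long `<loc>` values can be caught before the sitemap is written. This is a small sketch using `urllib.parse.urlsplit`; the function name `loc_problems` is hypothetical, and restricting the scheme to `http` and `https` is an assumption that fits typical websites.

```python
from urllib.parse import urlsplit

MAX_LOC_LENGTH = 2048  # sitemaps.org requires <loc> to be shorter than this


def loc_problems(loc):
    """Return a list of problems found in a single <loc> value."""
    problems = []
    parts = urlsplit(loc)
    # An absolute URL needs both a scheme and a host.
    if parts.scheme not in ("http", "https") or not parts.netloc:
        problems.append("not an absolute http(s) URL")
    if len(loc) >= MAX_LOC_LENGTH:
        problems.append("2,048 characters or longer")
    return problems
```

Both `/about` and `www.example.com/about` are flagged, because neither includes a scheme.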

❌ Exceeding URL limits

A sitemap with more than 50,000 URLs, or larger than 50 MB uncompressed, must be split into multiple files referenced by a sitemap index:

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://www.example.com/sitemap-1.xml</loc>
  </sitemap>
  <sitemap>
    <loc>https://www.example.com/sitemap-2.xml</loc>
  </sitemap>
</sitemapindex>
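
The split itself can be sketched as follows, assuming the URL list fits in memory and the child files follow the `sitemap-N.xml` naming used in the index above. The function names `chunk_urls` and `build_index` are hypothetical, and the sketch only enforces the URL-count limit, not the 50 MB size limit.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000


def chunk_urls(urls, size=MAX_URLS):
    """Yield consecutive slices of at most `size` URLs."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def build_index(base_url, file_count):
    """Build a sitemap index that points at sitemap-1.xml through sitemap-N.xml."""
    ET.register_namespace("", SITEMAP_NS)  # emit xmlns="..." without a prefix
    index = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for n in range(1, file_count + 1):
        sitemap = ET.SubElement(index, f"{{{SITEMAP_NS}}}sitemap")
        ET.SubElement(sitemap, f"{{{SITEMAP_NS}}}loc").text = f"{base_url}/sitemap-{n}.xml"
    return ET.tostring(index, encoding="UTF-8", xml_declaration=True)
```

Each chunk would be written to its own sitemap file, and the number of chunks passed to `build_index` as `file_count`.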

Exceptions

  • Staging, utility, login, account, or internal search pages may intentionally use different crawl or index signals if they are not meant to rank.
  • Temporary migration states can produce noisy intermediate signals; flag the live production URL pattern, not one-off transition artifacts.
  • When redirects, canonicals, robots directives, or indexability signals conflict, fix the strongest final signal first instead of reporting every downstream symptom as a separate blocker.

Verification

Automated Checks

  • Inspect rendered HTML and HTTP headers to confirm the expected metadata or crawlability signal is present.
  • Test the affected URL with Google Search Console or equivalent tooling where relevant.
  • Re-crawl a representative page set after deployment.

Manual Checks

  • Confirm the change does not create conflicting canonical-url, robots, or structured-data signals.

Use with AI

Copy these prompts to use with your AI assistant, or install the MCP server to use directly from Claude, Cursor, or Windsurf.

Check

Verify implementation

Fetch the sitemap and validate it against the sitemaps.org schema. Check that the `xmlns` attribute is `http://www.sitemaps.org/schemas/sitemap/0.9`, all `<loc>` values are absolute URLs, no file exceeds 50,000 URLs or 50 MB, and special characters are properly XML-encoded.

Fix

Auto-fix issues

Re-generate the sitemap using a validated sitemap library. Encode all special characters in URLs (`&` → `&amp;`). Split oversized sitemaps into multiple files and reference them from a sitemap index. Resubmit to Google Search Console.

Explain

Learn more

Explain the sitemaps.org XML schema requirements, what causes validation errors in Google Search Console, and how encoding errors in `<loc>` URLs prevent crawlers from fetching those pages.

Review

Code review

Review metadata generation, rendered HTML, structured data, and response headers related to the "Keep XML sitemaps valid" rule. Flag the exact routes or templates where search-facing output violates the rule, and describe how to verify the final page output.

Further Reading

Tools and supplementary material for exploring the topic in more depth.

Google Search Console (search.google.com), Tool

Rules that often go hand-in-hand with this one.

Keep sitemap URLs on the correct domain

Checks that all URLs in the sitemap belong to the same domain and protocol as the sitemap itself.

SEO
Create and submit an XML sitemap

An XML sitemap is available at /sitemap.xml and includes all important pages.

SEO
URL Special Characters

Checks for problematic special characters in URL paths that can cause crawling, parsing, or canonicalization issues.

SEO
Noindex in Sitemap

Checks for noindexed pages listed in sitemap

SEO
