Keep XML sitemaps valid
Validates sitemap XML structure against the sitemaps.org protocol, URL limits, and encoding requirements.
- The root element must be `<urlset>` with the correct `xmlns` namespace
- Each `<url>` must contain exactly one `<loc>` element with a fully qualified, percent-encoded URL
- Maximum 50,000 URLs and 50 MB (uncompressed) per sitemap file; use a sitemap index for larger sites
- Encode XML special characters: `&` → `&amp;`, `<` → `&lt;`, `>` → `&gt;`, `"` → `&quot;`, `'` → `&apos;`
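The requirements above can be sketched as a small generator. This is a minimal illustration, not a reference implementation: `xml_escape`, `build_document`, `build_sitemap_files`, the `sitemap-N.xml` naming, and the `base_url` parameter are all hypothetical, and the URLs passed in are assumed to be absolute and already percent-encoded.

```python
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000             # per-file URL limit
MAX_BYTES = 50 * 1024 * 1024  # per-file size limit (50 MB, uncompressed)
QUOTE_ENTITIES = {'"': "&quot;", "'": "&apos;"}


def xml_escape(text):
    # escape() covers &, < and >; the extra mapping adds the two quote characters.
    return escape(text, QUOTE_ENTITIES)


def build_document(root, child, locs):
    """Render a <urlset> or <sitemapindex> document with one <loc> per entry."""
    lines = ['<?xml version="1.0" encoding="UTF-8"?>', f'<{root} xmlns="{SITEMAP_NS}">']
    for loc in locs:
        lines.append(f"  <{child}><loc>{xml_escape(loc)}</loc></{child}>")
    lines.append(f"</{root}>")
    return "\n".join(lines) + "\n"


def build_sitemap_files(urls, base_url, size=MAX_URLS):
    """Split urls into files of at most `size` entries and build an index for them.

    Returns (files, index_xml), where files maps a file name to its XML text.
    base_url is assumed to end with a slash (hypothetical parameter).
    """
    files = {}
    for number, start in enumerate(range(0, len(urls), size), start=1):
        xml = build_document("urlset", "url", urls[start:start + size])
        if len(xml.encode("utf-8")) > MAX_BYTES:
            raise ValueError(f"sitemap-{number}.xml exceeds the size limit; use a smaller size")
        files[f"sitemap-{number}.xml"] = xml
    index_xml = build_document("sitemapindex", "sitemap", [base_url + name for name in files])
    return files, index_xml
```

Calling `build_sitemap_files(urls, "https://www.example.com/")` returns the per-file XML plus a sitemap index that references each file; writing the files to disk and choosing real file names is left to the caller.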
Rule Details
A valid XML sitemap must conform to the sitemaps.org protocol. Invalid sitemaps cause submission errors in Google Search Console and prevent crawlers from processing the included URLs.
Code Example
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/page/</loc>
    <lastmod>2025-03-01</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
Why It Matters
An invalid or malformed sitemap is silently ignored by search engines, leaving newly published or orphaned pages undiscovered by crawlers.
Element Reference
| Element | Required | Values |
|---|---|---|
| `<loc>` | Yes | Absolute URL, max 2048 chars |
| `<lastmod>` | No | W3C Datetime format (e.g., 2025-03-01) |
| `<changefreq>` | No | always, hourly, daily, weekly, monthly, yearly, never |
| `<priority>` | No | 0.0 to 1.0 (default: 0.5) |
Note: Google ignores `changefreq` and `priority`. Include `lastmod`, which Google uses for crawl scheduling when the value is consistently accurate.
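The value constraints in the table can be checked with a few lines of code. The sketch below is illustrative only: `check_url_entry` is a hypothetical helper, and the regular expression checks the shape of a W3C Datetime string (date only, or date and time with a zone designator), not whether the date exists on the calendar.

```python
import re

CHANGEFREQ_VALUES = {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}
MAX_LOC_LENGTH = 2048  # limit as stated in the element reference

# Shape of W3C Datetime: YYYY, YYYY-MM, YYYY-MM-DD, or date plus time with Z or +hh:mm / -hh:mm.
W3C_DATETIME = re.compile(
    r"^\d{4}(-\d{2}(-\d{2}(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2}))?)?)?$"
)


def check_url_entry(loc, lastmod=None, changefreq=None, priority=None):
    """Return a list of problems for one <url> entry; an empty list means no issues found."""
    problems = []
    if len(loc) > MAX_LOC_LENGTH:
        problems.append("loc is longer than 2048 characters")
    if lastmod is not None and not W3C_DATETIME.match(lastmod):
        problems.append(f"lastmod is not in W3C Datetime format: {lastmod!r}")
    if changefreq is not None and changefreq not in CHANGEFREQ_VALUES:
        problems.append(f"changefreq is not an allowed value: {changefreq!r}")
    if priority is not None:
        try:
            value = float(priority)
        except ValueError:
            value = None
        # A non-numeric or out-of-range priority is reported the same way.
        if value is None or not (0.0 <= value <= 1.0):
            problems.append(f"priority is not between 0.0 and 1.0: {priority!r}")
    return problems
```

All arguments are the raw text of the corresponding elements, so the helper can be fed directly from a parsed sitemap.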
Common Validation Errors
❌ Missing or wrong namespace
<!-- Wrong -->
<urlset>
<!-- Correct -->
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
❌ Unencoded ampersand in URL
<!-- Wrong — XML parse error -->
<loc>https://example.com/search?q=shoes&color=red</loc>
<!-- Correct -->
<loc>https://example.com/search?q=shoes&amp;color=red</loc>
❌ Relative URLs
<!-- Wrong -->
<loc>/about</loc>
<!-- Correct -->
<loc>https://www.example.com/about</loc>
❌ Exceeding URL limits
A sitemap with more than 50,000 URLs should be split into multiple files referenced by a sitemap index:
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://www.example.com/sitemap-1.xml</loc>
  </sitemap>
  <sitemap>
    <loc>https://www.example.com/sitemap-2.xml</loc>
  </sitemap>
</sitemapindex>
Validation Tools
- Google Search Console → Sitemaps (shows processing errors)
- XML Sitemap Validator
- `xmllint --noout sitemap.xml` via command line
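As a complement to the tools listed above, the structural checks described on this page can also be scripted. The following is a rough sketch using Python's standard `xml.etree.ElementTree`; `validate_sitemap` is a hypothetical helper, it only handles `<urlset>` files (a sitemap index would be reported as having the wrong root), and it is not a substitute for full schema validation.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000
MAX_BYTES = 50 * 1024 * 1024  # 50 MB, uncompressed


def validate_sitemap(xml_bytes):
    """Return a list of problems found in a <urlset> sitemap given as bytes."""
    problems = []
    if len(xml_bytes) > MAX_BYTES:
        problems.append("file exceeds 50 MB")
    try:
        root = ET.fromstring(xml_bytes)
    except ET.ParseError as exc:
        # Unencoded characters such as a bare & in a URL end up here.
        return problems + [f"not well-formed XML: {exc}"]

    # ElementTree writes namespaced tags as {namespace}name.
    if root.tag != f"{{{SITEMAP_NS}}}urlset":
        problems.append(f"root must be <urlset> in the sitemap namespace, got {root.tag!r}")
        return problems

    urls = root.findall(f"{{{SITEMAP_NS}}}url")
    if len(urls) > MAX_URLS:
        problems.append(f"{len(urls)} URLs exceeds the 50,000 limit")

    for position, url in enumerate(urls, start=1):
        locs = url.findall(f"{{{SITEMAP_NS}}}loc")
        if len(locs) != 1:
            problems.append(f"url #{position}: expected exactly one <loc>, found {len(locs)}")
            continue
        parts = urlsplit((locs[0].text or "").strip())
        if parts.scheme not in ("http", "https") or not parts.netloc:
            problems.append(f"url #{position}: <loc> is not an absolute URL")
    return problems
```

For a local file, something like `validate_sitemap(open("sitemap.xml", "rb").read())` returns an empty list when none of these checks fail.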
Exceptions
- Staging, utility, login, account, or internal search pages may intentionally use different crawl or index signals if they are not meant to rank.
- Temporary migration states can produce noisy intermediate signals; flag the live production URL pattern, not one-off transition artifacts.
- When redirects, canonicals, robots directives, or indexability signals conflict, fix the strongest final signal first instead of reporting every downstream symptom as a separate blocker.
Verification
Automated Checks
- Inspect rendered HTML and HTTP headers to confirm the expected metadata or crawlability signal is present.
- Test the affected URL with Google Search Console or equivalent tooling where relevant.
- Re-crawl a representative page set after deployment.
Manual Checks
- Confirm the change does not create conflicting canonical-url, robots, or structured-data signals.
Use with AI
Copy these prompts to use with your AI assistant, or install the MCP server to use directly from Claude, Cursor, or Windsurf.
Check
Verify implementation
Fetch the sitemap and validate it against the sitemaps.org schema. Check that the `xmlns` attribute is `http://www.sitemaps.org/schemas/sitemap/0.9`, all `<loc>` values are absolute URLs, no file exceeds 50,000 URLs or 50 MB, and special characters are properly XML-encoded.
Fix
Auto-fix issues
Re-generate the sitemap using a validated sitemap library. Encode all special characters in URLs (`&` → `&amp;`). Split oversized sitemaps into multiple files and reference them from a sitemap index. Resubmit to Google Search Console.
Explain
Learn more
Explain the sitemaps.org XML schema requirements, what causes validation errors in Google Search Console, and how encoding errors in `<loc>` URLs prevent crawlers from fetching those pages.
Review
Code review
Review metadata generation, rendered HTML, structured data, and response headers related to the "Keep XML sitemaps valid" rule. Flag exact routes or templates where search-facing output violates the rule, and describe how to verify the final page output.