4XX Pages in Sitemap
Checks for sitemap URLs that return 4XX HTTP status codes, indicating broken or removed pages.
- Sitemaps should only contain URLs that return HTTP 200 and are indexable
- 4XX URLs in a sitemap waste crawl budget and cause Search Console errors
- Remove 4XX URLs from the sitemap; if the content moved, add a 301 redirect and list the new destination URL instead
- Monitor the sitemap regularly — deleted content must be removed from sitemaps promptly
Rule Details
A sitemap is a promise to search engines that the listed URLs are live, canonical, and indexable. Google's sitemap best practices, and any broader sitemap coverage checks you run, depend on that promise holding.
Code Example
```xml
<!-- sitemap.xml still references deleted blog posts -->
<url><loc>https://example.com/blog/old-post-deleted</loc></url>
<url><loc>https://example.com/blog/moved-to-new-section</loc></url>
```
Both URLs return 404, but the sitemap was not updated after the posts were deleted.
Why It Matters
Including 4XX URLs in a sitemap signals poor site maintenance to Google, wastes crawl budget on non-existent pages, and generates errors in Google Search Console that can mask real indexing issues.
Why 4XX URLs Are Harmful
- Crawl budget waste: Googlebot spends time fetching dead URLs instead of discovering new content
- Search Console errors: 4XX URLs appear as errors in the Page indexing report (formerly Coverage), obscuring genuine issues
- Trust signal: A sitemap full of broken URLs suggests it is poorly maintained, which can make search engines rely on it less
Correct Status Code Expectations
| URL Condition | Expected HTTP Status | Sitemap Action |
|---|---|---|
| Page exists and is indexable | 200 OK | Include |
| Page permanently moved | 301 Moved Permanently | Update <loc> to new URL |
| Page deleted | 410 Gone or 404 Not Found | Remove from sitemap |
| Page temporarily unavailable | 503 Service Unavailable | Keep but fix quickly |
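The table amounts to a lookup from status code to sitemap action. The TypeScript sketch below encodes it, assuming each URL is fetched without following redirects so that a 301 is visible as a 301. The `sitemapAction` name and the action labels are illustrative, and codes the table does not list fall through to manual review.

```typescript
// Hypothetical helper (not part of any library): maps the status code a
// sitemap URL returns, fetched WITHOUT following redirects, to the sitemap
// action from the table.
type SitemapAction =
  | "include"
  | "update-loc-to-redirect-target"
  | "remove"
  | "keep-but-fix-quickly"
  | "review-manually";

function sitemapAction(status: number): SitemapAction {
  // Live, indexable page.
  if (status === 200) return "include";
  // Permanent move: list the redirect destination instead of this URL.
  if (status === 301) return "update-loc-to-redirect-target";
  // Any 4XX (404, 410, 403, ...) does not belong in a sitemap.
  if (status >= 400 && status < 500) return "remove";
  // Temporary outage: keep the URL, but restore the page quickly.
  if (status === 503) return "keep-but-fix-quickly";
  // Codes the table does not cover (302, 500, ...) need a human decision.
  return "review-manually";
}

for (const code of [200, 301, 404, 410, 503, 302]) {
  console.log(code, sitemapAction(code));
}
```

Treating every 4XX as "remove" matches this rule's scope; 403 is included because a crawler that cannot access a URL cannot index it either.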
✅ After a Site Migration
- Map all old URLs to new URLs
- Implement 301 redirects from old to new
- Update sitemap to use only the new URLs
- Resubmit the sitemap in Google Search Console
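The same old-to-new mapping that drives the 301 redirects can also rewrite the sitemap URL list, so the two never drift apart. A minimal sketch, assuming the mapping is available as a `Map` of absolute URLs; `migrateSitemapUrls` is a hypothetical name.

```typescript
// Hypothetical migration helper: given the old sitemap URLs and the
// old URL -> new URL map used for the 301 redirects, return the URLs the
// new sitemap should list.
function migrateSitemapUrls(
  oldUrls: string[],
  redirects: Map<string, string>,
): string[] {
  const result = new Set<string>();
  for (const url of oldUrls) {
    // Redirected URLs are replaced by their destination. URLs that did not
    // move are kept, but they still need a status check after launch.
    result.add(redirects.get(url) ?? url);
  }
  // The Set drops duplicates when several old URLs point to one new URL.
  return Array.from(result);
}

const redirects = new Map<string, string>([
  ["https://example.com/blog/old-post", "https://example.com/articles/old-post"],
  ["https://example.com/blog/a", "https://example.com/articles/merged"],
  ["https://example.com/blog/b", "https://example.com/articles/merged"],
]);

console.log(
  migrateSitemapUrls(
    [
      "https://example.com/blog/old-post",
      "https://example.com/blog/a",
      "https://example.com/blog/b",
      "https://example.com/about",
    ],
    redirects,
  ),
);
```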
Automation
For dynamic sites, generate sitemaps programmatically from your database of live content rather than a static file. This ensures the sitemap always reflects real page existence.
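The rendering side of that approach can be sketched without a database. This minimal TypeScript version works on plain objects; `PageRecord` and its fields are hypothetical, chosen to mirror the `published` and `deletedAt` filter in the query snippet in this section.

```typescript
// Hypothetical page record; field names are assumptions.
interface PageRecord {
  url: string;
  published: boolean;
  deletedAt: Date | null;
}

// Escape the characters that are special in XML text content.
// "&" must be replaced first so later entities are not double-escaped.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Build a sitemap that lists only published, non-deleted pages.
function buildSitemap(pages: PageRecord[]): string {
  const entries = pages
    .filter((page) => page.published && page.deletedAt === null)
    .map((page) => `  <url><loc>${escapeXml(page.url)}</loc></url>`);
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...entries,
    "</urlset>",
  ].join("\n");
}

console.log(
  buildSitemap([
    { url: "https://example.com/blog/live-post", published: true, deletedAt: null },
    { url: "https://example.com/blog/old-post-deleted", published: true, deletedAt: new Date("2024-01-01") },
    { url: "https://example.com/blog/draft", published: false, deletedAt: null },
  ]),
);
```

Because the filter runs every time the sitemap is built, deleting a page in the data source removes it from the next sitemap without a manual edit.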
```typescript
// Example: only include published, non-deleted pages.
// `db` stands for your database client; this query uses a Prisma-style API.
const urls = await db.pages.findMany({
  where: { published: true, deletedAt: null },
})
```
Exceptions
- Staging, utility, login, account, or internal search pages may intentionally use different crawl or index signals if they are not meant to rank.
- Temporary migration states can produce noisy intermediate signals; flag the live production URL pattern, not one-off transition artifacts.
- When redirects, canonicals, robots directives, or indexability signals conflict, fix the strongest final signal first instead of reporting every downstream symptom as a separate blocker.
Standards
- Use these references as the standard for the final search-facing HTML, metadata, and crawl behavior.
- Check the implementation against Google: Sitemap best practices before treating the rule as satisfied.
- Check the implementation against Google: HTTP status codes and SEO before treating the rule as satisfied.
Verification
Automated Checks
- Google Search Console → Sitemaps → view submitted sitemap details
- Search Console → Pages (the Page indexing report, formerly Coverage) → filter to "All submitted pages"
- Use a crawl tool (Screaming Frog, Sitebulb) to fetch every sitemap URL and report its status code
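For a lightweight check without a crawl tool, a script can read the sitemap and flag 4XX responses. A minimal sketch, assuming Node 18+ (global `fetch`); `extractLocs`, `findBrokenSitemapUrls`, and `StatusFetcher` are hypothetical names, and the regex-based parsing is a shortcut that a real XML parser would handle more robustly.

```typescript
// Decode the predefined XML entities; "&amp;" goes last so "&amp;lt;"
// becomes "&lt;" rather than "<".
function decodeXmlEntities(value: string): string {
  return value
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&apos;/g, "'")
    .replace(/&amp;/g, "&");
}

// Pull every <loc> value out of a sitemap (regex shortcut, no CDATA support).
function extractLocs(sitemapXml: string): string[] {
  const locs: string[] = [];
  const pattern = /<loc>\s*([^<]+?)\s*<\/loc>/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(sitemapXml)) !== null) {
    locs.push(decodeXmlEntities(match[1]));
  }
  return locs;
}

type StatusFetcher = (url: string) => Promise<number>;

// Default fetcher: redirects are not followed, so a 301 is reported as a 301.
const fetchStatus: StatusFetcher = async (url) => {
  const response = await fetch(url, { redirect: "manual" });
  return response.status;
};

// Fetch each sitemap URL sequentially and collect the ones returning 4XX.
async function findBrokenSitemapUrls(
  sitemapXml: string,
  getStatus: StatusFetcher = fetchStatus,
): Promise<Array<{ url: string; status: number }>> {
  const broken: Array<{ url: string; status: number }> = [];
  for (const url of extractLocs(sitemapXml)) {
    const status = await getStatus(url);
    if (status >= 400 && status < 500) broken.push({ url, status });
  }
  return broken;
}

const sample = `<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/blog/old-post-deleted</loc></url>
</urlset>`;

// Offline demo with a stubbed status lookup; omit it to use real fetch().
const stub: StatusFetcher = async (url) =>
  url.endsWith("old-post-deleted") ? 404 : 200;
findBrokenSitemapUrls(sample, stub).then((broken) => console.log(broken));
```

Injecting the status lookup keeps the parsing and flagging logic testable offline; against a live site, fetching sequentially also avoids hammering the server.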
Manual Checks
- Review representative live pages manually and confirm there is no stronger conflicting signal that changes the intended SEO outcome.
Use with AI
Copy these prompts to use with your AI assistant, or install the MCP server to use directly from Claude, Cursor, or Windsurf.
Check
Verify implementation
Fetch all URLs listed in the sitemap and record their HTTP status codes. Flag any URL returning 4XX (404 Not Found, 410 Gone, 403 Forbidden). Cross-reference against the Google Search Console Page indexing report for corroborating data.
Fix
Auto-fix issues
For each 4XX URL: if the content moved, set up a 301 redirect to the new URL and replace the old URL in the sitemap with the new one. If the content is permanently gone, return 410 Gone and remove the URL from the sitemap. Regenerate and resubmit the sitemap.
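The server side of this fix, redirect if moved and 410 if gone, can be sketched as a small routing decision. This assumes moved and deleted paths are kept in in-memory lookups; `movedPaths`, `deletedPaths`, and `decideResponse` are hypothetical, and a real site would load them from its CMS or redirect configuration.

```typescript
// Hypothetical lookups for moved and permanently deleted paths.
const movedPaths = new Map<string, string>([
  ["/blog/moved-to-new-section", "/guides/new-section"],
]);
const deletedPaths = new Set<string>(["/blog/old-post-deleted"]);

interface RouteDecision {
  status: number;
  location?: string;
}

// Return 301 + Location for moved content, 410 for deleted content, or
// null to let the normal page handler respond (with 200 or 404).
function decideResponse(path: string): RouteDecision | null {
  const target = movedPaths.get(path);
  if (target !== undefined) return { status: 301, location: target };
  if (deletedPaths.has(path)) return { status: 410 };
  return null;
}

console.log(decideResponse("/blog/moved-to-new-section"));
console.log(decideResponse("/blog/old-post-deleted"));
console.log(decideResponse("/about"));
```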
Explain
Learn more
Explain why sitemaps must only contain live, indexable URLs, how 4XX URLs in sitemaps affect crawl budget allocation, and the difference between 404 and 410 status codes for deindexing.
Review
Code review
Review metadata generation, rendered HTML, structured data, and response headers related to 4XX Pages in Sitemap. Flag exact routes or templates where search-facing output violates the rule, and describe how to verify the final page output.