SEO · High

4XX Pages in Sitemap

Checks for sitemap URLs that return 4XX HTTP status codes, indicating broken or removed pages.

Quick take
Typical fix time: 10 min
  • Sitemaps should only contain URLs that return HTTP 200 and are indexable
  • 4XX URLs in a sitemap waste crawl budget and cause Search Console errors
  • Remove 404 URLs from the sitemap, or 301-redirect them and list the redirect target in the sitemap instead
  • Monitor the sitemap regularly: remove deleted content from it promptly
Why it matters: Including 4XX URLs in a sitemap signals poor site maintenance to Google, wastes crawl budget on non-existent pages, and generates errors in Search Console that can mask real indexing issues.

Rule Details

A sitemap is a promise to search engines that every listed URL is live, canonical, and indexable. Google's sitemap best practices and your broader sitemap coverage checks depend on that promise holding true.
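One way to make that promise concrete is an eligibility check that runs before a URL is written to the sitemap. This is a minimal sketch under assumptions: the `PageSignals` shape and the `isSitemapEligible` helper are hypothetical names, and a page with no canonical tag is treated as self-canonical.

```typescript
// Hypothetical shape for the signals a crawler or build step collects per URL.
interface PageSignals {
  url: string;
  status: number;           // final HTTP status code, without following redirects
  noindex: boolean;         // true if a robots meta tag or X-Robots-Tag says noindex
  canonical: string | null; // value of <link rel="canonical">, or null if absent
}

// Eligible only if live (200), indexable, and canonical to itself.
// Assumption: a missing canonical counts as self-referencing.
function isSitemapEligible(page: PageSignals): boolean {
  const selfCanonical = page.canonical === null || page.canonical === page.url;
  return page.status === 200 && !page.noindex && selfCanonical;
}
```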

Code Example

```xml
<!-- sitemap.xml still references deleted blog posts -->
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/blog/old-post-deleted</loc></url>
  <url><loc>https://example.com/blog/moved-to-new-section</loc></url>
</urlset>
```

These URLs return 404, but the sitemap was not updated after the posts were deleted.
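To audit a file like this, the first step is pulling every `<loc>` value out of it. The sketch below assumes a plain (non-index) sitemap small enough to hold in memory. `extractLocs` is a hypothetical helper, and a production crawler would normally use a real XML parser instead of a regular expression.

```typescript
// Decode the five predefined XML entities that may appear inside <loc>.
function decodeXmlEntities(text: string): string {
  return text
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&apos;/g, "'")
    .replace(/&amp;/g, "&"); // decode &amp; last so "&amp;lt;" becomes "&lt;", not "<"
}

// Return every <loc> URL in a sitemap document, trimmed, in order of appearance.
function extractLocs(xml: string): string[] {
  const locs: string[] = [];
  const pattern = /<loc>\s*([^<]+?)\s*<\/loc>/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(xml)) !== null) {
    locs.push(decodeXmlEntities(match[1]));
  }
  return locs;
}
```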

Why It Matters


Why 4XX URLs Are Harmful

  • Crawl budget waste: Googlebot spends time fetching dead URLs instead of discovering new content
  • Search Console errors: 4XX URLs show up as errors in the Page indexing report (formerly Coverage), obscuring genuine issues
  • Trust signal: A sitemap full of broken URLs suggests the site is poorly maintained

Correct Status Code Expectations

| URL condition | Expected HTTP status | Sitemap action |
|---|---|---|
| Page exists and is indexable | 200 OK | Include |
| Page permanently moved | 301 Moved Permanently | Update `<loc>` to the new URL |
| Page deleted | 410 Gone or 404 Not Found | Remove from sitemap |
| Page temporarily unavailable | 503 Service Unavailable | Keep, but fix quickly |
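An audit script can encode the table above as a lookup and apply it to each status code it records. This is a sketch, and the `SitemapAction` labels are hypothetical. It goes slightly beyond the table: it treats 308 (Permanent Redirect) like 301, treats any other 4XX as removable, and leaves everything else (for example 302, 307, or other 5XX codes) for manual review.

```typescript
// Hypothetical labels mirroring the "Sitemap action" column.
type SitemapAction = "include" | "update-loc" | "remove" | "keep-and-fix" | "review";

function sitemapAction(status: number): SitemapAction {
  if (status === 200) return "include";                      // indexability is checked separately
  if (status === 301 || status === 308) return "update-loc"; // permanent redirect: list the target
  if (status === 503) return "keep-and-fix";                 // temporary outage
  if (status >= 400 && status < 500) return "remove";        // 404, 410, 403 and other 4XX
  return "review";                                           // 302/307, other 5XX, anything else
}
```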

✅ After a Site Migration

  1. Map all old URLs to new URLs
  2. Implement 301 redirects from old to new
  3. Update sitemap to use only the new URLs
  4. Resubmit the sitemap in Google Search Console
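Steps 1 to 3 amount to passing every old sitemap URL through the redirect map before the sitemap is regenerated. The sketch below assumes the step 1 mapping is available as old-URL to new-URL pairs, and `resolveRedirect` and `rewriteSitemapUrls` are hypothetical helpers. It follows redirect chains to the final destination so the sitemap never lists an intermediate hop, fails loudly on loops, and removes duplicates created when several old URLs merge into one.

```typescript
// Follow a redirect map to the final target, rejecting loops.
function resolveRedirect(url: string, redirects: Map<string, string>): string {
  const seen = new Set<string>();
  let current = url;
  while (redirects.has(current)) {
    if (seen.has(current)) {
      throw new Error(`Redirect loop detected at ${current}`);
    }
    seen.add(current);
    current = redirects.get(current)!;
  }
  return current;
}

// Map every sitemap URL to its post-migration destination, dropping duplicates
// while keeping first-seen order.
function rewriteSitemapUrls(urls: string[], redirects: Map<string, string>): string[] {
  const result = new Set<string>();
  for (const url of urls) {
    result.add(resolveRedirect(url, redirects));
  }
  return Array.from(result);
}
```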

Automation

For dynamic sites, generate sitemaps programmatically from your database of live content rather than a static file. This ensures the sitemap always reflects real page existence.

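```typescript
// Example: only include published, non-deleted pages.
// Assumes a Prisma-style client `db` with a `pages` model; adapt names to your schema.
const urls = await db.pages.findMany({
  where: { published: true, deletedAt: null },
})
```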
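The query returns page records. The sketch below shows how those records could then become the sitemap file, so a deleted or unpublished page never reaches it. `buildSitemapXml` is a hypothetical helper that takes absolute URLs, and how you derive those URLs from your `pages` records depends on your schema. It escapes XML special characters, which the sitemap protocol requires inside `<loc>`.

```typescript
// Escape characters that must not appear raw in XML text content.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;") // must run first so later entities are not double-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Build a <urlset> document from absolute URLs of live, indexable pages.
function buildSitemapXml(urls: string[]): string {
  const entries = urls
    .map((url) => `  <url><loc>${escapeXml(url)}</loc></url>`)
    .join("\n");
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    entries,
    "</urlset>",
  ].join("\n");
}
```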

Exceptions

  • Staging, utility, login, account, or internal search pages may intentionally use different crawl or index signals if they are not meant to rank.
  • Temporary migration states can produce noisy intermediate signals; flag the live production URL pattern, not one-off transition artifacts.
  • When redirects, canonicals, robots directives, or indexability signals conflict, fix the strongest final signal first instead of reporting every downstream symptom as a separate blocker.

Standards

  • Use these references as the standard for the final search-facing HTML, metadata, and crawl behavior.
  • Check the implementation against Google: Sitemap best practices before treating the rule as satisfied.
  • Check the implementation against Google: HTTP status codes and SEO before treating the rule as satisfied.

Verification

Automated Checks

  • Google Search Console → Sitemaps → view submitted sitemap details
  • Search Console → Page indexing (formerly Coverage) → filter to "All submitted pages"
  • Use a crawl tool (Screaming Frog, Sitebulb) to fetch all sitemap URLs and report their status codes
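A scripted version of the crawl-tool check can request each sitemap URL and keep the ones that answer with a 4XX status. This sketch makes several assumptions. It needs a runtime with a global `fetch` (such as Node.js 18 or later), and `findBrokenSitemapUrls` and `StatusFetcher` are hypothetical names. It sends `HEAD` requests, which some servers answer differently from `GET`. Redirects are not followed (`redirect: "manual"`), so a 301 is reported as a 301 rather than as its target's status.

```typescript
// Returns the HTTP status code for a URL; injectable so tests can skip the network.
type StatusFetcher = (url: string) => Promise<number>;

// Default fetcher: HEAD request without following redirects.
const fetchStatus: StatusFetcher = async (url) => {
  const response = await fetch(url, { method: "HEAD", redirect: "manual" });
  return response.status;
};

interface BrokenUrl {
  url: string;
  status: number;
}

// Check URLs one at a time (gentle on the server) and collect every 4XX response.
// A network failure rejects the whole promise.
async function findBrokenSitemapUrls(
  urls: string[],
  getStatus: StatusFetcher = fetchStatus,
): Promise<BrokenUrl[]> {
  const broken: BrokenUrl[] = [];
  for (const url of urls) {
    const status = await getStatus(url);
    if (status >= 400 && status < 500) {
      broken.push({ url, status });
    }
  }
  return broken;
}
```

Checking sequentially keeps the sketch simple. For large sitemaps, a small fixed concurrency limit is a common refinement.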

Manual Checks

  • Review representative live pages manually and confirm there is no stronger conflicting signal that changes the intended SEO outcome.

Use with AI

Copy these prompts to use with your AI assistant, or install the MCP server to use directly from Claude, Cursor, or Windsurf.

Check

Verify implementation

Fetch all URLs listed in the sitemap and record their HTTP status codes. Flag any URL returning 4XX (404 Not Found, 410 Gone, 403 Forbidden). Cross-reference the results with the Page indexing report in Google Search Console for corroborating data.

Fix

Auto-fix issues

For each 4XX URL: if the content moved, set up a 301 redirect to the new URL and add the new URL to the sitemap. If the content is permanently gone, return 410 Gone and remove the URL from the sitemap. Regenerate and resubmit the sitemap.
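On the server, this fix amounts to a lookup before normal routing. Moved paths answer 301 with a `Location` header, and deliberately removed paths answer 410. The sketch below uses Node's built-in `http` module. The `MOVED` and `GONE` tables, the `/guides/new-section` target, and `resolveLegacyPath` are hypothetical placeholders for your own redirect data.

```typescript
import { createServer } from "node:http";

// Hypothetical data: old path -> new path, and paths removed for good.
const MOVED = new Map<string, string>([
  ["/blog/moved-to-new-section", "/guides/new-section"],
]);
const GONE = new Set<string>(["/blog/old-post-deleted"]);

interface Resolution {
  status: number;
  location?: string;
}

// Decide the response for a legacy path; null means "fall through to normal routing".
function resolveLegacyPath(path: string): Resolution | null {
  const target = MOVED.get(path);
  if (target !== undefined) return { status: 301, location: target };
  if (GONE.has(path)) return { status: 410 };
  return null;
}

const server = createServer((req, res) => {
  const path = new URL(req.url ?? "/", "http://localhost").pathname;
  const resolution = resolveLegacyPath(path);
  if (resolution) {
    if (resolution.location) res.setHeader("Location", resolution.location);
    res.writeHead(resolution.status).end();
    return;
  }
  res.writeHead(200, { "Content-Type": "text/plain" }).end("OK"); // stand-in for real routing
});

// server.listen(3000); // uncomment to serve locally
```

Keeping the decision in a pure function makes the redirect and 410 rules testable without starting a server.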

Explain

Learn more

Explain why sitemaps must only contain live, indexable URLs, how 4XX URLs in sitemaps affect crawl budget allocation, and the difference between 404 and 410 status codes for deindexing.

Review

Code review

Review metadata generation, rendered HTML, structured data, and response headers related to 4XX Pages in Sitemap. Flag exact routes or templates where search-facing output violates the rule, and describe how to verify the final page output.


Further Reading

Tools and supplementary material for exploring the topic in more depth.

Google Search Console (search.google.com), tool

Related Rules

Rules that often go hand-in-hand with this one.

Use trailing slashes consistently

Checks for consistent trailing slash usage across all URLs to avoid duplicate content and canonicalization issues.

SEO
Include indexable pages in your sitemap

Checks for canonical, indexable pages that are missing from the XML sitemap.

SEO
Create and submit an XML sitemap

An XML sitemap is available at /sitemap.xml and includes all important pages.

SEO
Resolve internal broken links

Detects and fixes internal links that return 404 or 5xx errors to improve user experience.

SEO
