SEO · Medium

URL Special Characters

Checks for problematic special characters in URL paths that can cause crawling, parsing, or canonicalization issues.

Quick take
Typical fix time 10 min
  • URL paths should contain only unreserved characters: A-Z, a-z, 0-9, `-`, `_`, `.`, `~`
  • Spaces must be encoded as `%20` (not `+`, which is query-string syntax)
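  • `#` and `?` end the path (they start the fragment and the query string), so they must be percent-encoded inside a path segment; `&` and `=` are syntactically allowed in paths by RFC 3986, but many tools read them as query syntax, so encode or avoid them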
  • Non-ASCII characters (accented letters, CJK) should be percent-encoded in URLs, though modern browsers decode them for display
Why it matters: Special characters in URL paths cause inconsistent crawling, broken links when shared, and canonicalization failures when different systems encode them differently.

Rule Details

RFC 3986 sorts URL characters into two categories: unreserved characters are safe as-is, while reserved characters act as delimiters and must be percent-encoded when they are meant literally in a path segment. This rule complements clean URL structure and consistent slug formatting.

Code Example

A-Z  a-z  0-9  -  _  .  ~
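
The two categories can be expressed as a small lookup. This is a minimal sketch, not a full URL validator: `classifyChar` is a hypothetical helper name, and the reserved set is the gen-delims plus sub-delims listed in RFC 3986.

```javascript
// Unreserved characters per RFC 3986: letters, digits, "-", ".", "_", "~".
const UNRESERVED = /^[A-Za-z0-9\-._~]$/;

// Reserved characters per RFC 3986: gen-delims (:/?#[]@) and sub-delims (!$&'()*+,;=).
const RESERVED = new Set([...":/?#[]@!$&'()*+,;="]);

// Hypothetical helper: classify a single character.
function classifyChar(ch) {
  if (UNRESERVED.test(ch)) return 'unreserved';
  if (RESERVED.has(ch)) return 'reserved';
  // Everything else (space, "%", non-ASCII, ...) is in neither set
  // and should only appear in a URL in percent-encoded form.
  return 'other';
}

console.log(classifyChar('~')); // 'unreserved'
console.log(classifyChar('?')); // 'reserved'
console.log(classifyChar(' ')); // 'other'
```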

Why It Matters

Special characters in URL paths cause inconsistent crawling, broken links when shared, and canonicalization failures when different systems encode them differently. MDN's percent-encoding reference is the practical baseline for understanding those failures.

Reserved Characters (Special Meaning)

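| Character | Reserved Meaning | Encoded Form |
|---|---|---|
| `?` | Start of query string | `%3F` |
| `#` | Fragment identifier | `%23` |
| `&` | Query parameter separator | `%26` |
| `=` | Key-value separator | `%3D` |
| `+` | Space (in query strings only) | `%2B` in paths |
| `/` | Path segment separator | `%2F` if literal |
| `%` | Encoding prefix | `%25` |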
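
In JavaScript, the built-in `encodeURIComponent` produces the encoded forms in the table when it is applied to one path segment at a time. The sketch below assumes you build paths from raw segment values; `buildPath` is a hypothetical helper name.

```javascript
// Hypothetical helper: join raw segment values into a percent-encoded path.
// Each segment is encoded on its own, so a literal "/" inside a value
// becomes %2F instead of creating an extra path level.
function buildPath(segments) {
  return '/' + segments.map((segment) => encodeURIComponent(segment)).join('/');
}

console.log(buildPath(['products', 'C++ tips']));
// '/products/C%2B%2B%20tips'
console.log(buildPath(['search', 'shoes?page=1#top']));
// '/search/shoes%3Fpage%3D1%23top'
console.log(buildPath(['files', 'a/b & 100%']));
// '/files/a%2Fb%20%26%20100%25'
```

Encoding the whole path in one call would also escape the `/` separators, which is why the sketch works segment by segment.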

❌ Problematic URL Examples

/blog/my post title               # Space — breaks parsers
/products/shoes#best              # Fragment lost during crawling
/search?q=shoes&page=1/results    # Query chars in path
/café-parisien                    # Non-ASCII — encoding inconsistency
/products/C++ tips                # Special chars + space
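
To see how a standards-based parser reads these paths, you can run them through the WHATWG `URL` class that ships with Node.js and browsers. This is only an illustration: `https://example.com` is a placeholder origin, `parts` is a hypothetical helper, and crawlers or other tools may not behave identically.

```javascript
const BASE = 'https://example.com'; // placeholder origin for illustration

// Hypothetical helper: show how the parser splits a path-like string.
function parts(input) {
  const url = new URL(input, BASE);
  return { pathname: url.pathname, search: url.search, hash: url.hash };
}

console.log(parts('/products/shoes#best'));
// { pathname: '/products/shoes', search: '', hash: '#best' }

console.log(parts('/search?q=shoes&page=1/results'));
// { pathname: '/search', search: '?q=shoes&page=1/results', hash: '' }

console.log(parts('/blog/my post title').pathname);
// '/blog/my%20post%20title'

console.log(parts('/products/C++ tips').pathname);
// '/products/C++%20tips'
```

The `#` and `?` never reach the path at all, and the space is silently rewritten, so the URL that gets requested differs from the one that was written.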

✅ Clean URL Examples

/blog/my-post-title
/products/shoes-best-sellers
/search-results/shoes
/cafe-parisien               # Transliterated to ASCII
/products/cpp-tips

Non-ASCII Characters in URLs

For content in non-English languages (e.g., /über-uns, /产品), modern browsers display the decoded form but transmit the percent-encoded form. Inconsistencies arise when:

  • Some tools encode and others don't
  • The server treats /über-uns and /%C3%BCber-uns as different URLs

Best practice: Use ASCII slugs (transliterate accented characters), or make sure your server normalizes every encoding variant to a single canonical form.
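
If you keep non-ASCII paths, one possible normalization is to decode each segment, apply Unicode NFC, and re-encode it as UTF-8. The sketch below assumes UTF-8 paths and that per-segment normalization suits your routes; `canonicalPath` is a hypothetical helper, not part of any framework.

```javascript
// Hypothetical helper: map every encoding variant of a path to one form.
function canonicalPath(rawPath) {
  return rawPath
    .split('/')
    .map((segment) => {
      let decoded;
      try {
        decoded = decodeURIComponent(segment); // '%C3%BC' or '%c3%bc' -> 'ü'
      } catch {
        decoded = segment; // malformed escape: treat the segment as literal text
      }
      // NFC merges "u" + combining diaeresis into the single character "ü",
      // so visually identical paths produce identical bytes.
      return encodeURIComponent(decoded.normalize('NFC'));
    })
    .join('/');
}

console.log(canonicalPath('/über-uns'));          // '/%C3%BCber-uns'
console.log(canonicalPath('/%c3%bcber-uns'));     // '/%C3%BCber-uns'
console.log(canonicalPath('/u\u0308ber-uns'));    // '/%C3%BCber-uns'
```

Note that `encodeURIComponent` also escapes characters such as `&`, `=`, and `+`, which is stricter than RFC 3986 requires for paths. A server would typically redirect any request whose path differs from the value this function returns.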

URL Slug Sanitization (Node.js)

function slugify(text) {
  return text
    .toLowerCase()
    .normalize('NFD')                   // Decompose accented chars (é becomes e + accent mark)
    .replace(/[\u0300-\u036f]/g, '')    // Remove the combining accent marks
    .replace(/[^a-z0-9\s-]/g, '')       // Drop everything except letters, digits, whitespace, hyphens
    .trim()
    .replace(/\s+/g, '-')               // Whitespace runs to a single hyphen
    .replace(/-+/g, '-')                // Collapse repeated hyphens
    .replace(/^-+|-+$/g, '');           // Strip leading and trailing hyphens
}

slugify('Café & Restaurant — Paris!');  // → 'cafe-restaurant-paris'

Exceptions

  • Necessary utility or compliance pages can be intentionally brief and should not be judged by the same editorial-depth expectations as ranking-focused content.
  • AI-assisted drafting is not a failure by itself; flag unsupported claims, missing editorial review, or low-originality output instead.
  • When a page has both trust-signal issues and crawl/index problems, make the page eligible to rank first and then improve the content quality signals.

Standards

  • Use these references as the standard for the final search-facing HTML, metadata, and crawl behavior.
  • Check the implementation against RFC 3986: Uniform Resource Identifier (URI) — Generic Syntax before treating the rule as satisfied.
  • Check the implementation against Google: Keep a simple URL structure before treating the rule as satisfied.

Verification

Automated Checks

  • Crawl the site and export all URLs; filter for characters outside [A-Za-z0-9/\-_.~]
  • Test both encoded and decoded versions of a URL to ensure they resolve to the same canonical response
  • Check server logs for crawl errors caused by misencoded URLs
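
The first check can be scripted against a crawl export. This is a minimal sketch, assuming the export is an array of absolute or root-relative URL strings; `rawPath` and `auditUrls` are hypothetical helper names. It inspects the raw string rather than a parsed URL, because a parser would already have rewritten spaces and non-ASCII characters.

```javascript
// Characters allowed in a clean path: unreserved characters plus "/".
const ALLOWED_CHAR = /[A-Za-z0-9\/\-_.~]/g;

// Hypothetical helper: cut the path out of a URL string without normalizing it.
function rawPath(url) {
  const withoutOrigin = url.replace(/^[a-z][a-z0-9+.-]*:\/\/[^\/?#]*/i, '');
  return withoutOrigin.split(/[?#]/)[0];
}

// Hypothetical helper: list URLs whose path contains other characters,
// together with the distinct offending characters.
function auditUrls(urls) {
  return urls
    .map((url) => ({ url, leftover: rawPath(url).replace(ALLOWED_CHAR, '') }))
    .filter(({ leftover }) => leftover.length > 0)
    .map(({ url, leftover }) => ({ url, offending: [...new Set(leftover)] }));
}

console.log(auditUrls([
  'https://example.com/blog/my-post-title',
  'https://example.com/blog/my post title',
  'https://example.com/products/C++?ref=a b',
]));
// [
//   { url: 'https://example.com/blog/my post title', offending: [ ' ' ] },
//   { url: 'https://example.com/products/C++?ref=a b', offending: [ '+' ] }
// ]
```

Already-encoded paths are flagged too, since `%` is outside the allowed set; that matches the filter above and surfaces URLs worth reviewing for encoding variants.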

Manual Checks

  • Review representative live pages manually and confirm there is no stronger conflicting signal that changes the intended SEO outcome.

Use with AI

Copy these prompts to use with your AI assistant, or install the MCP server to use directly from Claude, Cursor, or Windsurf.

Check

Verify implementation

Scan all crawled URL paths for characters outside `[A-Za-z0-9\-_./~]`. Flag URLs containing spaces (`%20` or literal), special symbols (`#`, `?`, `&`, `=`, `+` in paths), or unencoded non-ASCII characters. Check if different encodings of the same URL return separate 200 responses.

Fix

Auto-fix issues

Update the URL generation logic in your CMS or framework to strip or replace special characters. Replace spaces with hyphens. Percent-encode any reserved characters that must appear in paths. Set up redirects from problematic old URLs to clean new slugs.

Explain

Learn more

Explain the difference between reserved and unreserved characters in RFC 3986, how browsers and crawlers handle inconsistent percent-encoding, and why special characters in paths create duplicate content risks.

Review

Code review

Review metadata generation, rendered HTML, structured data, and response headers related to URL Special Characters. Flag exact routes or templates where search-facing output violates the rule, and describe how to verify the final page output.

Further Reading

Tools and supplementary material for exploring the topic in more depth.

Google Search Console
search.google.com · Tool

