URL Special Characters
Checks for problematic special characters in URL paths that can cause crawling, parsing, or canonicalization issues.
- URL paths should contain only unreserved characters: A-Z, a-z, 0-9, `-`, `_`, `.`, `~`
- Spaces must be encoded as `%20` (not `+`, which is query-string syntax)
- Characters like `#`, `?`, `&`, `=` have reserved meaning in URLs and must be percent-encoded in path segments
- Non-ASCII characters (accented letters, CJK) should be percent-encoded in URLs, though modern browsers decode them for display
Rule Details
RFC 3986 divides URL characters into two categories: unreserved characters are safe as-is, while reserved characters have special meaning and must be percent-encoded when used literally in path segments. This rule is closely related to clean URL structure and slug formatting.
Code Example
A-Z a-z 0-9 - _ . ~
Why It Matters
Special characters in URL paths cause inconsistent crawling, broken links when shared, and canonicalization failures when different systems encode them differently. MDN's percent-encoding reference is the practical baseline for understanding those failures.
Reserved Characters (Special Meaning)
| Character | Reserved Meaning | Encoded Form |
|---|---|---|
| `?` | Start of query string | `%3F` |
| `#` | Fragment identifier | `%23` |
| `&` | Query parameter separator | `%26` |
| `=` | Key-value separator | `%3D` |
| `+` | Space (in query strings only) | `%2B` in paths |
| `/` | Path segment separator | `%2F` if literal |
| `%` | Encoding prefix | `%25` |
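As a rough illustration of the table above, JavaScript's built-in `encodeURIComponent` percent-encodes each of these reserved characters when it is applied to a single path segment. The helper names `encodePathSegment` and `buildPath` and the sample inputs are hypothetical and only serve the example.

```javascript
// Hypothetical helper: percent-encode ONE path segment, not a whole path.
// encodeURIComponent leaves A-Z a-z 0-9 - _ . ~ ! * ' ( ) untouched and
// encodes everything else, including "/", so it must be called per segment.
function encodePathSegment(segment) {
  return encodeURIComponent(segment);
}

// Hypothetical helper: join literal segments into a path, encoding each one.
function buildPath(segments) {
  return '/' + segments.map(encodePathSegment).join('/');
}

console.log(encodePathSegment('what?'));          // what%3F
console.log(encodePathSegment('q=1&r=2'));        // q%3D1%26r%3D2
console.log(encodePathSegment('50%'));            // 50%25
console.log(buildPath(['products', 'C++ tips'])); // /products/C%2B%2B%20tips
```

Encoding per segment keeps the real `/` separators intact while a literal slash inside a segment becomes `%2F`. In most cases a clean slug is still preferable to an encoded one.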
❌ Problematic URL Examples
/blog/my post title # Space — breaks parsers
/products/shoes#best # Fragment lost during crawling
/search?q=shoes&page=1/results # Query chars in path
/café-parisien # Non-ASCII — encoding inconsistency
/products/C++ tips # Special chars + space
✅ Clean URL Examples
/blog/my-post-title
/products/shoes-best-sellers
/search-results/shoes
/cafe-parisien # Transliterated to ASCII
/products/cpp-tips
Non-ASCII Characters in URLs
For content in non-English languages (e.g., /über-uns, /产品), modern browsers display the decoded form but transmit the percent-encoded form. Inconsistencies arise when:
- Some tools encode and others don't
- The server treats `/über-uns` and `/%C3%BCber-uns` as different URLs
Best practice: Use ASCII slugs (transliterate accented characters), or ensure your server normalizes all encodings of a path to a single canonical URL.
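One way to approach the normalization option is sketched below, assuming the input is a path only (no query string or fragment). The function name `canonicalizePath` is hypothetical. It decodes each segment, applies Unicode NFC normalization, and re-encodes, so that differently encoded spellings of the same path collapse to one string.

```javascript
// Hypothetical helper: map differently encoded spellings of a path to one form.
// Assumes `path` contains no query string or fragment.
function canonicalizePath(path) {
  return path
    .split('/')
    .map((segment) => {
      let decoded;
      try {
        decoded = decodeURIComponent(segment);
      } catch (err) {
        // Malformed escape (for example a stray "%"): treat the segment as literal text.
        decoded = segment;
      }
      // NFC makes precomposed and combining-mark forms encode to the same bytes.
      return encodeURIComponent(decoded.normalize('NFC'));
    })
    .join('/');
}

console.log(canonicalizePath('/über-uns'));      // /%C3%BCber-uns
console.log(canonicalizePath('/%C3%BCber-uns')); // /%C3%BCber-uns
console.log(canonicalizePath('/%c3%bcber-uns')); // /%C3%BCber-uns
```

A server or middleware layer could compare the requested path with this canonical form and redirect when they differ. Where that check belongs depends on your stack.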
URL Slug Sanitization (Node.js)
function slugify(text) {
  return text
    .toLowerCase()
    .normalize('NFD')                 // Decompose accented chars
    .replace(/[\u0300-\u036f]/g, '')  // Remove combining accent marks
    .replace(/[^a-z0-9\s-]/g, '')     // Remove non-slug chars
    .trim()
    .replace(/\s+/g, '-')             // Spaces to hyphens
    .replace(/-+/g, '-')              // Collapse multiple hyphens
    .replace(/^-+|-+$/g, '');         // Drop leading and trailing hyphens
}

slugify('Café & Restaurant — Paris!'); // → 'cafe-restaurant-paris'
Exceptions
- Necessary utility or compliance pages can be intentionally brief and should not be judged by the same editorial-depth expectations as ranking-focused content.
- AI-assisted drafting is not a failure by itself; flag unsupported claims, missing editorial review, or low-originality output instead.
- When a page has both trust-signal issues and crawl/index problems, make the page eligible to rank first and then improve the content quality signals.
Standards
- Use these references as the standard for the final search-facing HTML, metadata, and crawl behavior.
- Check the implementation against RFC 3986: Uniform Resource Identifier (URI) — Generic Syntax before treating the rule as satisfied.
- Check the implementation against Google: Keep a simple URL structure before treating the rule as satisfied.
Verification
Automated Checks
- Crawl the site and export all URLs; filter for characters outside `[A-Za-z0-9/\-_.~]`
- Test both encoded and decoded versions of a URL to ensure they return the same canonical URL response
- Check server logs for crawl errors caused by misencoded URLs
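The filtering step can be sketched roughly as follows, assuming the crawl export has already been reduced to an array of raw path strings. The names `UNSAFE_PATH_CHARS`, `findUnsafeChars`, and `flagPaths` are hypothetical, and the allowed set is this rule's unreserved characters plus `/`.

```javascript
// Anything outside the unreserved set (A-Z a-z 0-9 - _ . ~) and "/" is flagged.
const UNSAFE_PATH_CHARS = /[^A-Za-z0-9\/\-_.~]/gu;

// Hypothetical helper: list the distinct disallowed characters in one path.
function findUnsafeChars(path) {
  return [...new Set(path.match(UNSAFE_PATH_CHARS) || [])];
}

// Hypothetical helper: keep only paths with at least one disallowed character.
function flagPaths(paths) {
  return paths
    .map((path) => ({ path, chars: findUnsafeChars(path) }))
    .filter((entry) => entry.chars.length > 0);
}

const report = flagPaths([
  '/blog/my post title',
  '/products/C++ tips',
  '/blog/my-post-title',
]);
console.log(report);
// [ { path: '/blog/my post title', chars: [ ' ' ] },
//   { path: '/products/C++ tips', chars: [ '+', ' ' ] } ]
```

Because `%` is outside the allowed set, already percent-encoded paths such as `/blog/my%20post` are flagged as well, which is useful when the goal is to find every URL that is not a plain slug.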
Manual Checks
- Review representative live pages manually and confirm there is no stronger conflicting signal that changes the intended SEO outcome.
Use with AI
Copy these prompts to use with your AI assistant, or install the MCP server to use directly from Claude, Cursor, or Windsurf.
Check
Verify implementation
Scan all crawled URL paths for characters outside `[A-Za-z0-9\-_./~]`. Flag URLs containing spaces (`%20` or literal), special symbols (`#`, `?`, `&`, `=`, `+` in paths), or unencoded non-ASCII characters. Check if different encodings of the same URL return separate 200 responses.
Fix
Auto-fix issues
Update the URL generation logic in your CMS or framework to strip or replace special characters. Replace spaces with hyphens. Percent-encode any reserved characters that must appear in paths. Set up redirects from problematic old URLs to clean new slugs.
Explain
Learn more
Explain the difference between reserved and unreserved characters in RFC 3986, how browsers and crawlers handle inconsistent percent-encoding, and why special characters in paths create duplicate content risks.
Review
Code review
Review metadata generation, rendered HTML, structured data, and response headers related to URL Special Characters. Flag exact routes or templates where search-facing output violates the rule, and describe how to verify the final page output.