More about “unsafe” characters from RFC1738:
Characters can be unsafe for a number of reasons. The space character is unsafe because significant spaces may disappear and insignificant spaces may be introduced when URLs are transcribed or typeset or subjected to the treatment of word-processing programs. The characters “<” and “>” are unsafe because they are used as the delimiters around URLs in free text; the quote mark (“"”) is used to delimit URLs in some systems. The character “#” is unsafe and should always be encoded because it is used in World Wide Web and in other systems to delimit a URL from a fragment/anchor identifier that might follow it. The character “%” is unsafe because it is used for encodings of other characters. Other characters are unsafe because gateways and other transport agents are known to sometimes modify such characters. These characters are “{”, “}”, “|”, “\”, “^”, “~”, “[”, “]”, and “`”.
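As a rough illustration of what "encoding" these characters means, the sketch below replaces each unsafe character named in the passage with a percent sign followed by its two-digit hexadecimal ASCII code. The names `UNSAFE` and `encode_unsafe` are hypothetical and chosen only for this example; the function is a minimal sketch, not a full URL encoder.

```python
# Unsafe characters as listed in the RFC 1738 passage quoted above:
# space, quote mark, < > # % { } | \ ^ ~ [ ] and the backtick.
UNSAFE = set(' "<>#%{}|\\^~[]`')


def encode_unsafe(text: str) -> str:
    """Percent-encode only the unsafe characters, leaving everything else as-is.

    Hypothetical helper for illustration; it assumes `text` has not been
    encoded already (an existing "%" would be encoded again as "%25").
    """
    return "".join(
        f"%{ord(ch):02X}" if ch in UNSAFE else ch
        for ch in text
    )


print(encode_unsafe("my file#1.html"))  # my%20file%231.html
```

Because "%" is itself in the unsafe set, running the function twice on the same string double-encodes it, which is why real encoders are normally applied once to raw, unencoded data.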
Classification | Included characters | Encoding required? |
---|---|---|
Safe characters | Alphanumerics [0-9a-zA-Z], special characters $-_.+!*'(), and reserved characters used for their reserved purposes (e.g., question mark used to denote a query string) | NO |
ASCII control characters | Includes the ISO-8859-1 (ISO-Latin) character ranges 00-1F hex (0-31 decimal) and 7F hex (127 decimal) | YES |
Non-ASCII characters | Includes the entire “top half” of the ISO-Latin set, 80-FF hex (128-255 decimal) | YES |
Reserved characters | $ & + , / : ; = ? @ (not including blank space) | YES* |
Unsafe characters | Includes the blank/empty space and " < > # % { } \| \ ^ ~ [ ] ` | YES |
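The classification in the table can be sketched as a small lookup function. The names below (`RESERVED`, `UNSAFE`, `classify`) are hypothetical. Since the table lists "$" and "+" under both the safe and the reserved rows, this sketch assumes the reserved row takes precedence; that ordering is a choice made for the example, not something the table states.

```python
# Character sets copied from the table rows above.
RESERVED = set("$&+,/:;=?@")
UNSAFE = set(' "<>#%{}|\\^~[]`')


def classify(ch: str) -> str:
    """Return the table's classification for a single character."""
    code = ord(ch)
    if code <= 0x1F or code == 0x7F:
        return "control"    # 00-1F hex and 7F hex
    if code >= 0x80:
        return "non-ascii"  # 80 hex and above
    if ch in UNSAFE:
        return "unsafe"
    if ch in RESERVED:
        return "reserved"   # assumption: wins over "safe" for $ and +
    # What remains is the alphanumerics plus - _ . ! * ' ( )
    return "safe"


for ch in ["a", " ", "?", "-", "é"]:
    print(repr(ch), classify(ch))
```

A caller would typically leave "safe" characters alone, always encode "control", "non-ascii", and "unsafe" characters, and encode "reserved" characters only when they are not serving their reserved purpose.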