08.07.2024

Special characters and URL problems

The impact of special characters in URLs has long been a topic of discussion among web developers and SEO experts.

Many industry professionals advise against using non-ASCII characters in URLs, emphasising the importance of keeping URLs concise and direct.

ASCII (pronounced “askee”), which stands for the American Standard Code for Information Interchange, is a character encoding that assigns a number to each character so that computers can exchange text. It is a 7-bit code that represents 128 characters; later 8-bit encodings, often called extended ASCII, build on it to represent 256 characters.
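As a rough illustration of that numerical representation, the Python sketch below prints the code of a character and checks whether a string stays within the 128 codes of 7-bit ASCII. The helper name `is_ascii` is an assumption made for this example, not part of any standard tool.

```python
def is_ascii(text: str) -> bool:
    """Return True if every character has a code point below 128 (7-bit ASCII)."""
    return all(ord(ch) < 128 for ch in text)


# Each character maps to a number; "A" is 65 in ASCII.
print(ord("A"))  # 65

# Plain letters and hyphens are ASCII; the accented "é" is not.
print(is_ascii("why-avoid-special-characters"))  # True
print(is_ascii("café"))  # False
```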

While the inclusion of special characters can sometimes enhance readability, it often complicates URL management and can lead to indexing and positioning issues on search engines. In this article, we’ll delve into the reasons behind these challenges and explore best practices for creating effective URLs.

Why avoid special characters?

A URL (Uniform Resource Locator) is essentially a string of characters that uniquely identifies the location of an online resource, making a web page accessible via a browser.

URLs have dual communicative value: they provide users with information about a web page and facilitate indexing by search engines, influencing rankings on the Search Engine Results Page (SERP).

To maximise readability and memorability, URLs should avoid special characters. Special characters are symbols other than letters and numbers, such as punctuation marks and mathematical symbols. The hyphen is the usual exception, since words in URLs are typically separated by hyphens:

www.sitename.com/why-avoid-special-characters

If we replace the hyphens with a character that is not allowed in a URL, such as a space, each occurrence is percent-encoded, in this case as %20:

www.sitename.com/why%20avoid%20special%20characters

Clearly, the second example is harder to read, type, and remember. To prevent such issues, it is important to be mindful of the characters and symbols used in URLs.
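The conversion shown above can be reproduced with Python's standard `urllib.parse` module. This is only a minimal sketch: the path strings are illustrative, and browsers or servers may apply slightly different encoding rules.

```python
from urllib.parse import quote, unquote

# A path written with spaces instead of hyphens (illustrative value).
path = "/why avoid special characters"

# quote() percent-encodes characters that are not safe in a URL path.
encoded = quote(path)
print(encoded)  # /why%20avoid%20special%20characters

# unquote() reverses the encoding.
print(unquote(encoded))  # /why avoid special characters

# Hyphens pass through unchanged, which is why they make good separators.
print(quote("/why-avoid-special-characters"))  # /why-avoid-special-characters

# Non-ASCII letters are encoded as their UTF-8 bytes.
print(quote("/café"))  # /caf%C3%A9
```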

CMSs and special characters

Content Management Systems (CMSs), such as WordPress, are among the most popular options for website building and maintenance, even for less proficient users.

To simplify the process, especially for beginners, CMSs are designed to handle special characters by converting them automatically when they generate a URL from a page title. For example, they remove apostrophes and replace spaces with hyphens.

This functionality ensures that even novices can have blogs with clearly legible and user-friendly URLs.
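The kind of clean-up described above can be sketched in a few lines of Python. The `slugify` function below is a hypothetical helper written for illustration; it is not the algorithm used by WordPress or any other specific CMS, and real implementations handle many more cases.

```python
import re
import unicodedata


def slugify(title: str) -> str:
    """Hypothetical slug helper; not the algorithm of any specific CMS."""
    # Split accented letters into base letter + accent, then drop non-ASCII.
    text = unicodedata.normalize("NFKD", title)
    text = text.encode("ascii", "ignore").decode("ascii")
    # Remove apostrophes so "don't" becomes "dont" rather than "don-t".
    text = text.replace("'", "")
    # Lowercase, then turn every run of other characters into one hyphen.
    text = re.sub(r"[^a-z0-9]+", "-", text.lower())
    # Trim hyphens left over at either end.
    return text.strip("-")


print(slugify("Why avoid special characters?"))  # why-avoid-special-characters
print(slugify("Don't use café URLs"))  # dont-use-cafe-urls
```

Dropping accents rather than percent-encoding them is a design choice made here for readability; some sites prefer to keep the original letters.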

Special characters and technical issues

Despite advancements that streamline many processes today, it is still advisable to avoid unusual characters such as apostrophes, superscripts, accented letters, and symbols like the trademark symbol. This precaution not only improves memorability but also affects how the URL is displayed.

In code, using non-ASCII characters in relative URL paths, as opposed to absolute URL paths, can cause problems. A relative path specifies a URL relative to the current page’s location, based on the current directory or website structure, and does not include the domain name or protocol. Because the full address has to be assembled from the current location and the relative path, even Google’s crawlers might build the wrong path, leading to internal 404 error pages.
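To make the idea of a relative path concrete, here is a small Python sketch using `urljoin` from the standard library. The base page and the relative paths are assumptions chosen for illustration, and the sketch shows how one library resolves paths, not how any particular crawler behaves.

```python
from urllib.parse import quote, urljoin

# The page currently being viewed (illustrative address).
base = "https://www.sitename.com/blog/post.html"

# A relative path is resolved against the directory of the current page.
print(urljoin(base, "images/photo.png"))
# https://www.sitename.com/blog/images/photo.png

# ".." climbs one directory before the path is appended.
print(urljoin(base, "../about"))
# https://www.sitename.com/about

# A relative path with accented letters and a space, percent-encoded
# before resolution so that the final URL contains only ASCII.
relative = quote("guías/año nuevo")
resolved = urljoin(base, relative)
print(resolved)
# https://www.sitename.com/blog/gu%C3%ADas/a%C3%B1o%20nuevo
```

Every client that follows such a link has to perform both steps, encoding and resolution, in the same way, which is where mismatches can creep in.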

In conclusion, always remember that, whatever the specific case, three main elements affect how a URL is handled: the operating system, the browser, and the device. Avoiding special characters is a fundamental principle that enhances the portability of a web resource, ensuring compatibility across different software and hardware. By minimising the use of special characters, you reduce the risks associated with URL handling and improve the overall user experience.
