Percent-Encoding
May 20, 2023
Percent-encoding is a method for representing arbitrary characters in a Uniform Resource Identifier (URI) or URL using only characters that are permitted there. A URL may contain only a subset of the 128 ASCII characters: letters, digits, and a limited set of punctuation. Any other character, whether a space, a non-ASCII character, or a reserved character used as plain data, must be percent-encoded before it can appear in a URL.
Percent-encoding replaces each byte to be encoded with a percent sign followed by two hexadecimal digits that give the byte's value. For an ASCII character, this value is its ASCII code. A non-ASCII character is first converted to bytes, normally using UTF-8, and each byte is then encoded separately, so one character can produce several percent-encoded triplets. For example, a space (hexadecimal 20) becomes %20. The letter “A” can be written as %41, where 41 is the hexadecimal code for “A”, although letters, digits, hyphens, periods, underscores, and tildes never need to be encoded.
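The encoding step can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name percent_encode is hypothetical, and the sketch assumes UTF-8 and encodes everything outside the RFC 3986 unreserved set, which is stricter than most URL components require. Note that unreserved characters such as “A” are left as they are, even though a form like %41 is also valid.

```python
# Unreserved characters (RFC 3986): these never need to be encoded.
UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)


def percent_encode(text: str) -> str:
    """Percent-encode every byte of `text` that is not an unreserved character.

    Hypothetical helper for illustration; assumes UTF-8 as the byte encoding.
    """
    out = []
    for byte in text.encode("utf-8"):
        ch = chr(byte)
        if ch in UNRESERVED:
            out.append(ch)               # keep unreserved ASCII as-is
        else:
            out.append(f"%{byte:02X}")   # "%" plus two uppercase hex digits
    return "".join(out)


print(percent_encode("A"))    # A
print(percent_encode(" "))    # %20
print(percent_encode("a&b"))  # a%26b
print(percent_encode("€"))    # %E2%82%AC
```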
The purpose of percent-encoding is to let a URL carry characters that would otherwise be disallowed or misinterpreted. Some ASCII characters, such as ampersands, slashes, and question marks, are reserved: they have special meanings in a URL and separate its different parts. Others, such as spaces, are not permitted at all. If these characters appear as data and are not properly encoded, they can cause errors or unexpected behavior in a web application.
Usage
Percent-encoding can appear in most parts of a URL: the authority, path, query string, and fragment identifier. The scheme is the exception, because it cannot contain percent-encoded characters. The following sections describe how percent-encoding applies to each of these parts.
Scheme
The scheme of a URL specifies the protocol that is used to access the resource. Examples of schemes include http, https, ftp, file, and mailto. The scheme is followed by a colon. Many schemes, such as http, then use two slashes to introduce the authority, while others, such as mailto, do not. A scheme may contain only ASCII letters, digits, plus signs, hyphens, and periods, so percent-encoding is never used in it. In a mailto URL, the “@” symbol belongs to the address that follows the scheme, not to the scheme itself.
Authority
The authority of a URL specifies the domain name or IP address of the server that hosts the resource. It may also include the port number that is used to access the server, and optional user information before the host. A host name that contains non-ASCII characters (an internationalized domain name) is normally converted to an ASCII form using Punycode rather than percent-encoded. An ASCII host name needs no encoding at all: the period in “example.com” is an unreserved character, so the name is written as “example.com”, not “example%2Ecom”.
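In practice, a non-ASCII host name is usually converted to ASCII with IDNA (Punycode) instead of being percent-encoded. The sketch below uses Python's built-in "idna" codec, which implements the older IDNA 2003 rules; the host name bücher.example is a made-up example.

```python
# Hypothetical internationalized host name, used only for illustration.
host = "bücher.example"

# Python's built-in "idna" codec converts each non-ASCII label to Punycode.
ascii_host = host.encode("idna").decode("ascii")
print(ascii_host)  # xn--bcher-kva.example

# A plain ASCII host name passes through unchanged; the period is not encoded.
plain_host = "example.com".encode("idna").decode("ascii")
print(plain_host)  # example.com
```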
Path
The path of a URL specifies the location of the resource on the server. It may include one or more segments, each separated by a forward slash. If a segment contains characters that are not allowed there, such as spaces, non-ASCII characters, or a slash that is part of the data, they must be percent-encoded. For example, the segment “my document.pdf” would be percent-encoded as “my%20document.pdf”. The period is an unreserved character and does not need to be encoded.
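A short sketch of path encoding with Python's standard urllib.parse.quote follows. The helper name encode_path and the sample segments are illustrative assumptions. Each segment is encoded on its own with safe="" so that a slash inside a segment is encoded instead of being read as a separator.

```python
from urllib.parse import quote


def encode_path(segments):
    """Join path segments into a URL path, encoding each segment separately.

    Hypothetical helper; safe="" makes quote() encode "/" inside a segment.
    """
    return "/" + "/".join(quote(segment, safe="") for segment in segments)


path = encode_path(["docs", "my document.pdf"])
print(path)  # /docs/my%20document.pdf

# A slash that is part of the data is encoded, not treated as a separator.
tricky = encode_path(["a/b", "c"])
print(tricky)  # /a%2Fb/c
```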
Query String
The query string of a URL specifies additional parameters that are passed to the server. It begins with a question mark and usually consists of one or more key-value pairs, separated by ampersands, with an equals sign between each key and its value. If a key or value contains reserved characters such as “&” or “=”, spaces, or non-ASCII characters, they must be percent-encoded. For example, a query string with the key “q” and the value “coffee & tea” would be percent-encoded as “q=coffee%20%26%20tea”. HTML form submissions use a variant of this encoding in which a space is written as “+” instead of “%20”.
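As a sketch, Python's standard urllib.parse.urlencode builds such a query string. By default it follows the HTML form convention and writes a space as “+”; passing quote_via=quote produces “%20” instead. The variable names are illustrative.

```python
from urllib.parse import quote, urlencode

params = {"q": "coffee & tea"}

# Percent-encode spaces as %20 and the ampersand as %26.
strict_query = urlencode(params, quote_via=quote)
print(strict_query)  # q=coffee%20%26%20tea

# Default behavior: form-style encoding, where a space becomes "+".
form_query = urlencode(params)
print(form_query)  # q=coffee+%26+tea
```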
Fragment Identifier
The fragment identifier of a URL refers to a specific section of the resource. It begins with a hash symbol, followed by the identifier itself. If the identifier contains characters that are not allowed there, such as spaces or non-ASCII characters, they must be percent-encoded. For example, a fragment identifier with the text “section 3.2” would be percent-encoded as “section%203.2”. The period is left as it is.
Examples
The following examples demonstrate the use of percent-encoding in different parts of a URL.
Example 1: Encoding a Space in a Path
Suppose we want to include a space in the path of a URL. Since spaces are not allowed in URLs, we must encode the space as %20. For example, the URL for a file named “my document.pdf” would be:
file:///C:/User/Documents/my%20document.pdf
Example 2: Encoding an Ampersand in a Query String
Suppose we want to pass a query string parameter that includes an ampersand. Since ampersands are used to separate key-value pairs in a query string, we must encode the ampersand as %26. For example, the URL for a search query that includes the terms “coffee & tea” would be:
https://www.google.com/search?q=coffee%20%26%20tea
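To see why the encoding matters, the following sketch parses the query string with Python's standard urllib.parse module, once encoded and once unencoded. The variable names are illustrative, and other parsers may treat the malformed input differently.

```python
from urllib.parse import parse_qs, urlsplit

url = "https://www.google.com/search?q=coffee%20%26%20tea"
query = urlsplit(url).query
print(query)  # q=coffee%20%26%20tea

# Decoding restores the original value, ampersand included.
decoded = parse_qs(query)
print(decoded)  # {'q': ['coffee & tea']}

# Without encoding, "&" is read as a pair separator and the value is cut
# short; the leftover " tea" has no "=" and is dropped by parse_qs.
broken = parse_qs("q=coffee & tea")
print(broken)  # {'q': ['coffee ']}
```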
Example 3: Encoding a Non-ASCII Character in a Path
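Suppose we want to include a non-ASCII character, such as the Euro sign (€), in the path of a URL. Since non-ASCII characters are not allowed in URLs, we must first convert the character to bytes, normally using UTF-8, and then percent-encode each byte. The Euro sign is three bytes in UTF-8 (E2, 82, AC), so it becomes three percent-encoded triplets. The hexadecimal digits are case-insensitive, although uppercase is recommended. For example, the URL for a page that includes the Euro sign in its filename would be: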
https://www.example.com/files/financial%20reports/euro%20revenue%20report%202019%20%e2%82%ac.pdf
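The filename in this URL can be reproduced with Python's standard urllib.parse.quote, as sketched below. quote assumes UTF-8 by default and emits uppercase hexadecimal digits; unquote accepts either case when decoding. The variable names are illustrative.

```python
from urllib.parse import quote, unquote

name = "euro revenue report 2019 €.pdf"

# The Euro sign is three bytes in UTF-8.
euro_bytes = "€".encode("utf-8").hex()
print(euro_bytes)  # e282ac

# Each of those bytes becomes one percent-encoded triplet.
encoded = quote(name, safe="")
print(encoded)  # euro%20revenue%20report%202019%20%E2%82%AC.pdf

# Lowercase and uppercase hex digits decode to the same character.
same = unquote("%e2%82%ac") == unquote("%E2%82%AC") == "€"
print(same)  # True
```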