Percent-Encoding

May 20, 2023

Percent-encoding, also known as URL encoding, is a method used to represent characters in a Uniform Resource Identifier (URI) or URL using only a restricted set of ASCII characters. It is needed when a character is outside the ASCII character set, is not allowed in a URL (such as a space), or is a reserved character that would otherwise be read as a delimiter. This is necessary because URLs are written with a limited subset of the 128 ASCII characters, and any other data must be encoded to fit within it.

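Percent-encoding replaces each byte that needs encoding with a percent sign followed by two hexadecimal digits that give the value of that byte. An ASCII character occupies a single byte, so a space (byte value 20 in hexadecimal) becomes %20. A non-ASCII character is first converted to a sequence of bytes, normally using UTF-8, and each byte is encoded separately, so one character can produce several percent-encoded triplets. For example, the percent-encoded value for the letter “A” is %41, where 41 is the hexadecimal code for “A”. In practice, letters, digits, hyphens, periods, underscores, and tildes are “unreserved” characters and are left unencoded.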
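The byte-by-byte process can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name percent_encode and the constant UNRESERVED are made up for this example, and the sketch assumes UTF-8 and encodes everything except the unreserved characters.

```python
# Characters that RFC 3986 calls "unreserved"; they never need encoding.
UNRESERVED = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)


def percent_encode(text: str) -> str:
    """Percent-encode every byte of `text` that is not an unreserved character."""
    out = []
    for byte in text.encode("utf-8"):  # work on bytes, not characters
        char = chr(byte)
        if char in UNRESERVED:
            out.append(char)  # safe as-is
        else:
            out.append(f"%{byte:02X}")  # "%" plus two uppercase hex digits
    return "".join(out)


print(percent_encode("a b"))  # a%20b
print(f"%{ord('A'):02X}")     # %41, the triplet for "A" if it were encoded
```

Real applications would normally rely on a library routine such as Python's urllib.parse.quote instead of hand-written code.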

The purpose of percent-encoding is to let a URL carry characters that it could not otherwise contain safely. This covers non-ASCII characters, characters that are not allowed at all, such as spaces, and reserved characters, such as ampersands, slashes, and question marks, which have special meanings in a URL and are used to separate its parts. If a reserved character is meant as data but is not encoded, it is read as a delimiter, which can cause errors or unexpected behavior in a web application.
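A short Python sketch shows the kind of unexpected behavior an unencoded delimiter can cause. It uses parse_qs from the standard urllib.parse module; the query strings themselves are illustrative values.

```python
from urllib.parse import parse_qs

raw = "q=coffee & tea"            # ampersand left unencoded
encoded = "q=coffee%20%26%20tea"  # space as %20, ampersand as %26

# The unencoded ampersand is treated as a pair separator, so " tea" is
# split off as a second (valueless) field and dropped.
print(parse_qs(raw))      # {'q': ['coffee ']}

# The encoded form survives the round trip intact.
print(parse_qs(encoded))  # {'q': ['coffee & tea']}
```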

Usage

Percent-encoding can be used in most parts of a URL, including the authority, path, query string, and fragment identifier. The scheme is the exception: it cannot contain percent-encoded characters. The following sections describe how percent-encoding applies to each of these parts.

Scheme

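The scheme of a URL specifies the protocol that is used to access the resource. Examples of schemes include http, https, ftp, and file. The scheme is followed by a colon; in URLs that have an authority, such as http URLs, the colon is followed by two slashes. A scheme may contain only ASCII letters, digits, plus signs, hyphens, and periods, and must start with a letter, so it never contains percent-encoded characters. For example, in “mailto:user@example.com” the scheme is just “mailto”. The “@” symbol belongs to the address that follows and is written literally, because it acts as the delimiter between the mailbox name and the domain.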
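As a rough illustration, the scheme can be separated from the rest of a URL with urlsplit from Python's standard library, and its syntax can be checked with a regular expression that follows the RFC 3986 rule (a letter, then letters, digits, “+”, “-”, or “.”). The helper is_valid_scheme and the address user@example.com are hypothetical names chosen for this sketch.

```python
import re
from urllib.parse import urlsplit

# Scheme syntax per RFC 3986: a letter, then letters, digits, "+", "-", or ".".
SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")


def is_valid_scheme(scheme: str) -> bool:
    """Return True if `scheme` matches the RFC 3986 scheme syntax."""
    return SCHEME_RE.fullmatch(scheme) is not None


parts = urlsplit("mailto:user@example.com")
print(parts.scheme)  # mailto
print(parts.path)    # user@example.com  (the "@" is not part of the scheme)

print(is_valid_scheme("https"))   # True
print(is_valid_scheme("ht%74p"))  # False: "%" is not allowed in a scheme
```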

Authority

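The authority of a URL specifies the domain name or IP address of the server that hosts the resource. It may also include the port number that is used to access the server, and optional user information before the host. A host name that contains non-ASCII characters is normally converted to an ASCII form (Punycode) under the Internationalized Domain Names (IDNA) rules rather than percent-encoded; for example, “bücher.example” becomes “xn--bcher-kva.example”. Percent-encoding is mainly used in the user information, where a literal “@” or “:” would be mistaken for a delimiter. Periods are unreserved characters and are not encoded, so the domain name “example.com” is written unchanged as “example.com”.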
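The sketch below shows both cases in Python. It assumes Python's built-in “idna” codec (which implements the older IDNA 2003 rules) is acceptable for the host name, and it uses hypothetical values for the host and user name.

```python
from urllib.parse import quote

# Non-ASCII host names go through IDNA (Punycode), not percent-encoding.
host = "bücher.example"  # hypothetical internationalized host name
ascii_host = host.encode("idna").decode("ascii")
print(ascii_host)  # xn--bcher-kva.example

# User information is percent-encoded; safe="" also encodes "@", ":" and "/".
userinfo = quote("first.last@corp", safe="")  # hypothetical user name
print(userinfo)  # first.last%40corp

print(f"{userinfo}@{ascii_host}:8080")
# first.last%40corp@xn--bcher-kva.example:8080
```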

Path

The path of a URL specifies the location of the resource on the server. It may include one or more segments, each separated by a forward slash. If a segment includes non-ASCII characters, spaces, or reserved characters that are meant as data (such as a slash or question mark inside a file name), they must be percent-encoded. For example, the segment “my document.pdf” would be percent-encoded as “my%20document.pdf”. The period is an unreserved character and stays as it is.
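In Python, one way to build a path is to encode each segment separately with urllib.parse.quote and then join the results with slashes. Passing safe="" makes quote encode a slash that appears inside a segment, while the period is left alone because it is unreserved. The segment names below are illustrative.

```python
from urllib.parse import quote

# Each list item is one path segment; the last one contains a literal slash.
segments = ["files", "my document.pdf", "a/b"]

# Encode segments individually so that only the joining slashes stay literal.
path = "/" + "/".join(quote(segment, safe="") for segment in segments)
print(path)  # /files/my%20document.pdf/a%2Fb
```

Encoding the whole path in one call would leave the slash in “a/b” unencoded by default, which would silently turn one segment into two.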

Query String

The query string of a URL specifies additional parameters that are passed to the server. It begins with a question mark and usually consists of one or more key-value pairs, separated by ampersands, with an equals sign between each key and its value. If a key or value includes non-ASCII characters, spaces, or characters that act as delimiters in the query string (such as “&”, “=”, “+”, or “#”), they must be percent-encoded. For example, a query string that includes the key “q” and the value “coffee & tea” would be percent-encoded as “q=coffee%20%26%20tea”.
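Python's urllib.parse.urlencode builds a query string from key-value pairs. By default it follows the HTML form convention, which writes a space as “+” instead of %20; passing quote_via=quote produces the %20 form. Both forms are commonly accepted by servers that parse form-style queries. The parameter values are illustrative.

```python
from urllib.parse import quote, urlencode

params = {"q": "coffee & tea"}

# Default: HTML form style, spaces become "+".
print(urlencode(params))                   # q=coffee+%26+tea

# With quote_via=quote, spaces become %20 instead.
print(urlencode(params, quote_via=quote))  # q=coffee%20%26%20tea
```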

Fragment Identifier

The fragment identifier of a URL identifies a specific section of the resource, such as a heading within a page. It begins with a hash symbol, followed by the name of the section. If the fragment identifier includes non-ASCII characters, spaces, or other characters that are not allowed in a fragment, they must be percent-encoded. For example, a fragment identifier that includes the text “section 3.2” would be percent-encoded as “section%203.2”. The period is an unreserved character and is not encoded.
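A small Python sketch of encoding and decoding a fragment follows. The page address https://www.example.com/guide is a placeholder; note that urlsplit returns the fragment still encoded, so unquote is needed to recover the original text.

```python
from urllib.parse import quote, unquote, urlsplit

fragment = quote("section 3.2", safe="")
url = f"https://www.example.com/guide#{fragment}"
print(url)  # https://www.example.com/guide#section%203.2

# urlsplit does not decode; unquote restores the original text.
print(urlsplit(url).fragment)           # section%203.2
print(unquote(urlsplit(url).fragment))  # section 3.2
```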

Examples

The following examples demonstrate the use of percent-encoding in different parts of a URL.

Example 1: Encoding a Space in a Path

Suppose we want to include a space in the path of a URL. Since spaces are not allowed in URLs, we must encode the space as %20. For example, the URL for a file named “my document.pdf” would be:

file:///C:/User/Documents/my%20document.pdf

Example 2: Encoding an Ampersand in a Query String

Suppose we want to pass a query string parameter that includes an ampersand. Since ampersands are used to separate key-value pairs in a query string, we must encode the ampersand as %26. For example, the URL for a search query that includes the terms “coffee & tea” would be:

https://www.google.com/search?q=coffee%20%26%20tea

Example 3: Encoding a Non-ASCII Character in a Path

Suppose we want to include a non-ASCII character, such as the Euro sign (€), in the path of a URL. Since non-ASCII characters are not allowed in URLs, we must first convert the character to bytes and then percent-encode each byte. In UTF-8 the Euro sign is the three bytes E2 82 AC, so it is encoded as %E2%82%AC. The hexadecimal digits are not case-sensitive, but uppercase is the recommended form. For example, the URL for a page that includes the Euro sign in its filename would be:

https://www.example.com/files/financial%20reports/euro%20revenue%20report%202019%20%E2%82%AC.pdf
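The same result can be reproduced in Python, shown here as a minimal sketch using the standard urllib.parse module. The file name is taken from the example above; quote uses UTF-8 by default and emits uppercase hexadecimal digits, while unquote accepts either case.

```python
from urllib.parse import quote, unquote

# The Euro sign is three bytes in UTF-8.
print("€".encode("utf-8").hex(" "))  # e2 82 ac

name = "euro revenue report 2019 €.pdf"
encoded = quote(name, safe="")
print(encoded)  # euro%20revenue%20report%202019%20%E2%82%AC.pdf

# Decoding restores the original name; lowercase hex digits decode the same way.
print(unquote(encoded) == name)   # True
print(unquote("%e2%82%ac") == "€")  # True
```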