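URL Encoding Explained: When, Why, and How to Encode URLs Correctly

Learn what URL encoding is, why special characters break URLs, how encodeURI differs from encodeURIComponent, and how to handle encoding in several languages.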

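What Is URL Encoding?

URL encoding (also called percent-encoding) converts characters that are not allowed in a URL into a safe form. Each such character is replaced by % followed by two hexadecimal digits for each byte of its UTF-8 encoding. For ASCII characters that is simply the character's ASCII code.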

Examples:

space → %20
& → %26
= → %3D
# → %23

So hello world & foo=bar becomes hello%20world%20%26%20foo%3Dbar.
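To make the mechanics concrete, here is a minimal sketch of percent-encoding written by hand. The helper name `percentEncode` is hypothetical and exists only for illustration; in real code the built-in `encodeURIComponent` is the usual choice. The sketch assumes that only unreserved characters are left untouched and that text is encoded as UTF-8 first.

```javascript
// Hypothetical helper for illustration: percent-encode every byte
// that is not an unreserved character (A-Z, a-z, 0-9, "-", "_", ".", "~").
function percentEncode(input) {
  const unreserved = /^[A-Za-z0-9\-_.~]$/;
  const bytes = new TextEncoder().encode(input); // UTF-8 bytes of the string
  let out = "";
  for (const byte of bytes) {
    const ch = String.fromCharCode(byte);
    if (byte < 0x80 && unreserved.test(ch)) {
      out += ch; // safe as-is
    } else {
      // "%" followed by the byte value as two uppercase hex digits
      out += "%" + byte.toString(16).toUpperCase().padStart(2, "0");
    }
  }
  return out;
}

console.log(percentEncode("hello world & foo=bar"));
// → hello%20world%20%26%20foo%3Dbar
console.log(percentEncode("é"));
// → %C3%A9 (two UTF-8 bytes, so two %XX groups)
```

The second call shows why "two hex digits per character" only holds for ASCII: a non-ASCII character produces one %XX group per UTF-8 byte.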


Why Do URLs Need Encoding?

RFC 3986 restricts the set of characters that may appear in a URL. Characters fall into three groups:

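Category	Characters	Handling
Unreserved characters	A–Z, a–z, 0–9, `-`, `_`, `.`, `~`	Used as-is
Reserved characters	`:`, `/`, `?`, `#`, `[`, `]`, `@`, `!`, `$`, `&`, `'`, `(`, `)`, `*`, `+`, `,`, `;`, `=`	Have special meaning in the URL structure
Unsafe characters	space, `"`, `<`, `>`, `{`, `}`, `|`, `\`, `^`, `` ` ``	Must be percent-encoded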

If a query parameter value contains a raw &, the server treats it as a parameter separator and the data is silently corrupted.
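The corruption is easy to reproduce with the standard `URLSearchParams` parser. The parameter name `company` and its value are made up for this sketch.

```javascript
// Unencoded: the "&" inside the value is read as a parameter separator.
const broken = new URLSearchParams("company=Johnson & Johnson&lang=en");
console.log(broken.get("company")); // "Johnson " (truncated at the "&")
console.log(broken.has(" Johnson")); // true: the rest became a bogus parameter

// Encoded: the value survives the round trip intact.
const fixed = new URLSearchParams(
  "company=" + encodeURIComponent("Johnson & Johnson") + "&lang=en"
);
console.log(fixed.get("company")); // "Johnson & Johnson"
```

No error is raised in the first case, which is why this kind of bug tends to go unnoticed until the data is inspected.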

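encodeURI vs. encodeURIComponent

JavaScript provides two functions, and knowing which one to use is critical.

`encodeURI(url)`

Encodes a complete URL. It leaves characters that have structural meaning in a URL (:, /, ?, #, &, =) unencoded.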

encodeURI('https://example.com/search?q=hello world&lang=en');
// → 'https://example.com/search?q=hello%20world&lang=en'
// Note: ?, &, and = are not encoded, so they keep their structural role

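`encodeURIComponent(value)`

Encodes a single value (such as a query parameter). It encodes every character except A–Z, a–z, 0–9, and - _ . ! ~ * ' ( ), so reserved characters such as :, /, ?, #, &, and = are all encoded.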

// Double quotes are needed here because the value contains an apostrophe
const query = "C++ & Java: what's better?";
const url = 'https://example.com/search?q=' + encodeURIComponent(query);
// → "https://example.com/search?q=C%2B%2B%20%26%20Java%3A%20what's%20better%3F"

Rule: always use encodeURIComponent for values in a query string. Use encodeURI only when encoding an entire prebuilt URL whose structure must stay intact.
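Running both functions over the same input makes the difference visible. The helper `buildQuery` below is a hypothetical convenience written for this sketch, not a built-in.

```javascript
const sample = "a/b?c=d&e#f";
console.log(encodeURI(sample));          // a/b?c=d&e#f (structure untouched)
console.log(encodeURIComponent(sample)); // a%2Fb%3Fc%3Dd%26e%23f

// Hypothetical helper: encode each key and value separately,
// then join the pairs with the structural "&" and "=".
function buildQuery(params) {
  return Object.entries(params)
    .map(([key, value]) => encodeURIComponent(key) + "=" + encodeURIComponent(value))
    .join("&");
}

console.log(buildQuery({ q: "C++ & Java", lang: "en" }));
// → q=C%2B%2B%20%26%20Java&lang=en
```

Only the separators added by `buildQuery` itself remain unencoded, which is the property the rule above is meant to guarantee.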


URL Encoding in Other Languages

Python

from urllib.parse import quote, urlencode, quote_plus

# Encode a single value
quote("hello world & more")      # → 'hello%20world%20%26%20more'
quote_plus("hello world")        # → 'hello+world' (space becomes +, used for form data)

# Encode query parameters
params = {'q': 'C++ guide', 'lang': 'en'}
urlencode(params)  # → 'q=C%2B%2B+guide&lang=en'

PHP

// Encode a single value
urlencode("hello world & more");   // → 'hello+world+%26+more'
rawurlencode("hello world & more"); // → 'hello%20world%20%26%20more'

// Build a query string
http_build_query(['q' => 'C++ guide', 'lang' => 'en']);
// → 'q=C%2B%2B+guide&lang=en'

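Go

package main

import (
	"fmt"
	"net/url"
)

func main() {
	// Encode a single query value (spaces become "+")
	val := url.QueryEscape("hello world & more")
	fmt.Println(val)
	// → hello+world+%26+more

	// Build a full URL; Values.Encode escapes the query string
	u := &url.URL{
		Scheme:   "https",
		Host:     "example.com",
		Path:     "/search",
		RawQuery: url.Values{"q": {"C++ guide"}}.Encode(),
	}
	fmt.Println(u.String())
	// → https://example.com/search?q=C%2B%2B+guide
}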


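Common Encoding Mistakes

Double Encoding

// Wrong: encoding an already-encoded string a second time
const val = encodeURIComponent("hello world"); // "hello%20world"
encodeURIComponent(val); // "hello%2520world" (%25 is the encoding of %)

If necessary, decode first and then re-encode, or keep track of whether a value has already been encoded.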
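One way to apply the decode-then-re-encode advice is sketched below. The function name `encodeOnce` is hypothetical, and the approach rests on an assumption that should be stated loudly: the input never contains a literal percent sequence that is meant to stay as text (a raw value of "%41" would be read as an encoded "A"). Tracking the encoded state explicitly is more reliable when that assumption does not hold.

```javascript
// Hypothetical helper: normalize a value that may or may not be encoded already.
// Assumption: a decodable %XX sequence in the input is an encoding, not literal text.
function encodeOnce(value) {
  let decoded;
  try {
    decoded = decodeURIComponent(value);
  } catch (err) {
    // Malformed escape such as a lone "%": treat the input as raw text.
    decoded = value;
  }
  return encodeURIComponent(decoded);
}

console.log(encodeOnce("hello world"));   // hello%20world
console.log(encodeOnce("hello%20world")); // hello%20world (not hello%2520world)
console.log(encodeOnce("100%"));          // 100%25
```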

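Encoding at the Wrong Level

Encode only the query parameter values, not the whole URL. Encoding the structural characters of an entire URL (such as ? or /) breaks routing.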
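The sketch below contrasts the two levels. The host, the `/files/` path, and the sample values are invented for illustration.

```javascript
const base = "https://example.com";
const segment = "reports/2024 Q1"; // hypothetical path segment containing "/" and a space
const term = "a&b=c";              // hypothetical query value containing "&" and "="

// Right level: encode each dynamic piece, leave the structure alone.
const good =
  base + "/files/" + encodeURIComponent(segment) + "?q=" + encodeURIComponent(term);
console.log(good);
// → https://example.com/files/reports%2F2024%20Q1?q=a%26b%3Dc
console.log(new URL(good).searchParams.get("q")); // a&b=c

// Wrong level: encoding the whole URL destroys the scheme, slashes and "?".
const bad = encodeURIComponent(base + "/files?q=" + term);
console.log(bad);
// → https%3A%2F%2Fexample.com%2Ffiles%3Fq%3Da%26b%3Dc

let badParses = true;
try {
  new URL(bad);
} catch (err) {
  badParses = false; // no recognizable scheme is left, so parsing fails
}
console.log(badParses); // false
```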

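Confusing `+` with `%20`

In query strings, + is often used as shorthand for a space (the application/x-www-form-urlencoded format). Some parsers accept both forms, others do not. %20 is safe everywhere.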
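JavaScript itself contains both behaviors, which makes it a convenient place to see the mismatch. `decodeFormValue` is a hypothetical helper for this sketch.

```javascript
// URLSearchParams follows the form-encoding rules: space becomes "+".
console.log(new URLSearchParams({ q: "hello world" }).toString()); // q=hello+world

// encodeURIComponent follows plain percent-encoding: space becomes "%20".
console.log(encodeURIComponent("hello world")); // hello%20world

// URLSearchParams accepts both forms when parsing...
console.log(new URLSearchParams("q=hello+world").get("q"));   // hello world
console.log(new URLSearchParams("q=hello%20world").get("q")); // hello world

// ...but decodeURIComponent leaves "+" alone.
console.log(decodeURIComponent("hello+world")); // hello+world

// Hypothetical helper: decode a form-encoded value by hand.
function decodeFormValue(value) {
  return decodeURIComponent(value.replace(/\+/g, " "));
}
console.log(decodeFormValue("hello+world+%2B+more")); // hello world + more
```

The replacement of + must happen before percent-decoding, otherwise a genuine plus sign encoded as %2B would be turned into a space as well.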

How to Decode URL Encoding

decodeURIComponent('hello%20world%20%26%20more');
// → 'hello world & more'

decodeURIComponent('C%2B%2B%20guide');
// → 'C++ guide'
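Input from users or third parties is not always well formed, and `decodeURIComponent` throws a `URIError` when it meets an invalid escape sequence. A defensive wrapper might look like the sketch below; the name `safeDecode` and the choice of `null` as the fallback are assumptions made for this example.

```javascript
// Hypothetical wrapper: return a fallback instead of throwing on malformed input.
function safeDecode(value, fallback = null) {
  try {
    return decodeURIComponent(value);
  } catch (err) {
    if (err instanceof URIError) {
      return fallback; // e.g. a lone "%" or a truncated UTF-8 sequence
    }
    throw err; // anything else is unexpected, so let it propagate
  }
}

console.log(safeDecode("C%2B%2B%20guide"));    // C++ guide
console.log(safeDecode("%E4%B8%AD%E6%96%87")); // 中文
console.log(safeDecode("100%"));               // null
console.log(safeDecode("%E4%B8"));             // null (incomplete UTF-8 sequence)
```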

In Python:

from urllib.parse import unquote
unquote('hello%20world%20%26%20more')  # → 'hello world & more'

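FAQ

Q: Is URL encoding the same as HTML encoding? No. HTML encoding replaces characters such as < with &lt; to prevent XSS in an HTML context. URL encoding uses the %XX format. The two serve different purposes.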

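Q: Should I encode the entire URL before making a fetch request? No. Build the URL safely with URL and URLSearchParams:

const url = new URL('https://example.com/search');
url.searchParams.set('q', 'C++ & Java');
url.searchParams.set('lang', 'en');
fetch(url.toString()); // encoding is handled automatically
// url.toString() → 'https://example.com/search?q=C%2B%2B+%26+Java&lang=en'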

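Q: Why does my API return 400 when the URL contains a space? A raw space is not valid in a URL. The URI syntax (RFC 3986) requires it to be percent-encoded as %20. Use encodeURIComponent for query values.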

