Why HTML Encoding Matters
HTML reserves specific characters for its syntax: < and > delimit tags, & starts entities, and " and ' quote attribute values. When these characters appear as content rather than as HTML structure, they must be encoded as HTML entities so browsers do not misinterpret them as markup.
Unencoded special characters cause two major problems:
- Broken HTML: The page renders incorrectly as browsers parse characters as markup
- XSS vulnerabilities: User input displayed without encoding can inject scripts that the browser then executes
HTML Entity Syntax
HTML entities can be expressed in three forms:
Named Entities
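The most readable form, using standard names:
- `&amp;` → &
- `&lt;` → <
- `&gt;` → >
- `&quot;` → "
- `&apos;` → '
- `&nbsp;` → non-breaking space
- `&copy;` → ©
- `&reg;` → ®
- `&trade;` → ™
- `&euro;` → €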
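Python's standard library can show what a named entity stands for. The sketch below uses `html.unescape` and the `html.entities.html5` table, both of which ship with Python 3. The sample string is only illustrative.

```python
import html
from html.entities import html5

# html.unescape() replaces named (and numeric) references with their characters
sample = "&lt;b&gt; &copy; Example &amp; &euro;5&trade;"
decoded = html.unescape(sample)
print(decoded)

# html5 maps each entity name, including its trailing ';', to its character(s)
for name in ["amp;", "lt;", "gt;", "quot;", "apos;", "nbsp;", "copy;", "reg;", "trade;", "euro;"]:
    print(f"&{name} -> {html5[name]!r}")
```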
Decimal Numeric References
Using the Unicode code point in decimal:
- `&#60;` → < (60 decimal)
- `&#62;` → > (62 decimal)
- `&#169;` → © (169 decimal)
- `&#8364;` → € (8364 decimal)
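A decimal reference is just the character's code point wrapped in `&#` and `;`. Python's `ord()` returns that code point. The sketch below builds decimal references and then round-trips them through `html.unescape`. The helper `to_decimal_refs` is hypothetical, and so is its rule of encoding the five special characters plus everything outside ASCII.

```python
import html

def to_decimal_refs(text: str) -> str:
    """Hypothetical helper: replace the five HTML-special characters and
    every non-ASCII character with a decimal reference (&#NNN;)."""
    special = set("&<>\"'")
    out = []
    for ch in text:
        if ch in special or ord(ch) > 127:
            out.append(f"&#{ord(ch)};")  # ord() returns the Unicode code point
        else:
            out.append(ch)
    return "".join(out)

encoded = to_decimal_refs("5 < 6 \u00a9 \u20ac")
print(encoded)                 # 5 &#60; 6 &#169; &#8364;
print(html.unescape(encoded))  # decodes back to the original string
```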
Hexadecimal Numeric References
Using the Unicode code point in hexadecimal (prefixed with x):
- `&#x3C;` → < (0x3C hex)
- `&#x3E;` → > (0x3E hex)
- `&#xA9;` → © (0xA9 hex)
- `&#x20AC;` → € (0x20AC hex)
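Hex and decimal references name the same code point in different bases, so they decode to the same character. The sketch below formats a character as a hex reference and converts that reference to decimal. It then checks with `html.unescape` that both forms agree. The helper names are hypothetical.

```python
import html
import re

def to_hex_ref(ch: str) -> str:
    """Hypothetical helper: format one character as a hex reference, e.g. '<' -> '&#x3C;'."""
    return f"&#x{ord(ch):X};"

def hex_ref_to_decimal_ref(ref: str) -> str:
    """Hypothetical helper: convert '&#xHH;' (x or X) to the decimal form '&#NNN;'."""
    m = re.fullmatch(r"&#[xX]([0-9A-Fa-f]+);", ref)
    if m is None:
        raise ValueError(f"not a hex reference: {ref!r}")
    return f"&#{int(m.group(1), 16)};"  # parse the digits as base 16

for ch in "<>\u00a9\u20ac":
    hex_ref = to_hex_ref(ch)
    dec_ref = hex_ref_to_decimal_ref(hex_ref)
    # Both forms decode to the same character
    assert html.unescape(hex_ref) == html.unescape(dec_ref) == ch
    print(ch, hex_ref, dec_ref)
```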
Essential HTML Entities to Know
The Security-Critical Five
These five characters MUST be encoded when displaying user-generated content:
| Character | Entity | When to Encode |
|---|---|---|
| `&` | `&amp;` | Always (it begins entity syntax) |
| `<` | `&lt;` | In text content and attribute values |
| `>` | `&gt;` | In text content |
| `"` | `&quot;` | In attribute values (double-quoted) |
| `'` | `&#39;` or `&apos;` | In attribute values (single-quoted) |
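Python's `html.escape` reflects the contexts in the table. With `quote=False` it encodes only &, < and >, which is enough for text content. With `quote=True` (the default) it also encodes " and ', as attribute values require. Note that Python emits `&#x27;` for the apostrophe, which is the hex form of `&#39;`. The input string below is illustrative.

```python
import html

user_input = 'Tom & "Jerry" <b>it\'s</b>'

# Text content: &, < and > are encoded; quotes are left as-is
text_safe = html.escape(user_input, quote=False)

# Attribute values: quote=True (the default) also encodes " and '
attr_safe = html.escape(user_input, quote=True)

print(f"<p>{text_safe}</p>")
print(f'<input value="{attr_safe}">')
```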
Typography Entities
Common typographic characters:
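- `&mdash;` → — (em dash, longer)
- `&ndash;` → – (en dash, shorter)
- `&lsquo;` `&rsquo;` → ‘ ’ (curly single quotes)
- `&ldquo;` `&rdquo;` → “ ” (curly double quotes)
- `&hellip;` → … (ellipsis)
- `&bull;` → • (bullet)
- `&middot;` → · (middle dot)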
Mathematical and Scientific
- `&times;` → × (multiplication sign)
- `&divide;` → ÷ (division sign)
- `&plusmn;` → ± (plus-minus)
- `&deg;` → ° (degree symbol)
- `&infin;` → ∞ (infinity)
- `&sum;` → ∑ (summation)
- `&pi;` → π (pi)
Arrows
- `&larr;` → ← (left arrow)
- `&rarr;` → → (right arrow)
- `&uarr;` → ↑ (up arrow)
- `&darr;` → ↓ (down arrow)
- `&harr;` → ↔ (left-right arrow)
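Going the other way, from a character to its entity name, is a table lookup. This sketch uses `html.entities.codepoint2name` from Python's standard library. That table covers the HTML 4 entity set, which includes every name listed in these typography, math, and arrow sections. Characters without a name pass through unchanged. The helper `to_named_refs` is hypothetical.

```python
from html.entities import codepoint2name

def to_named_refs(text: str) -> str:
    """Hypothetical helper: replace characters that have an HTML 4 entity
    name with &name;, leaving all other characters unchanged."""
    out = []
    for ch in text:
        name = codepoint2name.get(ord(ch))  # None if the character has no name
        out.append(f"&{name};" if name else ch)
    return "".join(out)

print(to_named_refs("2 \u00d7 3 \u2192 6\u2026"))  # 2 &times; 3 &rarr; 6&hellip;
```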
XSS Prevention: HTML Encoding
Cross-Site Scripting (XSS) occurs when user input is displayed in HTML without encoding:
<!-- Vulnerable: User input displayed directly -->
Hello, <?= $_GET['name'] ?>
<!-- If name = "><script>alert('xss')</script>< it becomes: -->
Hello, "><script>alert('xss')</script><
<!-- Safe: Encoded output -->
Hello, <?= htmlspecialchars($_GET['name'], ENT_QUOTES, 'UTF-8') ?>
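Encode every place where user-controlled data is written into HTML, not just the obvious display locations. Encoding must happen at the point of output, not at input time. The correct encoding depends on the output context (text, attribute value, URL, or script), and data encoded at input time is easily encoded a second time on output.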
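The sketch below shows what output-time encoding looks like and why encoding at input time goes wrong. `render_greeting` is a hypothetical template helper. It escapes the value at the moment it is placed into HTML text. If a value that was already escaped at input is passed in, it gets double-encoded, and the reader sees a literal `&amp;`.

```python
import html

def render_greeting(name: str) -> str:
    """Hypothetical template helper: escape at the point of output."""
    return f"<p>Hello, {html.escape(name)}</p>"

raw = "<script>alert('xss')</script>"
print(render_greeting(raw))  # the script tag is rendered inert

# Pitfall: a value escaped at input time is escaped again here,
# so the browser displays "Fish &amp; Chips" instead of "Fish & Chips".
pre_escaped = html.escape("Fish & Chips")
print(render_greeting(pre_escaped))
```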
Encoding in Different Languages
JavaScript
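// Escape the five HTML-special characters.
// '&' must be replaced first; otherwise the '&' in entities
// produced by the later replacements would be encoded again.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}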
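Why `escapeHtml` replaces & first can be seen in a short Python port. Both function names here are illustrative. Running the same replacements with & last re-encodes the entities produced by the earlier steps.

```python
def escape_html(text: str) -> str:
    """Python port of escapeHtml: '&' is replaced first."""
    return (text.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace('"', "&quot;")
                .replace("'", "&#39;"))

def escape_html_wrong_order(text: str) -> str:
    """Same replacements with '&' last: it corrupts the entities already emitted."""
    return (text.replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace('"', "&quot;")
                .replace("'", "&#39;")
                .replace("&", "&amp;"))

print(escape_html("<a>"))              # &lt;a&gt;
print(escape_html_wrong_order("<a>"))  # &amp;lt;a&amp;gt;
```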
Python
import html
safe = html.escape('<script>alert("xss")</script>')
# &lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;
PHP
$safe = htmlspecialchars($input, ENT_QUOTES | ENT_HTML5, 'UTF-8');
Using the HTML Entities Tool
With our tool you can:
- Encode text: convert special characters to HTML entities
- Decode entities: convert HTML entities back to characters
- Choose a format: output named, decimal, or hex entities
- Browse the character reference: look up all named HTML entities
- Copy output: copy encoded or decoded text with one click
It is useful for web developers who work with user-generated content, build template systems, or test XSS prevention.