
HTML Entities: Encode Special Characters for Safe Web Display

Learn which HTML characters need encoding, common entity names, and how to prevent XSS with proper encoding.

Why HTML Encoding Matters

HTML uses specific characters for its syntax: < and > for tags, & for entities, " for attribute values, and ' for alternate attribute quoting. When these characters appear in content (not as HTML structure), they must be encoded as HTML entities to prevent browsers from misinterpreting them as HTML syntax.

Unencoded special characters cause two major problems:

  1. Broken HTML: The page renders incorrectly because browsers parse the characters as markup
  2. XSS vulnerabilities: User input displayed without encoding can inject and execute malicious scripts

HTML Entity Syntax

HTML entities can be expressed in three forms:

Named Entities

The most readable form using standard names:

  • &amp; → &
  • &lt; → <
  • &gt; → >
  • &quot; → "
  • &apos; → ' (defined in HTML5 and XML, but not in HTML 4)
  • &nbsp; → non-breaking space
  • &copy; → ©
  • &reg; → ®
  • &trade; → ™
  • &euro; → €
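
One way to check what a named entity resolves to is Python's standard html module. The snippet below is only a sketch; the list of names is an arbitrary sample taken from the entities above.

```python
import html
from html.entities import name2codepoint

# Resolve a few of the named entities listed above to their characters.
for name in ["amp", "lt", "gt", "quot", "apos", "nbsp", "copy", "euro"]:
    entity = f"&{name};"
    char = html.unescape(entity)
    print(f"{entity} -> {char!r} (U+{ord(char):04X})")

# name2codepoint maps an HTML 4 entity name (without & and ;) to its code point.
print(name2codepoint["copy"])
```

Note that html.unescape follows the HTML5 entity table, so it also understands &apos;.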

Decimal Numeric References

Using the Unicode code point in decimal:

  • &#60; → < (60 decimal)
  • &#62; → > (62 decimal)
  • &#169; → © (169 decimal)
  • &#8364; → € (8364 decimal)

Hexadecimal Numeric References

Using the Unicode code point in hexadecimal (prefixed with x):

  • &#x3C; → < (0x3C hex)
  • &#x3E; → > (0x3E hex)
  • &#xA9; → © (0xA9 hex)
  • &#x20AC; → € (0x20AC hex)
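
Both numeric forms are derived directly from the character's code point. As a rough illustration (numeric_references is a hypothetical helper name, not a standard function), they could be produced and checked like this:

```python
import html

def numeric_references(char: str) -> tuple[str, str]:
    """Return the decimal and hexadecimal references for one character."""
    code_point = ord(char)
    return f"&#{code_point};", f"&#x{code_point:X};"

for ch in "<>©€":
    dec, hex_ref = numeric_references(ch)
    # Both forms decode back to the same character.
    assert html.unescape(dec) == html.unescape(hex_ref) == ch
    print(ch, dec, hex_ref)
```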

Essential HTML Entities to Know

The Security-Critical Five

These five characters MUST be encoded when displaying user-generated content:

Character   Entity            When to Encode
&           &amp;             Always (start of entity syntax)
<           &lt;              In text content and attribute values
>           &gt;              In text content
"           &quot;            In attribute values (double-quoted)
'           &#39; or &apos;   In attribute values (single-quoted)
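
The difference between text content and attribute values can be seen with the quote parameter of Python's html.escape. This is a small sketch; the sample string and the surrounding markup are made up for illustration.

```python
import html

user_value = 'Tom & "Jerry" <b>'

# Text content: &, < and > are the characters that matter.
as_text = html.escape(user_value, quote=False)

# Attribute value: quotes must be encoded too (quote=True is the default).
as_attr = html.escape(user_value, quote=True)

print(f"<p>{as_text}</p>")
print(f'<input value="{as_attr}">')
```

Encoding all five characters everywhere is the simpler and safer default; skipping the quotes only makes sense when the value is known to land in text content.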

Typography Entities

Common typographic characters:

  • &mdash; → — (em dash, longer)
  • &ndash; → – (en dash, shorter)
  • &lsquo; &rsquo; → ‘ ’ (curly single quotes)
  • &ldquo; &rdquo; → “ ” (curly double quotes)
  • &hellip; → … (ellipsis)
  • &bull; → • (bullet)
  • &middot; → · (middle dot)

Mathematical and Scientific

  • &times; → × (multiplication sign)
  • &divide; → ÷ (division sign)
  • &plusmn; → ± (plus-minus)
  • &deg; → ° (degree symbol)
  • &infin; → ∞ (infinity)
  • &sum; → ∑ (summation)
  • &pi; → π (pi)

Arrows

  • &larr; → ← (left arrow)
  • &rarr; → → (right arrow)
  • &uarr; → ↑ (up arrow)
  • &darr; → ↓ (down arrow)
  • &harr; → ↔ (left-right arrow)

XSS Prevention: HTML Encoding

Cross-Site Scripting (XSS) occurs when user input is displayed in HTML without encoding:

<!-- Vulnerable: User input displayed directly -->
Hello, <?= $_GET['name'] ?>

<!-- If name = "><script>alert('xss')</script>< it becomes: -->
Hello, "><script>alert('xss')</script><

<!-- Safe: Encoded output -->
Hello, <?= htmlspecialchars($_GET['name'], ENT_QUOTES, 'UTF-8') ?>

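Encode user data everywhere it reaches the output, not only in the obvious display locations. Encoding must happen at the point of output rather than at input time, because the correct encoding depends on the context (text content, attribute value, URL, script) in which the value ends up.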
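
The same idea can be sketched in Python. Here render_greeting is a hypothetical function standing in for a template, and the payload is the one from the example above; the value is escaped at the moment it is written into the HTML.

```python
import html

def render_greeting(name: str) -> str:
    """Build the greeting, escaping the untrusted value as it is written out."""
    return f"Hello, {html.escape(name)}"

payload = "\"><script>alert('xss')</script><"
print(render_greeting(payload))
```

The browser then shows the payload as literal text instead of running it.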

Encoding in Different Languages

JavaScript

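// Escape the five security-critical characters for HTML output.
function escapeHtml(text) {
  return String(text)            // coerce non-string input instead of throwing
    .replace(/&/g, '&amp;')      // must run first so later entities are not double-encoded
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#039;');
}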

Python

import html
safe = html.escape('<script>alert("xss")</script>')
# &lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;

PHP

$safe = htmlspecialchars($input, ENT_QUOTES | ENT_HTML5, 'UTF-8');

Using the HTML Entities Tool

Our tool offers five features:

  1. Encode text — convert special characters to HTML entities
  2. Decode entities — convert HTML entities back to characters
  3. Multiple formats — choose named, decimal, or hex entities
  4. Character reference — browse all named HTML entities
  5. Copy output — one-click copy of encoded/decoded text

The tool is useful for web developers who work with user-generated content, build template systems, or test XSS prevention.
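
To give a feel for what encoding in the three formats involves, here is a minimal Python sketch. It is not the tool's actual implementation: encode_entities and its fmt values are hypothetical names, and the named format relies on Python's HTML 4 name table, so characters without a name there (including ') fall back to decimal.

```python
import html
from html.entities import codepoint2name

def encode_entities(text: str, fmt: str = "named") -> str:
    """Encode the five special characters and all non-ASCII characters.

    fmt is "named", "decimal", or "hex" (illustrative option names).
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if ch not in "&<>\"'" and cp < 128:
            out.append(ch)                      # plain ASCII passes through
        elif fmt == "named" and cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")
        elif fmt == "hex":
            out.append(f"&#x{cp:X};")
        else:
            out.append(f"&#{cp};")              # decimal, also the fallback
    return "".join(out)

sample = "© 2024 <Tom & Co>"
for fmt in ("named", "decimal", "hex"):
    encoded = encode_entities(sample, fmt)
    # Decoding with html.unescape restores the original text.
    assert html.unescape(encoded) == sample
    print(fmt, encoded)
```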