Understanding HTML entities
The four characters HTML can't ignore.
Why <, >, & and " can't appear raw — and how to write them when they have to.
What an entity is.
HTML borrows a small piece of syntax from SGML: the entity reference. It begins with an ampersand, ends with a semicolon, and stands in for a character the parser would otherwise read as markup. &lt; means "less-than sign". The parser swallows the four characters and emits one. The mapping is fixed by the HTML specification, so authors and browsers agree on it.
&name; ⇒ one Unicode character
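As a rough illustration of that mapping, Python's standard html module can stand in for the parser. This is only a sketch: html.unescape approximates what a browser does with a reference, and the variable names are made up for the example.

```python
import html

reference = "&lt;"                  # four characters of source text
decoded = html.unescape(reference)  # what the parser hands on to the page

# Four characters go in, a single character comes out.
print(len(reference), len(decoded), repr(decoded))
```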
The four characters that always need escaping.
In ordinary HTML body text, only three characters break parsing: < (would start a tag), > (in some contexts ends one), and & (would start another entity). Inside a double-quoted attribute value you also need to escape ". Everything else can sit raw, including emoji, accented letters and arrows — provided the page is served as UTF-8, which it should be.
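The two rule sets above can be sketched with Python's html.escape, whose quote flag switches between them. The helper names escape_text and escape_attr are hypothetical, chosen for this example. Note that html.escape with quote=True also escapes the single quote, which is slightly more than a double-quoted attribute strictly needs.

```python
import html

def escape_text(s: str) -> str:
    """Escape for HTML body text: only &, < and > are replaced."""
    return html.escape(s, quote=False)

def escape_attr(s: str) -> str:
    """Escape for a quoted attribute value: quotes are replaced as well."""
    return html.escape(s, quote=True)

sample = 'a < b && "c" > d'
print(escape_text(sample))  # double quotes stay raw
print(escape_attr(sample))  # double quotes become &quot;
```

Everything outside those characters, accented letters and emoji included, passes through both helpers unchanged.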
Named, decimal, hex.
HTML lets you write the same character three ways. Named entities like &copy; are mnemonic and short, but you have to know the name. Numeric forms work for any codepoint: &#169; is decimal, &#xA9; is hex. The numeric forms are useful when you don't trust the editor (or yourself) to type a character cleanly: they're pure ASCII, so they survive pipelines that mangle anything else.
Three ways to write ©
Named: &copy; Decimal: &#169; Hex: &#xA9;
Pick the form that's clearest in context. Named is most readable; numeric is most portable.
copy ↔ 169 ↔ 0xA9 → ©
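One way to check that the three spellings really are the same character is to build the numeric forms from the codepoint and decode all three. This is a small sketch in Python; numeric_forms is a hypothetical helper, and html.unescape again stands in for the HTML parser.

```python
import html

def numeric_forms(ch: str) -> tuple[str, str]:
    """Return the decimal and hex character references for one character."""
    code = ord(ch)
    return f"&#{code};", f"&#x{code:X};"

decimal, hexadecimal = numeric_forms("©")
forms = ["&copy;", decimal, hexadecimal]

# Every spelling decodes to the same single character.
decoded = [html.unescape(form) for form in forms]
print(forms, decoded)
```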
Entities are also for safety.
A common bug pattern: take user input, slap it into HTML unescaped, and trust everyone to behave. They won't. Escaping the four characters above when you render user content into HTML text or quoted attribute values is what blocks cross-site scripting: once < becomes &lt;, no <script> tag can sneak through. Frameworks like React do this for you on every text node, which is why JSX feels safer than string concatenation.
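A minimal sketch of that escaping step, assuming the user's text lands in element content. render_comment is a hypothetical helper, not part of any framework, and real templating engines typically do this automatically.

```python
import html

def render_comment(user_text: str) -> str:
    """Wrap untrusted text in a paragraph, escaping it first."""
    return f"<p>{html.escape(user_text)}</p>"

hostile = '<script>alert("x")</script>'
safe = render_comment(hostile)

# The angle brackets arrive as entity references, so the browser
# displays the text instead of executing a script element.
print(safe)
```

This covers text and quoted attribute values only. Other contexts, such as inline scripts or URLs, have their own escaping rules.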
Where you don't need them.
You don't need entities for accented letters, currency symbols, arrows, dashes or emoji — the page's character encoding handles those. Replacing every accented letter with its named entity is a relic of the Latin-1 era; modern HTML5 in UTF-8 just stores the codepoint. Use entities only for the characters that would otherwise be parsed as markup.
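To make the point concrete, here is a short Python sketch. The sample string is invented for the example. Escaping leaves the non-ASCII characters alone, and UTF-8 carries them as they are. The last step shows the ASCII-only fallback, using Python's built-in xmlcharrefreplace error handler, for the rare pipeline that cannot carry UTF-8.

```python
import html

text = "café → 5 € 🎉"

# No markup characters here, so escaping changes nothing.
unchanged = html.escape(text) == text

# UTF-8 stores every codepoint directly and round-trips cleanly.
round_trip = text.encode("utf-8").decode("utf-8")

# Fallback for an ASCII-only pipeline: numeric references for non-ASCII.
ascii_only = text.encode("ascii", "xmlcharrefreplace").decode("ascii")
print(ascii_only)
```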
Attribute values follow slightly different rules.
Inside an attribute value, the parser is hunting for the closing quote. If you wrote title="He said "hi"" without escaping, the parser would think the second " ends the value. So &quot; exists for double-quoted attributes, and &#39; (or the named &apos;) does the same job for single-quoted ones. The body-text rule of "escape three characters" widens by one whenever you're inside quotes.
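The title example can be checked end to end with a short Python sketch: escape the value, place it in a double-quoted attribute, then parse the markup back. title_attribute and AttrCollector are hypothetical names, and Python's html.parser is only an approximation of a browser's parser.

```python
import html
from html.parser import HTMLParser

def title_attribute(value: str) -> str:
    """Build a double-quoted title attribute from untrusted text."""
    return f'title="{html.escape(value, quote=True)}"'

class AttrCollector(HTMLParser):
    """Record the attributes of every start tag it sees."""
    def __init__(self) -> None:
        super().__init__()
        self.attrs: dict = {}

    def handle_starttag(self, tag, attrs):
        self.attrs.update(attrs)

value = 'He said "hi"'
markup = f"<span {title_attribute(value)}>note</span>"

collector = AttrCollector()
collector.feed(markup)
print(markup)
print(collector.attrs["title"])  # the parser restores the original quotes
```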