ASCII / Unicode Converter

Inspect every code point of any string.

Runs in your browser
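| Glyph | Decimal | Hex | Octal | Binary | HTML | JS | Range |
| --- | --- | --- | --- | --- | --- | --- | --- |
| H | 72 | U+0048 | 110 | 01001000 | H | \u0048 | ASCII |
| e | 101 | U+0065 | 145 | 01100101 | e | \u0065 | ASCII |
| l | 108 | U+006C | 154 | 01101100 | l | \u006C | ASCII |
| l | 108 | U+006C | 154 | 01101100 | l | \u006C | ASCII |
| o | 111 | U+006F | 157 | 01101111 | o | \u006F | ASCII |
| ! | 33 | U+0021 | 41 | 00100001 | ! | \u0021 | ASCII |
| (space) | 32 | U+0020 | 40 | 00100000 | (space) | \u0020 | space |
| 👋 | 128075 | U+1F44B | 372113 | 11111010001001011 | 👋 | \u{1F44B} | Supplementary |
| (space) | 32 | U+0020 | 40 | 00100000 | (space) | \u0020 | space |
| こ | 12371 | U+3053 | 30123 | 11000001010011 | こ | \u3053 | BMP |
| ん | 12435 | U+3093 | 30223 | 11000010010011 | ん | \u3093 | BMP |
| に | 12395 | U+306B | 30153 | 11000001101011 | に | \u306B | BMP |
| ち | 12385 | U+3061 | 30141 | 11000001100001 | ち | \u3061 | BMP |
| は | 12399 | U+306F | 30157 | 11000001101111 | は | \u306F | BMP |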

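Each row is one code point. Code points above U+FFFF, which include most emoji and the rarer CJK ideographs, need five or six hex digits and four bytes in UTF-8; the Japanese characters in this sample sit in the BMP and fit in four hex digits.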
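A row like the ones in the table can be derived with a few built-in string methods. The sketch below is not the tool's actual source; the function name `inspectCodePoints` and the field names are made up for illustration, and the range labels are simplified to three buckets (the table's separate "space" label is not reproduced).

```javascript
// Hypothetical helper: name and field names are illustrative, not the tool's API.
function inspectCodePoints(text) {
  // Array.from walks the string by code point, so a surrogate pair stays together.
  return Array.from(text, (glyph) => {
    const cp = glyph.codePointAt(0);
    const hex = cp.toString(16).toUpperCase();
    return {
      glyph,
      decimal: cp,
      hex: "U+" + hex.padStart(4, "0"),
      octal: cp.toString(8),
      // Pad to at least 8 binary digits, matching the ASCII rows in the table.
      binary: cp.toString(2).padStart(8, "0"),
      // Code points above U+FFFF need the braced \u{...} escape form.
      js: cp > 0xffff ? `\\u{${hex}}` : "\\u" + hex.padStart(4, "0"),
      range: cp < 0x80 ? "ASCII" : cp <= 0xffff ? "BMP" : "Supplementary",
    };
  });
}

// "H", waving hand, hiragana "ko"
console.log(inspectCodePoints("H\u{1F44B}\u3053"));
```

Iterating with `Array.from` rather than indexing by position is the important choice here: indexing would split the waving hand into two surrogate halves.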

Understanding code points

A number, a glyph, an encoding.

ASCII, Unicode and UTF-8 keep getting confused for the same thing. They're three distinct layers, and naming them correctly is half the trick.

Three different layers.

A character set is a numbered list of abstract characters — "the letter A is number 65". A code-point space is the range of all valid numbers in that set. An encoding is the rule for turning each number into bytes. ASCII is a tiny character set that happens to have a trivial encoding. Unicode is a vast character set that needs a real encoding — UTF-8 is by far the most common one. People say "Unicode" when they mean "UTF-8" and vice versa, but the distinction matters as soon as something breaks.

character → code point → bytes
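The three layers can be seen separately in a few lines of JavaScript. This is a minimal sketch; `layersOf` is a hypothetical name, and it assumes an environment with the standard `TextEncoder` (modern browsers and Node.js).

```javascript
// Hypothetical helper showing character -> code point -> bytes for one character.
function layersOf(character) {
  const codePoint = character.codePointAt(0);            // the number in the character set
  const utf8Bytes = Array.from(new TextEncoder().encode(character)); // one encoding of that number
  return { character, codePoint, utf8Bytes };
}

const layers = layersOf("\u00E9"); // é
console.log(layers.codePoint);  // 233 (0xE9)
console.log(layers.utf8Bytes);  // [195, 169] (0xC3 0xA9)
```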

ASCII — 128 characters, simple maths.

ASCII (first published in 1963, revised into essentially its modern form in 1967) defines 128 characters in seven bits. Codes 0–31 plus 127 are control characters such as tab, newline, bell and escape. Codes 32–126 are printable, from space through tilde. The encoding is "one byte per character, top bit always zero". Every modern computer can decode ASCII without a thought. Unicode keeps ASCII as its first 128 code points, and UTF-8 encodes them as the same single bytes, so your old plaintext files survive unchanged as UTF-8. (UTF-16 and UTF-32 keep the numbers but not the one-byte layout.)
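The ranges above translate directly into code. A small sketch, with hypothetical function names `classifyAscii` and `isAsciiOnly`:

```javascript
// Classify a numeric code using the ASCII ranges described above.
function classifyAscii(code) {
  if (!Number.isInteger(code) || code < 0 || code > 127) return "not ASCII";
  if (code <= 31 || code === 127) return "control";
  return "printable"; // 32 (space) through 126 (tilde)
}

// True when every UTF-16 code unit in the string is in the 7-bit range.
function isAsciiOnly(text) {
  return /^[\x00-\x7F]*$/.test(text);
}

console.log(classifyAscii(9));        // "control" (tab)
console.log(classifyAscii(65));       // "printable" (A)
console.log(isAsciiOnly("Hello!"));   // true
console.log(isAsciiOnly("caf\u00E9")); // false
```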

Unicode — 149,000 characters and counting.

Unicode's job is to give every character anyone uses, anywhere, a unique number called a code point. The space runs from U+0000 to U+10FFFF, a little over 1.1 million slots, of which around 149,000 were assigned as of Unicode 15. That covers every modern script, historical scripts like Phoenician and Egyptian hieroglyphs, mathematical symbols, technical notation, and the entire emoji ecosystem. Code points are written U+0041 (capital A), U+1F98A (fox face), U+2603 (snowman). The Unicode Consortium typically publishes a new version about once a year.

UTF-8 — the bytes that carry Unicode.

UTF-8 is the encoding that turns Unicode code points into byte sequences. Its design is ingenious: every ASCII character takes one byte; every other character takes two, three or four bytes; you can tell from looking at any byte whether it's the start of a new character or a continuation byte. The result is backwards-compatible with ASCII, self-synchronising (lose a byte and you only lose one character), and reasonably space-efficient for languages that use the Latin alphabet.
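The "tell from any byte" property comes down to the top bits of each byte. The sketch below classifies bytes by bit pattern only; `utf8ByteKind` is a hypothetical name, and it deliberately does not reject lead bytes that a strict decoder would refuse (such as 0xC0 or 0xF5 and above).

```javascript
// Classify a single byte by its leading bits (pattern check only, not full validation).
function utf8ByteKind(byte) {
  if (byte <= 0x7f) return "single";                  // 0xxxxxxx: ASCII
  if ((byte & 0xc0) === 0x80) return "continuation";  // 10xxxxxx
  if ((byte & 0xe0) === 0xc0) return "lead-2";        // 110xxxxx
  if ((byte & 0xf0) === 0xe0) return "lead-3";        // 1110xxxx
  if ((byte & 0xf8) === 0xf0) return "lead-4";        // 11110xxx
  return "invalid";                                   // 11111xxx never appears in UTF-8
}

// "a", é, fox face
const kinds = Array.from(new TextEncoder().encode("a\u00E9\u{1F98A}"), (b) => utf8ByteKind(b));
console.log(kinds);
// ["single", "lead-2", "continuation", "lead-4", "continuation", "continuation", "continuation"]
```

A decoder that lands in the middle of a stream can skip "continuation" bytes until it reaches the next lead or single byte, which is what self-synchronising means in practice.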

A worked decoding.

The character "é" has code point U+00E9. UTF-8 encodes U+0080 through U+07FF as two bytes: 110xxxxx 10xxxxxx, where the x's hold the eleven bits of the code point. U+00E9 in eleven bits is 00011 101001 — pack into the template and you get 11000011 10101001, or 0xC3 0xA9. Run that through any UTF-8 decoder and you get "é" back.

é (U+00E9)

Two-byte UTF-8: 110xxxxx 10xxxxxx (11 bits)

Pack the 11-bit code point into the template's x positions.

0x00E9 = 000 1110 1001 (11 bits) = 00011 101001 → 11000011 10101001

= 0xC3 0xA9

🦊 (U+1F98A)

Four-byte UTF-8: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx (21 bits)

Most emoji live above U+FFFF, and every code point up there costs four UTF-8 bytes.

0x1F98A → 11110000 10011111 10100110 10001010

= 0xF0 0x9F 0xA6 0x8A
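Both worked examples can be reproduced with shifts and masks. The following is a hand-rolled sketch for illustration (the name `encodeUtf8` is hypothetical); in real code the built-in `TextEncoder` does the same job.

```javascript
// Encode one Unicode code point as an array of UTF-8 bytes.
function encodeUtf8(cp) {
  const isSurrogate = cp >= 0xd800 && cp <= 0xdfff;
  if (!Number.isInteger(cp) || cp < 0 || cp > 0x10ffff || isSurrogate) {
    throw new RangeError("not a Unicode scalar value");
  }
  if (cp <= 0x7f) return [cp];                                   // 0xxxxxxx
  if (cp <= 0x7ff) return [0xc0 | (cp >> 6), 0x80 | (cp & 0x3f)]; // 110xxxxx 10xxxxxx
  if (cp <= 0xffff) {
    return [0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f)];
  }
  return [
    0xf0 | (cp >> 18),
    0x80 | ((cp >> 12) & 0x3f),
    0x80 | ((cp >> 6) & 0x3f),
    0x80 | (cp & 0x3f),
  ];
}

const toHex = (bytes) => bytes.map((b) => b.toString(16).toUpperCase()).join(" ");
console.log(toHex(encodeUtf8(0x00e9)));  // "C3 A9"
console.log(toHex(encodeUtf8(0x1f98a))); // "F0 9F A6 8A"
```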

UTF-16, the historical mistake.

UTF-16 encodes every character in the Basic Multilingual Plane as one 16-bit code unit (two bytes) and uses "surrogate pairs" (two code units, four bytes) for everything outside it. Sixteen bits per character was the right answer in the early 1990s, when Unicode was thought to fit in 16 bits; once it didn't, surrogate pairs were the workaround, and it stuck. UTF-16 is still the native string encoding of JavaScript, Java, .NET and the Windows kernel, which means a JavaScript string's length property counts UTF-16 code units, not code points, and any emoji above U+FFFF counts as two. That's why a length of 2 can render as a single fox face.
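The surrogate arithmetic is short enough to write out. A sketch, with `toSurrogatePair` as a hypothetical helper name: subtract 0x10000, then split the remaining 20 bits into two 10-bit halves.

```javascript
// Split a supplementary code point into its UTF-16 high and low surrogates.
function toSurrogatePair(cp) {
  if (!Number.isInteger(cp) || cp < 0x10000 || cp > 0x10ffff) {
    throw new RangeError("only code points above U+FFFF need a surrogate pair");
  }
  const offset = cp - 0x10000;          // 20 bits remain
  const high = 0xd800 + (offset >> 10); // top 10 bits
  const low = 0xdc00 + (offset & 0x3ff); // bottom 10 bits
  return [high, low];
}

const fox = "\u{1F98A}";
console.log(fox.length);             // 2 (UTF-16 code units)
console.log(Array.from(fox).length); // 1 (code points)
console.log(toSurrogatePair(0x1f98a).map((u) => u.toString(16))); // ["d83e", "dd8a"]
```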

Grapheme clusters — what the user actually sees.

Even a "code point" isn't quite "what the user perceives as one character". A flag emoji is two code points (regional indicators). A skin-toned waving hand is two code points (the gesture plus a modifier). "é" can be one code point (U+00E9) or two (U+0065 e + U+0301 combining acute), and both render identically. The user-perceived unit is called a grapheme cluster, and counting them correctly requires the text-segmentation algorithm in Unicode Standard Annex #29 (UAX #29). Most everyday code doesn't do this; most everyday code is occasionally wrong about emoji.
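The three ways of counting can be compared side by side. This sketch assumes a runtime that provides `Intl.Segmenter` (recent browsers and Node.js 16 or later); `countUnits` is a hypothetical name.

```javascript
// Count the same string three ways: code units, code points, grapheme clusters.
function countUnits(text) {
  const segmenter = new Intl.Segmenter("en", { granularity: "grapheme" });
  return {
    codeUnits: text.length,
    codePoints: Array.from(text).length,
    graphemes: Array.from(segmenter.segment(text)).length,
  };
}

console.log(countUnits("\u{1F44B}\u{1F3FB}")); // waving hand + skin tone: 4, 2, 1
console.log(countUnits("\u{1F1EF}\u{1F1F5}")); // flag (two regional indicators): 4, 2, 1
console.log(countUnits("e\u0301"));            // e + combining acute: 2, 2, 1

// The two spellings of "é" compare equal only after normalisation.
console.log("e\u0301" === "\u00E9");                  // false
console.log("e\u0301".normalize("NFC") === "\u00E9"); // true
```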

Why this inspector helps.

When something character-related is going wrong — a string is two bytes longer than it looks, a "blank" field is full of zero-width joiners, a regex isn't matching what your eyes see — what you want is to look at each individual code point and its bytes. The inspector here lays each character out alongside its U+ code point, its Unicode name, and its UTF-8 byte sequence. Looking at the table will tell you what's really there.
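The same kind of check can be scripted when a field looks blank but is not. The sketch below flags control and format characters using Unicode property escapes; `findInvisible` is a hypothetical helper, not part of the tool, and it makes no attempt to catch every invisible character (whitespace such as U+00A0 is not flagged, for instance).

```javascript
// List control (Cc) and format (Cf) characters, e.g. NUL, CR, ZWSP, ZWJ, BOM.
function findInvisible(text) {
  const hits = [];
  let index = 0; // UTF-16 offset of the current code point
  for (const ch of text) {
    if (/[\p{Cc}\p{Cf}]/u.test(ch)) {
      const hex = ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0");
      hits.push({ index, codePoint: "U+" + hex });
    }
    index += ch.length;
  }
  return hits;
}

// "a", zero-width space, "b", zero-width joiner
console.log(findInvisible("a\u200Bb\u200D"));
// [{ index: 1, codePoint: "U+200B" }, { index: 3, codePoint: "U+200D" }]
```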

Frequently asked questions

Quick answers.

What is the difference between ASCII and Unicode?

ASCII is a 7-bit character set of 128 characters: English letters, digits, punctuation and control codes. Unicode is a universal standard that covers around 149,000 characters, including emoji and international scripts, each mapped to a specific code point.

Can this tool detect hidden characters?

Yes. It will display code points for non-printable characters like null bytes, carriage returns, and zero-width spaces that are normally invisible in text editors.

What output formats are supported?

The converter shows each code point in decimal, hexadecimal (U+ notation), octal and binary, along with its HTML and JavaScript escape forms. It also displays the official Unicode name where available for easier identification.

Is my text data secure?

Yes. The conversion logic is handled entirely by your browser's JavaScript engine. No text input is sent to a server or stored in any logs.
