Understanding Base64
Bytes, in printable ink.
What it is, why it exists, and why it makes things bigger by exactly one third.
What it is.
Base64 is an encoding: a way of writing arbitrary bytes using only sixty-four printable characters. The alphabet is the uppercase and lowercase Latin letters, the digits zero through nine, plus two extras (+ and / in the standard variant; - and _ in the URL-safe one). The equals sign is reserved for padding at the end.
A–Z · a–z · 0–9 · + · / (= for padding)
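As a small illustration of the alphabet described above, here is a sketch that builds both variants as plain strings, so that the index of each character is the six-bit value it stands for. The names STANDARD_ALPHABET and URLSAFE_ALPHABET are chosen for this example only.

```python
import string

# Index i in the string is the character used for six-bit value i.
STANDARD_ALPHABET = (
    string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
)
# The URL-safe variant keeps the first 62 characters and swaps the last two.
URLSAFE_ALPHABET = STANDARD_ALPHABET[:62] + "-_"

print(len(STANDARD_ALPHABET))        # 64
print(STANDARD_ALPHABET[18])         # S
print(STANDARD_ALPHABET.index("k"))  # 36
```

The padding character = is deliberately not part of either string: it never represents data.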
Why it exists.
The internet was, originally, a text-only place. Email, the web, the early protocols — all of them assumed seven-bit ASCII and would mangle anything outside it. Base64 lets you take a binary payload (an image, an encrypted blob, a binary protocol message) and squeeze it into a stream that those text systems will pass through unchanged. It survived because so much of our plumbing still expects text — JSON, URLs, HTML attributes, email attachments, JWT payloads.
The 4-to-3 ratio.
Sixty-four is two to the sixth, so each character carries six bits. Bytes come in eights. The least common multiple of six and eight is twenty-four bits: three bytes, four Base64 characters. So Base64 inflates a payload by one third, rounded up to the next multiple of four characters when padding is used: three bytes in becomes four characters out, three megabytes becomes four megabytes. No Base64 variant avoids this overhead; the alphabet's size dictates the math.
3 bytes ⇒ 4 characters (≈ +33%)
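The ratio can be checked with a little arithmetic. The sketch below uses two helper functions, padded_length and unpadded_length, which are names invented for this example, and compares them against Python's standard base64 module.

```python
import base64

def padded_length(n_bytes: int) -> int:
    """Characters in padded Base64 output: 4 per group of up to 3 bytes."""
    return 4 * ((n_bytes + 2) // 3)

def unpadded_length(n_bytes: int) -> int:
    """Characters once trailing '=' is dropped: ceil(8 * n_bytes / 6)."""
    return (8 * n_bytes + 5) // 6

for n in (0, 1, 2, 3, 5, 3_000_000):
    encoded = base64.b64encode(bytes(n))  # n zero bytes
    assert len(encoded) == padded_length(n)
    assert len(encoded.rstrip(b"=")) == unpadded_length(n)
    print(n, "bytes ->", len(encoded), "characters")
```

For inputs whose length is a multiple of three the growth is exactly one third; otherwise padding rounds the output up to the next multiple of four.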
A worked encoding.
Take the two-character string "Hi". Its bytes are 72 and 105, or 01001000 01101001 in binary. Slice that bit stream into six-bit groups: 010010 000110 1001. The last group has only four bits, so we pad it with two zero bits to make 100100, giving 010010 000110 100100. Look up each six-bit group in the Base64 alphabet (18 → S, 6 → G, 36 → k) and append a single = to stand in for the missing third byte. The result is SGk=.
"Hi" to Base64
ASCII: 72 105 → bits → 6-bit groups → alphabet
Two bytes is a partial group, so the output is padded with one '='.
"Hi" → SGk=
"Hello" to Base64
ASCII: 72 101 108 108 111 → bits → groups
Five bytes is one full triplet plus two; the output ends with '='.
"Hello" → SGVsbG8=
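The slicing steps from the worked example can be sketched in a few lines of Python. This is a teaching version that works on a string of '0' and '1' characters for readability, not how a production encoder is written; encode_base64 and ALPHABET are names invented for this example.

```python
import base64
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def encode_base64(data: bytes) -> str:
    # Write every byte as eight bits, e.g. b"Hi" -> "0100100001101001".
    bits = "".join(f"{byte:08b}" for byte in data)
    # Pad with zero bits so the stream splits evenly into six-bit groups.
    bits += "0" * (-len(bits) % 6)
    # Look up each six-bit group in the alphabet.
    chars = [ALPHABET[int(bits[i:i + 6], 2)] for i in range(0, len(bits), 6)]
    # Pad with '=' so the output length is a multiple of four characters.
    chars.append("=" * (-len(chars) % 4))
    return "".join(chars)

print(encode_base64(b"Hi"))     # SGk=
print(encode_base64(b"Hello"))  # SGVsbG8=

# Cross-check against the standard library.
assert encode_base64(b"Hello") == base64.b64encode(b"Hello").decode("ascii")
```

In practice the standard library's base64.b64encode does the same job with bit shifts instead of strings.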
URL-safe Base64.
The standard alphabet uses + and /, which both have meaning inside URLs and would have to be percent-escaped to survive. The URL-safe variant, defined in RFC 4648, swaps them for - and _ and often drops the = padding entirely. The underlying bytes are identical; only the alphabet differs.
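To see the two alphabets side by side, the sketch below encodes the same two bytes with Python's base64.b64encode and base64.urlsafe_b64encode. The input bytes are picked only because they happen to produce both + and /. The helper decode_unpadded is a hypothetical name for this example; it restores stripped padding, since Python's decoder expects it.

```python
import base64

raw = bytes([0xFB, 0xFF])  # chosen because it encodes to both '+' and '/'

standard = base64.b64encode(raw).decode("ascii")
url_safe = base64.urlsafe_b64encode(raw).decode("ascii")
print(standard)  # +/8=
print(url_safe)  # -_8=

def decode_unpadded(text: str) -> bytes:
    """Decode URL-safe Base64 whose trailing '=' may have been dropped."""
    padding = "=" * (-len(text) % 4)
    return base64.urlsafe_b64decode(text + padding)

unpadded = url_safe.rstrip("=")   # "-_8"
assert decode_unpadded(unpadded) == raw
assert base64.b64decode(standard) == raw
```

Note that urlsafe_b64encode keeps the padding; dropping it is a separate step that the receiving side has to undo.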
Base64 is not encryption.
A surprising number of bug reports and CVEs come from people treating Base64 as if it hid information. It does not. Anyone can decode a Base64 string with a single line of code, no key required. Base64 is a transport format. If you want secrecy, encrypt the bytes first, then Base64-encode the ciphertext.
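The "single line of code" is meant literally. A minimal illustration using Python's standard library and the string from the worked example:

```python
import base64

token = "SGVsbG8="
decoded = base64.b64decode(token)  # no key, no secret, just a table lookup
print(decoded)  # b'Hello'
```

Anything that can be decoded this way should be treated as plain text for security purposes.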