Understanding file compression
Repetition is what compresses.
Why a 100MB log file shrinks to 5MB and a 100MB photo doesn't, the algorithms behind gzip and brotli, and the choice between speed and ratio.
Compression finds patterns.
Lossless compression algorithms all exploit redundancy: they find repeating or predictable sequences and replace them with shorter codes. A log file with the same prefix on every line compresses dramatically; a JSON file with the same keys repeated 10,000 times compresses dramatically; a JPEG photo (already compressed) barely compresses at all because there's almost no redundancy left to find. The ratio you get depends almost entirely on how much pattern is in the source.
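A quick way to see this is to compress two inputs of the same size, one highly repetitive and one random. This is a minimal sketch using Python's standard zlib module; the sample log line and the `ratio` helper are made up for illustration.

```python
import os
import zlib


def ratio(data: bytes, level: int = 6) -> float:
    """Return original size divided by DEFLATE-compressed size."""
    return len(data) / len(zlib.compress(data, level))


# Hypothetical inputs: a log-like buffer with one line repeated many times,
# and the same number of random bytes, which contain no pattern to exploit.
log_like = b"2024-01-01 12:00:00 INFO request handled status=200\n" * 20_000
random_like = os.urandom(len(log_like))

print(f"repetitive text: {ratio(log_like):.1f}x")
print(f"random bytes:    {ratio(random_like):.2f}x")
```

The repetitive buffer shrinks by a large factor, while the random buffer stays at roughly its original size (slightly larger, once the container overhead is counted).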
DEFLATE — the workhorse.
gzip, zlib, ZIP and PNG all use the same algorithm, DEFLATE, which combines two ideas. LZ77 replaces repeated sequences with a "look back N bytes and copy M bytes" pointer. Huffman coding then encodes the resulting literals and pointers with variable-length codes, shorter for the more common symbols. The combination is fast, has mature implementations in practically every language, and is reasonably effective. Compression levels 1-9 trade speed for ratio; the default 6 is a sensible middle.
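The "look back and copy" idea can be sketched in a few lines. This is a toy illustration only: it uses a brute-force search and skips the Huffman stage entirely, whereas real DEFLATE implementations use hash chains and a bit-level output format. The function names and the token representation are assumptions made for this example.

```python
from typing import List, Tuple, Union

# A token is either a literal byte value or a (distance, length) back-reference.
Token = Union[int, Tuple[int, int]]


def lz77_encode(data: bytes, window: int = 4096,
                min_match: int = 3, max_match: int = 258) -> List[Token]:
    """Greedy LZ77: emit the longest earlier match, else a literal byte."""
    tokens: List[Token] = []
    i = 0
    while i < len(data):
        best_len, best_dist = 0, 0
        # Brute-force search of the sliding window for the longest match.
        for j in range(max(0, i - window), i):
            length = 0
            while (length < max_match and i + length < len(data)
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - j
        if best_len >= min_match:
            tokens.append((best_dist, best_len))
            i += best_len
        else:
            tokens.append(data[i])
            i += 1
    return tokens


def lz77_decode(tokens: List[Token]) -> bytes:
    """Rebuild the data by replaying literals and back-references."""
    out = bytearray()
    for tok in tokens:
        if isinstance(tok, int):
            out.append(tok)
        else:
            dist, length = tok
            # Copy byte by byte so a match may overlap its own output.
            for _ in range(length):
                out.append(out[-dist])
    return bytes(out)


print(lz77_encode(b"abcabcabcabc"))  # [97, 98, 99, (3, 9)]
```

Three literals followed by one pointer ("go back 3, copy 9") reproduce all twelve bytes; the pointer is allowed to overlap the bytes it is producing, which is how short repeating runs collapse so well.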
Brotli — the modern web default.
Google's Brotli (2015) has become the preferred compressor for text HTTP responses. It uses the same family of techniques as DEFLATE, with a larger window and a built-in dictionary of common web tokens (HTML tags, common English words, URL fragments), so HTML and JavaScript compress 15-25% better than with gzip at comparable speeds. All current browsers support it: they advertise it in the Accept-Encoding request header (br), and the server marks a compressed response with Content-Encoding: br. The wins are real but small enough that gzip-only servers still work fine.
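The negotiation itself is simple: the client lists the encodings it accepts in Accept-Encoding, and the server picks one it can produce. The sketch below is a simplified, hypothetical helper (`choose_encoding` is not from any framework); it honours q=0 as "not acceptable" but ignores the `*` wildcard and lets the server's preference order win over the client's q-values.

```python
from typing import Dict, Optional, Sequence


def choose_encoding(accept_encoding: str,
                    server_preference: Sequence[str] = ("br", "gzip")) -> Optional[str]:
    """Pick the first server-preferred encoding the client accepts, or None."""
    accepted: Dict[str, float] = {}
    for part in accept_encoding.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, params = part.partition(";")
        quality = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat a malformed q-value as "not acceptable"
        accepted[name.strip().lower()] = quality

    for encoding in server_preference:
        if accepted.get(encoding, 0.0) > 0:
            return encoding
    return None  # send the response uncompressed


print(choose_encoding("gzip, deflate, br"))  # br
print(choose_encoding("gzip, deflate"))      # gzip
```

A server using something like this would set Content-Encoding to the returned value, and fall back to an uncompressed body when the result is None.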
A worked compression.
A 50MB SQL backup file (text, lots of repeated keywords and value patterns) compresses to ~5MB with gzip default (10× reduction) and ~4MB with brotli at quality 11 (12.5× reduction), but brotli takes at least 5× longer to compress, and often much longer at its highest quality setting. A 50MB MP4 video file compresses to ~49MB: it is already compressed by the codec, with no useful pattern left. The right tool only matters when the input has redundancy.
Text vs media
50MB SQL → 5MB (10×); 50MB MP4 → 49MB (1.02×)
Algorithm matters less than whether the input is already compressed.
Compression suits text and structured data.
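The figures in the worked example follow from two small formulas: the ratio is original size divided by compressed size, and the space saved is one minus the inverse of that. A minimal sketch, with helper names chosen for this example:

```python
def compression_ratio(original_mb: float, compressed_mb: float) -> float:
    """How many times smaller the output is than the input."""
    return original_mb / compressed_mb


def space_saved_percent(original_mb: float, compressed_mb: float) -> float:
    """Percentage of the original size that compression removed."""
    return 100.0 * (1.0 - compressed_mb / original_mb)


# Sizes taken from the worked example above.
for label, original, compressed in [
    ("SQL + gzip", 50, 5),
    ("SQL + brotli", 50, 4),
    ("MP4 + gzip", 50, 49),
]:
    print(f"{label}: {compression_ratio(original, compressed):.2f}x, "
          f"{space_saved_percent(original, compressed):.0f}% saved")
```

Seen as space saved, the gap between gzip and brotli on the SQL file is 90% versus 92%, while the MP4 saves only about 2%.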
The "compress everything" trap.
Compressing already-compressed data wastes CPU for no benefit. JPEG, PNG, MP3, MP4, ZIP and PDF all use a compression algorithm internally. Gzipping them produces a file only 1-2% smaller and costs about as much CPU time as gzipping a real text file of the same size. For web servers, exclude these MIME types from on-the-fly compression and serve the files as they are. For backup tools, skip the compression step when the archive contents are already compressed.
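For a web server, the exclusion rule amounts to a lookup on the response's media type. The sketch below is a hypothetical helper, not any particular server's configuration; the set contains the standard MIME types for the formats named above.

```python
# Media types whose payloads are already compressed internally.
ALREADY_COMPRESSED_TYPES = frozenset({
    "image/jpeg",
    "image/png",
    "audio/mpeg",       # MP3
    "video/mp4",
    "application/zip",
    "application/pdf",
})


def should_compress(content_type: str) -> bool:
    """Return False for media types where gzip/brotli would only burn CPU."""
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type not in ALREADY_COMPRESSED_TYPES


print(should_compress("text/html; charset=utf-8"))  # True
print(should_compress("image/jpeg"))                # False
```

Many real servers invert this and keep an allowlist of compressible types (text, JSON, JavaScript, SVG) instead, which fails safe when an unknown binary type shows up.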
Zstandard for everything else.
Facebook's Zstandard (2016) sits at the modern frontier: DEFLATE-comparable ratios at 3-5× the speed, and brotli-comparable ratios at similar speed. It is increasingly the default for backup tools (tar's -I zstd), databases (PostgreSQL's compression options), and developer tooling such as Docker registries. Browser support on the wire is recent and not yet universal: Chromium-based browsers and Firefox accept Content-Encoding: zstd, so a brotli or gzip fallback is still needed. For local compression, zstd is the modern default; for HTTP, brotli remains the safe choice.
What runs in the browser.
The Compression Streams API (CompressionStream, DecompressionStream) ships native gzip, deflate and deflate-raw in every modern browser. Brotli and Zstandard generally still need a WebAssembly polyfill. The file is read as a stream, fed through the compressor, and written back out: no upload, no third party, and it works on multi-gigabyte files because nothing is buffered fully in memory.
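The reason streaming scales to very large files is that only one chunk is ever held in memory. The sketch below shows that chunked pattern in Python with the standard zlib module rather than the browser API itself; `gzip_stream` and the 64 KiB chunk size are assumptions made for this illustration.

```python
import gzip
import io
import zlib
from typing import BinaryIO


def gzip_stream(src: BinaryIO, dst: BinaryIO, chunk_size: int = 64 * 1024) -> None:
    """Compress src into dst as gzip, holding one chunk in memory at a time."""
    # wbits=31 asks zlib for a gzip header and trailer around the DEFLATE data.
    compressor = zlib.compressobj(level=6, wbits=31)
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(compressor.compress(chunk))
    # Flush whatever the compressor is still buffering, plus the gzip trailer.
    dst.write(compressor.flush())


# In-memory buffers stand in for real file handles here.
source = io.BytesIO(b"hello world\n" * 100_000)
target = io.BytesIO()
gzip_stream(source, target)

# The output is a normal gzip stream that round-trips to the original bytes.
restored = gzip.decompress(target.getvalue())
print(len(restored), "bytes restored from", len(target.getvalue()), "compressed")
```

With real files, `src` and `dst` would be handles opened in binary mode, and memory use stays near the chunk size regardless of how large the input is.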