Understanding batch compression
A folder of files, one archive.
The two strategies for compressing a folder, what solid archives do better, and why ZIP isn't always the right format.
Two strategies.
Strategy 1: compress each file individually, bundle into a tar or zip with no further compression. Each file's compression is independent; uncompressing individual files is fast. Strategy 2: bundle the files first, compress the bundle as one stream. The compressor sees redundancy across files (a folder of JSON documents with the same keys, a directory of source files with the same imports), which the per-file strategy can't exploit. Strategy 2 is "solid" compression.
tar.gz / tar.xz / tar.zst — solid archives.
The Unix tradition. tar bundles files into one stream; gzip / xz / zstd compresses the stream. The combination produces solid archives: redundancy across files contributes to compression. A folder of 100 source files of the same project compresses 2-3× better as tar.zst than as individually-zipped files. The trade-off is random access — extracting one file from the middle requires decompressing everything up to it.
ZIP — per-file, random-access.
ZIP compresses each file independently and stores them with their offsets in a central directory. Random access works (extract any single file without decompressing the others). Compression ratios are worse than solid archives — no cross-file redundancy exploited. ZIP is the right answer when consumers will extract individual files and shouldn't have to decompress the rest. Solid is the right answer for backup and archival where you'll extract the whole thing.
A worked compression.
A folder of 200 JSON files, 5 MB total. Per-file ZIP (gzip-compatible compression per entry): 2 MB. Solid tar.gz of the same content: 800 KB. Same files, different packaging — solid is 2.5× smaller because the JSON keys and structures repeat across files. Use ZIP when the consumer doesn't have tar; solid for everything else.
Per-file vs solid
200 small JSON files
Cross-file redundancy is the win.
5MB raw → 2MB ZIP → 800KB tar.gz
= 2.5× smaller with solid
Encryption and passwords.
ZIP has password-based encryption (AES-256 in modern implementations). The older ZipCrypto algorithm is broken — never use it. 7z, RAR and modern ZIP variants all support strong encryption. For sensitive files, encrypt the archive; for encryption that's resistant to even nation-state attackers, use age or PGP (purpose-built crypto, not the bolted-on archive variants). Archive-level encryption is a layer; not the only layer.
Empty directories and metadata.
ZIP and tar both preserve empty directories, file permissions, and modification times by default. Some platforms preserve more (extended attributes, ACLs, macOS resource forks); cross-platform archives drop those gracefully. For source-code archives, file modes (executable bit) matter; for document archives, modification times matter (so the consumer sees them in chronological order). The right tool preserves what's needed without bloating the archive with irrelevant per-file metadata.