Understanding Protobuf
Tag + wire-type + value — the format that won the binary wars.
What goes on the wire, why field numbers matter forever, and the schema-evolution rules that survive a refactor.
A field is three things.
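On the wire, every field is encoded as a tag followed by a value. The tag is a varint that combines the field number and the wire type (1-5 bytes); the value is either raw or length-prefixed, depending on the wire type. The wire types in current use: 0 = varint (int32, int64, uint32, uint64, sint32, sint64, bool, enum), 1 = fixed64 (double, fixed64, sfixed64), 2 = length-delimited (string, bytes, sub-messages, packed repeated), 5 = fixed32 (float, fixed32, sfixed32). Types 3 and 4 mark the start and end of the deprecated group construct, which makes six in total; everything else builds on them.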
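The tag arithmetic can be sketched in a few lines of Python. This is an illustration only, not a Protobuf library API; the helper names make_tag and split_tag are made up for this example.

```python
# Wire-type codes as listed in the text above.
VARINT, FIXED64, LENGTH_DELIMITED, FIXED32 = 0, 1, 2, 5


def make_tag(field_number: int, wire_type: int) -> int:
    """Pack a field number and wire type into the integer tag value."""
    return (field_number << 3) | wire_type


def split_tag(tag: int) -> tuple[int, int]:
    """Recover (field_number, wire_type) from a tag value."""
    return tag >> 3, tag & 0x07


print(hex(make_tag(1, VARINT)))            # 0x8
print(hex(make_tag(2, LENGTH_DELIMITED)))  # 0x12
print(split_tag(0x10))                     # (2, 0)
```

Because the tag value is itself written as a varint, field numbers 1 through 15 fit in a single tag byte, while 16 and above need at least two.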
Varint encoding.
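Integers use a variable-length encoding: 7 payload bits per byte, least-significant group first, with the MSB of each byte as a continuation flag. Numbers 0-127 take 1 byte; 128-16383 take 2 bytes; a full 64-bit value takes up to 10. The encoding is biased toward small numbers, which suits field numbers, lengths, and most real-world counters. Negative numbers are the catch: int32 and int64 encode them as 64-bit two's complement, so every negative value costs the full 10 bytes. The sint32 and sint64 types apply zig-zag encoding first (so -1 becomes 1, 1 becomes 2, -2 becomes 3) to keep small negative values small.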
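A minimal Python sketch of the varint and zig-zag rules described above. The function names are chosen for this example and are not taken from any Protobuf runtime; error handling for truncated input is left out.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("varints hold unsigned values; zigzag or mask first")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # continuation bit set: more bytes follow
        else:
            out.append(low7)         # last byte: continuation bit clear
            return bytes(out)


def decode_varint(data: bytes, pos: int = 0) -> tuple[int, int]:
    """Return (value, next_position) for the varint starting at pos."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def zigzag64(n: int) -> int:
    """Map signed to unsigned: 0, -1, 1, -2 become 0, 1, 2, 3."""
    return ((n << 1) ^ (n >> 63)) & 0xFFFFFFFFFFFFFFFF


def unzigzag(z: int) -> int:
    """Invert zigzag64."""
    return (z >> 1) ^ -(z & 1)


print(encode_varint(150).hex(" "))      # 96 01
print(len(encode_varint(2**64 - 1)))    # 10
print([zigzag64(n) for n in (0, -1, 1, -2)])  # [0, 1, 2, 3]
```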
Field numbers are permanent.
The number assigned to a field is its wire identity. Renaming the field in the schema doesn't affect the bytes; deleting the field means readers built from the new schema treat old data that still carries it as an "unknown field" and skip it. Reusing a deleted number for a different type is the classic way to break compatibility: old clients decode the new bytes as the old type and then crash or silently corrupt data. The fix: reserve deleted numbers explicitly (reserved 5;) so the compiler refuses to reuse them.
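To make the reuse hazard concrete, here is a small Python sketch. The two schemas are hypothetical: an old one with int64 account_id = 5, and a new one that reuses the number as sint64 balance_delta = 5. Both use wire type 0, so the old reader has no way to notice the change.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)
        else:
            out.append(low7)
            return bytes(out)


def decode_varint(data: bytes, pos: int = 0) -> tuple[int, int]:
    """Return (value, next_position) for the varint starting at pos."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


FIELD, VARINT = 5, 0

# New schema (hypothetical): sint64 balance_delta = 5; the value is zigzagged.
delta = -3
zigzagged = ((delta << 1) ^ (delta >> 63)) & 0xFFFFFFFFFFFFFFFF
new_bytes = encode_varint((FIELD << 3) | VARINT) + encode_varint(zigzagged)

# Old schema (hypothetical): int64 account_id = 5; the varint is read as-is
# and reinterpreted as a signed 64-bit value.
tag, pos = decode_varint(new_bytes)
raw, _ = decode_varint(new_bytes, pos)
account_id = raw - (1 << 64) if raw >= (1 << 63) else raw

print(new_bytes.hex(" "))  # 28 05
print(account_id)          # 5
```

The old reader parses the message without error and reports account 5, which is the kind of silent corruption that a reserved statement is meant to prevent.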
A worked encode.
message Point { int32 x = 1; int32 y = 2; } with x=150, y=-1. Tag for field 1, wire type 0: (1 << 3 | 0) = 0x08. Varint 150: 0x96 0x01. Tag for field 2, wire type 0: (2 << 3 | 0) = 0x10. Plain int32 does not zigzag: -1 is sign-extended to 64 bits and encoded as a 10-byte varint, 0xFF 0xFF 0xFF 0xFF 0xFF 0xFF 0xFF 0xFF 0xFF 0x01. Declare the field as sint32 instead and zigzag(-1) = 1 encodes as the single byte 0x01. At scale, that size difference adds up.
Point(x=150, y=-1) as two int32 fields, each a tag plus a varint:
08 96 01 + 10 FF FF FF FF FF FF FF FF FF 01 = 14 bytes total
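The worked encode can be reproduced with a short Python sketch that also shows the sint32 alternative. The helpers below are hand-rolled for illustration and are not part of any Protobuf library.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)
        else:
            out.append(low7)
            return bytes(out)


def encode_int32_field(field_number: int, value: int) -> bytes:
    """int32: negatives are sign-extended to 64 bits before varint encoding."""
    tag = (field_number << 3) | 0
    return encode_varint(tag) + encode_varint(value & 0xFFFFFFFFFFFFFFFF)


def encode_sint32_field(field_number: int, value: int) -> bytes:
    """sint32: zigzag first, so small negatives stay small."""
    tag = (field_number << 3) | 0
    zigzagged = ((value << 1) ^ (value >> 31)) & 0xFFFFFFFF
    return encode_varint(tag) + encode_varint(zigzagged)


as_int32 = encode_int32_field(1, 150) + encode_int32_field(2, -1)
as_sint32 = encode_sint32_field(1, 150) + encode_sint32_field(2, -1)

print(as_int32.hex(" "), len(as_int32))    # 08 96 01 10 ff ff ff ff ff ff ff ff ff 01 14
print(as_sint32.hex(" "), len(as_sint32))  # 08 ac 02 10 01 5
```

Note that with sint32 the value 150 is written as zigzag 300 (ac 02), still two bytes, so the whole message drops from 14 bytes to 5.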
Backward + forward compatibility.
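Adding a new field with a new number is safe: old readers skip it; new readers see it. Changing a field's type is rarely safe. A few groups are wire-compatible (int32, uint32, int64, uint64, and bool with each other; sint32 with sint64; fixed32 with sfixed32), but most type changes break wire compatibility. Switching a singular field to repeated is safe for string, bytes, and message fields, but not for numeric types, where proto3 packs repeated fields by default and a singular reader will not parse them. Going from repeated back to singular loses data, since a singular reader keeps only the last value (or merges sub-messages). The compatibility table in the Protobuf docs is worth memorising; it's the answer to most schema-review arguments.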
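The reason new fields are safe is that a reader can skip any field it does not recognise using the wire type alone. A rough Python sketch, assuming a hypothetical parse_known helper that returns raw values keyed by field number (real runtimes do considerably more, and typically retain the skipped bytes rather than discarding them):

```python
def decode_varint(data: bytes, pos: int = 0) -> tuple[int, int]:
    """Return (value, next_position) for the varint starting at pos."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def parse_known(data: bytes, known: set[int]) -> dict[int, object]:
    """Walk a message, keep fields listed in known, skip the rest."""
    fields: dict[int, object] = {}
    pos = 0
    while pos < len(data):
        tag, pos = decode_varint(data, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:    # varint
            value, pos = decode_varint(data, pos)
        elif wire_type == 1:  # fixed64: always 8 bytes
            value, pos = data[pos:pos + 8], pos + 8
        elif wire_type == 2:  # length-delimited: varint length, then payload
            length, pos = decode_varint(data, pos)
            value, pos = data[pos:pos + length], pos + length
        elif wire_type == 5:  # fixed32: always 4 bytes
            value, pos = data[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        if field_number in known:
            fields[field_number] = value
    return fields


# A newer writer added field 3 (a string, "hi") between fields 1 and 2.
message = bytes.fromhex("08 96 01 1a 02 68 69 10 07")
print(parse_known(message, known={1, 2}))     # {1: 150, 2: 7}
print(parse_known(message, known={1, 2, 3}))  # {1: 150, 3: b'hi', 2: 7}
```

The old reader, which knows only fields 1 and 2, steps over field 3 because the length prefix tells it exactly how many bytes to skip.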
vs gRPC vs Avro vs MessagePack.
gRPC is not a competitor: it is an RPC framework that runs over HTTP/2 and uses Protobuf as its default data format. Avro is the closest competitor. Its binary encoding carries no field tags, so the reader needs the writer's schema: container files embed it once in the file header, while streaming systems usually send a schema ID and look the schema up in a separate registry, the pattern that dominates the Kafka world. MessagePack is JSON-like: schemaless, usually larger on the wire than Protobuf because field names travel with every message, and much easier to adopt without a schema toolchain. Protobuf wins when types are stable and traffic is high; Avro wins for analytics pipelines; MessagePack wins when you just need JSON-but-smaller.