Understanding Proto ↔ JSON Schema
Two schema languages — one for wire, one for validation.
What Protobuf's field numbers buy you, where JSON Schema's vocabulary wins, and the conversion gotchas that bite.
Different jobs.
Protobuf describes a binary wire format with a strong evolution story. Each field has a number, and that number, not the name, identifies the field on the wire; names exist only in the schema and the generated code. Old code reading new messages skips the fields it does not recognise. JSON Schema describes the shape of a JSON document for runtime validation. The two overlap in what they can describe (types, required fields, enums), but they exist for different reasons.
The field-number rule.
In Protobuf, you can rename a field freely as far as the binary encoding is concerned; the wire format only cares about the number (generated code and Protobuf's own JSON encoding do use the name). You can delete a field, as long as you reserve its number so it is never reused. The schema evolution rules are strict but well-defined. JSON Schema has no equivalent: property names are the identity, so a rename is a breaking change. When converting Proto → JSON Schema, field numbers don't survive; converting JSON Schema → Proto requires inventing numbers (and committing to them forever).
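To make the field-number rule concrete, here is a minimal sketch that hand-encodes a single string field the way the Protobuf binary format does. The helper names (`encode_varint`, `encode_string_field`) are illustrative and not part of any library; real code would use a generated class instead.

```python
WIRE_TYPE_LEN = 2  # length-delimited: strings, bytes, nested messages


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_string_field(field_number: int, text: str) -> bytes:
    """Encode one string field: tag, then length, then UTF-8 payload."""
    payload = text.encode("utf-8")
    tag = (field_number << 3) | WIRE_TYPE_LEN  # the name never appears
    return encode_varint(tag) + encode_varint(len(payload)) + payload


# `string name = 2;` and a renamed `string display_name = 2;`
# both encode to the same bytes, because only the number 2 is written.
encoded = encode_string_field(2, "Ada")  # b'\x12\x03Ada'
```

The field name is not an input to the encoder at all, which is why a rename is invisible on the wire while a renumbering is not.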
What converts cleanly.
Scalar types: int32 → integer; string → string; bool → boolean; double → number. Messages → objects with properties. Enums → JSON Schema enums. Repeated fields → arrays. The straightforward 80% of any schema converts unambiguously.
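The clean cases can be sketched as a lookup table plus a wrapper for repeated fields. This is an illustration under assumptions, not a real converter: the function names are hypothetical, only the four scalars listed above are covered, and representing enum values by their names is one possible convention.

```python
from typing import Dict, List

# Only the scalar types named above; a real tool would cover more.
SCALAR_TYPES: Dict[str, str] = {
    "int32": "integer",
    "string": "string",
    "bool": "boolean",
    "double": "number",
}


def field_to_schema(proto_type: str, repeated: bool = False) -> dict:
    """Convert one scalar field, wrapping it in an array if repeated."""
    if proto_type not in SCALAR_TYPES:
        raise ValueError(f"no direct mapping for {proto_type!r}")
    schema = {"type": SCALAR_TYPES[proto_type]}
    if repeated:
        return {"type": "array", "items": schema}
    return schema


def enum_to_schema(value_names: List[str]) -> dict:
    """Convert an enum, assuming values are represented by name."""
    return {"type": "string", "enum": list(value_names)}
```

For example, `field_to_schema("string", repeated=True)` yields an array schema whose items are strings.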
What doesn't.
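Proto's oneof maps to JSON Schema's oneOf only with a wrapper: each member field becomes its own branch, because oneOf chooses between subschemas rather than between named fields. Proto's well-known types (Timestamp, Duration, Any) need documented conventions on the JSON side. Proto's optional fields track presence (whether a field was explicitly set or merely left at its default), which does not line up with JSON Schema's model, where a property is simply present or absent. Proto maps become objects whose values are constrained through additionalProperties, or through patternProperties when the keys must match a pattern (for example integer keys, which JSON renders as strings). The conversion tool has to make a decision about each of these; document them.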
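Each of these decisions can be written down as a small rule. The sketch below shows one possible set of conventions, not the only valid one. The function names are hypothetical, and the string formats for Timestamp and Duration follow Protobuf's own JSON encoding (an RFC 3339 timestamp and a seconds value with an `s` suffix).

```python
from typing import Dict

INTEGER_KEY_TYPES = {
    "int32", "int64", "uint32", "uint64", "sint32", "sint64",
    "fixed32", "fixed64", "sfixed32", "sfixed64",
}

# Assumed conventions for two well-known types.
WELL_KNOWN_TYPES: Dict[str, dict] = {
    "google.protobuf.Timestamp": {"type": "string", "format": "date-time"},
    "google.protobuf.Duration": {
        "type": "string",
        "pattern": r"^-?[0-9]+(\.[0-9]{1,9})?s$",
    },
}


def oneof_to_schema(members: Dict[str, dict]) -> dict:
    """One branch per member; each branch requires exactly that property.

    This form demands that one member is set. A oneof that may be left
    unset would need an extra branch, which is omitted here.
    """
    return {
        "oneOf": [
            {"properties": {name: schema}, "required": [name]}
            for name, schema in members.items()
        ]
    }


def map_to_schema(key_type: str, value_schema: dict) -> dict:
    """String keys use additionalProperties; integer keys use a pattern."""
    if key_type == "string":
        return {"type": "object", "additionalProperties": value_schema}
    if key_type in INTEGER_KEY_TYPES:
        # Integer keys arrive in JSON as strings of digits.
        return {
            "type": "object",
            "patternProperties": {"^-?[0-9]+$": value_schema},
            "additionalProperties": False,
        }
    raise ValueError(f"unsupported map key type {key_type!r}")
```

Whatever rules a tool picks, the point stands: they are choices, so they belong in the tool's documentation.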
A worked Proto message.
The proto3 message message User { string id = 1; string name = 2; repeated string emails = 3; } converts to the JSON Schema { "type": "object", "properties": { "id": {"type":"string"}, "name": {"type":"string"}, "emails": {"type":"array","items":{"type":"string"}} } }. The result has no "required" list, because proto3 has no required keyword: every field is optional and falls back to a default value. If the JSON side needs required properties, add them to the JSON Schema by hand or drive them from a custom annotation.
Summary: User message, 3 fields (scalar + scalar + repeated string), proto3 → JSON Schema 2020-12, direct translation.
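The User conversion can be reproduced in a few lines. This is a sketch under assumptions: the field list is a hand-written stand-in for a parsed .proto file, `message_to_schema` is a hypothetical helper, and treating `id` as required is an illustrative choice that the proto definition itself does not express.

```python
import json

JSON_TYPES = {"int32": "integer", "string": "string",
              "bool": "boolean", "double": "number"}

# (name, proto type, repeated?) for each field of `message User`.
USER_FIELDS = [
    ("id", "string", False),
    ("name", "string", False),
    ("emails", "string", True),
]


def message_to_schema(fields, required=()):
    """Build an object schema; `required` is supplied by hand."""
    properties = {}
    for name, proto_type, repeated in fields:
        schema = {"type": JSON_TYPES[proto_type]}
        if repeated:
            schema = {"type": "array", "items": schema}
        properties[name] = schema
    result = {"type": "object", "properties": properties}
    if required:
        result["required"] = list(required)
    return result


# Direct translation: no "required" key, matching proto3 semantics.
user_schema = message_to_schema(USER_FIELDS)

# Hand-added requirement (an assumption for illustration).
strict_user_schema = message_to_schema(USER_FIELDS, required=["id"])
print(json.dumps(strict_user_schema, indent=2))
```

Keeping the required list as a separate argument makes it obvious that this information comes from outside the proto definition.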
Which to keep as source.
If you serialise binary on the wire (gRPC, Protobuf-over-Kafka, mobile sync), Proto is the source. Generate JSON Schema from it for documentation, API gateway validation, and frontend code. If you serialise JSON on the wire (most REST APIs), JSON Schema is the source. Generate Proto only if a gRPC service later needs to consume the same types. Maintaining both as sources of truth ends in drift, every time.
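One way to keep the generated side honest is a drift check in CI that compares the freshly generated schema with the copy that is checked in. The sketch below only compares top-level property names, and `find_drift` is a hypothetical helper; a fuller check would also compare types and nested objects.

```python
def property_names(schema: dict) -> set:
    """Return the top-level property names of an object schema."""
    return set(schema.get("properties", {}))


def find_drift(generated: dict, checked_in: dict) -> dict:
    """Report properties that exist on only one side."""
    gen = property_names(generated)
    cur = property_names(checked_in)
    return {
        "missing": sorted(gen - cur),     # in the source, not in the copy
        "unexpected": sorted(cur - gen),  # hand-added to the copy
    }


generated = {"type": "object",
             "properties": {"id": {}, "name": {}, "emails": {}}}
checked_in = {"type": "object",
              "properties": {"id": {}, "name": {}, "nickname": {}}}

drift = find_drift(generated, checked_in)
# drift == {"missing": ["emails"], "unexpected": ["nickname"]}
```

A build that fails whenever either list is non-empty forces changes to go through the single source of truth.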