Free Apache Avro Schema Tools

{ "type": "record", "name": "User", "namespace": "com.anytimeconvert", "fields": [ { "name": "id", "type": "long" }, { "name": "email", "type": "string" }, { "name": "name", "type": [ "null", "string" ], "default": null }, { "name": "active", "type": "boolean", "default": true }, { "name": "tags", "type": { "type": "array", "items": "string" }, "default": [] } ] }

Understanding Avro

Schema with the data — the Kafka format.

What Avro is, why the schema travels with the data, and how the schema registry handles evolution in distributed systems.

Schema-first, but inline.

Avro (Apache, 2009) is a row-oriented binary data format whose schemas are written in JSON. The key idea: in Avro Object Container Files (.avro), the schema is embedded in the file header, so a consumer can read the file without any prior knowledge of its structure. Schema and data ship together. In streaming systems like Kafka, repeating the full schema in every message would be wasteful, so each message carries only a schema ID that consumers resolve through a Schema Registry.
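The container-file header can be sketched in Python. This is a minimal illustration that assumes the layout described in the Avro specification: the four magic bytes `Obj\x01`, a metadata map whose `avro.schema` entry holds the schema JSON, and a 16-byte random sync marker. The helper names (`encode_long`, `encode_bytes`, `container_header`) are hypothetical, and the sketch writes only the header, not the data blocks that follow it.

```python
import json
import os


def encode_long(n: int) -> bytes:
    """Encode a 64-bit signed integer as an Avro long (zig-zag, then base-128 varint)."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_bytes(data: bytes) -> bytes:
    """Avro bytes/string: length as a long, then the raw bytes."""
    return encode_long(len(data)) + data


def container_header(schema: dict, sync: bytes) -> bytes:
    """Build an Object Container File header (data blocks are not written)."""
    metadata = {
        "avro.schema": json.dumps(schema).encode("utf-8"),
        "avro.codec": b"null",  # blocks would be stored uncompressed
    }
    out = bytearray(b"Obj\x01")          # magic: 'O', 'b', 'j', version 1
    out += encode_long(len(metadata))    # one map block holding every entry
    for key, value in metadata.items():
        out += encode_bytes(key.encode("utf-8"))
        out += encode_bytes(value)
    out += encode_long(0)                # a zero-count block ends the map
    out += sync                          # 16-byte marker, repeated after each data block
    return bytes(out)


schema = {"type": "record", "name": "User",
          "fields": [{"name": "id", "type": "long"}]}
sync = os.urandom(16)
header = container_header(schema, sync)
print(header[:4])  # b'Obj\x01'
```

Because the schema sits in the header, a reader can parse `avro.schema` first and then decode every data block that follows without any external lookup.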

The schema language.

A JSON document describing records, primitives, arrays, maps, unions, enums, and fixed types. For example: {"type":"record","name":"User","fields":[{"name":"id","type":"long"},{"name":"email","type":["null","string"],"default":null}]}. Nullable fields are unions with null. Default values are what make evolution work: if you add a field without a default, a consumer using the new schema cannot decode records written with the old schema, because there is no value to fill in for the missing field. The schema is more verbose than a Protobuf definition but more discoverable.
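The reader-side rule that makes defaults matter can be sketched as follows. This is a simplified model of Avro record resolution, assuming the datum has already been decoded with the writer's schema into a Python dict: reader fields are matched to the writer's data by name, fields missing from that data take the reader's default, and writer-only fields are dropped. Aliases, type promotion, and nested records are deliberately left out, and `resolve_record` is a hypothetical helper, not part of any Avro library.

```python
import copy
from typing import Any


def resolve_record(datum: dict[str, Any], reader_schema: dict) -> dict[str, Any]:
    """Project a record decoded with the writer's schema onto the reader's schema.

    Simplified: matches fields by name only; ignores aliases and type promotion.
    """
    result: dict[str, Any] = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in datum:
            result[name] = datum[name]                       # written by the producer
        elif "default" in field:
            result[name] = copy.deepcopy(field["default"])   # new field: use the default
        else:
            raise ValueError(
                f"field {name!r} is missing from the writer's data and has no default")
    return result  # fields only the writer knew about are silently dropped


# Record written with an old schema that still had a "nickname" field.
old_record = {"id": 42, "email": "alice@example.com", "nickname": "ally"}

reader_v2 = {"type": "record", "name": "User", "fields": [
    {"name": "id", "type": "long"},
    {"name": "email", "type": ["null", "string"], "default": None},
    {"name": "active", "type": "boolean", "default": True},
]}
print(resolve_record(old_record, reader_v2))
# {'id': 42, 'email': 'alice@example.com', 'active': True}

reader_no_default = {"type": "record", "name": "User", "fields": [
    {"name": "id", "type": "long"},
    {"name": "country", "type": "string"},  # added without a default
]}
try:
    resolve_record(old_record, reader_no_default)
except ValueError as err:
    print(err)  # field 'country' is missing from the writer's data and has no default
```

Real Avro libraries perform this resolution while decoding, using both the writer's and the reader's schema; the dict-level version only shows the decision each field goes through.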

The wire format.

The data itself is compact binary: zig-zag variable-length integers, length-prefixed strings, and field values written in schema order. There are no field tags; the decoder uses the schema to know which bytes belong to which field. The result is typically smaller than the equivalent Protobuf encoding, because there is no per-field tag overhead. The catch: decoding requires the writer's schema. Lose the schema and the bytes are unreadable.
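The integer and string encodings can be sketched in a few lines of Python. This follows the Avro specification's rules for `long` (zig-zag mapping, then little-endian base-128 varint) and `string` (byte length as a long, then UTF-8 bytes); the function names are illustrative, not taken from an Avro library.

```python
def encode_long(n: int) -> bytes:
    """Avro long: zig-zag maps small magnitudes to small codes, then base-128 varint."""
    z = (n << 1) ^ (n >> 63)  # 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)  # 7 payload bits + continuation bit
        z >>= 7
    out.append(z)
    return bytes(out)


def decode_long(buf: bytes, pos: int = 0) -> tuple[int, int]:
    """Inverse of encode_long; returns (value, position after the varint)."""
    z = shift = 0
    while True:
        b = buf[pos]
        pos += 1
        z |= (b & 0x7F) << shift
        if not b & 0x80:
            break
        shift += 7
    return (z >> 1) ^ -(z & 1), pos  # undo zig-zag


def encode_string(s: str) -> bytes:
    """Avro string: byte length as a long, then the UTF-8 bytes (no field tag)."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


print(encode_long(42).hex())      # 54
print(encode_long(-1).hex())      # 01
print(encode_long(64).hex())      # 8001
print(encode_string("hi").hex())  # 046869
```

Nothing in these bytes says "this is field `id`" or "this is a string"; `decode_long` only works because the caller already knows, from the schema, that a long comes next.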

Schema Registry.

Confluent's Schema Registry is the most widely used way to ship Avro on Kafka. The producer registers the schema with the registry and gets back a schema ID (a 4-byte integer); every message is then prefixed with a magic byte (0) followed by that ID. Consumers read the ID, fetch the schema (typically caching it after the first lookup), and decode. Compatibility checks happen at registration time: under the subject's configured compatibility mode (BACKWARD by default), the registry rejects a new schema that is incompatible with the previously registered version.
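The message framing can be sketched with Python's `struct` module. This assumes the Confluent wire format (magic byte 0, then the schema ID as a 4-byte big-endian integer, then the Avro binary payload). The dict `registry` is a hypothetical stand-in for a real registry client, which would fetch schemas over HTTP and cache them.

```python
import struct

MAGIC_BYTE = 0
HEADER = struct.Struct(">BI")  # unsigned byte + unsigned 32-bit int, big-endian, no padding

# Hypothetical in-memory stand-in for the registry: schema ID -> schema.
registry = {7: {"type": "record", "name": "User",
                "fields": [{"name": "id", "type": "long"}]}}


def frame(schema_id: int, payload: bytes) -> bytes:
    """Producer side: prefix the Avro payload with the magic byte and schema ID."""
    return HEADER.pack(MAGIC_BYTE, schema_id) + payload


def unframe(message: bytes) -> tuple[dict, bytes]:
    """Consumer side: split off the prefix and look up the writer's schema."""
    magic, schema_id = HEADER.unpack_from(message)
    if magic != MAGIC_BYTE:
        raise ValueError(f"unexpected magic byte {magic}")
    return registry[schema_id], message[HEADER.size:]


msg = frame(7, b"\x54")         # payload: the Avro long 42
print(msg.hex())                # 000000000754
schema, payload = unframe(msg)
print(schema["name"], payload)  # User b'T'
```

The prefix costs 5 bytes per message regardless of schema size, which is why the registry approach scales to high-volume topics.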

A worked record.

Schema: User with id (long), email (nullable string), createdAt (long, logicalType timestamp-millis). Data: { id: 42, email: "alice@example.com", createdAt: 1748736000000 }. Encoded: 26 bytes (1-byte varint for 42, 1-byte union branch index for the non-null email, 1-byte length plus the 17-byte string, 6-byte varint for the timestamp). With Schema Registry framing (1 magic byte plus the 4-byte schema ID): 31 bytes on the wire. The same record as compact JSON: 63 bytes. That is roughly 2× smaller, with no general-purpose compression layer applied.
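The byte counts can be checked by hand-encoding the record. This sketch hard-codes the field order and types of the example `User` schema instead of interpreting a schema, and it assumes the email union is declared as `["null", "string"]` (so a string value is branch 1) and that the registry prefix is the 5-byte magic-byte-plus-ID form. The JSON comparison uses compact separators.

```python
import json


def encode_long(n: int) -> bytes:
    """Avro int/long: zig-zag, then base-128 varint."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_string(s: str) -> bytes:
    """Avro string: byte length as a long, then UTF-8 bytes."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


def encode_user(user: dict) -> bytes:
    """Encode in schema order: id (long), email (["null","string"]), createdAt (long)."""
    out = encode_long(user["id"])
    if user["email"] is None:
        out += encode_long(0)                                 # branch 0: null, no more bytes
    else:
        out += encode_long(1) + encode_string(user["email"])  # branch 1: string
    out += encode_long(user["createdAt"])  # timestamp-millis is a plain long on the wire
    return out


user = {"id": 42, "email": "alice@example.com", "createdAt": 1748736000000}
payload = encode_user(user)
framed = 5 + len(payload)  # magic byte + 4-byte schema ID
as_json = json.dumps(user, separators=(",", ":")).encode("utf-8")
print(len(payload), framed, len(as_json))  # 26 31 63
```

With the default `json.dumps` separators, which add a space after each `:` and `,`, the JSON grows to 68 bytes.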

[Figure: User record, JSON vs. Avro size; schema-driven binary with a registry ID prefix]

vs Protobuf, JSON, Parquet.

Protobuf: schemas in .proto files, numbered field tags for evolution, a slightly larger wire format, and no built-in registry. JSON: human-readable, much larger, and no embedded schema. Parquet: column-oriented, optimised for analytical scans, poorly suited to row-by-row writes. Avro is the right answer for Kafka and other row-oriented streaming pipelines; Parquet is the right answer for the data lake those streams write into. Many pipelines convert Avro to Parquet at ingestion time.

Quick answers.

What is an .avsc file?

An `.avsc` file is a JSON document that defines the schema for Avro data, including the record name, namespace, and fields with their respective types.

Is my schema sent to a server for validation?

No. The validation and formatting logic runs entirely in your browser using JavaScript, so your schema remains local to your machine.

Does this tool support complex Avro types?

Yes. It validates primitive types as well as complex types including records, enums, arrays, maps, unions, and fixed types.

Why is my schema failing validation?

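Common issues include missing required attributes such as `name` or `type`, syntax errors in the JSON itself, names that break Avro's naming rule (a letter or underscore first, then only letters, digits, and underscores), or using reserved keywords incorrectly.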
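A few of these checks can be sketched in Python. This is a hypothetical, deliberately partial checker, not this tool's actual validator: it assumes the schema has already been parsed with `json.loads` (so JSON syntax errors surface there), and it only looks for a missing `type`, missing or invalid names, a record without a `fields` list, and arrays or maps without `items` or `values`. It does not resolve named-type references, enum symbols, or default values.

```python
import json
import re
from typing import Any

NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def valid_fullname(name: Any) -> bool:
    """Named types may use dotted full names; each component must be a valid name."""
    return isinstance(name, str) and all(NAME.fullmatch(p) for p in name.split("."))


def check(schema: Any, path: str = "$") -> list[str]:
    """Return a list of problems; an empty list means these checks passed."""
    if isinstance(schema, str):   # primitive or named-type reference: not resolved here
        return []
    if isinstance(schema, list):  # union: check each branch
        return [e for i, b in enumerate(schema) for e in check(b, f"{path}[{i}]")]
    if not isinstance(schema, dict) or "type" not in schema:
        return [f"{path}: missing 'type'"]
    kind, errors = schema["type"], []
    if kind in ("record", "enum", "fixed") and not valid_fullname(schema.get("name")):
        errors.append(f"{path}: invalid or missing name {schema.get('name')!r}")
    if kind == "record":
        fields = schema.get("fields")
        if not isinstance(fields, list):
            return errors + [f"{path}: record needs a 'fields' list"]
        for i, field in enumerate(fields):
            fpath = f"{path}.fields[{i}]"
            fname = field.get("name") if isinstance(field, dict) else None
            if not (isinstance(fname, str) and NAME.fullmatch(fname)):
                errors.append(f"{fpath}: invalid or missing name {fname!r}")
            if isinstance(field, dict) and "type" in field:
                errors += check(field["type"], fpath)
            else:
                errors.append(f"{fpath}: missing 'type'")
    elif kind in ("array", "map"):
        key = "items" if kind == "array" else "values"
        if key in schema:
            errors += check(schema[key], f"{path}.{key}")
        else:
            errors.append(f"{path}: {kind} needs '{key}'")
    return errors


sample = json.loads('{"type": "record", "name": "User", "namespace": "com.anytimeconvert", '
                    '"fields": [{"name": "id", "type": "long"}, '
                    '{"name": "tags", "type": {"type": "array", "items": "string"}, "default": []}]}')
print(check(sample))  # []

broken = {"type": "record", "name": "2User", "fields": [{"name": "id"}]}
for problem in check(broken):
    print(problem)
# $: invalid or missing name '2User'
# $.fields[0]: missing 'type'
```

Returning a list rather than raising on the first problem lets a validator report every issue in one pass.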
