Understanding JSON Schema inference
A schema is a contract, not a class.
What JSON Schema describes that a TypeScript or Go type can't, and the trade-off you're making when you generate one from a sample.
Schema is a different layer.
A TypeScript type tells the compiler what shape a value has. A JSON Schema tells a validator what shape a value is allowed to have. The first is consumed by a language tool; the second is consumed at runtime, in any language, by any conforming implementation. Schemas survive across languages, encode richer constraints (string patterns, numeric ranges, format hints), and form the basis of OpenAPI, AsyncAPI and most modern API contracts.
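To make the "richer constraints" point concrete, here is a minimal sketch of a runtime check for a handful of keywords (type, required, properties, pattern, minimum, maximum). The schema, the field names sku and quantity, and the check helper are all hypothetical, and the helper covers only this small subset; it is not a conforming validator.

```python
import re

# Hypothetical schema: a string pattern and a numeric range are constraints
# that a plain `string` or `number` type annotation cannot state.
ORDER_SCHEMA = {
    "type": "object",
    "properties": {
        "sku": {"type": "string", "pattern": r"^[A-Z]{3}-\d{4}$"},
        "quantity": {"type": "integer", "minimum": 1, "maximum": 99},
    },
    "required": ["sku", "quantity"],
}


def check(schema, value):
    """Return a list of violations for a small subset of JSON Schema keywords."""
    errors = []
    kind = schema.get("type")
    if kind == "object":
        if not isinstance(value, dict):
            return ["expected object"]
        for key in schema.get("required", []):
            if key not in value:
                errors.append(f"missing required key: {key}")
        for key, sub in schema.get("properties", {}).items():
            if key in value:
                errors += [f"{key}: {e}" for e in check(sub, value[key])]
    elif kind == "string":
        if not isinstance(value, str):
            return ["expected string"]
        pattern = schema.get("pattern")
        # JSON Schema patterns are unanchored, hence re.search rather than re.match.
        if pattern and not re.search(pattern, value):
            errors.append(f"does not match {pattern}")
    elif kind == "integer":
        # Simplification: bool is excluded (it subclasses int in Python) and
        # floats such as 3.0 are rejected, which is stricter than the spec.
        if not isinstance(value, int) or isinstance(value, bool):
            return ["expected integer"]
        if "minimum" in schema and value < schema["minimum"]:
            errors.append(f"below minimum {schema['minimum']}")
        if "maximum" in schema and value > schema["maximum"]:
            errors.append(f"above maximum {schema['maximum']}")
    return errors


good = check(ORDER_SCHEMA, {"sku": "ABC-1234", "quantity": 3})
bad = check(ORDER_SCHEMA, {"sku": "abc", "quantity": 0})
```

Both documents have the same shape in the type-system sense (a string and an integer); only the second fails the contract, once for the pattern and once for the range.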
type = shape ; schema = contract
Draft 7, 2019-09, 2020-12.
JSON Schema has gone through several drafts. Draft 7 is still the most widely deployed, and 2020-12 is the current standard. The differences are in how reusable definitions are written (Draft 7's definitions became $defs in 2019-09), how unevaluated properties are handled (unevaluatedProperties arrived in 2019-09), and which keywords exist (2020-12 replaced the array form of items with prefixItems). Inference tools usually emit Draft 7 or 2020-12 by default; pick whichever your validator supports.
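The keyword renames between drafts are mechanical enough to sketch. The helper below, upgrade_draft7, is a hypothetical name and a partial illustration only: it renames definitions to $defs, rewrites local $ref pointers, and maps the tuple form of items to prefixItems. It does not handle other changes such as dependencies or $id semantics.

```python
# Keywords whose values map names to subschemas (the names must not be rewritten).
NAME_MAPS = {"properties", "patternProperties", "definitions", "$defs"}
# Keywords whose values are plain data, not schemas.
LITERALS = {"enum", "const", "default", "examples"}


def upgrade_draft7(schema):
    """Rewrite a Draft 7 schema using 2020-12 keyword names (partial sketch)."""
    if isinstance(schema, list):
        return [upgrade_draft7(item) for item in schema]
    if not isinstance(schema, dict):
        return schema  # boolean schemas and scalar keyword values pass through
    out = {}
    for key, value in schema.items():
        if key in LITERALS:
            out[key] = value
        elif key in NAME_MAPS and isinstance(value, dict):
            new_key = "$defs" if key == "definitions" else key
            out[new_key] = {name: upgrade_draft7(sub) for name, sub in value.items()}
        elif key == "$ref" and isinstance(value, str):
            out["$ref"] = value.replace("#/definitions/", "#/$defs/")
        elif key == "items" and isinstance(value, list):
            # Draft 7 tuple form: one schema per array position.
            out["prefixItems"] = upgrade_draft7(value)
        elif key == "additionalItems":
            # Only meaningful next to the tuple form, where it becomes `items`.
            if isinstance(schema.get("items"), list):
                out["items"] = upgrade_draft7(value)
        elif key == "$schema":
            out["$schema"] = "https://json-schema.org/draft/2020-12/schema"
        else:
            out[key] = upgrade_draft7(value)
    return out


draft7 = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "definitions": {"tag": {"type": "string"}},
    "type": "object",
    "properties": {
        "label": {"$ref": "#/definitions/tag"},
        "pair": {
            "type": "array",
            "items": [{"type": "string"}, {"type": "integer"}],
            "additionalItems": False,
        },
    },
}
modern = upgrade_draft7(draft7)
```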
The inferred fields.
A reasonable inferrer reads each value and emits a type, a properties map for objects, an items spec for arrays, and a required list for keys that appeared in every sample. Beyond that baseline it can add format hints (date-time, email, uri) detected from string values, minLength/maxLength from observed string lengths, pattern from observed string shapes, and minimum/maximum from numeric extremes. The default behaviour is usually structure-only; richer constraints have to be opted in or set by hand.
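The structure-only baseline can be sketched in a few lines. The function name infer and the sample document are illustrative, and the array handling is a deliberate simplification: it assumes homogeneous arrays and describes items by the first element only.

```python
import json


def infer(value):
    """Infer a structure-only schema from one parsed JSON value."""
    if value is None:
        return {"type": "null"}
    if isinstance(value, bool):  # must come before int: bool subclasses int
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "integer"}
    if isinstance(value, float):
        return {"type": "number"}
    if isinstance(value, str):
        return {"type": "string"}
    if isinstance(value, list):
        schema = {"type": "array"}
        if value:
            # Simplification: assume all elements share the first element's shape.
            schema["items"] = infer(value[0])
        return schema
    if isinstance(value, dict):
        return {
            "type": "object",
            "properties": {key: infer(sub) for key, sub in value.items()},
            # With a single sample, every key that appeared counts as required.
            "required": list(value),
        }
    raise TypeError(f"not a JSON value: {type(value).__name__}")


schema = infer(json.loads('{"name": "Ada", "scores": [1, 2], "active": true}'))
print(json.dumps(schema, indent=2))
```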
A worked schema.
From { "id": 7, "email": "a@b.com", "tags": ["x", "y"] }, a useful schema is: { "type": "object", "properties": { "id": { "type": "integer" }, "email": { "type": "string", "format": "email" }, "tags": { "type": "array", "items": { "type": "string" } } }, "required": ["id","email","tags"] }. The format hint for email is detected; the integer vs number distinction comes from the absence of decimals in the sample.
Input: a single sample with object and array shape. Output: type: object → properties + items + required + format hints = a minimal but complete schema.
Each key is required because it appeared; email gets format: email because the value matched.
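The two detections in the worked example, the email format hint and the integer/number split, might look like the sketch below. The regular expressions are loose heuristics chosen for illustration, not the rules any particular inferrer uses, and string_schema and number_schema are hypothetical helper names.

```python
import json
import re
from urllib.parse import urlsplit

# Loose heuristics, not full RFC validation; a real tool would be stricter.
DATE_TIME_RE = re.compile(
    r"^\d{4}-\d{2}-\d{2}[Tt]\d{2}:\d{2}:\d{2}(\.\d+)?([Zz]|[+-]\d{2}:\d{2})$"
)
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def string_schema(text):
    """Emit a string schema, adding a format hint when the value matches one."""
    schema = {"type": "string"}
    if DATE_TIME_RE.match(text):
        schema["format"] = "date-time"
    elif EMAIL_RE.match(text):
        schema["format"] = "email"
    else:
        try:
            parts = urlsplit(text)
        except ValueError:  # urlsplit rejects some malformed inputs
            return schema
        if parts.scheme and parts.netloc:
            schema["format"] = "uri"
    return schema


def number_schema(number):
    """json.loads yields int for 7 and float for 7.0, so the absence of a
    decimal point in the sample is what selects "integer" here."""
    if isinstance(number, int) and not isinstance(number, bool):
        return {"type": "integer"}
    return {"type": "number"}


sample = json.loads('{ "id": 7, "email": "a@b.com", "tags": ["x", "y"] }')
id_schema = number_schema(sample["id"])
email_schema = string_schema(sample["email"])
tag_schema = string_schema(sample["tags"][0])
```

Because both hints come from one observed value, they are guesses: an id that happens to be whole in the sample may be fractional elsewhere, which is one reason to review inferred constraints by hand.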
The "no example shows it" problem.
Schema inference from a single document inflates "required": every field present is treated as required. With multiple samples the inferrer can widen properly: a field that is missing from any sample becomes optional. The right input is therefore a representative collection (a few hundred real documents), not one perfect example. The opposite mistake is generalising too far: a collection of heterogeneous documents (several unrelated shapes mixed together) will emit a schema with everything optional, which validates almost anything.
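The widening rule is essentially a set intersection: a key stays required only if every sample contains it. A minimal sketch for flat objects, with hypothetical names (json_type, infer_from_samples) and made-up sample data:

```python
def json_type(value):
    """Name the JSON type of a parsed value."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # before int: bool subclasses int
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "object"


def infer_from_samples(samples):
    """Merge flat object samples: properties are the union of keys seen,
    required is the intersection of keys across all samples."""
    if not samples:
        raise ValueError("need at least one sample")
    types = {}  # key -> set of observed type names, in first-seen key order
    required = set(samples[0])
    for doc in samples:
        required &= set(doc)
        for key, value in doc.items():
            types.setdefault(key, set()).add(json_type(value))
    properties = {}
    for key, names in types.items():
        ordered = sorted(names)
        properties[key] = {"type": ordered[0] if len(ordered) == 1 else ordered}
    return {
        "type": "object",
        "properties": properties,
        "required": [key for key in types if key in required],
    }


one = infer_from_samples([{"id": 1, "email": "a@b.com"}])
many = infer_from_samples([{"id": 1, "email": "a@b.com"}, {"id": 2}])
mixed = infer_from_samples([{"id": 1}, {"name": "x"}])
```

Here one marks both keys required, many keeps only id, and mixed (two unrelated shapes fed in together) ends up with an empty required list, the over-generalised schema described above.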
Composition with allOf / anyOf / oneOf.
These are the keywords that distinguish JSON Schema from simple type systems. allOf says "the value must validate against all of these schemas" and is used for inheritance-style extension. anyOf says "at least one" and is used for permissive unions. oneOf says "exactly one" and is used for discriminated unions where only one shape can fit. Naive inferrers don't produce these; sophisticated ones detect tagged unions in arrays and emit oneOf keyed on the tag field (OpenAPI adds a discriminator keyword for this; core JSON Schema has none).
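The three combinators differ only in how many subschemas must match, which a toy evaluator makes visible. The matches function below is an illustrative sketch supporting a tiny keyword subset (type, const, required, properties, and the combinators); the CIRCLE and SQUARE shapes with a kind tag are invented for the example.

```python
TYPE_CHECKS = {
    "object": lambda v: isinstance(v, dict),
    "string": lambda v: isinstance(v, str),
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
}


def matches(schema, value):
    """Return True if value satisfies the (tiny) supported keyword subset."""
    if "type" in schema and not TYPE_CHECKS[schema["type"]](value):
        return False
    if "const" in schema and value != schema["const"]:
        return False
    if isinstance(value, dict):
        if any(key not in value for key in schema.get("required", [])):
            return False
        for key, sub in schema.get("properties", {}).items():
            if key in value and not matches(sub, value[key]):
                return False

    def hits(subschemas):
        # Number of subschemas the value validates against.
        return sum(matches(sub, value) for sub in subschemas)

    if "allOf" in schema and hits(schema["allOf"]) != len(schema["allOf"]):
        return False  # every subschema must match
    if "anyOf" in schema and hits(schema["anyOf"]) < 1:
        return False  # at least one must match
    if "oneOf" in schema and hits(schema["oneOf"]) != 1:
        return False  # exactly one must match
    return True


# A tagged union: the "kind" field decides which shape applies.
CIRCLE = {
    "type": "object",
    "properties": {"kind": {"const": "circle"}, "radius": {"type": "integer"}},
    "required": ["kind", "radius"],
}
SQUARE = {
    "type": "object",
    "properties": {"kind": {"const": "square"}, "side": {"type": "integer"}},
    "required": ["kind", "side"],
}
SHAPE = {"oneOf": [CIRCLE, SQUARE]}

# Two overlapping subschemas: the value 5 satisfies both of them.
OVERLAP = [{"type": "integer"}, {"const": 5}]
```

With the overlapping pair, 5 passes allOf and anyOf but fails oneOf because two subschemas match. That is why oneOf suits tagged unions, where the const on the tag guarantees the branches are mutually exclusive.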
Schemas drive everything downstream.
A clean JSON Schema is the input to dozens of code generators (quicktype, openapi-generator, datamodel-code-generator), to validators in every major language, to mock-data generators, to API documentation tools, and to record-shape databases. Investing in the schema rather than the language-specific output is the durable path: it ages better than any one language's idioms, and a single schema fans out to a dozen consumers.