Understanding fake-data generation
A thousand convincing rows in a second.
What a fake-JSON generator does that Math.random() doesn't, the locales that matter, and the testing gotchas around deterministic seeds.
Faker is the library underneath.
Almost every "fake data" tool in JavaScript ultimately wraps Faker.js (or its modern fork @faker-js/faker). The library carries dictionaries of realistic-looking values (first names, last names, street addresses, company names, product titles, lorem-ipsum sentences) and provides generators that pick plausible combinations. The output looks like real data at a glance because the parts are real.
Schema-driven generation.
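A fake-JSON generator takes a schema (often described as field name → generator type) and produces N records matching it. Common types: name.fullName, internet.email, address.streetAddress, date.recent, commerce.price, lorem.paragraph. These follow the older Faker.js module names; recent @faker-js/faker releases rename some of them (for example, name became person and address became location). The schema spells out the shape; the generator fills in plausible values for each row.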
A worked schema.
Schema: { id: "datatype.uuid", name: "name.fullName", email: "internet.email", joined: "date.past" }. Generating 3 rows might produce: [
{ id: "9b...", name: "Cora Wilkinson", email: "cora.wilkinson@gmail.com", joined: "2024-02-14" },
{ id: "5a...", name: "Marvin Lakin", email: "marvin.lakin84@hotmail.com", joined: "2023-11-02" },
{ id: "f2...", name: "Aniyah Smith", email: "aniyah62@yahoo.com", joined: "2025-08-19" }
]
Each row's email is locally consistent (it loosely matches the name), and the dates fall in a sensible range. Convincing enough for a screenshot.
Figure: Schema → rows. Field types map to generators; one pass per row picks one value per field. Example: schema={id,name,email,joined} → 100 plausible records of realistic-looking test data.
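The schema-to-rows pass can be sketched in a few lines. This is a minimal stand-in that runs without dependencies, not the Faker API. The dictionaries, the GENERATORS registry, and the generate function are hypothetical names made up for illustration, and the type strings simply reuse the ones from the worked schema.

```typescript
// Hypothetical stand-in for a Faker-style tool: tiny hand-written dictionaries
// and generators, NOT the real Faker API.
type Rng = () => number; // returns a float in [0, 1)
type Generator = (rng: Rng) => string;
type Row = Record<string, string>;

const FIRST_NAMES = ["Cora", "Marvin", "Aniyah"];
const LAST_NAMES = ["Wilkinson", "Lakin", "Smith"];

// Pick one dictionary entry.
function pick(items: string[], rng: Rng): string {
  return items[Math.floor(rng() * items.length)];
}

// Build a UUID-shaped hex string (8-4-4-4-12); not a spec-compliant UUID.
function hexId(rng: Rng): string {
  const hex = (n: number): string =>
    Array.from({ length: n }, () => Math.floor(rng() * 16).toString(16)).join("");
  return [hex(8), hex(4), hex(4), hex(4), hex(12)].join("-");
}

// Registry: schema type name -> generator function.
const GENERATORS: Record<string, Generator> = {
  "datatype.uuid": hexId,
  "name.fullName": (rng) => `${pick(FIRST_NAMES, rng)} ${pick(LAST_NAMES, rng)}`,
  "internet.email": (rng) =>
    `${pick(FIRST_NAMES, rng).toLowerCase()}${Math.floor(rng() * 100)}@example.com`,
  // A date 1 to 365 days before a fixed reference day, as YYYY-MM-DD.
  "date.past": (rng) => {
    const reference = Date.UTC(2025, 0, 1);
    const daysBack = 1 + Math.floor(rng() * 365);
    return new Date(reference - daysBack * 86400000).toISOString().slice(0, 10);
  },
};

// One pass per row: look up each field's generator and call it once.
function generate(schema: Record<string, string>, count: number, rng: Rng = Math.random): Row[] {
  const rows: Row[] = [];
  for (let i = 0; i < count; i++) {
    const row: Row = {};
    for (const [field, type] of Object.entries(schema)) {
      const generator = GENERATORS[type];
      if (!generator) throw new Error(`Unknown generator type: ${type}`);
      row[field] = generator(rng);
    }
    rows.push(row);
  }
  return rows;
}

const schema = { id: "datatype.uuid", name: "name.fullName", email: "internet.email", joined: "date.past" };
const rows = generate(schema, 3);
```

Each field is generated independently in this sketch, so the email will not necessarily match the name. Producing the loosely matching pairs shown in the worked example would require the email generator to see the name already chosen for the row.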
Seeding for reproducible tests.
Pass a seed (faker.seed(42)) and the generator becomes deterministic: the same sequence of calls produces the same output every time. This is the difference between fake data you can use in unit tests (deterministic, so assertions can pin to specific values) and fake data that's just for screenshots (random per run). Generators that don't expose a seed force you to snapshot the output, which makes refactors painful.
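To show what a seed buys in a test, here is a small sketch built on mulberry32, a compact seedable PRNG chosen here purely for illustration. It is not the generator Faker uses, and NAMES and seededNames are hypothetical. The second half shows a common gotcha: with a seeded generator, values depend on how many draws have happened since seeding.

```typescript
// mulberry32: a small seedable PRNG, used here only for illustration.
// It is NOT the generator Faker uses internally.
function mulberry32(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

const NAMES = ["Cora", "Marvin", "Aniyah", "Lena", "Omar"]; // hypothetical dictionary

// Draw `count` names from a generator freshly seeded with `seed`.
function seededNames(seed: number, count: number): string[] {
  const rng = mulberry32(seed);
  return Array.from({ length: count }, () => NAMES[Math.floor(rng() * NAMES.length)]);
}

// Same seed, same sequence of calls: the two runs are identical.
const firstRun = seededNames(42, 5);
const secondRun = seededNames(42, 5);
const sameEveryTime = firstRun.join() === secondRun.join();

// Gotcha: one extra draw up front shifts every value that follows.
const shiftedRng = mulberry32(42);
shiftedRng(); // an unrelated call added during a refactor
const shifted = Array.from({ length: 4 }, () => NAMES[Math.floor(shiftedRng() * NAMES.length)]);
// `shifted` lines up with firstRun.slice(1), not with firstRun.slice(0, 4).
```

A practical consequence is to seed once at the start of each test rather than once per suite, so that adding a generator call in one test does not change the data seen by the next.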
Locales matter.
Names that look right in English look obviously wrong in a Korean app. Faker supports dozens of locales — fr, de, ja, ko, ar — that swap in locally-appropriate dictionaries. Picking the right locale is the difference between fake data that surfaces real rendering bugs (longest-name layout, RTL alignment, character-set support) and fake data that's invisible to half your problem cases.
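As a rough sketch of how swapping dictionaries surfaces layout cases, the snippet below keeps a few names per locale and extracts the longest name and the text direction for each. The LOCALES table and layoutFixture are hypothetical, and the dictionaries are a handful of hand-picked names rather than anything a real library ships.

```typescript
// Hypothetical mini-dictionaries; a real library ships far larger ones.
type Direction = "ltr" | "rtl";
type LocaleData = { direction: Direction; fullNames: string[] };

const LOCALES: Record<string, LocaleData> = {
  en: { direction: "ltr", fullNames: ["Cora Wilkinson", "Marvin Lakin"] },
  de: { direction: "ltr", fullNames: ["Maximilian Schwarzenberger", "Jörg Müller"] },
  ko: { direction: "ltr", fullNames: ["김민준", "이서연"] },
  ar: { direction: "rtl", fullNames: ["محمد العبدالله", "فاطمة حسن"] },
};

// Build a layout fixture for one locale: its longest name plus text direction.
// Length is counted in UTF-16 code units, which is only a rough proxy for width.
function layoutFixture(locale: string): { longestName: string; direction: Direction } {
  const data = LOCALES[locale];
  if (!data) throw new Error(`No dictionary for locale: ${locale}`);
  const longestName = data.fullNames.reduce((a, b) => (b.length > a.length ? b : a));
  return { longestName, direction: data.direction };
}

// One fixture per locale, ready to feed into a rendering test.
const fixtures = Object.keys(LOCALES).map((locale) => ({ locale, ...layoutFixture(locale) }));
```

Rendering each fixture in the same component is a cheap way to check truncation, right-to-left alignment, and font coverage in one pass.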
Don't seed real systems.
Fake-but-plausible data is great for local dev and CI. It's a problem in a system that does anything real: an automated email to aniyah62@yahoo.com generated as test data may land in a real inbox. A generated credit card number may happen to pass the Luhn check at the payment processor. Keep fake data in isolated environments, and tag every record so that any that escape to production can be detected.
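Both risks are easy to make concrete. The sketch below implements the standard Luhn checksum, to show that a plausible-looking card number can pass it, and adds one possible tagging convention. The _synthetic field, tagSynthetic, and isSynthetic are hypothetical names rather than an established standard.

```typescript
// Luhn checksum: from the right, double every second digit, subtract 9 from
// any result above 9, and require the total to be divisible by 10.
function passesLuhn(cardNumber: string): boolean {
  const digits = cardNumber.replace(/\D/g, "");
  if (digits.length === 0) return false;
  let sum = 0;
  let doubleNext = false;
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48; // "0" is char code 48
    if (doubleNext) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
    doubleNext = !doubleNext;
  }
  return sum % 10 === 0;
}

// Hypothetical tagging convention: every generated record carries a marker.
type Tagged<T> = T & { _synthetic: true };

function tagSynthetic<T extends object>(record: T): Tagged<T> {
  return { ...record, _synthetic: true as const };
}

// Guard for code paths that touch real systems, e.g. before sending email.
function isSynthetic(record: object): boolean {
  return (record as { _synthetic?: unknown })._synthetic === true;
}

// A widely published test card number passes the checksum; changing the
// last digit breaks it.
const looksValid = passesLuhn("4111 1111 1111 1111");
const looksInvalid = passesLuhn("4111 1111 1111 1112");

// The example.com domain is reserved for documentation, so mail cannot reach a real inbox.
const fakeUser = tagSynthetic({ name: "Aniyah Smith", email: "aniyah62@example.com" });
const blocked = isSynthetic(fakeUser);
```

A guard like isSynthetic only helps if the marker survives every serialization step, so it is worth storing it as a real column or field rather than inferring it from the shape of the data.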