CSV → JSON Dataset

CSV + inferred column metadata for ML pipelines.

Runs in your browser
JSON dataset with metadata
{
  "metadata": {
    "rowCount": 5,
    "columnCount": 6,
    "columns": [
      {
        "name": "id",
        "type": "integer",
        "nullCount": 0,
        "uniqueCount": 5
      },
      {
        "name": "name",
        "type": "string",
        "nullCount": 0,
        "uniqueCount": 5
      },
      {
        "name": "email",
        "type": "string",
        "nullCount": 0,
        "uniqueCount": 1
      },
      {
        "name": "age",
        "type": "integer",
        "nullCount": 0,
        "uniqueCount": 5
      },
      {
        "name": "active",
        "type": "boolean",
        "nullCount": 0,
        "uniqueCount": 2
      },
      {
        "name": "city",
        "type": "string",
        "nullCount": 0,
        "uniqueCount": 5
      }
    ]
  },
  "data": [
    {
      "id": 1,
      "name": "Ada",
      "email": "[email protected]",
      "age": 36,
      "active": true,
      "city": "London"
    },
    {
      "id": 2,
      "name": "Grace",
      "email": "[email protected]",
      "age": 85,
      "active": true,
      "city": "New York"
    },
    {
      "id": 3,
      "name": "Alan",
      "email": "[email protected]",
      "age": 41,
      "active": false,
      "city": "Cambridge"
    },
    {
      "id": 4,
      "name": "Linus",
      "email": "[email protected]",
      "age": 55,
      "active": true,
      "city": "Helsinki"
    },
    {
      "id": 5,
      "name": "Margaret",
      "email": "[email protected]",
      "age": 87,
      "active": true,
      "city": "Boston"
    }
  ]
}
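The metadata block above could be derived along the following lines. This is a minimal sketch, not the tool's actual code: the function name `column_metadata` is hypothetical, and it assumes an empty cell counts as null and that `uniqueCount` counts distinct non-empty values. Type inference is left out here.

```python
import csv
import io


def column_metadata(csv_text: str) -> dict:
    """Summarise a CSV: row/column counts plus per-column null and unique counts."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    names = list(reader.fieldnames or [])

    columns = []
    for name in names:
        cells = [row[name] for row in rows]
        # Assumption: empty strings (and cells missing from short rows) are nulls.
        non_empty = [cell for cell in cells if cell not in (None, "")]
        columns.append({
            "name": name,
            "nullCount": len(cells) - len(non_empty),
            "uniqueCount": len(set(non_empty)),
        })

    return {"rowCount": len(rows), "columnCount": len(names), "columns": columns}


sample = "id,city\n1,London\n2,\n3,London\n"
meta = column_metadata(sample)
```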

Understanding CSV → JSON dataset

Tabular rows into nested objects — for ML training and API responses.

What "nested" means in this context, the type-inference rules that mostly work, and the size threshold where JSONL beats one-big-array.

The simple translation.

A CSV with a header row plus N data rows becomes a JSON array of N objects. Each object's keys are the column headers; each value is the corresponding cell. The trivial conversion takes ~5 lines of code in any language. The interesting decisions are downstream: type inference, nested-object reconstruction, and how to chunk large files.
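As an illustration of that trivial conversion, here is a short sketch using Python's standard library. The helper name `csv_to_records` and the sample rows are made up for the example; every value stays a string because no type inference is applied yet.

```python
import csv
import io
import json


def csv_to_records(csv_text: str) -> list[dict[str, str]]:
    """Turn CSV text with a header row into a list of dicts (values stay strings)."""
    return list(csv.DictReader(io.StringIO(csv_text)))


sample = "id,name,city\n1,Ada,London\n2,Grace,New York\n"
records = csv_to_records(sample)
print(json.dumps(records, indent=2))
```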

Type inference.

CSV is text-only. The converter has to guess types: "42" → integer, "3.14" → float, "true" → boolean, "2026-05-14" → ISO date string. Edge cases bite. "001" — a ZIP code or an integer? Usually you want the string. "True" — capitalised, still a boolean? Most parsers say yes; some say no. "1.0" vs "1" — same value, different type. The safe default is "infer from the column as a whole, not per-cell", and let the user override with a type hint.

Nested keys.

When column names contain dots (address.city, address.zip, address.country), they often encode nested structure. The conversion can re-nest: each row becomes { address: { city, zip, country } } instead of flat keys. Useful for feeding REST APIs that expect nested JSON. Optional; some downstream tools prefer the flat form for tabular processing.
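The re-nesting step can be sketched as a small function over one flat row. The name `nest_row` and the sample values are illustrative assumptions; the sketch raises an error when a column is used both as a plain value and as a parent of dotted columns, which is one possible policy among several.

```python
def nest_row(flat: dict, sep: str = ".") -> dict:
    """Rebuild nested objects from dotted column names, e.g. "address.city"."""
    nested: dict = {}
    for key, value in flat.items():
        parts = key.split(sep)
        node = nested
        # Walk (or create) the intermediate objects for every part but the last.
        for part in parts[:-1]:
            child = node.setdefault(part, {})
            if not isinstance(child, dict):
                raise ValueError(f"column {key!r} conflicts with scalar column {part!r}")
            node = child
        leaf = parts[-1]
        if isinstance(node.get(leaf), dict):
            raise ValueError(f"column {key!r} conflicts with nested columns under it")
        node[leaf] = value
    return nested


row = {"id": 1, "address.city": "London", "address.zip": "N1", "address.country": "UK"}
nested_row = nest_row(row)
```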

JSONL vs JSON-array.

A small dataset (< 100k rows): output one JSON array. Fits in memory; loads with JSON.parse. A large dataset (millions of rows, training data): output JSONL, which is one JSON object per line with no surrounding array. Streams line-by-line; works with readline + JSON.parse; doesn't blow memory. ML pipelines, BigQuery, OpenAI fine-tuning APIs all consume JSONL. Knowing which format your downstream wants is the first question.
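The difference between the two outputs is easiest to see side by side. The sketch below uses Python's `json` module in place of JSON.parse; the function names and sample rows are assumptions for illustration.

```python
import io
import json
from typing import Iterable, Iterator, TextIO


def to_json_array(rows: Iterable[dict]) -> str:
    """One document: the whole dataset must be held in memory to build or parse it."""
    return json.dumps(list(rows))


def write_jsonl(rows: Iterable[dict], out: TextIO) -> int:
    """One JSON object per line; rows are written as they arrive."""
    count = 0
    for row in rows:
        out.write(json.dumps(row) + "\n")
        count += 1
    return count


def read_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Parse one line at a time, so only a single row is in memory at once."""
    for line in lines:
        if line.strip():
            yield json.loads(line)


rows = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}]
buffer = io.StringIO()
written = write_jsonl(rows, buffer)
buffer.seek(0)
round_trip = list(read_jsonl(buffer))
```

Because `read_jsonl` is a generator, a consumer can stop early or process rows one by one without ever loading the full file.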

A worked conversion.

A 50k-row CSV of e-commerce orders with 12 columns including shipping.address and shipping.country. Convert with type inference + nesting: output is 50k JSON objects, each with a shipping sub-object. File size grows ~20% vs CSV (JSON syntax overhead) but the structure is now consumable by the frontend's table component without extra parsing. For ML training it would have been better as JSONL; for the dashboard it's fine as one array.

In short: 50k orders CSV, flat → nested. Header dots become sub-objects: shipping.address + shipping.country → { shipping: {...} }. Result: a 50k-object array, about 20% larger.

When to keep CSV.

Excel and Sheets users, who can't open JSON directly. Reporting pipelines that consume CSV directly. Tools like AWS Athena and Snowflake that bulk-load CSV faster than JSON. Anywhere the consumer is tabular by nature. Convert to JSON only when the consumer needs nested structure, when the data is heading to a JSON-only API, or when the rows will be streamed one at a time. The conversion is cheap; the choice should be need-driven.

Frequently asked questions

Quick answers.

How does the tool handle data types?

The converter checks each value to see if it represents a number or a boolean. If a column consistently contains numeric patterns, it is output as a `number` type in the JSON rather than a string.

Does my data leave my computer?

No. The parsing and transformation logic is executed locally within your browser session. No data is sent to our servers or stored in any database.

What happens if my CSV has no header row?

The tool assumes the first row contains labels for the JSON keys. If headers are missing, the resulting objects will use the first row of data as keys, so it is best to provide a header-first file.

Is there a limit on file size?

The limit depends on your browser's available memory. For datasets exceeding several hundred megabytes, we recommend using a dedicated local script to avoid browser tab crashes.
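A dedicated local script for large files can be quite small. The following is a minimal sketch, assuming UTF-8 input with a header row and JSONL output; the function name `csv_file_to_jsonl` is hypothetical, and values are written as strings with no type inference.

```python
import csv
import json


def csv_file_to_jsonl(csv_path: str, jsonl_path: str) -> int:
    """Stream a CSV file to JSONL row by row; returns the number of rows written."""
    count = 0
    with open(csv_path, newline="", encoding="utf-8") as src, \
            open(jsonl_path, "w", encoding="utf-8") as dst:
        # DictReader reads lazily, so memory use stays flat regardless of file size.
        for row in csv.DictReader(src):
            dst.write(json.dumps(row, ensure_ascii=False) + "\n")
            count += 1
    return count
```

Calling `csv_file_to_jsonl("orders.csv", "orders.jsonl")` (file names are placeholders) converts the file without loading it whole.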
