CSV Statistical Analyzer

Mean, median, std, quartiles per numeric column.

Runs in your browser

id

integer

5 rows · 0 nulls · 5 unique

min: 1.00
max: 5.00
mean: 3.00
median: 3.00
stdDev: 1.41
Q1/Q3: 2.00 / 4.00

name

string

5 rows · 0 nulls · 5 unique

  • Ada (1)
  • Grace (1)
  • Alan (1)
  • Linus (1)
  • Margaret (1)

email

string

5 rows · 0 nulls · 1 unique

  • [email protected] (5)

age

integer

5 rows · 0 nulls · 5 unique

min: 36.00
max: 87.00
mean: 60.80
median: 55.00
stdDev: 21.51
Q1/Q3: 41.00 / 85.00

active

boolean

5 rows · 0 nulls · 2 unique

  • true (4)
  • false (1)

city

string

5 rows · 0 nulls · 5 unique

  • London (1)
  • New York (1)
  • Cambridge (1)
  • Helsinki (1)
  • Boston (1)

Understanding CSV statistics

Summary stats — the first pass on every new dataset.

The five-number summary, why mean lies about skewed data, and the column-by-column ritual that catches data-quality problems early.

The five-number summary.

For any numeric column: minimum, first quartile (Q1, the 25th percentile), median (Q2, the 50th), third quartile (Q3, the 75th), maximum. These five describe the distribution without assuming shape — equally useful for normal data and for long-tailed counts. The boxplot is a graphical summary of these five. Most "what does this column look like?" questions are answered by the five numbers.

IQR = Q3 − Q1 ; outliers if x < Q1 − 1.5·IQR or x > Q3 + 1.5·IQR
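As a minimal sketch of the five numbers and the outlier rule above, the following Python uses only the standard library. The function names are illustrative, and the quartile method (linear interpolation, `method="inclusive"`) is an assumption: it reproduces the Q1/Q3 shown for the age column in the sample output, but the analyzer's own method is not documented here. The five ages are the values implied by that column's min, Q1, median, Q3, and max.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using linear-interpolation quartiles."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]

def tukey_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in values if x < low or x > high]

ages = [36, 41, 55, 85, 87]
print(five_number_summary(ages))     # (36, 41.0, 55.0, 85.0, 87)
print(tukey_outliers(ages))          # [] : nothing beyond the fences
print(tukey_outliers(ages + [400]))  # [400] : a likely data-entry error
```

Other quartile conventions (for example `method="exclusive"`) give slightly different Q1 and Q3 on small samples, so two tools can disagree on the same column without either being wrong.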

Mean vs median.

For symmetric distributions, mean and median roughly agree. For skewed distributions (income, sales, web traffic, anything with a long right tail), the mean is pulled toward the tail and overestimates the typical value. The median is robust to outliers. Always look at both: when they disagree, the median is usually more honest. Reporting "average customer spend" using the mean for a distribution with a few whale customers is a classic misleading statistic.
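To make the whale effect concrete, here is a small sketch with made-up spend figures (the numbers and the function name are hypothetical). Adding one large customer moves the mean by an order of magnitude while the median barely shifts.

```python
import statistics

def mean_vs_median(values):
    """Return (mean, median, mean/median ratio); assumes a non-zero median."""
    mean = statistics.fmean(values)
    median = statistics.median(values)
    return mean, median, mean / median

# Hypothetical customer spend: nine ordinary customers, then one whale.
spend = [20, 25, 30, 35, 40, 45, 50, 55, 60]

print(mean_vs_median(spend))           # mean and median both 40
print(mean_vs_median(spend + [5000]))  # mean 536.0, median 42.5
```

A ratio far from 1 is a cheap signal that the column is skewed and that the median is the better "typical value" to report.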

Spread — std-dev vs IQR.

Standard deviation measures spread around the mean, which makes it useful for normal-ish data. For skewed or heavy-tailed data, a few extreme values inflate it, so it exaggerates the spread of typical values. The IQR (Q3 − Q1) is a robust alternative: it captures the middle 50% of values regardless of tails. For normal data the IQR is about 1.35 standard deviations, so the std-dev is roughly 0.74 × IQR; a std-dev much larger than that, and certainly one larger than 1.5 × IQR, is a sign the distribution isn't normal and analysis should use the median + IQR pair.
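The comparison between std-dev and IQR can be sketched as follows. For a normal distribution the IQR is about 1.35 standard deviations, which gives a reference std/IQR ratio of about 0.74. The population std-dev (`pstdev`) is used here as an assumption, because it reproduces the 21.51 shown for the age column in the sample output; the spend values and function name are hypothetical.

```python
import statistics

def spread_report(values):
    """Return (population std-dev, IQR, std/IQR ratio)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    std = statistics.pstdev(values)
    iqr = q3 - q1
    return std, iqr, std / iqr

# Reference ratio for a normal distribution: IQR = 2 * z(0.75) * sigma.
NORMAL_RATIO = 1 / (2 * statistics.NormalDist().inv_cdf(0.75))  # about 0.74

ages = [36, 41, 55, 85, 87]                           # from the sample output
spend = [20, 25, 30, 35, 40, 45, 50, 55, 60, 5000]    # hypothetical, one whale

for name, column in (("ages", ages), ("spend", spend)):
    std, iqr, ratio = spread_report(column)
    flag = "heavy tail?" if ratio > 1.5 else "ok"
    print(f"{name}: std={std:.2f} iqr={iqr:.2f} ratio={ratio:.2f} {flag}")
```

The single whale pushes the spend ratio far above both the normal reference and the 1.5 threshold, while the IQR stays at 22.5.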

What to report for every column.

Numeric columns: count, min, Q1, median, mean, Q3, max, std-dev, count of NaN/empty. Categorical columns: unique count, top 5 values + frequencies, count of empty. Datetime columns: min, max, count of empty, frequency by month/year. The data summary should fit on one screen per column; reading it is the first sanity check before any analysis.
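The per-column checklist above might be implemented along these lines. This is a sketch under stated assumptions, not the analyzer's code: cells arrive as strings, the set of missing-value markers is a guess, quartiles use linear interpolation, and datetime columns are left out for brevity.

```python
import statistics
from collections import Counter

MISSING = {"", "nan", "null", "na"}  # assumed markers for empty cells

def is_missing(cell):
    return cell is None or cell.strip().lower() in MISSING

def profile_numeric(cells):
    """Count, five-number summary, mean, std-dev, and missing count."""
    values = sorted(float(c) for c in cells if not is_missing(c))
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "count": len(values),
        "missing": len(cells) - len(values),
        "min": values[0], "q1": q1, "median": median,
        "mean": statistics.fmean(values),
        "q3": q3, "max": values[-1],
        "std": statistics.pstdev(values),
    }

def profile_categorical(cells, top=5):
    """Unique count, missing count, and the most frequent values."""
    present = [c.strip() for c in cells if not is_missing(c)]
    counts = Counter(present)
    return {
        "unique": len(counts),
        "missing": len(cells) - len(present),
        "top": counts.most_common(top),
    }

age_profile = profile_numeric(["36", "41", "", "55", "85", "87"])
active_profile = profile_categorical(["true", "true", "false", "true", "true"])
print(age_profile)
print(active_profile)
```

Each profile is a flat dictionary, so it prints on a few lines per column, in keeping with the one-screen rule.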

A worked profile.

A 100k-row sales CSV with 12 columns. Profile run: order_total mean $87, median $42: strongly right-skewed (some whale orders). customer_id has 8 NaN values (data-entry bug). order_date ranges 2024-01 to 2026-05, with a suspicious gap in Feb 2025 (system outage?). region has 5 unique values, one mis-spelled ("Eurpoe" appears 12 times). Three data-quality issues found in 30 seconds; without the profile, they'd surface as wrong numbers in the dashboard a week later.

100k-row sales profile

12 columns × 8 stats

Run profile, scan for outliers/NaN/typos.

mean $87 vs median $42 ; 8 NaN ; 1 typo

= 3 issues caught early
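Two of the checks from the worked profile, the missing month and the rare mis-spelled category, can be automated with a few lines. The sketch below uses hypothetical data and function names; the 1% rarity threshold is an arbitrary assumption to tune per dataset.

```python
from collections import Counter
from datetime import date

def missing_months(dates):
    """Return (year, month) pairs between the first and last date with no rows."""
    seen = {(d.year, d.month) for d in dates}
    (year, month), last = min(seen), max(seen)
    gaps = []
    while (year, month) <= last:
        if (year, month) not in seen:
            gaps.append((year, month))
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return gaps

def rare_values(cells, max_share=0.01):
    """Return values whose share of the column is at or below max_share."""
    counts = Counter(cells)
    total = sum(counts.values())
    return sorted(v for v, n in counts.items() if n / total <= max_share)

# Hypothetical rows echoing the profile: no orders in Feb 2025, one typo.
order_dates = [date(2025, 1, 15), date(2025, 1, 20), date(2025, 3, 2), date(2025, 4, 9)]
regions = ["Europe"] * 2000 + ["Asia"] * 1500 + ["Eurpoe"] * 12

print(missing_months(order_dates))  # [(2025, 2)]
print(rare_values(regions))         # ['Eurpoe']
```

A rare value is only a candidate typo; a legitimately small category will be flagged too, so the output is a list to review rather than a list to delete.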

vs full data exploration.

The summary is the first 5 minutes. Real data exploration adds: pairwise correlations, distributions plotted as histograms or density curves, scatter matrix for relationships, time-series plots for temporal columns, group-by aggregations for categorical interactions. Tools like ydata-profiling (the current name of pandas-profiling) auto-generate all of that as an HTML report. The CSV summary stats are the cheap first pass; the report is the second.
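Two of those second-pass steps, a pairwise correlation and a group-by aggregation, are small enough to sketch with the standard library. The function names, column names, and rows below are hypothetical; a profiling library would compute these for every column pair.

```python
import math
import statistics
from collections import defaultdict

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric columns."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)  # undefined (division by zero) for a constant column

def median_by_group(rows, key, value):
    """Median of `value` for each distinct `key` in a list of row dicts."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    return {group: statistics.median(vals) for group, vals in groups.items()}

rows = [
    {"region": "Europe", "order_total": 40},
    {"region": "Europe", "order_total": 60},
    {"region": "Asia", "order_total": 30},
]
print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))  # close to 1.0 for a perfect line
print(median_by_group(rows, "region", "order_total"))
```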

Frequently asked questions

Quick answers.

What statistical measures are included?

The tool calculates count, sum, mean, median, mode, variance, standard deviation, and the five-number summary (min, Q1, median, Q3, max).

Does my CSV file leave my computer?

No. The analyzer uses the File API to read your document directly in the browser; the data is processed in memory and never transmitted to our servers.

How does it handle non-numeric values?

It identifies columns containing numbers and ignores any cells with text or symbols. If a column is entirely non-numeric, it gets no numeric statistics; as in the sample output above, it is summarized by row, null, and unique counts plus value frequencies.
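One plausible way to classify columns and skip non-numeric cells is sketched below. This is an assumption about the general approach, not the analyzer's actual logic; the type labels mirror those in the sample output (integer, boolean, string), and the function names are illustrative.

```python
def _parses(cell, parser):
    """True if parser(cell) succeeds, e.g. int("36") or float("1.5")."""
    try:
        parser(cell)
        return True
    except ValueError:
        return False

def infer_type(cells):
    """Classify a column of string cells, ignoring empty cells."""
    present = [c.strip() for c in cells if c.strip()]
    if not present:
        return "empty"
    if all(c.lower() in ("true", "false") for c in present):
        return "boolean"
    if all(_parses(c, int) for c in present):
        return "integer"
    if all(_parses(c, float) for c in present):
        return "float"
    return "string"

def numeric_cells(cells):
    """Keep only the cells that parse as numbers; text and symbols are skipped."""
    return [float(c) for c in cells if _parses(c.strip(), float)]

print(infer_type(["1", "2", "3", "4", "5"]))     # integer
print(infer_type(["true", "false", "true"]))     # boolean
print(infer_type(["Ada", "Grace"]))              # string
print(numeric_cells(["36", "n/a", "41", ""]))    # [36.0, 41.0]
```

Note that Python's `float` also accepts strings such as "nan" and "inf", so a stricter parser may be needed if those can appear as text in the data.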

Is there a file size limit?

The limit depends on your browser's available memory. Most modern browsers can handle CSV files up to several hundred megabytes without difficulty.
