Audio Waveform Visualizer

Render any audio file as a waveform PNG.

MP3, WAV, FLAC, OGG, M4A, AAC — anything the browser can decode.

Everything decodes locally via the Web Audio API — your file never leaves your browser.

Understanding audio waveforms

A line that tells you what's loud.

What an audio waveform image actually shows, how it gets generated, and the downsampling decisions that make it look how it does.

It's a plot of amplitude over time.

A waveform image shows audio amplitude on the vertical axis and time on the horizontal. Each pixel column represents some span of time (a few milliseconds in a zoomed-in view, multiple seconds in an overview); its height represents the peak amplitude within that span. The result is a visual outline of the sound's loudness shape — useful for spotting silences, locating loud moments, and navigating long files.
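As a rough illustration of how much time one pixel column covers, the short Python sketch below divides the file duration by the image width. The function name seconds_per_column is made up for this example, and the durations are illustrative values rather than anything the tool reports.

```python
def seconds_per_column(duration_s: float, width_px: int) -> float:
    """Time span represented by a single pixel column."""
    if width_px <= 0:
        raise ValueError("width_px must be positive")
    return duration_s / width_px


# Overview: a 60-minute file squeezed into 1000 columns.
overview = seconds_per_column(60 * 60, 1000)   # 3.6 seconds per column

# Zoomed in: 5 seconds of audio spread across the same 1000 columns.
zoomed = seconds_per_column(5, 1000)           # 0.005 seconds (5 ms) per column
```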

The downsampling step.

A 60-minute audio file at 44.1 kHz contains about 159 million samples per channel (3,600 s × 44,100 samples/s). A 1000-pixel-wide waveform image has 1000 columns, so each column aggregates about 159,000 samples. There are three common ways to aggregate: peak (the loudest sample in the bucket, which shows the envelope), RMS (the root-mean-square, which tracks signal energy and is a rough proxy for perceived loudness), or hybrid (peak as the outline, RMS as the inner filled area). The choice changes the visual character of the result.
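The aggregation choices can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual implementation: the function names (peak, rms, downsample) are invented for this example, and samples are assumed to be mono floats between -1.0 and 1.0.

```python
import math
from typing import List, Sequence, Tuple


def peak(bucket: Sequence[float]) -> float:
    """Largest absolute sample value in the bucket: traces the envelope."""
    return max((abs(s) for s in bucket), default=0.0)


def rms(bucket: Sequence[float]) -> float:
    """Root-mean-square of the bucket: follows energy, not the single loudest sample."""
    if not bucket:
        return 0.0
    return math.sqrt(sum(s * s for s in bucket) / len(bucket))


def downsample(samples: Sequence[float], n_columns: int) -> List[Tuple[float, float]]:
    """Return one (peak, rms) pair per pixel column.

    A hybrid rendering would draw the peak as the outline and the RMS as the
    inner filled area.
    """
    n = len(samples)
    columns = []
    for col in range(n_columns):
        # Integer arithmetic spreads any remainder evenly across the columns.
        start = col * n // n_columns
        end = (col + 1) * n // n_columns
        bucket = samples[start:end]
        columns.append((peak(bucket), rms(bucket)))
    return columns


# Arithmetic from the text: 60 minutes of mono audio at 44.1 kHz, 1000 columns.
total_samples = 60 * 60 * 44_100            # 158,760,000
samples_per_column = total_samples // 1000  # 158,760

# Tiny demo: a quiet half followed by a loud half, squeezed into 2 columns.
demo = [0.1, -0.1, 0.1, -0.1, 0.5, -1.0, 0.5, -0.5]
result = downsample(demo, 2)
```

In the demo, the second column's peak is 1.0 while its RMS is noticeably lower, which is the gap between outline and filled area in a hybrid rendering.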

A worked render.

A 5-minute song rendered as a 1200×200 PNG:

ffmpeg -i song.mp3 -filter_complex "showwavespic=s=1200x200:colors=#FFA500" out.png

This produces an orange peak-style waveform. The track's quiet intro shows as a narrow line, the chorus fills the full height, and the silence at the end drops to a flat line. The image is a compact navigation aid for audio editing: it shows users where to click.

Aggregation per pixel

13.2M samples (5 minutes at 44.1 kHz) / 1200 pixels ≈ 11k samples per column

Each column shows the peak (or RMS) of its sample bucket.

peak(bucket) → pixel height

= Compact visual outline
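The peak(bucket) → pixel height mapping, and the step from heights to an image, can be sketched in plain Python. This is an illustrative stand-in, not how ffmpeg's showwavespic filter works internally: the function names (column_heights, render_png), the vertically centred bar layout, the 1-pixel minimum height, and the sample peak values are all assumptions made for this example. It uses only the standard library and emits an 8-bit RGB PNG.

```python
import struct
import zlib
from typing import List, Sequence, Tuple

RGB = Tuple[int, int, int]


def column_heights(peaks: Sequence[float], image_height: int) -> List[int]:
    """Map peak amplitudes (0.0 to 1.0) to bar heights in pixels.

    Every column gets at least 1 pixel so silence still draws a thin line.
    """
    heights = []
    for p in peaks:
        clamped = min(max(p, 0.0), 1.0)
        heights.append(max(1, round(clamped * image_height)))
    return heights


def _chunk(tag: bytes, data: bytes) -> bytes:
    """Build one PNG chunk: length, type, data, CRC-32 over type + data."""
    crc = zlib.crc32(tag + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + tag + data + struct.pack(">I", crc)


def render_png(peaks: Sequence[float], image_height: int,
               fg: RGB = (255, 165, 0), bg: RGB = (0, 0, 0)) -> bytes:
    """Draw one vertically centred bar per peak and return the PNG file bytes."""
    if not peaks or image_height <= 0:
        raise ValueError("need at least one peak and a positive height")
    width = len(peaks)
    heights = column_heights(peaks, image_height)
    rows = []
    for y in range(image_height):
        row = bytearray([0])  # filter type 0 (none) starts every scanline
        for h in heights:
            top = (image_height - h) // 2
            row.extend(fg if top <= y < top + h else bg)
        rows.append(bytes(row))
    # IHDR: width, height, bit depth 8, colour type 2 (RGB), no interlace.
    header = struct.pack(">IIBBBBB", width, image_height, 8, 2, 0, 0, 0)
    return (
        b"\x89PNG\r\n\x1a\n"
        + _chunk(b"IHDR", header)
        + _chunk(b"IDAT", zlib.compress(b"".join(rows)))
        + _chunk(b"IEND", b"")
    )


# Quiet intro, loud chorus, silence at the end (made-up values).
peaks = [0.05, 0.1, 0.9, 1.0, 0.8, 0.0]
heights = column_heights(peaks, 200)   # [10, 20, 180, 200, 160, 1]
png_bytes = render_png(peaks, 200)
```

Writing png_bytes to a file opened in binary mode gives a viewable image. A real render would pass in 1200 peaks, one per column, rather than six.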

Multi-resolution waveforms.

Audio editors and DAWs let you zoom in and out, and re-rendering the waveform from raw samples on every zoom is wasteful. The trick is a precomputed pyramid: store peaks at multiple resolutions (1 peak per second, 1 per 100 ms, 1 per 10 ms) and pick the right level for the current zoom. Same idea as map tiles. The extra storage is small. If each level is half the size of the one below it, the total is at most 2× the finest level (the geometric series 1 + 1/2 + 1/4 + ... converges to 2); with the 10× steps in the example above, it is only about 1.11× (1 + 1/10 + 1/100 + ...). Audacity, Pro Tools, and Audition all do this.
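A peak pyramid can be sketched as follows. This is a minimal illustration under stated assumptions, not the format any particular editor uses: build_pyramid and pick_level are hypothetical names, the finest-level peaks are synthetic, and pick_level assumes its argument is the number of pixel columns the whole file would span at the current zoom.

```python
from typing import List, Sequence


def build_pyramid(finest: Sequence[float], factor: int = 2) -> List[List[float]]:
    """Return levels[0] = finest peaks, each later level `factor` times coarser.

    A coarse peak is the max of the finer peaks it covers, so coarser levels
    are built without re-scanning the raw samples.
    """
    if factor < 2:
        raise ValueError("factor must be at least 2")
    levels = [list(finest)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([max(prev[i:i + factor]) for i in range(0, len(prev), factor)])
    return levels


def pick_level(levels: List[List[float]], columns_for_whole_file: int) -> List[float]:
    """Choose the coarsest level that still has at least one peak per column."""
    for level in reversed(levels):
        if len(level) >= columns_for_whole_file:
            return level
    return levels[0]  # zoomed in past the finest level; the caller must stretch


# Synthetic finest-level peaks in the range 0.0 to 1.0.
finest = [abs(((i * 37) % 101) / 100 - 0.5) * 2 for i in range(1024)]

halving = build_pyramid(finest, factor=2)
overhead_2 = sum(len(level) for level in halving) / len(finest)   # 2047 / 1024

tens = build_pyramid(finest[:1000], factor=10)
overhead_10 = sum(len(level) for level in tens) / 1000            # 1111 / 1000

chosen = pick_level(halving, 300)   # the 512-peak level
```

The overhead stays just under 2× for halving steps and close to 1.11× for 10× steps, so the choice of factor trades storage against how closely a level matches any given zoom.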

Why people want them.

Podcast players show waveform scrubbers so you can see where speakers pause. Audio editors use them for navigation. Music streaming services such as SoundCloud show them as a visual identity for tracks. For social-media audio clips, a waveform image is the thumbnail. The image alone carries no audio; it is purely a navigation and identity aid, but it adds enough visual context that audio-first interfaces become more usable.

Frequently asked questions

Quick answers.

What audio formats can I use?

The tool supports any format your browser can decode, including `MP3`, `WAV`, `OGG`, and `M4A`. Encrypted or proprietary formats may not render correctly.

Are my audio files uploaded?

No. The waveform is generated using your browser's Web Audio API. Your files remain on your device and are never sent to our infrastructure.

Can I customise the appearance?

Yes. You can adjust the foreground and background colours, as well as the bar width and spacing to change the density of the visualization.

What is the output format?

The tool exports the waveform as a `PNG` file with either a transparent or a solid background. A transparent background lets you overlay the waveform on top of other designs.
