Understanding audio trimming
Two timestamps, one cut.
Why cutting an MP3 cleanly is harder than it looks, the difference between stream-copy and re-encode, and the click that announces every careless trim.
Trim is "extract a window".
A trim operation reads the input audio, skips ahead to a start timestamp, copies samples up to an end timestamp, and writes the result. For uncompressed WAV that is little more than copying a byte range and writing a new header. For compressed formats (MP3, AAC, Opus) the encoded data is organised in frames, and a frame boundary is the only legal cut point. Slice mid-frame and the result either contains an undecodable partial frame or starts with a discontinuity that audibly clicks.
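To make the frame constraint concrete, here is a minimal sketch of how a requested timestamp maps to the nearest frame boundary. It assumes an MP3 (MPEG-1 Layer III) at 44.1 kHz, where each frame carries 1152 samples; the function names are illustrative, and real tools may round down rather than to the nearest boundary.

```python
# Assumption: MPEG-1 Layer III at 44.1 kHz, 1152 samples per frame.
SAMPLES_PER_FRAME = 1152
SAMPLE_RATE = 44100


def frame_duration_ms(samples_per_frame=SAMPLES_PER_FRAME, sample_rate=SAMPLE_RATE):
    """Length of one encoded frame in milliseconds."""
    return 1000.0 * samples_per_frame / sample_rate


def snap_to_frame(t_seconds, samples_per_frame=SAMPLES_PER_FRAME, sample_rate=SAMPLE_RATE):
    """Return (frame_index, snapped_seconds) for the boundary nearest to t_seconds."""
    frame_len = samples_per_frame / sample_rate
    index = round(t_seconds / frame_len)
    return index, index * frame_len


if __name__ == "__main__":
    index, snapped = snap_to_frame(30.0)
    print(f"frame length: {frame_duration_ms():.2f} ms")
    print(f"requested 30.000 s, nearest boundary is frame {index} at {snapped:.3f} s")
```

Under these assumptions a frame lasts about 26 ms, so a cut that snaps to the nearest boundary lands at most half a frame away from the requested time.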
Stream-copy vs re-encode.
There are two ways to make the cut. Stream-copy (FFmpeg's -c copy) does no decoding and no re-encoding; it writes the same encoded frames into the new file. It is fast (on the order of milliseconds per minute of audio) and loses no quality. The catch: cut points snap to a frame boundary, which can be 20-40ms off the requested timestamp. Re-encode decodes to PCM, cuts exactly, and encodes again. It is slower because every sample has to be decoded and encoded, it costs one generation of lossy degradation, and it gives sample-accurate cut points.
The click problem.
If the waveform at the cut point isn't at zero amplitude, the result has a discontinuity: the speaker driver has to jump instantly between two amplitude levels, which the ear hears as a "click". The fix is a short fade-in and fade-out (5-20ms) at each cut point, which smooths the transition to and from silence. Because a fade changes sample values, it needs a decode and re-encode; a pure stream-copy cannot apply one. Many trim tools add these fades by default; ones that don't produce clicky cuts that listeners notice.
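The fade described above can be sketched on raw PCM samples. This is an illustration rather than how any particular tool does it: it assumes mono samples held as floats in a Python list and uses a linear ramp, where real editors often use other curves. The name apply_fades is hypothetical.

```python
def apply_fades(samples, sample_rate, fade_ms=10.0):
    """Return a copy of samples with a linear fade-in and fade-out of fade_ms each."""
    n = len(samples)
    # Never let the two fades overlap on very short clips.
    fade_len = min(int(sample_rate * fade_ms / 1000.0), n // 2)
    out = list(samples)
    for i in range(fade_len):
        gain = i / fade_len        # 0.0 at the cut point, rising towards 1.0
        out[i] *= gain             # fade-in at the start
        out[n - 1 - i] *= gain     # mirrored fade-out at the end
    return out


if __name__ == "__main__":
    # A clip that starts and ends at a non-zero level would click without fades.
    clip = [0.8] * 1000
    faded = apply_fades(clip, sample_rate=44100, fade_ms=10.0)
    print(faded[0], faded[500], faded[-1])
```

After the fades, the first and last samples sit at zero, so the clip joins silence without a jump, while the middle of the clip is untouched.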
A worked trim.
Trim a 60-minute podcast to seconds 30-1800 (the actual interview, minus intro and outro). With stream-copy: ffmpeg -ss 30 -t 1770 -i podcast.mp3 -c copy out.mp3. This takes about 0.5 seconds, is lossless, and snaps the cuts to the nearest frame (~20ms off). With re-encode: the same -ss and -t values, but with an encoder setting such as -b:a 192k in place of -c copy. This takes about 30 seconds and gives sample-accurate cuts at the cost of one generation of lossy-encode quality loss. For a podcast, stream-copy is almost always the right call.
Stream-copy
-c copy → no decode, frame-aligned cuts
Fast, lossless, slightly imprecise endpoints.
ffmpeg -ss 30 -t 1770 -i in.mp3 -c copy out.mp3
= 0.5s runtime, ~20ms imprecision
Re-encode
Decode → trim → re-encode
Slow, sample-accurate, one generation of quality loss.
ffmpeg -i in.mp3 -ss 30 -t 1770 -b:a 192k out.mp3
= ~30s runtime, exact cuts
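The two commands above differ only in where -ss sits and whether -c copy is present. A small sketch that builds either one from a start and end time, deriving the -t duration along the way, might look like this. The helper name trim_command and its parameters are hypothetical; the FFmpeg flags are the ones already used in this article.

```python
def trim_command(src, dst, start, end, reencode=False, bitrate="192k"):
    """Build an ffmpeg argument list that trims src to the range [start, end] seconds."""
    duration = end - start
    if duration <= 0:
        raise ValueError("end must be greater than start")
    if reencode:
        # -ss after -i: ffmpeg decodes and discards audio up to the start point,
        # then encodes the kept range at the given bitrate.
        return ["ffmpeg", "-i", src, "-ss", str(start), "-t", str(duration),
                "-b:a", bitrate, dst]
    # -ss before -i: seek in the input, then copy the encoded frames unchanged.
    return ["ffmpeg", "-ss", str(start), "-t", str(duration), "-i", src,
            "-c", "copy", dst]


if __name__ == "__main__":
    print(" ".join(trim_command("podcast.mp3", "out.mp3", 30, 1800)))
    print(" ".join(trim_command("podcast.mp3", "out.mp3", 30, 1800, reencode=True)))
```

Returning a list rather than a single string means the result can be passed to subprocess.run without shell quoting concerns.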
Multiple cuts in one pass.
Removing three commercial breaks from a recording means cutting three ranges out and keeping the four segments around them. Doing this as separate trims and concatenating the pieces works, but is fiddly, and FFmpeg's concat demuxer has its own quirks (formats must match exactly, timestamps don't always merge cleanly). The cleaner approach is a single filter graph that picks the desired segments and concatenates them in one operation; because filters work on decoded audio, this route re-encodes. For interactive tools, the UI usually exposes "select region" with multiple selections and handles the plumbing.
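As a rough sketch of the single-filter-graph approach, the helpers below turn a list of breaks into the ranges to keep and then into a -filter_complex string built from FFmpeg's atrim, asetpts and concat filters. The helper names and the example break times are made up for illustration, and the graph assumes the input has a single audio stream addressed as [0:a].

```python
def keep_ranges(total, breaks):
    """Invert a list of (start, end) breaks into the (start, end) ranges to keep."""
    keep, cursor = [], 0
    for b_start, b_end in sorted(breaks):
        if b_start > cursor:
            keep.append((cursor, b_start))
        cursor = max(cursor, b_end)
    if cursor < total:
        keep.append((cursor, total))
    return keep


def keep_segments_filter(segments):
    """Build a filter graph that trims each segment and concatenates them in order."""
    if not segments:
        raise ValueError("at least one segment is required")
    parts, labels = [], []
    for i, (start, end) in enumerate(segments):
        label = f"[a{i}]"
        # asetpts resets each piece's timestamps so concat sees it starting at zero.
        parts.append(f"[0:a]atrim=start={start}:end={end},asetpts=PTS-STARTPTS{label}")
        labels.append(label)
    parts.append(f"{''.join(labels)}concat=n={len(segments)}:v=0:a=1[out]")
    return ";".join(parts)


if __name__ == "__main__":
    # Hypothetical one-hour recording with three two-minute breaks.
    segments = keep_ranges(3600, [(600, 720), (1500, 1620), (2400, 2520)])
    print(keep_segments_filter(segments))
```

The resulting string would be passed as ffmpeg -i in.mp3 -filter_complex "GRAPH" -map "[out]" out.mp3, with GRAPH replaced by the printed text.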
Lossless formats trim cleanly.
FLAC and WAV trim with sample accuracy and no quality cost — the source is uncompressed (WAV) or losslessly compressed (FLAC), so cutting and writing produces bit-exact output. If you're going to do extensive editing, work from a lossless source if possible and encode to lossy only at the export step. Editing in lossy formats compounds quality loss with every save.
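For WAV, the sample-accurate trim can be sketched with Python's standard wave module, which handles PCM files. The function name trim_wav is illustrative. Note that the wave module uses "frame" to mean one sample per channel, not a codec frame as in MP3.

```python
import wave


def trim_wav(src_path, dst_path, start_s, end_s):
    """Copy the samples between start_s and end_s (seconds) into a new WAV file.

    Returns the number of sample frames written.
    """
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        total = src.getnframes()
        # Convert timestamps to sample positions and clamp them to the file.
        start_frame = min(max(int(round(start_s * rate)), 0), total)
        end_frame = min(max(int(round(end_s * rate)), start_frame), total)
        src.setpos(start_frame)
        frames = src.readframes(end_frame - start_frame)
    with wave.open(dst_path, "wb") as dst:
        # Same channels, sample width and rate; the frame count in the header
        # is corrected when the file is closed.
        dst.setparams(params)
        dst.writeframes(frames)
    return end_frame - start_frame
```

Because the samples are copied byte for byte, the output range is bit-identical to the same range in the source.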