Understanding subtitle formats
SRT and VTT, side by side.
The two subtitle formats you'll meet, the bytes that differ between them, and the conversion that's mostly a regex.
SRT — the universal classic.
SubRip Text (SRT) is the de facto standard subtitle format. Each cue is a sequence number, a timestamp range in HH:MM:SS,mmm format (note the comma before the milliseconds), one or more lines of text, then a blank line. The format itself has no styling, positioning, or colours, just timed text. Many players do honour basic tags such as <i> and <b>, but by convention rather than by spec. Almost every video player on every platform reads SRT, and it's what nearly every transcription service exports.
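To make that cue structure concrete, here is a minimal Python sketch of an SRT reader. It assumes well-formed input (a numeric cue line, then a timing line with two-digit hour/minute/second fields and three millisecond digits) and skips any block that doesn't fit. `parse_srt` and `Cue` are hypothetical names, not part of any library.

```python
import re
from dataclasses import dataclass

# One SRT timing line, e.g. "00:00:01,500 --> 00:00:04,000".
# Assumption: two-digit H/M/S fields and exactly three millisecond digits.
TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


@dataclass
class Cue:
    index: int        # the sequence number above the timing line
    start_ms: int     # start time in milliseconds
    end_ms: int       # end time in milliseconds
    lines: list[str]  # one or more lines of subtitle text


def _to_ms(h: str, m: str, s: str, ms: str) -> int:
    """Convert HH, MM, SS, mmm strings to a total in milliseconds."""
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def parse_srt(text: str) -> list[Cue]:
    """Split an SRT document into cues; blocks that don't fit the shape are skipped."""
    cues = []
    # Drop a leading BOM, then split on blank lines (\s also absorbs the \r of CRLF files).
    for block in re.split(r"\n\s*\n", text.lstrip("\ufeff").strip()):
        rows = block.splitlines()
        if len(rows) < 2 or not rows[0].strip().isdecimal():
            continue  # no cue number: not a cue block
        match = TIMING.match(rows[1].strip())
        if not match:
            continue  # second line isn't a valid timing line
        g = match.groups()
        cues.append(Cue(int(rows[0]), _to_ms(*g[:4]), _to_ms(*g[4:]), rows[2:]))
    return cues
```

Storing times as integer milliseconds keeps any arithmetic (shifting, overlap checks) exact; the comma-or-period question only arises when a cue is written back out.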
WebVTT — the web-native cousin.
WebVTT (.vtt) is the W3C standard for subtitles in HTML video. It is almost identical to SRT, with three differences: a required WEBVTT header on the first line; timestamps that use a period before the milliseconds (00:00:01.500) instead of SRT's comma; and cue identifiers (the number above each cue in SRT) that are optional rather than required. WebVTT also supports inline styling, cue positioning, and chapter markers, which makes it the option when you need richer features.
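Two of those three differences are easy to show in code. The first hypothetical helper below renders a millisecond count with either separator; the second checks for the header the way the WebVTT spec lays it out (an optional BOM, then WEBVTT alone or followed by a space or tab and free text). Both are sketches, not a full validator.

```python
def format_timestamp(total_ms: int, fmt: str = "srt") -> str:
    """Render milliseconds as HH:MM:SS,mmm (SRT) or HH:MM:SS.mmm (WebVTT)."""
    if fmt not in ("srt", "vtt"):
        raise ValueError(f"unknown format: {fmt!r}")
    if total_ms < 0:
        raise ValueError("timestamps cannot be negative")
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1_000)
    sep = "," if fmt == "srt" else "."  # the one character that differs
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}{sep}{millis:03d}"


def has_webvtt_header(text: str) -> bool:
    """True if the first line is WEBVTT, optionally after a BOM and before a space/tab plus text."""
    first = text.removeprefix("\ufeff").split("\n", 1)[0].rstrip("\r")
    return first == "WEBVTT" or first.startswith(("WEBVTT ", "WEBVTT\t"))
```

For example, the first cue's start time of 1,500 ms renders as 00:00:01,500 for SRT and 00:00:01.500 for WebVTT.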
A worked conversion.
SRT input:

1
00:00:01,500 --> 00:00:04,000
Hello, world.

2
00:00:05,000 --> 00:00:08,500
A second line.

Same content as VTT:

WEBVTT

00:00:01.500 --> 00:00:04.000
Hello, world.

00:00:05.000 --> 00:00:08.500
A second line.

Add the header, drop the cue numbers, change comma to period in timestamps. That's the whole conversion.
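SRT → VTT
Add header; comma → period; drop cue numbers.
Three substitutions cover nearly all well-formed files. The timestamp one is unanchored and global, so it catches both times on a cue's timing line:
/(\d{2}:\d{2}:\d{2}),(\d{3})/g → "$1.$2"
= Browser-ready subtitle file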
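The three steps can be sketched end to end in Python. This is one possible approach, not a reference converter: `srt_to_vtt` is a hypothetical name, the patterns assume two-digit hour/minute/second fields and three millisecond digits, and Python's `re.sub` writes the backreferences as `\1.\2` rather than JavaScript's `$1.$2`. The comma fix is applied only to lines containing `-->`, so commas in dialogue such as "Hello, world." survive.

```python
import re

# Timestamp with a comma before the milliseconds, e.g. 00:00:01,500.
TIMESTAMP_COMMA = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")
# A cue number: a digits-only line immediately followed by an SRT timing line.
CUE_NUMBER = re.compile(r"^\d+[ \t]*\n(?=\d{2}:\d{2}:\d{2},\d{3}[ \t]*-->)", re.MULTILINE)


def srt_to_vtt(srt: str) -> str:
    """Convert SRT text to WebVTT text with three substitutions."""
    # Normalise first: drop a leading BOM and convert CRLF to LF.
    text = srt.lstrip("\ufeff").replace("\r\n", "\n")
    # 1. Drop cue numbers (done before step 2, since the lookahead expects commas).
    text = CUE_NUMBER.sub("", text)
    # 2. Comma -> period, only on timing lines so dialogue is untouched.
    text = "\n".join(
        TIMESTAMP_COMMA.sub(r"\1.\2", line) if "-->" in line else line
        for line in text.split("\n")
    )
    # 3. Add the required header, followed by a blank line.
    return "WEBVTT\n\n" + text.lstrip("\n")
```

Run on the SRT input above, this produces the VTT output shown, byte for byte.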
Encoding and the BOM.
SRT files in the wild are often Windows-1252 (the historical default of the tool that wrote them) rather than UTF-8. Loading them in a UTF-8 reader produces mojibake on accented characters. Modern players auto-detect; some don't. A conversion tool should detect and convert encoding to UTF-8 as part of the process. WebVTT requires UTF-8 by spec, so SRT-to-VTT is also an encoding-normalisation step.
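A minimal decoding sketch, assuming Windows-1252 is the only legacy encoding you expect: try UTF-8 first (Python's `utf-8-sig` codec also drops a leading BOM), and fall back to cp1252 if that fails. Decoding cleanly as UTF-8 is strong but not conclusive evidence of the real encoding, and files in other code pages (Cyrillic, Central European) need a proper detector. `decode_subtitle` is a hypothetical name.

```python
def decode_subtitle(raw: bytes) -> str:
    """Decode subtitle bytes: UTF-8 if valid, else assume Windows-1252."""
    try:
        # "utf-8-sig" behaves like "utf-8" but strips a leading BOM if present.
        return raw.decode("utf-8-sig")
    except UnicodeDecodeError:
        # Assumption: legacy files are cp1252. A few byte values are undefined
        # in cp1252, so replace them rather than failing outright.
        return raw.decode("cp1252", errors="replace")
```

Encoding the result with `text.encode("utf-8")` (no BOM) gives the UTF-8 output WebVTT requires.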
Other formats worth knowing.
SSA (SubStation Alpha) and its successor ASS (Advanced SubStation Alpha) support fonts, positioning, animations, and karaoke effects, and are used heavily in fan-subtitled anime. SBV is YouTube's older format, even simpler than SRT. TTML (Timed Text Markup Language) is the W3C's XML-based rich-text format, used in professional broadcast workflows. For most video on the web, SRT or VTT covers the case; ASS matters only if you need styled subtitles.
What conversion can't fix.
A conversion preserves what's in the source. Bad timestamps stay bad; bad line breaks stay bad; bad spelling stays bad. For subtitles that need quality work (timing aligned to dialogue, sensible line breaks at 32-40 characters, the no-more-than-two-lines rule), use a subtitle editor such as Aegisub. A conversion tool changes the format, not the content.
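Conversion can't fix these problems, but a few lines of code can flag them before you open an editor. This sketch checks one cue against the rules of thumb above; the 40-character and two-line limits are assumptions to tune to your own style guide, and `lint_cue` is a hypothetical name. It only reports problems; fixing them is still the editor's job.

```python
# Assumed limits, taken from the rules of thumb above.
MAX_CHARS_PER_LINE = 40
MAX_LINES_PER_CUE = 2


def lint_cue(start_ms: int, end_ms: int, lines: list[str]) -> list[str]:
    """Return human-readable warnings for one cue (empty list if it looks fine)."""
    warnings = []
    if end_ms <= start_ms:
        warnings.append("cue ends before it starts")
    if len(lines) > MAX_LINES_PER_CUE:
        warnings.append(f"{len(lines)} lines (max {MAX_LINES_PER_CUE})")
    for n, line in enumerate(lines, start=1):
        if len(line) > MAX_CHARS_PER_LINE:
            warnings.append(f"line {n} has {len(line)} chars (max {MAX_CHARS_PER_LINE})")
    return warnings
```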