Understanding EPUB validation
epubcheck — the gate every distributor runs.
What the validator checks, the common errors and what they mean, and the difference between "passes epubcheck" and "actually reads well".
What an EPUB has to be.
An EPUB is a ZIP file with a strict structure: a mimetype entry first and uncompressed; META-INF/container.xml pointing at the OPF; an OPF package file whose manifest lists every file and whose spine sets the reading order; HTML content (well-formed XHTML, not HTML5 quirks mode); a nav.xhtml for the table of contents; required metadata (title, identifier, language) plus optional extras; CSS, images, and fonts as needed. Every file listed in the OPF must exist; every link in the HTML must resolve; every image must load. Validation enforces all of that.
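The first two container rules can be spot-checked with nothing but the Python standard library. This is a minimal sketch, not a replacement for epubcheck: the function name check_container and the problem strings are illustrative assumptions, and only the mimetype and container.xml rules are covered.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by META-INF/container.xml.
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}


def check_container(epub_path):
    """Return a list of problems found in the ZIP wrapper (hypothetical helper)."""
    problems = []
    with zipfile.ZipFile(epub_path) as zf:
        entries = zf.infolist()
        # Rule 1: the first entry must be 'mimetype', stored without compression.
        if not entries or entries[0].filename != "mimetype":
            problems.append("mimetype is not the first entry")
        else:
            first = entries[0]
            if first.compress_type != zipfile.ZIP_STORED:
                problems.append("mimetype is compressed")
            if zf.read(first) != b"application/epub+zip":
                problems.append("mimetype has the wrong content")
        # Rule 2: container.xml must exist and name an OPF that is in the ZIP.
        names = set(zf.namelist())
        if "META-INF/container.xml" not in names:
            problems.append("META-INF/container.xml is missing")
            return problems
        root = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = root.find("c:rootfiles/c:rootfile", CONTAINER_NS)
        opf_path = rootfile.get("full-path") if rootfile is not None else None
        if opf_path is None:
            problems.append("container.xml names no rootfile")
        elif opf_path not in names:
            problems.append(f"OPF {opf_path} is not in the ZIP")
    return problems
```

An empty list means these two rules hold; everything else (manifest, spine, links, images) is still epubcheck's job.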
epubcheck — the reference validator.
The W3C (formerly IDPF) tool that the stores rely on. Run it on an .epub file and it reports errors (must fix), warnings (likely problems), and usage messages (style hints). Apple, Kobo, Amazon, and Google Play Books all expect a file that passes epubcheck before they accept a submission. Errors disqualify; warnings sometimes do too, depending on the store.
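For anyone scripting the check, the sketch below wraps the command-line tool with Python's subprocess module. It assumes Java is on the PATH and that epubcheck was downloaded as a jar; the jar path and both function names are placeholders. The --failonwarnings flag and the convention that exit status 0 means "no errors" reflect epubcheck's documented behaviour as far as I know, but confirm them against the help text of the release you install.

```python
import subprocess


def epubcheck_command(jar_path, epub_path, fail_on_warnings=False):
    """Build the argument list for a command-line epubcheck run."""
    cmd = ["java", "-jar", jar_path, epub_path]
    if fail_on_warnings:
        # Assumed flag: makes warnings count as failures, for stricter stores.
        cmd.append("--failonwarnings")
    return cmd


def run_epubcheck(jar_path, epub_path, fail_on_warnings=False):
    """Run epubcheck and return (passed, combined_output)."""
    result = subprocess.run(
        epubcheck_command(jar_path, epub_path, fail_on_warnings),
        capture_output=True,
        text=True,
    )
    # Assumption: exit status 0 means no errors were reported.
    return result.returncode == 0, result.stdout + result.stderr
```

Typical use would be `passed, report = run_epubcheck("epubcheck.jar", "mybook.epub")`, printing the report whenever passed is False.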
Common errors.
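OPF-003: file in the ZIP not listed in the OPF manifest. RSC-007: resource (image, stylesheet) referenced but not found in the EPUB. RSC-005: a file fails to parse against its schema, which covers most invalid or misplaced markup and invalid table-of-contents structure. NCX-001: the identifier in the NCX does not match the one in the OPF. HTM-009: obsolete or irregular DOCTYPE. CSS-008: the stylesheet fails to parse. OPF-014: a content document uses a feature (scripting, SVG, MathML, remote resources) that is not declared as a property on its manifest item. These meanings follow recent epubcheck releases; codes and wording can shift between versions. The validator output cites the file and line; fixes are usually one-line edits.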
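Because each message cites a code, a file, and a line, the report is easy to tally by machine. The sketch below assumes the default text output has the shape SEVERITY(CODE): path(line,col): message; the SAMPLE lines are hand-written in that assumed shape for illustration, not captured from a real run, and parse_messages is a hypothetical helper.

```python
import re
from collections import Counter

# Assumed line shape: SEVERITY(CODE): path(line,col): message
MESSAGE_RE = re.compile(
    r"^(?P<severity>FATAL|ERROR|WARNING|USAGE|INFO)"
    r"\((?P<code>[A-Z]{3}-\d{3}[a-z]?)\): "
    r"(?P<path>.*?)(?:\((?P<line>-?\d+),(?P<col>-?\d+)\))?: "
    r"(?P<message>.*)$"
)

# Hand-written illustration, not real epubcheck output.
SAMPLE = """\
ERROR(RSC-007): mybook.epub/OEBPS/ch05.xhtml(42,18): Referenced resource 'images/map.JPG' could not be found in the EPUB.
ERROR(RSC-007): mybook.epub/OEBPS/nav.xhtml(19,40): Referenced resource 'ch08.xhtml' could not be found in the EPUB.
Check finished with errors
"""


def parse_messages(output):
    """Turn validator text output into a list of dicts, skipping other lines."""
    messages = []
    for raw in output.splitlines():
        match = MESSAGE_RE.match(raw.strip())
        if match is None:
            continue  # banner, summary, or blank line
        item = match.groupdict()
        # The location is optional; convert it to int when present.
        item["line"] = int(item["line"]) if item["line"] is not None else None
        item["col"] = int(item["col"]) if item["col"] is not None else None
        messages.append(item)
    return messages


def count_by_severity(messages):
    """Count messages per severity level."""
    return Counter(m["severity"] for m in messages)
```

Grouping by path instead of severity gives a per-file fix list, which matches the line-by-line workflow described here.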
A worked check.
Author runs epubcheck mybook.epub: 3 errors, 2 warnings. Error 1: image path in chapter 5 has uppercase extension .JPG but the file is .jpg (paths inside the ZIP are case-sensitive). Error 2: stray & in the HTML; it should be &amp;. Error 3: nav.xhtml has a link to chapter 8 but the file doesn't exist. Fix all three, re-run, clean. The validator caught all of these in one pass; a human reading the book would have hit the broken link only on reaching chapter 8.
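The three errors in this example are each simple enough to detect by hand, which helps when deciding what to fix. The following is a rough sketch with hypothetical helper names; it works on plain strings and sets of ZIP entry names, and the ampersand pattern ignores CDATA sections and comments, where a bare & is legal.

```python
import re

# An '&' that does not start a named, decimal, or hex character reference.
STRAY_AMP = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)")


def stray_ampersand_lines(xhtml):
    """Return 1-based line numbers that contain an unescaped '&'."""
    return [
        number
        for number, line in enumerate(xhtml.splitlines(), start=1)
        if STRAY_AMP.search(line)
    ]


def case_mismatches(referenced, zip_names):
    """Find references that match a ZIP entry only when case is ignored."""
    by_lower = {name.lower(): name for name in zip_names}
    found = []
    for ref in referenced:
        if ref not in zip_names and ref.lower() in by_lower:
            found.append((ref, by_lower[ref.lower()]))
    return found


def missing_targets(referenced, zip_names):
    """Find references with no matching ZIP entry, even ignoring case."""
    lowered = {name.lower() for name in zip_names}
    return [ref for ref in referenced if ref.lower() not in lowered]
```

For instance, `case_mismatches(["images/map.JPG"], {"images/map.jpg"})` reports the pair to reconcile, and the fix is to rename either the reference or the file so the two agree exactly.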
In short: 3 errors (missing image, bad escape, broken nav link) and 2 warnings → all clear. Run the validator, fix line by line, re-run: ready for distribution.
Validation isn't reading quality.
epubcheck only sees mechanical correctness. It doesn't catch bad pagination, orphaned headings, inconsistent typography, an ugly cover image, missing alt text on images, broken footnotes, or a poorly tagged TOC. A book that passes epubcheck can still read terribly. The validator is a necessary check, not a sufficient one: render the EPUB on a real device, read three random chapters, then ship.
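Some of these gaps can be narrowed with small scripts of your own, though none replaces reading on a device. As one example, here is a sketch that lists images without alt text in a single content document, using Python's standard html.parser. The class and function names are illustrative, and an empty alt is flagged for review rather than treated as wrong, since it is legitimate for purely decorative images.

```python
from html.parser import HTMLParser


class AltTextAudit(HTMLParser):
    """Collect the src of every <img> whose alt is absent or blank."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing <img .../> via handle_startendtag.
        if tag == "img":
            attributes = dict(attrs)
            if not (attributes.get("alt") or "").strip():
                self.missing.append(attributes.get("src") or "?")


def images_missing_alt(xhtml):
    """Return the image sources that need a human to look at their alt text."""
    audit = AltTextAudit()
    audit.feed(xhtml)
    audit.close()
    return audit.missing
```

Running it over every XHTML file in the manifest gives a short review list to work through alongside the device read-through.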
EPUB 2 vs EPUB 3.
EPUB 2 (2007, with 2.0.1 in 2010): XHTML 1.1, navigation via NCX. EPUB 3 (2011, revised several times since, most recently as EPUB 3.3 in 2023): HTML5 features, MathML, SVG, audio/video, ARIA, accessibility metadata, fixed-layout for comics and children's books. Most modern readers support both; some legacy e-readers only do 2. epubcheck validates against either; choose 3 unless you need legacy compatibility.
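To see which version a given book declares, look at the version attribute on the OPF's package element; the manifest also shows whether an EPUB 3 navigation document, a legacy NCX, or both are present. A minimal sketch follows, assuming the OPF content has already been read from the ZIP as bytes; describe_package is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}
NCX_MEDIA_TYPE = "application/x-dtbncx+xml"


def describe_package(opf_xml):
    """Report the declared EPUB version and which navigation files are listed."""
    root = ET.fromstring(opf_xml)
    items = root.findall("opf:manifest/opf:item", OPF_NS)
    # EPUB 3 marks its navigation document with properties="nav".
    has_nav = any("nav" in (item.get("properties") or "").split() for item in items)
    # EPUB 2 navigation lives in an NCX file, identified by its media type.
    has_ncx = any(item.get("media-type") == NCX_MEDIA_TYPE for item in items)
    return {"version": root.get("version"), "nav": has_nav, "ncx": has_ncx}
```

An EPUB 3 book that lists both files is carrying the NCX as a fallback for older reading systems, a common compromise when legacy compatibility matters.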