PDF to Word Converter

Extract PDF text into an editable .docx file.

Runs in your browser

Works on text PDFs. Scanned (image-only) PDFs need OCR first — not part of this tool.

Understanding PDF → Word

A page of glyphs becomes a stream of words.

Why PDF-to-Word is harder than it sounds, what gets preserved, and the difference between "extract text" and "reconstruct the document".

PDF is a print format.

A PDF page is a flat list of drawing instructions: "draw the glyph 'A' at coordinates (100, 700) in this font at this size". There is no concept of paragraphs, lists, headings, or reading order. A PDF can be perfectly legible on screen while being internally chaotic. Converting back to Word means inferring the structural information that was discarded when the PDF was created: which paragraphs are headings, which lines are list items, which floating text is a footnote.
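As a rough illustration of that "flat list", the sketch below models a page as a list of positioned glyphs. The `Glyph` record and the sample `page` are hypothetical, invented for this example; a real PDF content stream is more involved. The point is only that the order in which glyphs are drawn need not match the order in which they are read.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Glyph:
    char: str
    x: float     # horizontal position in points
    y: float     # vertical position in points; PDF y grows upward
    font: str
    size: float


# Hypothetical page: the words "Hi" and "OK", drawn in scrambled order.
page = [
    Glyph("K", 112.0, 680.0, "Helvetica", 12.0),
    Glyph("H", 100.0, 700.0, "Helvetica-Bold", 18.0),
    Glyph("O", 100.0, 680.0, "Helvetica", 12.0),
    Glyph("i", 114.0, 700.0, "Helvetica-Bold", 18.0),
]


def stream_order_text(glyphs):
    """Concatenate glyphs in the order they were drawn."""
    return "".join(g.char for g in glyphs)


def position_order_text(glyphs):
    """Concatenate glyphs top to bottom (descending y), then left to right."""
    ordered = sorted(glyphs, key=lambda g: (-g.y, g.x))
    return "".join(g.char for g in ordered)


print(stream_order_text(page))    # drawing order
print(position_order_text(page))  # order recovered from coordinates
```

Sorting by coordinates is enough for this toy page, but it is only a starting point: it knows nothing about words, columns, or paragraphs.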

Three layers of fidelity.

The easiest output is plain text: a stream of words in something close to reading order, with no formatting and reasonable accuracy. The middle option is text with basic formatting: bold, italic, font size, and paragraphs detected from line spacing. That is harder. The hardest is a fully styled .docx with tables, columns, images, footnotes, and numbered lists. It needs heuristics for each structural element and, as a rough estimate, is wrong about a third of the time on any PDF more complex than a one-column letter.

Reading order, the hardest part.

The glyphs on a two-column page are stored by position, not in reading order. Without layout analysis, a converter reads straight across the columns (line 1 of column 1, line 1 of column 2, line 2 of column 1, line 2 of column 2) and produces nonsense. Good converters first cluster glyphs into columns by finding vertical gutters, then flow each column top to bottom. Tables make it worse: ruled tables can sometimes be detected by their grid lines, but tables without visible borders are ambiguous.
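The gutter idea can be sketched in a few lines. This is a minimal illustration, not how any particular converter works: it assumes text has already been grouped into line boxes (the hypothetical `Line` tuple), and the `min_width` threshold of 15 points is an arbitrary assumption.

```python
from typing import List, NamedTuple


class Line(NamedTuple):
    x0: float   # left edge
    x1: float   # right edge
    y: float    # baseline; larger y is higher on the page
    text: str


def find_gutters(lines: List[Line], min_width: float = 15.0) -> List[float]:
    """Return the x midpoints of vertical gaps that no line crosses."""
    if not lines:
        return []
    intervals = sorted((l.x0, l.x1) for l in lines)
    gutters = []
    covered_to = intervals[0][1]
    for x0, x1 in intervals[1:]:
        if x0 - covered_to >= min_width:
            gutters.append((covered_to + x0) / 2)
        covered_to = max(covered_to, x1)
    return gutters


def reading_order(lines: List[Line], min_width: float = 15.0) -> List[str]:
    """Flow each column top to bottom, columns left to right."""
    gutters = find_gutters(lines, min_width)

    def column_index(line: Line) -> int:
        # A line belongs to the column after every gutter to its left.
        return sum(1 for g in gutters if line.x0 > g)

    ordered = sorted(lines, key=lambda l: (column_index(l), -l.y))
    return [l.text for l in ordered]


# Hypothetical two-column page with two lines per column.
two_columns = [
    Line(50, 280, 700, "A1"), Line(320, 550, 700, "B1"),
    Line(50, 275, 686, "A2"), Line(320, 545, 686, "B2"),
]
naive = [l.text for l in sorted(two_columns, key=lambda l: (-l.y, l.x0))]
print(naive)                       # reads across the columns
print(reading_order(two_columns))  # reads down each column
```

A full-width heading that spans both columns would cover the gap and hide the gutter from this projection, which is one reason real converters analyse the page in regions rather than as a whole.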

Scanned PDFs need OCR.

A "PDF" can be (a) a real PDF whose pages contain text glyphs, or (b) a PDF that is just an embedded scanned image. The first extracts cleanly; the second extracts to nothing because there is no text, only pixels. OCR (optical character recognition, via Tesseract or a cloud OCR service) converts the image back to text, with a variable error rate. Always check whether the source PDF is searchable before assuming the conversion will work.
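One simple way to make the "is it searchable?" check concrete is to look at how much text each page yields. The sketch below assumes the per-page text has already been extracted by some PDF library (not shown); the function name `looks_scanned` and both thresholds are hypothetical choices, not part of this tool.

```python
def looks_scanned(page_texts, min_chars_per_page=20, sparse_fraction=0.5):
    """Guess whether a PDF is image-only from its extracted page texts.

    page_texts: one string per page, as returned by a text extractor.
    Returns True when more than sparse_fraction of the pages contain
    fewer than min_chars_per_page non-whitespace-trimmed characters.
    """
    if not page_texts:
        return True  # nothing extractable at all
    sparse = sum(1 for t in page_texts if len(t.strip()) < min_chars_per_page)
    return sparse / len(page_texts) > sparse_fraction


# Hypothetical inputs: an image-only scan and a mostly textual document.
print(looks_scanned(["", "  ", ""]))
print(looks_scanned(["This page has plenty of embedded text to extract.", ""]))
```

A check like this would let a converter warn the user up front instead of producing an empty document.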

A worked extraction.

Take a two-page memo with one heading, two paragraphs, and a bulleted list. The converter groups glyphs by line (similar Y coordinates), groups lines into paragraphs (similar line spacing, no large vertical gap), and detects the heading (larger font size, bolder weight). Bullets are tricky: they are usually detected heuristically, by a leading "•", a dash, or a digit followed by a period at the start of the line. The output is a .docx with three paragraph styles (heading, body, list-item) and the text content flowed sensibly.

Detected structure: properties of the glyphs (font size, weight, line spacing) inform structural inference. Larger and bold → heading; "•" prefix → list item. The result is three paragraph styles.
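The steps of the worked extraction can be sketched as follows. This is an illustration under stated assumptions, not the tool's actual code: the `Run` record, the sample memo, and the thresholds (`y_tolerance`, `heading_ratio`, `max_gap_ratio`) are all hypothetical, and the input is simplified to short text runs rather than individual glyphs.

```python
import re
from dataclasses import dataclass
from statistics import median


@dataclass
class Run:
    text: str
    x: float
    y: float        # baseline; larger y is higher on the page
    size: float
    bold: bool = False


# Leading bullet, hyphen, em dash (U+2014), or "1." style marker.
BULLET = re.compile(r"^(?:[•\-\u2014]|\d+\.)\s+")


def group_into_lines(runs, y_tolerance=2.0):
    """Cluster runs whose baselines lie within y_tolerance of each other."""
    lines = []
    for run in sorted(runs, key=lambda r: (-r.y, r.x)):
        if lines and abs(lines[-1][0].y - run.y) <= y_tolerance:
            lines[-1].append(run)
        else:
            lines.append([run])
    for line in lines:
        line.sort(key=lambda r: r.x)  # left to right within a line
    return lines


def classify(runs, heading_ratio=1.2, max_gap_ratio=1.5):
    """Return a list of (style, text) blocks: heading, body, or list-item."""
    if not runs:
        return []
    body_size = median(r.size for r in runs)
    blocks = []
    prev_y = None
    for line in group_into_lines(runs):
        text = " ".join(r.text for r in line)
        size = max(r.size for r in line)
        y = line[0].y
        if size >= body_size * heading_ratio and all(r.bold for r in line):
            style = "heading"
        elif BULLET.match(text):
            style, text = "list-item", BULLET.sub("", text, count=1)
        else:
            style = "body"
        # A body line joins the previous body block when the gap is small.
        close = prev_y is not None and (prev_y - y) <= max_gap_ratio * size
        if style == "body" and close and blocks and blocks[-1][0] == "body":
            blocks[-1] = ("body", blocks[-1][1] + " " + text)
        else:
            blocks.append((style, text))
        prev_y = y
    return blocks


# Hypothetical memo: one heading, two paragraphs, a two-item bulleted list.
memo = [
    Run("Quarterly", 72, 720, 18, True), Run("update", 160, 720, 18, True),
    Run("Sales rose", 72, 690, 11), Run("in March.", 130, 690.5, 11),
    Run("Costs fell.", 72, 677, 11),
    Run("Next steps:", 72, 650, 11),
    Run("• Hire two engineers", 72, 636, 11),
    Run("• Renew the lease", 72, 622, 11),
]
for style, text in classify(memo):
    print(f"{style:9} {text}")
```

Each `(style, text)` pair would then map onto one of the three paragraph styles when the .docx is written. The sketch does not handle wrapped list items or multi-column pages.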

What never round-trips.

Form fields, comments, embedded fonts you don't have, locked digital signatures, encrypted content streams, annotations: all of these are either dropped or arrive in degraded form. The output .docx loses information; if you need to edit the document faithfully, the right tool is the original Word source (if you can get it) or a dedicated PDF editor (Acrobat Pro, Foxit) that operates on the PDF in place.

When this is the right tool.

For a PDF you need to rewrite substantially (a contract you're editing, a chapter you're translating, a memo you're updating), converting to Word and editing there is usually faster than living inside the PDF. The conversion's imperfections are fixable in five minutes; the rewrite would take an hour either way. For a PDF you just need to skim, paste, or quote from, plain-text extraction is enough.

Frequently asked questions

Quick answers.

Is my document data private?

Yes. The conversion logic runs locally in your browser session. Your file contents are never sent to our servers or stored anywhere.

Does it support images and complex layouts?

The tool prioritises text extraction and basic formatting. Complex layouts, overlapping images, or heavily styled backgrounds may require manual adjustment after conversion.

Can it convert scanned handwriting or images?

This tool extracts embedded text layers rather than performing OCR (optical character recognition). If the PDF consists only of photos or scans with no invisible text layer, the resulting Word document will likely be empty.

What file format is the output?

The converter generates a standard `.docx` file, which is compatible with Microsoft Word, LibreOffice, Apple Pages, and Google Docs.
