Understanding duplicate removal
Each line, kept once.
The hard part isn't removing duplicates — it's deciding what counts as one.
Order is preserved by default.
When duplicates appear, this tool keeps the first occurrence and drops every later copy. That keeps the input in its original order, which matters for lists where position carries meaning (to-do items, ranked URLs, log entries). If you'd rather deduplicate and sort, run sort first; the output order then becomes alphabetical, numeric, or by length, depending on the sort you choose.
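If you want to reproduce "keep the first occurrence" in your own scripts, here is a minimal Python sketch. It assumes exact string comparison and is an illustration of the idea, not the tool's actual code; `dedupe_keep_first` is a hypothetical name.

```python
def dedupe_keep_first(lines):
    """Return lines with later duplicates removed, in first-occurrence order."""
    seen = set()
    result = []
    for line in lines:
        if line not in seen:  # first time we meet this exact line
            seen.add(line)
            result.append(line)
    return result


todo = ["write report", "email Sam", "write report", "book flight", "email Sam"]
print(dedupe_keep_first(todo))
# ['write report', 'email Sam', 'book flight']

# Sort first, then deduplicate: the surviving lines come out alphabetically.
print(dedupe_keep_first(sorted(todo)))
# ['book flight', 'email Sam', 'write report']
```

In Python 3.7 and later, `list(dict.fromkeys(lines))` gives the same result in one line, because dictionaries keep insertion order.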
What "the same line" means.
By default, two lines are the same if their contents match byte-for-byte. "Apple" and "apple" are different. "apple" with a trailing space is different from "apple" without. The case-insensitive option folds those together; the trim-whitespace option ignores leading and trailing spaces. Both can be on at once for the loosest match.
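One way to picture the two options is as a comparison key: each line is turned into a key, and only the first line with a given key survives. The sketch below assumes that the tool keeps the original text of the first matching line, that "case-insensitive" behaves like Python's `str.casefold()`, and that "whitespace" means spaces and tabs. The function `dedupe` and its parameters are hypothetical.

```python
def dedupe(lines, ignore_case=False, trim=False):
    """Keep the first line for each comparison key (hypothetical options)."""
    seen = set()
    result = []
    for line in lines:
        key = line
        if trim:
            # Strip only spaces and tabs; a bare str.strip() would also remove
            # other Unicode whitespace such as U+00A0.
            key = key.strip(" \t")
        if ignore_case:
            # casefold() is a stronger lower(): "ß" also folds to "ss".
            key = key.casefold()
        if key not in seen:
            seen.add(key)
            result.append(line)  # keep the original text, not the key
    return result


lines = ["Apple", "apple", "apple ", " APPLE", "pear"]
print(dedupe(lines))                               # all five lines survive
print(dedupe(lines, ignore_case=True))             # ['Apple', 'apple ', ' APPLE', 'pear']
print(dedupe(lines, trim=True))                    # ['Apple', 'apple', ' APPLE', 'pear']
print(dedupe(lines, ignore_case=True, trim=True))  # ['Apple', 'pear']
```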
Empty lines and blank-line groups.
By default, several blank lines in a row collapse to a single blank line, which is what most users want. Keeping every empty line is rarely needed in this kind of cleanup; if your use case requires it, handle blank lines outside the tool.
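If you do end up handling blank lines yourself, the sketch below shows both behaviours side by side. Its rules are assumptions, not the tool's documented internals: a "blank" line is the empty string, blank lines are never tracked as duplicates, and by default a blank line is dropped when the last line already written out is also blank. The `keep_all_blanks` flag is hypothetical and simply passes every blank line through.

```python
def dedupe_lines(lines, keep_all_blanks=False):
    """Keep the first occurrence of each non-blank line (hypothetical rules).

    - A blank line is exactly "" and bypasses duplicate tracking.
    - By default, a blank line is emitted only if the previous *output*
      line is not blank, so runs of blanks collapse to one.
    - keep_all_blanks=True emits every blank line unchanged.
    """
    seen = set()
    result = []
    for line in lines:
        if line == "":
            if keep_all_blanks or not result or result[-1] != "":
                result.append(line)
            continue
        if line not in seen:
            seen.add(line)
            result.append(line)
    return result


text = ["a", "", "", "b", "a", "", ""]
print(dedupe_lines(text))                        # ['a', '', 'b', '']
print(dedupe_lines(text, keep_all_blanks=True))  # ['a', '', '', 'b', '', '']
```

Checking against the previous output line rather than the previous input line means that dropping a duplicate between two blank lines never leaves two blanks next to each other in the result.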
When the duplicates aren't really duplicates.
Visually identical strings can differ underneath. One might use a non-breaking space (U+00A0) where the other has a regular space, or the two might be in different Unicode normalization forms: in NFC, the "é" in café is a single precomposed character (U+00E9), while in NFD it is "e" followed by a combining acute accent (U+0301). The tool treats such lines as different. To fix this, first normalise all of your input to the same form (NFC or NFD) and replace non-breaking spaces with ordinary spaces.
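A minimal Python sketch of that clean-up step, using the standard `unicodedata` module, is shown below. The helper names `canonicalize` and `dedupe_normalized` are hypothetical. The sketch uses NFC plus an explicit U+00A0 replacement rather than NFKC: NFKC would also turn non-breaking spaces into regular spaces, but it makes other compatibility changes too (for example, the "ﬁ" ligature becomes "fi"), which you may not want.

```python
import unicodedata


def canonicalize(line, form="NFC"):
    """Normalise to one Unicode form and swap non-breaking spaces for spaces."""
    line = unicodedata.normalize(form, line)
    return line.replace("\u00a0", " ")


def dedupe_normalized(lines, form="NFC"):
    """Deduplicate on the cleaned-up text, keeping the first occurrence."""
    seen = set()
    result = []
    for line in lines:
        key = canonicalize(line, form)
        if key not in seen:
            seen.add(key)
            result.append(key)  # emit the normalised text
    return result


composed = "caf\u00e9"     # é as one code point
decomposed = "cafe\u0301"  # e + combining acute accent
print(composed == decomposed)  # False: they look the same but differ
print(dedupe_normalized([composed, decomposed, "a\u00a0b", "a b"]))
# ['café', 'a b']
```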