Understanding the regex toolbox
The hundred patterns you keep copying.
Why the famous email pattern is wrong, why the popular UUID pattern misses things, and why anchoring matters more than cleverness.
Patterns travel through Stack Overflow.
Almost every regex you'll find in a production codebase started life as a Stack Overflow snippet pasted in by someone who had a job to do and no time to fully validate it. That isn't a bug — patterns are exactly the kind of dense, fiddly knowledge that benefits from communal authorship. But the snippets that travel furthest are usually the ones that were "good enough for the example", not the ones that were right.
The "email" pattern, examined.
The pattern /^\w+@\w+\.\w+$/ appears in tens of thousands of forms. It fails on every email with a dot in the local-part (firstname.lastname), every domain with more than one dot (mail.example.co.uk), every hyphenated domain, every plus-addressed inbox (alice+spam@example.com), and every internationalised domain. A fully RFC 5322-compliant email regex does exist. It runs to several hundred characters and is still the wrong tool, because RFC 5322 permits structures (comments, quoted local-parts) that few real mail servers accept. The pragmatic answer: /^[^@\s]+@[^@\s]+\.[^@\s]+$/ is fine for client-side hint validation, and only the SMTP exchange, in practice a confirmation email, can actually verify the address.
/^[^@\s]+@[^@\s]+\.[^@\s]+$/
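To make the failures concrete, here is a small TypeScript sketch that runs both patterns over the addresses mentioned above. The constant names are illustrative only; the hint pattern screens out obvious typos and says nothing about whether a mailbox exists.

```typescript
// The widely copied pattern and the pragmatic "hint" pattern from the text.
const naiveEmail = /^\w+@\w+\.\w+$/;
const hintEmail = /^[^@\s]+@[^@\s]+\.[^@\s]+$/;

// Real-world addresses that the naive pattern rejects.
const realAddresses = [
  "firstname.lastname@example.com", // dot in the local-part
  "bob@mail.example.co.uk",         // several dots in the domain
  "carol@my-domain.com",            // hyphenated domain
  "alice+spam@example.com",         // plus-addressing
];

// Obvious typos that the hint pattern should still catch.
const emailTypos = ["no-at-sign", "two words@example.com", "missing@tld", "a@@example.com"];

for (const addr of realAddresses) {
  console.log(addr, "naive:", naiveEmail.test(addr), "hint:", hintEmail.test(addr));
}
for (const addr of emailTypos) {
  console.log(addr, "hint:", hintEmail.test(addr));
}
```

Every entry in realAddresses prints naive: false, hint: true, and every typo prints hint: false.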
UUIDs — versioned, not arbitrary.
A correct UUID v4 pattern checks the version digit (the first character of the third group must be 4) and the variant nibble (the first character of the fourth group must be 8, 9, a or b). The everywhere-copied /^[0-9a-f-]{36}$/i matches plenty of strings that aren't valid v4 UUIDs, including thirty-six hyphens in a row. The right pattern is /^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i. For v7, the third group starts with 7 and the variant nibble is the same; for v1, the third group starts with 1 and the timestamp is meaningful.
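A TypeScript sketch comparing the loose pattern with a version-aware one. uuidPattern is a hypothetical helper, not a library function: it puts the requested version digit at the start of the third group and restricts the first character of the fourth group to 8, 9, a or b.

```typescript
// Loose pattern: any 36 characters drawn from hex digits and hyphens.
const looseUuid = /^[0-9a-f-]{36}$/i;

// Hypothetical helper: builds a pattern for one UUID version. The version
// digit opens the third group; the variant digit (8, 9, a or b) opens the fourth.
function uuidPattern(version: 1 | 4 | 7): RegExp {
  return new RegExp(
    `^[0-9a-f]{8}-[0-9a-f]{4}-${version}[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$`,
    "i",
  );
}

const v4 = uuidPattern(4);
const goodV4 = "f47ac10b-58cc-4372-a567-0e02b2c3d479";     // version 4, variant a
const v1Uuid = "6ba7b810-9dad-11d1-80b4-00c04fd430c8";     // version 1
const badVariant = "f47ac10b-58cc-4372-c567-0e02b2c3d479"; // version 4, variant c
const allHyphens = "-".repeat(36);

for (const s of [goodV4, v1Uuid, badVariant, allHyphens]) {
  console.log(s, "loose:", looseUuid.test(s), "v4:", v4.test(s));
}
```

The loose pattern accepts all four strings; the v4 pattern accepts only the first, and uuidPattern(1) accepts the v1 example.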
Anchoring is the most-skipped step.
A surprising share of "this regex matches too much" bugs come down to a missing ^ and $. Unanchored, /[a-z]+/ succeeds as soon as it finds a lowercase run anywhere in the input. Anchored, /^[a-z]+$/ only matches strings that are nothing but lowercase letters. The default in most flavours is to find a match anywhere; anchors are the assertions that say "the whole string has to look like this".
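A short TypeScript sketch of the difference. It also shows one JavaScript-specific caveat: with the m (multiline) flag, ^ and $ match at every line boundary, so an "anchored" pattern can still accept input whose later lines don't fit.

```typescript
const unanchoredLower = /[a-z]+/;
const anchoredLower = /^[a-z]+$/;

for (const input of ["hello", "Hello", "abc123", "ABC def"]) {
  // Unanchored: succeeds if a lowercase run appears anywhere.
  // Anchored: succeeds only if the whole string is lowercase letters.
  console.log(JSON.stringify(input), unanchoredLower.test(input), anchoredLower.test(input));
}

// The unanchored pattern quietly returns the first lowercase run it finds.
console.log("ABC def".match(unanchoredLower)?.[0]); // → def

// Caveat: with the m flag, ^ and $ also match around each newline,
// so this accepts a string whose second line is uppercase.
console.log(/^[a-z]+$/m.test("hello\nWORLD")); // → true
```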
Greedy vs lazy, with a worked example.
The pattern /<.+>/ against <b>hi</b> matches <b>hi</b>, the whole string, because .+ is greedy: it first consumes everything to the end of the input, then backs off only as far as the last >. Switch to the lazy /<.+?>/ and it matches <b>, because .+? grows one character at a time and stops at the first > it reaches. Lazy quantifiers are a sensible default for HTML-like fragment extraction (a negated class such as /<[^>]+>/ is often better still, since it can never run past a >); greedy quantifiers are the right default for "everything up to the last separator".
Greedy
/<.+>/ against <b>hi</b>
.+ eats until the last > on the line.
match group = <b>hi</b>
Lazy
/<.+?>/ against <b>hi</b>
.+? stops at the first > it finds.
match group = <b>
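The same comparison in TypeScript, plus the two alternatives mentioned above: a negated character class for tag extraction, and a greedy capture for "up to the last separator". The file path is an arbitrary example.

```typescript
const fragment = "<b>hi</b>";

// Greedy: .+ runs to the end of the string, then backs off to the last ">".
const greedyMatch = fragment.match(/<.+>/)?.[0]; // "<b>hi</b>"
// Lazy: .+? grows one character at a time and stops at the first ">".
const lazyMatch = fragment.match(/<.+?>/)?.[0]; // "<b>"
// Negated class: [^>]+ cannot cross a ">", so each tag is matched directly.
const tagMatches = fragment.match(/<[^>]+>/g); // ["<b>", "</b>"]

// Greedy is what you want for "everything up to the LAST separator".
const filePath = "src/lib/regex.ts";
const dirGreedy = filePath.match(/^(.*)\//)?.[1]; // "src/lib"
const dirLazy = filePath.match(/^(.*?)\//)?.[1];  // "src" (stops at the first "/")

console.log(greedyMatch, lazyMatch, tagMatches, dirGreedy, dirLazy);
```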
Catastrophic backtracking.
/^(a+)+$/ matched against a string of eighteen a's followed by a b takes a backtracking engine roughly 2¹⁸ steps before it reports "no match". The inner and outer + can split the run of a's in exponentially many ways, and the trailing b forces the engine to try every split before giving up. This is the classic regular-expression denial of service (ReDoS): a malicious user-supplied string that freezes a server. Engines built on finite automata, such as Rust's regex crate, RE2 and Go's regexp, guarantee linear-time matching and sidestep the problem, at the cost of features like back-references; backtracking engines such as PCRE, Perl, Python's re, Java's and JavaScript's remain vulnerable. Avoid nested quantifiers that can match the same characters in more than one way, and if you need to handle adversarial input, use a non-backtracking engine.
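To see where the 2¹⁸ comes from, here is a hand-written TypeScript model of how a backtracking matcher explores /^(a+)+$/. It is a simplified illustration, not how any real engine is implemented (real engines add optimisations), and countNestedPlusSteps is a hypothetical name. Each "step" is one attempt to end an inner a+ run.

```typescript
// Simplified model of a backtracking matcher for /^(a+)+$/.
// Counts every attempt to end an inner a+ run at some position.
function countNestedPlusSteps(input: string): { matched: boolean; steps: number } {
  let steps = 0;

  // Try to match one or more (a+) groups starting at `start`, then end-of-input.
  function groups(start: number): boolean {
    // How far could the greedy inner a+ reach from here?
    let runEnd = start;
    while (runEnd < input.length && input[runEnd] === "a") runEnd++;

    // Greedy: try the longest inner run first, then progressively shorter ones.
    for (let end = runEnd; end > start; end--) {
      steps++;
      if (end === input.length) return true; // $ succeeds
      if (groups(end)) return true;          // outer + repeats the group
    }
    return false; // no split works from here: backtrack to the caller
  }

  const matched = groups(0);
  return { matched, steps };
}

for (const n of [10, 14, 18]) {
  const { matched, steps } = countNestedPlusSteps("a".repeat(n) + "b");
  console.log(`n=${n} matched=${matched} steps=${steps}`); // steps = 2^n - 1
}

// The flat pattern /^a+$/ accepts exactly the same strings and fails
// on the same input after a single linear scan.
console.log(/^a+$/.test("a".repeat(18) + "b")); // → false
```

Each extra a doubles the work, which is why rewriting the nested quantifier as a flat one fixes the problem rather than merely speeding it up.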
Unicode flags and grapheme clusters.
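/\w/ in many flavours means [A-Za-z0-9_] by default, ASCII-only. How to widen it varies. Python 3's re is Unicode-aware for str patterns by default (re.ASCII opts back out), so \w already matches é. JavaScript's \w stays ASCII even under the /u flag; there, /u instead unlocks property escapes such as \p{L}, which matches any letter in any script. Grapheme clusters are another matter: apart from \X in PCRE, Perl and a handful of other flavours, most engines only see code points, so an accented letter written as a base plus a combining mark, or an emoji with a skin-tone modifier, counts as more than one character to a quantifier. Matching on user-perceived characters needs explicit help.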
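A TypeScript sketch of the JavaScript side of this, assuming an ES2018-or-later runtime (needed for \p{...}). Strings use escapes so the precomposed and decomposed forms of é are unambiguous.

```typescript
const cafe = "caf\u00e9";     // "café" with a precomposed é (U+00E9)
const decomposedE = "e\u0301"; // "é" as e + COMBINING ACUTE ACCENT (U+0301)

// In JavaScript, \w stays ASCII-only, with or without the u flag.
console.log(/^\w+$/.test(cafe), /^\w+$/u.test(cafe)); // → false false
// \p{L} (any letter, any script) requires the u flag.
console.log(/^\p{L}+$/u.test(cafe)); // → true

// With u, "." matches one code point, not one user-perceived character.
console.log(/^.$/u.test("\u00e9"));   // → true: one code point
console.log(/^.$/u.test(decomposedE)); // → false: two code points
// Matching the decomposed form means spelling out letter + combining mark.
console.log(/^\p{L}\p{M}$/u.test(decomposedE)); // → true
```

The two forms of é render identically, which is exactly why code-point-based quantifiers surprise people.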
Named groups and back-references.
(?<year>[0-9]{4})-(?<month>[0-9]{2})-(?<day>[0-9]{2}) gives you readable extraction without counting bracket positions. Almost every modern regex flavour supports named groups, though the syntax varies: JavaScript and .NET use (?<name>...), Python uses (?P<name>...), and PCRE accepts both. Back-references (\1 for the first group, \k<name> for a named one, (?P=name) in Python) let you match repeated content, as in (.)(.)\2\1 against "abba".
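A TypeScript sketch of named groups and back-references in JavaScript syntax, assuming an ES2018-or-later runtime. The date and sentence are arbitrary examples.

```typescript
const isoDate = /^(?<year>[0-9]{4})-(?<month>[0-9]{2})-(?<day>[0-9]{2})$/;

// exec returns null when nothing matches, hence the optional chaining.
const dateMatch = isoDate.exec("2024-03-15");
console.log(dateMatch?.groups?.year, dateMatch?.groups?.month, dateMatch?.groups?.day); // → 2024 03 15

// Named groups can also be referenced by name in a replacement string.
console.log("2024-03-15".replace(isoDate, "$<day>/$<month>/$<year>")); // → 15/03/2024

// Numbered back-references: \2\1 must repeat what groups 2 and 1 captured.
console.log(/^(.)(.)\2\1$/.test("abba"), /^(.)(.)\2\1$/.test("abab")); // → true false

// Named back-reference: \k<word> finds an accidentally doubled word.
console.log("this is is a test".match(/\b(?<word>\w+) \k<word>\b/)?.[0]); // → is is
```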
The playground's role.
The patterns library here is curated, not crawled. Each entry is annotated with what it matches, what it deliberately doesn't, and the smallest counterexample we could find that breaks it. Treat them as starting points: copy, paste, modify, then run against the failing case you actually have. The fastest way to write a correct regex is to start with one that's nearly right and adjust until the live tester confirms it.
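Outside the playground, the copy, paste, modify and test loop can be as small as two lists: strings the pattern must accept and strings it must reject, with each new counterexample added to the right list. A minimal TypeScript sketch; checkPattern is a hypothetical helper, not part of the playground or any library.

```typescript
// Hypothetical helper: runs a pattern against strings it must accept and
// strings it must reject, and lists every case that goes the wrong way.
function checkPattern(pattern: RegExp, accept: string[], reject: string[]): string[] {
  const failures: string[] = [];
  const matches = (s: string): boolean => {
    pattern.lastIndex = 0; // the g and y flags make test() stateful, so reset first
    return pattern.test(s);
  };
  for (const s of accept) {
    if (!matches(s)) failures.push(`should match: ${JSON.stringify(s)}`);
  }
  for (const s of reject) {
    if (matches(s)) failures.push(`should not match: ${JSON.stringify(s)}`);
  }
  return failures;
}

// Start from a nearly-right pattern and grow the lists as counterexamples turn up.
const emailAccept = ["alice@example.com", "firstname.lastname@example.com"];
const emailReject = ["no-at-sign", "two words@example.com"];

// One failure: the dotted local-part is rejected.
console.log(checkPattern(/^\w+@\w+\.\w+$/, emailAccept, emailReject));
// No failures.
console.log(checkPattern(/^[^@\s]+@[^@\s]+\.[^@\s]+$/, emailAccept, emailReject));
```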