Regex for Developers: A Practical Guide with Examples
Regular expressions are one of those skills that every developer needs but few take the time to learn properly. You copy a regex from Stack Overflow, it works, and you move on — until the day it doesn't, and you're staring at a wall of symbols with no idea where to start debugging.
This guide builds your regex knowledge from the ground up with practical, real-world examples you can use immediately.
The Building Blocks
At its core, a regex is just a pattern that describes text. The simplest regex is a literal string:
hello matches "hello" in "say hello world"
But regex becomes powerful when you add special characters called metacharacters:
| Character | Meaning | Example |
|---|---|---|
| . | Any single character (except newline) | h.t matches "hat", "hot", "h9t" |
| ^ | Start of string | ^Hello matches "Hello world" but not "Say Hello" |
| $ | End of string | world$ matches "hello world" but not "world cup" |
| \ | Escape a metacharacter | \. matches a literal dot |
| \| | OR (alternation) | cat\|dog matches "cat" or "dog" |
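To try the metacharacters from the table, here is a minimal sketch using Python's built-in re module. Python is our choice for illustration; the patterns themselves are language-neutral, and the sample strings are taken from the table.

```python
import re

# . matches any single character except a newline ("ht" has nothing between h and t)
dot_hits = [w for w in ["hat", "hot", "h9t", "ht"] if re.search(r"h.t", w)]

# ^ and $ anchor the pattern to the start and end of the string
starts = bool(re.search(r"^Hello", "Hello world"))
not_start = bool(re.search(r"^Hello", "Say Hello"))
ends = bool(re.search(r"world$", "hello world"))

# \. matches a literal dot only
literal_dots = re.findall(r"\.", "a.b.c")

# | matches either the left or the right alternative
pets = re.findall(r"cat|dog", "cat chases dog")

print(dot_hits, starts, not_start, ends, literal_dots, pets)
```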
Character Classes
Character classes let you match one character from a set. They use square brackets:
[aeiou] any lowercase vowel
[0-9] any digit
[a-zA-Z] any letter (upper or lower)
[^0-9] any character that is NOT a digit
Note that a ^ placed first inside square brackets means "NOT", which is unrelated to its meaning as a start-of-string anchor outside brackets.
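A short sketch of these character classes, again assuming Python's re module; the sample text is made up for illustration.

```python
import re

text = "Room 42, Floor 7b"

vowels = re.findall(r"[aeiou]", text)        # lowercase vowels only
digits = re.findall(r"[0-9]", text)          # any ASCII digit
letters = re.findall(r"[a-zA-Z]", text)      # upper- or lowercase letters
non_digits = re.findall(r"[^0-9]", "a1b2")   # ^ first inside [] negates the set

print(vowels, digits, "".join(letters), non_digits)
```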
Shorthand Classes
Regex provides shortcuts for common character classes:
| Shorthand | Equivalent | Meaning |
|---|---|---|
| \d | [0-9] | Any digit |
| \D | [^0-9] | Any non-digit |
| \w | [a-zA-Z0-9_] | Any word character |
| \W | [^a-zA-Z0-9_] | Any non-word character |
| \s | [ \t\n\r\f] | Any whitespace |
| \S | [^ \t\n\r\f] | Any non-whitespace |
Tip: Uppercase shorthand classes are always the inverse of their lowercase version. \d matches digits, \D matches everything else.
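One caveat worth checking in your own engine: the "Equivalent" column describes the ASCII definitions. In Python 3, for example, \d and \w are Unicode-aware by default on text strings, and the re.ASCII flag restricts them to the sets shown in the table. A small sketch, with a made-up sample string:

```python
import re

# "caf\u00e9" is "café"; "\u0663" is ARABIC-INDIC DIGIT THREE
sample = "id_7 caf\u00e9 \u0663"

unicode_digits = re.findall(r"\d", sample)           # Unicode-aware by default
ascii_digits = re.findall(r"\d", sample, re.ASCII)   # restricted to [0-9]
ascii_words = re.findall(r"\w+", sample, re.ASCII)   # restricted to [a-zA-Z0-9_]

print(unicode_digits, ascii_digits, ascii_words)
```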
Quantifiers: How Many?
Quantifiers specify how many times the preceding element should repeat:
| Quantifier | Meaning | Example |
|---|---|---|
| * | 0 or more | ab*c matches "ac", "abc", "abbc" |
| + | 1 or more | ab+c matches "abc", "abbc" but not "ac" |
| ? | 0 or 1 (optional) | colou?r matches "color" and "colour" |
| {3} | Exactly 3 | \d{3} matches "123" but not "12" |
| {2,5} | Between 2 and 5 | \d{2,5} matches "12", "123", "12345" |
| {3,} | 3 or more | \d{3,} matches "123", "1234", etc. |
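The quantifier rows can be checked with a small sketch in Python. The helper name full is our own, and it uses re.fullmatch so that each test string must match in its entirety.

```python
import re

def full(pattern: str, text: str) -> bool:
    """Return True when the whole string matches the pattern."""
    return re.fullmatch(pattern, text) is not None

star = [s for s in ["ac", "abc", "abbc"] if full(r"ab*c", s)]
plus = [s for s in ["ac", "abc", "abbc"] if full(r"ab+c", s)]
optional = [s for s in ["color", "colour", "colouur"] if full(r"colou?r", s)]
exact = [s for s in ["12", "123", "1234"] if full(r"\d{3}", s)]
ranged = [s for s in ["1", "12", "12345", "123456"] if full(r"\d{2,5}", s)]

print(star, plus, optional, exact, ranged)
```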
Greedy vs Lazy
By default, quantifiers are greedy — they match as much text as possible. Add a ? to make them lazy (match as little as possible):
# Input: <b>bold</b> and <b>more</b>
<b>.*</b> greedy: matches "<b>bold</b> and <b>more</b>"
<b>.*?</b> lazy: matches "<b>bold</b>" (stops at first </b>)
Common mistake: Using .* when you mean .*?. Greedy .* is one of the most common reasons a pattern matches more text than expected. When in doubt, use the lazy version, or a negated character class such as [^<]* that cannot run past the delimiter.
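The difference is easy to see by running both forms on the sample input. This sketch assumes Python's re module and also includes a negated character class as a third option, which is our addition rather than part of the example above.

```python
import re

html = "<b>bold</b> and <b>more</b>"

greedy = re.findall(r"<b>.*</b>", html)      # runs to the last </b>
lazy = re.findall(r"<b>.*?</b>", html)       # stops at the first </b> each time
negated = re.findall(r"<b>[^<]*</b>", html)  # cannot cross a "<" at all

print(greedy)
print(lazy)
print(negated)
```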
Groups and Capturing
Parentheses () create groups. Groups serve two purposes: they let you apply quantifiers to multi-character sequences, and they capture the matched text for extraction.
# Group + quantifier
(ha)+ matches "ha", "haha", "hahaha"
# Capturing groups (numbered left to right)
(\d{4})-(\d{2})-(\d{2})
# Input: "2026-03-05"
# Group 1: "2026"
# Group 2: "03"
# Group 3: "05"
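In code, the captured groups are read back from the match object. A minimal sketch with Python's re module; the surrounding sentence in the input is made up.

```python
import re

match = re.search(r"(\d{4})-(\d{2})-(\d{2})", "Released on 2026-03-05.")
# groups() returns the numbered captures in left-to-right order
parts = match.groups() if match else ()
whole = match.group(0) if match else ""

# A quantified group keeps only its last repetition
laugh = re.fullmatch(r"(ha)+", "hahaha")
last_repetition = laugh.group(1) if laugh else ""

print(whole, parts, last_repetition)
```

Note that group 0 is always the entire match, and that (ha)+ captures only the final "ha", not all three.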
Non-Capturing Groups
If you need grouping but don't need to capture, use (?:...):
# Non-capturing group (just for alternation)
(?:https?|ftp)://\S+
# Same matching behaviour, but no capture overhead
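Besides overhead, the choice can change what your code gets back. In Python, for instance, re.findall returns the group contents when the pattern has a capturing group and the whole match when it has none. A sketch with made-up URLs:

```python
import re

text = "Docs: https://example.com/a and ftp://files.example.com/b"

# With a capturing group, findall returns only what the group captured
capturing = re.findall(r"(https?|ftp)://\S+", text)

# With a non-capturing group, findall returns each whole match
non_capturing = re.findall(r"(?:https?|ftp)://\S+", text)

print(capturing)
print(non_capturing)
```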
Named Groups
For readability, name your captures with (?<name>...) (Python spells this (?P<name>...)):
(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})
Most languages let you access named groups by name instead of index, making your code much clearer.
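As a sketch of access by name, here is the date pattern in Python. Note the assumption: Python's re module writes named groups as (?P<name>...) rather than (?<name>...), so the pattern is adjusted accordingly.

```python
import re

# Python's spelling of named groups is (?P<name>...)
DATE = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")

match = DATE.search("Due 2026-03-05")
fields = match.groupdict() if match else {}   # all named groups as a dict
year = match.group("year") if match else ""   # a single group by name

print(fields, year)
```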
Anchors and Boundaries
Anchors don't match characters — they match positions in the string:
| Anchor | Position |
|---|---|
| ^ | Start of string (or line with multiline flag) |
| $ | End of string (or line with multiline flag) |
| \b | Word boundary (between \w and \W, or at a string edge next to \w) |
| \B | Non-word boundary |
Word boundaries are incredibly useful for matching whole words:
\bcat\b matches "cat" but not "concatenate" or "scatter"
\berror\b matches "error" but not "errors" or "terror"
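A quick check of both boundary patterns, assuming Python's re module. The extra test strings are ours, and matching is case-sensitive by default, so "Error" is skipped.

```python
import re

words = ["cat", "concatenate", "scatter", "the cat sat"]
whole_word = [w for w in words if re.search(r"\bcat\b", w)]

# "errors" and "terror" fail the boundary test; "Error" fails on case
errors = re.findall(r"\berror\b", "error, errors, terror, Error: error!")

print(whole_word, errors)
```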
Lookahead and Lookbehind
Lookarounds let you match based on what comes before or after, without including it in the match:
| Syntax | Name | Meaning |
|---|---|---|
| (?=...) | Positive lookahead | Followed by ... |
| (?!...) | Negative lookahead | NOT followed by ... |
| (?<=...) | Positive lookbehind | Preceded by ... |
| (?<!...) | Negative lookbehind | NOT preceded by ... |
# Match "USD" only when followed by a number
USD(?=\d) matches "USD" in "USD100" but not "USD only"
# Match a number NOT preceded by a minus sign
(?<!-)\b\d+\b matches "42" but not "-42"
# Password validation: at least 8 characters, including one digit and one uppercase letter
^(?=.*\d)(?=.*[A-Z]).{8,}$
Tip: Lookaheads and lookbehinds are "zero-width": they check a condition without consuming characters. This means multiple lookaheads can be stacked at the same position, which makes them a good fit for password validation rules.
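The three lookaround examples can be exercised with a short sketch in Python. The sample passwords and the number string are invented for illustration.

```python
import re

# Positive lookahead: "USD" only when a digit follows
usd = [s for s in ["USD100", "USD only"] if re.search(r"USD(?=\d)", s)]

# Negative lookbehind: numbers not preceded by a minus sign
unsigned = re.findall(r"(?<!-)\b\d+\b", "42 -17 8")

# Stacked lookaheads: both conditions are checked from position 0
PASSWORD = re.compile(r"^(?=.*\d)(?=.*[A-Z]).{8,}$")
checks = {p: bool(PASSWORD.search(p)) for p in ["Passw0rdX", "password1", "Sh0rt"]}

print(usd, unsigned, checks)
```

Here "password1" fails for lacking an uppercase letter and "Sh0rt" fails the length requirement.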
Real-World Patterns
Here are battle-tested patterns you'll actually use:
Email (simplified)
[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}
This covers the vast majority of real email addresses. A fully RFC-compliant email regex is thousands of characters long and rarely necessary.
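One way to use this pattern, sketched in Python: re.fullmatch for validating a single field and findall for pulling addresses out of free text. The function name looks_like_email and the sample addresses are our own.

```python
import re

EMAIL = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")

def looks_like_email(value: str) -> bool:
    """Return True when the entire string has the simplified email shape."""
    return EMAIL.fullmatch(value) is not None

# Extraction from running text; the sentence-ending period is not included
found = EMAIL.findall("Contact ann@example.com or bob@example.org.")

print(looks_like_email("dev+test@example.co.uk"), looks_like_email("a@b"), found)
```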
IPv4 Address
\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b
Note: this matches the format but doesn't validate the range (0–255). For strict validation, you'd need alternation or post-match checks.
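The post-match check mentioned in the note might look like the following sketch in Python. The helper name find_valid_ipv4 is hypothetical: the regex finds candidates, and plain integer comparison enforces the 0 to 255 range.

```python
import re

IPV4 = re.compile(r"\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b")

def find_valid_ipv4(text: str) -> list[str]:
    """Return candidates whose four octets all fall within 0..255."""
    return [
        candidate
        for candidate in IPV4.findall(text)
        if all(0 <= int(octet) <= 255 for octet in candidate.split("."))
    ]

print(find_valid_ipv4("ok 192.168.1.1, bad 999.1.1.1, ok 10.0.0.255"))
```

Keeping the range check outside the regex keeps the pattern short and readable.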
ISO Date (YYYY-MM-DD)
\b\d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])\b
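This pattern restricts the month and day ranges but still accepts impossible dates such as 2026-02-31. If that matters, one option (a sketch, assuming Python 3.7 or later) is to hand each candidate to the standard library's date.fromisoformat. The helper name real_dates is ours.

```python
import re
from datetime import date

ISO_DATE = re.compile(r"\b\d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])\b")

def real_dates(text: str) -> list[str]:
    """Return regex matches that are also valid calendar dates."""
    valid = []
    for candidate in ISO_DATE.findall(text):
        try:
            date.fromisoformat(candidate)  # raises ValueError for e.g. Feb 31
        except ValueError:
            continue
        valid.append(candidate)
    return valid

print(real_dates("2026-03-05 2026-02-31 2026-13-01"))
```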
TODO/FIXME Comments
(?://|#)\s*(?:TODO|FIXME|HACK|XXX)\b.*
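A sketch of this pattern scanning a small snippet, assuming Python's re module; the source text is invented. Because . does not match newlines, the trailing .* stops at the end of each line.

```python
import re

TODO = re.compile(r"(?://|#)\s*(?:TODO|FIXME|HACK|XXX)\b.*")

source = """x = 1  # TODO: remove magic number
// FIXME handle empty input
# TODOS are not matched
y = 2  # plain comment
"""

# The \b after the tag rejects "TODOS"; untagged comments are ignored
hits = TODO.findall(source)

for hit in hits:
    print(hit)
```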
Semantic Version
\bv?\d+\.\d+\.\d+(?:-[a-zA-Z0-9.]+)?\b
Matches versions like 1.2.3, v2.0.0, and 1.0.0-beta.1.
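A quick extraction sketch in Python, using a made-up sentence that contains the three versions listed above.

```python
import re

SEMVER = re.compile(r"\bv?\d+\.\d+\.\d+(?:-[a-zA-Z0-9.]+)?\b")

versions = SEMVER.findall("Upgrade from 1.2.3 to v2.0.0, or try 1.0.0-beta.1 first.")

print(versions)
```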
Common Pitfalls
- Forgetting to escape dots. An unescaped . matches any character. To match a literal dot, use \. instead. The pattern 192.168.1.1 also matches 192x168y1z1.
- Catastrophic backtracking. Patterns like (a+)+b can take exponential time on inputs like aaaaaaaac. Avoid nested quantifiers on the same characters.
- Anchoring. Without ^ and $, your pattern matches substrings. \d{3} matches inside "12345" (it finds "123"). Use ^\d{3}$ for exact matches.
- Multiline mode. By default, ^ and $ match start/end of the entire string. Enable multiline mode (m flag) to match line boundaries.
- Overcomplicating. If you can solve the problem with simple string methods (contains, split, startsWith), do that instead. Regex is powerful but less readable.
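Three of these pitfalls can be reproduced in a few lines. This is a sketch assuming Python's re module, where re.escape, re.fullmatch, and re.MULTILINE are the relevant tools; the log text is invented.

```python
import re

# Escaping: re.escape turns every metacharacter into a literal
unescaped = bool(re.fullmatch(r"192.168.1.1", "192x168y1z1"))
escaped = bool(re.fullmatch(re.escape("192.168.1.1"), "192x168y1z1"))

# Anchoring: search finds a substring, fullmatch requires the whole string
substring = re.search(r"\d{3}", "12345").group()
exact = re.fullmatch(r"\d{3}", "12345")

# Multiline: with re.MULTILINE, ^ and $ also match at line boundaries
log = "ok\nerror: disk\nok"
default_mode = re.findall(r"^error.*$", log)
multiline = re.findall(r"^error.*$", log, re.MULTILINE)

print(unescaped, escaped, substring, exact, default_mode, multiline)
```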
Quick Reference
| Pattern | Description |
|---|---|
| . | Any character except newline |
| \d / \D | Digit / non-digit |
| \w / \W | Word char / non-word char |
| \s / \S | Whitespace / non-whitespace |
| [abc] | Character class (a, b, or c) |
| [^abc] | Negated class (not a, b, or c) |
| * / + / ? | 0+, 1+, 0 or 1 |
| {n,m} | Between n and m times |
| *? / +? | Lazy (minimal) versions |
| (group) | Capturing group |
| (?:group) | Non-capturing group |
| \b | Word boundary |
| (?=...) / (?!...) | Lookahead / negative lookahead |
| (?<=...) / (?<!...) | Lookbehind / negative lookbehind |
Test Patterns in Real Time
BoltKit's RegexLab tool lets you write regex patterns and test them against sample text with live highlighting, group extraction, and a library of common patterns. All on your iPhone or iPad.