
Regex for Developers: A Practical Guide with Examples

Regular expressions are one of those skills that every developer needs but few take the time to learn properly. You copy a regex from Stack Overflow, it works, and you move on — until the day it doesn't, and you're staring at a wall of symbols with no idea where to start debugging.

This guide builds your regex knowledge from the ground up with practical, real-world examples you can use immediately.

The Building Blocks

At its core, a regex is just a pattern that describes text. The simplest regex is a literal string:

hello     matches "hello" in "say hello world"

But regex becomes powerful when you add special characters called metacharacters:

Character	Meaning	Example
`.`	Any single character (except newline)	`h.t` matches "hat", "hot", "h9t"
`^`	Start of string	`^Hello` matches "Hello world" but not "Say Hello"
`$`	End of string	`world$` matches "hello world" but not "world cup"
`\`	Escape a metacharacter	`\.` matches a literal dot
`|`	OR (alternation)	`cat|dog` matches "cat" or "dog"
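The rows above can be checked with Python's built-in `re` module. This is a minimal sketch; the helper name `matches` is made up for illustration and is not part of any library.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Return True if the pattern is found anywhere in the text."""
    return re.search(pattern, text) is not None

print(matches(r"h.t", "h9t"))            # True: . matches any one character
print(matches(r"^Hello", "Say Hello"))   # False: ^ pins the match to the start
print(matches(r"world$", "world cup"))   # False: $ pins the match to the end
print(matches(r"\.", "no dots here"))    # False: \. only matches a literal dot
print(matches(r"cat|dog", "hot dog"))    # True: either alternative may match
```

Raw strings (`r"..."`) keep Python from interpreting the backslashes before the regex engine sees them.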

Character Classes

Character classes let you match one character from a set. They use square brackets:

[aeiou]       any vowel
[0-9]         any digit
[a-zA-Z]      any letter (upper or lower)
[^0-9]        any character that is NOT a digit

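Notice that ^ inside square brackets means "NOT", which has nothing to do with its meaning as a start-of-string anchor outside brackets. It only negates when it is the first character in the class.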
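As a rough illustration in Python, here are the same classes applied to a made-up sample string. The variable names are arbitrary.

```python
import re

text = "Room 42, Floor 7"  # sample input, invented for this example

vowels = re.findall(r"[aeiou]", text)        # every lowercase vowel
digits = re.findall(r"[0-9]", text)          # every digit, one per match
digits_only = re.sub(r"[^0-9]", "", text)    # delete everything that is NOT a digit

print(vowels)       # ['o', 'o', 'o', 'o']
print(digits)       # ['4', '2', '7']
print(digits_only)  # 427
```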

Shorthand Classes

Regex provides shortcuts for common character classes:

Shorthand	Equivalent	Meaning
`\d`	`[0-9]`	Any digit
`\D`	`[^0-9]`	Any non-digit
`\w`	`[a-zA-Z0-9_]`	Any word character
`\W`	`[^a-zA-Z0-9_]`	Any non-word character
`\s`	`[ \t\n\r\f]`	Any whitespace
`\S`	`[^ \t\n\r\f]`	Any non-whitespace

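Tip: Uppercase shorthand classes are always the inverse of their lowercase version. \d matches digits, \D matches everything else. The equivalents in the table are the ASCII definitions; Unicode-aware engines may match more (for example, digits from other scripts for \d).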
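One caveat worth checking in your own engine: the bracket equivalents in the table are the ASCII definitions. In Python 3, for instance, `\d` on a `str` pattern also matches digits from other scripts unless the `re.ASCII` flag is set. A small sketch with an invented sample string:

```python
import re

# "42" in ASCII digits, then "42" again in Arabic-Indic digits (U+0664, U+0662)
text = "42 and \u0664\u0662"

unicode_aware = re.findall(r"\d+", text)            # default for str patterns
ascii_only = re.findall(r"\d+", text, re.ASCII)     # restrict \d to [0-9]
explicit_class = re.findall(r"[0-9]+", text)        # always ASCII digits only

print(len(unicode_aware))   # 2
print(ascii_only)           # ['42']
print(explicit_class)       # ['42']
```

If the input must be plain ASCII digits, writing `[0-9]` spells that out regardless of engine or flags.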

Quantifiers: How Many?

Quantifiers specify how many times the preceding element should repeat:

Quantifier	Meaning	Example
`*`	0 or more	`ab*c` matches "ac", "abc", "abbc"
`+`	1 or more	`ab+c` matches "abc", "abbc" but not "ac"
`?`	0 or 1 (optional)	`colou?r` matches "color" and "colour"
`{3}`	Exactly 3	`\d{3}` matches "123" but not "12"
`{2,5}`	Between 2 and 5	`\d{2,5}` matches "12", "123", "12345"
`{3,}`	3 or more	`\d{3,}` matches "123", "1234", etc.
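To see the quantifiers in isolation, it helps to require that the whole string match. The sketch below uses Python's `re.fullmatch` for that; `is_exact` is a hypothetical helper name.

```python
import re

def is_exact(pattern: str, text: str) -> bool:
    """Return True only if the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None

print(is_exact(r"ab*c", "ac"))          # True: zero b's is allowed
print(is_exact(r"ab+c", "ac"))          # False: at least one b is required
print(is_exact(r"colou?r", "colour"))   # True: the u is optional
print(is_exact(r"\d{3}", "12"))         # False: needs exactly three digits
print(is_exact(r"\d{2,5}", "123456"))   # False: six digits is over the limit
print(is_exact(r"\d{3,}", "123456"))    # True: no upper limit
```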

Greedy vs Lazy

By default, quantifiers are greedy — they match as much text as possible. Add a ? to make them lazy (match as little as possible):

# Input: <b>bold</b> and <b>more</b>

<b>.*</b>      greedy: matches "<b>bold</b> and <b>more</b>"
<b>.*?</b>     lazy:   matches "<b>bold</b>" (stops at first </b>)

Common mistake: Using .* when you mean .*?. Greedy matching with .* is one of the most common reasons a pattern matches more text than expected. When in doubt, try the lazy version, or use a negated character class such as [^<]* that cannot run past the delimiter.
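The difference is easy to reproduce in Python with the same input. The third pattern is an alternative not shown above: a negated character class, which gives the same result here because the text between the tags contains no `<`.

```python
import re

html = "<b>bold</b> and <b>more</b>"

greedy = re.findall(r"<b>.*</b>", html)       # runs to the LAST </b>
lazy = re.findall(r"<b>.*?</b>", html)        # stops at the FIRST </b> each time
negated = re.findall(r"<b>[^<]*</b>", html)   # cannot cross a "<" at all

print(greedy)   # ['<b>bold</b> and <b>more</b>']
print(lazy)     # ['<b>bold</b>', '<b>more</b>']
print(negated)  # ['<b>bold</b>', '<b>more</b>']
```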

Groups and Capturing

Parentheses () create groups. Groups serve two purposes: they let you apply quantifiers to multi-character sequences, and they capture the matched text for extraction.

# Group + quantifier
(ha)+          matches "ha", "haha", "hahaha"

# Capturing groups (numbered left to right)
(\d{4})-(\d{2})-(\d{2})
# Input: "2026-03-05"
# Group 1: "2026"
# Group 2: "03"
# Group 3: "05"
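In Python, the captured pieces are available on the match object. A minimal sketch using the date pattern above:

```python
import re

match = re.fullmatch(r"(\d{4})-(\d{2})-(\d{2})", "2026-03-05")

if match:
    year, month, day = match.groups()   # groups 1..3 as a tuple of strings
    print(match.group(0))               # 2026-03-05 (group 0 is the whole match)
    print(year, month, day)             # 2026 03 05
```

Captured values are always strings; convert them with `int()` if you need numbers.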

Non-Capturing Groups

If you need grouping but don't need to capture, use (?:...):

# Non-capturing group (just for alternation)
(?:https?|ftp)://\S+

# Same matching behaviour, but no capture overhead
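The choice can change what your code gets back, not just performance. In Python, for example, `re.findall` returns the captured group instead of the full match when the pattern contains exactly one capturing group. The URLs below are placeholders.

```python
import re

text = "Docs: https://example.com/guide and ftp://files.example.com/a.zip"

# With a capturing group, findall returns only what the group captured.
capturing = re.findall(r"(https?|ftp)://\S+", text)

# With a non-capturing group, findall returns the whole match.
non_capturing = re.findall(r"(?:https?|ftp)://\S+", text)

print(capturing)      # ['https', 'ftp']
print(non_capturing)  # ['https://example.com/guide', 'ftp://files.example.com/a.zip']
```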

Named Groups

For readability, name your captures with (?<name>...):

(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})

Most languages let you access named groups by name instead of index, making your code much clearer. The exact syntax varies by flavor: Python, for example, writes (?P<name>...).
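Here is how that looks in Python, which spells named groups as `(?P<name>...)`. The sample sentence is invented.

```python
import re

DATE = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")

match = DATE.search("Released on 2026-03-05.")

if match:
    print(match.group("year"))   # 2026
    print(match.groupdict())     # {'year': '2026', 'month': '03', 'day': '05'}
```

Named groups are still numbered, so `match.group(1)` returns the year as well.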

Anchors and Boundaries

Anchors don't match characters — they match positions in the string:

Anchor	Position
`^`	Start of string (or line with multiline flag)
`$`	End of string (or line with multiline flag)
`\b`	Word boundary (between a `\w` character and a `\W` character, or the start/end of the string)
`\B`	Non-word boundary

Word boundaries are incredibly useful for matching whole words:

\bcat\b        matches "cat" but not "concatenate" or "scatter"
\berror\b      matches "error" but not "errors" or "terror"
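A quick Python check on a made-up log line shows how much the boundaries filter out:

```python
import re

log = "error: 3 errors found, terror level low"

whole_word = re.findall(r"\berror\b", log)   # only the standalone word
substring = re.findall(r"error", log)        # also hits "errors" and "terror"

print(whole_word)       # ['error']
print(len(substring))   # 3
```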

Lookahead and Lookbehind

Lookarounds let you match based on what comes before or after, without including it in the match:

Syntax	Name	Meaning
`(?=...)`	Positive lookahead	Followed by ...
`(?!...)`	Negative lookahead	NOT followed by ...
`(?<=...)`	Positive lookbehind	Preceded by ...
`(?<!...)`	Negative lookbehind	NOT preceded by ...

# Match "USD" only when followed by a number
USD(?=\d)           matches "USD" in "USD100" but not "USD only"

# Match a number NOT preceded by a minus sign
(?<!-)\b\d+\b       matches "42" but not "-42"

# Password validation: at least 8 characters, with at least one digit and one uppercase letter
^(?=.*\d)(?=.*[A-Z]).{8,}$

Tip: Lookaheads and lookbehinds are "zero-width": they check a condition without consuming characters. This means multiple lookaheads can be stacked at the same position, which is why they work well for password validation rules.
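The three patterns above can be tried in Python as follows. This is a sketch: `is_valid_password` is a hypothetical helper, and the sample strings are invented.

```python
import re

# Lookahead: "USD" counts only when a digit follows, and the digit is not consumed.
usd = re.findall(r"USD(?=\d)", "USD100 or USD only")

# Lookbehind: numbers that are not directly preceded by a minus sign.
unsigned = re.findall(r"(?<!-)\b\d+\b", "42 -42 7")

# Stacked lookaheads: both conditions are tested from the start of the string.
PASSWORD = re.compile(r"^(?=.*\d)(?=.*[A-Z]).{8,}$")

def is_valid_password(candidate: str) -> bool:
    """8+ characters with at least one digit and one uppercase letter."""
    return PASSWORD.search(candidate) is not None

print(usd)                              # ['USD']
print(unsigned)                         # ['42', '7']
print(is_valid_password("Secret123"))   # True
print(is_valid_password("secret123"))   # False: no uppercase letter
print(is_valid_password("Ab1"))         # False: too short
```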

Real-World Patterns

Here are battle-tested patterns you'll actually use:

Email (simplified)

[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}

This covers the vast majority of real email addresses. A fully RFC-compliant email regex is thousands of characters long and rarely necessary.

IPv4 Address

\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b

Note: this matches the format but doesn't validate the range (0–255). For strict validation, you'd need alternation or post-match checks.
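One way to do the post-match check, sketched in Python: capture each octet and test its numeric range after the regex has found the shape. The capturing groups and the `find_ipv4` helper are additions for this example, not part of the pattern above.

```python
import re

# Same shape as the pattern above, with each octet captured for checking.
IPV4_SHAPE = re.compile(r"\b(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\b")

def find_ipv4(text: str) -> list[str]:
    """Return dotted quads in text whose four octets are all in 0-255."""
    found = []
    for match in IPV4_SHAPE.finditer(text):
        if all(int(octet) <= 255 for octet in match.groups()):
            found.append(match.group(0))
    return found

print(find_ipv4("ok 192.168.1.1, bad 999.1.1.1, ok 10.0.0.255"))
# ['192.168.1.1', '10.0.0.255']
```

Keeping the range check in code tends to be easier to read than encoding 0-255 as alternation inside the regex.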

ISO Date (YYYY-MM-DD)

\b\d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])\b
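This pattern restricts months to 01-12 and days to 01-31, but it does not know how many days each month has, so it still accepts strings like 2026-02-31. A possible follow-up check in Python, assuming you want real calendar dates, is to hand each match to `datetime.date.fromisoformat`. The helper name `find_real_dates` is made up.

```python
import re
from datetime import date

ISO_DATE = re.compile(r"\b\d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])\b")

def find_real_dates(text: str) -> list[date]:
    """Return matches of the ISO pattern that are also valid calendar dates."""
    dates = []
    for match in ISO_DATE.finditer(text):
        try:
            dates.append(date.fromisoformat(match.group(0)))
        except ValueError:
            pass  # right shape, impossible date (e.g. February 31st)
    return dates

print(find_real_dates("2026-03-05, 2026-02-31, 2026-13-01"))
# [datetime.date(2026, 3, 5)]
```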

TODO/FIXME Comments

(?://|#)\s*(?:TODO|FIXME|HACK|XXX)\b.*
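As a rough sketch of how this pattern might be used, the Python snippet below scans some sample lines and records the line number of each marker. The sample text is invented and mixes comment styles on purpose.

```python
import re

TODO = re.compile(r"(?://|#)\s*(?:TODO|FIXME|HACK|XXX)\b.*")

source = """\
x = 1  # TODO: rename this
// FIXME handle nulls
# TODOS are not markers
print("done")
"""

# Collect (line number, matched comment) pairs; line numbers start at 1.
hits = [
    (number, match.group(0))
    for number, line in enumerate(source.splitlines(), start=1)
    if (match := TODO.search(line))
]

for number, comment in hits:
    print(number, comment)
```

The third line is skipped because `\b` requires the keyword to end there, and "TODOS" continues with another word character.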

Semantic Version

\bv?\d+\.\d+\.\d+(?:-[a-zA-Z0-9.]+)?\b

Matches versions like 1.2.3, v2.0.0, and 1.0.0-beta.1.
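If you need the parts rather than just a match, one option is to add capturing groups and convert the numbers. The groups and the `parse_version` helper below are assumptions for this sketch, not part of the pattern above.

```python
import re
from typing import Optional

# Same pattern as above, with the numeric parts and pre-release tag captured.
SEMVER = re.compile(r"\bv?(\d+)\.(\d+)\.(\d+)(?:-([a-zA-Z0-9.]+))?\b")

def parse_version(text: str) -> Optional[tuple[int, int, int, Optional[str]]]:
    """Return (major, minor, patch, prerelease) for the first version in text."""
    match = SEMVER.search(text)
    if match is None:
        return None
    major, minor, patch, prerelease = match.groups()
    return int(major), int(minor), int(patch), prerelease

print(parse_version("v2.0.0"))          # (2, 0, 0, None)
print(parse_version("1.0.0-beta.1"))    # (1, 0, 0, 'beta.1')
print(parse_version("no version"))      # None
```

Converting to integers matters for comparisons: as strings, "1.10.0" sorts before "1.2.0", while the tuples (1, 10, 0) and (1, 2, 0) compare correctly.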

Common Pitfalls

Forgetting to escape dots. An unescaped `.` matches any character. To match a literal dot, use `\.`. The pattern `192.168.1.1` also matches "192x168y1z1".
Catastrophic backtracking. In backtracking engines, patterns like (a+)+b can take exponential time on inputs like aaaaaaaac, where the match ultimately fails. Avoid nested quantifiers on the same characters.
Anchoring. Without ^ and $, your pattern matches substrings. \d{3} matches inside "12345" (it finds "123"). Use ^\d{3}$ for exact matches.
Multiline mode. By default, ^ and $ match start/end of the entire string. Enable multiline mode (m flag) to match line boundaries.
Overcomplicating. If you can solve the problem with simple string methods (contains, split, startsWith), do that instead. Regex is powerful but less readable.
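Three of these pitfalls are easy to reproduce. The Python sketch below covers unescaped dots, missing anchors, and multiline mode; the sample strings are invented.

```python
import re

# Unescaped dots match any character.
loose = re.fullmatch(r"192.168.1.1", "192x168y1z1") is not None
strict = re.fullmatch(r"192\.168\.1\.1", "192x168y1z1") is not None
print(loose, strict)   # True False

# Without anchors, the pattern is happy with a substring.
unanchored = re.search(r"\d{3}", "12345")
anchored = re.search(r"^\d{3}$", "12345")
print(unanchored.group(0), anchored)   # 123 None

# ^ matches only at the start of the string unless multiline mode is on.
text = "first line\nsecond line"
default_mode = re.findall(r"^\w+", text)
multiline_mode = re.findall(r"^\w+", text, re.MULTILINE)
print(default_mode)     # ['first']
print(multiline_mode)   # ['first', 'second']
```

When a literal string has to be embedded in a pattern, Python's `re.escape` handles the escaping for you.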

Quick Reference

Pattern	Description
`.`	Any character except newline
`\d` / `\D`	Digit / non-digit
`\w` / `\W`	Word char / non-word char
`\s` / `\S`	Whitespace / non-whitespace
`[abc]`	Character class (a, b, or c)
`[^abc]`	Negated class (not a, b, or c)
`*` / `+` / `?`	0+, 1+, 0 or 1
`{n,m}`	Between n and m times
`*?` / `+?`	Lazy (minimal) versions
`(group)`	Capturing group
`(?:group)`	Non-capturing group
`\b`	Word boundary
`(?=...)` / `(?!...)`	Lookahead / negative lookahead
`(?<=...)` / `(?<!...)`	Lookbehind / negative lookbehind

Test Patterns in Real Time

BoltKit's RegexLab tool lets you write regex patterns and test them against sample text with live highlighting, group extraction, and a library of common patterns. All on your iPhone or iPad.
