
Introduction
Regular expressions (regex) are patterns used to match character combinations in strings. They are a powerful tool for text processing, validation, search and replace, and data extraction. This guide covers regex fundamentals, common patterns, testing strategies, and pitfalls, all with practical examples you can try in our regex tester.
Regex Fundamentals
Literals and Metacharacters
A regex pattern consists of literals (characters that match themselves) and metacharacters (special symbols with a specific meaning).
| Pattern | Matches | Example |
|---|---|---|
| hello | The literal string "hello" | hello in "say hello" |
| . | Any single character (except newline) | h.t matches "hat", "hot", "hut" |
| \d | Any digit (0-9) | \d{3} matches a 3-digit number |
| \w | Any word character (letter, digit, underscore) | \w+ matches a word |
| \s | Any whitespace (space, tab, newline) | \s matches a space |
| ^ | Start of string | ^Hello matches "Hello" at the start |
| $ | End of string | world$ matches "world" at the end |
| \b | Word boundary | \bcat\b matches "cat" as a whole word |
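A few rows of the table can be checked directly in JavaScript. This is a minimal sketch; the sample strings are made up for illustration.

```javascript
// "." must consume exactly one character, so "ht" is rejected.
const dotMatches = ["hat", "hot", "hut", "ht"].filter((s) => /^h.t$/.test(s));

// \d{3} finds the first run of three digits.
const threeDigits = "order 4821".match(/\d{3}/)[0];

// \b only matches where a word starts or ends.
const wholeWord = /\bcat\b/.test("the cat sat");
const insideWord = /\bcat\b/.test("concatenate");

// ^ pins the match to the start of the string.
const atStart = /^Hello/.test("Hello world");
const notAtStart = /^Hello/.test("Say Hello");

console.log(dotMatches, threeDigits, wholeWord, insideWord, atStart, notAtStart);
```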
Character Classes
Use square brackets [...] to define a set of characters to match.
- [abc] matches 'a', 'b', or 'c'
- [a-z] matches any lowercase letter
- [0-9] matches any digit (same as \d)
- [^abc] matches any character except 'a', 'b', or 'c'
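As a quick illustration, the classes above behave like this in JavaScript. The sample text is an assumption chosen for the example.

```javascript
const text = "Room B12, level 3";

// [0-9] and \d select the same characters here.
const digits = text.match(/[0-9]/g);
const sameWithShorthand = text.match(/\d/g);

// A negated class removes everything that is not a digit.
const digitsOnly = text.replace(/[^0-9]/g, "");

// [abc] matches the first character that belongs to the set.
const firstOfSet = "xbz".match(/[abc]/)[0];

console.log(digits, sameWithShorthand, digitsOnly, firstOfSet);
```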
Quantifiers
Quantifiers specify how many times a character or group should appear.
| Quantifier | Meaning | Example |
|---|---|---|
| * | 0 or more | ab*c matches "ac", "abc", "abbc" |
| + | 1 or more | ab+c matches "abc", "abbc" but not "ac" |
| ? | 0 or 1 | colou?r matches "color" and "colour" |
| {n} | Exactly n | \d{3} matches exactly 3 digits |
| {n,} | n or more | \d{2,} matches 2 or more digits |
| {n,m} | Between n and m | \d{2,4} matches 2 to 4 digits |
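The table's examples can be reproduced with a short JavaScript sketch. The anchors and word boundaries are additions here so that each test covers the whole string or the whole number.

```javascript
const samples = ["ac", "abc", "abbc"];

// * allows zero b's, so all three samples match.
const starMatches = samples.filter((s) => /^ab*c$/.test(s));

// + needs at least one b, so "ac" drops out.
const plusMatches = samples.filter((s) => /^ab+c$/.test(s));

// ? makes the "u" optional.
const bothSpellings = ["color", "colour"].every((s) => /^colou?r$/.test(s));

// {2,4} with word boundaries keeps only whole numbers of 2 to 4 digits.
const twoToFour = "7 42 2025 123456".match(/\b\d{2,4}\b/g);

console.log(starMatches, plusMatches, bothSpellings, twoToFour);
```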
Groups and Alternation
- Groups: (pattern) captures the matched substring. Use (?:pattern) for non-capturing groups.
- Alternation: | acts like OR. cat|dog matches "cat" or "dog".
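The sketch below shows both ideas in JavaScript, plus one consequence worth knowing: alternation has the lowest precedence, so anchors bind to a single alternative unless you group. The sample strings are assumptions for illustration.

```javascript
// Capturing groups are numbered from 1 in the match result.
const dateMatch = "2025-03".match(/(\d{4})-(\d{2})/);
const year = dateMatch[1];
const month = dateMatch[2];

// A non-capturing group does not get a number, so "fish" is group 1.
const fishMatch = "catfish".match(/(?:cat|dog)(fish)/);

// Alternation finds either word.
const pets = "a cat and a dog".match(/cat|dog/g);

// Without grouping this reads as (^cat) OR (dog$).
const ungrouped = /^cat|dog$/.test("catalog");
const grouped = /^(?:cat|dog)$/.test("catalog");

console.log(year, month, fishMatch[1], pets, ungrouped, grouped);
```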
Escaping
To match a literal metacharacter, escape it with a backslash: \. matches a dot, \* matches an asterisk.
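When the text to match comes from user input, it helps to escape every metacharacter programmatically. The helper below is a sketch; the name escapeRegExp is hypothetical and not part of the language.

```javascript
// Hypothetical helper: prefixes each regex metacharacter with a backslash.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// An unescaped dot matches any character; an escaped dot matches only ".".
const unescapedDot = /^a.c$/.test("abc");
const escapedDot = /^a\.c$/.test("abc");

// Unescaped, "1+1=2" means "one or more 1s, then 1=2", which is not in the text.
const rawPattern = new RegExp("1+1=2").test("1+1=2");
const safePattern = new RegExp(escapeRegExp("1+1=2")).test("1+1=2");

console.log(escapeRegExp("a.b*c"), unescapedDot, escapedDot, rawPattern, safePattern);
```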
Common Patterns and Use Cases
Email Validation
A simple email regex: ^[\w.-]+@[\w.-]+\.\w{2,}$
- ^ start of string
- [\w.-]+ one or more word characters, dots, or hyphens (local part)
- @ literal @
- [\w.-]+ domain name
- \. literal dot
- \w{2,} top-level domain (at least 2 word characters; note that \w also accepts digits and underscores)
- $ end of string
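Running the pattern against a few made-up addresses shows what it accepts and where it is permissive. This is a sketch for exploration, not a complete validator.

```javascript
const emailPattern = /^[\w.-]+@[\w.-]+\.\w{2,}$/;

// Illustrative inputs only.
const accepted = ["jdoe@example.com", "first.last@mail.example.org"]
  .every((s) => emailPattern.test(s));

const rejected = ["no-at-sign.example.com", "jdoe@example", "jdoe@example.c"]
  .every((s) => !emailPattern.test(s));

// Permissive case: consecutive dots in the domain still pass,
// because [\w.-]+ allows any mix of word characters, dots, and hyphens.
const doubleDotPasses = emailPattern.test("jdoe@example..com");

console.log(accepted, rejected, doubleDotPasses);
```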
URL Extraction
Pattern to extract URLs from text: https?://[\w./?=&-]+
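A short sketch of the extraction in JavaScript follows. The sample sentence is made up, and the trailing-punctuation cleanup is an assumption about what you usually want rather than part of the pattern.

```javascript
const text = "Docs: https://example.com/docs?page=2&lang=en and http://example.org/a-b.";

// Same pattern as above; the slashes are escaped only because this is a regex literal.
// The g flag returns every match instead of just the first.
const urls = text.match(/https?:\/\/[\w.\/?=&-]+/g);

// The class includes ".", so a sentence-ending period is captured too.
// One possible cleanup: strip trailing dots and question marks.
const cleaned = urls.map((url) => url.replace(/[.?]+$/, ""));

console.log(urls);
console.log(cleaned);
```

Because the character class lists only word characters and . / ? = & -, a URL containing other characters such as #, %, or a port colon is cut off at that character.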
Password Strength
Require at least 8 characters, one uppercase, one lowercase, one digit: ^(?=.*[a-z])(?=.*[A-Z])(?=.*\d).{8,}$
- (?=.*[a-z]) positive lookahead for lowercase
- (?=.*[A-Z]) positive lookahead for uppercase
- (?=.*\d) positive lookahead for digit
- .{8,} at least 8 characters
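To see each lookahead do its job, the sketch below tests one passing password and one failure per rule. The sample passwords are invented for the example.

```javascript
const strongPassword = /^(?=.*[a-z])(?=.*[A-Z])(?=.*\d).{8,}$/;

// Each lookahead scans from the start without consuming characters,
// so all three conditions are checked before .{8,} counts the length.
const results = {
  "Passw0rdX": strongPassword.test("Passw0rdX"),     // meets every rule
  "password1": strongPassword.test("password1"),     // no uppercase letter
  "ALLUPPER123": strongPassword.test("ALLUPPER123"), // no lowercase letter
  "NoDigitsHere": strongPassword.test("NoDigitsHere"), // no digit
  "Sh0rt": strongPassword.test("Sh0rt"),             // fewer than 8 characters
};

console.log(results);
```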
Removing Extra Whitespace
Collapse each run of whitespace into a single space: replace \s+ with " " (a single space). Note that \s also matches tabs and newlines, not only spaces.
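In JavaScript the replacement needs the g flag, otherwise only the first run is replaced. A minimal sketch with a made-up input string:

```javascript
const messy = "  too   many\t\tspaces \n here  ";

// \s+ covers spaces, tabs, and newlines; trim() removes the leftover edges.
const collapsed = messy.replace(/\s+/g, " ").trim();

// If tabs and newlines should be preserved, target runs of spaces only.
const spacesOnly = "a  b\tc".replace(/ {2,}/g, " ");

console.log(JSON.stringify(collapsed), JSON.stringify(spacesOnly));
```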
Worked Example: Parsing Log Lines
Suppose you have log lines like:
2025-03-21 14:23:45 ERROR User login failed: invalid password (user: jdoe)
2025-03-21 14:24:01 INFO User jdoe logged in successfully
You want to extract the timestamp, level, and message. Use the pattern:
^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (\w+) (.*)$
- ^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) captures the timestamp
- (\w+) captures the log level (ERROR, INFO, etc.)
- (.*)$ captures the rest of the message
In JavaScript:
const log = "2025-03-21 14:23:45 ERROR User login failed: invalid password (user: jdoe)";
const regex = /^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (\w+) (.*)$/;
const match = log.match(regex);
if (match) {
  console.log("Timestamp:", match[1]);
  console.log("Level:", match[2]);
  console.log("Message:", match[3]);
}
Output:
Timestamp: 2025-03-21 14:23:45
Level: ERROR
Message: User login failed: invalid password (user: jdoe)
Try this pattern in our regex tester to see the matches highlighted.
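If you parse many lines, named capture groups can make the result easier to read than numeric indexes. The variant below is a sketch: the group names (timestamp, level, message) and the third, non-matching line are assumptions added for illustration.

```javascript
const lines = [
  "2025-03-21 14:23:45 ERROR User login failed: invalid password (user: jdoe)",
  "2025-03-21 14:24:01 INFO User jdoe logged in successfully",
  "not a log line",
];

// Same pattern as above, with (?<name>...) instead of plain groups.
const linePattern =
  /^(?<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?<level>\w+) (?<message>.*)$/;

const parsed = lines
  .map((line) => line.match(linePattern))
  .filter((m) => m !== null)        // skip lines that do not fit the format
  .map((m) => ({ ...m.groups }));   // copy the named groups into a plain object

console.log(parsed);
```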
Testing Strategies
- Start simple: Build your pattern step by step, testing each addition.
- Use anchors: Always use ^ and $ when validating entire strings to avoid partial matches.
- Test edge cases: Empty string, very long string, strings with special characters.
- Use a regex tester: Visual feedback helps catch errors. Our regex tester provides real-time matching and explanation.
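- Consider performance: Avoid catastrophic backtracking by using possessive quantifiers or atomic groups where your engine supports them (JavaScript's built-in RegExp does not). Otherwise, rewrite nested quantifiers so that they cannot match the same text in more than one way.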
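One way to put these strategies into practice is a small table of inputs and expected results that you rerun after every change to the pattern. The sketch below uses a three-digit validator as a stand-in; the pattern and the cases are assumptions chosen for illustration.

```javascript
const threeDigitCode = /^\d{3}$/;

// Each entry is [input, expected result], including edge cases.
const cases = [
  ["123", true],
  ["", false],                   // empty string
  ["12", false],                 // too short
  ["1234", false],               // too long
  ["12a", false],                // special character
  ["123\n", false],              // trailing newline
  ["9".repeat(10000), false],    // very long string
];

const failures = cases.filter(
  ([input, expected]) => threeDigitCode.test(input) !== expected
);

// Without anchors the pattern accepts a partial match inside "1234".
const unanchoredAccepts = /\d{3}/.test("1234");

console.log("failing cases:", failures.length, "unanchored:", unanchoredAccepts);
```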
Common Pitfalls
- Greedy vs. lazy: .* is greedy (matches as much as possible). Use .*? for lazy matching.
- Escaping in strings: In many languages, backslashes need to be escaped inside string literals (e.g., write \\d rather than \d in a JavaScript string passed to new RegExp).
- Unicode support: \w and \d may not match non-ASCII characters. Use Unicode property escapes like \p{L} for letters (in JavaScript these require the u or v flag).
- Lookahead/lookbehind: Not all regex engines support lookbehind; check compatibility.
- Overly complex patterns: Sometimes a simple string method is clearer and faster.
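Two of these pitfalls are easy to reproduce in JavaScript. The sketch below shows the string-escaping problem and the ASCII-only behavior of \w; the word "café" is written with an explicit escape so that the accented letter is a single code point.

```javascript
// In a string literal, "\d" collapses to "d" before the regex engine sees it.
const fromLiteral = /\d+/;
const fromString = new RegExp("\\d+");
const broken = new RegExp("\d+");

const literalOk = fromLiteral.test("123");
const stringOk = fromString.test("123");
const brokenOnDigits = broken.test("123"); // the pattern is really "d+"

// \w covers only ASCII letters, digits, and underscore.
const word = "caf\u00e9";
const asciiOnly = /^\w+$/.test(word);
const unicodeAware = /^\p{L}+$/u.test(word); // \p{L} needs the u flag

console.log(broken.source, literalOk, stringOk, brokenOnDigits, asciiOnly, unicodeAware);
```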
FAQ
What is the difference between greedy and lazy quantifiers?
Greedy quantifiers (*, +, {n,m}) match as much as possible. Lazy quantifiers (*?, +?, {n,m}?) match as little as possible. For example, given the string "<div>text</div>", /<.*>/ matches the entire string, while /<.*?>/ matches "<div>" only.
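The example from this answer, as a runnable sketch. The negated character class at the end is an additional alternative, not something the answer above requires.

```javascript
const html = "<div>text</div>";

// Greedy: .* runs to the end, then backs up to the last ">".
const greedy = html.match(/<.*>/)[0];

// Lazy: .*? stops at the first ">" it can.
const lazy = html.match(/<.*?>/)[0];

// With the g flag, the lazy version finds each tag separately.
const allTags = html.match(/<.*?>/g);

// A negated class gives the same result here without relying on laziness.
const viaNegatedClass = html.match(/<[^>]*>/g);

console.log(greedy, lazy, allTags, viaNegatedClass);
```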
How do I match a literal dot?
Escape it with a backslash: \. matches a period. An unescaped dot matches any character.
What is a capturing group?
Parentheses (pattern) create a capturing group that stores the matched substring for later use. For example, /(\d+)-(\d+)/ captures two numbers separated by a hyphen.
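"Later use" can mean reading the groups from the match, referring to them in a replacement string, or referring to them inside the pattern itself. A brief sketch with made-up inputs:

```javascript
const rangePattern = /(\d+)-(\d+)/;

// Read the groups from the match result (index 0 is the whole match).
const [, start, end] = "pages 10-20".match(rangePattern);

// Refer to the groups in a replacement string with $1 and $2.
const swapped = "pages 10-20".replace(rangePattern, "$2-$1");

// Refer to a group inside the pattern: \1 repeats whatever group 1 matched.
const hasDoubledLetter = /(\w)\1/.test("hello");

console.log(start, end, swapped, hasDoubledLetter);
```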
How can I test regex performance?
Use our regex tester with large input strings to check for slowdowns. Avoid patterns that cause catastrophic backtracking, such as nested quantifiers on overlapping patterns.
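You can also time a pattern locally. The sketch below compares a nested quantifier with a flat equivalent on an input that almost matches; the helper name timeMatch and the input length of 20 are assumptions, and the timings depend on your engine and machine.

```javascript
// Hypothetical helper: runs one test and reports the elapsed milliseconds.
function timeMatch(pattern, input) {
  const startedAt = Date.now();
  const matched = pattern.test(input);
  return { matched, ms: Date.now() - startedAt };
}

// The trailing "b" forces a failure, so the engine tries every way
// of splitting the a's between the inner and outer quantifier.
const input = "a".repeat(20) + "b";

const nested = timeMatch(/^(a+)+$/, input); // nested quantifiers on overlapping text
const flat = timeMatch(/^a+$/, input);      // same language, no nesting

console.log("nested:", nested, "flat:", flat);
```

In a backtracking engine the nested version typically slows down sharply as the run of a's grows, so increase the length gradually rather than jumping to a large value.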
Why does my regex not match newlines?
By default, the dot . does not match newline characters. Use the s flag (DOTALL) in most engines to make . match newlines.
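In JavaScript the flag is written after the closing slash. The sketch below also shows the [\s\S] workaround for environments without the s flag; the sample text is made up.

```javascript
const text = "first\nsecond";

// Default: "." stops at the newline, so there is no match.
const withoutFlag = /first.second/.test(text);

// With the s flag, "." also matches newline characters.
const withFlag = /first.second/s.test(text);

// Workaround: [\s\S] matches any character, newline included.
const withClass = /first[\s\S]second/.test(text);

console.log(withoutFlag, withFlag, withClass);
```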