What Is a Regular Expression?
A regular expression (regex or regexp) is a sequence of characters that defines a search pattern. Regular expressions are used for searching, matching, validating, and transforming text. They are supported in virtually every programming language and many text editors.
Despite their cryptic appearance, regular expressions follow consistent rules. Once you learn the core syntax, you can apply it, with minor dialect differences, across Python, JavaScript, Java, Perl, Ruby, and dozens of other languages.
Basic Regex Syntax
Literal Characters
Most characters match themselves: cat matches the string "cat".
Metacharacters
Metacharacters have special meanings. To match one literally, escape it with a backslash (for example, \. matches a literal dot):
. ^ $ * + ? { } [ ] \ | ( )
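To make the escaping rule concrete, here is a minimal sketch using Python's built-in re module. The sample strings are made up for illustration; re.escape is the standard-library helper for escaping a literal string.

```python
import re

text = "3.14 and 3x14"

# Unescaped, the dot is a metacharacter and matches any character.
unescaped = re.findall(r"3.14", text)

# Escaped with a backslash, the dot matches only a literal ".".
escaped = re.findall(r"3\.14", text)

# re.escape escapes the metacharacters in an arbitrary literal string,
# which is handy when the text to search for comes from user input.
literal_pattern = re.escape("1+1=2?")
found = re.search(literal_pattern, "Is 1+1=2? Yes.") is not None

print(unescaped)  # ['3.14', '3x14']
print(escaped)    # ['3.14']
print(found)      # True
```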
The Dot: Any Character
. matches any single character except newline:
c.t matches: cat, cot, cut, c4t, c.t
doesn't match: cart (two chars between c and t)
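The dot's behavior can be checked with a short sketch, assuming Python's re module. re.fullmatch is used so that the whole string has to fit the pattern; the word list is illustrative.

```python
import re

words = ["cat", "cot", "c4t", "c.t", "cart", "ct"]

# fullmatch requires the entire string to match, so "cart" (two characters
# between c and t) and "ct" (none) are rejected.
matches = [w for w in words if re.fullmatch(r"c.t", w)]
print(matches)  # ['cat', 'cot', 'c4t', 'c.t']

# By default the dot does not match a newline character.
newline_match = re.fullmatch(r"c.t", "c\nt")
print(newline_match)  # None
```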
Character Classes
[abc] matches any single character a, b, or c:
[aeiou] matches any lowercase vowel
[a-z] matches any lowercase ASCII letter
[A-Za-z0-9] matches any ASCII alphanumeric character
[^aeiou] matches any single character that is not a lowercase vowel, including consonants, digits, and spaces (^ negates inside [])
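The following sketch, assuming Python's re module and a made-up sample string, shows what each of these classes picks out. Note that the negated class also matches the uppercase letter, the space, and the digits.

```python
import re

text = "Regex 101"

vowels = re.findall(r"[aeiou]", text)
lowercase = re.findall(r"[a-z]", text)
alphanumeric = re.findall(r"[A-Za-z0-9]", text)
# A negated class matches anything NOT listed, not just consonants.
non_vowels = re.findall(r"[^aeiou]", text)

print(vowels)        # ['e', 'e']
print(lowercase)     # ['e', 'g', 'e', 'x']
print(alphanumeric)  # ['R', 'e', 'g', 'e', 'x', '1', '0', '1']
print(non_vowels)    # ['R', 'g', 'x', ' ', '1', '0', '1']
```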
Quantifiers
Quantifiers control how many times the preceding element may repeat:
| Quantifier | Meaning |
|---|---|
| * | Zero or more times |
| + | One or more times |
| ? | Zero or one time (optional) |
| {3} | Exactly 3 times |
| {2,5} | Between 2 and 5 times |
| {3,} | 3 or more times |
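Each quantifier can be tried against the same inputs. This is a minimal sketch assuming Python's re module; the sample strings and the matching helper are hypothetical names introduced only for this illustration.

```python
import re

# "a", then some number of "b" characters, then "c".
samples = ["ac", "abc", "abbc", "abbbc", "abbbbbbc"]

def matching(pattern):
    """Return the samples that the pattern matches in full."""
    return [s for s in samples if re.fullmatch(pattern, s)]

zero_or_more = matching(r"ab*c")     # every sample
one_or_more = matching(r"ab+c")      # everything except "ac"
optional = matching(r"ab?c")         # ['ac', 'abc']
exactly_3 = matching(r"ab{3}c")      # ['abbbc']
from_2_to_5 = matching(r"ab{2,5}c")  # ['abbc', 'abbbc']
at_least_3 = matching(r"ab{3,}c")    # ['abbbc', 'abbbbbbc']

print(optional, exactly_3, from_2_to_5, at_least_3)
```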
Shorthand Character Classes
| Shorthand | Equivalent | Meaning |
|---|---|---|
| \d | [0-9] | Any digit |
| \D | [^0-9] | Any non-digit |
| \w | [a-zA-Z0-9_] | Any word character |
| \W | [^a-zA-Z0-9_] | Any non-word character |
| \s | [ \t\n\r\f\v] | Any whitespace |
| \S | [^ \t\n\r\f\v] | Any non-whitespace |
Anchors
| Anchor | Meaning |
|---|---|
| ^ | Start of string (or line with multiline flag) |
| $ | End of string (or line with multiline flag) |
| \b | Word boundary |
| \B | Non-word boundary |
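Anchors match positions rather than characters, which is easiest to see by printing where each match starts. This is a minimal sketch assuming Python's re module; the sample text and the starts helper are hypothetical.

```python
import re

# Index 0: "cat", index 7: the "cat" inside "concat", index 11: "cat" on line 2.
text = "cat concat\ncat"

def starts(pattern, flags=0):
    """Return the start index of every match of pattern in text."""
    return [m.start() for m in re.finditer(pattern, text, flags)]

string_start = starts(r"^cat")               # only the very start of the string
line_starts = starts(r"^cat", re.MULTILINE)  # the start of each line
whole_words = starts(r"\bcat\b")             # "cat" as a standalone word
inside_word = starts(r"\Bcat")               # "cat" not at a word boundary
string_end = starts(r"cat$")                 # "cat" at the end of the string

print(string_start)  # [0]
print(line_starts)   # [0, 11]
print(whole_words)   # [0, 11]
print(inside_word)   # [7]
print(string_end)    # [11]
```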
Groups and Alternation
(abc) captures the matched text in a group:
(\d{4})-(\d{2})-(\d{2}) matches and captures a date like 2025-06-15
| provides alternation (OR):
cat|dog matches "cat" or "dog"
(cat|dog)s matches "cats" or "dogs"
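Captured groups can be read back individually after a match. The sketch below assumes Python's re module; the input strings are illustrative.

```python
import re

match = re.search(r"(\d{4})-(\d{2})-(\d{2})", "Released on 2025-06-15.")
whole = match.group(0)             # the full matched text
year, month, day = match.groups()  # one string per capturing group

print(whole)             # 2025-06-15
print(year, month, day)  # 2025 06 15

# Alternation without a group: either whole word.
animals = re.findall(r"cat|dog", "cat dog bird")

# Alternation inside a group, followed by a shared "s" suffix.
plurals = [m.group(0) for m in re.finditer(r"(cat|dog)s", "cats dogs cat")]

print(animals)  # ['cat', 'dog']
print(plurals)  # ['cats', 'dogs']
```

The group in (cat|dog)s limits the alternation to the animal name, so the trailing s applies to both options.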
Practical Regex Examples
Email validation (simplified):
[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}
URL detection:
https?://[^\s<>"{}|\\^`\[\]]+
IP address (basic):
\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b
Phone number (US):
\+?1?\s?(\d{3}[\s.-]?)?\d{3}[\s.-]?\d{4}
Date (YYYY-MM-DD):
\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])
Hex color code:
#([A-Fa-f0-9]{6}|[A-Fa-f0-9]{3})
Password (8+ chars, uppercase, lowercase, digit):
^(?=.*[a-z])(?=.*[A-Z])(?=.*\d).{8,}$
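Several of the patterns above check shape rather than meaning, so it can help to pair them with ordinary code. The following is a sketch assuming Python's re module; the constant names and the is_valid_ipv4 helper are hypothetical and not part of any library.

```python
import re

IP_PATTERN = re.compile(r"\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b")
DATE_PATTERN = re.compile(r"\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])")
HEX_PATTERN = re.compile(r"#([A-Fa-f0-9]{6}|[A-Fa-f0-9]{3})")
PASSWORD_PATTERN = re.compile(r"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d).{8,}$")

def is_valid_ipv4(text):
    """Hypothetical helper: the regex checks the shape, code checks the range."""
    if not IP_PATTERN.fullmatch(text):
        return False
    return all(0 <= int(part) <= 255 for part in text.split("."))

# The basic IP pattern alone accepts out-of-range octets.
shape_only = IP_PATTERN.fullmatch("999.999.999.999") is not None
print(shape_only)                        # True
print(is_valid_ipv4("999.999.999.999"))  # False
print(is_valid_ipv4("192.168.0.1"))      # True

# The date pattern rejects month 13 but cannot know that February has no 31st.
bad_month = DATE_PATTERN.fullmatch("2025-13-01") is not None
impossible_day = DATE_PATTERN.fullmatch("2025-02-31") is not None
print(bad_month, impossible_day)  # False True

# Hex colors must have exactly 3 or 6 hex digits when matched in full.
short_hex = HEX_PATTERN.fullmatch("#fff") is not None
four_hex = HEX_PATTERN.fullmatch("#ffff") is not None
print(short_hex, four_hex)  # True False

# Each lookahead in the password pattern enforces one requirement.
strong = PASSWORD_PATTERN.search("Passw0rd") is not None
weak = PASSWORD_PATTERN.search("password") is not None
print(strong, weak)  # True False
```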
Lookahead and Lookbehind
Lookahead matches a position where a pattern follows (without including it):
\d+(?= dollars) matches "100" in "100 dollars" but not "100 euros"
Negative lookahead matches where a pattern does NOT follow:
\b\w+\b(?! are) matches words not followed by " are"
Lookbehind matches a position preceded by a pattern:
(?<=\$)\d+ matches "100" in "$100"
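The three lookaround examples can be run as written. This is a minimal sketch assuming Python's re module, with made-up sample sentences; in each case the lookaround text itself is left out of the match.

```python
import re

# Positive lookahead: digits only when " dollars" follows.
dollars = re.findall(r"\d+(?= dollars)", "100 dollars, 200 euros")

# Negative lookahead: whole words that are not followed by " are".
not_before_are = re.findall(r"\b\w+\b(?! are)", "cats are fun")

# Lookbehind: digits only when a literal "$" comes right before them.
prices = re.findall(r"(?<=\$)\d+", "Price: $100, qty 5")

print(dollars)         # ['100']
print(not_before_are)  # ['are', 'fun']
print(prices)          # ['100']
```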
Regex Flags
| Flag | JavaScript | Python | Meaning |
|---|---|---|---|
| Global | g | (not needed) | Find all matches, not just first |
| Case insensitive | i | re.IGNORECASE | Match regardless of case |
| Multiline | m | re.MULTILINE | ^ and $ match line boundaries |
| Dotall | s | re.DOTALL | . matches newlines too |
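On the Python side, flags are passed as an extra argument and combined with the | operator. The sketch below assumes Python's re module and an illustrative two-line string. Python has no global flag because the choice between first match and all matches is made by the function called (re.search versus re.findall or re.finditer).

```python
import re

text = "Hello\nhello world"

# re.search returns only the first match; re.findall returns all of them.
first_start = re.search(r"hello", text).start()
any_case = re.findall(r"hello", text, re.IGNORECASE)

# Flags combine with |: ignore case and let ^ match at each line start.
line_starts = re.findall(r"^hello", text, re.IGNORECASE | re.MULTILINE)

# Without DOTALL the dot cannot cross the newline between the two lines.
without_dotall = re.search(r"Hello.hello", text)
with_dotall = re.search(r"Hello.hello", text, re.DOTALL)

print(first_start)              # 6
print(any_case)                 # ['Hello', 'hello']
print(line_starts)              # ['Hello', 'hello']
print(without_dotall)           # None
print(with_dotall is not None)  # True
```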
Using This Tool
Enter your regular expression and a test string to see which parts match, highlighted in real time. The tester supports JavaScript regex syntax with the g, i, m, and s flags, which makes it useful for building and debugging patterns before using them in code.