Regex Tester: From Basics to Advanced Patterns

A practical guide to regular expressions: fundamentals, common patterns, testing strategies, and pitfalls. Use our regex tester tool to experiment.

Introduction

Regular expressions (regex) are patterns used to match character combinations in strings. They are a powerful tool for text processing, validation, search and replace, and data extraction. This guide covers regex fundamentals, common patterns, testing strategies, and pitfalls — all with practical examples you can try in our regex tester.

Regex Fundamentals

Literals and Metacharacters

A regex pattern consists of literals (characters that match themselves) and metacharacters (special symbols with a specific meaning).

| Pattern | Matches | Example |
|---|---|---|
| hello | The literal string "hello" | hello matches in "say hello" |
| . | Any single character (except newline) | h.t matches "hat", "hot", "hut" |
| \d | Any digit (0-9) | \d{3} matches three consecutive digits |
| \w | Any word character (letter, digit, underscore) | \w+ matches a word |
| \s | Any whitespace character (space, tab, newline) | \s matches a space |
| ^ | Start of string | ^Hello matches "Hello" only at the start |
| $ | End of string | world$ matches "world" only at the end |
| \b | Word boundary | \bcat\b matches "cat" as a whole word, not inside "concatenate" |
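
To try a few rows of the table outside the tester, here is a small JavaScript sketch. The sample strings are made up for illustration; RegExp.prototype.test() simply reports whether the pattern matches anywhere in the string.

```javascript
// Each pair is a pattern from the table and a sample string to test it against.
const samples = [
  [/h.t/, "hot"],             // . matches any single character except a line terminator
  [/\d{3}/, "room 404"],      // \d{3} finds three consecutive digits anywhere in the string
  [/^Hello/, "Hello, world"], // ^ anchors the match to the start of the string
  [/world$/, "Hello, world"], // $ anchors the match to the end of the string
  [/\bcat\b/, "concatenate"], // \b needs word boundaries, so "cat" inside a word fails
];

for (const [pattern, text] of samples) {
  console.log(String(pattern), JSON.stringify(text), pattern.test(text));
}
```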

Character Classes

Use square brackets [...] to define a set of characters to match.

  • [abc] matches 'a', 'b', or 'c'
  • [a-z] matches any lowercase letter
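  • [0-9] matches any digit (the same as \d in JavaScript; in some engines, such as Python's re with str patterns, \d also matches non-ASCII digits)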
  • [^abc] matches any character except 'a', 'b', or 'c'
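
A quick JavaScript check of these classes against one made-up string. With the g flag, match() returns an array of every match, or null when nothing matches:

```javascript
const text = "cab9z!";

// Each call collects every character that belongs to the class.
const inSet = text.match(/[abc]/g);     // ["c", "a", "b"]
const lower = text.match(/[a-z]/g);     // ["c", "a", "b", "z"]
const digits = text.match(/[0-9]/g);    // ["9"]
const notInSet = text.match(/[^abc]/g); // ["9", "z", "!"]

console.log(inSet, lower, digits, notInSet);
```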

Quantifiers

Quantifiers specify how many times a character or group should appear.

| Quantifier | Meaning | Example |
|---|---|---|
| * | 0 or more | ab*c matches "ac", "abc", "abbc" |
| + | 1 or more | ab+c matches "abc", "abbc" but not "ac" |
| ? | 0 or 1 | colou?r matches "color" and "colour" |
| {n} | Exactly n | \d{3} matches exactly 3 digits |
| {n,} | n or more | \d{2,} matches 2 or more digits |
| {n,m} | Between n and m | \d{2,4} matches 2 to 4 digits |
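
The table's examples can be checked in JavaScript. This sketch wraps each pattern in ^ and $ (an addition to the table's patterns) so that each test covers the whole string rather than any part of it; the inputs are illustrative:

```javascript
const words = ["ac", "abc", "abbc"];

// ^ and $ make each test cover the whole string, not just part of it.
const star = words.filter((s) => /^ab*c$/.test(s)); // ["ac", "abc", "abbc"]
const plus = words.filter((s) => /^ab+c$/.test(s)); // ["abc", "abbc"]
const optional = ["color", "colour", "colouur"].filter((s) => /^colou?r$/.test(s)); // ["color", "colour"]
const bounded = ["1", "12", "1234", "12345"].filter((s) => /^\d{2,4}$/.test(s));   // ["12", "1234"]

console.log(star, plus, optional, bounded);
```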

Groups and Alternation

  • Groups: (pattern) captures the matched substring. Use (?:pattern) for non-capturing groups.
  • Alternation: | acts like OR. cat|dog matches "cat" or "dog".
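
A JavaScript sketch of both ideas, with made-up inputs. It also shows a point that often trips people up: alternation has the lowest precedence of any operator, so without a group the anchors bind to only one alternative.

```javascript
// Capturing groups: match() returns the full match followed by each group.
const date = "2025-03-21".match(/(\d{4})-(\d{2})-(\d{2})/);
console.log(date.slice(1)); // ["2025", "03", "21"]

// A non-capturing group scopes the alternation but adds no entry to the result.
const plural = "dogs".match(/^(?:cat|dog)s$/);
console.log(plural.length); // 1 (the full match only)

// Without a group, /^cat|dog$/ means "^cat" OR "dog$".
console.log(/^cat|dog$/.test("catalog"));     // true
console.log(/^(?:cat|dog)$/.test("catalog")); // false
```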

Escaping

To match a literal metacharacter, escape it with a backslash: \. matches a dot, \* matches an asterisk.
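
When the text to match comes from user input, every metacharacter in it needs escaping before it is passed to new RegExp(). A common approach in JavaScript is a small helper like the one below. escapeRegExp is a hypothetical name, not a built-in, and its character list assumes the escaped text is used outside a character class:

```javascript
// Hypothetical helper: prefixes every JavaScript regex metacharacter with a backslash.
// Intended for text used outside character classes (it does not escape "-").
function escapeRegExp(str) {
  // $& in the replacement string inserts the matched character itself.
  return str.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const price = "$3.50 (approx.)";
const pattern = new RegExp(escapeRegExp(price));

console.log(pattern.source);                         // \$3\.50 \(approx\.\)
console.log(pattern.test("Total: $3.50 (approx.)")); // true
console.log(/3.50/.test("3550"));                    // true: the unescaped dot matches "5"
```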

Common Patterns and Use Cases

Email Validation

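A simple email regex: ^[\w.-]+@[\w.-]+\.\w{2,}$. It is a rough filter, not a complete validator: it rejects some valid addresses and accepts some malformed ones.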

  • ^ start of string
  • [\w.-]+ one or more word characters, dots, or hyphens (local part)
  • @ literal @
  • [\w.-]+ domain name
  • \. literal dot
  • \w{2,} top-level domain (at least 2 word characters; \w also allows digits and underscores)
  • $ end of string
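
Here is the pattern tried against a few sample addresses in JavaScript (the addresses are illustrative). The comments note where its simplicity shows:

```javascript
// The simple pattern from above: a rough filter, not a full RFC 5322 validator.
const EMAIL = /^[\w.-]+@[\w.-]+\.\w{2,}$/;

const addresses = [
  "jdoe@example.com",            // true
  "first.last@mail.example.org", // true
  "jdoe@localhost",              // false: no dot before the top-level domain
  "user+tag@example.com",        // false: + is not in [\w.-], although the address is valid
  "a@b..cc",                     // true: malformed, but the pattern still accepts it
];

for (const address of addresses) {
  console.log(address, EMAIL.test(address));
}
```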

URL Extraction

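A simple pattern to extract URLs from text: https?://[\w./?=&-]+. It stops at the first character outside the class, so URLs containing characters such as #, %, or the colon before a port number are cut short.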
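
In JavaScript, add the g flag so match() returns every URL rather than just the first. This sketch uses a made-up sentence and also shows one more limitation of the simple class: it happily includes a sentence-ending period.

```javascript
// The g flag makes match() return every URL instead of only the first.
// Slashes are escaped because / ends a regex literal in JavaScript.
const URL_PATTERN = /https?:\/\/[\w.\/?=&-]+/g;

const text = "Docs: https://example.com/guide?page=2&lang=en, mirror at http://mirror.example.org/path.";
const urls = text.match(URL_PATTERN);

console.log(urls);
// ["https://example.com/guide?page=2&lang=en", "http://mirror.example.org/path."]
// The second URL keeps the final period because "." is in the character class.
```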

Password Strength

To require at least 8 characters, including at least one uppercase letter, one lowercase letter, and one digit: ^(?=.*[a-z])(?=.*[A-Z])(?=.*\d).{8,}$

  • (?=.*[a-z]) positive lookahead for lowercase
  • (?=.*[A-Z]) positive lookahead for uppercase
  • (?=.*\d) positive lookahead for digit
  • .{8,} at least 8 characters
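
The combined pattern is convenient, but it only says pass or fail. Here is a JavaScript sketch of an alternative design that checks the same rules one at a time, so a sign-up form could report which rule failed. passwordProblems is a hypothetical helper, and the sample passwords are made up:

```javascript
const STRONG = /^(?=.*[a-z])(?=.*[A-Z])(?=.*\d).{8,}$/;

// Hypothetical helper: the same rules as STRONG, checked separately.
function passwordProblems(pw) {
  const problems = [];
  if (!/[a-z]/.test(pw)) problems.push("needs a lowercase letter");
  if (!/[A-Z]/.test(pw)) problems.push("needs an uppercase letter");
  if (!/\d/.test(pw)) problems.push("needs a digit");
  // Uses . like STRONG does, so line breaks do not count toward the length.
  if (!/^.{8,}$/.test(pw)) problems.push("needs at least 8 characters");
  return problems;
}

for (const pw of ["Passw0rd", "password1", "Pass1"]) {
  console.log(pw, STRONG.test(pw), passwordProblems(pw));
}
```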

Removing Extra Whitespace

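Replace each run of whitespace with a single space: search for \s+ and replace it with " " (one space), using the global flag so every run is replaced. Note that \s also matches tabs and line breaks, so those are collapsed too.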
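
In JavaScript this is a one-line replace(). The sketch below (sample string made up) also trims the ends and shows a variant that collapses only repeated spaces:

```javascript
const messy = "  too   many\tspaces \n here ";

// The g flag replaces every run; without it, only the first run would change.
const collapsed = messy.replace(/\s+/g, " "); // " too many spaces here "
const tidy = collapsed.trim();                // "too many spaces here"

// To collapse only repeated spaces and keep tabs and line breaks, match literal spaces.
const spacesOnly = "a  b\n\nc".replace(/ {2,}/g, " "); // "a b\n\nc"

console.log(JSON.stringify(collapsed), JSON.stringify(tidy), JSON.stringify(spacesOnly));
```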

Worked Example: Parsing Log Lines

Suppose you have log lines like:

2025-03-21 14:23:45 ERROR User login failed: invalid password (user: jdoe)
2025-03-21 14:24:01 INFO User jdoe logged in successfully

You want to extract the timestamp, level, and message. Use the pattern:

^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (\w+) (.*)$

  • ^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) captures the timestamp
  • (\w+) captures the log level (ERROR, INFO, etc.)
  • (.*)$ captures the rest of the message

In JavaScript:

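```javascript
const log = "2025-03-21 14:23:45 ERROR User login failed: invalid password (user: jdoe)";
// Group 1: timestamp, group 2: log level, group 3: the rest of the message
const regex = /^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (\w+) (.*)$/;
// Without the g flag, match() returns the full match followed by each group, or null
const match = log.match(regex);
if (match) {
    console.log("Timestamp:", match[1]);
    console.log("Level:", match[2]);
    console.log("Message:", match[3]);
}
```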

Output:

Timestamp: 2025-03-21 14:23:45
Level: ERROR
Message: User login failed: invalid password (user: jdoe)

Try this pattern in our regex tester to see the matches highlighted.
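
To parse a whole log rather than one line, here is a sketch: the m flag makes ^ and $ match at every line boundary, and matchAll() (which requires the g flag) yields one match per line. The log text is the two sample lines from above; the shape of the entries objects is just an illustration.

```javascript
const logText = `2025-03-21 14:23:45 ERROR User login failed: invalid password (user: jdoe)
2025-03-21 14:24:01 INFO User jdoe logged in successfully`;

// m: ^ and $ match at each line start and end; g: required by matchAll().
const LINE = /^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (\w+) (.*)$/gm;

const entries = [...logText.matchAll(LINE)].map((m) => ({
  timestamp: m[1],
  level: m[2],
  message: m[3],
}));

// For example, keep only the errors.
console.log(entries.filter((e) => e.level === "ERROR"));
```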

Testing Strategies

  1. Start simple: Build your pattern step by step, testing each addition.
  2. Use anchors: Always use ^ and $ when validating entire strings to avoid partial matches.
  3. Test edge cases: Empty string, very long string, strings with special characters.
  4. Use a regex tester: Visual feedback helps catch errors. Our regex tester provides real-time matching and explanation.
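  5. Consider performance: Avoid catastrophic backtracking by not nesting quantifiers over overlapping patterns. In engines that support them (such as PCRE, Java, and Python 3.11+), possessive quantifiers and atomic groups also help; JavaScript supports neither, so there the pattern itself must be rewritten.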
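
Points 2 and 3 are easy to automate. Below is a minimal table-driven check in JavaScript (checkPattern is a hypothetical helper, not part of any library) that runs an anchored and an unanchored version of a three-digit validator over the same edge cases:

```javascript
// Hypothetical helper: runs a pattern over [input, expected] pairs and returns the mismatches.
// Pass patterns without the g flag: test() on a g regex advances lastIndex between calls.
function checkPattern(pattern, cases) {
  const failures = [];
  cases.forEach(([input, expected], index) => {
    const actual = pattern.test(input);
    if (actual !== expected) {
      failures.push({ index, input: input.slice(0, 20), expected, actual });
    }
  });
  return failures;
}

// Edge cases for an "exactly three digits" validator.
const cases = [
  ["123", true],
  ["", false],                // empty string
  ["1234", false],            // too long
  ["abc123", false],          // valid digits inside other text
  ["12a", false],
  ["1".repeat(10000), false], // very long input
];

console.log(checkPattern(/^\d{3}$/, cases));                    // anchored: no failures
console.log(checkPattern(/\d{3}/, cases).map((f) => f.index)); // unanchored: cases 2, 3 and 5 fail
```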

Common Pitfalls

  • Greedy vs. lazy: .* is greedy (matches as much as possible). Use .*? for lazy matching.
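  • Escaping in strings: When a pattern is written inside a string literal, backslashes must be doubled. In JavaScript, new RegExp("\\d+") produces the pattern \d+, while new RegExp("\d+") silently produces d+.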
  • Unicode support: \w and \d may not match non-ASCII characters (in JavaScript, for example, \w does not match "é"). Use Unicode property escapes such as \p{L} for letters; in JavaScript these require the u flag.
  • Lookahead/lookbehind: Not all regex engines support lookbehind, and some (such as Python's re module) allow only fixed-width lookbehind; check your engine's documentation.
  • Overly complex patterns: Sometimes a simple string method is clearer and faster.
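
Two of these pitfalls in JavaScript, sketched with made-up inputs: a pattern built from a string with a single backslash, and \w versus \p{L} on an accented word.

```javascript
// In a string literal, "\d" is just "d", so new RegExp() never sees the backslash.
const wrong = new RegExp("^\d+$");
const right = new RegExp("^\\d+$"); // the doubled backslash reaches the regex as \d

console.log(wrong.source, wrong.test("42")); // ^d+$ false
console.log(right.source, right.test("42")); // ^\d+$ true

// In JavaScript, \w does not match accented letters; \p{L} (any letter) needs the u flag.
const word = "caf\u00e9"; // "café"
console.log(/^\w+$/.test(word));     // false
console.log(/^\p{L}+$/u.test(word)); // true
```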

FAQ

What is the difference between greedy and lazy quantifiers?

Greedy quantifiers (*, +, {n,m}) match as much as possible. Lazy quantifiers (*?, +?, {n,m}?) match as little as possible. For example, given the string "<div>text</div>", /<.*>/ matches the entire string, while /<.*?>/ matches "<div>" only.
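
The same example in JavaScript, plus a negated character class: a common alternative to lazy matching that states exactly what may appear between the brackets.

```javascript
const html = "<div>text</div>";

console.log(html.match(/<.*>/)[0]);  // <div>text</div>  (greedy)
console.log(html.match(/<.*?>/)[0]); // <div>            (lazy)

// With g, the lazy pattern finds each tag separately.
console.log(html.match(/<.*?>/g));   // ["<div>", "</div>"]

// "Anything except >" gives the same result without relying on laziness.
console.log(html.match(/<[^>]*>/g)); // ["<div>", "</div>"]
```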

How do I match a literal dot?

Escape it with a backslash: \. matches a period. An unescaped dot matches any character.

What is a capturing group?

Parentheses (pattern) create a capturing group that stores the matched substring for later use. For example, /(\d+)-(\d+)/ captures two numbers separated by a hyphen.
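
Captured text can be reused in two ways: in a replacement string, or later in the same pattern as a backreference. A short JavaScript sketch of both (the inputs are made up):

```javascript
// $1 and $2 in a replacement string insert the text captured by groups 1 and 2.
const swapped = "10-20".replace(/(\d+)-(\d+)/, "$2-$1"); // "20-10"

// \1 inside the pattern must match the same text as group 1: here, a doubled word.
const doubled = "this is is a typo".match(/\b(\w+) \1\b/);

console.log(swapped);    // 20-10
console.log(doubled[0]); // is is
```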

How can I test regex performance?

Use our regex tester with large input strings to check for slowdowns. Avoid patterns that cause catastrophic backtracking, such as nested quantifiers on overlapping patterns.
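
If you want numbers rather than impressions, you can also time a pattern directly. This JavaScript sketch compares the classic bad case, (a+)+ on a string that almost matches, with an equivalent pattern that has no nesting. timeTest is a hypothetical helper; real timings depend on the engine and machine, so none are shown, but in a backtracking engine the nested column should grow quickly as n increases.

```javascript
// Hypothetical timing helper; performance.now() is global in browsers and Node.js 16+.
function timeTest(pattern, input) {
  const start = performance.now();
  const result = pattern.test(input);
  return { result, ms: performance.now() - start };
}

const nested = /^(a+)+$/; // nested quantifiers over the same character
const flat = /^a+$/;      // accepts exactly the same strings without nesting

for (const n of [12, 15, 18]) {
  // The trailing "!" makes the match fail, so a backtracking engine tries
  // every way of splitting the run of a's between the two quantifiers.
  const input = "a".repeat(n) + "!";
  const slow = timeTest(nested, input);
  const fast = timeTest(flat, input);
  console.log(`n=${n}: nested ${slow.ms.toFixed(2)} ms, flat ${fast.ms.toFixed(2)} ms`);
}
```

Keep n small when experimenting: each extra character can multiply the work for the nested pattern.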

Why does my regex not match newlines?

By default, the dot . does not match newline characters. Use the s flag (called dotAll in JavaScript and DOTALL in Python and Java) to make . match newlines as well.
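
In JavaScript, the s flag is available from ES2018 onward; [\s\S] is an older workaround that matches any character without a flag. A small sketch with a made-up string:

```javascript
const block = "BEGIN\nline one\nline two\nEND";

console.log(/BEGIN.*END/.test(block));      // false: . stops at each \n
console.log(/BEGIN.*END/s.test(block));     // true: the s (dotAll) flag lets . match \n
console.log(/BEGIN[\s\S]*END/.test(block)); // true: a workaround that needs no flag
```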