How to Use Regular Expressions: A Beginner's Guide
Regular expressions — commonly known as regex or regexp — are one of the most powerful tools in a programmer's arsenal. They let you search, match, and manipulate text using patterns instead of literal strings. Whether you're validating email addresses, parsing log files, or doing find-and-replace operations, regex can save you hours of manual work. This guide will take you from zero to confident with regular expressions.
What Are Regular Expressions?
A regular expression is a sequence of characters that defines a search pattern. Think of it as a mini programming language specifically designed for text matching. Regex is supported in virtually every programming language (JavaScript, Python, Java, PHP, Go, Ruby) and many text editors (VS Code, Sublime Text, Vim).
At its simplest, a regex can be a literal string. The pattern hello matches the text "hello" wherever it appears. But the real power comes from special characters — called metacharacters — that let you express complex patterns concisely.
Basic Regex Syntax
Literal Characters
Most characters match themselves. The pattern cat matches "cat" in "concatenate", "catalog", or "the cat sat".
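As a minimal sketch of literal matching, assuming Python's built-in re module (the sample words come from the paragraph above; the variable names are only illustrative):

```python
import re

pattern = re.compile(r"cat")

# re.search scans the whole string and returns a match object, or None.
samples = ["concatenate", "catalog", "the cat sat", "dog"]
hits = [text for text in samples if pattern.search(text)]
print(hits)  # ['concatenate', 'catalog', 'the cat sat']

# The match object reports where the literal was found.
match = pattern.search("concatenate")
print(match.start(), match.end())  # 3 6
```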
The Dot (.)
The dot matches any single character except a newline. So c.t matches "cat", "cot", "cut", "c9t", and even "c t".
Character Classes [ ]
Square brackets define a set of characters, any one of which may match. [aeiou] matches any single vowel. [0-9] matches any digit. [a-zA-Z] matches any ASCII letter. You can negate a class by putting a caret first: [^0-9] matches any single non-digit character.
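The dot and bracket classes can be tried out with a short sketch like the following. It assumes Python's re module, and the test strings are made up for illustration.

```python
import re

# The dot matches any single character except a newline.
dot_words = ["cat", "cot", "c9t", "c t", "ct", "c\nt"]
dot_hits = [w for w in dot_words if re.fullmatch(r"c.t", w)]
print(dot_hits)  # ['cat', 'cot', 'c9t', 'c t']

# A bracket class matches exactly one character from the set.
vowels = re.findall(r"[aeiou]", "regular expression")
print(vowels)  # ['e', 'u', 'a', 'e', 'e', 'i', 'o']

# A leading caret negates the class: keep only the non-digits.
non_digits = re.findall(r"[^0-9]", "a1b22c333")
print(non_digits)  # ['a', 'b', 'c']
```

re.fullmatch is used for the dot example so that the whole string has to fit the pattern, which makes the rejected cases ("ct" and the newline variant) visible.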
Predefined Character Classes
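- \d - any digit (same as [0-9] in ASCII matching; Unicode-aware engines may also match digits from other scripts)
- \D - any non-digit
- \w - any word character (letters, digits, underscore)
- \W - any non-word character
- \s - any whitespace (space, tab, newline)
- \S - any non-whitespace character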
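Here is a small sketch of the shorthand classes in Python's re module. The sample text is invented, and the last two lines show one engine-specific detail: for Python 3 string patterns, \d is Unicode-aware unless the re.ASCII flag is set.

```python
import re

text = "Order 66 shipped_on 2024-01-15\tOK"

digits = re.findall(r"\d+", text)        # runs of digits
words = re.findall(r"\w+", text)         # runs of letters, digits, underscore
pieces = re.split(r"\s+", text)          # split on any run of whitespace
non_word = re.findall(r"\W", "a-b c!")   # single non-word characters

print(digits)    # ['66', '2024', '01', '15']
print(words)     # ['Order', '66', 'shipped_on', '2024', '01', '15', 'OK']
print(pieces)    # ['Order', '66', 'shipped_on', '2024-01-15', 'OK']
print(non_word)  # ['-', ' ', '!']

# "\u0663" is the Arabic-Indic digit three.
unicode_digit = re.fullmatch(r"\d", "\u0663") is not None          # True
ascii_only = re.fullmatch(r"\d", "\u0663", re.ASCII) is not None   # False
```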
Quantifiers: How Many Times?
Quantifiers specify how many times the preceding element should appear:
- * - zero or more times
- + - one or more times
- ? - zero or one time (makes it optional)
- {3} - exactly 3 times
- {2,5} - between 2 and 5 times
- {3,} - 3 or more times
For example, \d{3}-\d{4} matches phone numbers like "555-1234", and colou?r matches both "color" and "colour".
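The quantifiers can be compared side by side in a short sketch, assuming Python's re module. The patterns colou?r and \d{3}-\d{4} are the ones from the text; the other strings are illustrative.

```python
import re

# ? makes the preceding "u" optional.
spellings = re.findall(r"colou?r", "color, colour, colr")
print(spellings)  # ['color', 'colour']

# {3} and {4} require exact repeat counts.
phone = re.search(r"\d{3}-\d{4}", "Call 555-1234 today")
print(phone.group())  # 555-1234

# * allows zero repeats, + requires at least one.
candidates = ["ac", "abc", "abbbc"]
star = [s for s in candidates if re.fullmatch(r"ab*c", s)]
plus = [s for s in candidates if re.fullmatch(r"ab+c", s)]
print(star)  # ['ac', 'abc', 'abbbc']
print(plus)  # ['abc', 'abbbc']

# {2,5} bounds the repeat count on both sides.
bounded = [s for s in ["a", "aa", "aaaaa", "aaaaaa"] if re.fullmatch(r"a{2,5}", s)]
print(bounded)  # ['aa', 'aaaaa']
```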
Anchors: Where in the String?
Anchors don't match characters — they match positions:
- ^ - start of the string (or line in multiline mode)
- $ - end of the string (or line in multiline mode)
- \b - word boundary
The pattern ^\d{5}$ matches a string that is exactly a 5-digit number. \bcat\b matches "cat" as a whole word but not "catalog".
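The two patterns above can be checked with a sketch like this one, assuming Python's re module. It also shows a Python-specific detail: $ is allowed to match just before a single trailing newline, so re.fullmatch is the stricter choice when the whole string must fit.

```python
import re

zip_re = re.compile(r"^\d{5}$")

checks = {s: bool(zip_re.search(s)) for s in ["12345", "1234", "123456", "a12345"]}
print(checks)  # {'12345': True, '1234': False, '123456': False, 'a12345': False}

# In Python, $ also matches before a trailing newline; fullmatch does not.
dollar_accepts_newline = zip_re.search("12345\n") is not None      # True
fullmatch_accepts_newline = re.fullmatch(r"\d{5}", "12345\n") is not None  # False

# \b matches the position between a word character and a non-word character.
sentence = "cat catalog concat the cat."
whole_words = [m.start() for m in re.finditer(r"\bcat\b", sentence)]
print(whole_words)  # [0, 23]
```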
Groups and Alternation
Parentheses ( )
Parentheses create capture groups that let you extract parts of a match or apply quantifiers to a group. For example, (ha)+ matches "ha", "haha", "hahaha", etc.
The Pipe |
The pipe acts as an OR operator. cat|dog matches either "cat" or "dog". Combined with groups: (Mon|Tues|Wednes|Thurs|Fri|Satur|Sun)day matches any day of the week.
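Groups and alternation can be sketched together as follows, assuming Python's re module. The date pattern and sample sentences are illustrative and not part of the guide's pattern list.

```python
import re

# A quantifier after a group repeats the whole group.
laugh = re.fullmatch(r"(ha)+", "hahaha")
print(laugh.group(0))  # hahaha
print(laugh.group(1))  # ha  (a repeated group keeps only its last repetition)

# Capture groups pull out parts of a match.
date = re.search(r"(\d{4})-(\d{2})-(\d{2})", "Released on 2024-01-15.")
year, month, day = date.groups()
print(year, month, day)  # 2024 01 15

# Alternation inside a group: any prefix followed by "day".
day_re = re.compile(r"(Mon|Tues|Wednes|Thurs|Fri|Satur|Sun)day")
text = "Monday, Funday, Saturday"
full_matches = [m.group(0) for m in day_re.finditer(text)]
print(full_matches)  # ['Monday', 'Saturday']

# With exactly one capture group, findall returns the group, not the full match.
prefixes = day_re.findall(text)
print(prefixes)  # ['Mon', 'Satur']
```

The last two results differ only because of how Python's findall treats capture groups, which is a common source of surprise.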
Practical Regex Examples
Email Validation (Simplified)
^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$
This matches most standard email formats. Note that truly RFC-compliant email validation via regex is notoriously complex.
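A minimal sketch of using this pattern in Python follows. The function name looks_like_email and the sample addresses are hypothetical. The last check shows one way the simplified pattern is more permissive than real address rules.

```python
import re

EMAIL_RE = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")

def looks_like_email(text):
    """Return True if text fits the simplified pattern (not full RFC validation)."""
    return EMAIL_RE.fullmatch(text) is not None

print(looks_like_email("user@example.com"))                  # True
print(looks_like_email("first.last+tag@sub.example.co.uk"))  # True
print(looks_like_email("no-at-sign.com"))                    # False
print(looks_like_email("user@localhost"))                    # False (no dot before a TLD)

# Limitation: consecutive dots in the domain are accepted by this pattern.
print(looks_like_email("user@example..com"))                 # True
```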
URL Matching
https?://[^\s/$.?#].[^\s]*
Matches URLs starting with http:// or https://.
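To see the URL pattern at work, here is a sketch assuming Python's re module and an invented sample sentence. Note that the pattern keeps matching until whitespace, so trailing punctuation gets swept into the match; the rstrip step is one possible cleanup, not part of the pattern itself.

```python
import re

URL_RE = re.compile(r"https?://[^\s/$.?#].[^\s]*")

text = "Docs at https://example.com/guide?x=1 and http://test.org. Not ftp://old.net"
urls = URL_RE.findall(text)
print(urls)  # ['https://example.com/guide?x=1', 'http://test.org.']

# Strip sentence punctuation that the greedy [^\s]* picked up.
cleaned = [u.rstrip(".,;:!?)") for u in urls]
print(cleaned)  # ['https://example.com/guide?x=1', 'http://test.org']
```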
Phone Number (US Format)
\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}
Matches formats like (555) 123-4567, 555-123-4567, or 555.123.4567.
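The phone pattern can be exercised with a sketch like this, assuming Python's re module. The helper normalize is hypothetical and simply drops everything that is not a digit.

```python
import re

PHONE_RE = re.compile(r"\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}")

samples = ["(555) 123-4567", "555-123-4567", "555.123.4567", "5551234567", "555-1234"]
matched = [s for s in samples if PHONE_RE.fullmatch(s)]
print(matched)  # ['(555) 123-4567', '555-123-4567', '555.123.4567', '5551234567']

# Limitation: the parentheses are independently optional, so an
# unbalanced "(" is still accepted.
unbalanced_ok = PHONE_RE.fullmatch("(555-123-4567") is not None  # True

def normalize(number):
    """Reduce a matched number to its digits."""
    return re.sub(r"\D", "", number)

print(normalize("(555) 123-4567"))  # 5551234567
```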
IP Address
\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b
Matches IPv4 addresses (though it doesn't validate that each octet is 0-255).
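One way to close the octet-range gap is to capture each octet and check it in code rather than in the pattern. The sketch below assumes Python's re module; find_ipv4 and the log line are hypothetical.

```python
import re

# Same pattern as above, with each octet in a capture group.
IPV4_RE = re.compile(r"\b(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\b")

def find_ipv4(text):
    """Return dotted quads in text whose four octets are all in 0..255."""
    found = []
    for m in IPV4_RE.finditer(text):
        if all(int(octet) <= 255 for octet in m.groups()):
            found.append(m.group(0))
    return found

log = "ok 192.168.1.10, bad 999.1.1.1, edge 255.255.255.255"
print(find_ipv4(log))  # ['192.168.1.10', '255.255.255.255']
```

Doing the range check in ordinary code keeps the pattern readable. A pure-regex alternative exists but needs a longer alternation for each octet.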
HTML Tags
<[^>]+>
Matches HTML tags. However, for real HTML parsing, always use a proper parser — regex and HTML are a famously poor combination for complex documents.
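The sketch below shows the tag pattern working on simple markup and then breaking on an attribute value that contains ">". For contrast it uses HTMLParser from Python's standard library; the TagCollector class is a hypothetical helper written for this example.

```python
import re
from html.parser import HTMLParser

TAG_RE = re.compile(r"<[^>]+>")

simple = "<p>Hello <b>world</b></p>"
print(TAG_RE.findall(simple))  # ['<p>', '<b>', '</b>', '</p>']

# A ">" inside a quoted attribute cuts the regex match short.
tricky = '<a title="x > y">link</a>'
regex_tags = TAG_RE.findall(tricky)
print(regex_tags)  # ['<a title="x >', '</a>']

class TagCollector(HTMLParser):
    """Record each start tag together with its attributes."""
    def __init__(self):
        super().__init__()
        self.starts = []

    def handle_starttag(self, tag, attrs):
        self.starts.append((tag, dict(attrs)))

collector = TagCollector()
collector.feed(tricky)
collector.close()
print(collector.starts)  # [('a', {'title': 'x > y'})]
```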
Lookaheads and Lookbehinds
These are advanced features that let you match based on what comes before or after your pattern, without including it in the match:
- (?=...) - positive lookahead: match only if followed by ...
- (?!...) - negative lookahead: match only if NOT followed by ...
- (?<=...) - positive lookbehind: match only if preceded by ...
- (?<!...) - negative lookbehind: match only if NOT preceded by ...
For example, \d+(?= dollars) matches "100" in "100 dollars" but not "100" in "100 euros".
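All four lookarounds can be compared on one string, as in this sketch assuming Python's re module and an invented sample. It also shows a common pitfall: a negative lookahead after \d+ can succeed by backtracking to a shorter number, so a word boundary is added to pin the match.

```python
import re

text = "100 dollars, 200 euros, $300"

followed = re.findall(r"\d+(?= dollars)", text)          # positive lookahead
not_followed = re.findall(r"\b\d+\b(?! dollars)", text)  # negative lookahead
after_sign = re.findall(r"(?<=\$)\d+", text)             # positive lookbehind
no_sign = re.findall(r"(?<!\$)\b\d+", text)              # negative lookbehind

print(followed)      # ['100']
print(not_followed)  # ['200', '300']
print(after_sign)    # ['300']
print(no_sign)       # ['100', '200']

# Pitfall: without \b, the engine backtracks from "100" to "10",
# which is not followed by " dollars", and reports that instead.
pitfall = re.findall(r"\d+(?! dollars)", "100 dollars")
print(pitfall)  # ['10']
```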
Flags and Modifiers
Most regex engines support flags that change how patterns are interpreted:
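- g - global: find all matches, not just the first (a JavaScript-style flag; some languages, such as Python, provide this through separate functions instead)
- i - case-insensitive matching
- m - multiline: ^ and $ match line starts/ends
- s - dotall: . also matches newlines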
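Flag spelling varies by language. As a sketch of how these four behaviors look in Python's re module: there is no g flag, so re.search returns the first match and re.findall returns all of them, while the other three are passed as re.IGNORECASE, re.MULTILINE, and re.DOTALL. The sample strings are illustrative.

```python
import re

text = "Cat\ncat\nCAT scan"

# "Global" behavior comes from the function, not a flag.
first = re.search(r"cat", text, re.IGNORECASE).group()
every = re.findall(r"cat", text, re.IGNORECASE)
print(first)  # Cat
print(every)  # ['Cat', 'cat', 'CAT']

# Without IGNORECASE only the lowercase occurrence matches.
case_sensitive = re.findall(r"cat", text)
print(case_sensitive)  # ['cat']

# MULTILINE lets ^ and $ match at every line start and end.
lines = "one\ntwo\nthree"
whole_string = re.findall(r"^\w+$", lines)
per_line = re.findall(r"^\w+$", lines, re.MULTILINE)
print(whole_string)  # []
print(per_line)      # ['one', 'two', 'three']

# DOTALL lets the dot match a newline as well.
dot_default = re.search(r"a.b", "a\nb") is not None            # False
dot_all = re.search(r"a.b", "a\nb", re.DOTALL) is not None     # True
```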
Common Regex Mistakes
- Forgetting to escape special characters: If you want to match a literal dot, use \. not .
- Greedy vs. lazy matching: .* is greedy (matches as much as possible). Add ? to make it lazy: .*?
- Overcomplicating patterns: Start simple and build up. Test incrementally.
- Using regex when you shouldn't: For parsing structured data like JSON or HTML, use a proper parser.
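The first two mistakes are easy to reproduce. This sketch assumes Python's re module, and the sample strings are made up.

```python
import re

# An unescaped dot matches any character, so "3x14" slips through.
unescaped = re.fullmatch(r"3.14", "3x14") is not None   # True
escaped = re.fullmatch(r"3\.14", "3x14") is not None    # False

# re.escape builds a pattern that matches arbitrary text literally.
literal_pattern = re.escape("3.14")
print(literal_pattern)  # 3\.14

# Greedy .* runs to the last closing tag; lazy .*? stops at the first.
html = "<b>one</b> and <b>two</b>"
greedy = re.findall(r"<b>.*</b>", html)
lazy = re.findall(r"<b>.*?</b>", html)
print(greedy)  # ['<b>one</b> and <b>two</b>']
print(lazy)    # ['<b>one</b>', '<b>two</b>']
```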
Where to Practice
The best way to learn regex is to practice. Use the Wootils Regex Tester to experiment with patterns in real time. Start with simple patterns and gradually increase complexity. Try extracting data from sample text, validating formats, or doing search-and-replace operations.
Conclusion
Regular expressions may look intimidating at first, but they follow logical rules that become second nature with practice. Start with basic character matching and quantifiers, then work your way up to groups, lookaheads, and more advanced features. Once you're comfortable with regex, you'll find yourself reaching for it constantly — it's one of those skills that pays dividends across your entire career.