
Also referred to as regex, it’s a sequence of characters that defines a search pattern. Primarily used for string matching, data validation, and search-and-replace operations, regexps let you find complex patterns (such as email addresses, phone numbers, or specific code structures) within a body of text using a single line of logic.
Key Specifications
- Engine: The software component (e.g., PHP, JS, Python’s re module) that interprets the regex.
- Literal Characters: Characters that match themselves (e.g., abc matches “abc”).
- Metacharacters: Special symbols (like *, +, ?, .) that define the logic of the search.
- Delimiters: Characters (usually /) used to wrap the expression, e.g., /pattern/flags.
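As a rough illustration of these terms, the sketch below uses Python’s re module as the engine. Python has no /pattern/flags delimiter syntax, so the pattern is an ordinary string and flags are passed as a separate argument. The sample strings and variable names are made up for the example.

```python
import re

# Literal characters match themselves.
literal = re.search(r"abc", "xxabcxx")
print(literal.group())  # abc

# Metacharacters add logic: "?" makes the preceding "u" optional.
meta = re.findall(r"colou?r", "color and colour")
print(meta)  # ['color', 'colour']

# Rough counterpart of /abc/i: the flag is an argument, not part of a delimiter.
flagged = re.findall(r"abc", "ABC abc Abc", flags=re.IGNORECASE)
print(flagged)  # ['ABC', 'abc', 'Abc']
```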
Regexp Syntax “Cheat Sheet”
| Symbol | Name | Description | Example |
|---|---|---|---|
| . | Wildcard | Matches any single character except a newline. | h.t matches “hat”, “hot” |
| ^ | Anchor (Start) | Matches the beginning of a string. | ^Hello |
| $ | Anchor (End) | Matches the end of a string. | world$ |
| * | Quantifier | Matches 0 or more of the preceding element. | ab* matches “a”, “ab”, “abbb” |
| + | Quantifier | Matches 1 or more of the preceding element. | ab+ matches “ab”, “abbb” |
| \d | Digit | Matches any single numerical digit (0-9). | \d\d matches “42” |
| [a-z] | Character Set | Matches any single character within the brackets. | [A-C] matches “A”, “B”, or “C” |
| (...) | Capture Group | Groups multiple tokens together for extraction. | (abc)+ |
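The rows above can be checked against a real engine. The following is a minimal sketch using Python’s re module. The sample texts and the examples list are assumptions chosen to mirror the table, not part of any standard test suite.

```python
import re

# Each entry: (pattern, sample text, matches we expect re.findall to return).
examples = [
    (r"h.t", "hat hot hit", ["hat", "hot", "hit"]),  # wildcard
    (r"^Hello", "Hello world", ["Hello"]),           # start anchor
    (r"world$", "Hello world", ["world"]),           # end anchor
    (r"ab*", "a ab abbb", ["a", "ab", "abbb"]),      # zero or more "b"
    (r"ab+", "a ab abbb", ["ab", "abbb"]),           # one or more "b"
    (r"\d\d", "room 42", ["42"]),                    # two digits
    (r"[A-C]", "ABCD", ["A", "B", "C"]),             # character set
]

for pattern, text, expected in examples:
    found = re.findall(pattern, text)
    print(f"{pattern!r:12} on {text!r:16} -> {found}")

# A capture group repeats as a unit; group(1) holds the last repetition.
m = re.search(r"(abc)+", "xabcabcx")
print(m.group(0))  # abcabc
print(m.group(1))  # abc
```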
Practical Example: Email Validation
A simplified regexp to find an email address might look like this:
/\b[\w.-]+@[\w.-]+\.[a-zA-Z]{2,6}\b/g
- [\w.-]+: Looks for one or more letters, digits, underscores, dots, or dashes.
- @: Looks for the literal “@” symbol.
- \.: Looks for a literal period (the backslash escapes it so it’s not a wildcard).
- {2,6}: Ensures the top-level domain (like .com or .org) is between 2 and 6 characters long.
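To try the pattern outside a /.../g context, here is a minimal Python sketch. It assumes the same expression with the delimiters dropped, and re.findall stands in for the g flag. The addresses and the EMAIL name are invented for illustration.

```python
import re

# Same pattern as above, without the /.../g wrapper.
EMAIL = re.compile(r"\b[\w.-]+@[\w.-]+\.[a-zA-Z]{2,6}\b")

text = "Contact alice@example.com or bob.smith@mail.example.org for details."
emails = EMAIL.findall(text)
print(emails)  # ['alice@example.com', 'bob.smith@mail.example.org']

# The {2,6} limit is one reason this is only a simplification:
# a longer top-level domain is not matched at all.
missed = EMAIL.findall("team@startup.technology")
print(missed)  # []
```

The second call shows why a pattern like this is better suited to finding likely addresses in text than to strict validation.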
The “Double-Edged Sword”
Regex is incredibly powerful but notoriously difficult to read once it reaches a certain complexity. This has led to the famous industry joke:
“Some people, when confronted with a problem, think ‘I know, I’ll use regular expressions.’ Now they have two problems.”
Jamie Zawinski
Common Pitfalls
- Greediness: Quantifiers like * are greedy by default, meaning they match as much text as possible. This can lead to matching more than you intended.
- Readability: Complex regex one-liners can be nearly impossible for teammates (or your future self) to debug.
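Both pitfalls can be seen in a short Python sketch. The lazy quantifier (*?) and the re.VERBOSE flag used here are common mitigations in Python’s re module rather than something the text above prescribes, and the sample strings are assumptions.

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: .* grabs as much as it can, so the match runs to the last ">".
greedy = re.findall(r"<.*>", html)
print(greedy)  # ['<b>bold</b> and <i>italic</i>']

# Lazy: a "?" after the quantifier makes it stop at the first ">".
lazy = re.findall(r"<.*?>", html)
print(lazy)  # ['<b>', '</b>', '<i>', '</i>']

# Readability: re.VERBOSE allows whitespace and comments inside the pattern.
email = re.compile(
    r"""
    \b[\w.-]+        # local part
    @                # literal at sign
    [\w.-]+          # domain name
    \.[a-zA-Z]{2,6}  # dot plus top-level domain
    \b
    """,
    re.VERBOSE,
)
verbose_hits = email.findall("mail alice@example.com today")
print(verbose_hits)  # ['alice@example.com']
```

Other engines offer similar features, but the flag names and syntax differ, so check the documentation for the engine you are using.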