
Regex

Regex is an abbreviation of Regular Expression

Also referred to as regexp, a regex is a sequence of characters that defines a search pattern. Primarily used for string matching, data validation, and search-and-replace operations, regexes let you find complex patterns (such as email addresses, phone numbers, or specific code structures) within a body of text using a single compact expression.
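The three uses named above can be sketched with Python’s built-in re module. This is a minimal illustration, not a recommended pattern set: the sample text, the five-digit rule, and the helper name is_five_digit_code are all made up for the example.

```python
import re

# Sample text invented for this illustration.
text = "Order 66 shipped on 2024-05-17 to Alice."

# String matching: find the first ISO-style date in the text.
date_match = re.search(r"\d{4}-\d{2}-\d{2}", text)
first_date = date_match.group(0) if date_match else None

# Data validation: the whole string must be exactly five digits.
# (Hypothetical rule; real validation rules depend on your data.)
def is_five_digit_code(value: str) -> bool:
    return re.fullmatch(r"\d{5}", value) is not None

# Search and replace: mask every run of digits with "#".
masked = re.sub(r"\d+", "#", text)

print(first_date)
print(is_five_digit_code("90210"), is_five_digit_code("9021a"))
print(masked)
```

Note the difference between re.search, which finds a match anywhere in the text, and re.fullmatch, which only succeeds when the entire string fits the pattern. Validation usually needs the latter.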

Key Specifications

  • Engine: The software component (e.g., PCRE in PHP, RegExp in JS, the re module in Python) that interprets the regex. Supported syntax varies slightly between engines.
  • Literal Characters: Characters that match themselves (e.g., abc matches “abc”).
  • Metacharacters: Special symbols (like *, +, ?, .) that define the logic of the search.
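  • Delimiters: Characters (usually /) used to wrap the expression in languages such as JavaScript, PHP, and Perl, e.g., /pattern/flags. Other engines, such as Python’s re, take the pattern as an ordinary string and accept flags separately.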
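As a rough sketch of how these terms show up in practice, the snippet below uses Python’s re engine. The sample strings are invented for the illustration; the re.IGNORECASE flag plays the role that the i flag plays in the /pattern/flags form.

```python
import re

# Literal characters match themselves.
literal_found = re.search(r"abc", "xxabcxx") is not None

# The metacharacter "." matches any character, so "a.c" also matches "a-c".
dot_matches = re.findall(r"a.c", "abc a-c a.c")

# A backslash turns the metacharacter back into a literal dot.
literal_dot = re.findall(r"a\.c", "abc a-c a.c")

# re.escape produces the escaped form of arbitrary text for you.
escaped = re.escape("a.c")

# Python has no /pattern/flags delimiters; flags are passed as an argument.
case_insensitive = re.findall(r"hello", "Hello hello HELLO", flags=re.IGNORECASE)

print(literal_found, dot_matches, literal_dot, escaped, case_insensitive)
```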

Regexp Syntax “Cheat Sheet”

| Symbol | Name | Description | Example |
| --- | --- | --- | --- |
| . | Wildcard | Matches any single character except a newline. | h.t matches “hat”, “hot” |
| ^ | Anchor (Start) | Matches the beginning of a string. | ^Hello |
| $ | Anchor (End) | Matches the end of a string. | world$ |
| * | Quantifier | Matches 0 or more of the preceding element. | ab* matches “a”, “ab”, “abbb” |
| + | Quantifier | Matches 1 or more of the preceding element. | ab+ matches “ab”, “abbb” |
| \d | Digit | Matches any single numerical digit (0-9). | \d\d matches “42” |
| [a-z] | Character Set | Matches any single character within the brackets. | [A-C] matches “A”, “B”, or “C” |
| (...) | Capture Group | Groups multiple tokens together for extraction. | (abc)+ |
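To see the cheat sheet entries in action, here is a small sketch using Python’s re module. The test strings are chosen for illustration only, and other engines may behave slightly differently at the edges (for example, in how \d treats non-ASCII digits).

```python
import re

# "." matches any single character except a newline.
wildcard = re.findall(r"h.t", "hat hot hit h\nt")

# "^" and "$" anchor the match to the start and end of the string.
starts_with_hello = re.search(r"^Hello", "Hello world") is not None
ends_with_world = re.search(r"world$", "Hello world") is not None

# "*" allows zero "b" characters; "+" requires at least one.
star_accepts_a = re.fullmatch(r"ab*", "a") is not None
plus_accepts_a = re.fullmatch(r"ab+", "a") is not None

# "\d" matches one digit, so "\d\d" needs two in a row.
two_digits = re.findall(r"\d\d", "room 42, floor 7")

# "[A-C]" matches any one character in the range A to C.
letters = re.findall(r"[A-C]", "ABCD")

# "(abc)+" repeats the whole group; group(1) holds the last repetition.
group_match = re.search(r"(abc)+", "xabcabcx")

print(wildcard, two_digits, letters, group_match.group(0), group_match.group(1))
```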

Practical Example: Email Validation

A simplified regexp to find an email address might look like this:

/\b[\w.-]+@[\w.-]+\.[a-zA-Z]{2,6}\b/g

  • \b: Marks a word boundary, so the match starts and ends at the edge of a word.
  • [\w.-]+: Looks for one or more letters, digits, underscores, dots, or hyphens.
  • @: Looks for the literal “@” symbol.
  • \.: Looks for a literal period (the backslash escapes it so it’s not a wildcard).
  • [a-zA-Z]{2,6}: Requires the top-level domain (like .com or .org) to be 2 to 6 letters long, so longer top-level domains (such as .photography) are not matched.
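A minimal sketch of the same pattern in Python follows. Python has no /…/g wrapper, so the pattern is compiled as a raw string and findall takes the place of the g flag. The names EMAIL_PATTERN and find_emails, and the sample addresses, are hypothetical and exist only for this illustration.

```python
import re

# The simplified email pattern from above, without the / delimiters.
EMAIL_PATTERN = re.compile(r"\b[\w.-]+@[\w.-]+\.[a-zA-Z]{2,6}\b")

def find_emails(text: str) -> list[str]:
    """Return every substring of text that the simplified pattern matches."""
    return EMAIL_PATTERN.findall(text)

sample = "Contact alice@example.com or bob.smith@mail.example.org today"
print(find_emails(sample))
```

Because the pattern is deliberately simplified, it accepts some strings that are not valid addresses and rejects some that are. It is better suited to spotting likely addresses in text than to strict validation.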

The “Double-Edged Sword”

Regex is incredibly powerful but notoriously difficult to read once it reaches a certain complexity. This has led to the famous industry joke:

“Some people, when confronted with a problem, think ‘I know, I’ll use regular expressions.’ Now they have two problems.”

Jamie Zawinski

Common Pitfalls

  • Greediness: Quantifiers like * are greedy by default, meaning they match as much text as possible. This can lead to matching more than you intended.
  • Readability: Complex regex one-liners can be nearly impossible for teammates (or your future self) to debug.
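Both pitfalls can be sketched in Python. The first half contrasts a greedy quantifier with its lazy form (*?), which is the usual remedy for over-matching. The second half rewrites the email pattern from earlier using re.VERBOSE, one common way to make a long pattern easier to read. The sample strings and the name EMAIL_VERBOSE are assumptions made for this example.

```python
import re

html = "<b>one</b> and <b>two</b>"

# Greedy: ".*" takes as much as it can, so a single match spans both tags.
greedy = re.findall(r"<b>.*</b>", html)

# Lazy: ".*?" takes as little as it can, stopping at the first "</b>".
lazy = re.findall(r"<b>.*?</b>", html)

# Readability: re.VERBOSE ignores whitespace in the pattern and allows
# comments, so each piece can be documented on its own line.
EMAIL_VERBOSE = re.compile(
    r"""
    \b
    [\w.-]+         # local part: word characters, dots, hyphens
    @               # literal at sign
    [\w.-]+         # domain labels
    \.              # literal dot before the top-level domain
    [a-zA-Z]{2,6}   # top-level domain: 2 to 6 letters
    \b
    """,
    re.VERBOSE,
)

print(greedy)
print(lazy)
print(EMAIL_VERBOSE.findall("write to alice@example.com now"))
```

The verbose form compiles to the same pattern as the one-liner, so it costs nothing at match time. The trade-off is only vertical space in the source file.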
