regex

Regex (short for "regular expressions") is a universal* system for pattern-matching text. This means it's standard* across pretty much all programming languages. So no matter what you do, you'll probably come across regex.

* Well actually 🤓 🤓 🤓 different regular expression engines may be missing some features or have extra ones, but the core syntax is mostly the same

Remark

Regex isn't something that can really be taught. I could list all the syntax etc., but that's pointless. It takes lots of practice, and even people who know how to use regex end up googling and doing lots of trial and error. No matter how "good" you are at regex, you should always put it through some tests to make sure it works.

If you ever need help making a regex, try googling your case. Common cases likely have an answer on Stack Overflow already, and more specific features can be googled too. For example, "regex match all letters and numbers" yields a Stack Overflow result. Finally, RegExr is a very good resource for testing regexes.

You can also click "cheatsheet" on RegExr for a good cheatsheet.
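
The "always put it through some tests" habit can also live in code. Here is a minimal sketch in Python (my choice of language, these notes aren't tied to one) using the standard re module. The pattern is just one possible answer to the "match all letters and numbers" search, and the test strings and the check helper are made up for illustration.

```python
import re

# One possible pattern for "match all letters and numbers" (ASCII only).
ALNUM = re.compile(r"[A-Za-z0-9]+")

# Hypothetical test cases: input string -> should the whole string match?
CASES = {
    "abc123": True,
    "ABC": True,
    "": False,          # + needs at least one character
    "abc 123": False,   # a space is not a letter or number
    "abc_123": False,   # an underscore is not a letter or number
}

def check(pattern, cases):
    """Return the inputs whose result differs from what was expected."""
    return [s for s, want in cases.items()
            if (pattern.fullmatch(s) is not None) != want]

failures = check(ALNUM, CASES)
print("all good" if not failures else f"unexpected: {failures}")
```

Whenever you tweak the regex, rerun the cases. Anything listed as unexpected means either the pattern or your expectation is wrong.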

Example

^[A-Za-z0-9_-]+@gmail\.com$

  • ^ means beginning of string
  • [A-Za-z0-9_-] matches one character which is either:
    • A letter (capital or not), since we list both A-Z and a-z. a-z alone would be lowercase only. Don't shorten this to A-z: in ASCII that range also covers [ \ ] ^ _ and `, which sit between Z and a.
    • A digit
    • _ or - (the - goes last so it's a literal hyphen and not part of a range)
  • + means one or more of the previous pattern
  • @gmail\.com means it must have @gmail.com at the end. The dot is escaped as \. because a bare . matches any character.
  • $ means the end of the string

So the following would be matched:

  • deez_NUTs@gmail.com
  • 123@gmail.com

But the following would not (despite being valid emails):

  • "valid email"@gmail.com
  • 12#3@gmail.com

Alternative ways to write the same regex:

  • ^(A|B|C|D|E|F|G|H|I|J|K|L|M|N|O|P|Q|R|S|T|U|V|W|X|Y|Z|a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z|0|1|2|3|4|5|6|7|8|9|-|_){1,}@gmail\.com$
  • ^[\w-]+@gmail\.com$
    • Since \w is any word character (alphanumeric & underscore). In Unicode-aware engines \w can also match non-ASCII letters and digits, so this version may be slightly broader.

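To check the email example against the strings listed above, here is a small sketch in Python (an assumed language choice) using the standard re module. It spells the letter range as A-Za-z and escapes the dot, since A-z and a bare . each accept a few extra characters. The last two test strings are made up to show those two pitfalls.

```python
import re

# Explicit A-Za-z ranges and an escaped dot: only letters, digits, _ and -
# are allowed before a literal "@gmail.com".
EMAIL = re.compile(r"^[A-Za-z0-9_-]+@gmail\.com$")

# The \w shorthand version; re.ASCII keeps \w limited to [A-Za-z0-9_].
EMAIL_SHORT = re.compile(r"^[\w-]+@gmail\.com$", re.ASCII)

matched = ["deez_NUTs@gmail.com", "123@gmail.com"]
not_matched = ['"valid email"@gmail.com', "12#3@gmail.com"]

# Both spellings should agree on every sample.
for pattern in (EMAIL, EMAIL_SHORT):
    for s in matched:
        assert pattern.search(s), s
    for s in not_matched:
        assert not pattern.search(s), s

# Pitfall 1: with an unescaped dot this would be accepted,
# because . matches any character.
print(bool(EMAIL.search("abc@gmailxcom")))   # False

# Pitfall 2: with the range A-z this would be accepted,
# because ^ sits between Z and a in ASCII.
print(bool(EMAIL.search("a^b@gmail.com")))   # False
```
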
Example

^(\+1 )?\([0-9]{3}\) [0-9]{3}-[0-9]{4}$

This is actually a regex for a phone number (in a very specific format).

  • ^ means beginning of string
  • (\+1 )? matches an optional "+1 " prefix to the phone number
    • The parentheses "group" the items inside
    • \+ means a literal plus sign. Since the + sign has a special meaning in regex, it needs to be escaped
    • "1 " means a literal 1 followed by a space
    • ? means optional, so the whole group may or may not be present
  • \( means a literal opening parenthesis, since parentheses have special meaning in regex. Same with \)
  • [0-9] matches any single digit
    • {3} means exactly 3 of the previous pattern, so [0-9]{3} is 3 digits
  • The remaining digit groups and repetitions work the same way
  • $ means the end of the string

The following will be matched:

  • (304) 625-2000
  • +1 (800) 267-2001

The following will not be matched (despite being valid phone numbers):

  • +7 (495) 697-03-49
  • (850 2) 381 44 10
  • 123-456-789

Alternative ways to write the same regex:

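  • ^(\+1 ){0,1}\(\d{3}\) \d\d\d-\d{4}$
    • since {0,1} (zero or one times) is the long way to write ?
    • since \d is any digit (in some Unicode-aware engines \d also matches digits outside 0-9)
    • since \d\d\d is just \d{3} written out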
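
To double check the phone example, here is a small sketch in Python (an assumed language choice) using the standard re module. It runs both spellings of the regex over the sample numbers listed above and confirms they agree. The re.ASCII flag is my addition so that \d only means 0-9.

```python
import re

PHONE = re.compile(r"^(\+1 )?\([0-9]{3}\) [0-9]{3}-[0-9]{4}$")
# Alternative spelling; re.ASCII keeps \d limited to 0-9.
PHONE_ALT = re.compile(r"^(\+1 ){0,1}\(\d{3}\) \d\d\d-\d{4}$", re.ASCII)

samples = [
    "(304) 625-2000",
    "+1 (800) 267-2001",
    "+7 (495) 697-03-49",
    "(850 2) 381 44 10",
    "123-456-789",
]

results = {}
for s in samples:
    results[s] = PHONE.search(s) is not None
    # Both spellings should give the same answer for every sample.
    assert results[s] == (PHONE_ALT.search(s) is not None)
    print(f"{s!r}: {'match' if results[s] else 'no match'}")

# The group captures the optional prefix, or None when it is absent.
print(repr(PHONE.search("+1 (800) 267-2001").group(1)))  # '+1 '
print(repr(PHONE.search("(304) 625-2000").group(1)))     # None
```

Only the first two samples should print "match", which lines up with the lists above.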