Input validation ensures that only well-formed data enters an information system's workflow, preventing malformed data from reaching the database and causing downstream components to malfunction. Regular expressions (RegEx) are a powerful pattern-matching tool for searching and matching complex patterns, from email addresses to IP addresses to phone numbers to arbitrarily complex text patterns.
In this Amulet, we’ll use regular expressions to match some email addresses.
Problem Statement
We are given a set of email addresses with varying structures. A sample of the complete set is shown below:
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
Our job is to match all the valid email addresses, using whichever RegEx flavor suits us best. A typical email address is composed of the following parts:
- A unique username: dougie.jones
- A domain name: luckyseven
- A top-level domain name: .com
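To make the three parts concrete, here is a minimal Python sketch that splits an address with plain string methods. The helper name `split_address` is hypothetical, and the function performs no validation at all; it only illustrates where each part sits in the address assembled from the example parts above.

```python
def split_address(address):
    """Split an address into (username, domain, tld) without validating it.

    Hypothetical helper for illustration only: it assumes exactly one "@"
    and treats the text after the last dot as the top-level domain.
    """
    username, _, host = address.partition("@")
    domain, dot, tld = host.rpartition(".")
    return username, domain, dot + tld


parts = split_address("dougie.jones@luckyseven.com")
print(parts)
```

Because nothing is checked here, a malformed string is split just as readily as a valid one, which is why the task calls for a regular expression instead.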
Requirements
- Only valid addresses must be matched.
- Each address must be a single, complete match; we must not match fragments of an address.
- Our expression has to be flexible, meaning it has to account for all possible domain names and top-level domains.
- We must capture each part of the email address in a group named after its section:
  - Username
  - Domain name
  - Top-level domain name
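The "complete match" and "named parts" requirements can be sketched in Python with named groups and `re.fullmatch`. The pattern below is a deliberately simplified assumption, not the article's PCRE2 solution, and the names `EMAIL` and `parse_email` are hypothetical.

```python
import re

# Simplified, illustrative pattern (an assumption, not the article's solution).
# Each part of the address is captured in a group named after its section.
EMAIL = re.compile(
    r"(?P<username>[A-Za-z0-9._%+-]+)"
    r"@"
    r"(?P<domain>[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*)"
    r"(?P<tld>\.[A-Za-z]{2,})"
)


def parse_email(address):
    """Return the named parts if the WHOLE string matches, else None."""
    match = EMAIL.fullmatch(address)
    return match.groupdict() if match else None


print(parse_email("dougie.jones@luckyseven.com"))

# search() accepts a fragment inside a longer string; fullmatch() does not.
noisy = "see dougie.jones@luckyseven.com now"
print(EMAIL.search(noisy) is not None, parse_email(noisy))
```

The contrast between `search` and `fullmatch` is the point: a partial match on a fragment would violate the second requirement, so the whole input has to be consumed by the expression.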
Tackling Groups
Tackling the expression group by group, then composing the final expression at the end, can significantly ease the task.
The expression can be written in any flavor using any tool. However, the solution only includes the expression in the PCRE2 flavor (PHP >= 7.3).
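As a rough illustration of the group-by-group approach, the sketch below uses Python's `re` module rather than PCRE2; the `(?P<name>...)` syntax for named groups is accepted by both. The fragments `USERNAME`, `DOMAIN`, and `TLD`, the helper `fragment_matches`, and the sample lines are all hypothetical and simplified; they are not the article's solution or its data set.

```python
import re

# Hypothetical, simplified fragments: one per part of the address.
USERNAME = r"(?P<username>[A-Za-z0-9]+(?:[._+-][A-Za-z0-9]+)*)"
DOMAIN = r"(?P<domain>[A-Za-z0-9]+(?:[.-][A-Za-z0-9]+)*)"
TLD = r"(?P<tld>\.[A-Za-z]{2,})"


def fragment_matches(fragment, text):
    """Check a single fragment on its own before composing."""
    return re.fullmatch(fragment, text) is not None


# Compose the final expression; with MULTILINE, ^ and $ anchor each line,
# so every address must be a complete match on its own line.
EMAIL = re.compile(rf"^{USERNAME}@{DOMAIN}{TLD}$", re.MULTILINE)

# Made-up sample lines: the first and last are well formed.
SAMPLE = """dougie.jones@luckyseven.com
dougie..jones@luckyseven.com
dougie.jones@luckyseven
dougie.jones@lucky-seven.co.uk"""

matches = [m.groupdict() for m in EMAIL.finditer(SAMPLE)]
for parts in matches:
    print(parts)
```

Testing each fragment in isolation first makes it easier to see which part is responsible when the composed expression rejects or accepts an address unexpectedly.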