A few weeks ago, I needed to clean up an email list for a growth team. They had an Excel sheet with rows containing multiple email addresses mixed with other data. My task was to extract and sanitize these email addresses for use in an automated system.
Here's how I approached the problem using Python:
Email Extraction with Regular Expressions
The core of the solution is a regular expression pattern to identify email addresses:
pattern = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'
This pattern matches most everyday email address formats, although it does not cover every address the email standards allow.
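As a quick check of how a pattern like this behaves, here is a minimal sketch using Python's re module. The sample string and the helper name extract_emails are assumptions for illustration, and the top-level-domain class is written as [A-Za-z] so that a literal | is not accepted as part of an address.

```python
import re

# Pattern for common email shapes; it does not cover every address the standards allow.
EMAIL_PATTERN = re.compile(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b')


def extract_emails(text):
    """Return every substring of text that matches EMAIL_PATTERN, in order."""
    return EMAIL_PATTERN.findall(text)


# Hypothetical cell content mixing addresses with other data
sample = "Contact: jane.doe@example.com; backup <j.doe+news@mail.example.org>, phone 555-0100"
print(extract_emails(sample))
# ['jane.doe@example.com', 'j.doe+news@mail.example.org']
```

Because findall returns a list, a cell holding several addresses yields all of them, and a cell with none yields an empty list.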
Script Structure
I organized the script using a few design patterns:
Strategy Pattern: Allows for different email extraction methods.
Singleton Pattern: Ensures a single instance of the email extractor.
Facade Pattern: Provides a simple interface for the entire process.
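To make the three roles concrete, here is one way they could fit together. This is a sketch rather than the exact code from my script: the class names ExtractionStrategy, RegexStrategy, EmailExtractor and EmailCleaner, and the method unique_emails, are hypothetical names chosen for illustration.

```python
import re
from abc import ABC, abstractmethod


class ExtractionStrategy(ABC):
    """Strategy interface: any extraction method implements extract()."""

    @abstractmethod
    def extract(self, text):
        ...


class RegexStrategy(ExtractionStrategy):
    """One concrete strategy, based on a regular expression."""

    _pattern = re.compile(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b')

    def extract(self, text):
        return self._pattern.findall(text)


class EmailExtractor:
    """Singleton: every call to EmailExtractor() returns the same object."""

    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance.strategy = RegexStrategy()
        return cls._instance

    def extract_emails(self, text):
        # Delegate to whichever strategy is currently plugged in
        return self.strategy.extract(text)


class EmailCleaner:
    """Facade: one method hides extraction and de-duplication."""

    def __init__(self):
        self.email_extractor = EmailExtractor()

    def unique_emails(self, cells):
        seen = set()
        result = []
        for cell in cells:
            for email in self.email_extractor.extract_emails(cell):
                if email not in seen:
                    seen.add(email)
                    result.append(email)
        return result


cleaner = EmailCleaner()
print(cleaner.unique_emails(["a@example.com, b@example.com", "note: a@example.com"]))
# ['a@example.com', 'b@example.com']
print(EmailExtractor() is EmailExtractor())
# True
```

Swapping in a different extraction method only means assigning another ExtractionStrategy to the extractor's strategy attribute; the facade does not change.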
CSV Processing
The script reads the input CSV, extracts emails from each cell, removes duplicates, and writes unique emails to a new file.
Key Features
Extracts emails using regex
Eliminates duplicate addresses
Processes entire CSV files
Flexible design for easy modifications
Here's a basic outline of the main processing logic:
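import csv  # at the top of the module

def process_csv(self):
    seen_emails = set()  # emails already written to the output
    # newline='' lets the csv module handle line endings itself
    with open(self.input_file, 'r', newline='') as infile, \
            open(self.output_file, 'w', newline='') as outfile:
        reader = csv.reader(infile)
        writer = csv.writer(outfile)
        header = next(reader, None)  # None if the input file is empty
        if header is not None:
            writer.writerow(header)  # Write header
        for row in reader:
            for element in row:
                emails = self.email_extractor.extract_emails(element)
                for email in emails:
                    if email not in seen_emails:
                        writer.writerow([email])
                        seen_emails.add(email)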
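To try the same loop without touching any files, the sketch below runs it on in-memory data with io.StringIO. The function name dedupe_emails and the sample rows are assumptions for illustration. Unlike the method above, it scans the header row like any other row instead of copying it to the output.

```python
import csv
import io
import re

EMAIL_PATTERN = re.compile(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b')


def dedupe_emails(infile, outfile):
    """Write each distinct email found in infile's cells to outfile, one per row."""
    seen_emails = set()
    writer = csv.writer(outfile)
    for row in csv.reader(infile):
        for element in row:
            for email in EMAIL_PATTERN.findall(element):
                if email not in seen_emails:
                    writer.writerow([email])
                    seen_emails.add(email)
    return len(seen_emails)


# Hypothetical input: the second data row repeats an address from the first
source = io.StringIO(
    'name,contact\n'
    'Ada,ada@example.com\n'
    'Bob,"bob@example.com; ada@example.com"\n'
)
target = io.StringIO()
count = dedupe_emails(source, target)
print(count)
# 2
print(target.getvalue().splitlines())
# ['ada@example.com', 'bob@example.com']
```

Note that the comparison is case-sensitive, so Ada@example.com and ada@example.com would count as two different addresses here.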
This solution cleans an email list in a single pass over the file, which saves manual effort and keeps duplicate addresses out of email marketing campaigns. It's easily adaptable for similar data cleaning tasks.
If you are interested, the full code is on my GitHub for use in your own projects or workflows.
What’s New?
For new readers, thank you for checking out my blog.
My name is Uchechukwu Emmanuel. I am a software engineer and a cyber security student.
I am passionate about learning, and technology is how I satisfy that curiosity.
In that light, I am currently engaged with the HNG internship, where I am looking to learn more about backend development and DevOps. I have also subscribed to the HNG Premium program, where I will be given a certificate after my internship and placed in a community of like-minded people for collaboration and other opportunities. It is indeed an exciting season for me.
See you on the other side.