A Guide to Extracting and Analyzing Email Data Using Python

Email remains one of the most widely used means of communication, so individuals and organizations often accumulate large volumes of email data. That data can be a valuable resource for analysis, reporting, and data mining. This guide walks through extracting email data from .eml files with Python and saving it in a form that is ready for analysis.

Step 1: Installing Dependencies

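Before we dive into the code, you'll need to install one third-party Python library, beautifulsoup4. The other modules the script uses (os, email, and csv) ship with Python's standard library. You can install beautifulsoup4 with the following command: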

pip install beautifulsoup4

Step 2: Python Code

Here's a Python code snippet that demonstrates how to extract email data from .eml files and store it in a CSV file for further analysis.

import os
import email
import csv
from bs4 import BeautifulSoup

# Specify the directory where your .eml files are located
eml_directory = r'C:\Users\abdulrehman\mail'

# Specify the CSV output file
csv_file = "emails.csv"

# Initialize a list to store email data
email_data = []

# Loop through .eml files in the directory
for filename in os.listdir(eml_directory):
    if filename.endswith(".eml"):
        # Open in binary mode so the parser can handle each message's own encoding
        with open(os.path.join(eml_directory, filename), "rb") as eml_file:
            msg = email.message_from_binary_file(eml_file)
            sender = msg["From"]
            recipient = msg["To"]
            subject = msg["Subject"]
            date = msg["Date"]
            
            # Extract the message body from the text/plain and text/html parts
            body = ""
            for part in msg.walk():
                if part.get_content_type() in ("text/plain", "text/html"):
                    # decode=True undoes base64 or quoted-printable transfer encoding
                    payload = part.get_payload(decode=True)
                    if payload:
                        # Use the charset the part declares, falling back to UTF-8
                        charset = part.get_content_charset() or "utf-8"
                        body += payload.decode(charset, errors="replace")

            # Parse HTML to remove tags and keep text content
            soup = BeautifulSoup(body, "html.parser")
            text_body = soup.get_text()
            
            email_data.append([sender, recipient, subject, text_body, date])

# Write the email data to a CSV file
with open(csv_file, "w", newline="", encoding="utf-8") as csvfile:
    csv_writer = csv.writer(csvfile)
    csv_writer.writerow(["Sender", "Recipient", "Subject", "Email Body (Text)", "Date & Time"])
    csv_writer.writerows(email_data)

Step 3: Understanding the Code

  • We start by importing necessary libraries, including os, email, csv, and BeautifulSoup.
  • You need to specify the directory where your .eml files are located (eml_directory) and the name of the CSV output file (csv_file).
  • We initialize an empty list email_data to store the extracted email data.
  • The code then loops through all the .eml files in the specified directory. For each message it reads the From, To, Subject, and Date headers and collects the body from the text/plain and text/html parts.
  • We use BeautifulSoup to parse the HTML email body and extract the text content, removing any HTML tags.
  • Finally, the email data is written to a CSV file with headers for easy analysis.
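As a variation on the header and body extraction described above, the standard library's newer email API can decode encoded headers and pick a preferred body part for you. The following is a minimal sketch, not the script's own method: it parses with policy.default instead of the default policy used in the script, and the sample message, RAW_MESSAGE, and the summarize helper are made up for illustration.

```python
from email import message_from_string, policy

# A small in-memory message standing in for the contents of one .eml file.
RAW_MESSAGE = (
    "From: Alice <alice@example.com>\n"
    "To: Bob <bob@example.com>\n"
    "Subject: =?utf-8?q?Caf=C3=A9_menu?=\n"
    "Date: Mon, 01 Jan 2024 10:00:00 +0000\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/alternative; boundary="sep"\n'
    "\n"
    "--sep\n"
    'Content-Type: text/plain; charset="utf-8"\n'
    "\n"
    "Plain text version\n"
    "--sep\n"
    'Content-Type: text/html; charset="utf-8"\n'
    "\n"
    "<p>HTML version</p>\n"
    "--sep--\n"
)


def summarize(raw_message):
    """Return (subject, body_text) for one raw message string (hypothetical helper)."""
    msg = message_from_string(raw_message, policy=policy.default)
    # Under policy.default, encoded header values come back already decoded.
    subject = str(msg["Subject"])
    # Prefer the plain-text part; fall back to HTML if there is none.
    body_part = msg.get_body(preferencelist=("plain", "html"))
    body = body_part.get_content() if body_part is not None else ""
    return subject, body.strip()


subject, body = summarize(RAW_MESSAGE)
```

Choosing one body part this way avoids storing the same text twice when a message carries both a plain-text and an HTML version. If only the HTML part is available, the result would still need to go through BeautifulSoup to strip the tags.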

Step 4: Running the Code

To run this code, make sure you have Python installed on your system. Modify the eml_directory and csv_file variables to point to your email data directory and desired output CSV file.
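Before running the full script, it can help to confirm that the directory you configured actually contains .eml files. The sketch below is one possible sanity check; find_eml_files is a hypothetical helper that is not part of the script above.

```python
from pathlib import Path


def find_eml_files(directory):
    """Return the .eml files in `directory`, sorted (hypothetical helper)."""
    folder = Path(directory)
    if not folder.is_dir():
        raise FileNotFoundError(f"Not a directory: {folder}")
    # Compare suffixes in lowercase so that "MAIL.EML" is also picked up.
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() == ".eml"
    )
```

Calling len(find_eml_files(eml_directory)) gives a quick count of the messages the script should process. Note that the script's own filename.endswith(".eml") test is case-sensitive, so this check may find files the script would skip.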

Execute the code, and it will generate a CSV file containing the extracted email data, ready for further analysis using Python or any data analysis tool of your choice.
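As one example of what that further analysis might look like, the sketch below counts how many messages each sender contributed. It assumes the CSV was written with the "Sender" column header used in Step 2; top_senders is a hypothetical helper name.

```python
import csv
from collections import Counter


def top_senders(csv_path, n=5):
    """Return the n most frequent "Sender" values with their counts."""
    with open(csv_path, newline="", encoding="utf-8") as csvfile:
        reader = csv.DictReader(csvfile)
        counts = Counter(row["Sender"] for row in reader)
    return counts.most_common(n)
```

For instance, top_senders("emails.csv", 10) would return a list of (sender, count) pairs for the ten most frequent senders. The Sender values are compared as raw header strings, so the same address written with different display names is counted separately.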

By following these steps, you can efficiently extract and analyze email data using Python, opening up opportunities for various data-driven insights and decision-making processes.
