MITM datasets & projects | Cyber Attack And Def

Edge-IIoTset Cyber Security Dataset of IoT & IIoT

The Kaggle notebook “MITM Cleaned” by Tania Uzunova loads the Edge-IIoTset dataset, inspects it (shape, column info, and sample rows), and checks the unique values in the Attack_type label column. It then isolates the two classes Normal and MITM/MiTM, standardizes the Attack_type text to consistent lowercase formatting, and randomly samples the Normal class down to exactly 30,000 rows to control class imbalance while keeping all available MITM records. The notebook saves the resulting two-class dataset as reduced_dataset.csv, visualizes the final class distribution with a pie chart (exported as attack_type_distribution.png), and prints the final dataset shape and label counts.
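
The filtering and downsampling steps described above can be sketched in pandas as follows. This is a minimal illustration on a tiny synthetic frame, not the notebook's actual code: the function name, the `normal_cap` and `seed` parameters, the example feature column, and the demo data are all assumptions made for the example.

```python
import pandas as pd


def reduce_to_normal_and_mitm(df, normal_cap=30_000, seed=42):
    """Keep only Normal and MITM rows, capping the Normal class at normal_cap."""
    out = df.copy()
    # Normalise label text so "MITM", "MiTM" and " mitm " all compare equal.
    out["Attack_type"] = out["Attack_type"].astype(str).str.strip().str.lower()
    normal = out[out["Attack_type"] == "normal"]
    mitm = out[out["Attack_type"] == "mitm"]
    # Downsample Normal only when it exceeds the cap; every MITM row is kept.
    if len(normal) > normal_cap:
        normal = normal.sample(n=normal_cap, random_state=seed)
    return pd.concat([normal, mitm], ignore_index=True)


# Synthetic stand-in for the real CSV (hypothetical values, illustrative column).
demo = pd.DataFrame({
    "Attack_type": ["Normal"] * 50 + ["MITM"] * 5 + ["MiTM"] * 3 + ["DDoS_UDP"] * 10,
    "tcp.len": range(68),
})
reduced = reduce_to_normal_and_mitm(demo, normal_cap=20)
print(reduced.shape)
print(reduced["Attack_type"].value_counts())
```

On the real data the result would then be written out with `reduced.to_csv("reduced_dataset.csv", index=False)`; the cap of 20 is used here only to keep the demo small.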

The Edge-IIoTset Cyber Security Dataset of IoT & IIoT was created in 2022 by Mohamed Amine Ferrag, Othmane Friha, Djallel Hamouda, Leandros Maglaras, and Helge Janicke as a realistic benchmark dataset for machine-learning-based intrusion detection in Internet of Things (IoT) and Industrial IoT (IIoT) environments. The dataset was generated using a multi-layer IoT/IIoT testbed containing various sensors, devices, protocols, and edge/cloud computing components to simulate real industrial network traffic. It contains over 72 million network records derived from large PCAP captures, with features extracted using tools such as Zeek and TShark, and provides 61 selected flow features derived from over 1,000 raw attributes. The dataset covers 14 attack types grouped into five threat categories, among them DoS/DDoS attacks, information gathering attacks, and man-in-the-middle attacks, which makes it suitable for training and evaluating machine learning and deep learning intrusion detection systems in IoT and IIoT networks.

Code 1

Dataset Summary

Code Using the Dataset

The Kaggle notebook “Hybrid IDS for MiTM: PCA + Isolation Forest + KNN” by Sabarna97 loads the Edge-IIoTset dataset and reframes the original Attack_type labels into a 3-class target: 0 = Normal (No Attack), 1 = Other Attacks, 2 = MITM. It drops label-leakage columns (Attack_label, Attack_type) and additional non-feature fields (e.g., timestamp/IP-style columns), scales the remaining numeric features, and reduces dimensionality with PCA. The notebook then applies Isolation Forest on the training set to remove anomalous samples before model training, and trains a K-Nearest Neighbors (KNN) classifier with GridSearchCV (5-fold StratifiedKFold) to tune n_neighbors and weights. Finally, it evaluates the tuned KNN using train/test accuracy, a classification report, and confusion matrices, and produces ROC curves (one-vs-rest) for the three classes, saving plots to an output folder.
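
The sequence of steps described above (scale, PCA, Isolation Forest filtering on the training set, then a grid-searched KNN) might look roughly like the sketch below. It runs on synthetic three-class data from scikit-learn rather than Edge-IIoTset, and the number of PCA components, the contamination rate, and the parameter grid are assumptions chosen for the example, not values taken from the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import IsolationForest
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 3-class target (0 = Normal, 1 = Other Attacks, 2 = MITM).
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=8,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Fit the scaler and PCA on the training split only, then apply to both splits.
scaler = StandardScaler().fit(X_train)
pca = PCA(n_components=10, random_state=0).fit(scaler.transform(X_train))
Z_train = pca.transform(scaler.transform(X_train))
Z_test = pca.transform(scaler.transform(X_test))

# Isolation Forest marks inliers as +1 and outliers as -1; keep the inliers.
iso = IsolationForest(contamination=0.05, random_state=0)
keep = iso.fit_predict(Z_train) == 1
Z_clean, y_clean = Z_train[keep], y_train[keep]

# Tune n_neighbors and weights with 5-fold stratified cross-validation.
grid = GridSearchCV(
    KNeighborsClassifier(),
    {"n_neighbors": [3, 5, 7], "weights": ["uniform", "distance"]},
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    scoring="accuracy",
)
grid.fit(Z_clean, y_clean)
test_acc = grid.score(Z_test, y_test)
print(grid.best_params_, round(test_acc, 3))
```

Filtering only the training split leaves the test set untouched, so the reported test accuracy still reflects unfiltered traffic.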

Code 2

The Kaggle notebook “Predict Attack and Attack Type” by Waleed Gul uses the Edge-IIoTset dataset to build two supervised machine learning tasks: (1) binary attack detection using the Attack_label column and (2) multi-class attack classification using the Attack_type column. The notebook performs exploratory analysis, removes high-cardinality or non-numeric fields, and encodes labels before splitting the data. To handle class imbalance, it applies SMOTE on the training data and trains several models, including Logistic Regression, Decision Tree, and Random Forest, with GridSearchCV used for hyperparameter tuning. The models are evaluated using accuracy and confusion matrices. The notebook lists the traffic categories present in the dataset, including Normal traffic, DDoS, DoS, Information Gathering attacks (port scans, etc.), Man-in-the-Middle (MITM), and Injection attacks.

Kitsune Network Attack Dataset

Dataset

Dataset Summary

The Kitsune Network Attack Dataset on Kaggle was created by Yisroel Mirsky, Tomer Doitshman, Yuval Elovici, and Asaf Shabtai as part of their research on the Kitsune network intrusion detection system, which was introduced in 2018. The dataset contains network traffic collected from an IoT environment and a commercial IP-based surveillance system, capturing both normal traffic and multiple cyberattack scenarios. It includes over 27 million network instances with time-series features, and represents nine different attack types, including ARP Man-in-the-Middle attacks, SYN DoS attacks, Mirai botnet activity, OS scanning, and fuzzing attacks. The dataset is commonly used in machine learning and deep learning research for developing anomaly detection and network intrusion detection systems (NIDS) because it provides realistic IoT network traffic with labeled attack behaviors.

The Kaggle notebook “Kitsune MITM Attack Detection” by smeyra builds a supervised ML detector for ARP Man-in-the-Middle (MitM) traffic using the Kitsune Network Attack Dataset. It loads the ARP MitM dataset, drops non-feature columns (Info, No.), and converts categorical network fields (Source, Destination, Protocol) into numeric form using LabelEncoder. It then imports the ground-truth labels from ARP_MitM_labels.csv, attaches them as a target column (y), scales all features with StandardScaler, and trains a RandomForestClassifier using a train/test split. The notebook evaluates detection performance with a classification report, accuracy score, and a labeled confusion matrix, and finishes with exploratory correlation analysis (feature–label correlations plus correlation heatmaps/bar plots) to understand which numeric features are most associated with the MitM label.
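
A rough sketch of the preprocessing and training steps described above is shown below. It uses a small synthetic packet table instead of the real Kitsune files; the `Length` column, the IP addresses, and the model settings are assumptions for illustration, and the synthetic labels are deliberately easy to separate, so the score says nothing about real-world performance.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler

rng = np.random.default_rng(0)
n = 400
y = rng.integers(0, 2, size=n)  # stand-in for the labels CSV (1 = MitM)

# Synthetic packet table; column names follow the fields named in the text.
packets = pd.DataFrame({
    "No.": np.arange(1, n + 1),
    "Source": np.where(y == 1, "192.168.2.99", "192.168.2.10"),
    "Destination": rng.choice(["192.168.2.1", "192.168.2.20"], size=n),
    "Protocol": np.where(y == 1, "ARP", "TCP"),
    "Length": rng.integers(60, 1500, size=n),  # hypothetical numeric feature
    "Info": "synthetic",
})

# Drop non-feature columns, then turn categorical fields into integer codes.
X = packets.drop(columns=["No.", "Info"])
for col in ["Source", "Destination", "Protocol"]:
    X[col] = LabelEncoder().fit_transform(X[col])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# This sketch fits the scaler on the training split only to avoid leakage.
scaler = StandardScaler().fit(X_train)
model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(scaler.transform(X_train), y_train)

acc = accuracy_score(y_test, model.predict(scaler.transform(X_test)))
print(f"accuracy on synthetic data: {acc:.3f}")
```

Tree-based models such as random forests do not depend on feature scale, so the scaling step matters mainly if the same features are later fed to a distance-based or linear model.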

Code 1

Code Using the Dataset

The Kaggle notebook “Time Series Visualization of Network Activity” by ernie55ernie uses the Kitsune Network Attack Dataset and processes PCAP files plus their label CSVs to build time-series views of network behavior across multiple attack scenarios. In the code, it explicitly works with these attack types: ARP MitM, SYN DoS, Active Wiretap, SSDP Flood, Video Injection, SSL Renegotiation, Mirai Botnet, Fuzzing, and OS Scan. For each attack folder, the notebook parses packets with Scapy (PcapReader), extracts per-packet fields (timestamp, source/destination IP, TCP ports when present) and aligns them with the provided labels, then aggregates packet timestamps per source IP to plot per-minute packet-count time series (marking sources as malicious when labeled). Finally, it exports the processed packet-level records for each attack type to separate CSV files with columns time, src, sport, dst, dport, label.
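
The per-minute aggregation step can be illustrated with the standard library alone. The sketch below starts from packet records that have already been extracted (the notebook obtains them with Scapy, which is not used here); the record values and the helper names `per_minute_counts` and `dense_series` are hypothetical.

```python
from collections import Counter, defaultdict


def per_minute_counts(records):
    """Count packets per source IP per minute, relative to the first packet."""
    t0 = min(rec["time"] for rec in records)
    counts = defaultdict(Counter)
    malicious = set()
    for rec in records:
        minute = int((rec["time"] - t0) // 60)
        counts[rec["src"]][minute] += 1
        # A source is marked malicious if any of its packets is labeled 1.
        if rec["label"] == 1:
            malicious.add(rec["src"])
    return counts, malicious


def dense_series(counter):
    """Expand a sparse minute -> count mapping into a list with zeros filled in."""
    return [counter.get(m, 0) for m in range(max(counter) + 1)]


# Hypothetical records using the exported columns time, src, sport, dst, dport, label.
records = [
    {"time": 0.5, "src": "10.0.0.5", "sport": 5000, "dst": "10.0.0.1", "dport": 80, "label": 0},
    {"time": 30.0, "src": "10.0.0.5", "sport": 5000, "dst": "10.0.0.1", "dport": 80, "label": 0},
    {"time": 61.2, "src": "10.0.0.5", "sport": 5001, "dst": "10.0.0.1", "dport": 80, "label": 0},
    {"time": 10.0, "src": "10.0.0.9", "sport": None, "dst": "10.0.0.1", "dport": None, "label": 1},
    {"time": 75.0, "src": "10.0.0.9", "sport": None, "dst": "10.0.0.1", "dport": None, "label": 1},
    {"time": 80.0, "src": "10.0.0.9", "sport": None, "dst": "10.0.0.1", "dport": None, "label": 1},
]

counts, malicious = per_minute_counts(records)
for src, counter in counts.items():
    tag = "malicious" if src in malicious else "benign"
    print(src, tag, dense_series(counter))
```

Each dense list can be passed directly to a plotting call as one line of the time-series chart, with the malicious flag choosing the color.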

Code 2

The Kaggle notebook “Kitsune – Attack Specific Analysis” by Julien Michel builds an end-to-end analysis pipeline on the Kitsune Network Attack Dataset by loading and sampling (every 20th row) multiple per-attack CSV files, attaching their label files, and combining them into a unified dataset with a mapped multiclass label scheme (benign + attack categories). The notebook then prepares binary and multiclass targets, performs exploratory checks/visualizations, and (when enabled) runs machine-learning evaluation on the extracted feature set. The attacks explicitly handled and analyzed in the code include ARP MitM, Active Wiretap, Fuzzing, Mirai Botnet, OS Scan, SSDP Flood, SSL Renegotiation, SYN DoS, and Video Injection.
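
The load, subsample, and combine steps described above could be sketched as follows. The frames are synthetic, and the `ATTACK_IDS` mapping, the column names, and the `load_attack` helper are assumptions for illustration; the notebook's own label scheme may differ.

```python
import pandas as pd

# Hypothetical multiclass ids; 0 is reserved for benign traffic.
ATTACK_IDS = {"ARP MitM": 1, "SYN DoS": 2}


def load_attack(features, labels, attack_name, step=20):
    """Take every `step`-th row and attach binary and multiclass labels."""
    sampled = features.iloc[::step].copy()
    # The label file is row-aligned with the features, so slice it identically.
    sampled["binary_label"] = labels.iloc[::step].to_numpy()
    # Benign rows stay 0; attack rows receive the id of their attack type.
    sampled["multiclass_label"] = sampled["binary_label"] * ATTACK_IDS[attack_name]
    sampled["attack"] = attack_name
    return sampled


# Synthetic stand-ins for two per-attack CSV files and their label files.
arp_features = pd.DataFrame({"f0": range(100), "f1": range(100, 200)})
arp_labels = pd.Series([0] * 60 + [1] * 40)
syn_features = pd.DataFrame({"f0": range(100), "f1": range(200, 300)})
syn_labels = pd.Series([0] * 30 + [1] * 70)

combined = pd.concat(
    [
        load_attack(arp_features, arp_labels, "ARP MitM"),
        load_attack(syn_features, syn_labels, "SYN DoS"),
    ],
    ignore_index=True,
)
print(combined[["attack", "binary_label", "multiclass_label"]])
```

Systematic sampling with a fixed stride keeps the temporal order of the rows, which a random sample would not.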

Code 3

MITM Projects

Project 1

The mitm-detector project by Chris Horn is a lightweight set of shell scripts aimed at detecting potential Man-in-the-Middle (MITM) interception, especially in HTTPS/TLS connections. It works by fetching SSL/TLS certificates from target servers, extracting certificate details and fingerprints, and comparing them against expected or previously trusted certificates to flag suspicious certificate substitution that can occur under MITM or transparent proxying. The repository also includes helper checks geared toward identifying transparent proxy behavior and possible content tampering, making it a practical defensive toolkit for verifying connection integrity on untrusted networks or environments where interception is a risk.
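
The idea of comparing a server's certificate fingerprint against a trusted value can be illustrated in Python. The project itself consists of shell scripts, so this is an independent sketch of the general technique rather than a port of its code; the helper names and the choice of SHA-256 are assumptions.

```python
import hashlib
import ssl


def fingerprint_der(der_bytes: bytes) -> str:
    """Return the SHA-256 fingerprint of a DER-encoded certificate as lowercase hex."""
    return hashlib.sha256(der_bytes).hexdigest()


def normalise_pin(pin: str) -> str:
    """Accept fingerprints written with colons, spaces, or upper-case hex digits."""
    return pin.replace(":", "").replace(" ", "").lower()


def matches_pin(der_bytes: bytes, expected_pin: str) -> bool:
    """True when the certificate's fingerprint equals the trusted fingerprint."""
    return fingerprint_der(der_bytes) == normalise_pin(expected_pin)


def fetch_der(host: str, port: int = 443) -> bytes:
    """Fetch the server's leaf certificate without validating the chain."""
    pem = ssl.get_server_certificate((host, port))
    return ssl.PEM_cert_to_DER_cert(pem)


# Offline demonstration with placeholder bytes instead of a real certificate.
trusted = b"placeholder for the certificate seen on a trusted network"
pin = fingerprint_der(trusted)
print(matches_pin(trusted, pin))                           # same certificate
print(matches_pin(b"certificate served by a proxy", pin))  # substituted certificate
```

On a live connection, `matches_pin(fetch_der(host), pin)` returning False indicates that the certificate changed since the pin was recorded, which can mean interception but also a legitimate renewal, so a mismatch is a prompt for investigation rather than proof of an attack.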

Project 2

Project 3

The RDP-MITM-Detection-Script project by Pushkar Singh is a small Nmap Lua (NSE) script designed to help identify Remote Desktop Protocol (RDP) servers that may be susceptible to Man-in-the-Middle (MITM) interception due to weak security settings. The script targets common RDP misconfigurations linked to MITM risk—particularly weak encryption configurations and the absence or disablement of Network Level Authentication (NLA)—by initiating an RDP connection probe and analyzing the server’s response through Nmap’s scripting engine. It is intended as a defensive discovery check that can be automated within Nmap scans (typically against port 3389) to quickly flag RDP endpoints that should be hardened to reduce interception risk.
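
To show what "analyzing the server's response" can involve, the sketch below parses the security protocol an RDP server selects in its X.224 Connection Confirm reply. It is written in Python rather than the project's Lua and is not derived from the script; the byte offsets and protocol constants follow the RDP negotiation layout as commonly documented and should be checked against the Microsoft specification before any real use. The function name and returned fields are hypothetical.

```python
import struct

TYPE_RDP_NEG_RSP = 0x02
TYPE_RDP_NEG_FAILURE = 0x03
PROTOCOL_NAMES = {
    0x0: "Standard RDP Security",
    0x1: "TLS",
    0x2: "CredSSP (NLA)",
    0x8: "CredSSP with Early User Authorization (NLA)",
}
NLA_PROTOCOLS = {0x2, 0x8}


def assess_rdp_response(data: bytes) -> dict:
    """Classify the security protocol chosen in an X.224 Connection Confirm."""
    if len(data) < 11 or data[0] != 0x03:
        raise ValueError("not a TPKT packet")
    if data[5] & 0xF0 != 0xD0:
        raise ValueError("not an X.224 Connection Confirm")
    if len(data) < 19:
        # Assumption: no negotiation block means legacy standard RDP security.
        selected = 0x0
    else:
        # 4-byte TPKT header + 7-byte X.224 header, then type/flags/length/value.
        neg_type, _flags, _length, value = struct.unpack_from("<BBHI", data, 11)
        if neg_type == TYPE_RDP_NEG_FAILURE:
            return {"protocol": None, "failure_code": value, "nla": None, "weak": None}
        if neg_type != TYPE_RDP_NEG_RSP:
            raise ValueError("unexpected negotiation block type")
        selected = value
    return {
        "protocol": PROTOCOL_NAMES.get(selected, f"unknown (0x{selected:x})"),
        "failure_code": None,
        "nla": selected in NLA_PROTOCOLS,
        "weak": selected == 0x0,
    }


# Hand-built example replies (not captured traffic).
nla_reply = bytes.fromhex("030000130ed00000123400" "0200080002000000")
legacy_reply = bytes.fromhex("030000130ed00000123400" "0200080000000000")
print(assess_rdp_response(nla_reply))
print(assess_rdp_response(legacy_reply))
```

The selected protocol depends on what the client offered in its request, so a scanner typically sends more than one probe before concluding that NLA is absent or that only standard RDP security is accepted.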

The MITM-Detection project by Furat Dahesh is a containerized, educational MITM simulation and detection system that demonstrates how man-in-the-middle attacks can intercept and manipulate traffic in a controlled lab environment. It implements a client–proxy–server architecture where the middle component acts as an attacker capable of intercepting, modifying, replaying, or delaying network communications, while a detection engine monitors the traffic and a real-time dashboard visualizes attack activity and impact for analysis and demonstration purposes.

Project 4

The dos-and-arp-nids project by Harel Itzhaki is an ML-based Network Intrusion Detection System (NIDS) designed to detect Denial-of-Service (DoS) attacks and ARP Man-in-the-Middle (MitM) attacks on a LAN. The repository implements a supervised machine learning pipeline that extracts traffic characteristics, trains models to detect and classify these attacks, and evaluates performance as part of an intrusion detection workflow; its README explicitly describes the goal as leveraging machine learning algorithms for DoS and ARP MitM detection.