DDoS datasets | Cyber Attack And Def

DDoS SDN Dataset

Dataset Summary

The DDoS SDN Dataset was published on Kaggle by Aiken Kazin in 2021 and is cited in research as a publicly available dataset for detecting DDoS attacks in Software-Defined Networking (SDN). It contains approximately 104,345 network traffic records in 23 columns. These comprise flow-based attributes such as packet count, byte count, switch identifier, and other SDN traffic statistics, plus a binary label marking each record as benign (0) or malicious/DDoS (1). Because the records are structured flow statistics rather than text, the dataset is suited to supervised machine learning and intrusion detection research that separates normal traffic from distributed denial-of-service attacks in SDN environments.

Code Using the Dataset

The Kaggle notebook “AISecurityIssues – Dataset2” by Seveen Samir loads the DDoS SDN dataset and explores and cleans it, checking for missing values and inspecting feature distributions. It then preprocesses the data for machine learning, applies feature selection, converts the dataset to a numerical format, and trains classification models to separate attack traffic from benign traffic. The models are evaluated on a train/test split using accuracy and confusion matrices, giving a complete pipeline from preprocessing and feature engineering through model training and evaluation.
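The cleaning and feature-selection steps in Seveen Samir's notebook are not reproduced here. As a minimal sketch of the same idea, the standard-library Python below drops rows with missing values and then keeps only features whose variance exceeds a threshold, which is one simple feature-selection criterion among many. The column names `pktcount`, `bytecount`, and `flows`, the toy rows, and the zero threshold are assumptions for illustration; the notebook may select features differently.

```python
from statistics import pvariance


def drop_incomplete_rows(rows):
    """Keep only rows in which every value is present (None marks a missing value)."""
    return [row for row in rows if all(value is not None for value in row)]


def select_by_variance(rows, feature_names, threshold=0.0):
    """Keep features whose population variance exceeds `threshold`; return names and projected rows."""
    keep = [
        j for j in range(len(feature_names))
        if pvariance([row[j] for row in rows]) > threshold
    ]
    selected_names = [feature_names[j] for j in keep]
    selected_rows = [[row[j] for j in keep] for row in rows]
    return selected_names, selected_rows


# Toy rows for illustration only (not taken from the dataset); column names are assumptions.
names = ["pktcount", "bytecount", "flows"]
rows = [
    [120, 9800, 3],
    [450, 36000, 3],
    [None, 500, 3],  # incomplete row, removed by drop_incomplete_rows
    [300, 24000, 3],
]
clean = drop_incomplete_rows(rows)
selected_names, selected_rows = select_by_variance(clean, names)
print(selected_names)  # → ['pktcount', 'bytecount']
```

The constant `flows` column carries no information for separating classes, so a zero-variance filter removes it; a stricter threshold would also drop nearly constant features.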

The Kaggle notebook “DDoS Attack Detection & Classification” by Josia Given Santoso loads the DDoS SDN dataset, handles missing values, encodes labels, and normalizes the numerical traffic features. It then explores how attack and benign traffic are distributed. Several supervised machine learning models are trained on the flow-based features to classify traffic as normal or DDoS, generally validated on a train/test split, and evaluated with accuracy scores, confusion matrices, and classification reports. Together these steps form an end-to-end intrusion detection pipeline for DDoS attacks in Software-Defined Networking environments.
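When normalizing features, the scaling statistics should come from the training data only. The standard-library sketch below encodes string labels as integers, fits a min-max scaler on the training rows, and applies that same range to test rows. It is an assumed illustration of the steps named in Josia Given Santoso's notebook, not its code, and the values are toy numbers.

```python
def encode_labels(labels):
    """Map each distinct label to an integer code (sorted order) and return codes plus the mapping."""
    mapping = {label: code for code, label in enumerate(sorted(set(labels)))}
    return [mapping[label] for label in labels], mapping


def fit_min_max(rows):
    """Compute per-column minima and maxima from the training rows only."""
    columns = list(zip(*rows))
    return [min(col) for col in columns], [max(col) for col in columns]


def transform_min_max(rows, mins, maxs):
    """Rescale each value by the training range; unseen test values may fall outside [0, 1].

    Columns that were constant in training map to 0.0 to avoid dividing by zero.
    """
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, mins, maxs)]
        for row in rows
    ]


# Toy values for illustration only.
train = [[10.0, 200.0], [20.0, 400.0], [30.0, 600.0]]
test = [[25.0, 700.0]]
mins, maxs = fit_min_max(train)
print(transform_min_max(test, mins, maxs))  # → [[0.75, 1.25]]
codes, mapping = encode_labels(["ddos", "benign", "ddos"])
print(codes, mapping)  # → [1, 0, 1] {'benign': 0, 'ddos': 1}
```

Fitting the range on the training split keeps test-set statistics out of preprocessing. The cost is that test values can land outside [0, 1], as the second feature does here.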


The Kaggle notebook “DDoS SDN” by Chitraksh Singh loads the DDoS SDN dataset, cleans the data, encodes the attack labels, and analyzes the distributions of the traffic-flow features. It then applies machine learning classifiers to the structured flow-based features to distinguish DDoS traffic from normal traffic. The classifiers are evaluated on a train/test split with accuracy, confusion matrices, and classification reports, presenting a complete DDoS detection pipeline for Software-Defined Networking systems.
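A train/test split on labeled traffic can be stratified so that both splits keep the same benign-to-attack ratio. The function below is a hypothetical, standard-library sketch of a stratified split over row indices. The 80/20 ratio, the seed, and the toy 60/40 label list are assumptions, and Chitraksh Singh's notebook may split the data differently.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Return (train_idx, test_idx) so each class keeps roughly the same share in both splits."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy labels: 60 benign (0) rows followed by 40 attack (1) rows.
labels = [0] * 60 + [1] * 40
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))     # → 80 20
print(sum(labels[i] for i in test_idx))  # → 8
```

The test split receives 12 benign and 8 attack rows, preserving the 60/40 ratio. A purely random split could drift from that ratio, especially for rare attack classes.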

CIC-DDoS2019 Dataset


Dataset Summary

The CICDDoS2019 dataset on Kaggle (uploaded by dhoogla) was originally developed by the Canadian Institute for Cybersecurity (CIC) at the University of New Brunswick. Its generation is commonly attributed to Sharafaldin et al., and it was released in 2019 as part of CIC's series of network intrusion detection datasets. It contains large-scale labeled network traffic covering both benign activity and multiple modern Distributed Denial-of-Service (DDoS) attacks. The original release is distributed as CSV files with more than 80 flow-based features extracted by CICFlowMeter, such as packet counts, flow duration, protocol, ports, timestamps, and statistical traffic metrics. The attack types include DNS, LDAP, NTP, SYN, UDP, SSDP, MSSQL, and WebDDoS, among others. Realistic background user traffic was generated in a controlled testbed to mimic real-world network behavior, which has made the dataset widely used for supervised machine learning and intrusion detection research on DDoS attacks.
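CICDDoS2019 labels each flow with a specific attack type, while the notebooks listed for it here classify traffic as benign or DDoS. The multi-class label therefore has to be collapsed into a binary one. The sketch below assumes the benign class is spelled `BENIGN`. The other label strings are hypothetical examples and should be checked against the actual label column.

```python
from collections import Counter


def to_binary(label):
    """Return 0 for benign traffic and 1 for any attack label (case and whitespace ignored)."""
    return 0 if label.strip().upper() == "BENIGN" else 1


# Hypothetical label strings for illustration; inspect the real label column before reusing this.
raw_labels = ["BENIGN", "DrDoS_DNS", "Syn", "BENIGN ", "DrDoS_NTP", "WebDDoS"]
binary = [to_binary(label) for label in raw_labels]
print(binary)           # → [0, 1, 1, 0, 1, 1]
print(Counter(binary))  # → Counter({1: 4, 0: 2})
```

Keeping `raw_labels` alongside `binary` preserves the per-attack breakdown for later analysis, even though the classifier only sees the binary target.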

Code Using the Dataset

The Kaggle notebook “LSTM DDoS” by Azhar Zen loads the CICDDoS2019 network traffic dataset, cleans the data, handles missing values, encodes the attack labels, and scales the numerical flow-based features. It then reshapes the tabular features into sequences suitable for deep learning and trains an LSTM (Long Short-Term Memory) neural network to classify traffic as benign or DDoS. Performance is evaluated on a train/test split with accuracy and loss metrics, demonstrating a deep learning intrusion detection pipeline that exploits temporal patterns in network flow data.
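LSTM layers consume three-dimensional input of shape (samples, timesteps, features), so tabular flow records must be grouped into sequences before training. One common approach is a sliding window over consecutive rows, with each window labelled by its last row. This is assumed here and not taken from Azhar Zen's “LSTM DDoS” notebook. The sketch uses nested lists and toy values; converting `X` to an array (for example with NumPy) gives the layout a Keras `LSTM` layer expects.

```python
def make_windows(rows, labels, timesteps):
    """Slide a window of `timesteps` consecutive rows over the data.

    Returns X with shape (n_windows, timesteps, n_features) as nested lists and
    y holding the label of the last row in each window.
    """
    if timesteps < 1:
        raise ValueError("timesteps must be at least 1")
    X, y = [], []
    for end in range(timesteps, len(rows) + 1):
        X.append(rows[end - timesteps:end])
        y.append(labels[end - 1])
    return X, y


# Toy, already-scaled flow features in capture order (illustrative values only).
rows = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8], [0.9, 1.0]]
labels = [0, 0, 0, 1, 1]
X, y = make_windows(rows, labels, timesteps=3)
print(len(X), len(X[0]), len(X[0][0]))  # → 3 3 2
print(y)  # → [0, 1, 1]
```

Windows only carry temporal meaning if the rows are in capture order. With `timesteps=1`, every row becomes its own one-step sequence, which is the simplest possible reshape.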

The Kaggle notebook “CIC-DDoS2019-ML” by Chitraksh Singh loads the CICDDoS2019 dataset and performs extensive preprocessing: data cleaning, missing-value handling, label encoding, and normalization of the flow-based features extracted by CICFlowMeter. Exploratory analysis examines attack distributions and feature correlations. Several traditional machine learning classifiers, such as Random Forest and Logistic Regression, are then trained to classify traffic as benign or DDoS. The models are evaluated on a train/test split with accuracy scores, confusion matrices, and classification reports, giving a comprehensive intrusion detection pipeline for large-scale network flow data.


The Kaggle notebook “DDoS Detection Using Machine Learning” by Rakib Hossain Sajib loads the CICDDoS2019 dataset, cleans missing values, encodes categorical labels, scales the numerical flow features, and examines the distribution of attack versus benign traffic. It then trains and compares several supervised classifiers, such as Logistic Regression, Random Forest, Decision Tree, and Support Vector Machine, to classify traffic as normal or DDoS. Evaluation uses a train/test split with accuracy scores, confusion matrices, and classification reports, giving an end-to-end DDoS detection pipeline built on structured network flow features.
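For a binary task, the confusion matrix and classification report named in Rakib Hossain Sajib's notebook reduce to four counts. The standard-library sketch below assumes 1 = DDoS as the positive class. It lays out the matrix with true classes as rows and predicted classes as columns, which is the orientation scikit-learn's `confusion_matrix` uses. The labels and predictions are toy values.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives, false negatives, and true negatives."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def binary_report(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, F1 for the positive class, and a 2x2 confusion matrix."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        # Rows are true classes (negative, positive); columns are predicted classes.
        "matrix": [[tn, fp], [fn, tp]],
    }


# Toy labels and predictions (1 = DDoS, 0 = benign).
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 1, 0, 1]
report = binary_report(y_true, y_pred)
print(report["matrix"])    # → [[3, 1], [1, 5]]
print(report["accuracy"])  # → 0.8
```

Recall on the DDoS class (missed attacks) and precision (false alarms) are reported separately because accuracy alone hides which kind of error a model makes.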


DDoS Traffic Dataset


Dataset Summary

The DDoS Traffic Dataset was created and published on Kaggle by Oktay Ördekçi, and its Kaggle listing shows a recent update (approximately 2024–2025). It contains structured network traffic captured under Distributed Denial-of-Service (DDoS) attack conditions and is designed for analyzing how malicious traffic differs from normal network behavior. The data consist mainly of numerical flow features, such as traffic statistics and packet-related attributes, plus a target label indicating whether each instance is DDoS or normal traffic. This makes the dataset suitable for supervised machine learning and intrusion detection research on network-based attack classification.
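The following is a minimal, standard-library sketch of turning a flow CSV like this one into a feature matrix and a label vector. The column names and values are hypothetical. The code also assumes the label is already stored as 0/1; if the dataset stores text labels, they would need encoding first.

```python
import csv
import io


def load_xy(csv_text, label_column="label"):
    """Split CSV text with a header row into float feature rows and integer labels.

    Assumes every non-label column is numeric and the label is stored as 0/1.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    feature_names = [name for name in reader.fieldnames if name != label_column]
    X, y = [], []
    for record in reader:
        X.append([float(record[name]) for name in feature_names])
        y.append(int(record[label_column]))
    return feature_names, X, y


# Hypothetical columns and toy values; the real file's header may differ.
sample = """packets_per_sec,bytes_per_sec,label
12.5,9000,0
840.0,610000,1
"""
feature_names, X, y = load_xy(sample)
print(feature_names)  # → ['packets_per_sec', 'bytes_per_sec']
print(X, y)           # → [[12.5, 9000.0], [840.0, 610000.0]] [0, 1]
```

For a file on disk, pass the text from `open(path, newline="").read()`, or build the `csv.DictReader` directly on the file object.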


Code Using the Dataset


The Kaggle notebook “DDoS Prediction EDA & ML (Accuracy 98%)” by Oktay Ördekçi loads the DDoS Traffic Dataset and uses exploratory data analysis to examine feature distributions, correlations, and the class balance between normal and attack traffic. It prepares the flow attributes for modeling through cleaning, label encoding, and feature scaling. It then trains several supervised classifiers on the numerical traffic features and compares them on a train/test split using accuracy, confusion matrices, and classification reports. The result is an end-to-end DDoS classification pipeline that combines EDA, preprocessing, and comparative model analysis.
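Class balance should be checked before reading an accuracy figure such as the 98% in this notebook's title. A model that always predicts the majority class already scores that class's share of the rows. The sketch below computes class shares and this majority baseline on a made-up 70/30 label list; it does not reflect the dataset's actual class distribution.

```python
from collections import Counter


def class_balance(labels):
    """Return each class's share of the rows and the accuracy of always predicting the majority class."""
    counts = Counter(labels)
    total = len(labels)
    shares = {cls: n / total for cls, n in counts.items()}
    majority_accuracy = max(counts.values()) / total
    return shares, majority_accuracy


# Made-up 70/30 label list for illustration; not the dataset's real distribution.
labels = ["normal"] * 70 + ["ddos"] * 30
shares, baseline = class_balance(labels)
print(shares)    # → {'normal': 0.7, 'ddos': 0.3}
print(baseline)  # → 0.7
```

On this toy split, a classifier must beat 70% accuracy before it has learned anything beyond the class prior.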

The Kaggle notebook “DDoS Attack EDA & ML (R² = 0.97)” by Fatmanur Sari loads the DDoS traffic dataset and uses exploratory data analysis to investigate feature distributions, correlations, and the differences between attack and normal traffic. After cleaning, normalization, and feature preparation, it applies several machine learning models to the numerical flow-based features to predict DDoS attack behavior, using a train/test split for validation. Performance is assessed with statistical and classification metrics and visualizations, and the title reports an R² of 0.97. The notebook thus covers EDA, preprocessing, model training, and performance analysis in one pipeline.
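R² is a regression metric, R² = 1 - SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the total sum of squares around the mean of the targets. Applied to a 0/1 DDoS label, it measures how closely continuous predictions track the label, not how many rows are classified correctly. The sketch below is a plain-Python implementation on toy values, not the notebook's code.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


y_true = [3.0, 5.0, 7.0, 9.0]
print(r_squared(y_true, [3.0, 5.0, 7.0, 9.0]))  # → 1.0 (perfect predictions)
print(r_squared(y_true, [6.0, 6.0, 6.0, 6.0]))  # → 0.0 (always predicting the mean)
print(r_squared(y_true, [2.5, 5.5, 6.5, 9.5]))  # close to 0.95
```

A score of 0 means the model does no better than predicting the mean. Negative scores are possible when predictions are worse than that.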


The Kaggle notebook “Multi-Model Automated DDoS Attack” by Rafael Rainer loads a DDoS traffic dataset, cleans the data, handles missing values, encodes labels, and scales the flow-based features. It then explores how attack and benign traffic patterns differ. An automated pipeline trains several machine learning classifiers on the same train/test split and evaluates each with accuracy, confusion matrices, and classification reports. Because every model sees the same network traffic features, the results can be compared directly to identify the most effective method for DDoS attack classification.
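The automated comparison in Rafael Rainer's notebook is not reproduced here. The sketch below only shows the general pattern of fitting several models on one split and ranking them by a single metric. The two models are deliberately simple standard-library stand-ins (a majority-class baseline and a nearest-centroid classifier), not the classifiers the notebook uses, and the data are toy values.

```python
from collections import Counter


class MajorityClass:
    """Baseline model: always predicts the most frequent training label."""

    def fit(self, X, y):
        self.label = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label for _ in X]


class NearestCentroid:
    """Predicts the class whose mean feature vector is closest in squared Euclidean distance."""

    def fit(self, X, y):
        counts = Counter(y)
        sums = {}
        for row, label in zip(X, y):
            totals = sums.setdefault(label, [0.0] * len(row))
            for j, value in enumerate(row):
                totals[j] += value
        self.centroids = {
            label: [s / counts[label] for s in totals] for label, totals in sums.items()
        }
        return self

    def _distance(self, row, label):
        return sum((a - b) ** 2 for a, b in zip(row, self.centroids[label]))

    def predict(self, X):
        return [min(self.centroids, key=lambda label: self._distance(row, label)) for row in X]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def compare_models(models, X_train, y_train, X_test, y_test):
    """Fit every model on the same split, score it on the test rows, and return scores plus the best name."""
    scores = {
        name: accuracy(y_test, model.fit(X_train, y_train).predict(X_test))
        for name, model in models.items()
    }
    best = max(scores, key=scores.get)
    return scores, best


# Toy, already-scaled data for illustration only.
X_train = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.15], [0.9, 0.8], [0.8, 0.9]]
y_train = [0, 0, 0, 1, 1]
X_test = [[0.12, 0.18], [0.85, 0.85], [0.95, 0.9]]
y_test = [0, 1, 1]
scores, best = compare_models(
    {"majority": MajorityClass(), "nearest_centroid": NearestCentroid()},
    X_train, y_train, X_test, y_test,
)
print(best)  # → nearest_centroid
```

Including a trivial baseline in the comparison shows how much each real model adds over simply predicting the most common class.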