Leipzig Intrusion Detection Data Set

LID-DS is a modern host-based anomaly intrusion detection system (HIDS) data set.

On this page you can download the recorded LID-DS data.


  • LID-DS is a modern host-based intrusion detection system (HIDS) data set.
  • It is recorded on a modern operating system (Ubuntu 18.04).
  • It consists of different scenarios. Each scenario represents a real vulnerability.
  • We recorded system calls together with their metadata: parameters, return values, user IDs, process/thread IDs, file system handles, timestamps, and I/O buffers (always the first 80 bytes).
  • With this data set you can re-evaluate and compare existing HIDS, and develop and compare new approaches that use arguments and other metadata in addition to the system call sequences.
  • For anomaly-based HIDS we recommend using the first 200 recordings of normal behavior of each scenario as training data, for reasons of comparability. This corresponds to more than 2 hours of real-world training data. The remaining recordings (at least 800 of normal behavior and at least 100 of attack behavior) can then be used for evaluation.
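
The recommended split can be sketched as follows. This is a minimal illustration and not part of the LID-DS tooling: it assumes the recordings of one scenario are already available as (name, is_attack) pairs in recording order, and the names `split_recordings` and `n_train` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical representation of one recording: (recording name, is_attack).
Recording = Tuple[str, bool]


def split_recordings(
    recordings: List[Recording], n_train: int = 200
) -> Tuple[List[Recording], List[Recording]]:
    """Split one scenario into training and evaluation recordings.

    Training data: the first `n_train` recordings of normal behavior.
    Evaluation data: all remaining normal recordings plus all attack recordings.
    """
    normal = [r for r in recordings if not r[1]]
    attacks = [r for r in recordings if r[1]]
    if len(normal) < n_train:
        raise ValueError(
            f"need at least {n_train} normal recordings, found {len(normal)}"
        )
    training = normal[:n_train]
    evaluation = normal[n_train:] + attacks
    return training, evaluation
```

Keeping the training set fixed to the first 200 normal recordings, rather than sampling at random, is what makes results comparable between different HIDS approaches.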


Each archive file contains about 1000 files with normal behavior and about 100 files containing both normal and exploited behavior. In addition, each archive contains a runs.csv file with the following format:

image_name, scenario_name, is_executing_exploit, warmup_time, recording_time, exploit_start_time
actual_name, file_name_0001, False, 10, 35, -1
actual_name, file_name_9999, True, 10, 40, 15

This file tells you how long each recording took, whether it shows normal or attack behavior, and, for attack recordings, when the attack started (for normal recordings, exploit_start_time is -1, as in the first example row).
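
A runs.csv file in this format could be read roughly like this. The sketch relies only on the header and example rows shown above; the `Run` class and `parse_runs` function are hypothetical names, the time columns are parsed as plain numbers because the page does not state their unit, and -1 is mapped to `None` for normal recordings.

```python
import csv
import io
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Run:
    image_name: str
    scenario_name: str  # in the example rows this column holds a file name
    is_executing_exploit: bool
    warmup_time: float
    recording_time: float
    exploit_start_time: Optional[float]  # None for normal runs (-1 in the file)


def parse_runs(text: str) -> List[Run]:
    """Parse the contents of a runs.csv file into Run records."""
    # skipinitialspace drops the blank that follows each comma in the file.
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    runs = []
    for row in reader:
        start = float(row["exploit_start_time"])
        runs.append(
            Run(
                image_name=row["image_name"].strip(),
                scenario_name=row["scenario_name"].strip(),
                is_executing_exploit=row["is_executing_exploit"].strip() == "True",
                warmup_time=float(row["warmup_time"]),
                recording_time=float(row["recording_time"]),
                exploit_start_time=None if start < 0 else start,
            )
        )
    return runs


# The example rows from this page.
SAMPLE = """image_name, scenario_name, is_executing_exploit, warmup_time, recording_time, exploit_start_time
actual_name, file_name_0001, False, 10, 35, -1
actual_name, file_name_9999, True, 10, 40, 15
"""

runs = parse_runs(SAMPLE)
attack_runs = [r for r in runs if r.is_executing_exploit]
```

For a real archive, the same function can be applied to the text of the extracted file, for example `parse_runs(open("runs.csv", newline="").read())`.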

The data files themselves have the following format:

event_number event_time cpu user_uid process_name thread_id event_direction event_type event_arguments

For example:

1159 00:27:12.310259734 6 33 apache2 15862 > writev fd=12(<4t>> size=31 
1160 00:27:12.310311837 6 33 apache2 15862 < writev res=31 data=................j*.@s.&......4. 
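
One way to split such a line into its nine fields is sketched below. It assumes that fields are separated by whitespace, that the first eight fields contain no spaces, and that the timestamp has the form HH:MM:SS.nanoseconds as in the example; `SyscallEvent`, `parse_event`, and `time_to_ns` are hypothetical names. The arguments are kept as one raw string, since I/O buffer contents may themselves contain spaces or `=` characters.

```python
from dataclasses import dataclass


@dataclass
class SyscallEvent:
    event_number: int
    event_time: str
    cpu: int
    user_uid: int
    process_name: str
    thread_id: int
    event_direction: str  # ">" in the first example line, "<" in the second
    event_type: str       # system call name, e.g. "writev"
    event_arguments: str  # raw, unparsed argument string (may be empty)


def parse_event(line: str) -> SyscallEvent:
    """Split one data file line into its fields."""
    # At most 8 splits: everything after the 8th field is the argument string.
    parts = line.strip().split(None, 8)
    if len(parts) < 8:
        raise ValueError(f"expected at least 8 fields, got {len(parts)}: {line!r}")
    arguments = parts[8] if len(parts) == 9 else ""
    return SyscallEvent(
        event_number=int(parts[0]),
        event_time=parts[1],
        cpu=int(parts[2]),
        user_uid=int(parts[3]),
        process_name=parts[4],
        thread_id=int(parts[5]),
        event_direction=parts[6],
        event_type=parts[7],
        event_arguments=arguments,
    )


def time_to_ns(event_time: str) -> int:
    """Convert an HH:MM:SS.fraction timestamp to nanoseconds."""
    hours, minutes, seconds = event_time.split(":")
    whole, _, frac = seconds.partition(".")
    frac_ns = int(frac.ljust(9, "0")[:9]) if frac else 0
    total_seconds = int(hours) * 3600 + int(minutes) * 60 + int(whole)
    return total_seconds * 1_000_000_000 + frac_ns


# The two example lines from this page, copied as shown.
FIRST_LINE = "1159 00:27:12.310259734 6 33 apache2 15862 > writev fd=12(<4t>> size=31 "
SECOND_LINE = "1160 00:27:12.310311837 6 33 apache2 15862 < writev res=31 data=................j*.@s.&......4. "

first = parse_event(FIRST_LINE)
second = parse_event(SECOND_LINE)
elapsed_ns = time_to_ns(second.event_time) - time_to_ns(first.event_time)
```

Converting timestamps to integer nanoseconds avoids floating-point rounding when computing the time between two events.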

Literature and Sources

For a more detailed description of the data set, the recording process and the frameworks used, please refer to the following publications, theses and GitHub repositories:

  • Martin Grimmer; Martin Max Röhling; Dennis Kreußel; Simon Ganz, A Modern and Sophisticated Host Based Intrusion Detection Data Set, 16. Deutscher IT-Sicherheitskongress, 2019
  • Martin Max Röhling; Martin Grimmer; Dennis Kreußel; Jörn Hoffmann; Bogdan Franczyk, Standardized container virtualization approach for collecting host intrusion detection data, [submitted]
  • Dennis Kreußel, Simulation and analysis of system call traces for adversarial anomaly detection, Bachelor thesis, Leipzig University, 2019
  • Simon Ganz, Ein moderner Host Intrusion Detection Datensatz, Master thesis, Leipzig University, 2019
  • LID-DS Framework: A lightweight intrusion detection data simulation framework.

Scenarios and Downloads

Attack Scenario                                                             | CVE / CWE     | Download (size)
Heartbleed                                                                  | CVE-2014-0160 | 148 MB
PHP file upload: unrestricted upload of file with dangerous type            | CWE-434       | 649 MB
Bruteforce login: improper restriction of excessive authentication attempts | CWE-307       | 208 MB
SQL injection with sqlmap                                                   | CWE-89        | 689 MB
ZipSlip                                                                     | various       | 6.6 GB
EPS file upload: unrestricted upload of file with dangerous type            | CWE-434       | 3.9 GB
MySQL authentication bypass                                                 | CVE-2012-2122 | 155 MB
Nginx integer overflow vulnerability                                        | CVE-2017-7529 | 41 MB
Sprockets information leak vulnerability                                    | CVE-2018-3760 | 389 MB
Rails file content disclosure vulnerability                                 | CVE-2019-5418 | 369 MB

Contact regarding LID-DS

Leipzig University, Martin Grimmer (grimmer@informatik.uni-leipzig.de) and Martin Max Röhling (roehling@wifa.uni-leipzig.de).