LID-DS

Leipzig Intrusion Detection – Data Set

On this page you can download the recorded data of the LID-DS.

Overview

  • LID-DS is a modern host-based intrusion detection system (HIDS) data set.
  • It is recorded on a modern operating system (Ubuntu 18.04).
  • It consists of different scenarios. Each scenario represents a real vulnerability.
  • We recorded system calls together with their metadata: parameters, return values, user IDs, process/thread IDs, file system handles, timestamps, and I/O buffers (always the first 80 bytes).
  • With this data set you can re-evaluate and compare existing HIDS, and develop and compare new approaches that use arguments and other metadata in addition to the system call sequences.
  • For anomaly-based HIDS, we recommend using the first 200 recordings of normal behavior of each scenario as training data, for reasons of comparability. This corresponds to more than 2 hours of real-world training data. The remaining recordings (at least 800 of normal behavior and at least 100 of attack behavior) can then be used for evaluation.
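
The recommended split can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the (name, is_attack) record shape and the function name split_recordings are hypothetical, and "first 200" is taken to mean the first 200 normal recordings in the order they are listed.

```python
from typing import List, Sequence, Tuple

# Hypothetical record shape: (recording name, is_attack).
Recording = Tuple[str, bool]


def split_recordings(
    recordings: Sequence[Recording], n_train: int = 200
) -> Tuple[List[Recording], List[Recording]]:
    """Return (training, evaluation) following the recommended split."""
    normal = [r for r in recordings if not r[1]]
    attack = [r for r in recordings if r[1]]
    if len(normal) < n_train:
        raise ValueError(
            f"need at least {n_train} normal recordings, got {len(normal)}"
        )
    # Training uses only normal behavior; everything else is held out.
    training = normal[:n_train]
    evaluation = normal[n_train:] + attack
    return training, evaluation


# Synthetic stand-in for one scenario: 1000 normal and 100 attack recordings.
recordings = [(f"normal_{i:04d}", False) for i in range(1000)]
recordings += [(f"attack_{i:04d}", True) for i in range(100)]

training, evaluation = split_recordings(recordings)
print(len(training), len(evaluation))  # 200 900
```

Keeping the attack recordings out of the training set entirely matches the anomaly-based setting, where a model only sees normal behavior during training.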

Usage

Each archive file contains at least 1000 scap files with normal behavior and at least 100 scap files that contain both normal and exploited behavior. In addition, each archive contains a runs.csv file with the following format:

image_name, scenario_name, is_executing_exploit, warmup_time, recording_time, exploit_start_time
actual_name, file_name_0001, False, 10, 35, -1
actual_name, file_name_9999, True, 10, 40, 15
...

This file tells you how long each recording took, whether it contains normal or attack behavior, and, for attack recordings, when the attack started. The scap files can be read with the open-source tool Sysdig (sysdig -r filename).
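
As a minimal sketch, runs.csv could be loaded with Python's standard csv module as shown below. The Run class and the parse_runs function are hypothetical names introduced for illustration. The sketch also assumes, based only on the example rows above, that fields are separated by a comma followed by a space, that the time columns are numeric, and that an exploit_start_time of -1 marks a recording without an attack.

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Run:
    image_name: str
    scenario_name: str  # in the example rows this column holds the file name
    is_executing_exploit: bool
    warmup_time: float
    recording_time: float
    exploit_start_time: Optional[float]  # None when the file stores -1 (assumed)


def parse_runs(lines: Iterable[str]) -> List[Run]:
    """Parse the lines of a runs.csv file into Run records."""
    # skipinitialspace drops the blank that follows each comma.
    reader = csv.DictReader(lines, skipinitialspace=True)
    runs = []
    for row in reader:
        start = float(row["exploit_start_time"])
        runs.append(
            Run(
                image_name=row["image_name"],
                scenario_name=row["scenario_name"],
                is_executing_exploit=row["is_executing_exploit"].strip() == "True",
                warmup_time=float(row["warmup_time"]),
                recording_time=float(row["recording_time"]),
                exploit_start_time=None if start < 0 else start,
            )
        )
    return runs


# The two example rows from the format description above.
SAMPLE = """image_name, scenario_name, is_executing_exploit, warmup_time, recording_time, exploit_start_time
actual_name, file_name_0001, False, 10, 35, -1
actual_name, file_name_9999, True, 10, 40, 15
"""

runs = parse_runs(io.StringIO(SAMPLE))
attack_runs = [r for r in runs if r.is_executing_exploit]
print(len(runs), len(attack_runs))  # 2 1
```

For a real archive, the same function can be given an open file handle, for example parse_runs(open("runs.csv", newline="")).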

Literature and Sources

For a more detailed description of the data set, the recording process, and the frameworks used, please refer to the following publications, theses, and GitHub repositories:

  • Martin Grimmer; Martin Max Röhling; Dennis Kreußel; Simon Ganz, A Modern and Sophisticated Host Based Intrusion Detection Data Set, 16. Deutscher IT-Sicherheitskongress, 2019
  • Martin Max Röhling; Martin Grimmer; Dennis Kreußel; Jörn Hoffmann; Bogdan Franczyk, Standardized container virtualization approach for collecting host intrusion detection data, [submitted]
  • Dennis Kreußel. “Simulation and analysis of system call traces for adversarial anomaly detection”. Bachelor thesis, Leipzig University, 2019
  • Simon Ganz. “Ein moderner Host Intrusion Detection Datensatz”. Master thesis, Leipzig University, 2019
  • LID-DS Framework: A lightweight intrusion detection data simulation framework.

Scenarios and Downloads

Attack Scenario                                                             | CVE / CWE     | Download size
Heartbleed                                                                  | CVE-2014-0160 | 574 MB
PHP file upload: unrestricted upload of file with dangerous type            | CWE-434       | 885 MB
Bruteforce login: improper restriction of excessive authentication attempts | CWE-307       | 688 MB
SQL injection with sqlmap                                                   | CWE-89        | 904 MB
ZipSlip                                                                     | various       | 5.4 GB
EPS file upload: unrestricted upload of file with dangerous type            | CWE-434       | 3.3 GB
MySQL authentication bypass                                                 | CVE-2012-2122 | 523 MB
Nginx integer overflow vulnerability                                        | CVE-2017-7529 | 372 MB
Sprockets information leak vulnerability                                    | CVE-2018-3760 | 735 MB
Rails file content disclosure vulnerability                                 | CVE-2019-5418 | 539 MB

Contact regarding LID-DS

Leipzig University, Martin Grimmer (grimmer@informatik.uni-leipzig.de) and Martin Max Röhling (roehling@wifa.uni-leipzig.de).