Datasets on Security Research at the University of Trento

Datasets are difficult to build, yet they are an essential part of scientific research.

Some of them may be just collections of publicly available raw data (such as the NVD), but there is a huge difference between a website or a mailing-list archive and an Excel file.

At the Security Group we realize how difficult it is to build a dataset, so we have decided to make ours available to promote access to, and replicability of, our experiments on vulnerability assessment models.

Available Datasets Used in our Research

  • NVD is the reference database for the population of vulnerabilities. It collects the data from the National Vulnerability Database from NIST (link).
  • EDB is the reference database for public (proof-of-concept) exploits. It collects the data from the Exploit-DB web site (link).
  • EKITS is our database of vulnerabilities and exploits traded in the black markets. We have built an update infrastructure that allows us to keep our database well ahead of any publicly available source on such vulnerabilities (e.g. Contagio's Exploit Pack Table). We only share this dataset on the basis of a joint research project.
  • SYM is a database of vulnerabilities exploited in the wild as reported by Symantec's sensors worldwide. This dataset is a collection of the vulnerability data publicly available through Symantec's Threat Explorer and Attack Signatures websites.
  • WINE reports volumes of attacks per month from 2009 to 2012. Its integration with our datasets was possible thanks to our collaboration with Symantec's WINE Program. If you want to have access to this dataset you should contact Symantec directly.
  • FFV collects the vulnerabilities of the Firefox browser. It is the most comprehensive database. It integrates the Mozilla Foundation Security Advisory (MFSA) bulletin, the Mozilla Bugzilla bugtracker and the NVD.
  • GCV reports the vulnerabilities of the Google Chrome browser extracted from the Chrome Issue Tracker, integrated with the NVD to reconstruct affected versions and checked for consistency with the code distribution (just using the NVD would yield more than 10% bogus foundational vulnerabilities). Another caveat is that this might not include all vulnerabilities of the browser, as some third-party software, such as WebKit, is only partly included.
  • IEV lists the vulnerabilities for Internet Explorer extracted from the Microsoft Security Bulletin and integrated with the NVD to reconstruct affected versions.
  • ASV lists the vulnerabilities of the Safari web browser extracted from the Apple Knowledge Base and integrated with the NVD to reconstruct affected versions.
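
Since several of the datasets above describe the same vulnerabilities from different angles, they are typically combined by CVE identifier. The following is only a minimal sketch of such a join, not the format of the distributed files: the field names (cve_id, cvss), the function name and the CVE identifiers are hypothetical placeholders chosen for illustration.

```python
from typing import Dict, Iterable, List


def flag_exploitation(
    nvd_rows: Iterable[Dict[str, object]],
    edb_cves: Iterable[str],
    sym_cves: Iterable[str],
) -> List[Dict[str, object]]:
    """Annotate each NVD-style record with whether its CVE also appears
    in an EDB-style list (public exploit) or a SYM-style list (in the wild)."""
    # Normalise identifiers so that case and stray spaces do not break the join.
    edb = {c.strip().upper() for c in edb_cves}
    sym = {c.strip().upper() for c in sym_cves}

    annotated = []
    for row in nvd_rows:
        cve = str(row["cve_id"]).strip().upper()
        annotated.append(
            {**row, "cve_id": cve, "in_edb": cve in edb, "in_sym": cve in sym}
        )
    return annotated


# Made-up records: the identifiers and scores below are placeholders, not real data.
nvd_sample = [
    {"cve_id": "CVE-0000-0001", "cvss": 7.5},
    {"cve_id": "cve-0000-0002 ", "cvss": 4.3},
    {"cve_id": "CVE-0000-0003", "cvss": 9.3},
]
edb_sample = ["CVE-0000-0001", "CVE-0000-0003"]
sym_sample = ["cve-0000-0003"]

for record in flag_exploitation(nvd_sample, edb_sample, sym_sample):
    print(record["cve_id"], record["in_edb"], record["in_sym"])
```

Using sets for the EDB and SYM identifiers keeps each lookup constant-time, which matters when the NVD side contains tens of thousands of entries.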

How to Access the Data

  1. Write us at security-dataREMOVESPAM@disi.unitn.it to see if the data is what you actually want (the email alias will expand to the researchers who worked on the datasets);
  2. Specify the initial purpose for which you would like to use the data (this will go in the formal licence and in the web page with your name attached to it);
  3. We will fill in the licensing agreement unitn_license_v2.6.pdf with your data, and the head of department (or a tenured full professor of the department) should sign it;
  4. We will return the signed copy of the agreement and the Excel file;
  5. Report to us at security-dataREMOVESPAM@disi.unitn.it the publications based on the data, which should include a citation to the appropriate paper of ours;
  6. That's it. No fee, no painful plodding through websites for web2junk, no junk2data cleaning, etc. Citation and reporting back are the only formal requirements in the license for your research.

Rights for Access

This is the human readable summary of the rights and obligations that the license entails:

  • You can
    1. share these datasets in whatever format with any member of your institution—faculty, administration, students, research associates in the case of universities, and employees in the case of government ministries and research organizations;
    2. use these datasets in creative ways for scientific, non-profit, non-commercial use, including publications, under the terms of the agreement.
  • You cannot
    1. post any of these datasets on your website such that it becomes available to non-members of your institution, or make copies that circulate outside of your institution;
    2. use them for commercial purposes unless agreed in writing.
  • You agree to
    1. cite the appropriate reference work in all your publications that make use of the datasets or their derivatives;
    2. provide us with the information on the publication where you used the data at security-dataREMOVESPAM@disi.unitn.it for the purposes of posting it on our web site;
    3. refer to us anyone outside your institution who asks you for the data.
  • You are aware that
    1. Other parties may have rights or set licensing obligations in some of the data contained in the datasets. It is up to you to obtain permissions from these parties if needed.
    2. The datasets may contain errors or may be unfit for your purposes and we bear no liability for any problem you might encounter.

Users

Here are the researchers and institutes that have been granted access to our datasets.

  1. DAI-Labor (Technische Universität Berlin)
  • Competence Center Security at DAI-Labor is a security research group. In one of our current public-grant research projects, Auvegos, we develop discrete-event simulation software for performing security analysis of network infrastructures, especially in the context of e-government. To this end, we generate or explicitly model the domain networks to assess, and we associate the nodes in these networks with CPE and CVE information. Based on this, we perform algorithmic computations (attack graph generation, MDP-based risk assessment, …) and evaluate the effectiveness of potential mitigation strategies via simulation runs. The requested datasets would be used to generate input for the aforementioned simulation tool.
datasets.1433179615.txt.gz · Last modified: 2021/01/29 10:58 (external edit)