ProSVED

ProSVED: Projection of Security Vulnerabilities caused by Exploits in Dependencies

Figure: Probability of vulnerabilities in libraries with high exposure to the Internet






Official EU information

Objective and approach

Projection of Security Vulnerabilities caused by Exploits in Dependencies (ProSVED) is a Horizon Europe MSCA Postdoctoral Fellowship hosted at the University of Trento, Italy, led by Carlos E. Budde and Fabio Massacci.

Estimating the number and severity of security vulnerabilities in code is essential for software quality and control. Purely empirical approaches lack versatility for prognosis, as they are generally applicable only to the codebase used for training, with extrapolation capabilities that degrade over time. In turn, traditional formal-modelling approaches (“the other side of the spectrum”) depend on assumptions that do not hold in the field, such as independence between a codebase and its following version.

ProSVED generates quantitative forecasts about the emergence of security vulnerabilities in (third-party) open-source code, which propagate via software dependencies to threaten entire projects. Its main goal is to introduce theoretical and practical methodologies for the prognosis of vulnerability propagation in software that can model the full stack of third-party libraries underlying the codebase of a software project.

Time Dependency Tree

Time Dependency Trees: This calls for lightweight representations of the evolution of software in time, and of the acyclic interdependency of codebases in projects that can use hundreds of third-party libraries, where a single vulnerability can bring it all down. ProSVED organises this complexity at the high-abstraction layer of library dependency and evolution, generating a directed acyclic graph (DAG) that represents the evolution in time of a dependency tree: a Time Dependency Tree. Forecasting analyses can then proceed by harvesting the data available about past vulnerabilities, effectively implementing a time-series study in which the nodes of the Time Dependency Tree are labelled with the quantitative estimates produced.
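
As a rough illustration of the idea, a Time Dependency Tree can be sketched as a graph with two kinds of edges: dependency inclusion and version succession. The release data, the edge labels `dep` and `succ`, and the helper names below are assumptions made for this example only; they are not taken from the ProSVED tooling.

```python
from collections import defaultdict

# Hypothetical release data: (library, version) -> directly included dependencies.
# All names and version numbers are made up for illustration.
RELEASES = {
    ("a", 1): [("d", 2)],
    ("a", 2): [("d", 3)],
    ("d", 2): [],
    ("d", 3): [],
}

def build_tdt(releases):
    """Return adjacency lists with two edge kinds:
    'dep'  : library version -> dependency version it includes
    'succ' : library version -> next released version of the same library
    """
    edges = defaultdict(list)
    for node, deps in releases.items():
        for dep in deps:
            edges[node].append(("dep", dep))
    # Group versions by library name and link consecutive releases.
    by_lib = defaultdict(list)
    for lib, version in releases:
        by_lib[lib].append(version)
    for lib, versions in by_lib.items():
        versions.sort()
        for older, newer in zip(versions, versions[1:]):
            edges[(lib, older)].append(("succ", (lib, newer)))
    return dict(edges)

def is_acyclic(nodes, edges):
    """Depth-first check that no path returns to a node still being visited."""
    state = {}  # node -> "visiting" | "done"

    def visit(node):
        if state.get(node) == "done":
            return True
        if state.get(node) == "visiting":
            return False
        state[node] = "visiting"
        ok = all(visit(target) for _, target in edges.get(node, []))
        state[node] = "done"
        return ok

    return all(visit(node) for node in nodes)

tdt = build_tdt(RELEASES)
print(tdt[("a", 1)])
print(is_acyclic(RELEASES, tdt))
```

Keeping the two edge kinds separate lets later analyses treat inclusion and succession differently, which matters because they propagate vulnerabilities in different ways.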

Propagation of probabilities

Time Dependency Trees provide the skeleton on which vulnerability propagation can be analysed. These propagations can be deterministic by code use, or probabilistic by code evolution. An example of deterministic propagation is dependency inclusion: if library a₁ has a dependency d₂, then running a₁ will at some point execute code from d₂, which means that an exploit for d₂ can be used as an exploit for a₁. An example of probabilistic propagation is the persistence of the codebase across releases: if d₃ is the version of library d released as the successor of d₂, and a vulnerability is found in d₃, there is a non-zero (and typically quite large) probability that d₂ is affected by the same vulnerability, because most of the code is shared between consecutive versions.
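
The two kinds of propagation can be sketched as follows. The dependency map, the function names, and in particular the assumption that each release step preserves vulnerable code independently with a constant probability `p_step` are simplifications chosen for this example, not the model used by ProSVED.

```python
# Hypothetical dependency edges: library version -> versions it directly includes.
DEPENDS_ON = {"a1": ["d2"], "d2": ["e1"], "e1": []}

def affected_by(vulnerable, depends_on):
    """Deterministic propagation: every library that (transitively) includes
    `vulnerable` can be attacked through it."""
    affected = set()
    changed = True
    while changed:
        changed = False
        for lib, deps in depends_on.items():
            if lib in affected:
                continue
            if any(d == vulnerable or d in affected for d in deps):
                affected.add(lib)
                changed = True
    return affected

def persistence_probability(p_step, versions_back):
    """Probabilistic propagation (assumed model): chance that a vulnerability
    found in one version is also present `versions_back` releases earlier,
    if each release step independently preserves the vulnerable code with
    probability `p_step`."""
    if not 0.0 <= p_step <= 1.0 or versions_back < 0:
        raise ValueError("p_step must be in [0, 1] and versions_back >= 0")
    return p_step ** versions_back

print(sorted(affected_by("e1", DEPENDS_ON)))
print(persistence_probability(0.9, 2))
```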

Attack Trees: By their acyclic nature, both in the dependency and time dimensions, Time Dependency Trees (TDTs) are close to the standard modelling formalism known as Attack Trees (ATs); in fact, an injective mapping can be defined from TDT models to AT models. ATs are used in event-based representations of progressive attacks, much like the propagation of vulnerabilities across a chain of code dependencies, and come with a plethora of efficient algorithms for the quantification of security properties such as the probability of, or the minimal time to, an attack. The representation of TDTs as ATs leverages these memory- and runtime-optimal algorithms.
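
To give a flavour of such quantification, the sketch below computes the probability of the top event of an attack tree bottom-up. It assumes a tree-shaped model (no shared subtrees) with statistically independent leaves; DAG-shaped models with shared nodes need other algorithms. The tuple encoding of nodes and the leaf probabilities are invented for this example.

```python
from math import prod

def attack_probability(node):
    """Bottom-up probability of the top event of a tree-shaped attack tree.

    A node is either ("leaf", p) or (gate, [children]) with gate "AND" or "OR".
    Leaves are assumed to be statistically independent.
    """
    kind, payload = node
    if kind == "leaf":
        return payload
    child_probs = [attack_probability(child) for child in payload]
    if kind == "AND":
        # All children must succeed.
        return prod(child_probs)
    if kind == "OR":
        # At least one child succeeds: complement of "all children fail".
        return 1.0 - prod(1.0 - p for p in child_probs)
    raise ValueError(f"unknown gate: {kind}")

# A library is compromised if it is vulnerable itself (0.10) or if either of
# its two dependencies is (0.20 and 0.05). The numbers are illustrative only.
tree = ("OR", [("leaf", 0.10), ("OR", [("leaf", 0.20), ("leaf", 0.05)])])
p_top = attack_probability(tree)
print(round(p_top, 3))  # 0.316
```

Dependency inclusion maps naturally to OR gates, since an exploit in any included library is enough to compromise the library that includes it.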

Quantitative forecasts of security vulnerabilities

TDTs (and ATs) offer optimal representations of codebases and their evolution in time, allowing quantitative studies of the propagation of security vulnerabilities, but they do nothing to actually quantify these probabilities. For that, ProSVED poses the following broad research question:

How does the probability of finding a security vulnerability in a software library evolve over time?

While the time-dependence of exploits and vulnerabilities is agreed upon by the practitioners' community (see e.g. the Temporal Metrics from the CVSS standard), the great majority of research has focused on the detection of vulnerabilities already present in the code. Some past attempts to generate vulnerability forecasts have used time-series machinery: one of the most recent and tangible outcomes is provided by the Vulnerability Forecasting interest group of FIRST, whose forecasts are periodically updated to reflect yearly and quarterly projections of CVEs: https://github.com/FIRSTdotorg/Vuln4Cast/blob/main/README.md.

Probability of vulnerability as a function of time

Probability of vulnerability as a function of time: For forecasting capacities, the novelty of ProSVED is the prediction of vulnerabilities for individual codebases, as opposed to the entire universe of CVE entries. Note that “predict” here is a synonym of “forecast” (i.e. determine occurrence at a future time point), as opposed to the ML interpretation of the term, which could also be understood as “detect”.

A hurdle is that, when considering an individual codebase such as the source code of a single library, security vulnerabilities become rare events. This hinders statistical fitting and is commonly countered with data aggregation (cf. the Vulnerability Forecasting approach of working on the entire CVE dataset). To generate more specific forecasts, ProSVED proposes partitioning the learning sets by attributes that are known or suspected to affect the occurrence of security vulnerabilities, such as library size, seniority of developers, and functional purpose.
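
A minimal sketch of such a partition is shown below. The records, the attribute values, and the choice of (size class, functional purpose) as the grouping key are hypothetical; they only illustrate how observations from several similar libraries can be pooled so that each group has enough data points to fit.

```python
from collections import defaultdict

# Hypothetical records: (library, size class, functional purpose, days to first CVE).
records = [
    ("libA", "small", "parsing", 410.0),
    ("libB", "large", "networking", 95.0),
    ("libC", "large", "networking", 130.0),
    ("libD", "small", "parsing", 520.0),
    ("libE", "large", "parsing", 260.0),
]

def partition(records, key):
    """Group the observed times-to-CVE by the attribute(s) selected by `key`."""
    groups = defaultdict(list)
    for record in records:
        groups[key(record)].append(record[3])
    return dict(groups)

# Pool libraries that share both size class and functional purpose.
by_class = partition(records, key=lambda r: (r[1], r[2]))
for attributes, samples in sorted(by_class.items()):
    print(attributes, samples)
```

Finer partitions give more specific forecasts but fewer samples per group, so the choice of attributes is a trade-off between specificity and statistical support.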

Figure: Probability of vulnerabilities in libraries with little exposure to the Internet

From a singled-out set of libraries, ProSVED measures the time elapsed between the release of the source code and the publication of a CVE for it, and fits statistical models to obtain probability density functions (PDFs) for the time from code release to CVE publication.
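
The fitting step can be illustrated with the simplest possible model. The sketch below assumes an exponential distribution and fits its rate by maximum likelihood; the sample values are invented, the distribution family is an assumption made for this example, and libraries for which no CVE has been published yet (censored data) are ignored for brevity.

```python
import math

# Hypothetical observations: days between code release and CVE publication.
days_to_cve = [120.0, 340.0, 95.0, 610.0, 230.0]

def fit_exponential(samples):
    """Maximum-likelihood rate of an exponential distribution: n / sum(samples)."""
    if not samples or any(s <= 0 for s in samples):
        raise ValueError("need a non-empty list of positive samples")
    return len(samples) / sum(samples)

def exponential_pdf(t, rate):
    """Density of the time from release to CVE publication under the fitted model."""
    return rate * math.exp(-rate * t) if t >= 0 else 0.0

rate = fit_exponential(days_to_cve)
print(f"fitted rate: {rate:.5f} per day, mean time to CVE: {1 / rate:.0f} days")
print(f"density at 1 year: {exponential_pdf(365.0, rate):.6f}")
```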

This provides individual PDFs for specific types of codebases, which can be linked to the nodes that compose a TDT by determining which type of library each node represents. Integrating these functions over a time interval yields probability values that represent the likelihood of a new vulnerability (CVE) being published for a codebase that our project is using. Depending on the severity of the vulnerability, or on more fine-grained information such as the potential attack vector, such a publication can be a disruptive event that forces the release of urgent patches. Quantifying these probabilities gives companies concrete estimates of future workload, thus facilitating security-related decisions.
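
As a sketch of the integration step, the probability of a CVE being published in a window [t_start, t_end] after release is the integral of the PDF over that window. The example below uses a hypothetical exponential PDF with a mean of 279 days and a plain trapezoidal rule; both the density and the window are assumptions chosen for illustration.

```python
import math

def pdf(t, rate=1 / 279.0):
    """Hypothetical exponential PDF of the time (in days) from release to CVE."""
    return rate * math.exp(-rate * t)

def window_probability(f, t_start, t_end, steps=10_000):
    """Trapezoidal approximation of the integral of f over [t_start, t_end]."""
    h = (t_end - t_start) / steps
    total = 0.5 * (f(t_start) + f(t_end))
    total += sum(f(t_start + i * h) for i in range(1, steps))
    return total * h

# Probability of a CVE between day 365 and day 455 after release,
# i.e. during the quarter that follows the library's first year.
p_next_quarter = window_probability(pdf, 365.0, 455.0)
print(f"{p_next_quarter:.4f}")
```

Values of this kind are what would label the nodes of a TDT before the propagation analysis is run.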

Real-world examples and applications

Scientific publications

Journals

  1. :!: COSE
  2. :!: DiB

International conferences

  1. :!: FIG cybersec
  2. :!: ???

Dissemination & events

SFSCON presentation

A social objective of ProSVED is to raise awareness of cybersecurity practices in general, and the importance (and feasibility) of forecasting security vulnerabilities in particular. In this sense, ProSVED has been part of the following scientific and industrial dissemination events:

Special thanks

While ProSVED is driven by Carlos and supervised by Fabio, many more people have influenced its scientific developments and application to existing source code. From that too-long list, we extend our explicit gratitude to the following:

  • R. Paramitha & Y. Feng (Univ. of Trento, IT)
  • A. Hartmanns & S. Nicoletti (Univ. of Twente, NL)
  • I. Pashchenko (TomTom, NL)
  • E. Vicario & L. Carnevali (Univ. of Florence, IT)
  • J. Salonen & A. Karinsalo (VTT, FI)
  • P. Rubén D'Argenio (Univ. of Córdoba, AR)
  • D. Di Nucci (Univ. of Salerno, IT)
  • G. Di Tizio (Airbus, FR)
prosved.1716052769.txt.gz · Last modified: 2024/05/18 19:19 by carlosesteban.budde@unitn.it