Institut Polytechnique de Paris

When the code changes...

30 Oct. 2025
Stefano Zacchiroli is a professor at the Information Processing and Communication Laboratory (LTCI*) at Télécom Paris, and co-founder of the Software Heritage project. Drawing on this global archive of open source software, he offers tools that can be used to trace the evolution of the code used in most products on the market and detect anomalies between different versions. This work contributes to the security of the open source software supply chain and to European sovereignty in this area. He spoke at the IP Paris Cybersecurity and Defense Conference on October 21, 2025.
Stefano Zacchiroli presented his work on October 21, 2025, at the Cybersecurity and Defense Meetings held by the Institut Polytechnique de Paris.

What is the context of your research?

I am the co-founder and scientific director of the Software Heritage project, which is the world's largest archive of source code and software development history. More than 300 million projects are stored there for the purpose of preserving intangible assets (with the support of UNESCO), but also for scientific purposes, with applications in research and industry. On this last point, the stakes are high because 99% of digital products today reuse open source code. It is therefore useful to be able to trace the history of a piece of code and verify whether the version on which a piece of software is running is consistent with the one archived in Software Heritage. My work also focuses on the security of the software supply chain. In recent years, cyberattackers have been trying to introduce vulnerabilities into the open source components of products on the market. Government policies to secure the open source supply chain are being rolled out in Europe and the United States in response to this phenomenon.
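As an illustration (not taken from the interview), checking that a file matches an archived one can start from a content-derived identifier. Software Heritage identifies file contents with intrinsic identifiers (SWHIDs) built from the same hash Git uses for blobs. The Python sketch below computes such an identifier locally. The function name `content_swhid` is illustrative, and looking the identifier up in the archive is deliberately left out.

```python
import hashlib


def content_swhid(data: bytes) -> str:
    """Return the SWHID of a file content (hashed the way Git hashes a blob)."""
    # Git prefixes the content with "blob <length>" and a NUL byte before hashing.
    header = b"blob %d\0" % len(data)
    digest = hashlib.sha1(header + data).hexdigest()
    return f"swh:1:cnt:{digest}"


# Identifier of an empty file; any change to the bytes changes the identifier.
print(content_swhid(b""))
print(content_swhid(b"int main(void) { return 0; }\n"))
```

Because the identifier depends only on the bytes of the file, two parties can compare identifiers without trusting the platform that served the file.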

How do you carry out this work?

We use a technique known as vulnerability propagation. We rely on the Software Heritage archive, which interlinks all source code development histories, to determine when a vulnerability was introduced into source code and whether a patch has been applied. To do this, our engineers have developed tools that propagate external information collected about vulnerabilities and their patches across our entire archive (2,000 TB of data, or several billion software versions). We are then able to alert users of source code that is recognized as vulnerable but not yet patched. This information is freely accessible, and we are the only ones in the world who can offer this service independently of a specific platform such as GitHub.
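The idea of propagating a vulnerability through development history can be sketched on a toy commit graph. This is a simplified illustration in Python, not the project's actual tooling: the graph, the commit names, and the functions `descendants` and `vulnerable_commits` are all hypothetical. A commit is treated as vulnerable if it descends from the commit that introduced the flaw and does not descend from the commit that fixed it.

```python
from collections import deque


def descendants(children: dict[str, list[str]], start: str) -> set[str]:
    """Return `start` and every commit reachable from it via child links."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def vulnerable_commits(children, introduced, fixed=None):
    """Commits that contain the flaw and do not descend from the fix."""
    affected = descendants(children, introduced)
    if fixed is not None:
        affected -= descendants(children, fixed)
    return affected


# Hypothetical history: a -> b -> c -> d upstream, plus a fork b -> f1 -> f2.
# The flaw is introduced in b and fixed upstream in c; the fork never merged c.
history = {"a": ["b"], "b": ["c", "f1"], "c": ["d"], "f1": ["f2"]}
print(sorted(vulnerable_commits(history, introduced="b", fixed="c")))
# prints ['b', 'f1', 'f2']
```

In this example the fork remains flagged even though upstream is patched, which is the kind of case that an archive spanning many forks and platforms makes visible.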

We can also detect anomalies in the repositories (on GitHub or elsewhere) that developers use to collaborate. We have noticed that certain versions of software observed at a given moment look different when observed again later. This phenomenon raises questions because these tools are normally used to add code and should only very rarely delete previous versions. So either code has been removed for legal reasons, or history has been rewritten for other reasons, with the risk of introducing malicious code that is difficult to detect. We have therefore developed a tool capable of comparing two archive visits to the same repository and detecting these anomalies.
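A comparison of two visits can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the tool mentioned in the interview: each visit is modeled as a mapping from branch name to the commit at its tip, `parents` maps each commit to its parent commits, and the names `is_ancestor` and `compare_visits` are hypothetical. A branch is reported when it has disappeared, or when its earlier tip is no longer part of its later history.

```python
def is_ancestor(parents: dict[str, list[str]], ancestor: str, commit: str) -> bool:
    """True if `ancestor` is `commit` itself or reachable through parent links."""
    stack, seen = [commit], set()
    while stack:
        node = stack.pop()
        if node == ancestor:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(parents.get(node, []))
    return False


def compare_visits(earlier: dict[str, str], later: dict[str, str],
                   parents: dict[str, list[str]]) -> dict[str, str]:
    """Report branches from the earlier visit that were deleted or rewritten."""
    anomalies = {}
    for branch, old_tip in earlier.items():
        new_tip = later.get(branch)
        if new_tip is None:
            anomalies[branch] = "deleted"
        elif not is_ancestor(parents, old_tip, new_tip):
            anomalies[branch] = "rewritten"
    return anomalies


# Hypothetical data: "main" only gained a commit, "dev" was force-rewritten,
# and "old" vanished between the two visits.
parents = {"c2": ["c1"], "c3": ["c2"], "x2": ["c1"]}
first_visit = {"main": "c2", "dev": "c2", "old": "c1"}
second_visit = {"main": "c3", "dev": "x2"}
print(compare_visits(first_visit, second_visit, parents))
# prints {'dev': 'rewritten', 'old': 'deleted'}
```

Ordinary development only appends commits, so the earlier tip stays in the branch's ancestry; only deletions and rewrites are flagged for closer inspection.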

What are the possible applications of your work in terms of cybersecurity?

We are working extensively on reproducible builds, i.e., the ability to rebuild software from its source code and obtain a result identical to the distributed one. Differences can indeed arise between the version of software developed by a producer and the version that is ultimately used. App stores are sometimes the victims of attacks and unwittingly offer versions of applications that have been recompiled with vulnerabilities or with modifications requested by an external agent (the US government could, for example, require Apple or Google to introduce spyware into certain applications available for download from their stores). There is therefore a real need to verify, or even certify (through a cryptographic signature, for example), that the version the user receives matches the original. To this end, we are working with developers to archive important code and make it accessible, thereby making reproducible builds possible.
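The verification step behind a reproducible build can be illustrated with a short Python sketch: rebuild the software independently from the archived source, then check that the rebuilt artifact and the distributed one have the same cryptographic digest. The byte strings below stand in for real build outputs, and the function names are hypothetical. A real workflow would also have to pin the toolchain and build environment, which is not shown here.

```python
import hashlib


def sha256_hex(artifact: bytes) -> str:
    """Hex-encoded SHA-256 digest of a build artifact."""
    return hashlib.sha256(artifact).hexdigest()


def matches_reference(distributed: bytes, rebuilt: bytes) -> bool:
    """True when the distributed artifact is bit-for-bit identical to the rebuild."""
    return sha256_hex(distributed) == sha256_hex(rebuilt)


# Placeholder artifacts: in practice these would be read from files, e.g. the
# package downloaded from a store and the one rebuilt from archived source.
rebuilt_from_source = b"\x7fELF...application v1.0"
downloaded_ok = b"\x7fELF...application v1.0"
downloaded_tampered = b"\x7fELF...application v1.0 + extra payload"

print(matches_reference(downloaded_ok, rebuilt_from_source))        # prints True
print(matches_reference(downloaded_tampered, rebuilt_from_source))  # prints False
```

A digest that matches can then be signed and published, so that users check a short value instead of rebuilding the software themselves.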

And in terms of defense?

Our work guarantees a form of resilience and sovereignty in the field of open source software. It does so, on the one hand, by offering global visibility into the origin of existing code around the world and, on the other hand, by maintaining its availability over time: what would happen if tomorrow the US no longer allowed access to GitHub, or if the intellectual property regime changed retroactively in China, rendering today's open source code unusable? Finally, our archive reduces our dependence on third-party platforms for accessing code. We also have independent copies of our archive in various European countries.

Stefano Zacchiroli is a professor of computer science at Télécom Paris, Institut Polytechnique de Paris. His current research interests include digital commons, open source software engineering, computer security, and the software supply chain. He is co-founder and Chief Scientific Officer of Software Heritage, the largest public archive of software source code. He has served three times as Debian project leader and is a former member of the board of directors of the Open Source Initiative (OSI). He received the O'Reilly Open Source Award in 2015 and the Google Award for Inclusion Research in 2022.

>> Stefano Zacchiroli's Télécom Paris webpage

>> Stefano Zacchiroli on Google Scholar

*LTCI: a research lab of Télécom Paris, Institut Polytechnique de Paris, 91120 Palaiseau, France