Binary Provenance, SBOMs and the Software Supply Chain for Humans - The New Stack

Binary Provenance, SBOMs and the Software Supply Chain for Humans

Explore how these concepts help trace code origins, understand software components and secure the development-to-deployment journey.
Oct 1st, 2024 10:00am

“What’s really running in prod?”

Every engineer will hear these immortal words on a long enough timeline (or career). It might be because a new security zero day was dropped, alerts fired from the depths of a vast microservice architecture, or you might just be looking to know what commit was actually tested. Either way, it often comes with the promise of a stressful day.

Let’s demystify three critical concepts for delivering secure, reliable software: binary provenance, software bills of materials (SBOMs) and the software supply chain.

These are more than industry buzzwords; they’re vital tools for ensuring software integrity and transparency in an era of increasing cyberthreats and complex development processes. Let’s explore how these concepts help trace code origins, understand software components and secure the development-to-deployment journey.

Here are the basics about digital fingerprints, tracking open source dependencies and supply chain security. By the end, you’ll be able to grasp what these terms mean, why they’re crucial for modern software development and how to start implementing them in your projects.

Let’s begin with binary provenance — knowing your software’s true origins.

Binary Provenance 

In software, binary provenance is the answer to the question, “Where did this artifact come from?”

Say you find a Docker image running on a host. How can you trace its origin? How do you know it was built in CI? How do you know what source commit or even repo it comes from? How do you know it hasn’t been tampered with?

If all you have is the image, answering these questions can be extremely difficult. If you are lucky, you might be able to find some of the information in CI logs or container registries, but the hunt will take a lot of time.

So, what is the solution?

What if we got the build process to create records for every build we make?

The key to making this work is content-addressable storage. This is a fancy term, but its meaning is straightforward: the file’s contents (in our case, the Docker image) are the identity. You might ask yourself: How does this work? Well, it’s pretty simple: We use a cryptographic hash function like SHA-256 to create a digest for the artifact, and that digest becomes its identity.

This approach has some very powerful properties:

  • Zero trust: We don’t need to trust that version numbers, image tags or metadata correctly identify artifacts; the digest uniquely identifies the binary.
  • Tamper-evident: If a single byte in the software changes, it will have a different identity.
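
As a minimal sketch of both properties, the Python below derives an identity from an artifact's bytes and shows that changing a single byte produces a different identity. The function name artifact_digest and the sample bytes are illustrative assumptions, not part of any particular tool.

```python
import hashlib


def artifact_digest(content: bytes) -> str:
    """Return a content-addressed identity of the form 'sha256:<hex>'."""
    return "sha256:" + hashlib.sha256(content).hexdigest()


# Hypothetical artifact bytes; in practice this would be an image or binary.
original = b"example artifact contents\n"
tampered = b"example artifact c0ntents\n"  # a single byte differs

original_id = artifact_digest(original)
tampered_id = artifact_digest(tampered)

print(original_id)
print(tampered_id)
print(original_id == tampered_id)  # the identities differ, so tampering is evident
```

Because the identity is computed from the bytes themselves, anyone holding the artifact can recompute it and compare, without trusting a tag or version label.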

Binary provenance ensures we can’t qualify one software artifact and deploy a different one. It also allows us to create a provable chain of custody from commit to build to production. This approach overcomes the limitations of relying solely on CI logs or container registries, which can be incomplete or manipulated.

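You can use OpenSSL to get the cryptographic fingerprint for a file, for example by running openssl dgst -sha256 followed by the file path.

And if you are using Docker images, you can query the digest from the command line, for example with docker images --digests.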

When you create the provenance document, you probably want to store it in a system of record. There are many options for this, ranging from the open source sigstore project to artifact management systems and software delivery evidence management (SDEM) tools like Kosli. It is important that it is stored in a secure, append-only system.
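
To make the idea of an append-only system of record concrete, here is a toy sketch in Python. Each entry includes a hash of the previous entry, so rewriting history is detectable. The class name ProvenanceLog, the record fields (artifact_digest, repository, commit, build_url) and the example URLs are all hypothetical; real systems such as those named above have their own formats and guarantees.

```python
import hashlib
import json


class ProvenanceLog:
    """A toy append-only log: each entry commits to the one before it."""

    def __init__(self) -> None:
        self._entries = []

    @staticmethod
    def _hash(previous: str, record: dict) -> str:
        body = json.dumps(record, sort_keys=True)
        return hashlib.sha256((previous + body).encode()).hexdigest()

    def append(self, record: dict) -> dict:
        previous = self._entries[-1]["entry_hash"] if self._entries else ""
        entry = {
            "record": record,
            "previous": previous,
            "entry_hash": self._hash(previous, record),
        }
        self._entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute the chain; any edited record breaks it."""
        previous = ""
        for entry in self._entries:
            expected = self._hash(previous, entry["record"])
            if entry["previous"] != previous or entry["entry_hash"] != expected:
                return False
            previous = entry["entry_hash"]
        return True

    def find(self, digest: str) -> list:
        """Look up provenance records by artifact digest."""
        return [
            e["record"]
            for e in self._entries
            if e["record"].get("artifact_digest") == digest
        ]


log = ProvenanceLog()
digest = "sha256:" + hashlib.sha256(b"example image bytes").hexdigest()
log.append({
    "artifact_digest": digest,
    "repository": "https://example.com/org/service.git",  # hypothetical
    "commit": "0123abc",                                   # hypothetical
    "build_url": "https://ci.example.com/builds/42",       # hypothetical
})

print(log.find(digest)[0]["commit"])
print(log.verify())
```

Given only a digest found on a host, a lookup like find answers which repository, commit and build produced it.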

Software Bill of Materials 

OK, so hopefully, now you have records of where your software is coming from. The next question to answer is, “What’s inside this software artifact?” This is where SBOMs can help.

SBOM stands for software bill of materials. It’s essentially a formal, machine-readable inventory of all the ingredients (components and dependencies) in your software recipe.

Software systems are built using vast amounts of open source components. In addition to your code, a typical software project contains many open source libraries and base container images. Consider this:

  1. Security vulnerabilities can hide in any part of your supply chain.
  2. Licensing issues might be lurking in any of those dependencies.
  3. Updates to third-party components can break your app in unexpected ways.

The software supply chain is all about understanding and managing these risks. It’s about having visibility into every ingredient in your software recipe.

The key benefit is transparency: SBOMs give you complete insight into your software supply chain. An SBOM contains a complete description of the component names, versions, licenses and other metadata for dependencies pulled into the build.

There are many SBOM formats, but the leading choices are SPDX and CycloneDX. There are many tools available for generating SBOMs. Here is an example from our own CI pipeline, which uses the open source anchore/sbom-action tool:


This will produce a file containing a complete inventory of the packages in the Docker image. It looks like this (please note: this is truncated and just an example of the sheer size of the file).
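
As a rough illustration of the shape of such a file, the Python sketch below parses a heavily trimmed, made-up CycloneDX-style document and lists each component with its version and license. The package names, versions and the helper list_components are assumptions for the example; a real SBOM carries far more entries and metadata.

```python
import json

# Hypothetical, heavily trimmed CycloneDX-style SBOM; real files list
# hundreds or thousands of components.
sbom_json = """
{
  "bomFormat": "CycloneDX",
  "specVersion": "1.5",
  "components": [
    {"type": "library", "name": "libexample", "version": "1.2.0",
     "licenses": [{"license": {"id": "MIT"}}]},
    {"type": "library", "name": "example-http", "version": "2.0.1",
     "licenses": [{"license": {"id": "Apache-2.0"}}]},
    {"type": "library", "name": "example-utils", "version": "0.9.4"}
  ]
}
"""


def list_components(sbom: dict) -> list:
    """Return (name, version, license id) for each component in the SBOM."""
    rows = []
    for component in sbom.get("components", []):
        licenses = component.get("licenses", [])
        if licenses:
            license_id = licenses[0].get("license", {}).get("id", "unknown")
        else:
            license_id = "unknown"
        rows.append((component["name"], component["version"], license_id))
    return rows


components = list_components(json.loads(sbom_json))
for name, version, license_id in components:
    print(f"{name} {version} ({license_id})")
```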

If you store this information with the binary provenance, you have the beginnings of a software supply chain record, one that answers two questions for every artifact:

  • Where did this software originate?
  • What is included in the package?

Now, you can answer important questions about security vulnerabilities, identify outdated dependencies, and monitor licensing and compliance issues.
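
Here is a small sketch of how such a question might be answered once provenance and SBOM data are stored side by side, keyed by artifact digest. All of the data below (digests, repositories, commits, package names) and the function affected_artifacts are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    version: str


# Hypothetical records keyed by artifact digest (digests shortened for readability).
provenance = {
    "sha256:aaa": {"repository": "org/payments", "commit": "1a2b3c"},
    "sha256:bbb": {"repository": "org/search", "commit": "4d5e6f"},
}
sboms = {
    "sha256:aaa": [Component("libexample", "1.2.0"), Component("example-http", "2.0.1")],
    "sha256:bbb": [Component("libexample", "1.3.0")],
}
running_in_prod = ["sha256:aaa", "sha256:bbb"]


def affected_artifacts(package: str, bad_versions: set) -> list:
    """Find running artifacts whose SBOM lists a vulnerable package version."""
    hits = []
    for digest in running_in_prod:
        for component in sboms.get(digest, []):
            if component.name == package and component.version in bad_versions:
                hits.append({
                    "digest": digest,
                    **provenance[digest],
                    "version": component.version,
                })
    return hits


# Suppose a new advisory names libexample 1.2.0 (a made-up package).
for hit in affected_artifacts("libexample", {"1.2.0"}):
    print(hit)
```

The SBOM says which artifacts contain the affected package, and the provenance record says which repository and commit to patch.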

Software Supply Chain 

Now that we know about binary provenance and SBOMs, how do these fit into the bigger picture of software delivery? This is where the concept of the software supply chain comes in.

Think of the software supply chain as the journey your code and all its dependencies take from idea to production. Modern software delivery is a factory of integrated tools, processes and human activities.

The software supply chain is all about traceability. It gives you visibility into every ingredient in your software recipe and every step of the journey.

You’ll notice that binary provenance and SBOMs produce metadata around a single step on this journey. These metadata files are often called attestations. A comprehensive supply chain will provide centralized records for every step you value, including code reviews, security scans and even runtime events.
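
One way to picture centralized attestations is as a set of records attached to an artifact digest, which can be checked before a deployment. The sketch below is a simplified assumption of how that could look; the Attestation fields, the list of required kinds and the function missing_evidence are hypothetical and not taken from any specific tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attestation:
    artifact_digest: str
    kind: str      # e.g. "provenance", "sbom", "code-review", "security-scan"
    passed: bool


# Which steps an organization chooses to require is an assumption here.
REQUIRED_KINDS = {"provenance", "sbom", "code-review", "security-scan"}


def missing_evidence(digest: str, attestations: list) -> set:
    """Return required attestation kinds that are absent or failed for a digest."""
    satisfied = {
        a.kind
        for a in attestations
        if a.artifact_digest == digest and a.passed
    }
    return REQUIRED_KINDS - satisfied


records = [
    Attestation("sha256:aaa", "provenance", True),
    Attestation("sha256:aaa", "sbom", True),
    Attestation("sha256:aaa", "code-review", True),
    Attestation("sha256:aaa", "security-scan", False),
]

gaps = missing_evidence("sha256:aaa", records)
print(gaps)  # the failed scan counts as missing evidence
```

Because every attestation is keyed by the artifact digest, the check applies to the exact binary being deployed rather than to a tag that could point somewhere else.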

Conclusion 

Binary provenance, SBOMs and the software supply chain are intrinsically linked, and each plays a crucial role in modern software development and security.

Binary provenance provides the foundation, ensuring we can trace and verify the origin of every software artifact. SBOMs build upon this, offering a detailed inventory of all components within our software. The software supply chain ties it all together, giving us a comprehensive view of our code’s journey from inception to deployment.

These concepts don’t exist in isolation — they work together to enhance security, improve efficiency and build trust. By implementing these practices, you’ll be able to respond swiftly to vulnerabilities, deploy with greater confidence and maintain transparency throughout the development process.

And you’ll always know what’s in prod.

TNS owner Insight Partners is an investor in: Docker.