Challenges while building SBOM infrastructure for CPython

Published 2024-02-14 by Seth Larson

This critical role would not be possible without funding from the OpenSSF Alpha-Omega project. Massive thank-you to Alpha-Omega for investing in the security of the Python ecosystem!

In case you missed it, I recently announced support for Software Bill-of-Materials (SBOM) documents for the CPython project on the Python Software Foundation blog.

Part of the project is to document the challenges the project faced in starting to publish first-party SBOM documents alongside release artifacts. Yesterday I gave a presentation to the OpenSSF SBOM Everywhere SIG. The slides are available in Google Drive, but I'll summarize the discussion here:

List of challenges so far for CPython SBOMs

Building for sustainability. There are no guarantees that my role will be around forever. There's a non-zero chance that CPython core developers will need to maintain SBOM infrastructure at some point in the future. This means it needs to have buy-in, be as low-effort to maintain as possible, fit into existing workflows, and be well documented.

This challenge sparked a bunch of conversation in the group about getting buy-in from technical leadership of open source projects. My thoughts being the following:

Recursive dependency bundling: CPython bundles pip in the ensurepip module, and pip in turn bundles 18 dependencies. Older CPython versions bundle even more projects, which was the primary reason I stopped backporting beyond 3.12. Handling this requires a customized tool. It helps a lot that pip and its dependencies are part of the Python package ecosystem: we can automatically look up their metadata on PyPI, which drastically reduces the maintainer effort needed to keep the SBOM updated.
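To make the PyPI lookup concrete, here is a minimal sketch. It is not the actual CPython tooling. It assumes the bundled dependencies are listed as `name==version` lines (the format pip uses for its vendored requirements file), and it uses PyPI's JSON API at `https://pypi.org/pypi/<name>/<version>/json`. The function names and the sample versions are hypothetical, chosen only for illustration.

```python
import json
import urllib.request


def parse_pinned_requirements(text):
    """Return (name, version) pairs from 'name==version' lines.

    Comments (after '#'), blank lines, and indentation are ignored.
    """
    packages = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if "==" not in line:
            continue
        name, version = line.split("==", 1)
        packages.append((name.strip(), version.strip()))
    return packages


def pypi_json_url(name, version):
    """URL of the PyPI JSON API document for one release."""
    return f"https://pypi.org/pypi/{name}/{version}/json"


def pypi_purl(name, version):
    """Package URL for a PyPI project (lowercased, '_' replaced by '-')."""
    return f"pkg:pypi/{name.lower().replace('_', '-')}@{version}"


def fetch_release_metadata(name, version):
    """Download release metadata from PyPI (requires network access)."""
    with urllib.request.urlopen(pypi_json_url(name, version)) as response:
        return json.load(response)


if __name__ == "__main__":
    # Illustrative input only; these are not pip's real pinned versions.
    sample = "examplepkg==1.0.0\n    Other_Pkg==2.3.4  # nested dependency\n"
    for name, version in parse_pinned_requirements(sample):
        print(pypi_purl(name, version), pypi_json_url(name, version))
```

In the response from `fetch_release_metadata()`, each entry in the `urls` list carries a download URL and a `digests` mapping, which is the kind of metadata an SBOM entry needs.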

Software IDs and metadata of C projects: Many of CPython's dependencies are C projects (libexpat, mpdecimal, HACL*, etc.). Those projects aren't part of a packaging ecosystem; they're tarballs you download and install by pasting into a directory. This means there aren't any standards for what a version number or name should be. Unfortunately, it also means the SBOM has to be updated manually whenever core developers update these projects.
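One way to keep a manual process from silently drifting is to record a checksum for each bundled file and flag any file whose contents no longer match. The sketch below illustrates that idea under assumptions of my own: the `recorded` mapping of relative paths to SHA-256 digests and both function names are hypothetical, and this is not presented as how the CPython tooling is implemented.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def find_stale_files(recorded, root):
    """Return relative paths whose on-disk contents differ from the record.

    `recorded` maps a path relative to `root` to the SHA-256 digest that
    was written down the last time the SBOM was updated. A missing file
    is also reported as stale.
    """
    stale = []
    for rel_path, expected in sorted(recorded.items()):
        file_path = Path(root) / rel_path
        if not file_path.is_file() or sha256_of(file_path) != expected:
            stale.append(rel_path)
    return stale
```

A non-empty result means someone changed the bundled sources without touching the SBOM, so the version and other metadata likely need a manual review.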

CPEs, the primary software vulnerability identifier used for C projects, also don't exist universally for these projects. OSV and PURL don't work for projects outside of packaging ecosystems.
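For readers who haven't worked with CPEs, the sketch below shows the shape of a CPE 2.3 formatted string and how one can be attached to a package entry as an SPDX 2.3 external reference. The helper names, the vendor and product values, and the download URL are all hypothetical. The sketch does not escape special characters in CPE components, and an SPDXID may only contain letters, digits, `.` and `-`, so a name such as HACL* would need sanitizing first.

```python
def cpe23(vendor, product, version, part="a"):
    """Build a CPE 2.3 formatted string, wildcarding the remaining fields.

    `part` is "a" for applications. The seven trailing '*' fields are
    update, edition, language, sw_edition, target_sw, target_hw and other.
    """
    return f"cpe:2.3:{part}:{vendor}:{product}:{version}:*:*:*:*:*:*:*"


def spdx_package(name, version, download_url, cpe=None):
    """Return a minimal SPDX 2.3 package entry as a JSON-ready dict."""
    package = {
        "SPDXID": f"SPDXRef-PACKAGE-{name}",
        "name": name,
        "versionInfo": version,
        "downloadLocation": download_url,
        "externalRefs": [],
    }
    if cpe is not None:
        package["externalRefs"].append(
            {
                "referenceCategory": "SECURITY",
                "referenceType": "cpe23Type",
                "referenceLocator": cpe,
            }
        )
    return package


if __name__ == "__main__":
    import json

    # Placeholder values; a real entry needs the identifiers that
    # vulnerability databases actually use for the project.
    entry = spdx_package(
        "examplelib",
        "1.2.3",
        "https://example.invalid/examplelib-1.2.3.tar.gz",
        cpe=cpe23("example_vendor", "examplelib", "1.2.3"),
    )
    print(json.dumps(entry, indent=2))
```

The vendor and product fields are the hard part in practice: when no CPE has been assigned to a project, there is nothing authoritative to put there.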

Other items

That's all for this week! 👋 If you're interested in more you can read next week's report or last week's report.


This work is licensed under CC BY-SA 4.0