This critical role would not be possible without funding from the OpenSSF Alpha-Omega Project. Massive thank-you to Alpha-Omega for investing in the security of the Python ecosystem!
Python 3.12.0 has been released! 🥳 There are multiple people and tons of steps behind every successful release of Python which I partially documented in the article "Visualizing the CPython release process". This week I was invited to discuss the article on Talk Python with Michael Kennedy. You can listen to the full episode on talkpython.fm.
Since the entire process has just run in order to deliver 3.12.0, I wanted to discuss how it can be tweaked to provide better assurances of the integrity of the built artifacts. Once the dust has settled around the 3.12.0 release, I'll be discussing with the release managers how these improvements could be implemented.
Currently, the source tarballs for Python 3.12.0 are built locally on the Release Manager's machine from the tagged commit (1). Because this process runs locally on the release manager's machine using release-tools (2), there's no verifiability of the process. Whether intentional or not, the release manager's environment can influence the resulting built artifact, for example if extra files are included or excluded because of a dirty build environment or checkout. The macOS installer builds (4) are in a similar situation.
If the source tarballs were instead built from the public commit as a verified input, rather than on a local machine, any tampering would either be thwarted (by requiring a known specific commit to be used as input) or would at a minimum be publicly traceable (by the commit itself being public, so any injected code would be publicly visible).
Using a public commit (instead of a tag) also means that if the tag is changed mid-release (which can happen with only write access to the release manager's fork (3)), the commit SHA won't match the expected one for the given Python release. This provides better assurances than using a git tag alone.
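To make the tag-versus-commit distinction concrete, here's a minimal Python sketch. It is an illustration only, not code from release-tools: the helper names `resolve_tag` and `check_release_commit` are hypothetical, and `resolve_tag` assumes a `git` executable and a local checkout are available.

```python
import re
import subprocess

# A full SHA-1 commit identifier: exactly 40 lowercase hex characters.
FULL_SHA = re.compile(r"[0-9a-f]{40}")


def resolve_tag(repo_dir: str, tag: str) -> str:
    """Return the commit SHA that `tag` currently points at in `repo_dir`.

    Assumes `git` is installed. The `^{commit}` suffix peels an annotated
    tag down to the commit it refers to.
    """
    result = subprocess.run(
        ["git", "-C", repo_dir, "rev-parse", "--verify", f"refs/tags/{tag}^{{commit}}"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def check_release_commit(resolved_sha: str, expected_sha: str) -> bool:
    """Accept only an exact match against a full 40-character commit SHA.

    Abbreviated SHAs are rejected so the pinned input is unambiguous.
    """
    if not FULL_SHA.fullmatch(expected_sha):
        raise ValueError("expected_sha must be a full 40-character commit SHA")
    return resolved_sha == expected_sha
```

If the tag were moved after the expected commit was recorded, `resolve_tag` would return a different SHA and `check_release_commit` would return `False`, which is the property a tag alone can't give you.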
I proposed and implemented such a workflow using GitHub Actions (5). This setup is very similar to the one used already for Windows installers and Azure Pipelines, where the input to the workflow is a specific commit and git repository (for releases, this would be the release managers' fork of
Using this implementation I was able to verify that the contents of the Python 3.12.0 source tarballs match exactly with what was built by the 3.12 release manager Thomas Wouters, and that my own build used the same commit as Python 3.12.0 (0fb18b0) using SLSA verifier:
```shell
# Unzip the artifacts from GitHub Actions
$ unzip artifacts.zip
$ unzip artifacts.intoto.jsonl.zip

# Use slsa-verifier to verify the workflow and inputs:
#
# --source-uri, --source-branch: the workflow used and the branch.
#   In a real run this would be 'python/release-tools' and 'main'.
# --build-workflow-input: the inputs to the 'workflow_dispatch'
#   event when the build was triggered:
#   git_commit: we verify the git commit for CPython.
#   git_remote: we verify the git remote used (in this case 'Yhg1s'
#     is Thomas Wouters' GitHub account).
#   cpython_release: we verify that the release being built is 3.12.0.
#     This feeds into the git tag that's checked out and the name
#     of the tarballs.
$ slsa-verifier verify-artifact \
    --provenance-path artifacts.intoto.jsonl \
    --source-uri github.com/sethmlarson/release-tools \
    --source-branch build-source-tar \
    --build-workflow-input git_commit=0fb18b02c8ad56299d6a2910be0bab8ad601ef24 \
    --build-workflow-input git_remote=Yhg1s \
    --build-workflow-input cpython_release=3.12.0 \
    src/*

Verified signature against tlog entry index 39982623 at URL: ...
Verified build using builder ...
Verifying artifact src/Python-3.12.0.tar.xz: PASSED

Verified signature against tlog entry index 39982623 at URL: ...
Verified build using builder ...
Verifying artifact src/Python-3.12.0.tgz: PASSED

# Success! 🥳
PASSED: Verified SLSA provenance
```
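For intuition about what the provenance file ties together, here's a rough Python sketch that reads the artifact digests recorded in it and compares them with a local file's SHA-256. It is not a replacement for `slsa-verifier`: it checks no signatures or transparency log entries. It also assumes each line of the `.intoto.jsonl` file is a DSSE envelope whose base64 `payload` decodes to an in-toto statement with a `subject` list; the function names are hypothetical.

```python
import base64
import hashlib
import json


def provenance_subjects(jsonl_text: str) -> dict[str, str]:
    """Map artifact name -> SHA-256 digest recorded in the provenance.

    Assumption: each non-empty line is a DSSE envelope whose base64
    `payload` is an in-toto statement. No signature checking happens here.
    """
    subjects: dict[str, str] = {}
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        envelope = json.loads(line)
        statement = json.loads(base64.b64decode(envelope["payload"]))
        for subject in statement.get("subject", []):
            subjects[subject["name"]] = subject["digest"]["sha256"]
    return subjects


def artifact_matches(name: str, data: bytes, subjects: dict[str, str]) -> bool:
    """Return True if `data` hashes to the digest recorded for `name`."""
    return subjects.get(name) == hashlib.sha256(data).hexdigest()
```

The digest match only says "this file is the one the provenance talks about". The trust comes from the signature and workflow-input checks that `slsa-verifier` performs.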
Now that we've verified the provenance of these "reference" builds, we can check their contents against the actual builds of CPython:
```shell
# Make two directories to make comparing the
# two tarballs' contents possible.
$ mkdir a/ b/

# Download the Python tarball from python.org
# and verify its checksum
$ wget https://www.python.org/ftp/python/3.12.0/Python-3.12.0.tgz
$ echo '51412956d24a1ef7c97f1cb5f70e185c13e3de1f50d131c0aac6338080687afb  Python-3.12.0.tgz' | sha256sum --check
Python-3.12.0.tgz: OK

# Extract the two tarballs into directories a and b.
# Remember the one in 'a/' is the reference, 'b/' is actual.
$ tar -xzvf src/Python-3.12.0.tgz -C a/
$ tar -xzvf Python-3.12.0.tgz -C b/

# Do a recursive content diff. Unfortunately we can't
# check the tarball checksums directly because metadata
# will be different between the two. No output means
# the two directories are identical in content! 🥳
$ git diff --no-index a/ b/
```
The tgz source tarball contents match exactly what was created by the reference build. Verifying the tar.xz contents is left as an exercise to the reader. ;)
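The same "compare contents, ignore metadata" idea can be sketched in Python without extracting anything to disk. This is a minimal sketch and the function names are hypothetical. Unlike the `git diff --no-index` approach above, it only looks at regular files, so symlinks and file modes are not compared.

```python
import hashlib
import tarfile


def tarball_content_digests(path: str) -> dict[str, str]:
    """Map each regular file's path in the tarball to the SHA-256 of its bytes.

    Timestamps, owners, and permissions are deliberately ignored, since
    those are the metadata that differs between two independent builds.
    """
    digests: dict[str, str] = {}
    # "r:*" lets tarfile detect the compression (gzip, xz, ...) itself.
    with tarfile.open(path, "r:*") as archive:
        for member in archive:
            if not member.isfile():
                continue  # Skip directories, symlinks, and other non-regular entries.
            extracted = archive.extractfile(member)
            digests[member.name] = hashlib.sha256(extracted.read()).hexdigest()
    return digests


def compare_tarballs(reference: str, actual: str) -> dict[str, list[str]]:
    """Report files that are missing, extra, or different between two tarballs."""
    ref = tarball_content_digests(reference)
    act = tarball_content_digests(actual)
    return {
        "only_in_reference": sorted(ref.keys() - act.keys()),
        "only_in_actual": sorted(act.keys() - ref.keys()),
        "content_differs": sorted(
            name for name in ref.keys() & act.keys() if ref[name] != act[name]
        ),
    }
```

Three empty lists from `compare_tarballs("src/Python-3.12.0.tgz", "Python-3.12.0.tgz")` would correspond to the "no output" case of the diff. Because `tarfile` detects compression automatically, the same call should work for the `tar.xz` pair as well.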
These source tarballs are important because they're the "source of truth" for many installs of Python, especially downstream distributions like Debian or Fedora. These source tarballs also get used automatically by pyenv when compiling from source. If the tarballs can't be verified against the commit or tag, then an attacker who injected code into this build would leave no trace in the public commit, which would make this type of attack much tougher to detect.
Even though the Python APIs haven't changed, there's been lots of movement below the surface. We can use Software Bill-of-Materials (SBOMs) to track the subcomponents of a software distribution and how they change between releases.
I created an SBOM for Python 3.12.0 and then compared the components against the ones included in Python 3.11.6. Comparing the two SBOM documents revealed the differences between the two release streams:
Let's dive into those changes:
ensurepip module which was previously needed to bootstrap pip and venv in a Python environment. Now in 3.12.0 there's only a bundled copy of pip which includes many other bundled dependencies like Requests and certifi. These packages still need to be captured in the CPython SBOM.
Seeing mostly removals and replacements with secure implementations is a great sign! Nice work core developers. I'm hoping to get more visibility into the macOS and Windows installers using SBOMs as well, so changes there can also be tracked.
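Comparing two SBOM documents like this comes down to a set difference over component names and versions. Here's a minimal Python sketch under stated assumptions: the documents are taken to be SPDX-style JSON with a top-level `packages` list whose entries carry `name` and `versionInfo`, which may not match the format I actually used, and the sample components are hypothetical placeholders rather than real CPython data.

```python
def component_versions(sbom: dict) -> dict[str, str]:
    """Map component name -> version from an SPDX-style JSON document.

    Assumption: a top-level "packages" list with "name" and "versionInfo".
    """
    return {
        pkg["name"]: pkg.get("versionInfo", "")
        for pkg in sbom.get("packages", [])
    }


def diff_sboms(old: dict, new: dict) -> dict[str, list]:
    """Report components removed, added, or changed between two SBOMs."""
    old_c = component_versions(old)
    new_c = component_versions(new)
    return {
        "removed": sorted(old_c.keys() - new_c.keys()),
        "added": sorted(new_c.keys() - old_c.keys()),
        "changed": sorted(
            (name, old_c[name], new_c[name])
            for name in old_c.keys() & new_c.keys()
            if old_c[name] != new_c[name]
        ),
    }


# Hypothetical placeholder documents, not the real CPython SBOMs.
old_sbom = {"packages": [
    {"name": "component-a", "versionInfo": "1.0"},
    {"name": "component-b", "versionInfo": "2.0"},
]}
new_sbom = {"packages": [
    {"name": "component-b", "versionInfo": "2.1"},
    {"name": "component-c", "versionInfo": "0.1"},
]}
print(diff_sboms(old_sbom, new_sbom))
```

Keeping removals, additions, and version changes in separate buckets makes it easy to see at a glance whether a release mostly drops components or pulls in new ones.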