
Python 3.12.0 from a supply chain security perspective

Published 2023-10-04 by Seth Larson
Reading time: 8 minutes

This critical role would not be possible without funding from the OpenSSF Alpha-Omega Project. Massive thank-you to Alpha-Omega for investing in the security of the Python ecosystem!

Python 3.12.0 has been released! 🥳 There are multiple people and tons of steps behind every successful release of Python, which I partially documented in the article "Visualizing the CPython release process". This week I was invited to discuss the article on Talk Python with Michael Kennedy. You can listen to the full episode on talkpython.fm.

Improving supply chain integrity of the Python release process

Now that the entire process has run in order to deliver 3.12.0, I want to discuss how it can be tweaked to provide better assurances of the integrity of the built artifacts. Once the dust has settled around the 3.12.0 release, I'll be discussing with the release managers how these improvements could potentially be implemented.

[Figure: two side-by-side diagrams of the CPython release process. Both show the Release Tool, the CPython RM Fork Repo (git commit and git tag v3.X.YaN), Source Tarballs, Azure Pipelines, the macOS Build Process, Windows Installers, and macOS Installers. The second diagram adds "GitHub Actions & SLSA" and "release.py --export". Numbered callouts 1 through 5 mark the steps referenced in the text.]

Left is current process, right is proposed changes.

Currently, the source tarballs for Python 3.12.0 are built locally on the Release Manager's machine from the tagged commit (1). Because this process runs locally on the release manager's machine using the release tool (2), it can't be independently verified. Whether intentional or not, the release manager's environment can influence the resulting built artifact, for example if extra files are included or excluded because of a dirty build environment or checkout. The macOS installer builds (4) are in a similar situation.

If the source tarballs were built from the public commit as a verified input rather than on a local machine, any tampering would either be thwarted (because a known, specific commit is required as input) or would at a minimum be publicly traceable (because the commit itself is public, so any injected code would be publicly visible).

Using a public commit (instead of a tag) also means that if the tag is changed mid-release (which requires only write access to the release manager's fork (3)), the commit SHA won't match the expected one for the given Python release. This provides better assurances than using a git tag alone.
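A minimal sketch of that check, assuming a local clone of the release manager's fork: resolve the tag to the commit it currently points at and compare that against the full SHA recorded for the release. The function names are hypothetical and are not part of release-tools.

```python
import re
import subprocess

FULL_SHA1 = re.compile(r"[0-9a-f]{40}")


def resolve_commit(repo_dir: str, ref: str) -> str:
    """Return the full commit SHA that a tag or branch currently points at."""
    # "<ref>^{commit}" peels annotated tags down to the underlying commit.
    result = subprocess.run(
        ["git", "-C", repo_dir, "rev-parse", "--verify", f"{ref}^{{commit}}"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def commit_matches(expected_sha: str, actual_sha: str) -> bool:
    """Compare a pinned commit against a resolved one.

    The pinned value must be a full SHA-1 so that an abbreviated
    prefix can never be mistaken for a match.
    """
    expected = expected_sha.strip().lower()
    if not FULL_SHA1.fullmatch(expected):
        raise ValueError("expected_sha must be a full 40-character SHA-1")
    return expected == actual_sha.strip().lower()
```

With a clone in a directory named `cpython` and a tag named `v3.12.0` (both assumptions here), `commit_matches("0fb18b02c8ad56299d6a2910be0bab8ad601ef24", resolve_commit("cpython", "v3.12.0"))` would return `False` if the tag had been moved to a different commit.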

I proposed and implemented such a workflow using GitHub Actions (5). This setup is very similar to the one already used for Windows installers and Azure Pipelines, where the input to the workflow is a specific commit and git repository (for releases, this would be the release manager's fork of python/cpython).

Using this implementation I was able to verify that the contents of the Python 3.12.0 source tarballs match exactly with what was built by the 3.12 release manager Thomas Wouters, and that my own build used the same commit as Python 3.12.0 (0fb18b0) using SLSA verifier:

# Unzip the artifacts from GitHub Actions
$ unzip artifacts.zip
$ unzip artifacts.intoto.jsonl.zip

# Use slsa-verifier to verify the workflow and inputs.
# The flags are described here, ahead of the command:
#   --source-uri, --source-branch: the workflow used and the
#     branch. In a real run this would be
#     'python/release-tools' and 'main'.
#   --build-workflow-input: the inputs to the 'workflow_dispatch'
#     event when the build was triggered:
#       git_commit: we verify the git commit for CPython
#       git_remote: we verify the git remote used (in this case
#         'Yhg1s' is Thomas Wouters' GitHub account)
#       cpython_release: and finally we verify that the release
#         being built is 3.12.0. This feeds into the git tag
#         that's checked out and the name of the tarballs.
$ slsa-verifier verify-artifact \
    --provenance-path artifacts.intoto.jsonl \
    --source-uri github.com/sethmlarson/release-tools \
    --source-branch build-source-tar \
    --build-workflow-input git_commit=0fb18b02c8ad56299d6a2910be0bab8ad601ef24 \
    --build-workflow-input git_remote=Yhg1s \
    --build-workflow-input cpython_release=3.12.0 \
    src/*

Verified signature against tlog entry index 39982623 at URL: ...
Verified build using builder ...
Verifying artifact src/Python-3.12.0.tar.xz: PASSED

Verified signature against tlog entry index 39982623 at URL: ...
Verified build using builder ...
Verifying artifact src/Python-3.12.0.tgz: PASSED

# Success! 🥳
PASSED: Verified SLSA provenance

Now that we've verified the provenance of these "reference" builds, we can check their contents against the actual builds of CPython:

# Make two directories to make comparing the
# two tarballs' contents possible.
$ mkdir a/ b/

# Download the Python tarball from python.org
# and verify its checksum
$ wget https://www.python.org/ftp/python/3.12.0/Python-3.12.0.tgz
$ echo '51412956d24a1ef7c97f1cb5f70e185c13e3de1f50d131c0aac6338080687afb  Python-3.12.0.tgz' | sha256sum --check
Python-3.12.0.tgz: OK

# Extract the two tarballs into directories a and b.
# Remember the one in 'a/' is the reference, 'b/' is actual.
$ tar -xzvf src/Python-3.12.0.tgz -C a/
$ tar -xzvf Python-3.12.0.tgz -C b/

# Do a recursive content diff. Unfortunately we can't
# check the tarball checksums directly because metadata
# will be different between the two. No output means
# the two directories are identical in content! 🥳
$ git diff --no-index a/ b/

Success! The tgz source tarball contents match exactly what was created by the reference build. Comparing the tar.xz contents is left as an exercise to the reader. ;)
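The same content comparison can also be sketched in Python without extracting anything: hash each file inside both tarballs and compare the digests by member name, which sidesteps the differing archive metadata. This is only an illustration with hypothetical helper names. It compares the names and contents of regular files and ignores directories, symlinks, and file modes. Because `tarfile` detects the compression itself, the same function would apply to the tar.xz files.

```python
import hashlib
import tarfile


def member_digests(tarball_path: str) -> dict[str, str]:
    """Map each regular file in a tarball to the SHA-256 of its contents."""
    digests = {}
    # Mode "r:*" lets tarfile detect gzip, xz, or bz2 compression itself.
    with tarfile.open(tarball_path, mode="r:*") as archive:
        for member in archive:
            if not member.isfile():
                continue  # directories and links are skipped
            extracted = archive.extractfile(member)
            digest = hashlib.sha256()
            # Read in 1 MiB chunks to avoid loading large files whole.
            for chunk in iter(lambda: extracted.read(1 << 20), b""):
                digest.update(chunk)
            digests[member.name] = digest.hexdigest()
    return digests


def content_differences(reference_path: str, actual_path: str) -> list[str]:
    """Return sorted member names that are missing from one side or differ."""
    reference = member_digests(reference_path)
    actual = member_digests(actual_path)
    return sorted(
        name
        for name in reference.keys() | actual.keys()
        if reference.get(name) != actual.get(name)
    )
```

An empty list from `content_differences("src/Python-3.12.0.tgz", "Python-3.12.0.tgz")` would correspond to the empty `git diff --no-index` output above.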

These source tarballs are important because they're the "source of truth" for many installs of Python, especially downstream distributions like Debian or Fedora. These source tarballs also get used automatically by pyenv when compiling from source. Because the tarballs can't currently be verified against the public commit or tag, code injected during this build would leave no trace in the public commit, which makes this type of attack much tougher to detect.

Tracking how Python subcomponents change using SBOMs

Even though the Python APIs haven't changed, there's been lots of movement below the surface. We can use Software Bill-of-Materials (SBOMs) to track the subcomponents of a software distribution and how they change between releases.

I created an SBOM for Python 3.12.0 and then compared the components against the ones included in Python 3.11.6. Comparing the two SBOM documents revealed the differences between the two release streams:

[Figure: SBOM components of CPython 3.11.6 compared with CPython 3.12.0, as listed in the graphic.]

CPython 3.11.6: mpdecimal 2.5.1, expat 2.5.0, macholib 1.0, pip 23.2.1, libb2 0.98.1, tiny_sha3, libffi 1.20, setuptools 65.5.0, certifi 2023.7.22, chardet 5.1.0, requests 2.31.0, urllib3 1.26.16, cachecontrol 0.13.1, truststore 0.8.0, and 18 more; packaging 23.1, ordered-set 3.1.1, more_itertools 8.8.0, jaraco.text 3.7.0, zipp 3.7.0, tomli 2.0.1, and 2 more.

CPython 3.12.0: mpdecimal 2.5.1, expat 2.5.0, macholib 1.0, pip 23.2.1, libb2 0.98.1, hacl-star, certifi 2023.7.22, chardet 5.1.0, requests 2.31.0, urllib3 1.26.16, cachecontrol 0.13.1, truststore 0.8.0, and 18 more.

The graphic uses diff notation: red marks removals and green marks additions.
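The comparison behind the graphic boils down to a set difference over component names. Here is a rough sketch, assuming each SBOM has already been reduced to a mapping of component name to version. Parsing the SBOM documents themselves is left out, `diff_components` is a hypothetical helper, and the two mappings are a subset of the entries in the graphic, transcribed by hand.

```python
def diff_components(old: dict[str, str], new: dict[str, str]) -> dict[str, dict]:
    """Classify components as removed, added, or changed between releases."""
    return {
        "removed": {name: old[name] for name in old.keys() - new.keys()},
        "added": {name: new[name] for name in new.keys() - old.keys()},
        "changed": {
            name: (old[name], new[name])
            for name in old.keys() & new.keys()
            if old[name] != new[name]
        },
    }


# A subset of the entries shown in the graphic, typed in by hand.
# A real comparison would read them from the two SBOM documents.
# An empty string means the graphic shows no version.
cpython_3_11_6 = {
    "mpdecimal": "2.5.1", "expat": "2.5.0", "macholib": "1.0",
    "pip": "23.2.1", "libb2": "0.98.1", "tiny_sha3": "",
    "libffi": "1.20", "setuptools": "65.5.0",
}
cpython_3_12_0 = {
    "mpdecimal": "2.5.1", "expat": "2.5.0", "macholib": "1.0",
    "pip": "23.2.1", "libb2": "0.98.1", "hacl-star": "",
}

changes = diff_components(cpython_3_11_6, cpython_3_12_0)
print(sorted(changes["removed"]))  # ['libffi', 'setuptools', 'tiny_sha3']
print(sorted(changes["added"]))    # ['hacl-star']
```

Keeping "changed" separate from "removed" and "added" means a version bump of an existing component doesn't show up as a removal plus an addition.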

Let's dive into those changes:

Seeing mostly removals and replacements with secure implementations is a great sign! Nice work, core developers. I'm hoping to get more visibility into the macOS and Windows installers using SBOMs as well, so changes there can also be tracked.

Other items

That's all for this week! 👋 If you're interested in more you can read next week's report or last week's report.



This work is licensed under CC BY-SA 4.0