Published 2023-03-18 by Seth Michael Larson
Reading time: 10 minutes
Supply-chain Levels for Software Artifacts (SLSA) is a framework of tools to generate and verify provenance for software artifacts. In the Python ecosystem there are two main types of software artifacts: wheels and source distributions.
How can we use the SLSA framework to generate and verify the provenance of Python artifacts?
NOTE: This article primarily covers Python projects hosted on GitHub. The SLSA framework works out of the box with GitHub Actions and GitHub OpenID Connect with minimal configuration. You can use the SLSA framework without GitHub, but doing so will likely require more configuration.
Below is what the end-to-end workflow looks like for both maintainers and users: building the distributions, creating a provenance attestation, publishing to PyPI, and installing a wheel after verifying its provenance. Let's walk through each step together!
If you're curious about terminology or processes for Python packaging, the Python Packaging User Guide is the definitive place to learn more.
End-to-end flow from building dists, generating and verifying provenance, and installing with pip.
Pure Python packages typically only have two artifacts: a source distribution and a pure Python wheel. Pure Python packages can be built from source using a package called build.
Below is the GitHub Actions job definition that builds a pure Python wheel and source distribution and creates SHA-256 hashes of each artifact:
jobs:
  build:
    runs-on: ubuntu-latest
    outputs:
      hashes: ${{ steps.hash.outputs.hashes }}
    steps:
      - uses: actions/checkout@...
      - uses: actions/setup-python@...
        with:
          python-version: "3.x"
      - id: hash
        run: |
          # Install 'build' and create sdist & wheel
          python -m pip install build
          python -m build
          # Gather hashes of all files
          cd dist && echo "hashes=$(sha256sum * | base64 -w0)" >> $GITHUB_OUTPUT
      - uses: actions/upload-artifact@...
        with:
          name: dist
          path: ./dist
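To make the `hashes` output less opaque, here's a sketch in Python of what the `sha256sum * | base64 -w0` pipeline produces (the function name and directory layout are illustrative, not part of the workflow):

```python
import base64
import hashlib
from pathlib import Path

def hash_dist_dir(dist_dir: str) -> str:
    """Mimic `cd dist && sha256sum * | base64 -w0`: one
    "<hex digest>  <filename>" line per artifact, base64-encoded
    into a single value safe to pass between GitHub Actions jobs."""
    lines = []
    for artifact in sorted(Path(dist_dir).iterdir()):
        digest = hashlib.sha256(artifact.read_bytes()).hexdigest()
        # sha256sum separates the digest and filename with two spaces.
        lines.append(f"{digest}  {artifact.name}")
    return base64.b64encode("\n".join(lines).encode()).decode()
```

Base64-decoding the value recovers the (hash, name) pairs that become the subjects of the provenance attestation.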
The built distributions get uploaded to GitHub Artifacts storage to be used later in the "upload to PyPI" job. We also store the hashes of each distribution in the hashes output so they can be used as an input to the provenance job.
NOTE: SLSA uses the output of sha256sum as the input to the "subject" field in the provenance attestation. The output of sha256sum is one or more (name, hash) pairs, so cd dist && sha256sum * is done to avoid having dist/... in the subject name of each artifact. This is purely for aesthetic reasons, as the hash is the actual identifier for the artifact.
Now that we've built our sdist and wheel we can generate a provenance attestation from the file hashes.
Since we're taking the output of another GitHub Actions job as an input, we configure the needs option for the provenance job. See the hashes being used below in base64-subjects:
jobs:
  provenance:
    needs: [build]
    permissions:
      # Needed to detect the GitHub Actions environment
      actions: read
      # Needed to create the provenance via GitHub OIDC
      id-token: write
      # Needed to create and upload to GitHub Releases
      contents: write
    uses: slsa-framework/slsa-github-generator/.github/workflows/generator_generic_slsa3.yml@v1.5.0
    with:
      # SHA-256 hashes of the Python distributions.
      base64-subjects: ${{ needs.build.outputs.hashes }}
      # Uploads the provenance file to a draft GitHub Release.
      upload-assets: true
You'll notice that this job doesn't define any individual steps as is typical for a GitHub workflow. Instead, SLSA builders use the reusable workflows feature to prove that a given builder's behavior can't be modified by the user or another process.
Provenance attestation files are in-toto JSON lines files that end in .intoto.jsonl. The .jsonl extension means the file is in the "JSON lines" format: one JSON document per line. A *.intoto.jsonl file can contain attestations for multiple artifacts (as will almost always be the case for Python due to sdists and wheels) and can also contain multiple provenance attestations within the same file.
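To make the file format concrete, here's a minimal Python sketch of pulling the subjects out of such a file. It assumes each line is a DSSE envelope whose "payload" field is a base64-encoded in-toto statement (real attestations also carry a payloadType and signatures, which this sketch ignores):

```python
import base64
import json

def subjects_from_jsonl(path: str) -> list[tuple[str, str]]:
    """Extract (name, sha256) subject pairs from a provenance
    attestation file: one DSSE envelope JSON document per line,
    each wrapping a base64-encoded in-toto statement."""
    subjects = []
    with open(path) as f:
        for line in f:  # one JSON document per line
            envelope = json.loads(line)
            statement = json.loads(base64.b64decode(envelope["payload"]))
            for subject in statement["subject"]:
                subjects.append((subject["name"], subject["digest"]["sha256"]))
    return subjects
```

This only inspects the contents; actually trusting the subjects requires verifying the envelope's signatures, which is slsa-verifier's job.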
NOTE: Confusingly, the GitHub job permission "id-token" requires the "write" permission to read the GitHub OIDC token. A value of "read" doesn't allow you to read the value... 🤷 See the GitHub documentation for more info on the "id-token" job permission.
We're using the official pypa/gh-action-pypi-publish GitHub Action to upload our wheels to PyPI.
Notice that the publish job requires both the build and provenance jobs to complete successfully before starting. This means we can assume that the provenance job has already created a GitHub Release draft for us (thanks to the upload-assets: true setting). We wouldn't want to upload these distributions to PyPI without creating a provenance file first, so we upload to PyPI last.
publish:
  needs: ["build", "provenance"]
  permissions:
    contents: write
  runs-on: "ubuntu-latest"
  steps:
    # Download the built distributions
    - uses: "actions/download-artifact@..."
      with:
        name: "dist"
        path: "dist/"
    # Upload distributions to the GitHub Release
    - env:
        GITHUB_TOKEN: "${{ secrets.GITHUB_TOKEN }}"
      run: |
        gh release upload ${{ github.ref_name }} \
          dist/* --repo ${{ github.repository }}
    # Publish distributions to PyPI
    - uses: "pypa/gh-action-pypi-publish@..."
      with:
        user: __token__
        password: ${{ secrets.PYPI_TOKEN }}
Let's take a real Python package and verify its provenance. urllib3 has published a provenance attestation to GitHub Releases for the past few releases. Let's use v1.26.14 for this example. To start we need to download the slsa-verifier tool for our architecture (using `linux-amd64` in the below examples).
After we download the slsa-verifier tool, let's fetch the urllib3 wheel from PyPI without installing it using pip download. We use the --only-binary option to force pip to download a wheel rather than a source distribution.
$ python -m pip download \
--only-binary=:all: urllib3
Collecting urllib3
...
Saved ./urllib3-1.26.14-py2.py3-none-any.whl
Successfully downloaded urllib3
After we've downloaded the package we need to download the provenance attestation from the GitHub Release. We need to use the same GitHub Release as the package version to receive the correct attestation:
$ curl --location -O \
https://github.com/urllib3/urllib3/releases/download/1.26.14/multiple.intoto.jsonl
NOTE: The -O option for curl saves the response data into a file named after the requested path; in this case the data will be saved to multiple.intoto.jsonl. The --location flag means that curl will follow any redirects.
The name multiple.intoto.jsonl is the standard name for a provenance attestation file that contains attestations for multiple artifacts. This will almost always be the case for Python projects since there is almost always a source distribution and at least one wheel.
At this point we should have two files in our current working directory, the wheel and the provenance attestation. Let's take a quick peek just to make sure:
$ ls
multiple.intoto.jsonl
urllib3-1.26.14-py2.py3-none-any.whl
Looking good so far!
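slsa-verifier matches an artifact to an attestation subject by its SHA-256 digest. As a sanity check (not a substitute for real cryptographic verification) we can compute the same digest ourselves; this helper is a sketch, not part of slsa-verifier:

```python
import hashlib

def sha256_of(path: str) -> str:
    """Compute the hex SHA-256 digest of a file, reading in chunks
    so large wheels don't need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The value returned for the wheel should match the "sha256" digest of the matching subject inside multiple.intoto.jsonl.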
From here we can verify the provenance using slsa-verifier. We can verify the most important thing, which GitHub repository actually built the wheel, along with other bits of information:

- The source repository (--source-uri)
- The builder ID (--builder-id)
- The source branch (--source-branch)
- The source tag (--source-tag)

So if we wanted to verify the source repository of the wheel we use --source-uri:
# Verifying just the GitHub repository
$ slsa-verifier-linux-amd64 verify-artifact \
--provenance-path multiple.intoto.jsonl \
--source-uri github.com/urllib3/urllib3 \
urllib3-1.26.14-py2.py3-none-any.whl
Verified signature against tlog entry index 10945443 at URL: https://rekor.sigstore.dev/api/v1/log/entries/24296fb24b8ad77aa7bf06593bb44761ad6ef9f7f0f9ac8fe028a52cc6f926039cdef1652a064bc1
Verified build using builder https://github.com/slsa-framework/slsa-github-generator/.github/workflows/generator_generic_slsa3.yml@refs/tags/v1.2.1 at commit f96a1cfc568beddf1e17ce7609609eca40780be5
PASSED: Verified SLSA provenance
Success! 🥳 We've verified this wheel's provenance so now we can install it knowing that it was built as we expect:
$ python -m pip install ./urllib3-1.26.14-py2.py3-none-any.whl
Processing ./urllib3-1.26.14-py2.py3-none-any.whl
Installing collected packages: urllib3
Successfully installed urllib3-1.26.14
Python wheels are not all pure Python! One of Python's greatest strengths, being the glue language for C, C++, Fortran, Rust, Go (and more), complicates Python's packaging and provenance story compared to some other programming languages.
Binary wheels need to be built for multiple platforms and architectures, and if the project isn't able to use the Stable ABI then new wheels need to be compiled for every new Python version. To get a sense of how many wheels a project in this situation needs, take a look at MarkupSafe, which shipped nearly 50 wheels for v2.1.2. When Python 3.12 is released there will be at least 10 more wheels to cover all of its platforms.
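The wheel count comes from the compatibility tags baked into each filename. Here's a rough sketch of splitting a wheel filename into those tags per the {name}-{version}-{python}-{abi}-{platform}.whl convention (the real, robust logic lives in the packaging library; this ignores build tags and normalization):

```python
def wheel_tags(filename: str) -> dict[str, str]:
    """Split a wheel filename into its name, version, and the
    python/abi/platform compatibility tags that determine where
    the wheel can be installed."""
    parts = filename.removesuffix(".whl").split("-")
    return {
        "name": parts[0],
        "version": parts[1],
        "python": parts[-3],
        "abi": parts[-2],
        "platform": parts[-1],
    }
```

For example, urllib3-1.26.14-py2.py3-none-any.whl carries the tags py2.py3/none/any, one wheel installable everywhere, while a compiled project needs a distinct wheel per python/abi/platform combination.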
Unfortunately this means we'll need to create new artifacts some time after the initial release, which doesn't have a straightforward story in SLSA. There are two solutions to this problem:
The simplest way is to create a new release for the package after updating the cibuildwheel action to the latest version to support all the new wheel targets. This will create a new provenance attestation, sdist, and wheels. However, this feels like an unfortunate outcome if nothing about the source code of the project has changed.
Compared to publishing new wheels alone, cutting a new release causes much more churn for maintainers and users.
So what can be done if we don't want to create a new release just for provenance of new wheels?
MarkupSafe's solution for this is adding a manual workflow_dispatch trigger to run the typical wheel-building workflows, configured to only build new wheels for a given Python version (e.g. cp312 for CPython 3.12). These wheels then get attested to with SLSA and uploaded to PyPI like normal, but the new provenance attestation file gets added to the existing GitHub Release artifacts.
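A hypothetical trigger for this kind of after-the-fact wheel build might look like the following. The input name and the CIBW_BUILD wiring are illustrative, not MarkupSafe's exact workflow:

```yaml
on:
  workflow_dispatch:
    inputs:
      python-tag:
        description: "Only build wheels for this tag (e.g. cp312)"
        required: true

jobs:
  build-wheels:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@...
      - uses: pypa/cibuildwheel@...
        env:
          # Restrict cibuildwheel to the requested CPython version.
          CIBW_BUILD: "${{ github.event.inputs.python-tag }}-*"
```

The resulting wheels would then flow through the same hashing, provenance, and upload jobs as a normal release.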
This works in that all artifacts on PyPI have a provenance attestation. There are a few downsides:
The new provenance attestation is in a separate artifact on the GitHub Release rather than in the existing provenance attestation. This means that users looking to verify the provenance of the new wheels would likely need some manual discovery and steps to download the right provenance attestation file.
Typically new architectures or platforms require a new version of cibuildwheel. Since the source code must be updated to bump the cibuildwheel version in use, the git tag for the past release will not match the exact git commit needed to build the new wheels. This means that the provenance attestation won't include information about the git tag, so verifying with the --source-tag option won't work as expected.
Another potential solution to the first point above is to create a completely new provenance attestation containing hashes of both old and new artifacts. I'll be experimenting with this more and, if successful, will provide guidance. Stay tuned!
These are all the GitHub Actions and tools that this article uses:
The following projects are used as examples of how to configure Python releases with SLSA in GitHub Actions:
All the SVG diagrams above are available on GitHub and were made with drawio.