Python and SLSA 💃

Published 2023-03-18 by Seth Larson
Reading time: 10 minutes

Supply-chain Levels for Software Artifacts (SLSA) is a framework of tools to generate and verify provenance for software artifacts. In the Python ecosystem there are two main types of software artifacts: wheels and source distributions.

How can we use the SLSA framework to generate and verify the provenance of Python artifacts?


NOTE: This article primarily covers Python projects which are hosted on GitHub. The SLSA framework works out of the box with GitHub Actions and GitHub OpenID Connect with minimal configuration. You can use the SLSA framework without GitHub, but doing so will potentially require more configuration.

Below is what the end-to-end workflow looks like for both maintainers and users: building the distributions, creating a provenance attestation, publishing to PyPI, and installing a wheel after verifying its provenance. Let's walk through each step together!

If you're curious about terminology or processes for Python packaging, the Python Packaging User Guide is the definitive place to learn more.

[Diagram: three stages. Build: Source Code, python -m build, cibuildwheel, Source Distribution & Wheels, sha256sum *. Publish: SLSA GitHub Builder, Provenance Attestation, GitHub Release, PyPI Publish GitHub Action, PyPI Release. Verify & Install: pip download (Wheel), curl (Provenance Attestation), SLSA Verifier, Verified, pip install.]

End-to-end flow from building dists, generating and verifying provenance, and installing with pip.

Building pure Python packages

Pure Python packages typically only have two artifacts, a source distribution and a pure Python wheel. Pure Python packages can be built from source using a package called build.

Below is the GitHub Actions job definition that builds a pure Python wheel and source distribution and creates SHA-256 hashes of each artifact:

jobs:
  build:
    runs-on: ubuntu-latest
    outputs:
      # Expose the step output so the provenance job can read it
      hashes: ${{ steps.hash.outputs.hashes }}
    steps:
      - uses: actions/checkout@...
      - uses: actions/setup-python@...
        with:
          python-version: 3.x
      - id: hash
        run: |
          # Install 'build' and create sdist & wheel
          python -m pip install build
          python -m build

          # Gather hashes of all files
          cd dist && echo "hashes=$(sha256sum * | base64 -w0)" >> $GITHUB_OUTPUT
      - uses: actions/upload-artifact@...
        with:
          name: dist
          path: ./dist

The built distributions get uploaded to GitHub Artifacts storage to be used later in the "upload to PyPI" job. We also store the hashes of each distribution in the hashes output so they can be used as an input to the provenance job.

NOTE: SLSA uses the output of sha256sum as the input to the "subject" field in the provenance attestation. The output of sha256sum is one or more (name, hash) pairs, so we run cd dist/ && sha256sum * to avoid having dist/... in the subject name of each artifact. This is purely for aesthetic reasons, as the hash is the actual identifier for the artifact.
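To make the shape of the hashes value concrete, here is a minimal Python sketch of what the shell pipeline produces and how it can be decoded back into (name, hash) pairs. The function names are hypothetical and exist only for illustration; the real workflow uses sha256sum and base64 directly.

```python
import base64
import hashlib
from pathlib import Path


def sha256sum_lines(dist_dir: str) -> str:
    """Mimic `cd dist && sha256sum *`: one '<hex digest>  <name>' line per file."""
    lines = []
    for path in sorted(Path(dist_dir).iterdir()):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            # Use only the file name so subjects don't carry a 'dist/' prefix
            lines.append(f"{digest}  {path.name}")
    return "".join(line + "\n" for line in lines)


def encode_subjects(dist_dir: str) -> str:
    """Base64-encode the listing without line wrapping, like `base64 -w0`."""
    return base64.b64encode(sha256sum_lines(dist_dir).encode()).decode()


def decode_subjects(encoded: str) -> list[tuple[str, str]]:
    """Recover (name, sha256) pairs from the encoded value."""
    pairs = []
    for line in base64.b64decode(encoded).decode().splitlines():
        digest, name = line.split(maxsplit=1)
        pairs.append((name, digest))
    return pairs
```

Calling decode_subjects(encode_subjects("dist")) should give back one (name, hash) pair per file in dist/, which is the information that ends up in the attestation's subject field.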

Generating a provenance attestation

Now that we've built our sdist and wheel we can generate a provenance attestation from the file hashes. Since we're taking the output of another GitHub Action job as an input we configure the needs option for the provenance job. See the hashes being used below in subject-base64:

jobs:
  provenance:
    needs: [build]
    uses: slsa-framework/slsa-github-generator/.github/workflows/generator_generic_slsa3.yml@v1.5.0
    permissions:
      # Needed to detect the GitHub Actions environment
      actions: read
      # Needed to create the provenance via GitHub OIDC
      id-token: write
      # Needed to create and upload to GitHub Releases
      contents: write
    with:
      # SHA-256 hashes of the Python distributions.
      subject-base64: ${{ needs.build.outputs.hashes }}
      # Uploads the provenance file to a draft GitHub Release.
      upload-assets: true

You'll notice that this doesn't define any individual steps as is typical for a GitHub workflow. Instead, SLSA builders use the reusable workflows feature in order to prove that a given builder's behavior can't be modified by the user or another process.

Provenance attestation files are in-toto JSON lines files that end in .intoto.jsonl. A *.intoto.jsonl file can contain attestations for multiple artifacts (as will almost always be the case for Python due to sdist/wheels) but can also contain multiple provenance attestations within the same file. The .jsonl format means that this file is a "JSON lines" file, meaning one JSON document per line.
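As a rough illustration of the "one JSON document per line" layout, the sketch below lists the subjects found in such a file. It assumes each line is a DSSE envelope whose base64-encoded payload field decodes to an in-toto statement with a subject list; other generator versions may wrap the statement differently, so treat the field names as assumptions. The function name iter_subjects is hypothetical.

```python
import base64
import json


def iter_subjects(jsonl_text: str):
    """Yield (name, sha256) for every subject in every attestation line.

    Assumption: each non-empty line is a DSSE envelope whose base64 'payload'
    decodes to an in-toto statement containing a 'subject' list.
    """
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines between documents
        envelope = json.loads(line)
        statement = json.loads(base64.b64decode(envelope["payload"]))
        for subject in statement.get("subject", []):
            yield subject["name"], subject["digest"]["sha256"]
```

Listing subjects this way is only useful for inspection. It does not check any signature, so it is not a substitute for verifying the attestation with slsa-verifier.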

NOTE: Confusingly the GitHub job permission "id-token" requires the "write" permission to read the GitHub OIDC token. A value of "read" doesn't allow you to read the value... 🤷. See the GitHub documentation for more info on the "id-token" job permission.

Uploading to PyPI

[Diagram: Publish stage. Source Distribution & Wheels, sha256sum *, SLSA GitHub Builder, Provenance Attestation, GitHub Release, PyPI Publish GitHub Action, PyPI Release.]

We're using the official pypa/gh-action-pypi-publish GitHub Action to upload our wheels to PyPI.

Notice that the publish job requires both the build and provenance jobs to be complete before starting. This means that we can assume that the provenance job has already created a GitHub Release draft for us (thanks to the upload-assets: true setting) and we can assume that the job succeeded. We wouldn't want to upload these distributions to PyPI without creating a provenance file first so we upload to PyPI last.

publish:
  needs: ["build", "provenance"]
  permissions:
    contents: write
  runs-on: "ubuntu-latest"
  steps:
  # Download the built distributions
  - uses: "actions/download-artifact@..."
    with:
      name: "dist"
      path: "dist/"
  # Upload distributions to the GitHub Release
  - env:
      GITHUB_TOKEN: "${{ secrets.GITHUB_TOKEN }}"
    run: |
      gh release upload ${{ github.ref_name }} \
        dist/* --repo ${{ github.repository }}
  # Publish distributions to PyPI
  - uses: "pypa/gh-action-pypi-publish@..."
    with:
      user: __token__
      password: ${{ secrets.PYPI_TOKEN }}

Verifying provenance of a Python package

[Diagram: Verify & Install stage. PyPI Release, pip download (Wheel), GitHub Release, curl (Provenance Attestation), SLSA Verifier, Verified, pip install.]

Let's take a real Python package and verify its provenance. urllib3 has published a provenance attestation to GitHub Releases for the past few releases. Let's use v1.26.14 for this example. To start, we need to download the slsa-verifier tool for our architecture (the examples below use `linux-amd64`).

After we download the slsa-verifier tool let's fetch the urllib3 wheel from PyPI without installing it using pip download. We use the --only-binary option to force pip to download a wheel rather than a source distribution.

$ python -m pip download \
  --only-binary=:all: urllib3==1.26.14

Collecting urllib3==1.26.14
  ...
Saved ./urllib3-1.26.14-py2.py3-none-any.whl
Successfully downloaded urllib3

After we've downloaded the package we need to download the provenance attestation from the GitHub Release. We need to use the same GitHub Release as the package version to receive the correct attestation:

$ curl --location -O \
  https://github.com/urllib3/urllib3/releases/download/1.26.14/multiple.intoto.jsonl

NOTE: The -O option for curl saves the response data into a file based on the requested path, in this case the data will be saved to multiple.intoto.jsonl. The --location flag means that curl will follow any redirects.

The name multiple.intoto.jsonl is the standard name for a provenance attestation that contains attestations for multiple artifacts. This will almost always be the case for Python projects since there is almost always a source distribution and at least one wheel.

At this point we should have two files in our current working directory, the wheel and the provenance attestation. Let's take a quick peek just to make sure:

$ ls

multiple.intoto.jsonl
urllib3-1.26.14-py2.py3-none-any.whl

Looking good so far!

From here we can verify the provenance using slsa-verifier. The most important thing to verify is which GitHub repository actually built the wheel, but we can also verify other bits of information like the git tag, branch, and builder ID.

To verify the source repository of the wheel we use --source-uri:

# Verifying just the GitHub repository
$ slsa-verifier-linux-amd64 verify-artifact \
    --provenance-path multiple.intoto.jsonl \
    --source-uri github.com/urllib3/urllib3 \
    urllib3-1.26.14-py2.py3-none-any.whl

Verified signature against tlog entry index 10945443 at URL: https://rekor.sigstore.dev/api/v1/log/entries/24296fb24b8ad77aa7bf06593bb44761ad6ef9f7f0f9ac8fe028a52cc6f926039cdef1652a064bc1
Verified build using builder https://github.com/slsa-framework/slsa-github-generator/.github/workflows/generator_generic_slsa3.yml@refs/tags/v1.2.1 at commit f96a1cfc568beddf1e17ce7609609eca40780be5
PASSED: Verified SLSA provenance

Success! 🥳 We've verified this wheel's provenance so now we can install it knowing that it was built as we expect:

$ python -m pip install ./urllib3-1.26.14-py2.py3-none-any.whl

Processing ./urllib3-1.26.14-py2.py3-none-any.whl
Installing collected packages: urllib3
Successfully installed urllib3-1.26.14
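If you want to script the "verify, then install" steps, something like the following Python sketch could work. The helper names are hypothetical, the verifier binary name matches the linux-amd64 download used above, and the optional --source-tag flag is assumed to be supported by the slsa-verifier version you downloaded.

```python
import subprocess
import sys


def build_verify_command(artifact, provenance, source_uri, source_tag=None,
                         verifier="slsa-verifier-linux-amd64"):
    """Assemble the slsa-verifier command line shown above as an argv list."""
    cmd = [
        verifier, "verify-artifact",
        "--provenance-path", provenance,
        "--source-uri", source_uri,
    ]
    if source_tag is not None:
        # Optionally pin the git tag the artifact was built from
        cmd += ["--source-tag", source_tag]
    cmd.append(artifact)
    return cmd


def verify_then_install(artifact, provenance, source_uri, source_tag=None):
    """Install the wheel only if slsa-verifier exits successfully."""
    # check=True raises CalledProcessError when verification fails,
    # so the pip install below is never reached for an unverified wheel.
    subprocess.run(
        build_verify_command(artifact, provenance, source_uri, source_tag),
        check=True,
    )
    subprocess.run([sys.executable, "-m", "pip", "install", artifact], check=True)
```

For the example above the call would be verify_then_install("urllib3-1.26.14-py2.py3-none-any.whl", "multiple.intoto.jsonl", "github.com/urllib3/urllib3"). Passing an argv list rather than a shell string avoids quoting issues with file names.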

Binary Python wheels

Python wheels are not all pure Python! One of Python's greatest strengths, being the glue language for C, C++, Fortran, Rust, Go (and more), complicates Python's packaging and provenance story compared to some other programming languages.

Binary wheels need to be built for multiple platforms and architectures, and if the project isn't able to use the Stable ABI then new wheels need to be compiled for every new Python version. To get a sense of how many wheels are needed by projects in this situation you can take a look at MarkupSafe, which shipped nearly 50 wheels for v2.1.2. When Python 3.12 is released there will be at least 10 more to cover all the platforms for that version as well.
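The difference between pure Python and binary wheels is visible in the wheel file name, which encodes Python, ABI, and platform tags. Below is a minimal sketch that splits a file name into those tags using only the standard library. The function names are hypothetical, and the binary wheel file name in the usage note is only an illustrative example of the naming format.

```python
def parse_wheel_filename(filename: str) -> dict:
    """Split '{name}-{version}[-{build}]-{python}-{abi}-{platform}.whl' into tags."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, python, abi, platform = parts
    elif len(parts) == 6:
        # The optional build tag sits between the version and the Python tag
        name, version, _build, python, abi, platform = parts
    else:
        raise ValueError(f"unexpected wheel file name: {filename!r}")
    return {
        "name": name,
        "version": version,
        # Compressed tag sets are joined with '.', e.g. 'py2.py3'
        "python_tags": python.split("."),
        "abi_tags": abi.split("."),
        "platform_tags": platform.split("."),
    }


def is_pure_python(filename: str) -> bool:
    """A pure Python wheel has the 'none' ABI tag and the 'any' platform tag."""
    tags = parse_wheel_filename(filename)
    return tags["abi_tags"] == ["none"] and tags["platform_tags"] == ["any"]
```

With this, is_pure_python("urllib3-1.26.14-py2.py3-none-any.whl") is true, while a name such as "MarkupSafe-2.1.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl" is tied to one CPython version and one platform family, which is why each new Python release needs a fresh set of wheels.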

[Diagram: Build stage for binary wheels. Source Code, python -m build --sdist, cibuildwheel running on GitHub Actions for Linux, macOS, and Windows, Source Distribution & Wheels.]

Unfortunately this means we'll need to create new artifacts some time after the initial release, which doesn't have a straightforward story in SLSA. There are two solutions to this problem:

Creating a new release

The simplest way is to create a new release for the package after updating the cibuildwheel action to the latest version so that it supports all the new wheel targets. This will create a new provenance attestation, sdist, and wheels. However, this feels like an unfortunate outcome if nothing about the source code of the project has changed.

Compared to new wheels, new releases cause much more churn:

Provenance for post-release wheels

So what can be done if we don't want to create a new release just for provenance of new wheels?

MarkupSafe's solution for this is adding a manual workflow_dispatch trigger to run the typical wheel-building workflows, but configured to only build new wheels for a given Python version (i.e. cp312 for CPython 3.12). These wheels then get attested to with SLSA and uploaded to PyPI like normal, but the new provenance attestation file gets added to the existing GitHub release artifacts.

This works in that all artifacts on PyPI have a provenance attestation. There are a few downsides:

Another potential solution to the first point above is to create a completely new provenance attestation containing hashes of both old and new artifacts. I'll be experimenting with this more and, if successful, will provide guidance. Stay tuned!

Projects in use

These are all the GitHub Actions and tools that this article uses:

The following projects are used as examples of how to configure Python releases with SLSA in GitHub Actions:

Diagrams

All the SVG diagrams above are available on GitHub and were made with drawio.
