sethmlarson.dev

A blog about Python and the Internet


Experimental APIs in Python 3.10 and the future of trust stores

Published 2021-11-27

⚠️ The APIs mentioned below aren't documented in the Python docs or release notes and are experimental. I recommend not using them until they are stable.

In Python 3.10.0 a few new APIs were added to the ssl module related to certificate chains, but they weren't listed in the Python 3.10 release notes because they're experimental. I discovered these APIs because I follow the ssl module closely, since any change to the module is likely to create a feature request or additional test case for urllib3.

Let's see the new APIs in action and how their addition could mean a bright future for trust stores in Python:

Using the new APIs

To see these APIs in action let's create a typical TLS connection to example.com:

>>> import socket
>>> import ssl
>>>
>>> ctx = ssl.create_default_context()
>>> sock = socket.create_connection(("example.com", 443))
>>> sock = ctx.wrap_socket(sock, server_hostname="example.com")
>>>
>>> sock
<ssl.SSLSocket fd=3, family=AddressFamily.AF_INET6, ...>

Nothing out of the ordinary here yet. Having looked at the PR that added these APIs, I can tell you they're not yet available on an SSLSocket object directly. Instead they're exposed on ssl.SSLObject, which is a reduced-scope variant of SSLSocket meant to be used as an interface for TLS with memory buffers.

But we can access the internal SSLObject instance on an SSLSocket via the _sslobj property:

>>> sslobj = sock._sslobj
>>> sslobj
<_ssl._SSLSocket object at 0x7f167d8a5c40>

From here we can start poking around with the new certificate APIs. The following are the new APIs I know about:

import _ssl

_ssl.ENCODING_PEM = 1 # enum
_ssl.ENCODING_DER = 2 # enum

_ssl.Certificate # class

# see: SSLSocket.getpeercert(binary_form=True)
_ssl.Certificate.public_bytes(encoding: int) -> Union[bytes, str]

# see:  SSLSocket.getpeercert(binary_form=False)
_ssl.Certificate.get_info() -> Dict[str, Any]

ssl.SSLObject.get_unverified_chain() -> List[_ssl.Certificate]
ssl.SSLObject.get_verified_chain() -> List[_ssl.Certificate]

If we try using these APIs we can get the following information:

>>> unverified_chain = sslobj.get_unverified_chain()
>>> verified_chain = sslobj.get_verified_chain()

# In our case, 'unverified_chain' and 'verified_chain'
# are the same; we'll discuss the difference below.
>>> assert unverified_chain == verified_chain

# The chains go in order from leaf -> root.
# verified_chain[0] is the same as sock.getpeercert()
>>> assert sock.getpeercert(True) == \
        verified_chain[0].public_bytes(_ssl.ENCODING_DER)

# The individual certificates in the chain have two methods:
# .get_info() and .public_bytes(encoding)
>>> verified_chain[0].public_bytes(_ssl.ENCODING_PEM)
'-----BEGIN CERTIFICATE-----\nMIIG1TC ...'

# Using _ssl.ENCODING_DER is the same as sock.getpeercert(True)
>>> verified_chain[0].public_bytes(_ssl.ENCODING_DER)
b'0\x82\x06\xd50\x82 ... \x05\x0f\xe3E#\xc0d_'

# _ssl.Certificate.get_info() is the same as sock.getpeercert(False)
>>> verified_chain[0].get_info()
{
  "OCSP": ["http://ocsp.digicert.com"],
  "caIssuers": ["http://cacerts.digicert.com/..."],
  "crlDistributionPoints": [
    "http://crl3.digicert.com/...crl",
    "http://crl4.digicert.com/...crl"
  ],
  "issuer": [
    [["countryName", "US"]],
    [["organizationName", "DigiCert Inc"]],
    [["commonName", "DigiCert TLS RSA SHA256 2020 CA1"]]
  ],
  "notAfter": "Dec 25 23:59:59 2021 GMT",
  "notBefore": "Nov 24 00:00:00 2020 GMT",
  "serialNumber": "0FBE08B0854D05738AB0CCE1C9AFEEC9",
  "subject": [
    [["countryName", "US"]],
    [["stateOrProvinceName", "California"]],
    [["localityName", "Los Angeles"]],
    [["organizationName", "..."]],
    [["commonName", "www.example.org"]]
  ],
  "subjectAltName": [
    ["DNS", "www.example.org"],
    ["DNS", "example.com"],
    ["DNS", "example.edu"],
    ["DNS", "example.net"],
    ["DNS", "example.org"],
    ["DNS", "www.example.com"],
    ["DNS", "www.example.edu"],
    ["DNS", "www.example.net"]
  ],
  "version": 3
}

What is new with these APIs?

Before Python 3.10 the only certificate information we could gather from an SSLSocket was from the leaf certificate via the getpeercert() method. The complete certificate chain that was sent during the handshake wouldn't be available from Python. This meant that only the leaf certificate could be used in trust decisions from applications.

With these new APIs applications and libraries can make trust decisions with the entire cert chain. Root CA pinning and using systems besides OpenSSL for trust decisions are now possible in Python! 🎉

To use a separate API for verifying cert chains we can configure an ssl.SSLContext to not verify certificates during the handshake by setting SSLContext.verify_mode = ssl.CERT_NONE, and instead use the get_unverified_chain() method to capture and forward all certificates to the separate certificate-verification API:

import socket
import ssl

# Disable cert verification (enabled by default)
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

# Handshake as normal, still set `server_hostname` for SNI
sock = socket.create_connection(("example.com", 443))
sock = ctx.wrap_socket(sock, server_hostname="example.com")
cert_chain = sock._sslobj.get_unverified_chain()

# Use a different API to verify certificates:
verify_cert_chain_api(cert_chain)

Difference between get_verified_chain() and get_unverified_chain() methods

Despite similar names and return types, the get_verified_chain() and get_unverified_chain() methods have two distinct uses. The get_verified_chain() method uses OpenSSL's SSL_get0_verified_chain function. The documentation of this function says that it returns "the verified certificate chain of the peer including the peer's end entity certificate". The get_unverified_chain() method uses the SSL_get_peer_cert_chain() OpenSSL function, which returns a certificate chain that is not verified.

The difference between a verified and unverified chain is whether the chain is the minimal number of certificates between the target entity certificate and a trust anchor in the trust store. All certificates within a verified chain are valid, unexpired, and (as far as OpenSSL knows) not revoked. An unverified chain can include certificates that aren’t valid, are expired or revoked, or simply not necessary to create a chain of trust.

For example, suppose the server provides four certificates during the TLS handshake (named L, A, B, and C), where C is in the trust store as a trust anchor and A and B are both intermediate certificates signed by C. The L certificate is the leaf certificate used by the server and is signed by A. Here's an ASCII-art diagram of the situation:

                ┌───────┐   ┌───────┐
                │       │   │       │
                │       │   │       │
                │   A   ├───►   L+  │
    ┌───────┐   │       │   │       │
    │       ├───►       │   │       │
    │       │   └───────┘   └───────┘
    │   C*  │
    │       │   ┌───────┐
    │       ├───►       │     Legend
    └───────┘   │       │   ──────────
                │   B   │   ─► Signs
                │       │   +  Entity/leaf
                │       │   *  Trusted
                └───────┘

This means that B is not necessary to create a chain of trust for the handshake. In this case the get_verified_chain() method would return L, A, and C while the get_unverified_chain() method would return L, A, B, and C.

Why are OS trust stores better?

Why are OS trust stores superior to OpenSSL on platforms where they're available? OS trust store APIs handle many more things automatically in the background, which makes the experience better for both application developers and system operators.

Windows automatically downloads missing intermediate certificates when they’re detected and keeps the trust store up to date automatically via the Windows update process. This means that a new installation of Windows will still be able to make TLS handshakes without first manually downloading or updating certificates.

Windows and macOS both check certificate revocation lists (CRL) for certificates in a chain. If you’re using CRLs with OpenSSL you’ll have to implement the functionality yourself instead of getting it for free from the OS.

For full details on these APIs you can read the documentation on Secure Channel and CryptoAPI for Windows and Secure Transport and Security framework for macOS.

What’s next for trust stores in Python?

Python's ssl module is married to the OpenSSL API in multiple places. Even on Windows and macOS, Python ships its own copy of OpenSSL for the ssl module to use. Because of this Python's trust store APIs are similarly OpenSSL-centric, allowing you to specify "verify locations" as a file, directory, or in-memory certificate data.
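
For reference, here's a minimal sketch of what that OpenSSL-centric configuration looks like today (the load_verify_locations() calls are commented out because the paths are placeholders):

import ssl

# Where OpenSSL looks for certificates by default on this system:
print(ssl.get_default_verify_paths())

ctx = ssl.create_default_context()

# "Verify locations" can be a bundle file, a hashed directory, or in-memory data:
# ctx.load_verify_locations(cafile="/path/to/ca-bundle.pem")
# ctx.load_verify_locations(capath="/path/to/ca-dir")
# ctx.load_verify_locations(cadata=pem_string_or_der_bytes)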

Non-Python native applications are unlikely to use this method; instead they defer to the operating system's trust store implementation to verify certificates, establish a TLS connection, or send an HTTP request over HTTPS.

Prior art in PEP 543

PEP 543 put forward a proposal for new TLS APIs for Python which are implementation agnostic and not tied to OpenSSL, opening the door for alternate TLS implementations. The PEP contains a lot of discussion about trust stores, and I want to credit Cory Benfield and all the authors and reviewers for their work on the PEP.

The problems listed in the PEP are mostly in the same state as they were in 2016 when the PEP was proposed. I recommend reading the PEP for historical context.

The end of certifi

Certifi is a repackaging of Mozilla's CA bundle meant to be a stopgap for Windows and macOS not having a single CA bundle to configure OpenSSL with. Certificates are bundled into the certifi package, and a single API, certifi.where(), returns the on-disk location of those bundled certificates, typically somewhere within your venv/lib/python/.../site-packages/... directory.
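
For illustration, the pattern looks roughly like this (certifi.where() is the real API; wiring it into an SSLContext is just one common example):

import ssl
import certifi

# The on-disk path of the CA bundle shipped inside the certifi package:
print(certifi.where())

# Applications then point OpenSSL at that per-application bundle:
ctx = ssl.create_default_context(cafile=certifi.where())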

This seems like a fine solution at first, but from the perspective of a system operator it's a nightmare. You now have tons of different CA bundles that are tough to track, usually per-application, and can easily get out of date. This is another reason it's a big win when all applications use a single OS trust store instead of one trust store per application.

A small experimental package written by Python core developer Christian Heimes tries to solve this problem. The package certifi-system-store will rewrite the dist-info of the certifi package to point to the "actual" OS trust store instead of certifi's bundled trust store. However this solution is experimental, requires running a command after installation, and only works on Linux, a platform that's already covered by skipping certifi and using ssl.create_default_context() instead.

The future is OS trust stores

The APIs mentioned above will likely stabilize and be available in a future Python version. My hope is before Python 3.10+ becomes pervasive there will be an effort to implement OS trust stores such that they can be used seamlessly by libraries and applications. We have many developers (myself included) that would be interested in helping make certificate verification better for everyone in our ecosystem.

Tests aren’t enough: Case study after adding type hints to urllib3

Published 2021-10-18

Since Python 3.5 was released in 2015 with PEP 484 and the typing module, type hints have grown from a nice-to-have into an expectation for popular packages. To fulfill this expectation our team has committed to shipping type hints for the v2.0 milestone. What we didn't realize is the amount of value we'd derive from this project in terms of code correctness.

We wanted to share the journey, what we learned, and problems we encountered along the way as well as celebrate this enormous milestone for the urllib3 project.

Hasan Ramezani has taken on the mantle of being the primary contributor to urllib3's type hints along with help from other team members and community contributors Ran Benita, Ezzeri Esa, Franek Magiera, and Quentin Pradet. Thanks to them for all their work!

Why type hints?

Python doesn't have a concept of "type safety": if a function accepts a string parameter and you pass an integer instead, the language will not complain (although the function likely won't work as intended!). Even when using type hints, the "type safety" they provide is completely opt-in via tools like Mypy or Pyright. You can keep incorrectly passing an integer parameter; your tools just won't be very happy about it.
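
For example, nothing stops this program from running, but a type checker will flag the bad call (a tiny sketch with a made-up greet() function):

def greet(name: str) -> str:
    return "Hello, " + name

# Python itself won't complain about the call below until it fails at runtime,
# but Mypy or Pyright will report something like:
#   error: Argument 1 to "greet" has incompatible type "int"; expected "str"
# greet(42)
print(greet("world"))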

Tests aren't a substitute for type hints

When we originally started this journey our primary goal was to provide a good experience for users who wanted to use type checking tools with urllib3. We weren’t expecting to find and fix many defects.

After all, urllib3 has over 1800 test cases, 100% test line coverage, and is likely some of the most pervasive third-party Python code in the solar system.

Despite all of the above, the inclusion of type checking to our CI has yielded many improvements to our code that likely wouldn't have been discovered otherwise. For this reason we recommend projects evaluate adding type hints and strict type checking to their development workflow instead of relying on testing alone.

For a great visual explanation of how types help catch issues differently than tests see this PyCon Cleveland 2018 talk by Carl Meyer.

How we developed and iterated on type hints

This section should serve as a guide to developers looking to add type hints to a medium-to-large size project. This effort took many months for our team to complete so make sure to allocate enough time.

Incrementally adding types to existing projects

We wanted to add Mypy to our continuous integration to ensure new contributors wouldn’t accidentally break type hints but we also wanted to add Mypy incrementally so it wouldn’t grind all our development to a halt.

If you've ever tried running Mypy on a single file in a project without any type hints you're unlikely to only receive errors from the file you ran Mypy on. Instead you're likely to receive type errors from the files you import other APIs from, both within the project and from third-party modules.

This makes adding type hints seem like a daunting task, either needing to be added all at once or with tons of temporary # type: ignore annotations along the way to ensure Mypy continues to pass in CI.

Our solution to this was to maintain a list of files in our project that we knew had correct type hints. Mypy would be run one by one on every file and the list of issues that were detected by Mypy would be gathered up, de-duplicated, and crucially we’d filter out files that weren’t on the “known-good” list.

This meant that once a file was complete we’d add the path to the known-good list to ensure that future contributions wouldn’t regress on our type hints.
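
Here's a rough sketch of that approach (the file names and paths are made up, and it assumes Mypy is installed and runnable from the project root):

import glob
import subprocess
import sys

# Files we know already have correct, complete type hints.
KNOWN_GOOD = {
    "src/mylib/connection.py",
    "src/mylib/exceptions.py",
}

issues = set()
for path in glob.glob("src/**/*.py", recursive=True):
    # Run Mypy one file at a time and gather everything it reports.
    result = subprocess.run(["mypy", path], capture_output=True, text=True)
    for line in result.stdout.splitlines():
        # Mypy error lines look like "path/to/file.py:123: error: ..."
        reported_path = line.split(":", 1)[0]
        # De-duplicate and keep only errors for files on the known-good list.
        if ": error:" in line and reported_path in KNOWN_GOOD:
            issues.add(line)

print("\n".join(sorted(issues)))
sys.exit(1 if issues else 0)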

Reviewing type hint additions

Reviewing large diffs in GitHub, especially ones where very small changes are made to large numbers of lines, is a difficult task because of how little context is given within the GitHub UI by default (usually 2-3 lines above and below the diff). Take your time here as a mistake may leak out to users if you aren’t adding types to your test suite too.

If it wasn't obvious why a # type: ignore annotation was added, the author would add a comment on GitHub to let reviewers know why the decision was made. This made for less back-and-forth on individual changed lines.

Type your tests!

Once we completed the addition of type hints to the source code of urllib3 we set our sights on the test suite. This may seem strange, as our users are very unlikely to directly benefit from us adding type hints there; however there's one big benefit: we're able to find issues with types from a user's perspective!

In our case the test suite contained more use-cases than what we originally thought for multiple APIs which meant we had to change or loosen the type hints in those cases. Doing this work up-front meant we were less likely to release type hints that were too strict which would likely cause issues for users.

Backwards-compatibility types

Many times we had to make a decision about whether to advertise support for a certain type, especially types which were allowed for backwards compatibility but not what we want users to start using in newly written code. Consciously excluding a type is a good way to push users in the right direction without introducing breaking changes.

Use strict mode if possible

Mypy includes a --strict parameter which enables all the optional error checking flags. This provides the best coverage of typing errors and also means when you upgrade Mypy you’ll automatically start checking errors that were added in the latest version.

Remember to pin the version of Mypy you’re using for your project so you won’t be caught flat-footed when CI starts failing due to new type errors being checked in a new Mypy release.
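
For example (the version number is only an illustration; pin whatever your CI currently uses):

$ python -m pip install "mypy==0.910"  # pinned so new checks don't surprise CI
$ mypy --strict src/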

Specify error codes on type ignore annotations

Instead of adding a blanket annotation to ignore all type issues, every type: ignore annotation should specify the error codes to narrow down the error Mypy is ignoring. This means the code will continue to be checked for all other errors, and if the error ever changes Mypy will be able to flag the new situation.

You can see which error code to use by using the --show-error-codes option with Mypy.
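
Here's one illustrative pattern under --strict (the function is made up; no-any-return is a real Mypy error code):

import json
from typing import Any, Dict

def load_config(text: str) -> Dict[str, Any]:
    # json.loads() returns Any; under --strict, returning an Any-typed value
    # from a function with a concrete return type is flagged as [no-any-return].
    # Naming the error code keeps every other check on this line active:
    return json.loads(text)  # type: ignore[no-any-return]

print(load_config('{"debug": true}'))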

Reference: urllib3#2363

Anything but Any

Using typing.Any in a particularly complicated type situation is a tempting option. Resist the easy out of using Any, because complicated situations for you are likely to translate into complicated situations for your users.

Python typing has come a long way since it was introduced: read up on the new features available for modeling complex types and try your best to keep Any out of your code.

What we found and learned

Our whole team learned a bunch about how Mypy and Python typing works during this project. Below are some of the interesting issues and features that we found that are worthy of sharing:

Bytes comparison warnings

Python includes a warning called “BytesWarning” which among other uses can warn against using equality (==) to compare bytes and string types. This warning can help you find subtle type issues in your own code and third-party libraries.

Quentin attempted to enable this feature for urllib3 and immediately we saw some issues with urllib3 code and the brotlicffi package (brotlicffi#177). The fixes in urllib3 (urllib3#2145) were mostly related to how we handle headers: we accept both bytes and strings for header names, but this leads to issues when header retrieval occurs.

After fixing all the issues we were able to enable the -bb option which raises an error for bytes comparisons instead of only issuing a warning.
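
A minimal reproduction of the kind of bug this catches (a standalone sketch, not urllib3's code):

# compare.py -- run as `python -b compare.py` (warning) or `python -bb compare.py` (error)
header_name = b"content-type"

# Always False: bytes never compare equal to str, which silently breaks lookups
print(header_name == "content-type")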

Adding type hints to trustme

urllib3 uses the package trustme for generating realistic CA certificates on-demand for our test suite. As a part of the effort to add type hints to our test suite we also wanted to add type hints to packages used by our test suite to avoid using # type: ignore.

The trustme package at the time supported Python 2 so we got to use Mypy’s Python 2 mode and Python 2 compatible type hints using comments.
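
Those comment-style hints keep the file valid Python 2 syntax while still being checked by Mypy; a tiny sketch (the function is made up, not trustme's API):

def issue_cert(common_name, key_size=2048):
    # type: (str, int) -> bytes
    # The type information lives in a comment instead of annotations.
    return b"-----BEGIN CERTIFICATE-----..."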

Ran Benita created a pull request (trustme#341) adding the types which was reviewed, merged, and released by Quentin Pradet. It’s a small world!

Untested but documented feature with Retry.allowed_methods

Comparing the types to the documentation for a parameter helped us discover a feature that didn’t have a test case but was documented within a docstring. After we discovered this we added the case to our test suite to ensure we never regressed on our advertised feature.

Reference: urllib3#2215

Bad default parameter values

This case shows how few people are using our PoolManager.connection_from_X() APIs, as the default value for the only parameter would immediately cause an exception. Mypy helped us find and fix this issue which had been silently missed in our codebase for some time.

Reference: urllib3#2232

Mypy providing better protections for method=None case

Mypy alerted us to a missing check that method was non-None before it was passed as a function parameter. Previously this behavior was only protected by knowing how the function should be called.

Reference: urllib3#2215

Boolean literals and context managers

Mypy has special handling for returning boolean literals from the context manager __exit__() method. If your __exit__() method returns a raw True or False you must annotate with Literal[True] or Literal[False] as using bool signals that the context manager may or may not swallow exceptions.
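
In practice that looks like this (a minimal sketch, not urllib3's actual class):

from types import TracebackType
from typing import Literal, Optional, Type

class NeverSwallows:
    def __enter__(self) -> "NeverSwallows":
        return self

    def __exit__(
        self,
        exc_type: Optional[Type[BaseException]],
        exc_value: Optional[BaseException],
        traceback: Optional[TracebackType],
    ) -> Literal[False]:
        # Literal[False] (rather than bool) tells Mypy this context manager
        # never suppresses exceptions raised inside the `with` block.
        return False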

Reference: urllib3#2232

Mypy-friendly code is also human-friendly

Mypy signaled on a line of code that was difficult for even a human to understand from a quick glance. By simplifying the code it became easier for both humans and Mypy alike to understand what was intended.

Reference: urllib3#2251

Function signatures not matching mimicked API

Mypy found a mismatch between the APIs of our custom SSLTransport class which is meant to be used like a socket during TLS-in-TLS tunneling and the socket API:

Reference: urllib3#2443

Making types more general should require additional test cases

Whenever loosening a type from strict to general make sure your test suite grows to cover that case as well. In our example we loosened socket_options from List[...] to Sequence[...] as technically passing a tuple was acceptable and being done by some users.

Reference: urllib3#2232

Use @overload for filtered value types

Instead of using f(x: Union[int, str]) -> Union[int, str] take advantage of the @overload decorator to define the instances where certain input types always result in a certain output type for each case. This allows Mypy to give much better results when the interface is being used.
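
A condensed sketch of the pattern (the function is made up for illustration):

from typing import Union, overload

@overload
def double(value: int) -> int: ...
@overload
def double(value: str) -> str: ...

def double(value: Union[int, str]) -> Union[int, str]:
    # int in -> int out, str in -> str out. The @overload declarations let Mypy
    # give callers the precise type instead of an ambiguous Union.
    return value * 2

count = double(21)    # Mypy infers int
label = double("ab")  # Mypy infers str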

Reference: urllib3#2251

Explicit return None when a function can return other values

It's always good practice to explicitly return None when you intend to assign the result of the function to a variable. Mypy helpfully enforces this good practice when there are values being returned in other parts of the function but the default "drop through" return None isn't explicitly added.

Reference: urllib3#2255

Don’t expose Generators unless you want Generator functionality

Generators have additional behaviors over iterables so if the API isn’t meant to be used like a generator then it’s best to keep this fact a secret and annotate with Iterable[X] instead of Generator[X, None, None].
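
For example (a made-up helper, not urllib3's API):

from typing import Iterable

def read_chunks(data: bytes, size: int = 3) -> Iterable[bytes]:
    # Implemented as a generator, but advertised only as an Iterable so callers
    # can't rely on generator-only behavior like .send() or .close().
    for i in range(0, len(data), size):
        yield data[i : i + size]

print(list(read_chunks(b"abcdefgh")))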

Reference: urllib3#2255

Missing type hints in the standard library

Some of the less-traveled parts of the standard library don't have complete type coverage. These types are distributed in a library called typeshed, so if you uncover a missing or incorrect type hint for the standard library it should be fixed there.

References: typeshed#6176, urllib3#2458

Conclusion

Adding type hints to urllib3 was clearly a huge amount of work, hundreds of engineer hours across several months. What we once thought would be a purely developer-facing change ended up making the codebase more robust than ever. Several non-trivial logic errors were fixed and our team is more confident reviewing and merging PRs. This is a big win for our users and a very worthy investment.

Portions of this work, including writing this case study, were funded by generous support on GitHub Sponsors, GitCoin Grants, and Open Collective. Thank you for your support!

The problem with Flask async views and async globals

Published 2021-08-01

Starting in v2.0 Flask has added async views which allow using async and await within a view function. This allows you to use other async APIs when building a web application with Flask.

If you're planning on using Flask's async views there's a consideration to be aware of when using globally defined API clients or fixtures that are async.

Creating an async Flask application

In this example we're using the async Elasticsearch Python client as our global fixture. We initialize a simple Flask application with a single async view that makes a request with the async Elasticsearch client:

# app.py
from flask import Flask, jsonify
from elasticsearch import AsyncElasticsearch

app = Flask(__name__)

# Create the AsyncElasticsearch instance in the global scope.
es = AsyncElasticsearch(
    "https://localhost:9200",
    api_key="..."
)

@app.route("/", methods=["GET"])
async def async_view():
    return jsonify(**(await es.info()))

Run the app with Gunicorn via $ gunicorn app:app and then visit it at http://localhost:8000. After the first request everything looks fine:

{
  "cluster_name": "d31d9d6abb334a398210484d7ac8567b",
  "cluster_uuid": "K5uyniiMT9u2grNBmsSt_Q",
  "name": "instance-0000000001",
  "tagline": "You Know, for Search",
  "version": {
    "build_date": "2021-04-20T20:56:39.040728659Z",
    "build_flavor": "default",
    "build_hash": "3186837139b9c6b6d23c3200870651f10d3343b7",
    "build_snapshot": false,
    "build_type": "docker",
    "lucene_version": "8.8.0",
    "minimum_index_compatibility_version": "6.0.0-beta1",
    "minimum_wire_compatibility_version": "6.8.0",
    "number": "7.13.1"
  }
}

However when you refresh the page to send a second request you receive an InternalError and the following traceback:

Traceback (most recent call last):
  ...
  File "/app/app.py", line 13, in async_route
    return jsonify(**(await es.info()))
  File "/app/venv/lib/...", line 288, in info
    return await self.transport.perform_request(
  File "/app/venv/lib/...", line 327, in perform_request
    raise e
  File "/app/venv/lib/...", line 296, in perform_request
    status, headers, data = await connection.perform_request(
  File "/app/venv/lib/...", line 312, in perform_request
    raise ConnectionError("N/A", str(e), e)

ConnectionError(Event loop is closed) caused by:
  RuntimeError(Event loop is closed)

Why is this happening?

The error message mentions the event loop is closed, huh? To understand why this is happening you need to know how AsyncElasticsearch is implemented and how async views in Flask work.

No global event loops

Async code relies on something called an event loop, so any code using async or await can't execute without a "running" event loop. The unfortunate thing is that there's no running event loop right when you start Python (i.e. in the global scope).

This is why you can't have code that looks like this:

async def f():
    print("I'm async!")

# Can't do this!
await f()

Instead you have to use asyncio.run() and typically an async main/entrypoint function to use await, like so:

import asyncio

async def f():
    print("I'm async!")

async def main():
    await f()

# asyncio starts an event loop here:
asyncio.run(main())

(There's an exception to this via python -m asyncio / IPython, but really this is running the REPL after starting an event loop)

So if you need an event loop to run any async code, how can you define an AsyncElasticsearch instance in the global scope?

How AsyncElasticsearch allows global definitions

The magic that makes global definitions of AsyncElasticsearch possible is delaying the full initialization, calling asyncio.get_running_loop(), creating the aiohttp.ClientSession, sniffing, etc., until after we've received our first async call. Once an async call is made we can almost guarantee that there's a running event loop, because if there weren't a running event loop the request wouldn't work out anyways.
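
The pattern looks roughly like this (a simplified sketch, not the client's actual implementation):

import asyncio

class LazyAsyncClient:
    # A sketch of the "delay initialization until the first await" pattern.

    def __init__(self, host):
        # Safe to run in the global scope: nothing touches the event loop yet.
        self.host = host
        self._loop = None

    async def _ensure_initialized(self):
        # Deferred until the first awaited call, when a loop must be running.
        if self._loop is None:
            self._loop = asyncio.get_running_loop()
            # (a real client would also create its HTTP session here, and that
            # session becomes bound to this event loop)

    async def info(self):
        await self._ensure_initialized()
        return {"host": self.host}

client = LazyAsyncClient("https://localhost:9200")  # fine at import time
print(asyncio.run(client.info()))  # a running loop only needs to exist here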

This is great for async programs especially, as typically a single event loop gets used throughout a single execution of the program, and it means you can create your AsyncElasticsearch instance in the global scope just like users create their synchronous Elasticsearch client in the global scope.

Using multiple event loops is tricky and would likely break many other libraries like aiohttp in the process for no (?) benefit, so we don't support this configuration. Now how does this break when used with Flask's new async views?

New event loop per async request

The simple explanation is that Flask uses WSGI to service HTTP requests and responses which doesn't support asynchronous I/O. Asynchronous code requires a running event loop to execute, so Flask needs to get a running event loop from somewhere in order to execute an async view.

To do so, Flask will create a new event loop and start running the view within this new event loop for every execution of the async view. This means all the async and await calls within the view will see the same event loop, but any other request before or after this view will see a different event loop.
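
Conceptually each request does something like this (a simplification of Flask's actual machinery, which uses asgiref under the hood):

import asyncio

async def async_view():
    ...  # the body of your async Flask view

def handle_wsgi_request():
    # A fresh event loop is created for this single view call, then discarded.
    return asyncio.run(async_view())

# Two requests, two different event loops: an async client that bound itself
# to the first loop will break on the second request.
handle_wsgi_request()
handle_wsgi_request()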

The trouble comes when you want to use async fixtures that are in the global scope, which in my experience is common in small to medium Flask applications. Very unfortunate situation! So what can we do?

Fixing the problem

The problem isn't with Flask or the Python Elasticsearch client, the problem is the incompatibility between WSGI and async globals. There are a couple of solutions, both of which involve Async Server Gateway Interface (ASGI), WSGI's async-flavored cousin which was designed with async programs in mind.

Use an ASGI framework and server

One way to avoid the problem with WSGI completely is to simply use a native ASGI web application framework instead. There are a handful of popular and widely used ASGI frameworks you can choose from:

If you're looking for an experience that's very similar to Flask you can use Quart which is inspired by Flask. Quart even has a guide about how to migrate from a Flask application to using Quart! Flask's own documentation for async views actually recommends using Quart in some cases due to the performance hit from using a new event loop per request.

If you're looking to learn something new you can check out FastAPI which includes a bunch of builtin functionality for documenting APIs, strict model declarations, and data validation.

Something to keep in mind when developing an ASGI application is you need an ASGI-compatible server. Common choices include Uvicorn, Hypercorn, and Daphne. Another option is to use Gunicorn with Uvicorn workers.

All the options mentioned above function pretty similarly so pick whichever one you like. My personal choice has historically been Gunicorn with Uvicorn workers because of how widely used and mature Gunicorn is relative to how new the other libraries are.

You can do so like this:

$ gunicorn app:app -k uvicorn.workers.UvicornWorker

Use WsgiToAsgi from asgiref

If you really love Flask and want to continue using it you can also use the asgiref package, which provides an easy wrapper called WsgiToAsgi that converts a WSGI application into an ASGI application.

from flask import Flask, jsonify
from elasticsearch import AsyncElasticsearch

# Same definition as above...
wsgi_app = Flask(__name__)
es = AsyncElasticsearch(
    "https://localhost:9200",
    api_key="..."
)

@wsgi_app.route("/", methods=["GET"])
async def async_view():
    return jsonify(**(await es.info()))

# Convert the WSGI application to ASGI
from asgiref.wsgi import WsgiToAsgi

asgi_app = WsgiToAsgi(wsgi_app)

In this example we're converting the WSGI application wsgi_app into an ASGI application asgi_app which means when we run the application a single event loop will be used for every request instead of a new event loop per request.

This approach will still require you to use an ASGI-compatible server.
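
For example, if the snippet above is saved as app.py, one option is to point Gunicorn's Uvicorn worker at the converted asgi_app object (the module and attribute names simply follow the example above):

$ gunicorn app:asgi_app -k uvicorn.workers.UvicornWorker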

Everything to know about Requests v2.26.0

Published 2021-07-13

Requests v2.26.0 is a large release which changes and removes many features and dependencies that you should know about when upgrading. Read on to find out all about the changes and what you should do if you're a user of Requests.

Summary of the release

What changes are important?

  • Changed the requests[security] extra into a no-op. Can be safely removed for v2.24.0+ for almost all users (OpenSSL 1.1.1+ and not relying on specific features in pyOpenSSL)
  • Dropped support for Python 3.5
  • Changed encoding detection library for Python 3.6+ from chardet to charset_normalizer
  • Added support for Brotli compressed response bodies

What should you do now?

  • Upgrade to Requests v2.26.0 if you're not using Python 3.5
  • Stop using requests[security] and instead install just requests
  • Regenerate your lock files and pinned dependencies if you're using pip-tools, poetry, or pipenv
  • Read the full set of changes for v2.26.0

Encoding detection with charset_normalizer

The following section has a brief discussion of licensing issues. Please remember that I am not a lawyer and don't claim to understand anything about open source licenses.

Requests uses character detection on response bodies in order to reliably decode bytes to str when responses don't declare which encoding to use via Content-Type. This feature only gets used when you access the Response.text API.
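
For example (the URL is a placeholder; apparent_encoding exposes the detection library's guess):

import requests

r = requests.get("https://example.com/")

print(r.encoding)           # encoding taken from the Content-Type header, if any
print(r.apparent_encoding)  # what chardet/charset_normalizer detects from the body
print(r.text[:80])          # .text falls back to detection when no charset was declared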

The library that Requests uses for content encoding detection has for the past 10 years been chardet which is licensed LGPL-2.1.

The LGPL-2.1 license is not a problem for almost all users, but an issue arises with statically linked Python applications which are pretty rare but becoming more common. When Requests is bundled with a static application users can no longer "switch out" chardet for a different library which causes a problem with LGPL.

Starting in v2.26.0 for Python 3 the new default library for encoding detection will be charset_normalizer, which is MIT licensed. The library itself is relatively young, so a lot of work has gone into making sure users aren't broken by this change, including extensive tests against real-life websites and comparing the results against chardet to ensure better performance and accuracy in every case.

Requests will continue to use chardet if the library is installed in your environment. To take advantage of charset_normalizer you must uninstall chardet from your environment. If you want to continue using chardet with Requests on Python 3 you can install chardet or install Requests using requests[use_chardet_on_py3]:

$ python -m pip install requests chardet

- OR -

$ python -m pip install requests[use_chardet_on_py3]

Removed the deprecated [security] extra

Before Requests v2.24.0 the pyOpenSSL implementation of TLS was used by default if it was available. This pyOpenSSL code is packaged along with urllib3 as a way to use Server Name Indication (SNI) when Python was compiled against super-old OpenSSL versions that didn't support it yet.

Thankfully these super-old versions of OpenSSL aren't common at all anymore! The pyOpenSSL code that urllib3 provides is now a lot less useful and has become a maintenance burden for our team, so we have a long-term plan to eventually remove it. The biggest dependency using this code was Requests, a logical first place to start the journey.

Starting in Requests v2.24.0 pyOpenSSL wouldn't be used unless Python wasn't compiled with TLS support (ie, no ssl module) or if the OpenSSL version that Python was compiled against didn't support SNI. Basically the two rare scenarios where pyOpenSSL was actually useful!

The release of v2.24.0 came and went quietly, which signaled to our team that our long-term plan of actually removing pyOpenSSL would likely go smoothly. So in Requests v2.25.0 we officially deprecated the requests[security] extra, and in v2.26.0 the [security] extra becomes a no-op: instead of installing pyOpenSSL and cryptography, it installs no additional dependencies.

What this means for you is if you've got a list of dependencies that previously used requests[security] you can remove the [security] and only install requests. If you have a lock file via a tool like pip-tools or poetry you can regenerate the lock file and potentially see pyOpenSSL and cryptography removed from your lock file. Woo!

Dropped support for Python 3.5

Starting in Requests v2.25.0 there was a notice for Python 3.5's deprecation in the changelog. Now that 2.26.0 has arrived Requests will only be supported with Python 2.7.x and 3.6+.

This is a big win for Requests maintainers as it progressively becomes more and more difficult to maintain a codebase that supports a wide range of Python versions.

Brotli support via urllib3

Since v1.25 urllib3 has supported automatically decoding Brotli-compressed HTTP response bodies using either Google's brotli library or the brotlicffi library (previously named brotlipy).

Before v2.26.0 Requests would never emit an Accept-Encoding header with br signaling Brotli support even if urllib3 would have been able to decode the response. Now Requests will use urllib3's feature detection for Brotli and emit Accept-Encoding: gzip, deflate, br. This is great news for servers that support Brotli on pre-compressed static resources like fonts, CSS, and JavaScript. Woo!

To take advantage of Brotli decoding you need to install one of the Brotli libraries mentioned above. You can ensure you're getting the right library for your Python implementation by installing like so:

$ python -m pip install urllib3[brotli]

urllib3 Newsletter #5

Published 2021-06-29

Fifth newsletter, commence! If you'd like to discuss this edition of our newsletter you can join our community Discord.

Thanks to our Sponsors

The urllib3 team is very grateful for all of our sponsors and supporters. If you'd like to support our team we have a GitHub Sponsors, GitCoin Grants, and Open Collective.

Notable updates to our sponsors include:

GitCoin Grant Round 10 included urllib3 which has raised >$2000 so far! 🎉

NewRelic started sponsoring our team on GitHub Sponsors 👏

We paid someone to work on Open Source

David Lord, who is known for his work on Flask, Jinja, and other Pallets projects, worked on one of our v2.0 issues related to how we encode fields into the URL. We wanted to modernize how urllib3 does things; you'd think that wouldn't be too tough... However it took a ton of time to unravel what urllib3 was doing and why it had deviated from the current WHATWG HTML standard. You can read all of the discussion and discoveries that went into untangling this pile of standards spaghetti and code archaeology.

The most exciting part of all this is that this is the first time we've paid a contributor who's not a part of our team to work on Open Source, woohoo! 🥳

If you're interested in getting paid to work on urllib3 v2.0 issues you can join our Discord or reach out to the team and we'll walk you through everything. We're also working on making issues which we're willing to pay for work much more visible.

urllib3 v1.26.6 released

We've released another patch for the urllib3 v1.26.x series. This release included a few fixes for small bugs but also a larger change: deprecating the urllib3.contrib.ntlmpool module. More on that below.

Quentin has been working on migrating the downstream integration tests that are run before every urllib3 release, which have been defunct for some time now, from Travis to GitHub Actions. This will greatly reduce the amount of manual work required to release urllib3 and drastically reduce maintainer stress, thanks Quentin! 🙇

Quentin and I also did the release together this time around and we've created a complete checklist to make executing releases by other collaborators easier.

Deprecating NTLMConnectionPool in v1.26.6

The urllib3.contrib.ntlmpool module will now unconditionally raise a DeprecationWarning pointing users to a specific issue where we justify this change and we'd like for users to comment if they're actually relying on the module.

The module itself was contributed a long time ago and hasn't had many issues, pull requests, or maintenance and we actually don't have any test cases so we're not even sure how well it works anymore...

Given that NTLM has been deprecated for 10 years we'd like to remove the module in v2.0 but aren't sure if it should live somewhere else or if it should be deleted completely. Please let us know!

CVE-2021-33503

A security vulnerability was reported by Nariyoshi Chida in our URL parser. We coordinated with Nariyoshi and our Tidelift security contact to verify the vulnerability and provide a suitable fix for the issue and released v1.26.5 which included the fix.

Read the full GitHub Security Advisory for more information.

New collaborators and contributors

We've invited a few of our contributors to become collaborators on the project after consistent high-quality contributions. Welcome Bastian Venthur and Ran Benita! Thank you for everything you've done so far for urllib3 👏

We also had many first time contributors in the past month after a couple of tweets brought in a bunch of new faces. Thanks to everyone who contributed! If you're interested in getting started contributing to urllib3 we announce all the new "Contributor Friendly" issues in the community Discord.

If you enjoyed these posts there's more where that came from!
You can subscribe via Email and RSS