During the week of November 7th to 11th, 2022, Quentin Pradet and I took time off from our regular day jobs at Elastic to work full-time on urllib3 v2.0. If you enjoy this article, Quentin also wrote about his experience.
We decided to do this for multiple reasons, the biggest being that v2.0 development had stalled: there was too little reviewer time to push community PRs across the finish line and too little contiguous time to work on the more difficult remaining v2.0 tasks.
Taking a week together also meant we'd be able to review PRs and discuss roadblocks throughout the week without the chance of life getting in the way for reviewers (as it tends to do). This made for a super enjoyable week of open source development that you can't really replicate outside in-person sprints at conferences.
Without the generous support we receive from sponsors like Spotify we wouldn't be able to accomplish everything we did in the span of months, let alone a week. Thanks to everyone who supports our project!
Going into the week Quentin and I had the goal of completing all the tasks necessary to release the first alpha of urllib3 v2.0. We split up the tasks so we could work concurrently and coordinated a time each day to discuss and review each other's work from the previous day.
These were the three tasks I wanted to complete during the week:
We knew the first task in the list was going to be a large one because of how complicated the existing HTTPConnection API was as a subclass of the standard library http.client.HTTPConnection class.
The task was to design the API so it could be extended in the future to support other HTTP implementations beyond the one provided by the standard library.
This work was a follow-up to the community-contributed pull request to make urllib3's HTTPConnection.getresponse() method return an instance of urllib3.HTTPResponse instead of an http.client.HTTPResponse. Previously the urllib3.HTTPConnectionPool class would transform the standard library response class into our own response class, but that tied our own connection class very strongly to the standard library. Thanks to @shadycuz for contributing this change.
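To illustrate the shape of that change, here is a minimal sketch in which the connection, not the pool, converts the low-level response into the library's own response type. All class names here (RawResponse, LibraryResponse, SketchConnection, SketchPool) are hypothetical stand-ins and not urllib3's actual classes.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class RawResponse:
    """Hypothetical stand-in for a low-level response such as http.client.HTTPResponse."""
    status: int
    headers: Dict[str, str]
    body: bytes


@dataclass
class LibraryResponse:
    """Hypothetical stand-in for the library's own response type."""
    status: int
    headers: Dict[str, str]
    data: bytes


class SketchConnection:
    """Hypothetical connection that hides its low-level response type."""

    def __init__(self, raw: RawResponse) -> None:
        self._raw = raw  # pretend this arrived over the wire

    def getresponse(self) -> LibraryResponse:
        # The conversion happens inside the connection, so callers
        # never see the low-level response type.
        raw = self._raw
        return LibraryResponse(raw.status, dict(raw.headers), raw.body)


class SketchPool:
    """Hypothetical pool that only knows about LibraryResponse."""

    def __init__(self, conn: SketchConnection) -> None:
        self._conn = conn

    def urlopen(self) -> LibraryResponse:
        # No transformation step is needed here anymore.
        return self._conn.getresponse()
```

With this split, swapping in a connection backed by a different HTTP implementation would only require that its getresponse() return the same response type.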
Breaking down the task into pieces I was left with the following items to complete:
BaseHTTPConnection and BaseHTTPSConnection protocols which can be used for type-hinting our HTTPConnectionPool classes.
The full set of changes was made in this pull request.
This involved looking at HTTPConnectionPool, HTTPResponse, and our retries and utility functions.
I documented many of my findings over the first day in a notes/ directory. Information on the HTTPConnection lifecycle can be found in this document.
I decided to make the following changes to not limit future development of urllib3:
HTTPConnection.sock for the inner socket.socket instance.
to check whether the connection was alive, setting timeouts, etc.
HTTPSConnection.set_cert(), HTTPSConnection.tls_in_tls_required
HTTPSConnectionPool._prepare_conn() which only existed to call HTTPSConnection.set_cert().
preload_content and decode_content parameters explicitly to the HTTPConnection.request() method instead of using **kwargs
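As a rough illustration of why explicit parameters are safer than **kwargs, consider the sketch below. The two functions are hypothetical and only borrow the preload_content and decode_content names from the list above; they are not urllib3's real request() signature.

```python
def request_with_kwargs(method, url, **kwargs):
    # Hypothetical: options are fished out of **kwargs, so a misspelled
    # option name is silently ignored and the default is used instead.
    preload_content = kwargs.get("preload_content", True)
    decode_content = kwargs.get("decode_content", True)
    return preload_content, decode_content


def request_explicit(method, url, *, preload_content=True, decode_content=True):
    # Hypothetical: options are part of the signature, so type checkers
    # can see them and a misspelled option name raises TypeError.
    return preload_content, decode_content


# A typo goes unnoticed with **kwargs...
silently_wrong = request_with_kwargs("GET", "/", preload_contnet=False)

# ...but is rejected immediately with explicit keyword-only parameters.
try:
    request_explicit("GET", "/", preload_contnet=False)
    typo_error = None
except TypeError as exc:
    typo_error = exc
```

Explicit parameters also document the method's contract in one place, which matters once more than one connection implementation has to honor it.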
We have already been bitten hard by subclassing the standard library http.client.HTTPConnection class, so we knew we didn't want to make the same mistake again. You can read Hynek's excellent article on subclassing, composition, and structural typing to learn more. Highly recommended to all Pythonistas!
Instead I opted to use the new typing.Protocol feature to define the structural type which the urllib3.HTTPConnection and HTTPSConnection classes then implement. We can then change all type hints on HTTPConnectionPool classes to use the BaseHTTPConnection protocol instead of HTTPConnection to ensure none of the APIs being accessed are private or leaking from the parent class.
The typing.Protocol feature is only available in Python 3.8 and later, but I didn't want to wait for that version to be our earliest supported version (Python 3.7 still has ~6 months of support left), so I used typing_extensions and optional type hints to not cause issues for Python 3.7.
You can see the entire definition of the BaseHTTPConnection protocols in this file.
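For readers who haven't used structural typing before, here is a minimal sketch of the idea. The ConnectionLike protocol and FakeConnection class are hypothetical and far smaller than the real BaseHTTPConnection protocol; the import fallback is one common way to support Python 3.7, assuming typing_extensions is installed there.

```python
try:
    from typing import Protocol  # Python 3.8+
except ImportError:  # Python 3.7, assuming typing_extensions is installed
    from typing_extensions import Protocol


class ConnectionLike(Protocol):
    """Hypothetical protocol: only the members a pool is allowed to touch."""

    host: str
    port: int

    def request(self, method: str, url: str) -> None: ...

    def close(self) -> None: ...


class FakeConnection:
    """Satisfies ConnectionLike by shape alone; no inheritance involved."""

    def __init__(self, host: str, port: int) -> None:
        self.host = host
        self.port = port
        self.sent = []
        self.closed = False

    def request(self, method: str, url: str) -> None:
        self.sent.append((method, url))

    def close(self) -> None:
        self.closed = True


def send_and_close(conn: ConnectionLike) -> None:
    # A type checker only permits members declared on the protocol here,
    # so nothing inherited from a parent class can leak into this code.
    conn.request("GET", "/")
    conn.close()
```

Because FakeConnection never subclasses anything, a type checker accepts it wherever ConnectionLike is expected purely because its attributes and methods match.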
We currently use the "final enum value as typed sentinel" trick in order to represent the concept of a "default" timeout (i.e. socket.getdefaulttimeout()). It looked something like this previously:
import enum
import typing

class DEFAULT_TIMEOUT_TYPE(enum.Enum):
    token = 0

DEFAULT_TIMEOUT: typing.Final[DEFAULT_TIMEOUT_TYPE] = DEFAULT_TIMEOUT_TYPE.token

# The alias names the sentinel's type (the enum class), not the sentinel value.
TYPE_TIMEOUT = float | None | DEFAULT_TIMEOUT_TYPE

def request(..., timeout: TYPE_TIMEOUT = DEFAULT_TIMEOUT):
    ...
Eventually, on the HTTPConnection layer, the timeout value is either passed to socket.settimeout() if it's a float or None, or, if the user didn't give a value, the connection uses the configured default timeout via socket.getdefaulttimeout(). Can you imagine a scenario where the above logic might go wrong?
I ran into the problem while modifying the way HTTPConnection instances manage and update their socket's timeout value. If you run the following code there is no error:
sock = socket.create_connection(...)
sock.settimeout(DEFAULT_TIMEOUT) # There is no error here, the value is 0!
Now our socket has a timeout value of 0 instead of the value of socket.getdefaulttimeout(). Luckily this mistake was caught in tons of places throughout our test suite. In addition to fixing the problem where our default timeout sentinel was getting set directly as the socket timeout, I wanted to do one better and make this situation raise an error instead of silently moving on.
Looking at the socket.settimeout() documentation, it says: "The value argument can be a nonnegative floating point number expressing seconds, or None". Negative values aren't valid timeout values, so we can change the sentinel's token value to -1 to receive an error instead of the socket silently accepting our default timeout sentinel value:
class DEFAULT_TIMEOUT_TYPE(enum.Enum):
    token = -1  # This value was changed to -1

...

# Now setting the sentinel value directly in settimeout() raises an error.
>>> sock = socket.create_connection(...)
>>> sock.settimeout(DEFAULT_TIMEOUT)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: Timeout value out of range
This change improves nothing for end users because the behavior is the same. However, it makes our timeout code more robust to future changes, now that a situation we never want to occur raises an error instead of being silently accepted.
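The difference between the two underlying values can be checked without any network access. The sketch below uses the raw numbers 0 and -1 rather than the enum member itself, and an unconnected socket created only for the demonstration.

```python
import socket

# An unconnected socket is enough to exercise settimeout().
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    # Zero is accepted silently and puts the socket in non-blocking mode.
    sock.settimeout(0)
    zero_result = sock.gettimeout()

    # A negative value is rejected, which is what makes -1 a safer sentinel value.
    try:
        sock.settimeout(-1)
        negative_error = None
    except ValueError as exc:
        negative_error = exc
finally:
    sock.close()
```

Here zero_result ends up as 0.0, while negative_error holds the ValueError raised for the out-of-range timeout.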
Creating the migration guide from v1.26.x to v2.0.0 involved combing through all of the changes to v2.0 which was much easier thanks to our use of a "newsfragment" changelog. The migration guide covered the following:
You can read the v2.0 migration guide on Readthedocs.
This task was mostly curating the user-facing changelog that would go along with v2.0.0a1. This required reading through ~55 newsfragments, many of which contained multiple user-facing changes, and putting them into the "Keep a Changelog" format that the urllib3 project uses today.
This alpha release was uploaded to PyPI on November 15th instead of on Friday or the weekend to allow developers to react to the new major version in case we broke builds somehow.
As Quentin and I have both written about multiple times by now, getting paid to work on open source feels great: maintainers get rare dedicated time to focus on a project and can move mountains relative to the slow drip (or, more often than not, absence!) of time that is normally allocated to open source work.