OEP-18: Python Dependency Management
Jeremy Bowman <email@example.com>
Calen Pennington <firstname.lastname@example.org>
2018-03-27 - 2018-04-27
Proposes best practices for declaring and maintaining dependencies on other Python packages in Open edX software repositories.
The Open edX project includes dozens of Python software repositories, most of which depend on certain other Python packages being installed in order to function correctly. The simple methods we originally used to do this have assorted drawbacks that have repeatedly caused problems over the past few years: accidental upgrades to incompatible versions, strict installation requirements that restrict the ability of downstream packages to manage their own dependency versions, lack of clarity regarding the full set of packages actually depended upon, etc.
Outlined here is a recommended standard for declaring dependencies on other Python packages which resolves most of these issues and will let us make all the Open edX Python packages consistent with each other (and many other open source Python projects) for ease of understanding and maintenance.
The key to successful Python dependency management in Open edX repositories is to break it down into five parts:
Identify the different contexts in which dependencies will need to be installed.
For each of these contexts, declare the direct dependencies that will be needed, using the least restrictive constraints that should still allow pip to install a working set of dependencies. We generally no longer want or need to manually pin specific versions; make upgrade will do this for us.
Use make upgrade to generate, from these high-level dependency declarations, a separate set of requirements files which specify the exact package versions that are known to work for a Python virtualenv created for that context.
Automate updates of the detailed requirements files for each context.
When making deliberate changes to the repository’s dependencies, only edit the *.in files in the requirements directory, never the generated *.txt files (see the brief example below). Both types of files must be committed when changed.
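To make the distinction concrete, here is a minimal sketch (the package name, versions, and comments are purely illustrative) of a hand-edited entry and the pinned counterpart that make upgrade would generate:

# requirements/base.in -- edited by hand
django-waffle                       # Feature toggles

# requirements/base.txt -- generated by make upgrade, never edited directly
django==2.2.17                      # via django-waffle (indirect dependency pinned by pip-compile)
django-waffle==0.18.0               # via -r requirements/base.in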
The dependencies of Python software are typically installed or run in a
variety of different contexts over the course of developing and using it.
The set of dependencies needed to perform a task can easily vary between these
contexts. The dependencies for each context will be captured in a
separate file in the
requirements directory. Here are some common
contexts and the file names often used for them:
Just the standard set of core dependencies for execution on a production server to perform its primary purpose (base.in)
Additional dependencies which are only needed when optional extra features of a package are desired (the file is typically named after the corresponding key in the extras_require argument to setup() in setup.py)
Assorted testing libraries to run automated test suites (test.in)
Static code analysis tools to perform code quality checks (quality.in)
The utilities called directly by a CI server to create and use one or more virtualenvs and report code coverage statistics to a 3rd-party service (e.g., ci.in)
Sphinx and other utilities used to generate developer documentation (doc.in)
Additional utilities needed to perform common development tasks (dev.in)
Utilities that a particular developer likes to use with a repository, but aren’t strictly needed for any of the regular contexts (private.in)
As indicated above, some of the usage contexts have a standard filename used in
the requirements directory of an Open edX repository to list dependencies.
Others will have an appropriate filename custom to that repository’s unique
context. Each of these is a
pip-compatible requirements file listing
the direct dependencies needed for that context. Beyond complying with the
file format, there are a few guidelines each of these files should follow:
The file should start with a brief comment explaining the context in which these dependencies are needed. Examples can be found in the edx-cookiecutters repository.
Each listed dependency should have a brief end-of-line comment explaining its primary purpose(s) in this context. These comments typically start at the 37th character, which leaves enough room for most package names plus constraint specifiers while keeping the comments visually aligned (see the combined example after this list).
Avoid direct links to packages in local directories, GitHub, or other version control systems if at all possible; all dependencies should be installed from PyPI. If you think you’re in one of the rare circumstances where installing a package from a URL is appropriate, see the notes below on Installing Dependencies from URLs.
If the dependencies in one context are a superset of those in another one, do not repeat the dependencies. Instead, explicitly include the file produced by make upgrade for the smaller set of dependencies in the requirements file for the larger set. For example, test.in often includes a line like the following to ensure that the same versions of packages used in production for a service will also be used when testing it:
-r base.txt # Core dependencies of the service being tested
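Putting these guidelines together, a test.in file might look something like this (a sketch; the packages and comments are illustrative):

# Requirements for running the test suite

-c constraints.txt

-r base.txt                         # Core dependencies of the service being tested

ddt                                 # Data-driven test case decorators
pytest-django                       # pytest runner with Django test integration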
If the repository contains a
setup.py file defining a Python package, the
base dependencies also need to be specified there. These can be derived from
requirements/base.in with a Python function declared in
setup.py itself, such as the following:
def load_requirements(*requirements_paths):
    """
    Load all requirements from the specified requirements files.
    Returns a list of requirement strings.
    """
    requirements = set()
    for path in requirements_paths:
        with open(path) as reqs:
            requirements.update(
                line.split('#')[0].strip() for line in reqs
                if is_requirement(line.strip())
            )
    return list(requirements)


def is_requirement(line):
    """
    Return True if the requirement line is a package requirement;
    that is, it is not blank, a comment, a URL, or an included file.
    """
    return line and not line.startswith(('-r', '#', '-e', 'git+', '-c'))
This can be used to define
install_requires as follows:
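For example, the setup() call might use it like this (a sketch; the package metadata shown is illustrative):

from setuptools import setup

setup(
    name='example-package',           # illustrative metadata
    version='1.0.0',
    packages=['example_package'],
    install_requires=load_requirements('requirements/base.in'),
)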
Although we usually want to use the latest available version of our
dependencies in order to take advantage of the latest bug fixes, performance
improvements, and security fixes, we sometimes need to impose some constraints
on the version to be used. These should be collected in
requirements/constraints.txt so they can be imposed uniformly across all
the repository’s requirements files; this is done via a "-c constraints.txt" line just under the summary comment of each *.in file in the requirements directory. Some guidelines to keep in mind when populating this file (an illustrative example follows these guidelines):
Version constraints should only be used to exclude dependency versions which are known (or strongly suspected) to not work in at least one context.
Constraints on indirect dependencies (used by dependencies but not directly by the code in the repository itself) can be added if needed to enforce a compatible version.
Environment markers should be used as necessary to indicate dependencies which should only be installed on specific operating systems, Python versions, etc.
If a dependency is maintained by edX and only used in a few repositories, consider if it should stay pinned to a specific version to facilitate managing new releases. Best practice is to avoid making backwards-incompatible new releases whenever possible, but this can require excessive effort for a package only used in 1-2 repositories.
Each constraint should be preceded by a comment explaining why the constraint has been imposed. If there is an issue (either in Jira or an upstream issue tracker) for resolving the problem, a link to it should be included in the comment.
Minimum versions should generally not be included here;
pip-compile always tries to use the latest compatible version in the generated requirements files. If minimum versions need to be specified for use in setup.py, those constraints should go in requirements/base.in as explained above.
This file should be periodically reviewed to determine if some of the constraints are no longer required.
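As an illustration (the package names, versions, and ticket reference below are hypothetical), entries in requirements/constraints.txt typically look like this:

# somepackage 3.0 renamed the settings interface we use;
# see TICKET-123 (hypothetical) for the work needed to upgrade
somepackage<3.0

# otherpackage is an indirect dependency; 2.x requires a newer Python version than we run
otherpackage<2.0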
Although we want to keep our manually edited requirements files very simple, we need a separate set of requirements files which list every single package needed for each usage context, with exact versions of each for reproducible test runs and consistent development and production environments. We can generate these automatically using pip-tools, which consists of two related utilities:
pip-compile generates a requirements file from one or more high-level input requirements files, listing exact versions of every listed and indirect dependency needed to satisfy the given constraints.
pip-sync ensures that the current virtualenv contains exactly (and only) the packages listed in the given requirements files, installing, upgrading, and uninstalling packages as needed.
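For example (illustrative invocations; in practice these are wrapped by the make targets described below):

pip-compile --upgrade requirements/base.in     # writes requirements/base.txt
pip-compile --upgrade requirements/test.in     # writes requirements/test.txt
pip-sync requirements/dev.txt                  # make the current virtualenv match dev.txt exactly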
Open edX packages should use an upgrade make target that runs pip-compile to automatically update the detailed requirements files (requirements/*.txt) to use the newest available packages which satisfy the constraints in the direct dependencies files. These generated files are then used anywhere that runs a command to install dependencies: CI configuration, the requirements make target (for updating a local development environment), etc.
pip-compile uses a cache of calculated dependency relationships
to improve the performance of subsequent runs. Unfortunately, the results of
this cache are sometimes used even after a new package release has changed the
set of packages it depends on. To avoid generating incorrect requirements
files due to this, it’s best to always use the
--rebuild option for the
first run of
pip-compile during an upgrade.
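A minimal sketch of what such an upgrade target might run (real repositories typically compile more files, and order matters because test.in includes base.txt):

pip install -q pip-tools
pip-compile --rebuild --upgrade -o requirements/base.txt requirements/base.in
pip-compile --upgrade -o requirements/test.txt requirements/test.in
pip-compile --upgrade -o requirements/dev.txt requirements/dev.in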
While we want all dependencies explicitly pinned in order to benefit from consistent testing and development environments, it isn’t acceptable to leave these versions untouched for long stretches of time. Packages we depend on routinely release new versions to address security issues, fix bugs, and add new features. While we don’t necessarily need to update our repositories every time a new dependency version is released, we do want to keep them current enough that upgrading a single package to fix a known issue doesn’t require suddenly adapting to a few years’ worth of API changes that we didn’t pay attention to.
Each Open edX repository should have the following:
An upgrade make target as described above, to update the pinned versions of all dependencies (and account for any new or removed indirect dependencies).
An automated test suite with reasonably good code coverage, configured to be run on new GitHub pull requests.
A service configured to periodically auto-generate a GitHub pull request that tests the output of running
make upgrade (if it results in any changes). This can either be a service such as requires.io, which tracks new releases of Python package dependencies, or a recurring scheduled job.
At least one designated maintainer who receives notifications of the generated pull requests and will merge or fix them as needed. This maintainer should scan the changelog for each upgraded package to look for changes that merit closer inspection; services like requires.io can make this easier.
In addition to the automation described above to keep dependencies current over time, developers will occasionally need to make deliberate changes to the set of dependencies. Common changes include:
A new dependency is needed to support recent code changes.
The need for an old dependency was removed.
A version constraint needs to be added to prevent upgrading to a backwards-incompatible release of a required package until appropriate code changes can be made.
The code has been updated to support a newer dependency package version which was previously blocked by a version constraint.
Whenever a developer needs to make a deliberate change to the repository’s Python package dependencies, they should do the following:
Make the appropriate changes to the *.in files and/or requirements/constraints.txt.
Run make upgrade to regenerate the detailed requirements files.
For each package whose pinned version is changing in the *.txt requirements files, look at its changelog to make sure that there are no problematic backwards-incompatible changes. If there are, add a version constraint to one of the .in files to prevent it from being upgraded to that release, run make upgrade again, and file a ticket briefly describing the change that needs to be made in order to upgrade that package further (see the sketch after this list). Similarly, if there are new features that the code depending on that package should start taking advantage of, file tickets explaining what should be done.
Check in all of the changed requirements files and wait for the automated test results. If one of the upgrades caused unexpected problems, follow the same process as if a backwards-incompatible change had been spotted in the changelog (add a version constraint,
run make upgrade, and file a ticket).
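For instance, blocking a backwards-incompatible release might look like this (hypothetical package and ticket):

# In requirements/constraints.txt:
# somepackage 3.0 removed the API our tasks rely on; see TICKET-456 (hypothetical)
somepackage<3.0

followed by running make upgrade again and committing both the edited input files and the regenerated *.txt files.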
Manually editing the
make upgrade output files or only running
pip-compile on a single file should generally be avoided, since it risks
failing to account for changes in indirect dependencies or making the
different requirement files fall out of sync with each other. And in general,
we would rather err on the side of upgrading dependencies more often than strictly necessary than avoid upgrades for fear of breaking things.
If the developer is not confident of their ability to assess whether a change to the dependencies is appropriate, they should seek assistance from other developers who are either more experienced or more familiar with the dependencies in question.
Occasionally, make upgrade or
pip-compile will be unable to find a
suitable version of a dependency for the output file because there are
incompatible version constraints in the input files and/or the stated
installation requirements of the other dependencies. In cases like this,
pass the --verbose flag to pip-compile to get more
detailed information about which dependencies imposed the conflicting
constraints, so you can decide which package(s) to upgrade or pin to resolve
the issue. Installing and running pipdeptree can also sometimes help
identify the problem.
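For example (illustrative commands):

pip-compile --verbose requirements/base.in     # report which packages imposed the conflicting constraints
pip install pipdeptree
pipdeptree --reverse                           # show, for each installed package, which packages depend on it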
As noted above, you should generally avoid installing requirements from a URL or local directory instead of PyPI. But there are a few circumstances where it can be appropriate:
You need to test a release candidate of the dependency to make sure it will work with your code.
You critically need a fix for a package which has not yet been included in a release, and you cannot arrange for a release to be made in a timely manner.
In most other circumstances, the package should be added to PyPI instead. If you do need to include a package at a URL, it should have both the package name and version specified (ending with “#egg=NAME==VERSION”).
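For example (the repository URL and version here are hypothetical):

git+https://github.com/example/some-package.git@v1.2.3#egg=some-package==1.2.3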
The practices outlined here help prevent the following problems that we have encountered in the past:
A new deployment of an Open edX release fails because an unpinned indirect dependency recently released a backwards-incompatible version.
Tests unrelated to a new code change fail, because an unpinned dependency was upgraded to a backwards-incompatible version. This can be difficult to diagnose because the upgrade doesn’t appear in the diff of pending changes.
Tests have been running against a particular set of pinned versions for years, but we now need to upgrade one (like Django) which requires also upgrading several of the other dependencies. This can force dealing with a few years’ worth of backwards-incompatible changes in multiple packages all at once, whereas dealing with them one at a time every few months in smaller pull requests would have been more manageable.
We have a different version of a dependency installed than we expect, because the constraints imposed on pip for choosing a version vary between different requirements files and we install them one file at a time.
We keep using years-old package versions despite the availability of newer versions with accumulated bug fixes and performance improvements.
We install in production environments packages which are only needed for testing, because we didn’t make a clean distinction between the dependencies for different usage contexts. This slows down deployments.
We try to exhaustively pin all indirect dependencies manually, but miss some (especially when a seemingly innocuous upgrade adds some new dependencies).
We keep installing a package long after we stopped using it, because nobody remembers why it was added to the requirements file (especially true for indirect dependencies that were later dropped as requirements of the package we use directly).
We install an exhaustive set of testing dependencies in Travis, even though we really only need it to run tox and codecov; the rest of the testing dependencies are installed in a separate virtualenv created by tox, which should have a separate requirements file.
An attempt to pin dependencies in setup.py (or parse its dependencies automatically from a requirements file) forces us to change that package before we can upgrade one of those dependencies in another repository using that package.
We add a dependency without realizing that it requires multiple additional indirect dependencies; we may have chosen an alternative if that had been apparent.
There are several good reasons for the recommendation to avoid installing packages from URLs whenever possible:
Specified VCS branches, commits, and tags can all be deleted from a repository at any time, suddenly making it impossible to find and install the dependency.
Editable requirements (starting with “-e “) are downloaded and/or inspected with each installation of the requirements file, even if the correct version is already installed. This can significantly slow down updates of installed requirements.
Packages installed from local directories don’t reflect any changes to package metadata (like required package versions) until the version number is incremented or the package is uninstalled; just installing again doesn’t help.
Package URLs tend to be long and difficult to read, with the actual name of the package hidden in the middle or not even present at all.
As of this writing, pip-tools still has a bug in handling packages installed from local directories that requires special care to work around: relative local paths are expanded to absolute paths. This can be partially worked around via a post-processing script for the generated requirements files; an example can be found in edx-platform.
When installing a package from PyPI, pip will not pull requirements from URLs for security reasons (the content of the URLs can change). It will only pull requirements from PyPI.
Many of the Open edX repositories have already begun to comply with the recommendations outlined here. In particular, repositories generated using edx-cookiecutters should already be configured correctly and may be useful for reference.
pipenv is a relatively new utility for managing Python dependencies, written by Kenneth Reitz (author of the requests package). Although it recently became the default dependency management tool recommendation of the Python Packaging User Guide, it lacks some features that we strongly want for Open edX:
The ability to specify more than 2 sets of dependencies (core and development)
The ability to add comments to the dependencies listing explaining why each one is needed
Indication of which other dependencies caused the inclusion of indirect dependencies in the full set of requirements
Easy interoperability with tox, especially for testing multiple versions of a major dependency
As a younger package than pip-tools, it also seems to have more significant still-unresolved problems, although those are gradually being addressed.