|Title||Python Dependencies Management|
|Authors||Jeremy Bowman <email@example.com>|
|Arbiter||Calen Pennington <firstname.lastname@example.org>|
|Review Period||2018-03-27 - 2018-04-27|
Proposes best practices for declaring and maintaining dependencies on other Python packages in Open edX software repositories.
The Open edX project includes dozens of Python software repositories, most of which depend on certain other Python packages being installed in order to function correctly. The simple methods we originally used to do this have assorted drawbacks that have repeatedly caused problems over the past few years: accidental upgrades to incompatible versions, strict installation requirements that restrict the ability of downstream packages to manage their own dependency versions, lack of clarity regarding the full set of packages actually depended upon, etc.
Outlined here is a recommended standard for declaring dependencies on other Python packages which resolves most of these issues and will let us make all the Open edX Python packages consistent with each other (and many other open source Python projects) for ease of understanding and maintenance.
The key to successful Python dependency management in Open edX repositories is to break it down into five parts:
make upgrade will do this for us.
make upgrade to generate from the high-level dependency declarations a separate set of requirements files which specify the exact package versions that are known to work for a Python virtualenv created for that context.
*.in files in the requirements directory, never the generated *.txt files. Both types of files must be committed when changed.
The dependencies of Python software are typically installed or run in a
variety of different contexts over the course of developing and using it.
The set of dependencies needed to perform a task can easily vary between these
contexts. The dependencies for each context will be captured in a
separate file in the
requirements directory. Here are some common
contexts and the file names often used for them:
As indicated above, some of the usage contexts have a standard filename used in the
requirements directory of an Open edX repository to list dependencies.
Others will have an appropriate filename custom to that repository’s unique
context. Each of these is a
pip-compatible requirements file listing
the direct dependencies needed for that context. Beyond complying with the
file format, there are a few guidelines each of these files should follow:
make upgrade for the smaller set of dependencies in the requirements file for the larger set of dependencies. For example,
test.in often includes a line like the following to ensure that the same versions of packages used in production for a service will also be used when testing it:
-r base.txt # Core dependencies of the service being tested
If the repository contains a
setup.py file defining a Python package, the
base dependencies also need to be specified there. These can be derived from
requirements/base.in with a Python function declared in
setup.py itself, such as the following:
def load_requirements(*requirements_paths):
    """
    Load all requirements from the specified requirements files.

    Returns a list of requirement strings.
    """
    requirements = set()
    for path in requirements_paths:
        with open(path) as reqs:
            requirements.update(
                # Keep only the requirement itself, dropping any trailing comment
                line.split('#')[0].strip()
                for line in reqs
                if is_requirement(line.strip())
            )
    return list(requirements)


def is_requirement(line):
    """
    Return True if the requirement line is a package requirement;
    that is, it is not blank, a comment, a URL, or an included file.
    """
    return line and not line.startswith(('-r', '#', '-e', 'git+', '-c'))
This can be used to define
install_requires as follows:
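A minimal sketch of that usage, assuming helpers equivalent to the load_requirements and is_requirement functions above. The package name, version, and the contents of the throwaway base.in are placeholders chosen for illustration, and the list is sorted here only to make the output deterministic. A real setup.py would pass these keyword arguments to setuptools.setup() rather than printing them.

```python
import os
import tempfile


def is_requirement(line):
    """Return True for package requirement lines (not blank, comment, URL, or include)."""
    return bool(line) and not line.startswith(('-r', '#', '-e', 'git+', '-c'))


def load_requirements(*requirements_paths):
    """Return the requirement strings found in the given requirements files."""
    requirements = set()
    for path in requirements_paths:
        with open(path) as reqs:
            requirements.update(
                line.split('#')[0].strip()
                for line in reqs
                if is_requirement(line.strip())
            )
    return sorted(requirements)


# Demonstration only: write a throwaway base.in so the sketch runs standalone.
with tempfile.TemporaryDirectory() as tmp:
    base_in = os.path.join(tmp, 'base.in')
    with open(base_in, 'w') as f:
        f.write(
            "# Core dependencies\n"
            "-c constraints.txt\n"
            "\n"
            "Django>=1.11  # hypothetical minimum version\n"
            "requests\n"
        )
    install_requires = load_requirements(base_in)

# In a real setup.py these would be passed as setup(**setup_kwargs).
setup_kwargs = {
    'name': 'example-package',  # hypothetical package name
    'version': '0.1.0',         # hypothetical version
    'install_requires': install_requires,
}
print(setup_kwargs['install_requires'])
```

Comment lines, blank lines, and the -c include are skipped, so only the package requirements reach install_requires.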
Although we usually want to use the latest available version of our
dependencies in order to take advantage of the latest bug fixes, performance
improvements, and security fixes, we sometimes need to impose some constraints
on the version to be used. These should be collected in
requirements/constraints.txt so they can be imposed uniformly across all
the repository’s requirements files; this is done via a -c constraints.txt
line just under the summary comment of each *.in file in the
requirements directory. Some guidelines to keep in mind when populating this file:
pip-compile always tries to use the latest compatible version in the generated requirements files. If minimum versions need to be specified for use in setup.py, those constraints should go in requirements/base.in as explained above.
This file should be periodically reviewed to determine if some of the constraints are no longer required.
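One way to keep the constraints applied uniformly is a small check that every *.in file pulls them in. The sketch below assumes the constraints are included through pip's -c option (a line reading -c constraints.txt); the function name files_missing_constraints and the demonstration files are hypothetical.

```python
import os
import tempfile


def files_missing_constraints(requirements_dir, constraints_name='constraints.txt'):
    """Return the sorted names of *.in files that lack a '-c <constraints_name>' line."""
    missing = []
    for name in sorted(os.listdir(requirements_dir)):
        if not name.endswith('.in'):
            continue
        with open(os.path.join(requirements_dir, name)) as f:
            # Drop trailing comments and surrounding whitespace before comparing.
            lines = [line.split('#')[0].strip() for line in f]
        if ('-c ' + constraints_name) not in lines:
            missing.append(name)
    return missing


# Demonstration only: a throwaway requirements directory with one compliant
# file (base.in) and one that forgot the constraints line (test.in).
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, 'base.in'), 'w') as f:
        f.write("# Core dependencies\n-c constraints.txt\n\nDjango\n")
    with open(os.path.join(tmp, 'test.in'), 'w') as f:
        f.write("# Test dependencies\n-r base.txt\n\npytest\n")
    with open(os.path.join(tmp, 'constraints.txt'), 'w') as f:
        f.write("# Version constraints\n")
    missing = files_missing_constraints(tmp)

print(missing)
```

A check like this could run in continuous integration so that a newly added *.in file cannot silently bypass the shared constraints.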
Although we want to keep our manually edited requirements files very simple, we need a separate set of requirements files which list every single package needed for each usage context, with exact versions of each for reproducible test runs and consistent development and production environments. We can generate these automatically using pip-tools, which consists of two related utilities:
pip-compile generates a requirements file from one or more high-level input requirements files, listing exact versions of every listed and indirect dependency needed to satisfy the given constraints.
pip-sync ensures that the current virtualenv contains exactly (and only) the packages listed in the given requirements files, installing, upgrading, and uninstalling packages as needed.
Open edX packages should use an upgrade make target that runs pip-compile
to automatically update the detailed requirements files
(requirements/*.txt) to use the newest available packages which satisfy
the constraints in the direct dependencies files. These generated files are
then used anywhere that runs a command to install dependencies, such as the
requirements make target (for updating a local development environment).
pip-compile uses a cache of calculated dependency relationships
to improve the performance of subsequent runs. Unfortunately, the results of
this cache are sometimes used even after a new package release has changed the
set of packages it depends on. To avoid generating incorrect requirements
files due to this, it’s best to always use the --rebuild option when
running pip-compile.
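As a rough illustration of what an upgrade target does, the sketch below builds one pip-compile command per requirements file, using pip-compile's --rebuild, --upgrade, and --output-file options. It is written in Python rather than as a Makefile rule, and the function names and the ('base', 'test') ordering are assumptions for illustration; a file that includes another file's output (such as test.in including base.txt) has to be compiled after it.

```python
import subprocess


def pip_compile_commands(names, requirements_dir='requirements'):
    """Build one pip-compile command per requirements file, in the order given."""
    commands = []
    for name in names:
        commands.append([
            'pip-compile',
            '--rebuild',   # ignore previously cached dependency data
            '--upgrade',   # move pins to the newest versions the constraints allow
            '--output-file', f'{requirements_dir}/{name}.txt',
            f'{requirements_dir}/{name}.in',
        ])
    return commands


def upgrade(names, requirements_dir='requirements'):
    """Run the commands; requires pip-tools to be installed in the environment."""
    for command in pip_compile_commands(names, requirements_dir):
        subprocess.check_call(command)


# Show the commands without running them; base comes first because
# test.in is assumed to include base.txt.
commands = pip_compile_commands(['base', 'test'])
for command in commands:
    print(' '.join(command))
```

Calling upgrade(['base', 'test']) from the repository root would regenerate both files in one pass, which keeps them in sync with each other.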
While we want all dependencies explicitly pinned in order to benefit from consistent testing and development environments, it isn’t acceptable to leave these versions untouched for long stretches of time. Packages we depend on routinely release new versions to address security issues, fix bugs, and add new features. While we don’t necessarily need to update our repositories every time a new dependency version is released, we do want to keep them current enough that upgrading a single package to fix a known issue doesn’t require suddenly adapting to a few years’ worth of API changes that we didn’t pay attention to.
Each Open edX repository should have the following:
upgrade make target as described above, to update the pinned versions of all dependencies (and account for any new or removed indirect dependencies).
make upgrade (if it results in any changes). This can either be a service such as requires.io which tracks new releases of Python package dependencies, or a recurring scheduled job.
In addition to the automation described above to keep dependencies current over time, developers will occasionally need to make deliberate changes to the set of dependencies. Common changes include:
Whenever a developer needs to make a deliberate change to the repository’s Python package dependencies, they should do the following:
make upgrade to regenerate the detailed requirements files.
*.txt requirements files, look at its changelog to make sure that there are no problematic backwards-incompatible changes. If there are, add a version constraint to one of the .in files to prevent it from being upgraded to that release, run make upgrade again, and file a ticket briefly describing the change that needs to be made in order to upgrade that package further. Similarly, if there are new features that the code depending on that package should start taking advantage of, file tickets explaining what should be done.
make upgrade, file a ticket).
Manually editing the
make upgrade output files or only running
pip-compile on a single file should generally be avoided, since it risks
failing to account for changes in indirect dependencies or making the
different requirement files fall out of sync with each other. In general,
we prefer to err on the side of using newer versions of dependencies than
strictly necessary instead of avoiding upgrades for fear of breaking things.
If the developer is not confident of their ability to assess whether a change
to the dependencies is appropriate, they should seek assistance from other
developers who are either more experienced or more familiar with that
repository.
Sometimes make upgrade or pip-compile will be unable to find a
suitable version of a dependency for the output file because there are
incompatible version constraints in the input files and/or the stated
installation requirements of the other dependencies. In cases like this,
pass the -v (--verbose) flag to pip-compile to get more
detailed information about which dependencies imposed the conflicting
constraints, so you can decide which package(s) to upgrade or pin to resolve
the issue. Installing and running pipdeptree can also sometimes help
identify the problem.
As noted above, you should generally avoid installing requirements from a URL or local directory instead of PyPI. But there are a few circumstances where it can be appropriate:
In most other circumstances, the package should be added to PyPI instead. If you do need to include a package at a URL, it should have both the package name and version specified (end with “#egg=NAME==VERSION”). For example:
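The naming rule can also be checked mechanically. The following sketch assumes the "#egg=NAME==VERSION" suffix convention described above; the helper name parse_egg_fragment and the example.com-style repository URLs are hypothetical and do not refer to real packages.

```python
import re

# NAME==VERSION must be the final fragment of the requirement URL.
EGG_PATTERN = re.compile(
    r'#egg=(?P<name>[A-Za-z0-9._-]+)==(?P<version>[A-Za-z0-9.!+*_-]+)$'
)


def parse_egg_fragment(requirement_url):
    """Return (name, version) from a '#egg=NAME==VERSION' suffix, or None if absent."""
    match = EGG_PATTERN.search(requirement_url.strip())
    if match is None:
        return None
    return match.group('name'), match.group('version')


# Hypothetical URLs: the first follows the convention, the second omits the version.
good = 'git+https://github.com/example/example-package.git@v1.2.3#egg=example-package==1.2.3'
bad = 'git+https://github.com/example/example-package.git@v1.2.3#egg=example-package'

print(parse_egg_fragment(good))
print(parse_egg_fragment(bad))
```

A URL requirement without the version in its egg fragment returns None, which a review script could report as an error.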
The practices outlined here help prevent the following problems that we have encountered in the past:
There are several good reasons for the recommendation to avoid installing packages from URLs whenever possible:
pip-tools still has a bug in handling packages installed from local directories that requires special care to work around: relative local paths are expanded to absolute paths. This can be partially worked around via a post-processing script for the generated requirements files; an example can be found in edx-platform at
Many of the Open edX repositories have already begun to comply with the recommendations outlined here. In particular, repositories generated using cookiecutter-django-app should already be configured correctly. These may also be useful for reference:
pipenv is a relatively new utility for managing Python dependencies, written by Kenneth Reitz (author of the requests package). Although it recently became the default dependency management tool recommendation of the Python Packaging User Guide, it lacks some features that we strongly want for Open edX:
As a younger package than pip-tools, it also seems to have more
significant still-unresolved problems, although those are gradually being
addressed.