OEP-3: Asynchronous Task Management
||Asynchronous Task Management
||Jeremy Bowman <email@example.com>
||Eddie Fagin <firstname.lastname@example.org>
Proposes a system which allows users to initiate, monitor, and retrieve the
results of various long-running tasks in a consistent manner. This system
leverages existing libraries where practical and makes the development of new
asynchronous tasks straightforward.
Course import and export are currently implemented such that they occur in the
scope of a web request, but often take long enough to time out. This is not
only annoying for the user, but can lose the result of an export or leave the
user uncertain of whether it’s safe to start editing the newly imported course
yet. Clearly the work should be done asynchronously, but we do not yet have a
standard solution for asynchronous tasks which allow the user to track their
progress and/or retrieve output upon completion. Upon further discussion, it
became clear that other such asynchronous tasks, existing and planned, would
benefit from a consistent framework. Examples include:
- Re-running a course
- Sending bulk email to course participants
- Generating certificates
- Grading (rescoring, reset attempts, delete state, calculated grades CSV)
- Data downloads (grades, student profiles)
- Open Response Assessment
- Discussion/forum notifier
- Uploading and encoding videos (Video Abstraction Layer, VEDA)
A new open source Django application will be written to support a user
interface for managing celery tasks in a consistent manner across services.
The main goals of the application are, roughly in order:
- Allow a long-running task triggered by a user to be run asynchronously,
while it is clear to the user that they are free to do other work on the
site in the meantime
- Allow the user to fetch any output of the task after it finishes
- Notify the user when the task is complete
- Give the user some indication of how far along an in-progress task is
- Support prevention of multiple identical tasks from executing concurrently
Specifically, the initial implementation is expected to have the following
- Celery will be used to run the asynchronous tasks.
- Database models will be used to represent tasks as well as their input,
output, and current status. See the tentative schema presented below.
- Upon requesting the task, the user will be redirected to a page showing
the current status of that task (which can be reloaded to see if the
status has changed). This page will only be accessible by the user who
requested the task (and system superusers).
- If the task has any output artifacts, they can be retrieved from the task
status page. The links to these artifacts will be restricted to the user
who requested the task; in the case of an artifact stored in S3, the link
will be generated when the user views the status page and only be valid for
a configurable duration (to minimize risk of the URL being discovered and
used by others).
- When the task is complete, the task requester will be sent an email to that
effect (including a link to the task status page).
- The user will be able to cancel a task. Celery’s
command will be used to cancel a task which hasn’t started execution yet,
and the implementation of this OEP will include an API which tasks should
use periodically to determine if execution should be aborted.
- Tasks will be automatically deleted (along with any artifacts) after a
configurable timeout duration. This will help keep database table sizes
and disk utilization manageable.
- It will be possible to present a fan-out of Celery tasks instantiated to
perform work in parallel as a single task from the user’s perspective,
with a meaningful overall progress indicator. This feature could also
be used to perform a request as a sequence of tasks, if the overall
duration of a particular task type could be long enough to interfere with
timely Celery restarts if it was all done in a single task.
Later enhancements will likely include the following:
- Dynamic updates of the task status page without needing to reload it.
- A dashboard listing all tasks instantiated by the user.
- The ability to delete a completed task and its artifacts (hence removing it
from the dashboard).
- Push notifications of task completion via other methods in addition to
email (and a UI for notification preferences). This would likely be
contingent on the completion and implementation of another OEP specifically
for push notifications in Open edX.
- Support for tasks which can be monitored by multiple users with sufficient
access priveleges. Different permissions could be created for status
viewing, generated artifact access, the ability to cancel the task, and the
ability to delete the task.
Course import and export will utilize the initial implementation as soon as
it’s ready, and other long-running tasks will switch to use it as needed.
Celery provides a pretty full-featured and well understood platform for
running asynchronous tasks, but celery’s
AsyncResult doesn’t store all
of the information we need for the desired features. Rather than try to
straddle the data we access across celery’s result backend and the MySQL
database, we’ll just store everything we need in database models and let
celery manage and clean up its own result store as normal. Presented here is
a suggested initial design for these models (they may be altered over time
via migrations as more features are added).
Each task that we want users to be able to monitor in the UI will have a
corresponding AsynchronousTask record:
- created (DateTimeField)
- modified (DateTimeField)
- user (ForeignKey)
- action (CharField - “rerun”, “import”, “export”, etc.)
- name (CharField, usually auto-generated from the task parameters)
- state (CharField - “Pending”, “In Progress”, “Succeeded”, “Failed”,
- finished_steps (PositiveSmallIntegerField)
- total_steps (PositiveSmallIntegerField)
- attempt (PositiveSmallIntegerField, starts at 1 for the first attempt to
execute the task)
If the task generates information which the user needs to be able to retrieve
once it finishes, they are stored in AsynchronousTaskArtifact records:
- created (DateTimeField)
- task (ForeignKey)
- name (CharField, used to distinguish multiple artifacts generated by the
- file (FileField, backed by django-storages)
- url (used if the output is a web page instead of a downloadable file)
- text (used if the output is a relatively small amount of text, such as an
error message or the ID of a record in an external system)
If there is a desire to impose database constraints on the parameters of a
task or just preserve them for future reference, a suitable model customized
for each task may be added. For example, CourseRerunParameters might contain:
- task (ForeignKey)
- source_course_key (CourseKeyField)
- course_key (CourseKeyField)
- display_name (CharField)
Celery is already in use for other asynchronous tasks, and makes a logical
choice as the underlying task execution engine. Course import and export
timeouts are already happening relatively often, so there is a desire to keep
the scope of the initial implementation small enough that it can be finished
Similar functionality has already been implemented for
and assorted LMS actions such as
bulk email and grade downloads,
but as one-off implementations that share no code. The new
implementation should combine the best features of these in a form that can be
easily reused for all the user-triggered asynchronous tasks in any service.
The Jobtastic add-on for Celery
includes some potentially useful functionality for status tracking and error
handling, but it’s not yet clear if it would be suitable to include as-is for
this system or just serve as a source of ideas. There doesn’t seem to be a
good existing library providing any user interface elements for managing
celery tasks which isn’t geared towards system administrators, although some
of the code in the task dashboard from
Flower may prove useful.
- Some documentation that deals with course import and export will need to be
updated to describe the new workflow. The task status page should contain
enough information to be pretty self-explanatory, though.
- There will be migrations to add new tables for task tracking.
- Existing implementations of asynchronous task management will probably be
gradually switched over to this new system. There’s no particular rush to
do so, but standardizing on a single implementation should yield a number
of benefits with respect to functionality and ease of maintenance. Any
partners who have implemented similar systems should be given the
opportunity to offer input on the new implementation, and may desire support
in switching to it when appropriate.
- There are a lot of asynchronous tasks already in the
package of the LMS, so particular care should be taken to produce something
compatible with that in both code and behavior (in order to simplify the
aforementioned standardization on a single implementation). The existing
code is local to the LMS and makes some assumptions about what kind of tasks
can be run, so it can’t just be used as is.