|Authors||Nimisha Asthagiri <email@example.com>|
|Arbiter||Brian Wilson <firstname.lastname@example.org>|
|Review Period||2018-11-29 - 2018-12-20|
Currently, the Open edX system captures LMS interactions in persisted event streams (tracking logs) of documented proprietary JSON structures, as specified in Events in the Tracking Logs. The events are emitted by backend servers and frontend apps, including web and mobile apps. The logs are shared and used asynchronously for various data analyses and processing.
Recently, use cases have emerged that require notifying external systems of these LMS events in real-time, preferably in industry-standard formats (such as xAPI and Caliper). By real-time events, we mean pushing data as fast as possible through automated, synchronous communication, within a few hundred milliseconds (fast enough to satisfy human perception). Some use cases, such as adaptive learning, require this sub-second communication time. For others, near-real-time communication (with a delay of multiple minutes) is sufficient. Unless explicitly called out in this document, "real-time" includes near-real-time. Both are orders of magnitude faster than the current batch processing available via the tracking logs.
A transition to real-time events is a natural evolution of Open edX’s eventing and API capabilities and its impact on connecting users, organizations, and learning services. In fact, Gartner reports that by 2020 50% of managed APIs in the industry will be event-driven APIs as necessary supplements to RESTful extensions.
This document describes a design proposal for supporting standard real-time events in the Open edX system.
Although the remainder of this section enumerates some of the immediate use cases, for the time being we will focus our efforts on the needs for adaptive learning, while bearing in mind that the general framework for real-time events will be extended and reused.
Requirement: Real-time communications (sub-second)
As captured in Open edX Adaptive Learning, adaptive learning in Open edX will enable the platform to respond to a learner’s interactions in real-time, automatically providing the learner with individualized support. Data analysis from earlier experimental attempts on edX.org and elsewhere shows how adaptive learning mechanisms can significantly improve the learner’s efficiency, engagement, and retention, while providing a more effective learning experience with metacognitive support for life-long learning.
As an iterative path to this future, the immediate goal is to enable integrations with external adaptive engines. In the far future, we may also implement our own open-sourced adaptive engines. In the meantime, we would like to establish standard APIs that (1) adaptive engines can use to receive real-time events from the Open edX LMS, (2) the Open edX LMS can use to query adaptive engines on what to present to learners, and (3) can be used to train adaptive engines on standard events collected from multiple sources. This document focuses on #1: the need to send scalable, real-time updates to adaptive engines. This is depicted by the left-hand flow in the diagram below. #3 is described in LRS (Future).
Requirement: Near-real-time communications (minutes)
As an integration option for Enterprise clients, real-time events allow the Open edX system to immediately notify Enterprises of high-interest learning activities, such as course enrollments and completions, for learners in their organizations. Note that typically Enterprises favor xAPI as their standard of choice.
Requirement: Near-real-time communications (minutes)
Real-time events also allow the Open edX system to immediately notify Universities of their high-interest learning activities, such as unenrollments. Note that, in contrast to Enterprises, Universities typically favor Caliper as their standard of choice.
Requirement: Near-real-time communications (minutes)
Another emerging use case is unifying events across multiple LMSs. As a specific example, an organization that uses multiple Open edX instances (as multiple options for their learners or for distributing their reusable content) may want to accumulate all their LMS events in a single Learning Record Store that is then queried in real-time.
Requirement: Real-time (sub-second) or near-real-time (minutes) communications depending on integration type
As we move towards creating a more extensible platform for Open edX, we want to provide integration options between the Open edX platform and external services. External services that enhance the learning, authoring, or LMS administrative experiences will benefit greatly from a real-time eventing framework. Some examples include an external social learning service and a commercial learning analytics tool. Theoretically, the other use cases are specific cases of this use case.
This section provides some of the technical details of the design proposal.
To enable a plug-and-play ecosystem with various external services (adaptive engines or otherwise), we will support both ADL’s Experience API (xAPI) standard and IMS Global’s Caliper Sensor APIs standard as the communication protocols between the Open edX LMS and registered external services.
For details on integrating with xAPI, please see the xAPI Real-time Events design document.
The LMS user_id will be used to uniquely identify a user in the Open edX system. This decision is detailed in `OEP-32: Unique Identifier for Users`_.
While the exact technical framework and tools for supporting real-time events at scale are yet to be determined, we can begin by implementing the various components of the eventing service in a modular way so they can be adopted into any scalable infrastructure (e.g., an Apache framework). So while we propose a framework in this section, each subcomponent is expected to be independent and modular so it can be recomposed as needed.
The proposed framework integrates into and builds upon the features of the Open edX Event Tracking library. The library’s RoutingBackend provides powerful and flexible tools with its two fundamental building blocks of processors and backends. The diagram below depicts one way of using these tools to implement our real-time Eventing subsystem.
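To make the processor-and-backend idea concrete, here is a minimal, self-contained sketch. It deliberately does not import the Event Tracking library: `SketchRoutingBackend`, the two example processors, and the `realtime` key are hypothetical names chosen for illustration, and the event names are used only as sample data.

```python
from typing import Callable, Dict, List, Optional

Event = Dict[str, object]
Processor = Callable[[Event], Optional[Event]]  # returns None to drop the event
Backend = Callable[[Event], None]


class SketchRoutingBackend:
    """Hypothetical stand-in for a routing backend; not the library's class."""

    def __init__(self, processors: List[Processor], backends: Dict[str, Backend]):
        self.processors = processors
        self.backends = backends

    def send(self, event: Event) -> bool:
        """Run processors in order, then fan out to every backend."""
        current: Optional[Event] = dict(event)  # copy so the caller's dict is untouched
        for processor in self.processors:
            current = processor(current)
            if current is None:
                return False  # a processor vetoed the event
        for backend in self.backends.values():
            backend(dict(current))  # each backend gets its own copy
        return True


def only_enrollment_events(event: Event) -> Optional[Event]:
    """Example processor: keep only enrollment events."""
    name = str(event.get("name", ""))
    return event if name.startswith("edx.course.enrollment.") else None


def tag_as_realtime(event: Event) -> Event:
    """Example processor: annotate the event (the key name is made up)."""
    tagged = dict(event)
    tagged["realtime"] = True
    return tagged
```

Under these assumptions, a processor either transforms an event or drops it, and backends only ever see events that survived the whole chain.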
Here is a description of each subcomponent in the Eventing subsystem:
The Synchronous Real-time Eventing Backend serves as an integration point between the Open edX system and the rest of the Eventing subsystem. It plugs into the Open edX Event Tracking framework and runs within the LMS Process.
The Asynchronous Real-time Eventing Routing Backend is a RoutingBackend component type that executes in an asynchronous process. It is configured to support multiple backends - one for each communication protocol. For example, there would be a communication backend for Caliper and another one for xAPI.
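The hand-off from the synchronous backend to an asynchronous one can be sketched as follows. This is an illustration under stated assumptions, not the proposed implementation: a worker thread and an in-memory queue stand in for the asynchronous process, and `AsyncRoutingSketch` and its methods are hypothetical names. A real deployment would more likely hand events to a task queue or message broker.

```python
import queue
import threading
from typing import Callable, Dict, List, Tuple

Event = Dict[str, object]
Backend = Callable[[Event], None]


class AsyncRoutingSketch:
    """Hypothetical async router: one backend per communication protocol."""

    def __init__(self, protocol_backends: Dict[str, Backend]):
        self._backends = protocol_backends
        self._queue = queue.Queue()
        self.failures: List[Tuple[str, str]] = []  # (protocol, error message)
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def enqueue(self, event: Event) -> None:
        """Called from the request path; returns without waiting for delivery."""
        self._queue.put(event)

    def close(self) -> None:
        """Send a sentinel and wait until all queued events are processed."""
        self._queue.put(None)
        self._worker.join()

    def _drain(self) -> None:
        while True:
            event = self._queue.get()
            if event is None:  # sentinel from close()
                return
            for protocol, backend in self._backends.items():
                try:
                    backend(event)
                except Exception as exc:
                    # A failing protocol backend must not block the others.
                    self.failures.append((protocol, str(exc)))
```

The design point illustrated here is isolation: the LMS request path only pays for a queue insertion, and a failure in one protocol backend does not prevent delivery through another.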
The Filter Processor component serves multiple purposes:
The complexity of this component may increase over time as we support (a) higher degrees of customization by consumers, such as filtering for only certain types of events, (b) varying degrees of permissions to activities, and (c) multi-tenancy requirements with consumers and providers associated with different organizations.
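As a rough illustration of points (a) and (c), a per-consumer policy could restrict both the event types and the organizations a consumer may see. The sketch below assumes events carry a `name` and a `context` dictionary with an `org_id`; `ConsumerPolicy` and `filter_for_consumer` are hypothetical names, not part of any existing Open edX API.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional

Event = Dict[str, object]


@dataclass(frozen=True)
class ConsumerPolicy:
    """Hypothetical description of what one consumer is allowed to receive."""

    consumer_id: str
    event_names: FrozenSet[str]  # event types the consumer subscribed to
    org_ids: FrozenSet[str]      # organizations the consumer may see


def filter_for_consumer(event: Event, policy: ConsumerPolicy) -> Optional[Event]:
    """Return the event if the policy allows it, otherwise None."""
    if event.get("name") not in policy.event_names:
        return None
    context = event.get("context")
    if not isinstance(context, dict):
        return None  # no organization information: deny by default
    if context.get("org_id") not in policy.org_ids:
        return None
    return event
```

Denying by default when the organization is unknown is a conservative choice for this sketch; the actual permission rules are those listed for each iteration.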
For the first iteration, we need the following permissions:
In the future, we may need the following:
Each communication protocol has its own Translator component. This component is responsible for translating from an Open edX event schema to the schema for the target communication protocol.
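For example, an xAPI Translator might map an enrollment event onto an actor/verb/object statement. The sketch below is illustrative only: the verb mapping, the `/courses/` URL pattern, the function name, and the assumed event fields (`name`, `time`, `context.user_id`, `context.course_id`) are assumptions made for this example rather than decisions of this proposal.

```python
from typing import Dict, Tuple

# Illustrative mapping only; the real event-to-verb mapping is a design decision.
VERBS: Dict[str, Tuple[str, str]] = {
    "edx.course.enrollment.activated": (
        "http://adlnet.gov/expapi/verbs/registered",
        "registered",
    ),
}


def to_xapi_statement(event: dict, lms_root_url: str) -> dict:
    """Translate one tracking-style event into an xAPI-shaped statement."""
    name = event["name"]
    if name not in VERBS:
        raise ValueError(f"no xAPI mapping for event {name!r}")
    verb_id, verb_display = VERBS[name]
    context = event["context"]
    root = lms_root_url.rstrip("/")
    return {
        "actor": {
            "objectType": "Agent",
            # The user is identified by an account on the LMS, not by PII.
            "account": {"homePage": root, "name": str(context["user_id"])},
        },
        "verb": {"id": verb_id, "display": {"en-US": verb_display}},
        "object": {
            "objectType": "Activity",
            # Hypothetical IRI scheme for a course activity.
            "id": f"{root}/courses/{context['course_id']}",
        },
        "timestamp": event["time"],
    }
```

A Caliper Translator would follow the same shape, taking the same input event and producing the Caliper event structure instead.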
The validation component ensures we continue to support the event output schema expected by all participating real-time eventing consumers. In many ways, this is similar to what consumer-driven contract testing would enable and uses similar design principles. Essentially, a common validation library can be collaboratively maintained by participating consumers, including consuming adaptive engines. This ensures an end-to-end integration that can be maintained going forward.
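One lightweight way to picture such a shared validation library is a set of per-consumer contracts, each listing the fields that consumer depends on. This sketch assumes contracts are simple lists of dotted paths; the function names and contract format are hypothetical, and a real library would likely validate types and values as well.

```python
from typing import Dict, List

Contract = List[str]  # dotted paths that must be present in the output


def _has_path(document: dict, dotted_path: str) -> bool:
    """Check that every key along a dotted path exists in nested dicts."""
    node = document
    for key in dotted_path.split("."):
        if not isinstance(node, dict) or key not in node:
            return False
        node = node[key]
    return True


def validate(document: dict, contracts: Dict[str, Contract]) -> Dict[str, List[str]]:
    """Return the missing paths per consumer; an empty dict means all pass."""
    problems: Dict[str, List[str]] = {}
    for consumer, required_paths in contracts.items():
        missing = [path for path in required_paths if not _has_path(document, path)]
        if missing:
            problems[consumer] = missing
    return problems
```

Running such checks in the platform's test suite would surface a breaking schema change before it reaches any consumer, which is the same goal consumer-driven contract testing serves.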
The router forwards events to interested consumers, using the target communication protocol.
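A minimal sketch of this routing step, assuming a registry in which each consumer declares its protocol, endpoint, and the event names it is interested in. The `Consumer` record, the `route` function, and the injected `transport` callable are all hypothetical; the transport is passed in so the sketch stays independent of any particular HTTP client or broker.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List

Translator = Callable[[dict], dict]
Transport = Callable[[str, dict], None]  # (endpoint, payload)


@dataclass(frozen=True)
class Consumer:
    """Hypothetical registry entry for one event consumer."""

    name: str
    protocol: str  # e.g. "xapi" or "caliper"
    endpoint: str
    event_names: FrozenSet[str]


def route(
    event: dict,
    consumers: List[Consumer],
    translators: Dict[str, Translator],
    transport: Transport,
) -> List[str]:
    """Deliver the event to each interested consumer; return their names."""
    delivered: List[str] = []
    translated: Dict[str, dict] = {}  # translate once per protocol
    for consumer in consumers:
        if event.get("name") not in consumer.event_names:
            continue  # consumer is not interested in this event
        if consumer.protocol not in translators:
            continue  # no translator configured for this protocol
        if consumer.protocol not in translated:
            translated[consumer.protocol] = translators[consumer.protocol](event)
        transport(consumer.endpoint, translated[consumer.protocol])
        delivered.append(consumer.name)
    return delivered
```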
The administration component is responsible for allowing the Open edX site administrator to configure the list and permissions of registered event consumers.
Eventually, the registry of consumers may become a self-service portal where consumers may initiate their request for access. However, initially, this can be a simple interface, possibly implemented in Django Admin.
Event-driven APIs at Scale - The Context section describes the motivation and recent use cases for supporting real-time events. This capability, along with Frontend Pluggability (OEP - TBD), has the potential to provide a dramatic shift in how external services can integrate and extend the Open edX system.
A major concern with this version of the OEP is that it lacks explicit recommendations on the infrastructure that will be used to support scalability. On one hand, this agnostic approach allows Open edX instances to reuse the core capabilities (and modular subcomponents) without being tied to a specific scalable technology. On the other hand, we run the risk of having to rework initial implementations if a chosen technology’s design is fundamentally counter to our choice of boundaries.
Emphasis on user privacy - We are taking a conservative approach by minimizing the PII that is sent to consumers. The trade-off is that consumers may find the received user identifiers limiting. However, at this time, it’s unclear whether adaptive engines, which are written generically for all users, need PII to be effective. They need the ability to bind events together and track pathways and progress for users, but they can do so with any unique identifier - hence the introduction of the `Anonymized User ID`_.
For Enterprise and other use cases, sharing PII may be required. We have chosen to keep those use cases in mind, but not target them initially, with the understanding that future work would be needed to address those needs.
Deferring implementation of an LRS - As mentioned in LRS (Future), we are consciously postponing implementation of an Open edX specific LRS at this time. Although the need for an LRS may be forthcoming, this initial iteration defers this work.
As a consequence, adaptive engines may need to maintain their own LRS if they need to refer back to previous events. Given our business research to date, it seems many adaptive engines are already maintaining their own custom-optimized storage of event data.