Archiving edX GitHub Repositories
Nimisha Asthagiri <firstname.lastname@example.org>
The openedx organization contains a large number of repositories, most of which are active and maintained, but some of which are now obsolete. To clarify the status of repositories, a process for archiving a repository is defined below.
Recently openedx.yaml files were added to edX repositories per OEP-2. In the course of deciding owners for those repositories, there was an ORA PR Discussion about how best to handle deprecated or obsolete repositories. In particular, do obsolete repositories need owners, and how can repositories be clearly marked as present for archive purposes only?
This discussion resurfaced in connection with edX’s use of Gemnasium to report third-party libraries that have known security issues. All repositories under the Open edX organization were being monitored, which added noise when trying to understand how many third-party library updates were required for actively maintained repositories.
When a repository under the openedx organization will no longer be maintained by the Open edX community because it is no longer in use by the latest version of the Open edX platform, the following steps should be followed.
This process is not for repositories that are currently still in use by either the latest release or the master branches of the Open edX platform. If a repository is in use, but planned to be removed, it should go through the deprecation process and when that is completed it can be archived as described by the process below.
First, if the repository is public, and a part of Open edX releases, follow these steps to see if anyone would like to take ownership of it:
Post a notice to Open edX Deprecation Announcements announcing that the repository will be archived, and asking whether anyone would like to take ownership of the repo. Cross post to the #open-edx-proposals channels in the Open edX Slack. If there are no responses after 2 business days, skip to Archive Steps.
If someone does wish to take ownership of the repository, post a new notice to Open edX Deprecation Announcements, clearly indicating who the new proposed owner is, how much time they have to spend maintaining the repo, and when the transfer will take place. Cross post in the above mentioned Slack channels. Wait at least 2 business days before proceeding.
Create a new GitHub Request on the tCRIL board for the repository to be transferred to the new organization.
These steps should be followed for all repos within the Open edX organization (forks included). After some experiments with keeping archived repos in the openedx organization, we’ve learned that having abandoned code show up in searches hinders work to understand the current state of the system and the risk around new work, particularly deprecations and API changes. Thus we decided to move all archived repositories to a separate org.
Update the README.rst file in the repository to add a brief note about why the repo is being archived, and what is serving as its replacement (where applicable). This may be as simple as linking to the appropriate DEPR ticket.
Unless you have the relevant permissions to perform this step, create a new GitHub Request on the tCRIL board and ask them to do the following:
Archive the repository per GitHub’s archive process
Move the repository to the openedx-unsupported organization
If the repo is not coming from the openedx GitHub org, then before moving it, rename it with a prefix of the source org’s name. For example, the notifier repo in the edx-solutions org would be renamed with an edx-solutions prefix.
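The archive steps above could be scripted against the GitHub REST API by someone with the relevant permissions. The following is a minimal sketch, not an official tool: it only builds the list of requests and sends nothing. It assumes the REST API's "update a repository" endpoint (`PATCH /repos/{owner}/{repo}`, with the `name` and `archived` fields) and "transfer a repository" endpoint (`POST /repos/{owner}/{repo}/transfer`, with `new_owner`). The function name `plan_archive`, the `-` separator between the org prefix and the repo name, and the rename, archive, transfer ordering are all assumptions made for illustration. In particular, confirm that GitHub permits transferring an already archived repository before relying on this order.

```python
from typing import List, NamedTuple

ARCHIVE_ORG = "openedx-unsupported"  # destination org named in this process
HOME_ORG = "openedx"                 # repos from this org keep their name


class PlannedRequest(NamedTuple):
    """One GitHub REST API call to make, described but not sent."""
    method: str
    path: str
    body: dict


def plan_archive(source_org: str, repo: str, separator: str = "-") -> List[PlannedRequest]:
    """Return the API calls for archiving `repo`, in the order to run them.

    The separator used when prefixing the source org name is an assumption;
    the process only says to prefix the name with the source org.
    """
    steps = []
    name = repo
    if source_org != HOME_ORG:
        # Rename first, so the prefixed name is in place before the move.
        name = f"{source_org}{separator}{repo}"
        steps.append(PlannedRequest("PATCH", f"/repos/{source_org}/{repo}", {"name": name}))
    # Mark the repository as archived (read-only).
    steps.append(PlannedRequest("PATCH", f"/repos/{source_org}/{name}", {"archived": True}))
    # Move the repository to the archive organization.
    steps.append(
        PlannedRequest("POST", f"/repos/{source_org}/{name}/transfer", {"new_owner": ARCHIVE_ORG})
    )
    return steps


if __name__ == "__main__":
    # "example-org" and "widget" are hypothetical names.
    for step in plan_archive("example-org", "widget"):
        print(step.method, step.path, step.body)
```

Keeping the plan separate from execution lets the requests be reviewed, for example in the GitHub Request ticket, before anyone with admin rights runs them.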
Over the lifetime of Open edX, we may fork the same external open source repository multiple times. In this case, we may need to archive the fork multiple times as we move between our fork and following upstream. When this is necessary, if possible un-archive the old fork and update it. If you’ve already made a new fork, delete the old copy of the fork before you move the new repo to openedx-unsupported.
This may break some older versions of Open edX. Some combination of copying branches between forks, renaming branches, and changing unsupported versions of Open edX would have to be done to keep things working. We opt not to take on this extra work by default, though we may do so under extenuating circumstances.
One such circumstance is if the previous fork is being used by a supported Open edX named release. In this case, one option would be to port any referenced branches in the old fork to the new fork before deleting the old fork.
We previously archived in place, and moved to this previously rejected alternative based on lessons learned in going through the deprecation process and major upgrades (Python 3, Django 2.x).
openedx organization is no longer littered with unsupported/obsolete repositories.
GitHub search results within the openedx organization do not include matches in archived repositories. This could decrease confusion, especially since repo descriptions are not included in search results.
Gemnasium monitoring may cease automatically (although this would need to be confirmed).
Pattern followed by Facebook, and thus might be familiar to others.
(see Rejected Alternatives for other options we considered).
This proposal does not introduce any backward compatibility issues.
There are a couple variations of this proposal that were originally discussed. Many of the steps of updating documentation and notifying the open source community are essentially the same; the major differences from the proposed process are outlined below.
Use GitHub’s archive feature and updated documentation to archive the repository in place.
Old code hasn’t moved so it can be easily found.
Old code can show up in searches to find historical context.
Through some experience with this method, we’ve learned that it’s less valuable than we expected.
Being able to know whether code is alive or dead is really helpful when making major changes and if dead code can’t easily be filtered from searches it slows us down.
Move the code from the master branch to an archived branch, while leaving the repository itself within openedx organization.
No need to create and maintain a new organization.
Gemnasium monitoring will cease automatically.
No help tickets to IT or DevOps are required.
This pattern was recommended on Anselm Hannemann’s blog, though it is not known how many organizations (if any) have adopted this process.
Non-intuitive, and could be confusing for developers to understand the state of the code, as cloning the repo or viewing it on GitHub would show an empty repository (Note: this could possibly be improved by changing the default branch for the repository, but that might reintroduce the Gemnasium monitoring issue).
It is unclear what the implications would be for any existing forks.
Added steps for repositories that live in the edX org, but are forks of other, independent repositories
Updated to use GitHub’s archive capability.
Don’t ask the community about public repos in the edx org that are not a part of Open edX.
Decide to use the new edx-unsupported org for all archived repos. Old way we were doing things is now recorded as Alternative 1: Archive In Place.
Updated to provide more details around archiving the same fork multiple times.
Removed step of adding [ARCHIVED] to the repo name. GitHub’s “archive this repo” setting is now available and is a sufficient indicator.
Removed step of adding a paragraph to the README about what archiving means, now that we use GitHub’s “archived” marker; the concept of an unmaintained repository and its dangers should be familiar to developers. Kept the recommendation to add an explanation of why it was archived.
Removed openedx.yaml update steps, since the rest of the archive process is sufficient.
Update instructions to use the openedx-unsupported repo instead of the edx-unsupported repo.
Change references to edx GitHub org to
Change internal edX procedures to community-based ones