Manual:Domain events/Hierarchy

The root of the domain event hierarchy is the class DomainEvent. Direct subclasses of DomainEvent are typically abstract classes that each correspond to an entity and act as the base class for all events emitted by that entity.

Events defined by MediaWiki core are listed below, grouped based on the entity that emits them. Any new event types must follow the modeling principles.

Page Events

Page Aggregate Event Hierarchy

Page events are emitted by the page aggregate entity, which represents a proper wiki page (not a special page) together with all revisions that belong to it. The page aggregate is currently (MW 1.45) purely conceptual and not modeled directly in code. The application logic that guarantees data consistency by applying changes in an atomic manner is implemented in data access objects like PageUpdater and ArchivePage as well as page command classes like MovePage and DeletePage. These are the classes that emit the relevant events.

Changes to pages are modeled by several concrete event types as well as some abstract types that group together related event types in a hierarchy:

  • PageEvent: Base type for all events representing changes to the page aggregate
    • PageHistoryEvent: The change affects historical revisions of the page, potentially including the current revision. This event type is currently (MW 1.45) not modeled explicitly but could be added in the future to act as a shared base type for events that affect historical revisions.
    • PageProtectionChangedEvent: The page was protected (or unprotected). This doesn’t affect the current state of the page.
    • PageRecordChangedEvent: Changes to the current state of the page. This kind of change requires the page to be rerendered.
      • PageIdentityChangedEvent: The change affects the identity (title or id) of the page. This event type is currently (MW 1.45) not modeled explicitly but could be added in the future to act as a shared base type for events that affect the page identity.
      • PageLatestRevisionChangedEvent: The latest revision of the page changed, typically because of an edit, but possibly because of other actions such as rollbacks or page moves.
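
The hierarchy above can be sketched as a set of classes. This is an illustrative sketch in Python, not the MediaWiki PHP implementation: the constructor fields (page_id, cause) and the needs_rerender() helper are assumptions made for the example, and the event types described above as not yet modeled are left out.

```python
class DomainEvent:
    """Root of the domain event hierarchy."""


class PageEvent(DomainEvent):
    """Base type for all events representing changes to the page aggregate."""

    def __init__(self, page_id: int) -> None:
        # page_id is a hypothetical field used only for this illustration.
        self.page_id = page_id


class PageProtectionChangedEvent(PageEvent):
    """The page was protected or unprotected; the page state is unaffected."""


class PageRecordChangedEvent(PageEvent):
    """The current state of the page changed, so it must be rerendered."""


class PageLatestRevisionChangedEvent(PageRecordChangedEvent):
    """The latest revision changed (edit, rollback, page move, ...)."""

    def __init__(self, page_id: int, cause: str) -> None:
        super().__init__(page_id)
        # cause is a hypothetical field naming what triggered the change.
        self.cause = cause


def needs_rerender(event: DomainEvent) -> bool:
    """Hypothetical helper: a listener can test against an abstract base
    type instead of enumerating every concrete event type."""
    return isinstance(event, PageRecordChangedEvent)
```

Grouping concrete events under abstract base types lets a listener subscribe to, or test for, a whole family of changes at once; here needs_rerender() is true for a latest-revision change and false for a protection change.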

We may want to add more events in the future, e.g. for representing changes to deleted pages (or rather, archived revisions) or to model changes to the page history through imports or partial deletions.

Outlook

Include LogEntry objects in events

Events represent changes to persistent state, and changes to persistent state are often recorded in the log. This is especially true for events that represent changes in the lifecycle of pages, such as deletion or rename/move. Since log entries are inserted into the database as part of the main transaction that effects the change, it would be sensible to include the resulting LogEntry object in the event as well. However, there are some complications with this:

  • LogEntries are also created for changes that are not related to pages, such as user blocks. So the log is not part of the content management domain. What logical domain do they belong to? Can page events from the content management domain know about them without introducing cyclic dependencies?
  • It would be nice to have an interface that can be shared by all event classes that contain a LogEntry, which defines a getLogEntry() method. However, logging of page creation is optional - so either the return value of getLogEntry() has to be nullable, or there have to be two versions of the PageCreated event.
  • Extensions can define additional types of log entries. So we would need a generic LogEntryAdded event. But that would be redundant with the events that model the specific type of change. And it's not clear what domain that event should belong to.
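
To make the second point concrete, the nullable variant of a shared getLogEntry() interface could look roughly like the following. This is a sketch in Python under stated assumptions, not an existing MediaWiki API: the HasLogEntry protocol, the fields of the LogEntry stand-in, and the describe() helper are all hypothetical, and only the "nullable return value" option is shown, not the "two versions of PageCreated" alternative.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass(frozen=True)
class LogEntry:
    """Minimal stand-in for a log entry; the fields are assumptions."""
    log_type: str
    action: str


class HasLogEntry(Protocol):
    """Hypothetical interface shared by events that may carry a log entry."""

    def get_log_entry(self) -> Optional[LogEntry]:
        ...


@dataclass(frozen=True)
class PageCreated:
    """Page creation event; logging is optional, so the entry may be None."""
    page_id: int
    log_entry: Optional[LogEntry] = None

    def get_log_entry(self) -> Optional[LogEntry]:
        return self.log_entry


def describe(event: HasLogEntry) -> str:
    """Every consumer has to handle the missing-entry case explicitly."""
    entry = event.get_log_entry()
    if entry is None:
        return "no log entry"
    return f"{entry.log_type}/{entry.action}"
```

The cost of the nullable variant is visible in describe(): every listener has to branch on the missing entry, which is the trade-off against maintaining two separate event types.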

In the future, these issues can hopefully be resolved, so LogEntries become available from event objects where appropriate.

Media File Events

Media file events are emitted by media file entities. The application logic and integrity guarantees of media files are implemented in the various File and FileRepo classes. A media file entity is paired with an associated wiki page, and changes to one may trigger changes to the other. The exact nature of this relationship is yet to be determined.

Potential media file events:

  • MediaFileCreated
  • MediaFileUpdated
  • MediaFileDeleted

Page Rendering Events

Page rendering events represent changes to a rendered representation (ParserOutput) associated with a page. The rendered representation of a page is not part of the page aggregate; it is a separate entity. However, changes to a page's state may trigger changes to page renderings (in the case of template transclusion, potentially a large number of page renderings). Page rendering is performed by ContentHandlers but controlled mainly through DerivedPageDataUpdater, PurgeJobUtils and RefreshLinksJob.

Potential events for changes to page renderings (parser output):

  • PageRenderingInvalidated
  • PageRenderingGenerated

Extensions (and services outside MediaWiki) are frequently interested in events that relate to the rendered output of pages (that is, to the lifecycle of ParserOutput). In particular, it would be useful to fire events when the page rendering becomes invalidated (by updating page_touched) and when a canonical rendering becomes available (generally from RefreshLinksJob). For the latter, it would be useful to include the full rendered HTML in the external (Kafka) representation of the event ("fat events"). If that is not possible, the newly rendered output could be stashed in a temporary store for a few minutes, using the render ID as a key. This temporary store would have to be accessible via an API so that services outside MediaWiki can make use of it.
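
The temporary-store fallback can be sketched as a small key-value store with a time-to-live, keyed by render ID. This is a minimal in-memory illustration in Python, not a proposal for the actual storage backend or API: the RenderingStash class, its method names, and the injectable clock are assumptions made for the example.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class RenderingStash:
    """Hypothetical short-lived store for rendered HTML, keyed by render ID."""

    def __init__(
        self,
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested
        self._items: Dict[str, Tuple[float, str]] = {}

    def put(self, render_id: str, html: str) -> None:
        """Stash the output together with its expiry time."""
        self._items[render_id] = (self._clock() + self._ttl, html)

    def get(self, render_id: str) -> Optional[str]:
        """Return the stashed output, or None if unknown or expired."""
        item = self._items.get(render_id)
        if item is None:
            return None
        expires_at, html = item
        if self._clock() >= expires_at:
            # Expired entries are dropped lazily on access.
            del self._items[render_id]
            return None
        return html
```

With this shape, the event only needs to carry the render ID. A consumer that reacts within the time-to-live can fetch the HTML, while a late consumer gets None and has to fall back to requesting a fresh rendering.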

As a side note, the PageRecord interface currently has a getTouched() method. That should be split out, because the touch date doesn't belong to the content management domain, but to the domain of renderings (parser, composition). The fact that page_touched is recorded in the page table is an implementation detail that should not be exposed in the domain model.

User Account Events

Potential user account events:

  • GlobalAccountCreated
  • LocalAccountCreated
  • UserBlockedUpdated

Decisions

This section documents design decisions made when modeling events. The decision records are intended to provide information about the rationale behind less obvious aspects of the event model.

Represent null-edits as a reconciliation request

Null-edits (purge-edits) should emit the same events as proper edits, even though they do not create a new revision and do not change the page’s content.

Null edits happen when a user opens a page for editing and then immediately saves the page again, without making any modifications. In this case, no new revision is created, but derived data updates (HTML rendering, links tables, search index, etc.) are triggered as if the page had been changed. This re-generation goes beyond what can be achieved with the purge action, since purges only apply to rendered content (and information derived from the rendering), and do not affect information based on the raw page content. This is particularly relevant for non-wikitext pages, e.g. localization cache for message pages, module cache for JS pages.

In a perfect world, null-edits would not be needed and could be ignored: if the page didn’t change, no data needs to be updated. In practice, null-edits are used to manually force re-generation of derived content to fix data corruption due to bugs and outages (in other words, reconciliation of downstream derived data). They are a well established tool that the community relies on, though the mechanism could certainly be made less obscure.

Emitting the same events on null-edits that we emit on proper edits makes this process robust: extensions that update data after page changes don’t have to do anything to ensure they also update this data on null-edits. Treating null-edits like regular edits with respect to the events that get emitted makes code that doesn’t consider null-edits fail on the safe side, preserving the expected behavior for the user performing the null-edit.

Indeed, this concept seems useful for all kinds of events, not just for page updates: downstream derived data may get corrupted or out of sync, and it is useful to be able to trigger their re-generation using the same events that would normally have created that data. So we may have a reconciliation version of each event.

In general, listeners will process reconciliation events just like regular events. In some cases, however, we may want to make a distinction, in order to avoid creating duplicates (like sending redundant emails) or to fully purge and regenerate derived data instead of performing incremental updates.
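
A listener that makes this distinction could look roughly like the sketch below. It is written in Python for illustration and does not reflect the actual MediaWiki event classes: the PageUpdateEvent type, its is_reconciliation flag, and the NotifyingIndexer listener are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PageUpdateEvent:
    """Hypothetical event; is_reconciliation marks null-edit style replays."""
    page_id: int
    is_reconciliation: bool = False


class NotifyingIndexer:
    """Hypothetical listener that updates derived data and notifies users."""

    def __init__(self) -> None:
        self.reindexed: List[int] = []
        self.emails_sent: List[int] = []

    def handle(self, event: PageUpdateEvent) -> None:
        # Derived data is refreshed for regular and reconciliation events
        # alike, so a listener unaware of the flag stays on the safe side.
        self.reindexed.append(event.page_id)
        # Side effects that must not be duplicated are skipped on replays.
        if not event.is_reconciliation:
            self.emails_sent.append(event.page_id)
```

The default path treats both kinds of event identically. The flag is only consulted for the one side effect that would be harmful to repeat.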

Represent dummy revisions as page updates

Dummy revisions change the page history, but not the page content. They are used to make changes such as renames visible in the page history and the RecentChanges feed. Since they update the page history, they need to be represented as PageRevisionUpdated events. However, since they do not change the page’s content, listeners will often want to ignore them. So the event object should expose a property that allows this in a straightforward way.

It should, however, be noted that null-edits, which also do not change the page content, should generally not be skipped, since they represent requests for data reconciliation (see above).
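
The difference between the two cases can be expressed as a single predicate. The following Python sketch is illustrative only: the RevisionEvent type, its content_changed and is_reconciliation properties, and the should_process() helper are assumed names, not properties of the real event classes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RevisionEvent:
    """Hypothetical event carrying the two properties discussed above."""
    page_id: int
    content_changed: bool
    is_reconciliation: bool = False


def should_process(event: RevisionEvent) -> bool:
    """Decide whether a content-oriented listener should act on the event.

    Dummy revisions (no content change, not a reconciliation request) can
    be ignored; null-edits must still be processed because they ask for
    derived data to be regenerated.
    """
    return event.content_changed or event.is_reconciliation


regular_edit = RevisionEvent(page_id=1, content_changed=True)
dummy_revision = RevisionEvent(page_id=1, content_changed=False)
null_edit = RevisionEvent(page_id=1, content_changed=False, is_reconciliation=True)
```

Under these assumptions, should_process() accepts the regular edit and the null-edit and rejects only the dummy revision.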