The Domain Event System is intended to enable event-oriented processing within and eventually around MediaWiki. Events become first-class concepts for connecting MediaWiki core components and extensions as well as integrating MediaWiki with other services in a distributed system.

Domain Events are designed to replace a certain type of hook as an extension interface, and to provide a mechanism for decoupling core components using the observer pattern. Eventually, MediaWiki should become able to broadcast events to and receive events from other wikis and other services.

The goal of the Event Dispatcher project in FY24/25 (as part of KR 5.2) was to create a capability for emitting events that reflect MediaWiki state changes, which can be received by in-process listeners in other components in core or in extensions. Options for broadcasting events and receiving events from other services were investigated and taken into consideration for the design of the domain events framework.

The overall motivation for introducing domain events is to make MediaWiki and extension development more sustainable:

Improve component boundaries between core components by applying the observer pattern (aka listener pattern). Listeners remove the need for code that effects a change to know about all code that needs to be informed about it.
Clarify the semantics of extension callbacks invoked as a result of a change, particularly with respect to transactional context. Standardize deferred update behavior, which will reduce boilerplate code and risk of misimplementation.
Make the extension interface more future-proof by avoiding the rigidity imposed by using PHP interfaces to define hook parameters. Due to limitations of PHP, method signatures defined by such interfaces can’t be modified in a backwards-compatible way.
Prepare for the creation of a generic relay mechanism for broadcasting events over an event bus. Broadcasting itself is not in scope for the initial phase, but accommodating that use case is a design goal.

The design follows the idea of domain events as defined in domain driven design: these events represent changes maintained by a given component (or bounded context). This matches two patterns that have evolved organically in our ecosystem: on the one hand, certain hooks, according to the survey we conducted in Q1 (WE 5.2.1); on the other hand, events emitted by the EventBus extension which are then handled by Wikimedia’s Event Platform based on Kafka.

Hypotheses:

Using DomainEvents for propagating updates between core components will reduce coupling.
Using DomainEvents instead of hooks in extensions will reduce the amount of boilerplate code.
Using DomainEvents instead of hooks in extensions will make it easier to make changes to core, by removing the need to keep hook interfaces stable.
MediaWiki DomainEvents will make the EventBus implementation easier and more sustainable by converging internal and external event models. By aligning the internal objects, we will be able to reduce transform logic within the extension.

Technical requirements

Provide an event dispatcher framework in MediaWiki that allows core components and extensions to communicate using the listener pattern (aka observer pattern).
Allow extensions to register listeners declaratively using their extension.json file.
Implement a dispatch mode that invokes listeners only after the transaction from which the event was emitted has been committed successfully. Listener execution should be guaranteed as long as the PHP life cycle does not terminate abnormally.

Constraints:

We need to retain backwards-compatibility with existing hooks until they have passed the deprecation period and can be removed.
The new framework has to be performant and scalable, so it doesn’t cause undue load on production infrastructure or an unacceptable increase in response latency.

Vision for the future

Implement an invocation mode that guarantees listener invocation even if the current PHP process terminated abnormally (transactional outbox).
Provide a mechanism for broadcasting events to other services and wikis.
Provide a mechanism to receive events from other services and wikis.
Provide a mechanism that allows listeners to schedule long-running tasks for later execution.
Provide a mechanism that allows listeners to implement retry logic when updates fail.

Comparison to hooks

Domain Events are intended to replace the use of hooks for use cases where an extension needs to be notified about a state change in MediaWiki core (or another extension).

Domain Events offer several advantages over relying on the hook system:

Theme	Events	Hooks
Data flexibility	Events will be defined as objects, meaning extending the object model with more data will not cause a breaking change nor explicitly require updates for listeners. We may also follow normal method deprecation patterns when removing data.	Hooks are implemented as PHP methods defined by PHP interfaces, meaning the method signatures cannot change in a backwards compatible way. Instead, a new hook must be created every time we need to change something about the data surfaced to extensions.
Invocation semantics	Event listeners are invoked after the main transaction has successfully been committed and the response has been sent to the client. Listeners cannot interfere with the main transaction and they cannot influence the response. Listeners can safely assume that the change they are being informed about has been persisted to the database.	By default, hooks use immediate dispatch and handling, typically during the main database transaction. This means that handlers can interfere with the transaction, and influence the response sent to the client. They cannot be sure whether the change they are being informed about will be rolled back. Deferred processing can be implemented in a given hook handler, but dispatch preferences are implemented in a manual and ad-hoc way across handlers, resulting in duplicative and error-prone code.
Encapsulation	Events will apply the observer pattern in core as a means of decoupling components. Events are immutable in nature, and can be ‘blindly’ delivered to listeners, without knowledge of how they may be used.	Hooks have very few restrictions in what they can do, making them hard to reason about without full context of what else is using the hook. This results in extensions interacting in surprising ways, or subtle changes in the MediaWiki platform unexpectedly breaking extensions.
Publishing & broadcasting	Since events are modeled as value objects, it becomes much easier to map them to an external representation for publication to a bus.	Hooks are executed as callbacks natively in code, each with its own unique method signature. This prevents generic routing and instrumentation, as hook signatures often include objects that cannot be serialized in a straightforward way.
Reuse & consumer orientation	Events will be bound to the specific effect resulting from the state change event. Event types will err on the side of inclusive and generic triggers (e.g. page content changed) and metadata included in the given event can be used to differentiate nuanced use cases if necessary. In some cases, specialized events may be provided, such as importing page revision history without changing the current content.	Hooks are often defined ad-hoc to address the need at hand. They typically bind strongly to the context they are triggered from. Together with the fact that hook signatures cannot be changed later without breaking handlers, this results in many hooks that fulfill roughly (but not exactly) the same purpose. This confuses the intended usage, resulting in inconsistent utilization across extensions and more complicated change and deprecation paths.
Hierarchical modeling	Event types form a hierarchy, so listeners registered for a more generic event will also receive the more specific events. This provides flexibility with respect to the granularity of modeling - we can always introduce a subtype or a parent type to adjust granularity, without breaking compatibility.	There is no semantic relationship between different hooks. If a hook turns out to be too generic or too granular, the only way to address this is to define a new hook, deprecate the old one, and migrate all handlers.

Solution strategy

To solve the problems listed above, we need a new, robust interface to enable further decoupling. We are starting with addressing the event-like hooks identified through the FY24Q1 hook survey. We believe that by having a robust event object interface, we will be able to improve MediaWiki sustainability by reducing brittleness and enforcing better component boundaries. The initial approach covers dispatching events to listeners within the same PHP process. This will make the events available to MediaWiki core components in addition to offering events as a new extension interface.

In order to introduce the concept of event listeners into MediaWiki core, we define interfaces for emitting events and for registering listeners. We implement a mechanism for dispatching events to the relevant listeners if and when the change in state has been successfully committed to the storage layer.

The domain event infrastructure can be implemented as a self-contained system, built on top of existing infrastructure such as DeferredUpdates in a straightforward way. It is designed in such a way that allows us flexibility to change the implementation later, without affecting listeners.

Existing extensions using hooks remain functional without any change. This enables us to experiment with the new system on a small selection of extensions and core components, and to adjust the interfaces and implementation as we expand usage.

Architecture

The architecture for dispatching Domain Events to in-process listeners follows established patterns for event processing, with the distinction that, in contrast to event dispatching in most web frameworks, event delivery is deferred until after the main database transaction has been completed and the response has been sent to the client.

Important concepts:

Event Objects are defined by subclasses of the DomainEvent class. Event objects are value objects and should be serializable. They should represent the outcome of an event as well as the parameters that went into triggering it. Each event object has an event type. See Manual:Domain events/Hierarchy for more details and examples.

Event Listeners are methods that are invoked by the Dispatch Engine after an event of a certain type occurred. That is, the listener is invoked if it was previously registered with the Event Source and then an Event Object of the correct type was emitted through the Event Dispatcher. Listeners are typically implemented as methods on an Ingress Object.

The Event Dispatcher is a service object that implements the DomainEventDispatcher interface. It can be used to emit event objects which will later be delivered to listeners. See the section on Event Dispatching and Listener Invocation below.

The Event Source is a service object that implements the DomainEventSource interface. It offers methods for registering listeners for any type of event. However, registering listeners directly with the event source should be the exception. Typically, this is left to an Ingress Object declared in extension.json.

Ingress Objects extend the EventIngressBase class and implement Listener methods. They act as an adapter between the logic inside a component or extension and the models used by incoming events. Declaring Ingress Objects in extension.json is the preferred way for extensions to register listeners. Such ingress objects are instantiated lazily, when an event of the relevant type is dispatched. See Subscribing to events for details.

The Dispatch Engine is the concrete implementation that routes events to registered listeners - it implements both DomainEventDispatcher and DomainEventSource. Understanding that Event Dispatcher and Event Sources are different perspectives on the same object is important for understanding the flow of information. However, application logic will always interact either with one interface or the other, never with the engine as such.
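
To illustrate the Event Object concept described above, here is a minimal, self-contained sketch of an immutable, serializable value object. The class name PageUpdatedSketchEvent, its fields, and the JSON layout are assumptions made for this example; the sketch does not extend the real DomainEvent class and does not reflect its actual signatures.

```php
<?php
// Hypothetical stand-in for an event value object. It deliberately does not
// extend MediaWiki's DomainEvent class, whose real API may differ.
final class PageUpdatedSketchEvent implements JsonSerializable {
	public const TYPE = 'PageUpdated';
	public const FLAG_BOT = 'bot';

	/** @var string[] */
	private array $flags;

	public function __construct(
		private int $pageId,
		private int $newRevisionId,
		array $flags = []
	) {
		$this->flags = array_values( $flags );
	}

	public function getEventType(): string {
		return self::TYPE;
	}

	public function getPageId(): int {
		return $this->pageId;
	}

	public function getNewRevisionId(): int {
		return $this->newRevisionId;
	}

	public function hasFlag( string $flag ): bool {
		return in_array( $flag, $this->flags, true );
	}

	// Only plain data is exposed, so the event can be mapped to an
	// external representation without knowing anything about listeners.
	public function jsonSerialize(): array {
		return [
			'type' => self::TYPE,
			'pageId' => $this->pageId,
			'newRevisionId' => $this->newRevisionId,
			'flags' => $this->flags,
		];
	}
}

// Usage: the object has no setters, so listeners cannot alter it.
$event = new PageUpdatedSketchEvent( 42, 1001, [ PageUpdatedSketchEvent::FLAG_BOT ] );
$decoded = json_decode( json_encode( $event ), true );
```

Because listeners only read the event through getters, a field could later be added to such a class without changing any listener signature.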

Event dispatching and listener invocation

When application logic calls the dispatch() method on the Event Dispatcher interface, the Dispatch Engine does the following:

Get the event type chain from the event object, and perform the actions below for each type. This ensures that each event is dispatched under its own type as well as all parent types.
Check if there are subscribers (ingress objects) registered for the event type that have not yet been instantiated and applied. If so:
- Instantiate the subscriber (ingress object)
- Initialize the subscriber, injecting information from the object spec
- Apply the subscriber by calling its registerListeners() method. This causes the subscriber to register its listener methods with the Dispatch Engine, which is passed in as an Event Source.
Find all listeners registered for the given event type.
For each listener, create and schedule a DeferredUpdate
- The DeferredUpdate wraps a closure that, when executed later, will invoke the listener and pass it the event object
- The DeferredUpdate gets a reference to the emitter's transactional context, represented by an IConnectionProvider. This allows the DeferredUpdate to check whether the transaction from which the event was dispatched was successful. If it wasn't, the DeferredUpdate is canceled.

After scheduling the DeferredUpdates, control is returned to the code that emitted the event. No listeners have been invoked yet, so listener code cannot interfere with the transaction nor can it delay or influence the response that will be sent to the client.

Eventually, after the main transaction was committed and the response has been sent to the client, the DeferredUpdates get executed. If the main transaction was successful, each update will invoke one listener. Otherwise, no listeners are called.
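
The flow described in this section can be sketched in a few dozen lines of plain PHP. This is an illustration under simplifying assumptions, not the actual implementation: all class and method names (SketchDispatchEngine, SketchTransaction, runDeferred() and so on) are hypothetical, a plain array stands in for the DeferredUpdates queue, and a boolean flag stands in for the transactional context that the real system obtains via IConnectionProvider.

```php
<?php
// All names here are hypothetical stand-ins, not MediaWiki classes.
interface SketchDispatcher {
	public function dispatch( SketchEvent $event, SketchTransaction $trx ): void;
}

interface SketchSource {
	public function registerListener( string $eventType, callable $listener ): void;
}

final class SketchEvent {
	public function __construct( private array $typeChain, private array $data ) {
	}

	public function getTypeChain(): array {
		return $this->typeChain;
	}

	public function getData(): array {
		return $this->data;
	}
}

// Stands in for the emitter's transactional context.
final class SketchTransaction {
	public bool $committed = false;
}

// One object, two perspectives: emitters see a dispatcher, subscribers a source.
final class SketchDispatchEngine implements SketchDispatcher, SketchSource {
	private array $subscribers = [];
	private array $listeners = [];
	private array $deferred = []; // stands in for the DeferredUpdates queue

	public function registerSubscriber( array $eventTypes, callable $apply ): void {
		$this->subscribers[] = [ 'events' => $eventTypes, 'apply' => $apply, 'applied' => false ];
	}

	public function registerListener( string $eventType, callable $listener ): void {
		$this->listeners[$eventType][] = $listener;
	}

	public function dispatch( SketchEvent $event, SketchTransaction $trx ): void {
		foreach ( $event->getTypeChain() as $type ) {
			// Lazily apply subscribers that declared interest in this type.
			foreach ( $this->subscribers as $i => $subscriber ) {
				if ( !$subscriber['applied'] && in_array( $type, $subscriber['events'], true ) ) {
					$this->subscribers[$i]['applied'] = true;
					( $subscriber['apply'] )( $this );
				}
			}
			// Schedule one deferred update per listener; nothing runs yet.
			foreach ( $this->listeners[$type] ?? [] as $listener ) {
				$this->deferred[] = static function () use ( $listener, $event, $trx ): void {
					if ( $trx->committed ) {
						$listener( $event );
					}
				};
			}
		}
	}

	// Called after the response has been sent in this sketch.
	public function runDeferred(): void {
		while ( $this->deferred ) {
			$update = array_shift( $this->deferred );
			$update();
		}
	}
}

// Usage: one committed and one failed transaction.
$engine = new SketchDispatchEngine();
$log = [];
$engine->registerSubscriber( [ 'PageUpdated' ], function ( SketchSource $source ) use ( &$log ) {
	$source->registerListener( 'PageUpdated', function ( SketchEvent $e ) use ( &$log ) {
		$log[] = 'updated page ' . $e->getData()['pageId'];
	} );
} );
$ok = new SketchTransaction();
$failed = new SketchTransaction();
$engine->dispatch( new SketchEvent( [ 'PageUpdated' ], [ 'pageId' => 1 ] ), $ok );
$engine->dispatch( new SketchEvent( [ 'PageUpdated' ], [ 'pageId' => 2 ] ), $failed );
$logBeforeCommit = $log; // no listener has run at this point
$ok->committed = true;
$engine->runDeferred();
```

In this sketch only the listener for page 1 runs, because the second transaction never commits. The emitter never learns whether or when listeners ran, which is the "fire and forget" property the design aims for.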

Architecture decisions

See also discussions on https://phabricator.wikimedia.org/T379959

Build on top of DeferredUpdates

Deferred, transaction-bound dispatch should be built on top of DeferredUpdates. They already provide a queue, an execution mechanism, a way to cancel on failed transactions, and transactional isolation of updates/listeners. There’s no reason to write that from scratch. However, this is just an implementation detail - we can change the implementation later, the interface is not bound to DeferredUpdates.

Side note: The concept of listeners that get invoked only after and if the current transaction is successfully committed is not unusual for a web application framework. This concept is established e.g. in Laravel and in Spring.

Do not use PSR-14, Symfony or Laravel

The interfaces defined by PSR-14 only allow for synchronous dispatch, which is insufficient for our use case.

The event frameworks defined by Symfony and Laravel support different invocation modes, including transaction-bound listeners. They are indeed pretty close to what we need, and serve as an inspiration. However, we will not use them directly, because:

They rely on reflection for listener discovery, which is relatively slow
We want lazy initialization using ObjectFactory, to further reduce latency
We want integration with existing infrastructure like DeferredUpdates, JobQueue, and EventRelayer, which would be awkward if possible at all.
Because of this, the re-usable part would be small, the similarities potentially misleading - we’d have to “twist” the frameworks to make them fit our needs.
So we’d start depending on a sizable library for little gain.

Focus on transactional updates

The design of the event system is focussed on informing other components, extensions, and (eventually) other processes about changes to persistent state. The assumption is that events are emitted from code that is already engaged in comparatively expensive write operations, so the overhead of handling and (eventually) broadcasting events can be allowed to be relatively high.

This acknowledges that there is likely a need for a more lightweight mechanism for emitting events at higher volume and frequency, but with reduced guarantees with respect to reliability. We already have such systems for emitting logs and metrics, and it seems reasonable to assume that we may want to generalize and consolidate these systems with each other, while maintaining a conscious distinction from a system that emits events to inform about transactional changes.

The decision to make this distinction is rooted in the idea that we will need separate systems that optimize for different positions in the tradeoff-space defined by the CAP theorem, specifically between latency and reliability.

Invoke listeners only after commit

The contract of the event dispatcher interface should guarantee “fire and forget” semantics: the emitter does not have to consider the behavior of listeners, and listeners cannot interfere with the current transaction.

This can be achieved by invoking listeners only after the current transaction has been committed successfully. This implies that synchronous invocation of listeners is not supported.

This also implies that there is no way for listeners to update persistent state in a way that guarantees consistency with the main transaction: if the update that the listener is trying to perform fails, the main transaction remains committed. This can be mitigated by implementing retry logic in the listener. The easiest way to achieve this is to perform the update in a job, since jobs have retry logic built in. Implementing retries however comes with the complexity of handling repeated execution and out-of-sequence execution.
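
One common way to cope with repeated and out-of-sequence execution is to make the listener's update idempotent and order-aware. The sketch below is only an illustration: LatestRevisionSketchStore is a hypothetical in-memory store, and the approach assumes that revision IDs increase monotonically for a given page.

```php
<?php
// Hypothetical in-memory store; a real listener would write to a database.
final class LatestRevisionSketchStore {
	/** @var array<int,int> latest applied revision ID per page ID */
	private array $latest = [];

	/**
	 * Apply an update unless the same or a newer revision was already applied.
	 * Assumes revision IDs grow monotonically per page.
	 * @return bool whether the update was applied
	 */
	public function applyUpdate( int $pageId, int $revisionId ): bool {
		if ( ( $this->latest[$pageId] ?? 0 ) >= $revisionId ) {
			return false; // duplicate or stale delivery, safe to ignore
		}
		$this->latest[$pageId] = $revisionId;
		return true;
	}

	public function getLatest( int $pageId ): ?int {
		return $this->latest[$pageId] ?? null;
	}
}

// Usage: revision 11 arrives late and revision 12 is retried.
$store = new LatestRevisionSketchStore();
$results = [];
foreach ( [ 10, 12, 11, 12 ] as $revisionId ) {
	$results[] = $store->applyUpdate( 7, $revisionId );
}
```

With this guard in place, a retried or late job leaves the stored state unchanged instead of overwriting newer data.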

If it turns out that we do need to support invocation of listeners within the current transaction, this capability should be added in a way that allows the emitter to explicitly opt in, at the cost of losing “fire and forget” semantics.

Use the “subscriber” pattern instead of raw listeners in extension.json

The “subscriber” pattern is borrowed from Symfony and Laravel. A subscriber combines several related listeners into one object, and is responsible for registering these listeners with the event source. It’s similar to the way we group hook handler methods into handler objects.

In extension.json, a subscriber would be registered as an object spec, using the same schema we use to define other kinds of objects in JSON:

{
   "DomainEventSubscribers": [
       {
           "events": [ "PageUpdated", "PageDeleted" ],
           "class": "MyExtension\\MySubscriber",
           "services": [ "PageStore" ]
       }
   ]
}

The subscriber class would implement the DomainEventSubscriber interface:

class MySubscriber implements DomainEventSubscriber {

   private PageStore $pageStore;

   public function __construct( PageStore $pageStore ) {
       $this->pageStore = $pageStore;
   }

   public function registerListeners( DomainEventSource $source ) {
       // Register the listener method for the event type it handles.
       $source->registerListener(
           PageUpdatedEvent::TYPE,
           [ $this, 'afterPageUpdated' ]
       );
       //…
   }

   public function afterPageUpdated( PageUpdatedEvent $event ) {
       $userName = $event->getNewRevision()->getUser()->getName();
       $isBotEdit = $event->hasFlag( PageUpdatedEvent::FLAG_BOT );
       //…
   }
}

We can define a default implementation of DomainEventSubscriber that can be used as a base class, to avoid boiler-plate code:

class MySubscriber extends EventSubscriberBase {

   private PageStore $pageStore;

   public function __construct( PageStore $pageStore ) {
       $this->pageStore = $pageStore;
   }

   public function handlePageUpdatedEvent( PageUpdatedEvent $event ) {
       $editorName = $event->getNewRevision()->getUser()->getName();
       $isBotEdit = $event->hasFlag( PageUpdatedEvent::FLAG_BOT );
       //…
   }
}

Objects based on the EventSubscriberBase class get the list of events injected when they are constructed by the Dispatch Engine. EventSubscriberBase implements registerListeners() to register a listener method for each event, based on a naming convention.
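
The naming convention can be sketched as follows. This is a simplified illustration, not the real EventSubscriberBase: the names SketchSubscriberBase, SketchEventSource and initEvents() are hypothetical, events are passed as plain arrays to keep the example self-contained, and how the event list is actually injected is an assumption.

```php
<?php
// Hypothetical stand-ins; the real interfaces may look different.
interface SketchEventSource {
	public function registerListener( string $eventType, callable $listener ): void;
}

abstract class SketchSubscriberBase {
	/** @var string[] event types taken from the object spec */
	private array $eventTypes = [];

	// Assumed injection point for the "events" list from extension.json.
	public function initEvents( array $eventTypes ): void {
		$this->eventTypes = $eventTypes;
	}

	// Maps each event type to a method named handle{eventType}Event.
	public function registerListeners( SketchEventSource $source ): void {
		foreach ( $this->eventTypes as $type ) {
			$method = 'handle' . $type . 'Event';
			if ( !method_exists( $this, $method ) ) {
				throw new LogicException( "Missing listener method $method" );
			}
			$source->registerListener( $type, [ $this, $method ] );
		}
	}
}

// Records registrations so the result can be inspected.
final class RecordingSketchSource implements SketchEventSource {
	public array $listeners = [];

	public function registerListener( string $eventType, callable $listener ): void {
		$this->listeners[$eventType][] = $listener;
	}
}

final class MySketchSubscriber extends SketchSubscriberBase {
	public array $seen = [];

	public function handlePageUpdatedEvent( array $event ): void {
		$this->seen[] = 'updated:' . $event['pageId'];
	}

	public function handlePageDeletedEvent( array $event ): void {
		$this->seen[] = 'deleted:' . $event['pageId'];
	}
}

// Usage: the subscriber never calls registerListener() itself.
$subscriber = new MySketchSubscriber();
$subscriber->initEvents( [ 'PageUpdated', 'PageDeleted' ] );
$source = new RecordingSketchSource();
$subscriber->registerListeners( $source );
( $source->listeners['PageDeleted'][0] )( [ 'pageId' => 7 ] );
```

Failing early when a declared event has no matching method is a design choice of this sketch; it surfaces a mismatch between the declared event list and the class as soon as the subscriber is applied.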

Note that the subscriber spec in extension.json must always contain the list of events to subscribe to. This is needed for two reasons:

Firstly, we want to be able to apply lazy initialization of subscribers and listeners, to minimize response latency. This means that we instantiate and apply the subscriber only when one of the events it has registered for is triggered. But that means we have to know the list of events in advance; we can’t leave it to the registerListeners() method to determine this programmatically.

Secondly, we want a mechanism to determine which extension is subscribing to which events. For the subscription mechanism itself, we could use a static class function or constant to provide the list of events that should trigger instantiation of the subscriber. However, that would make it much harder to generate a complete survey of all events used by an extension, or detect which extensions use a given event.

Provide interfaces for listeners to implement

When implementing hook handlers, extensions can make use of the hook interface that defines the signature of the handler method. The same can be done for event listeners.

However, Listener interfaces should be considered purely a convenience, to provide auto-completion in IDEs. The domain event framework does not require or use them.

When defining listener interfaces, the name of the listener method should follow the pattern required by EventSubscriberBase, namely handle{eventType}Event.

Pass an options array when registering Listeners

The interface for registering listeners should allow for an options array to be passed along with the listener. Supported options may depend on the specific dispatcher implementation. Passing options as an (associative) array rather than using a stricter mechanism such as individual parameters allows us to add support for options later, without the need to modify the interface used to register listeners.

While initially no options are supported, there are a number of things that could be covered by this mechanism in the future, such as:

Listener priority
Error handling and retry behavior
Filter callbacks
Message streams for receiving remote events
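
The following sketch shows why an options array keeps the registration interface stable. It is purely illustrative: the class OptionsSketchSource and the 'priority' option are hypothetical, and the document states that no options are supported initially.

```php
<?php
// Hypothetical event source; 'priority' is an invented example option.
final class OptionsSketchSource {
	private const DEFAULTS = [ 'priority' => 0 ];

	private array $entries = [];

	// The signature stays the same no matter how many options are added later.
	public function registerListener(
		string $eventType,
		callable $listener,
		array $options = []
	): void {
		$this->entries[$eventType][] = [
			'listener' => $listener,
			// Caller-supplied options win; missing ones fall back to defaults.
			'options' => $options + self::DEFAULTS,
		];
	}

	/** @return callable[] listeners for the type, highest priority first */
	public function getListeners( string $eventType ): array {
		$entries = $this->entries[$eventType] ?? [];
		usort(
			$entries,
			static fn ( array $a, array $b ) => $b['options']['priority'] <=> $a['options']['priority']
		);
		return array_column( $entries, 'listener' );
	}
}

// Usage: existing callers that pass no options keep working unchanged.
$source = new OptionsSketchSource();
$source->registerListener( 'PageUpdated', static fn () => 'default' );
$source->registerListener( 'PageUpdated', static fn () => 'urgent', [ 'priority' => 10 ] );
$order = array_map(
	static fn ( callable $listener ) => $listener(),
	$source->getListeners( 'PageUpdated' )
);
```

Adding another option later would only extend the defaults and the code that interprets them, leaving every existing registerListener() call valid.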

Ingress-objects and wiring

The subscriber pattern can be used by core components as well, to listen to events from other components. The subscriber object containing the listeners that a given component uses to process events from other components can be considered the component’s “ingress” interface. It can act as an “anti-corruption layer”, isolating the domain model of one component from knowledge about the domain events emitted by other components.

For example, the component responsible for tracking edits in RecentChanges may define an ingress object that listens to events emitted by the content component. This allows the domain model in the change tracking component to remain isolated from knowledge about content events, while also allowing the content component to remain oblivious to the existence of RecentChanges.

The goal is to have one subscriber for the entire component, as opposed to letting individual classes (service objects) subscribe separately. This way, only the ingress object has to know about events defined in other domains, serving as an adapter.

Type hierarchy

Event types should form a hierarchy, so listeners registered for a more generic event will also receive the more specific events. This provides flexibility with respect to the granularity of modeling - we can always introduce a subtype or a parent type to adjust granularity, without breaking compatibility.

Modeling events as a hierarchy of types is in line with common practice in other frameworks that make use of events.

In cases where a hierarchy proves too inflexible, it is also possible to define interfaces that are shared by multiple event classes across the hierarchy. The interface would have its own event type, so that listeners can subscribe to that type and receive an event object that implements the corresponding interface. This option adds a degree of freedom to the modeling, but should be used sparingly. Having many interfaces that can be mixed and matched may make it harder to understand which event types an extension should subscribe to, and how.

For the hierarchy of events defined by MediaWiki core, see Manual:Domain_Events/Hierarchy.
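
A minimal sketch of how a type chain lets listeners for a generic type receive more specific events. The class names and the declareEventType() helper are assumptions chosen for this example and need not match the real DomainEvent API.

```php
<?php
// Hypothetical base class; each constructor appends its own type to the chain.
abstract class SketchDomainEvent {
	/** @var string[] */
	private array $typeChain = [];

	protected function declareEventType( string $type ): void {
		$this->typeChain[] = $type;
	}

	/** @return string[] all types this event is dispatched under */
	public function getEventTypeChain(): array {
		return $this->typeChain;
	}
}

class PageUpdatedSketch extends SketchDomainEvent {
	public const TYPE = 'PageUpdated';

	public function __construct( private int $pageId ) {
		$this->declareEventType( self::TYPE );
	}

	public function getPageId(): int {
		return $this->pageId;
	}
}

// A more specific event: still a PageUpdated event, plus its own type.
class PageContentImportedSketch extends PageUpdatedSketch {
	public const TYPE = 'PageContentImported';

	public function __construct( int $pageId ) {
		parent::__construct( $pageId );
		$this->declareEventType( self::TYPE );
	}
}

// Collects the names of all listeners registered for any type in the chain.
function listenersFor( SketchDomainEvent $event, array $listenersByType ): array {
	$matched = [];
	foreach ( $event->getEventTypeChain() as $type ) {
		foreach ( $listenersByType[$type] ?? [] as $name ) {
			$matched[] = $name;
		}
	}
	return $matched;
}

// Usage: listener names are placeholders for real callbacks.
$listenersByType = [
	'PageUpdated' => [ 'searchIndexer' ],
	'PageContentImported' => [ 'importAuditor' ],
];
$forImport = listenersFor( new PageContentImportedSketch( 5 ), $listenersByType );
$forUpdate = listenersFor( new PageUpdatedSketch( 5 ), $listenersByType );
```

Introducing the subtype did not require any change to the listener registered for the generic type, which is the compatibility property the hierarchy is meant to provide.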

Do not implement a job-based asynchronous invocation mode for listeners

Based on observations of usage patterns in existing hook handlers as well as the example of event systems defined in web frameworks, it initially seemed useful to support listener invocation through the job queue mechanism. However, further investigation showed that hook handlers typically schedule jobs only for a small subset of their invocations, after checking conditions and exiting early if they are not met. Based on this observation it seems wasteful to schedule one or more jobs for each event, just to have the vast majority of them bail out without doing any work.

However, the need for listeners to schedule jobs is compelling, and the amount of boilerplate code that is currently needed to schedule jobs is considerable. For this reason, we should provide a mechanism that allows listeners to schedule code for execution via the job queue without the need to implement a dedicated job class.

This could be achieved by registering subscriber objects by name, so a generic job can be implemented that can invoke arbitrary methods on named subscribers. This would allow listeners to schedule another method on the same subscriber object for execution via the job queue by calling a convenience method like $this->callAsJob( 'someOtherMethod' ).
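
The idea can be sketched as follows. Everything here is hypothetical: SketchJobQueue is an in-memory stand-in for the job queue, the callAsJob() signature with a parameter array is an assumption, and the example subscriber is invented for illustration.

```php
<?php
// In-memory stand-in for the job queue plus a registry of named subscribers.
final class SketchJobQueue {
	private array $jobs = [];
	private array $subscribers = [];

	public function registerSubscriber( string $name, object $subscriber ): void {
		$this->subscribers[$name] = $subscriber;
	}

	// The job spec holds only names and plain parameters, so it could be
	// serialized and executed in a different process.
	public function push( string $subscriberName, string $method, array $params ): void {
		$this->jobs[] = [
			'subscriber' => $subscriberName,
			'method' => $method,
			'params' => $params,
		];
	}

	// The "generic job": look up the subscriber by name, call the method.
	public function runAll(): void {
		while ( $this->jobs ) {
			$job = array_shift( $this->jobs );
			$target = $this->subscribers[$job['subscriber']];
			$target->{$job['method']}( ...$job['params'] );
		}
	}
}

final class ThumbnailSketchSubscriber {
	public array $purged = [];

	public function __construct( private string $name, private SketchJobQueue $queue ) {
	}

	private function callAsJob( string $method, array $params ): void {
		$this->queue->push( $this->name, $method, $params );
	}

	// Listener: checks a cheap condition and only then schedules a job.
	public function handlePageUpdatedEvent( array $event ): void {
		if ( !$event['isFile'] ) {
			return;
		}
		$this->callAsJob( 'purgeThumbnails', [ $event['pageId'] ] );
	}

	public function purgeThumbnails( int $pageId ): void {
		$this->purged[] = $pageId;
	}
}

// Usage: two events arrive, only one results in a job.
$queue = new SketchJobQueue();
$subscriber = new ThumbnailSketchSubscriber( 'thumbs', $queue );
$queue->registerSubscriber( 'thumbs', $subscriber );
$subscriber->handlePageUpdatedEvent( [ 'pageId' => 1, 'isFile' => false ] );
$subscriber->handlePageUpdatedEvent( [ 'pageId' => 2, 'isFile' => true ] );
$purgedBeforeRun = $subscriber->purged;
$queue->runAll();
```

This mirrors the observation above: the listener bails out cheaply for most events, and a job is only created for the small subset that needs expensive work.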

No strong delivery guarantees

While strong delivery guarantees are desirable, they add complexity, operational cost, and latency. Based on the fact that "best effort" semantics have been "good enough" for existing use cases implemented using hooks and deferred updates, it seems like the event system does not need to provide strong delivery guarantees at this time.

The following guarantees were considered:

at-least-once delivery: Ensure that events get delivered to listeners even when the process that executed the transaction during which the event was emitted terminates unexpectedly due to a critical failure. This could be implemented using the transactional outbox pattern, at the cost of additional database writes, added code complexity, and increased latency. Delivery within the original execution context (PHP runtime/process) could not be guaranteed.
ordered delivery: Guaranteeing that events get delivered in the order they were emitted is hard when at the same time we only want to invoke listeners after the transaction during which the event was emitted has been completed. It could be implemented if all event processing was done through a message queue. However, that would mean that listeners are never invoked in the original process, and delivery may be delayed significantly. Delivery would only be complete after all listeners succeeded, and a single failing listener would block the delivery of all subsequent events (for the same entity, or site, or even across sites).

With the current implementation, it is estimated that about one in a million events may be lost due to unexpected failure. Out-of-sequence delivery may occur somewhat more often, but should still be quite rare, depending on the type of event. Concurrent edits to the same page would be the most likely cause of this kind of issue.

Note that delivery issues, while rare, are likely to occur in spikes, due to failure of a subsystem (e.g. a specific server) or sudden high demand of a specific resource (e.g. a specific page).