News & Articles

Full archive

January 17


Akka Multi-DC

Several exciting capabilities for multiple data centers have recently been added to Akka. There can be many reasons for using more than one data center, such as:

  • Redundancy to tolerate failures in one location and still be operational.
  • User proximity, in order to serve requests from a location near the user to provide better responsiveness.
  • Balance the load over many servers, regions or availability zones.

Multi-DC Clustering

We often recommend implementing a micro-service as one Akka Cluster. The external API of the service would be HTTP or a message broker, and not Akka Remoting or Cluster, but the internal communication within the service that is running on several nodes would use ordinary actor messaging or the tools based on Akka Cluster. When deploying this service to multiple data centers it would be inconvenient if the internal communication could not use ordinary actor messaging because it was separated into several Akka Clusters. The benefit of using Akka messaging internally is performance as well as ease of development and reasoning about your domain in terms of Actors.

Therefore, it’s possible to make the Akka Cluster aware of data centers so that one Akka Cluster can span multiple data centers and still be tolerant to network partitions. The reason for making the Akka Cluster aware of data center boundaries is that communication across data centers typically has much higher latency and higher failure rate than communication between nodes in the same data center.

However, the grouping of nodes is not limited to the physical boundaries of data centers, even though that is the primary use case. It could also be used as a logical grouping for other reasons, such as isolation of certain nodes to improve stability or splitting up a large cluster into smaller groups of nodes for better scalability.

DC nodes

Many of the Akka Cluster features are aware of the data center boundaries. Cluster membership for each data center can be managed independent of network partitions across the data centers. Gossip of the membership state is optimized and failure detection is more lenient across data centers. One thing to be aware of is that Cluster Singleton and Cluster Sharding are per data center and not global, which is important when used with Akka Persistence as we will explain next.

Akka Multi-DC Clustering is part of the ordinary Akka Open-Source release. You find more information in the documentation.

Multi-DC Persistence

Akka persistence enables stateful actors to persist their internal state so that it can be recovered when an actor is started, restarted after a crash, or migrated in a cluster. The key concept behind Akka persistence is that only changes to an actor’s internal state are persisted but never its current state directly (except for optional snapshots). Such stateful actors are recovered by replaying stored changes to these actors from which they can rebuild internal state.

This design of capturing all changes as domain events, which are immutable facts of things that have happened, is known as event sourcing.

Akka Persistence is using event sourcing that is based on the single writer principle, which means that there can only be one active instance of a PersistentActor with a given persistenceId.

This restriction means that the single persistent actor can only live in one data center and would not be available during network partitions between the data centers.

What if we could relax the single writer principle and allow persistent actors to be used in an active-active mode? The consistency boundary that we get from the ordinary persistent actor is nice and we would like to keep that within a data center, but network partitions across different data centers should not reduce availability. In other words, we would like one persistent actor instance in each data center and the persisted events should be replicated across the data centers with eventual consistency. Eventually, all events will be consumed by replicas in other data centers.

That approach is what is provided by the new Akka Multi-DC Persistence module, which is part of Lightbend’s commercial add-ons for Akka.

Let’s dive into an example to explain how it works. We would like to implement thumbs-up service and for each resource track which users gave the signal.

The operations that this service provides are:

  • Give the thumbs-up from a user for a resource id
  • Get current count of thumbs-up for a resource id
  • Get the user identifiers that gave the thumbs-up for a resource id

Akka Multi-DC Persistence introduces new type of persistent replicated actor that is called ReplicatedEntity.

The entity that handle the commands for the thumbs-up service and store the changes of the state as events looks like this:

import akka.persistence.multidc.javadsl.CommandHandler;
import akka.persistence.multidc.javadsl.EventHandler;
import akka.persistence.multidc.javadsl.ReplicatedEntity;

public class ThumbsUpCounter
    extends ReplicatedEntity<ThumbsUpCounter.Command, ThumbsUpCounter.Event, ThumbsUpCounter.State> {

  public State initialState() {
    return new State();

  public CommandHandler<Command, Event, State> commandHandler() {
    return commandHandlerBuilder(Command.class)
        .matchCommand(GiveThumbsUp.class, (ctx, state, cmd) -> {
          return Effect().persist(new GaveThumbsUp(cmd.userId))
            .andThen(state2 -> {
              ctx.getSender().tell(state2.users.size(), ctx.getSelf());

        }).matchCommand(GetCount.class, (ctx, state, cmd) -> {
          ctx.getSender().tell(state.users.size(), ctx.getSelf());
          return Effect().none();
        }).matchCommand(GetUsers.class, (ctx, state, cmd) -> {
          ctx.getSender().tell(state, ctx.getSelf());
          return Effect().none();

  public EventHandler<Event, State> eventHandler() {
    return eventHandlerBuilder(Event.class)
        .matchEvent(GaveThumbsUp.class, (state, event) ->

  // Classes for commands, events, and state...
  • initialState defines the State when the entity is first created
  • commandHandler defines the actions when receiving commands (messages), such as persisting events
  • eventHandler updates the current state when an event has been persisted, it’s also used when the entity is started up to recover its state from the stored events, and for consuming replicated events and updating the state from those

For brevity the POJO command, event and state classes are not shown in the above example. You find the full source code in the template project for Java or for Scala.

In the thumbs-up entity the state is represented as a Set of user ids. This is how it may look like when adding users a, b, …, g from two data centers.

Replicated Entity

When an event has been persisted it is asynchronously replicated to the other data center. For simplicity we are limiting this example to 2 data centers, but it works in same way for more than two data centers.

The replicated event is consumed by the corresponding entity instance in the other data center by calling the eventHandler. The asynchronous replication means that we can continue to perform updates on both sides also during network partitions. When the partition heals the events will be replicated and the state is updated. That is what often is referred to as active-active storage.

When all updates have been replicated and consumed the state is the same on both sides. This is what is called eventually consistent. Note that when querying the state while it’s being updated you may see different values if the query goes to the west or the east data center. For example, it can be {a,b,c,d} in west and {a,b,c,e} in east, because replication of d and e are still in flight.

In fact, it has a stronger consistency level called causal consistency and you can read more about that in the documentation.

As can be seen in the above example events may “arrive” at different order in the different data centers. That is something that must be considered when implementing the event handler of the ReplicatedEntity. In general, applying the same events in any order should always produce the same final state. In the above example we have used a Set and then it is obvious that the elements can be added in any order and the final contents will be the same.

One well-understood technique to create eventually-consistent systems is to model your state as a Conflict Free Replicated Data Type, a CRDT. Akka Multi-DC Persistence provides some general purpose CRDT implementations that you can use as the state or part of the state in the entity. However, you are not limited to those types. You can can implement the application specific event handler with the semantics of a CRDT in mind, as we did in the thumbs-up example. Sometimes it is good enough to use timestamps to decide which update should win.

The solution is using existing infrastructure for persistent actors and Akka persistence plugins, meaning that much of it has been battle tested, and we have also tested it thoroughly on EC2 across regions. Cassandra is currently the only supported data store and the replication mechanism of the events is taking advantage of the multi data center support that exists in Cassandra, but the solution is designed to allow for other future implementations.

Akka Multi-DC Persistence is part of Lightbend’s commercial add-ons for Akka. You find more information in the documentation and the above thumbs-up sample is available as Get Started download for Java or for Scala.