Akka Edge services running at the edge of the cloud often need to minimize resource usage, because edge environments can be both expensive and limited in resources. At Akka, we've explored lightweight deployments that let Akka Edge services run with low resource usage and adapt to changing resource needs.
In a typical deployment, a service keeps running even when it's completely idle and waiting for incoming requests or messages. Resources are reserved to handle any possible higher load. For services with fluctuating traffic or activity, this over-provisioning can lead to resources frequently going unused and billable cloud usage being higher than necessary. In an environment where the underlying resources are shared and possibly autoscaled, one approach to avoiding unnecessary resource usage and costs is the scale-to-zero model, and there are various projects and solutions available to implement this.
When using a scale-to-zero strategy, an idle service has all its instances stopped. Any inbound traffic triggers an activation of the service. Something needs to track the triggers that will activate the service and manage the inbound traffic while the service is scaled back up from zero. The service ingress is replaced with something that queues incoming requests, activates the service, and then forwards requests to the newly activated service when ready. For external events and message brokers, the state of queues or subscribed topics is tracked so that the service can be activated to consume any newly available messages. The activation process needs to know about the kinds of incoming traffic or triggers that are expected for the service.
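To make the queue, activate, and forward sequence concrete, here is a minimal in-memory sketch. The `BufferingActivator` class and its method names are hypothetical and are not taken from Akka or from any particular scale-to-zero project. A real activator would sit in front of the network ingress and ask the platform to start the service.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical activator for illustration only; not an Akka or Kubernetes API.
class BufferingActivator {
    private final Queue<String> pending = new ArrayDeque<>();
    private final List<String> delivered = new ArrayList<>();
    private boolean serviceReady = false;
    private boolean activationRequested = false;

    // Called for each inbound request while standing in for the service ingress.
    void onRequest(String request) {
        if (serviceReady) {
            delivered.add(request);      // service is up: forward directly
        } else {
            pending.add(request);        // service is at zero: hold the request
            activationRequested = true;  // a held request triggers the scale-up
        }
    }

    // Called when the platform reports that the service has started.
    void onServiceReady() {
        serviceReady = true;
        while (!pending.isEmpty()) {
            delivered.add(pending.poll()); // flush held requests in arrival order
        }
    }

    boolean activationRequested() { return activationRequested; }
    int pendingCount() { return pending.size(); }
    List<String> delivered() { return delivered; }
}
```

Even in this simplified form, the activator has to know about every kind of inbound trigger in order to hold and replay it. That becomes harder once timers, projections, and service-to-service streams are involved.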
Akka Edge services can require various activation triggers. These could be incoming requests, messages from message broker subscriptions, internal triggers such as scheduled timers, or Akka Edge building blocks, including projections and brokerless service-to-service communication. While most of these could be supported by a scale-from-zero activator, these requirements also prompted us to look for alternatives. We've experimented with a variation of the scale-to-zero strategy that we refer to as "scale to near-zero," based on multidimensional autoscaling and reducing the resource requirements for Akka Edge services.
Scale to near-zero
We've found that scaling to and from “near zero” works well for Akka Edge services. The service scales down to a state of minimal resource usage when idle and scales up and out when load increases. The Java Virtual Machine (JVM) often requires a significant amount of resources but can be configured to run with lower resource usage. Multidimensional autoscaling supports scaling vertically (lower or higher resource allocation) and horizontally (fewer or more service instances), which aligns resource usage with actual demand under dynamic workloads.
In contrast to a full scale-to-zero model, where the service is replaced with a small activator process, the service itself is scaled down to a small, lightweight process when idle. This simplifies the way that the service is activated. We don't need particular activation triggers to be tracked outside the service, as the service is always ready to handle any requests or messages. Instead, the service only needs to indicate when it should be activated from its idle state to use more resources. This has the added benefit that the service doesn't need to be fully activated from zero just to serve occasional requests. For Akka Cluster, scaling an existing cluster is more responsive than forming a new cluster from scratch. We do, however, need to be able to run the service in a low-resource mode.
A JVM-based service can require significant resources and takes time to reach optimal performance. There are many configuration options for the JVM, as well as alternatives to running a regular JVM for Edge deployments. At the time of writing, OpenJDK's Project Leyden had not been released, but it aims to improve the startup time, time to peak performance, and footprint of Java programs. OpenJ9 is a JVM optimized for cloud environments and configured for lower resource usage. GraalVM Native Image compiles Java or Scala code ahead-of-time to a native executable, which provides lower resource usage compared with the JVM.
GraalVM Native Image
We used GraalVM Native Image in our experiments for lightweight deployments. GraalVM Native Image compiles Java or Scala code ahead-of-time to a native executable. A native image executable provides lower resource usage than the JVM, smaller deployments, faster starts, and immediate peak performance. This makes it well suited to Akka Edge deployments in resource-constrained environments and to staying responsive under autoscaling.
While Native Image executables provide the low resource usage we need, the build setup can require a lot of configuration. An important part of this configuration is the reachability metadata, which covers dynamic features used at runtime that can't be discovered statically at build time.
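As an illustration of the kind of dynamic feature involved, the sketch below instantiates a class whose name is only known at runtime. The `DynamicLookup` class is a hypothetical example, not Akka code. Because the class name is not a build-time constant, a closed-world static analysis generally cannot tell which classes must be kept for reflection, so a lookup like this typically needs a reachability metadata entry.

```java
// Hypothetical example of a reflective lookup; not taken from Akka.
class DynamicLookup {
    // Instantiates a class by name using its no-argument constructor.
    // The name arrives at runtime (for example from configuration), so it
    // cannot be resolved by analysing the code at build time.
    static Object instantiate(String className) {
        try {
            return Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            // In a native executable without matching metadata, a lookup like
            // this is expected to fail at runtime instead of at build time.
            throw new IllegalStateException("Cannot instantiate " + className, e);
        }
    }
}
```

On a regular JVM, `DynamicLookup.instantiate("java.lang.StringBuilder")` returns a new `StringBuilder`. Frameworks that load classes from configuration follow the same pattern, which is why running the application and observing what it actually loads is a practical way to collect the metadata.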
GraalVM provides a tracing agent to gather metadata and create configuration files automatically. The tracing agent tracks the usage of dynamic features during the regular running of the application in a JVM and outputs Native Image configuration based on the code paths that were exercised. We found it helpful to deploy the service to a testing environment with the GraalVM tracing agent enabled, to capture usage in an actual deployment environment, and to exercise all the Akka Edge features that were being used.
We're looking to improve Akka's support for Native Image builds in the future so that most of the necessary configuration is automatically provided.
Multidimensional autoscaling
In Kubernetes, the horizontal pod autoscaler (HPA) and the vertical pod autoscaler (VPA) can be combined to support multidimensional autoscaling for a scale to near-zero approach. When the service is idle, it is “scaled down” to minimal resource requests by the vertical autoscaling and “scaled in” to a minimal number of pods by the horizontal autoscaling. When activity increases, the service can be “scaled up” with higher resource requests and “scaled out” with more pods.
The default vertical pod autoscaler for Kubernetes bases its recommendations for resource requests on usage observed over long time frames (days) and is designed to find an optimal resource allocation for a service. We can also use vertical autoscaling differently, to activate the service from an idle state or deactivate it back to its idle state, by configuring or replacing the "recommender" part of the vertical autoscaler.
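The decision such a recommender makes can be pictured as a simple threshold check on observed activity. The sketch below is only an illustration of that idea. The class name, the millicore values, and the single-threshold rule are all assumptions, and it does not use the actual vertical pod autoscaler recommender interface.

```java
// Hypothetical recommender logic for illustration; the names and numbers are
// assumptions and this is not the Kubernetes VPA recommender API.
class ActivationRecommender {
    static final int IDLE_CPU_MILLICORES = 100;            // assumed idle request
    static final int ACTIVE_CPU_MILLICORES = 1000;         // assumed active request
    static final int ACTIVATION_THRESHOLD_MILLICORES = 80; // assumed activity threshold

    // Returns the CPU request to recommend for the observed CPU usage.
    static int recommendCpuRequest(int observedCpuMillicores) {
        boolean active = observedCpuMillicores >= ACTIVATION_THRESHOLD_MILLICORES;
        return active ? ACTIVE_CPU_MILLICORES : IDLE_CPU_MILLICORES;
    }
}
```

With these assumed values, an observed usage of 10 millicores keeps the idle request of 100, while 80 or more switches the recommendation to 1000. A practical recommender would probably also smooth the metric over a short window to avoid flapping between the two states.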
A multidimensional autoscaling configuration for an Akka Edge service in Kubernetes can be set up where:
- A custom recommender for vertical autoscaling is configured to activate the service (allocate more resources) when an activity threshold is passed. This can be based on CPU usage or a custom metric such as request rate.
- The horizontal autoscaling is configured to scale based on custom metrics. For Akka Edge services, we've used the number of active event-sourced entities in the Akka Cluster; entities are configured to passivate with an idle timeout, and we specify how many active entities the service should have per Akka Cluster node.
- Application availability is ensured by having a minimum of two replicas and configuring a pod disruption budget (PDB) so that no more than one pod is unavailable at a time. When the vertical autoscaler makes changes, pods are evicted and restarted with updated resource requests. At the time of writing, Kubernetes did not support applying these changes in place.
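The horizontal part of this setup comes down to simple arithmetic. With an average-value target, the Kubernetes horizontal pod autoscaler computes the desired replica count as the total metric value divided by the per-pod target, rounded up, and then clamps it to the configured minimum and maximum. The sketch below applies that to active entities. The class and parameter names are hypothetical, and it ignores the autoscaler's tolerance and stabilization behavior.

```java
// Hypothetical helper mirroring the HPA replica calculation for an
// average-value metric; tolerance and stabilization windows are ignored.
class EntityBasedScaling {
    static int desiredReplicas(int activeEntities,
                               int targetEntitiesPerNode,
                               int minReplicas,
                               int maxReplicas) {
        // Round up so the per-node average does not exceed the target.
        int needed = (int) Math.ceil((double) activeEntities / targetEntitiesPerNode);
        // Clamp to the configured bounds; the minimum of 2 keeps the service available.
        return Math.max(minReplicas, Math.min(maxReplicas, needed));
    }
}
```

For example, with an assumed target of 100 active entities per node and bounds of 2 to 10 replicas, 250 active entities gives 3 replicas. An idle service with no active entities stays at the minimum of 2.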
Multidimensional autoscaling performed well in our testing, and we found that it provided a more flexible solution than the scale-to-zero model. It also has the advantage that the service remains immediately responsive and completely functional during low activity, while still minimizing the resources requested when idle.
Conclusion
We've explored approaches to running lightweight deployments for Akka Edge services. Deploying native image executables helps to reduce resource usage. If an Akka Edge service has dynamic activity and changing resource needs, autoscaling can be used to adapt to demand. Scaling to near-zero is a variation of the scale-to-zero model that leverages multidimensional autoscaling. The scale to near-zero strategy produced good results in our testing and can be implemented reasonably easily in a Kubernetes environment. These approaches are not limited to Akka Edge services and can also be applied to Akka applications in general.
For more detail, see the Akka Edge reference documentation for lightweight deployments. The Akka Edge guide also includes the Local Drone Control example services, for Java or Scala, configured for lightweight deployment and scaling to and from near-zero. The examples use GraalVM Native Image builds, configure multidimensional autoscaling, and run in k3s (lightweight Kubernetes).
We'll continue improving the support for deploying Akka Edge services that run efficiently and in resource-constrained environments. Please reach out if you have any questions or feedback.