Blog 1.7.2016

Collecting Data with Akka and Spring Boot



Our team at Gofore is developing and maintaining the Data Exchange Layer, which is based on X-Road technology. Because this is a national infrastructure service that is expected to become a standard delivery mechanism for Finnish public sector organizations, and to be widely used in the private sector as well, the system needs to offer clear benefits to user organizations through a high level of automation and ease of use. One example of these benefits is the API Catalogue, which lists all the organizations offering services in the Data Exchange Layer along with technical details such as the WSDLs of the services. Part of the data is publicly available, and the rest requires registration.
The X-Road infrastructure did not automate the collection of this data, even though it offers meta services that can be used to query it. Our job was to implement the data collector for the API Catalogue.


The data required for the API Catalogue and offered by the X-Road meta services consists of member organizations, their subsystems and their services. Because the Data Exchange Layer will potentially have hundreds or thousands of organizations, with several subsystems per organization and multiple services per subsystem, we decided to use an architecture based on concurrent Akka actors. Our data collector component processes the data received from the meta services and caches it in a local database, so that the API Catalogue receives only the data that has changed since its last request. Different organizations and their services can be processed concurrently, because processing one does not depend on any other organization or service. Akka was also an easy choice because the X-Road technology itself is based on Akka. Spring Boot is used because it offers an easy way to implement persistence with Spring Data and JPA.
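The "only data that has changed since the last request" idea can be sketched as follows. This is a minimal in-memory illustration, not the project's persistence code: the class `ChangeTrackingCache`, its `Entry` record and the method names are hypothetical, and a map stands in for the JPA-backed database.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for the collector's local database.
class ChangeTrackingCache {
    // One cached row: the payload plus the time it last changed.
    record Entry(String key, String payload, Instant changed) {}

    private final Map<String, Entry> entries = new LinkedHashMap<>();

    // Stores a freshly collected value. The timestamp moves only when the
    // payload differs from what is already cached. Returns true on change.
    synchronized boolean save(String key, String payload, Instant now) {
        Entry old = entries.get(key);
        if (old != null && old.payload().equals(payload)) {
            return false; // unchanged: keep the old timestamp
        }
        entries.put(key, new Entry(key, payload, now));
        return true;
    }

    // What a catalogue-side query could look like: everything that
    // changed strictly after the given instant.
    synchronized List<Entry> changedSince(Instant since) {
        List<Entry> result = new ArrayList<>();
        for (Entry e : entries.values()) {
            if (e.changed().isAfter(since)) {
                result.add(e);
            }
        }
        return result;
    }
}
```

Because re-collecting identical data does not touch the timestamp, repeated collection rounds do not cause the catalogue to re-read unchanged rows.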
The data model needed for the API Catalogue is presented in the following diagram. Note that X-Road uses the term client for both members and subsystems, and the meta services return them as a single structure. In the Catalogue, however, they are kept separate. To add to the confusion, the terms in the Catalogue differ from those used in X-Road: an X-Road member is an organization in the Catalogue, a subsystem is an API, and a service is a resource.
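To make the terminology mapping concrete, here is a small sketch of how a mixed list of clients could be split into organizations and APIs. All type and field names (`XRoadClient`, `Organization`, `Api`, `memberCode`, `subsystemCode`) are assumptions made for illustration, and the real data model has more fields than shown here.

```java
import java.util.ArrayList;
import java.util.List;

class CatalogueMapping {
    // A client as the meta service might return it. In this sketch a null
    // subsystemCode marks a plain member (an assumption, not the real schema).
    record XRoadClient(String memberCode, String name, String subsystemCode) {}

    // Catalogue-side terms: X-Road member -> organization, subsystem -> API.
    record Organization(String memberCode, String name) {}
    record Api(String memberCode, String subsystemCode) {}

    record Split(List<Organization> organizations, List<Api> apis) {}

    // Separates the single "client" structure into the two catalogue concepts.
    static Split split(List<XRoadClient> clients) {
        List<Organization> organizations = new ArrayList<>();
        List<Api> apis = new ArrayList<>();
        for (XRoadClient c : clients) {
            if (c.subsystemCode() == null) {
                organizations.add(new Organization(c.memberCode(), c.name()));
            } else {
                apis.add(new Api(c.memberCode(), c.subsystemCode()));
            }
        }
        return new Split(organizations, apis);
    }
}
```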
The high-level architecture of our actor system is shown in the picture below. The system is started and the Supervisor actor is scheduled in the Spring Boot main class XRoadCatalogCollector. The Supervisor creates pools for the other types of actors and sends a START_COLLECTING message to ListClientsActor. ListClientsActor calls the meta service listClients and creates both member organizations and their subsystems from the returned data. These are persisted in a form that is ready for the API Catalogue to use. ListClientsActor also sends a message to ListMethodsActor for each subsystem. ListMethodsActor processes the message and calls the meta service listMethods for the given subsystem. The returned methods (that is, services) are persisted, and each one is then processed by a FetchWsdlActor.
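The fan-out described above (clients, then methods per subsystem, then one WSDL per method) can be illustrated without Akka. The sketch below swaps the actor pools for plain JDK thread pools and `CompletableFuture`, so it shows the shape of the data flow rather than the actual actor code. The class `CollectorPipeline` and its parameters are hypothetical, the three functions stand in for the meta service calls, and persistence is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;
import java.util.function.Supplier;

class CollectorPipeline {
    // Runs one collection round and returns every fetched WSDL.
    // The order of the result is not guaranteed.
    static List<String> collect(Supplier<List<String>> listClients,
                                Function<String, List<String>> listMethods,
                                Function<String, String> fetchWsdl,
                                int poolSize) {
        // One pool per stage, mirroring one actor pool per actor type.
        ExecutorService methodsPool = Executors.newFixedThreadPool(poolSize);
        ExecutorService wsdlPool = Executors.newFixedThreadPool(poolSize);
        Queue<String> wsdls = new ConcurrentLinkedQueue<>();
        try {
            List<CompletableFuture<Void>> perSubsystem = new ArrayList<>();
            for (String subsystem : listClients.get()) {
                perSubsystem.add(CompletableFuture
                    .supplyAsync(() -> listMethods.apply(subsystem), methodsPool)
                    .thenCompose(methods -> fetchAll(methods, fetchWsdl, wsdlPool, wsdls)));
            }
            // Wait until every subsystem and every WSDL fetch has finished.
            CompletableFuture.allOf(perSubsystem.toArray(new CompletableFuture[0])).join();
            return List.copyOf(wsdls);
        } finally {
            methodsPool.shutdown();
            wsdlPool.shutdown();
        }
    }

    // Fetches the WSDL of each method concurrently on the second pool.
    private static CompletableFuture<Void> fetchAll(List<String> methods,
                                                    Function<String, String> fetchWsdl,
                                                    ExecutorService pool,
                                                    Queue<String> out) {
        CompletableFuture<?>[] tasks = methods.stream()
            .map(m -> CompletableFuture.runAsync(() -> out.add(fetchWsdl.apply(m)), pool))
            .toArray(CompletableFuture[]::new);
        return CompletableFuture.allOf(tasks);
    }
}
```

Unlike this sketch, the actor version does not need to wait for a round to finish: each actor simply reacts to messages as they arrive.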

Spring Boot and Akka Configuration

This chapter describes with examples how Spring Boot and Akka can be configured to work together.
Everything starts with our main class XRoadCatalogCollector, where the Supervisor is scheduled.
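As a rough illustration of what "scheduling the Supervisor" means, the sketch below sends a START_COLLECTING message at a fixed interval. It uses a JDK `ScheduledExecutorService` as a stand-in for Akka's scheduler, and a `Consumer<String>` as a stand-in for the Supervisor actor. The class `CollectorScheduler` and the `rounds` limit are hypothetical: the limit exists only so that the example terminates, whereas a real collector would keep running.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

class CollectorScheduler {
    static final String START_COLLECTING = "START_COLLECTING";

    // Sends START_COLLECTING to the supervisor once per interval, for the
    // given number of rounds. Returns true if all rounds were triggered.
    static boolean runRounds(Consumer<String> supervisor, int rounds, long intervalMillis) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        CountDownLatch remaining = new CountDownLatch(rounds);
        scheduler.scheduleAtFixedRate(() -> {
            // Single scheduler thread, so check-then-act is safe here.
            if (remaining.getCount() > 0) {
                supervisor.accept(START_COLLECTING);
                remaining.countDown();
            }
        }, 0, intervalMillis, TimeUnit.MILLISECONDS);
        try {
            // Generous timeout so a slow machine does not fail the wait.
            return remaining.await(rounds * intervalMillis + 5_000, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            scheduler.shutdownNow();
        }
    }
}
```

The interval is the knob that decides how often the whole tree of meta service calls is repeated, so it directly affects the load on the server.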

The ActorSystem is created in ApplicationConfiguration. Note the initialization of the Akka extension named SpringExtension on line 38

and the implementation of SpringExtension and SpringActorProducer. This is the way to pass the Spring context to Akka actors.
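The idea behind this pairing is that Akka does not construct the actor itself but asks a producer for it, and the producer looks the actor up from the Spring context by bean name, so the actor arrives with its dependencies injected. The sketch below shows only that mechanism with plain Java stand-ins: `BeanContext` plays the role of Spring's ApplicationContext and `ContextActorProducer` the role of SpringActorProducer. Both names are hypothetical, and a real producer would implement Akka's `IndirectActorProducer` interface instead.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-in for Spring's ApplicationContext: bean name -> factory.
class BeanContext {
    private final Map<String, Supplier<?>> factories = new HashMap<>();

    void register(String name, Supplier<?> factory) {
        factories.put(name, factory);
    }

    // Behaves like a prototype-scoped bean lookup: a fresh instance per call.
    Object getBean(String name) {
        Supplier<?> factory = factories.get(name);
        if (factory == null) {
            throw new IllegalArgumentException("No bean named " + name);
        }
        return factory.get();
    }
}

// Stand-in for SpringActorProducer: it holds the context and a bean name
// and defers instance creation to the context.
class ContextActorProducer {
    private final BeanContext context;
    private final String actorBeanName;

    ContextActorProducer(BeanContext context, String actorBeanName) {
        this.context = context;
        this.actorBeanName = actorBeanName;
    }

    // Called whenever the actor runtime needs a new actor instance,
    // for example on first creation or after a restart.
    Object produce() {
        return context.getBean(actorBeanName);
    }
}
```

Since the runtime may ask for a new instance more than once, actor beans in this kind of setup are normally given prototype scope rather than the default singleton scope.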


Two of the top-level actors are simple, so there is no need to present them here. The implementations of the other actors all follow the same principle:
• Check if the received message is of correct type
• Make a call to X-Road metaservice based on the details in the message
• Process the results
• Persist the processed results
• Send a message to a further actor (or return in case of the bottom level actor)
The implementation of ListClientsActor is below.
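As a framework-free illustration of the five steps (this is not the project's ListClientsActor code), the sketch below handles a START_COLLECTING message in plain Java. The class `ListClientsStep`, the `Subsystem` record and the "member/subsystem" string format returned by the fake meta service are all assumptions made for the example. Only subsystems are persisted here to keep it short.

```java
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Supplier;

class ListClientsStep {
    static final String START_COLLECTING = "START_COLLECTING";

    record Subsystem(String memberCode, String subsystemCode) {}

    private final Supplier<List<String>> listClients;   // stand-in for the meta service call
    private final List<Subsystem> store;                // stand-in for the repository
    private final Consumer<Subsystem> listMethodsActor; // stand-in for the next actor

    ListClientsStep(Supplier<List<String>> listClients,
                    List<Subsystem> store,
                    Consumer<Subsystem> listMethodsActor) {
        this.listClients = listClients;
        this.store = store;
        this.listMethodsActor = listMethodsActor;
    }

    // Returns false for messages this step does not understand; an Akka
    // actor would typically hand such messages to unhandled(message).
    boolean onReceive(Object message) {
        if (!START_COLLECTING.equals(message)) {       // 1. check the message type
            return false;
        }
        for (String raw : listClients.get()) {         // 2. call the meta service
            String[] parts = raw.split("/", 2);        // 3. process the result
            if (parts.length < 2) {
                continue; // plain member: nothing to forward in this sketch
            }
            Subsystem subsystem = new Subsystem(parts[0], parts[1]);
            store.add(subsystem);                      // 4. persist
            listMethodsActor.accept(subsystem);        // 5. message the next actor
        }
        return true;
    }
}
```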

The other actor implementations are similar, although there are differences, for example in how the X-Road meta services are called. Yes, for some reason some of the calls are REST and some are SOAP. Anyway, there is no point in publishing the other implementations here. The remaining details can be found on GitHub, where all the source code is published under the MIT License.
In addition to the data collector component described here, the GitHub repository contains the lister component, which the API Catalogue uses to query data from the collector database, and a persistence module, which is used by both.


The data collector has been installed in production for the Data Exchange Layer, and the production API Catalogue uses the real data provided by the collector. Currently, the number of organizations and services available in the production system is small, so we do not yet have real experience of how the collector will perform when the amount of data grows as expected. A simple sequential implementation would have been sufficient for the current situation and much simpler to implement, but with the expected future data volumes it would be too slow. Our concurrent implementation may well exhaust server resources if it is configured incorrectly, but with properly chosen pool sizes and a suitable collecting interval it should be able to handle much larger amounts of data faster than a sequential implementation. It is also possible to distribute the actors across different servers, but that would require code changes.
