Monday, August 22, 2016

An extended self-adaptive deployment framework for service-oriented systems


Five years ago, while I was still in academia, I built an extension framework around Disnix (named: Dynamic Disnix) that enables self-adaptive redeployment of service-oriented systems. It was an interesting application as it demonstrated the full potential of service-oriented systems having their deployment processes automated with Disnix.

Moreover, the corresponding research paper was accepted for presentation at the SEAMS 2011 symposium (co-located with ICSE 2011) in Honolulu (Hawaii), which was (obviously!) a nice place to visit. :-)

Disnix's development was progressing at a very low pace for a while after I left academia, but since the end of 2014 I made some significant improvements. In contrast to the basic toolset, I did not improve Dynamic Disnix -- apart from the addition of a port assigner tool, I only kept the implementation in sync with Disnix's API changes to prevent it from breaking.

Recently, I have used Dynamic Disnix to give a couple of demos. As a result, I have improved some of its aspects a bit. For example, some basic documentation has been added. Furthermore, I have extended the framework's architecture to take a couple of new deployment planning aspects into account.

Disnix


For readers unfamiliar with Disnix: the primary purpose of the basic Disnix toolset is executing deployment processes of service-oriented systems. Deployments are driven by three kinds of declarative specifications:

  • The services model captures the services (distributed units of deployments) of which a system consists, their build/configuration properties and their inter-dependencies (dependencies on other services that may have to be reached through a network link).
  • The infrastructure model describes the target machines where services can be deployed to and their characteristics.
  • The distribution model maps services in the services model to machines in the infrastructure model.

By writing instances of the above specifications and running disnix-env:

$ disnix-env -s services.nix -i infrastructure.nix -d distribution.nix

Disnix executes all activities to get the system deployed, such as building their services from source code, distributing them to the target machines in the network and activating them. Changing any of these models and running disnix-env again causes the system to be upgraded. In case of an upgrade, Disnix will only execute the required activities making the process more efficient than deploying a system from scratch.

"Static" Disnix


So, what makes Disnix's deployment approach static? When looking at software systems from a very abstract point of view, they are supposed to meet a collection of functional and non-functional requirements. A change in a network of machines affects the ability for a service-oriented system to meet them, as the services of which these systems consist are typically distributed.

If a system relies on a critical component that has only one instance deployed and the machine that hosts it crashes, the functional requirements can no longer be met. However, even if we have multiple instances of the same components giving better guarantees that no functional requirements will be broken, important non-functional requirements may be affected, such as the responsiveness of a system.

We may also want to optimize a system's non-functional properties, such as its responsiveness, by adding more machines to the network that offer more system resources, or by changing the configuration of existing machine, e.g. upgrading the amount available RAM.

The basic Disnix toolset is considered static, because all these events events require manual modifications to the Disnix models for redeployment, so that a system can meet its requirements under the changed conditions.

For simple systems, manual reconfiguration is still doable, but with one hundred services, one hundred machines or a high frequency of events (or a combination of the three), it becomes too complex and time consuming.

For example, when a machine has been added or removed, we must rewrite the distribution model in such a way that all services are deployed to at least one machine and that none of them are mapped to machines that are not capable or allowed to host them. Furthermore, with microservices (one of their traits is that they typically embed HTTP servers), we must typically bind them to unique TCP ports that do not conflict with system services or other services deployed by Disnix. None of these configuration aspects are trivial for large service-oriented systems.

Dynamic Disnix


Dynamic Disnix extends Disnix's architecture with additional models and tools to cope with the dynamism of service oriented-systems. In the latest version, I have extended its architecture (which has been based on the old architecture described in the SEAMS 2011 paper and corresponding blog post):


The above diagram shows the structure of the dydisnix-self-adapt tool. The ovals denote command-line utilities, the rectangles denote files and the arrows denote files as inputs or outputs. As with the basic Disnix toolset, dydisnix-self-adapt is composed of command-line utilities each being responsible for executing an individual deployment activity:

  • On the top right, the infrastructure generator is shown that captures the configurations of the machines in the network and generates an infrastructure model from it. Currently, two different kinds of generators can be used: disnix-capture-infra (included with the basic toolset) that uses a bootstrap infrastructure model with connectivity settings, or dydisnix-geninfra-avahi that uses multicast DNS (through Avahi) to retrieve the machines' properties.
  • dydisnix-augment-infra is responsible for augmenting the generated infrastructure model with additional settings, such as passwords. It is typically undesired to automatically publish privacy-sensitive settings over a network using insecure connection protocols.
  • disnix-snapshot can be optionally used to preemptively capture the state of all stateful services (services with property: deployState = true; in the services model) so that the state of these services can be restored if a machine crashes or disappears. This tool is new in the extended architecture.
  • dydisnix-gendist generates a mapping of services to machines based on technical and non-functional properties defined in the services and infrastructure models.
  • dydisnix-port-assign assigns unique TCP port numbers to previously undeployed services and retains assigned TCP ports in a previous deployment for optimization purposes. This tool is new in the extended architecture.
  • disnix-env redeploys the system with the (statically) provided services model and the dynamically generated infrastructure and distribution models.

An example usage scenario


When a system has been configured to be (statically) deployed with Disnix (such as the infamous StaffTracker example cases that come in several variants), we need to add a few additional deployment specifications to make it dynamically deployable.

Auto discovering the infrastructure model


First, we must configure the machines in such a way that they publish their own configurations. The basic toolset comes with a primitive solution called: disnix-capture-infra that does not require any additional configuration -- it consults the Disnix service that is installed on every target machine.

By providing a simple bootstrap infrastructure model (e.g. infrastructure-bootstrap.nix) that only provides connectivity settings:

{
  test1.properties.hostname = "test1";
  test2.properties.hostname = "test2";
}

and running disnix-capture-infra, we can obtain the machines' configuration properties:

$ disnix-capture-infra infrastructure-bootstrap.nix

By setting the following environment variable, we can configure Dynamic Disnix to use the above command to capture the machines' infrastructure properties:

$ export DYDISNIX_GENINFRA="disnix-capture-infra infrastructure-bootstrap.nix"

Alternatively, there is the Dynamic Disnix Avahi publisher that is more powerful, but at the same time much more experimental and unstable than disnix-capture-infra.

When using Avahi, each machine uses multicast DNS (mDNS) to publish their own configuration properties. As a result, no bootstrap infrastructure model is needed. Simply gathering the data published by the machines on the same subnet suffices.

When using NixOS on a target machine, the Avahi publisher can be enabled by cloning the dydisnix-avahi Git repository and adding the following lines to /etc/nixos/configuration.nix:

imports = [ /home/sander/dydisnix/dydisnix-module.nix ];
services.dydisnixAvahiTest.enable = true;

To allow the coordinator machine to capture the configurations that the target machines publish, we must enable the Avahi system service. In NixOS, this can be done by adding the following lines to /etc/nixos/configuration.nix:

services.avahi.enable = true;

When running the following command-line instruction, the machines' configurations can be captured:

$ dydisnix-geninfra-avahi

Likewise, when setting the following environment variable:

$ export DYDISNIX_GENINFRA=dydisnix-geninfra-avahi

Dynamic Disnix uses the Avahi-discovery service to obtain an infrastructure model.

Writing an augmentation model


The Java version of StaffTracker for example uses MySQL to store data. Typically, it is undesired to publish the authentication credentials over the network (in particular with mDNS, which is quite insecure). We can augment these properties to the captured infrastructure model with the following augmentation model (augment.nix):

{infrastructure, lib}:

lib.mapAttrs (targetName: target:
  target // (if target ? containers && target.containers ? mysql-database then {
    containers = target.containers // {
      mysql-database = target.containers.mysql-database //
        { mysqlUsername = "root";
          mysqlPassword = "secret";
        };
    };
  } else {})
) infrastructure

The above model implements a very simple password policy, by iterating over each target machine in the discovered infrastructure model and adding the same mysqlUsername and mysqlPassword property when it encounters a MySQL container service.

Mapping services to machines


In addition to a services model and a dynamically generated (and optionally augmented) infrastructure model, we must map each service to machine in the network using a configured strategy. A strategy can be programmed in a QoS model, such as:

{ services
, infrastructure
, initialDistribution
, previousDistribution
, filters
, lib
}:

let
  distribution1 = filters.mapAttrOnList {
    inherit services infrastructure;
    distribution = initialDistribution;
    serviceProperty = "type";
    targetPropertyList = "supportedTypes";
  };

  distribution2 = filters.divideRoundRobin {
    distribution = distribution1;
  };
in
distribution2

The above QoS model implements the following policy:

  • First, it takes the initialDistribution model that is a cartesian product of all services and machines. It filters the machines on the relationship between the type attribute and the list of supportedTypes. This ensures that services will only be mapped to machines that can host them. For example, a MySQL database should only be deployed to a machine that has a MySQL DBMS installed.
  • Second, it divides the services over the candidate machines using the round robin strategy. That is, it divides services over the candidate target machines in equal proportions and in circular order.

Dynamically deploying a system


With the services model, augmentation model and QoS model, we can dynamically deploy the StaffTracker system (without manually specifying the target machines and their properties, and how to map the services to machines):

$ dydisnix-env -s services.nix -a augment.nix -q qos.nix

The Node.js variant of the StaffTracker example requires unique TCP ports for each web service and web application. By providing the --ports parameter we can include a port assignment specification that is internally managed by dydisnix-port-assign:

$ dydisnix-env -s services.nix -a augment.nix -q qos.nix --ports ports.nix

When providing the --ports parameter, the specification gets automatically updated when ports need to be reassigned.

Making a system self-adaptable from a deployment perspective


With dydisnix-self-adapt we can make a service-oriented system self-adaptable from a deployment perspective -- this tool continuously monitors the network for changes, and runs a redeployment when a change has been detected:

$ dydisnix-self-adapt -s services.nix -a augment.nix -q qos.nix

For example, when shutting down a machine in the network, you will notice that Dynamic Disnix automatically generates a new distribution and redeploys the system to get the missing services back.

Likewise, by adding the ports parameter, you can include port assignments as part of the deployment process:

$ dydisnix-self-adapt -s services.nix -a augment.nix -q qos.nix --ports ports.nix

By adding the --snapshot parameter, we can preemptively capture the state of all stateful services (services annotated with deployState = true; in the services model), such as the databases in which the records are stored. If a machine hosting databases disappears, Disnix can restore the state of the databases elsewhere.

$ dydisnix-self-adapt -s services.nix -a augment.nix -q qos.nix --snapshot

Keep in mind that this feature uses Disnix's snapshotting facilities, which may not be the best solution to manage state, in particular with large databases.

Conclusion


In this blog post, I have described an extended architecture of Dynamic Disnix. In comparison to the previous version, a port assigner has been added that automatically provides unique port numbers to services, and the disnix-snapshot utility that can preemptively capture the state of services, so that they can be restored if a machine disappears from the network.

Despite the fact that Dynamic Disnix has some basic documentation and other improvements from a usability perspective, Dynamic Disnix remains a very experimental prototype that should not be used for any production purposes. In contrast to the basic toolset, I have only used it for testing/demo purposes and I still have no real-life production experience with it. :-)

Moreover, I still have no plans to officially release it yet as many aspects still need to be improved/optimized. For now, you have to obtain the Dynamic Disnix source code from Github and use the included release.nix expression to install it. Furthermore, you probably need to a lot of courage. :-)

Finally, I have extended the Java and Node.js versions of the StaffTracker example as well as the virtual hosts example with simple augmentation and QoS models.

No comments:

Post a Comment