Large-scale distributed systems have proven to be one of the most important resources in modern computing. However, the sheer complexity of these environments limits the effectiveness of traditional management tools.

The GloBeM research project aims to define a set of techniques for building a global behavior model of a large-scale distributed environment. Such a model would provide a deeper understanding of the system and a better reference for management and optimization.

Why model global behavior?

Many large-scale distributed systems are characterized by their complexity, and making sound decisions in these environments is far from trivial. One paradigmatic example of these distributed systems is the grid. Grids emerged as large distributed systems in which the greatest challenges of the scientific community could be tackled. These challenges, commonly known as grand challenge applications (GCAs), are problems that, given their size and/or complexity, cannot be solved by means of conventional computing techniques. The scalability, flexibility and massive computational power of grids made them an ideal environment for tackling GCAs.

To provide the expected service and achieve the performance these applications require, grids need powerful management mechanisms that deal efficiently with the natural variability and heterogeneity of the environment. Since the birth of the technology, this has been one of the key aspects of its development and, in many ways, one of the most problematic. Conventional grid management mechanisms tried to improve performance by analyzing every component of the system individually, and then adjusting the configuration or predicting the behavior of each independent element. This approach may seem reasonable given the scale and complexity of the grid, but it can fail to achieve optimal performance, because in most cases it cannot capture the effects that different elements have on each other when they work together.

From a more theoretical point of view, if we consider “the grid” as an individual entity (with its own computational power, storage capacity and so on), it seems logical to analyze it as such, instead of as a huge set of individual resources. This is similar to how computers are regarded as individual entities even though they are made of many electronic components of different kinds, or how clusters are usually treated as single machines when they are in fact composed of many computers.

What does a global behavior model look like?

A global behavior model of a large-scale distributed system would provide the abstraction layer that finally makes the single-entity point of view possible. To do so, the model must have certain characteristics:

  • Specific state definition: State characteristics and transition conditions should be unambiguously specified. The number of states should also be finite, in order to provide a useful model. A typical model representation that fits these characteristics is a finite state machine.
  • Stable model: The resulting model must remain reasonably consistent with the behavior of the environment over time. As these environments change naturally, it is unrealistic to hope for stationarity or to look for a definitive model. However, for a model to be useful, it must have at least a certain degree of stability. A model that needs to be regenerated every time an event occurs in the system is simply unusable.
  • Easy to understand: The resulting model would be used by management tools and system administrators. Therefore, it should be understandable and provide basic, meaningful information about the system's behavior. A very complex model might be very precise, but it would be so complicated to use that it would most likely be useless.
  • Service relevant states: The model states should be related to the system services. This ensures that the observed behavior can be explained in terms of how these services are being provided. It also makes it possible to determine whether the conditions are acceptable and whether the expected dependability is actually warranted.
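As an illustration of the first and last characteristics, a finite state machine with explicit, unambiguous transitions can be sketched as follows. This is only a minimal sketch and not the GloBeM implementation: the class name GlobalBehaviorModel, the state names (nominal, degraded, unavailable) and the event names are hypothetical, chosen only to show states that relate to how a service is being provided.

```python
from dataclasses import dataclass, field


@dataclass
class GlobalBehaviorModel:
    """Finite state machine over a fixed, finite set of named states."""
    states: frozenset
    # transitions[(source_state, event)] -> target_state
    transitions: dict = field(default_factory=dict)

    def add_transition(self, source, event, target):
        # Reject states outside the finite state set.
        if source not in self.states or target not in self.states:
            raise ValueError("unknown state")
        key = (source, event)
        # Reject a second, different target for the same (state, event):
        # transition conditions must stay unambiguous.
        if key in self.transitions and self.transitions[key] != target:
            raise ValueError("ambiguous transition")
        self.transitions[key] = target

    def step(self, state, event):
        # Events with no registered transition leave the state unchanged.
        return self.transitions.get((state, event), state)


# Hypothetical service-relevant states and events (assumptions, not from GloBeM).
model = GlobalBehaviorModel(frozenset({"nominal", "degraded", "unavailable"}))
model.add_transition("nominal", "load_spike", "degraded")
model.add_transition("degraded", "node_failures", "unavailable")
model.add_transition("degraded", "recovery", "nominal")

state = "nominal"
for event in ["load_spike", "recovery", "load_spike", "node_failures"]:
    state = model.step(state, event)
```

Keeping the state set small and named after service conditions is what makes such a model readable for an administrator; precision can be traded for that readability.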

How is global behavior modeling achieved?

The GloBeM project introduces a methodology for creating this kind of model. The details of the technique are not covered here; only its basics are outlined, to convey why it matters. The methodology was originally designed specifically for grid computing but, as grids are one of the most complex forms of large-scale distributed systems, it can be extended to a more generic set of environments and to other scenarios where complex interactions exist. Of course, the technical specifics must be adapted to each particular case, but the theoretical principles can always be applied.

The methodology relies heavily on knowledge discovery techniques (such as typical machine learning ones, but not limited to them) and is divided into the following four steps:

  • Environment monitoring: The system is observed using large-scale distributed systems monitoring techniques. At this point, every resource is monitored and the information is gathered. In the same way that the operating system of a desktop computer monitors every hardware element, each resource must be observed as a starting point for building the abstraction.
  • Information representation: After the monitoring phase, or even simultaneously with it, the information obtained is represented in a more global way. The use of statistical tools (mean, standard deviation, statistical tests, etc.) and data mining techniques (visual representation, clustering, etc.) is decisive in providing a correct representation of the information.
  • Information analysis: Once the monitoring information is properly formatted, data mining techniques (machine learning) are applied again in order to extract useful knowledge and state-related information.
  • Model construction: Finally, the finite state machine model is constructed, providing meaningful states and behavior information.
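The four steps above can be sketched end to end on toy data. This is a hedged illustration under stated assumptions rather than the GloBeM method: the source does not name specific algorithms, so a plain one-dimensional k-means stands in for the analysis step, the monitoring samples are invented, and all function names (represent, kmeans_1d, build_model) are hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev

# Step 1 (monitoring): invented per-resource load samples,
# one row per time step, one column per resource.
samples = [
    [0.10, 0.20, 0.15],
    [0.12, 0.18, 0.20],
    [0.80, 0.90, 0.85],
    [0.85, 0.95, 0.90],
    [0.15, 0.10, 0.20],
    [0.82, 0.88, 0.91],
]


def represent(rows):
    """Step 2: summarise each time step globally as (mean, std deviation)."""
    return [(mean(r), pstdev(r)) for r in rows]


def kmeans_1d(values, k, iterations=20):
    """Step 3: plain 1-D k-means (k >= 2); returns one cluster label per value."""
    lo, hi = min(values), max(values)
    # Spread the initial centres evenly over the observed range.
    centres = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assign every value to its nearest centre.
        labels = [min(range(k), key=lambda c: abs(v - centres[c])) for v in values]
        # Move every non-empty centre to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centres[c] = mean(members)
    return labels


def build_model(labels):
    """Step 4: relative frequency of each observed state-to-state transition."""
    counts = Counter(zip(labels, labels[1:]))
    totals = Counter(labels[:-1])
    return {(a, b): n / totals[a] for (a, b), n in counts.items()}


features = represent(samples)
# Only the global mean is clustered here; the deviation is kept as a feature
# that a richer analysis could also use.
labels = kmeans_1d([m for m, _ in features], k=2)
model = build_model(labels)
```

Here each cluster label plays the role of a global state (low load versus high load), and the resulting dictionary maps each observed pair of consecutive states to its estimated transition frequency.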

The model produced by this methodology becomes the abstraction layer on top of the large-scale distributed environment. It expresses the behavior of the system in a simple and usable way, and allows “system-level” fault tolerance to focus on a single-entity view of the environment.

Want to learn more about GloBeM?

If you are interested in more details about the GloBeM modeling tool, you can contact María S. Pérez.

public/globem.txt · Last modified: 2019/02/21 13:05 by mperez