Introduction to GloBeM

Why model global behavior?

Grids emerged in the last decade as large distributed systems in which the greatest challenges of the scientific community could be faced. These challenges, commonly known as grand challenge applications (GCAs), are problems that, given their size and/or complexity, cannot be solved by conventional computing techniques. The grid's scalability, flexibility and massive computational power make it an ideal environment for tackling these GCAs.

In order to provide the expected service and achieve the required performance for these applications, grids must have powerful management mechanisms that efficiently deal with the natural variability and heterogeneity of the environment. Since the birth of this technology, this has been one of the key aspects of its development and, in many ways, one of the most problematic. Conventional grid management mechanisms try to improve performance based on the individual analysis of every component of the system, and then attempt to adjust the configuration or predict the behavior of each independent element. This approach may seem reasonable given the large scale and complexity of the grid, but it can fail to achieve optimal performance, because in most cases it lacks the capability to understand the effects that different elements have on each other when they work together.

From a more theoretical point of view, if we consider “the grid” as an individual entity (with its computational power, storage capacity and so on), it seems logical to analyze it as such, rather than as a huge set of individual resources. This is similar to how computers are regarded as individual entities, even though they are made of several electronic components of different natures, or how clusters are usually treated as single machines, when in fact they are composed of many computers.

What does a global behavior model look like?

It has been said that a global behavior model of a large scale distributed system would provide the abstraction layer that finally makes the single entity point of view possible. In order to do so, this model must have certain characteristics:

  • Specific state definition: State characteristics and transition conditions should be unambiguously specified. The number of states should also be finite, in order to provide a useful model. A typical model representation that fits these characteristics is a finite state machine.
  • Stable model: The resulting model must remain reasonably consistent with the environment's behavior over time. As these environments are naturally changing, it is unrealistic to expect stationarity and try to find the definitive model for them. However, for a model to be useful, it must have at least a certain degree of stability. A model that needs to be regenerated every time an event occurs in the system is simply unusable.
  • Easy to understand: The resulting model will be used by management tools and system administrators. Therefore, it should be understandable and provide basic, meaningful information about the system's behavior. A very complex model might be very precise, but it would be extremely complicated to use and therefore most likely useless.
  • Service relevant states: The model states should be related to the system services. This ensures that the observed behavior can be explained in terms of how these services are being provided, which makes it possible to determine whether the conditions are acceptable and whether the expected dependability is actually warranted.
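
Taken together, these characteristics suggest a finite state machine over a small set of service-relevant states. The following minimal sketch illustrates the idea; the state names, metrics and transition thresholds here are hypothetical illustrations, not part of GloBeM itself.

```python
# Minimal finite-state-machine sketch of a global behavior model.
# State names and transition thresholds are hypothetical examples.

STATES = {"normal", "degraded", "overloaded"}

def next_state(state, cpu_load, failure_rate):
    """Unambiguous transition rules over a finite set of states."""
    if failure_rate > 0.2:
        return "degraded"
    if cpu_load > 0.9:
        return "overloaded"
    return "normal"

# Replay a short (synthetic) trace of global observations.
trace = [(0.3, 0.0), (0.95, 0.05), (0.5, 0.3), (0.4, 0.1)]
state = "normal"
history = [state]
for cpu, fail in trace:
    state = next_state(state, cpu, fail)
    history.append(state)

print(history)
```

Because the state set is finite and the transition rules are explicit, the model is both specific and easy to inspect, which is exactly what the characteristics above ask for.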

How is global behavior modelling achieved?

The GloBeM project introduces a methodology for creating this kind of model. The details of this technique will not be explained here, but its basics will be, in order to better understand its importance. This methodology was originally designed specifically for grid computing but, as grids are among the most complex forms of large scale distributed systems, it seems reasonable to expect that it can be extended to a more generic set of environments. Of course, the technical specifics should be adapted to each particular case, but the theoretical principles can always be applied.

This methodology is strongly based on knowledge discovery techniques (such as typical data mining ones, but not limited to them), and is divided into the following four steps:

[Figure: GloBeM global behavior model construction]

  • Environment monitoring: The system is observed using large scale distributed systems monitoring techniques. At this point, every resource is monitored and the information is gathered. In the same way that the operating system of a desktop computer monitors every hardware element, each resource must be observed as a starting point for building the abstraction.
  • Information representation: After, or even simultaneously with, the monitoring phase, the information obtained is represented in a more global way. The use of statistical tools (mean, standard deviation, statistical tests, etc.) and data mining techniques (visual representation, clustering, etc.) is decisive in providing a correct information representation.
  • Information analysis: Once the monitoring information is properly formatted, data mining techniques (machine learning) are applied again in order to extract useful knowledge and state-related information.
  • Model construction: Finally, the finite state machine model is constructed, providing meaningful states and behavior information.
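
The four steps above can be sketched end to end on synthetic data. In this sketch, the metric values, the aggregation choices and the trivial threshold rule standing in for the machine learning phase are all illustrative assumptions, not the project's actual implementation:

```python
# Sketch of the four GloBeM steps on synthetic monitoring data.
from statistics import mean, stdev

# 1. Environment monitoring: per-resource samples of a hypothetical
#    utilization metric, one reading per observation interval.
samples = {
    "node1": [0.1, 0.5, 0.9],
    "node2": [0.2, 0.6, 0.8],
    "node3": [0.3, 0.4, 1.0],
}

# 2. Information representation: aggregate per-resource readings
#    into a single global time series plus summary statistics.
global_series = [mean(t) for t in zip(*samples.values())]
summary = {"mean": mean(global_series), "stdev": stdev(global_series)}

# 3. Information analysis: map each global observation to a state
#    (a trivial threshold rule stands in for machine learning here).
def to_state(x):
    return "low" if x < 0.4 else "medium" if x < 0.7 else "high"

labels = [to_state(x) for x in global_series]

# 4. Model construction: derive finite-state transitions observed
#    between consecutive intervals.
transitions = {(a, b) for a, b in zip(labels, labels[1:])}

print(summary)
print(labels, sorted(transitions))
```

In a real deployment, step 3 would use proper clustering or classification over many metrics at once, but the pipeline shape (raw samples, global representation, state labeling, transition extraction) stays the same.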

The resulting model produced by this methodology becomes the abstraction layer on top of the large scale distributed environment. It expresses the behavior of the system in a simple and usable way, and allows “system-level” fault tolerance to focus on a single entity vision of the environment.

proyectos/globem/introduction.txt · Last modified: 2012/10/08 17:58 (external edit)