Large-scale distributed systems have proven to be one of the most important resources in modern computing. However, the sheer complexity of these environments limits the effectiveness of traditional management tools.
The GloBeM research project aims to define a set of techniques for building a global behavior model of a large-scale distributed environment. Such a model would provide a deep understanding of the system and a better reference for management and optimization.
Many large-scale distributed systems are characterized by complexity. Making sound decisions in these environments is far from trivial. One paradigmatic, if older, example of these distributed systems is the grid. Grids emerged as large distributed systems where the greatest challenges of the scientific community could be addressed. These challenges, commonly known as grand challenge applications (GCAs), are problems that, given their size and/or complexity, cannot be solved by means of conventional computing techniques. The scalability, flexibility and massive computational power of the grid made it an ideal environment for tackling these GCAs.
In order to provide the expected service and achieve the required performance for these applications, grids need powerful management mechanisms that deal efficiently with the natural variability and heterogeneity of the environment. Since the birth of this technology, this has been one of the key aspects of its development and, in many ways, one of the most problematic. Conventional grid management mechanisms tried to improve performance based on the individual analysis of every component of the system. They then attempted to adjust the configuration or predict the behavior of each independent element. This approach may seem reasonable considering the large scale and complexity of the grid, but it may fail to achieve optimal performance, because in most cases it cannot capture the effects that different elements have on each other when they work together. From a more theoretical point of view, if we consider “the grid” as an individual entity (with its computational power, storage capacity and so on), it seems logical to analyze it as such, rather than as a huge set of individual resources. This is similar to how computers are regarded as individual entities, even though they are made of several electronic components of different kinds, or how clusters are most often treated as single machines, when in fact they are composed of many computers.
A global behavior model of a large-scale distributed system would provide the abstraction layer that finally makes the single-entity point of view possible. In order to do so, this model must have certain characteristics:
The GloBeM project introduces a methodology for creating this kind of model. The details of this technique will not be explained here, but its basics are outlined below to convey its importance. The methodology was originally designed specifically for grid computing but, as grids are one of the most complex forms of large-scale distributed systems, it can be extended to a more generic set of environments and other scenarios where complex interactions exist. Of course, the technical specifics must be adapted to each particular case, but the theoretical principles can always be applied.
This methodology is strongly based on knowledge discovery techniques (such as typical Machine Learning ones, but not limited to them), and is divided into the following four steps:
The model produced by this methodology becomes the abstraction layer on top of the large-scale distributed environment. It expresses the behavior of the system in a simple and usable way, and allows “system-level” fault tolerance to focus on a single-entity view of the environment.
If you are interested in more details about the GloBeM modeling tool you can contact María S. Pérez.