MAPFS-Grid

A data grid aims to provide suitable solutions for data-intensive applications by means of grid-based tools. Specifically, a data grid is designed to store, manage, and provide reliable access to data.

Because of basic grid principles, the environment is characterized by heterogeneity. In a data grid, this includes different storage systems, data access mechanisms, data access policies, and data formats. The data grid management infrastructure must act as an abstraction layer that provides a common, standard, and efficient procedure for accessing the stored information. Although a data grid allows heterogeneous data resources to be shared, only a few research works in the field aim to increase the performance of these solutions.

The aim of MAPFS-Grid is to develop a complete suite of services for high-performance access to huge volumes of data in a grid environment.

A suite of services for accessing large volumes of data

Different needs arise in data management and access in grids. We have identified three important aspects related to these needs:

  1. A generic WSRF-compliant data access service, which uses the Simple Object Access Protocol (SOAP) and Web services technology, is required. This proposal follows the OGSA guidelines, which propose Web services as the basic technology for building grids. Web services technology is suitable for managing services and resources through the Web Services Resource Framework (WSRF). As far as we know, there are no WSRF-based data services designed to increase the performance of I/O operations.
  2. The most important drawback of the previous scenario is the low performance exhibited by Web services. In fact, the use of XML and SOAP as the transfer protocol is not appropriate for performance-critical applications. Although there are different proposals for dealing with this performance loss, none of them is suitable for scenarios demanding high throughput. In this context, GridFTP is a high-performance, reliable protocol for transferring large files, since it is not based on SOAP.
  3. Every grid project usually provides “ad hoc” solutions for data management. A data service often offers a native interface, which does not provide interoperability with other I/O systems. OGSA-DAI has emerged to provide uniform access to data sources in a grid environment: it offers a uniform way of querying, accessing, updating, and transforming different types of data resources by means of Web services. However, OGSA-DAI is not focused on the performance of I/O operations. Therefore, bridging interoperability and performance optimization is an important need of data grid projects.

MAPFS-Grid is a generic framework in which all these problems can be addressed by defining different services, one suited to each of the three identified scenarios. Together, these scenarios cover the needs of most data grid-based applications. All the developed services take advantage of two levels of parallelism. Moreover, the services provided by MAPFS-Grid are incorporated within the generic architecture of a grid that makes use of MAPFS.

MAPFS-Grid Parallel Data Access Service (PDAS)

Our first proposal is to provide a grid-like interface to MAPFS. This WSRF-compliant service, named PDAS, allows parallel I/O operations to be performed in a cluster environment. The service is inspired by the Data Access and Integration Service (DAIS); PDAS adapts this concept from the performance and parallelism viewpoints.

The two levels of parallelism provided by PDAS are shown in the Figure. Level 1 parallelism is provided by several PDAS instances (one in every storage element), which support a distributed data repository. Level 2 parallelism is offered directly by MAPFS in those storage elements that are clusters. As the figure shows, the data to be transferred are divided into blocks, which are sent to each storage element. If the storage element is a cluster, these data blocks are internally divided again and sent to each node.
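The two-level division described above can be illustrated with a minimal sketch. It is not the PDAS implementation: the function names, the even splitting policy, and the use of a node count per storage element are all assumptions made for illustration only.

```python
from typing import List


def split_evenly(data: bytes, parts: int) -> List[bytes]:
    """Split data into `parts` contiguous chunks whose sizes differ by at most one byte."""
    if parts <= 0:
        raise ValueError("parts must be positive")
    base, extra = divmod(len(data), parts)
    chunks, start = [], 0
    for i in range(parts):
        # The first `extra` chunks take one additional byte each.
        size = base + (1 if i < extra else 0)
        chunks.append(data[start:start + size])
        start += size
    return chunks


def two_level_split(data: bytes, nodes_per_element: List[int]) -> List[List[bytes]]:
    """Hypothetical layout: level 1 gives one block per storage element,
    level 2 divides each block among that element's nodes.

    A node count of 1 models a storage element that is not a cluster.
    """
    blocks = split_evenly(data, len(nodes_per_element))
    return [split_evenly(block, nodes)
            for block, nodes in zip(blocks, nodes_per_element)]


def reassemble(layout: List[List[bytes]]) -> bytes:
    """Concatenate the pieces in element order, then node order."""
    return b"".join(piece for block in layout for piece in block)


if __name__ == "__main__":
    payload = bytes(range(100))
    # Three storage elements: a 4-node cluster, a single machine, a 2-node cluster.
    layout = two_level_split(payload, [4, 1, 2])
    print([[len(piece) for piece in block] for block in layout])
    assert reassemble(layout) == payload
```

In a real deployment the pieces would be written concurrently; the sketch only shows how the data could be partitioned so that both levels can work in parallel.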

The main advantage of PDAS is that it constitutes a WSRF-compliant grid service that provides reasonably good performance and is easy to deploy in a grid scenario where the main components are clusters of workstations.

MAPFS Data Storage Interface (MAPFS-DSI)

Focusing on the GridFTP server, it is possible to optimize its performance by modifying one of its modules. This module is the Data Storage Interface (DSI), whose responsibility is to read from and write to the local storage system. We have used the flexibility of the GridFTP server to replace its I/O operations with MAPFS I/O routines. The result is MAPFS-DSI, which enables GridFTP clients to read and write data in a storage system based on MAPFS. As the architecture for MAPFS is a cluster of workstations, the GridFTP server should run on the master node of the cluster where MAPFS is installed.
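The idea of swapping the storage module behind an unchanged server can be sketched as follows. This is a conceptual illustration in Python, not the GridFTP DSI API (which belongs to the GridFTP server itself) and not the MAPFS routines: every class and method name here is hypothetical, and the striped backend uses in-memory dictionaries as stand-ins for cluster nodes.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class StorageInterface(ABC):
    """Hypothetical stand-in for the role a DSI plays: the server only calls these methods."""

    @abstractmethod
    def write(self, name: str, data: bytes) -> None: ...

    @abstractmethod
    def read(self, name: str) -> bytes: ...


class SingleStoreBackend(StorageInterface):
    """Models a plain local storage system."""

    def __init__(self) -> None:
        self._files: Dict[str, bytes] = {}

    def write(self, name: str, data: bytes) -> None:
        self._files[name] = data

    def read(self, name: str) -> bytes:
        return self._files[name]


class StripedBackend(StorageInterface):
    """Models a parallel file system: stripes are placed round-robin on the nodes."""

    def __init__(self, nodes: int, stripe_size: int) -> None:
        if nodes <= 0 or stripe_size <= 0:
            raise ValueError("nodes and stripe_size must be positive")
        self._nodes: List[Dict[str, List[bytes]]] = [{} for _ in range(nodes)]
        self._stripe_size = stripe_size

    def write(self, name: str, data: bytes) -> None:
        for node in self._nodes:
            node[name] = []
        for offset in range(0, len(data), self._stripe_size):
            index = offset // self._stripe_size
            stripe = data[offset:offset + self._stripe_size]
            self._nodes[index % len(self._nodes)][name].append(stripe)

    def read(self, name: str) -> bytes:
        per_node = [node[name] for node in self._nodes]
        total = sum(len(stripes) for stripes in per_node)
        count = len(per_node)
        # Stripe k lives on node k % count, at position k // count.
        return b"".join(per_node[k % count][k // count] for k in range(total))


class TransferServer:
    """The server logic is unchanged whichever backend is plugged in."""

    def __init__(self, storage: StorageInterface) -> None:
        self._storage = storage

    def put(self, name: str, data: bytes) -> None:
        self._storage.write(name, data)

    def get(self, name: str) -> bytes:
        return self._storage.read(name)
```

The point of the design is that clients and server code see the same operations in both cases; only the module that touches the storage system changes.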

MAPFS-DSI is embedded within the general scenario in which GridFTP is used. As we can see in the Figure, two independent parts of the architecture can improve the performance of a data transfer operation, both from client to server (write operations) and from server to client (read operations). The first is the set of GridFTP-specific features (TCP stream parallelism and striping), which can be used in any GridFTP server. The second is the parallel access provided by MAPFS. Consequently, the use of MAPFS within the GridFTP server offers two levels of data parallelism and prevents the server storage system from becoming a bottleneck in the whole data transfer process. MAPFS-DSI offers great flexibility, since both levels of parallelism can be combined in different configurations.

MAPFS Data Access and Integration (MAPFS-DAI)

MAPFS-DAI constitutes an extension of the OGSA-DAI architecture, whose aim is to increase I/O performance. As the Figure shows, the MAPFS-DAI architecture is divided into four layers:

  1. Data Layer, composed of data resources. Data resources exposed by MAPFS-DAI are flat and unformatted files. OGSA-DAI, on the other hand, supports other kinds of data resources, such as relational and XML databases.
  2. Presentation Layer, which provides the Web service interfaces to data services. MAPFS-DAI uses WSRF.
  3. Business Logic Layer, which is composed of a suitable data service resource, named File Data Service Resource, associated with flat and unformatted files, and a MAPFS-DAI accessor, whose main goal is to control access to the underlying data resource, that is, files.
  4. Client Layer, with two components: client application and client toolkit. The client toolkit makes the development of client applications easy by providing useful and simple tools to create the perform and response documents exchanged between the client and server. Both documents must fulfill the requirements specified by the service schema. In this way, we optimize the storage system without changing the client application.

Therefore, the main advantage of MAPFS-DAI is its interoperability. Every storage element that exhibits the OGSA-DAI interface can be used together with MAPFS-DAI elements. Because they share the OGSA-DAI interface, several storage systems providing this interface could be accessed in parallel.
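A minimal sketch of why a shared interface makes parallel access straightforward is given below. It does not use the OGSA-DAI or MAPFS-DAI APIs: the `DataResource` protocol, its `read_all` method, and the in-memory resource are hypothetical names introduced only to show the pattern of querying several uniform resources concurrently.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Protocol


class DataResource(Protocol):
    """Hypothetical uniform interface that every storage system is assumed to expose."""

    def read_all(self) -> bytes: ...


class InMemoryResource:
    """Stand-in for one storage system holding one part of a logical file."""

    def __init__(self, content: bytes) -> None:
        self._content = content

    def read_all(self) -> bytes:
        return self._content


def parallel_read(resources: List[DataResource]) -> bytes:
    """Fetch every part concurrently and join them in the order the resources were given."""
    with ThreadPoolExecutor(max_workers=max(1, len(resources))) as pool:
        # Executor.map preserves input order, so parts are joined correctly
        # even when the reads finish out of order.
        parts = list(pool.map(lambda resource: resource.read_all(), resources))
    return b"".join(parts)


if __name__ == "__main__":
    stores = [InMemoryResource(b"abc"), InMemoryResource(b"def")]
    print(parallel_read(stores))
```

Because the caller depends only on the shared interface, a resource backed by a parallel file system and one backed by a conventional store could be mixed in the same list without changing the client code.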

Want to learn more about MAPFS-Grid?

If you are interested in more details about the MAPFS-Grid tool, you can contact Alberto Sánchez.

public/mapfs-grid.txt · Last modified: 2010/11/16 18:55 by ascampos