
Re: [gridcpr-wg] GridLab: In scope or out



I vote "in scope" although I might not be unbiased in this case ;-)

Thilo

On Wed, Nov 03, 2004 at 08:32:04PM -0500, Paul Stodghill wrote:
> Subject: [gridcpr-wg] GridLab: In scope or out
> Date: Wed, 3 Nov 2004 20:32:04 -0500
> From: "Paul Stodghill" <stodghil@cs.cornell.edu>
> To: <gridcpr-wg@gridforum.org>
> 
> In the GridLab project \cite{gridlab-homepage,gridlab-overview}, a job,
> consisting of one or more processes, is running on a grid machine. In
> the middle of the run, the job may need to migrate to a different
> machine, possibly with a different architecture and/or number of CPUs.
> The application program may either decide by itself to migrate (e.g.,
> because of poor performance on the current machine) or be forced to do
> so, either by the user (via an application manager) or by local resource
> management software that wishes to evict the job. The main purpose of
> GridCPR in GridLab is thus to allow a job to be interrupted and
> migrated until it finally terminates. Fault tolerance is only a
> secondary aspect.
>  
> An extension of the above use-case is dealing with jobs that run
> concurrently at multiple grid sites.
>  
> Applications save their state to regular files. Checkpoint metadata can
> be stored in GridLab's "advert service", allowing the checkpoint file(s)
> to be found and retrieved after restart. File transport is done via
> GridLab's data movement service (or via GridFTP) \cite{gridlab-day}.
>  
> Key functions:
> \begin{itemize}
> \item Services for checkpoint data transport, via GridLab's data
>   movement service or GridFTP.
> \item Services for checkpoint data management, via GridLab's advert
>   service.
> \end{itemize}
>  
>  



-- 
Thilo Kielmann                                 http://www.cs.vu.nl/~kielmann/