
Re: checkpointing architecture



One thing we could do is take each of the elements below and try to come 
up with a few bullet points on why it needs to be part of the 
architecture. Then we can decide which ones really belong, and 
see how things fit together.  I've tried to do this with some of them...

On Tue, 10 Jun 2003, Thilo Kielmann wrote:

> Our first goal to achieve is a "grid checkpointing architecture diagram".
> We came up with this idea at GGF7 in Tokyo as we found it necessary to first
> describe which parties and pieces of software are to be involved in grid
> checkpoint recovery.
> 
> From the back of my head, I can think of the following elements of such an
> architecture:
> 

 - AAA (accounting, charging etc.)
   
    - if the checkpoint recovery is because of system failure, need to 
      refund wasted user hours 

 - scheduling of resources


 - resource brokers

 - data management for checkpoint files
   
    - need to store them some place
    - if the checkpoint can be restored on another machine, perhaps 
      store it on a data server so it is available if the original 
      machine does not come back up


 - checkpoint history (versioning for a single run)

    - support rollbacks to earlier times in the application run

 - other meta data (which?)

 - application status monitoring

 - user job interface (->portals?)
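
To make the data management and checkpoint history items a bit more concrete, here is a rough sketch of how they might fit together. Everything in it is an assumption made for illustration: the class names, the fields, and the idea of tracking hosts by name are hypothetical and do not come from any existing grid software or from the discussion so far.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Checkpoint:
    version: int          # position in the run's history, starting at 1
    app_time: float       # application time at which the state was saved
    locations: List[str]  # hypothetical host names holding a copy of the file


@dataclass
class CheckpointHistory:
    """Versioned list of checkpoints for a single application run."""
    run_id: str
    checkpoints: List[Checkpoint] = field(default_factory=list)

    def record(self, app_time: float, locations: List[str]) -> Checkpoint:
        # Each new checkpoint gets the next version number.
        cp = Checkpoint(len(self.checkpoints) + 1, app_time, list(locations))
        self.checkpoints.append(cp)
        return cp

    def rollback_to(self, app_time: float) -> Optional[Checkpoint]:
        # Latest checkpoint taken at or before the requested application time.
        candidates = [cp for cp in self.checkpoints if cp.app_time <= app_time]
        if not candidates:
            return None
        return max(candidates, key=lambda cp: cp.app_time)


def pick_source(cp: Checkpoint, hosts_up: Set[str]) -> Optional[str]:
    # Prefer locations in the order listed; skip hosts that are down.
    for host in cp.locations:
        if host in hosts_up:
            return host
    return None
```

For example, if each checkpoint is recorded with both the original machine and a data server as locations, then `pick_source` falls back to the data server when the original machine is not in the set of hosts that are up, and `rollback_to` picks which version to restore for a given earlier point in the run.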


Tom