checkpointing architecture
Dear GridCPR members,
Despite all the excitement and ambition, our group has not really been making the
progress we were all hoping for. As one of the WG chairs, I must admit that
we overcommitted ourselves, so the group received less attention
than it deserved. Some "fresh blood" would definitely help us move
forward... (Anybody?)
Our first goal is to produce a "grid checkpointing architecture diagram".
We came up with this idea at GGF7 in Tokyo, as we found it necessary to first
describe which parties and pieces of software are involved in grid
checkpoint recovery.
Off the top of my head, I can think of the following elements of such an
architecture:
- AAA (accounting, charging etc.)
- scheduling of resources
- resource brokers
- data management for checkpoint files
- checkpoint history (versioning for a single run)
- other meta data (which?)
- application status monitoring
- user job interface (->portals?)
Does this set of elements sound reasonable? Any omissions? Anything that does
not belong here?
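To make one of these elements a bit more concrete, here is a rough sketch of
what "checkpoint history (versioning for a single run)" could look like as a
data structure. This is only an illustration to seed discussion, not a
proposal: the names Checkpoint, CheckpointHistory, record and latest, as well
as the example storage location, are all hypothetical, and the metadata field
is left as free-form key/value pairs since we have not yet decided which
metadata we need.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Checkpoint:
    """One stored checkpoint of a run (hypothetical structure)."""
    version: int                      # position in the run's history, starting at 1
    location: str                     # where the checkpoint file lives (assumed to be a URI)
    metadata: Dict[str, str] = field(default_factory=dict)  # open question: which keys?


@dataclass
class CheckpointHistory:
    """Ordered list of checkpoints belonging to a single run."""
    run_id: str
    checkpoints: List[Checkpoint] = field(default_factory=list)

    def record(self, location: str,
               metadata: Optional[Dict[str, str]] = None) -> Checkpoint:
        # Versions simply count upwards within this one run.
        cp = Checkpoint(len(self.checkpoints) + 1, location, dict(metadata or {}))
        self.checkpoints.append(cp)
        return cp

    def latest(self) -> Optional[Checkpoint]:
        # The most recent checkpoint is the natural candidate for recovery.
        return self.checkpoints[-1] if self.checkpoints else None


if __name__ == "__main__":
    history = CheckpointHistory(run_id="run-1")
    history.record("gsiftp://example.org/ckpt/run-1/1")
    history.record("gsiftp://example.org/ckpt/run-1/2", {"step": "200"})
    print(history.latest())
```

Even this small sketch shows a dependency on the "data management" element
(the location field) and on the still-open metadata question.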
Which relations can we (should we) describe?
Is anybody out there willing to keep the discussion going?
Regards,
Thilo
--
Thilo Kielmann http://www.cs.vu.nl/~kielmann/