Re: architecture diagram
Hi all!
I have had a look at the architecture under discussion, and I
have checked it against a couple of experiments I did in the
past. It appears that a further module is needed; I explain
what and why below.
A checkpointed job usually has some internal state related to checkpointing: for
instance, it may record the location of the checkpoints, their timestamps, and
other metadata. Such data cannot be kept within the application itself,
since losing the application would then also lose the data needed for recovery.
It might be distributed among the "checkpointing services", but this appears to
be complex (although probably the most robust option). In my opinion, this
information should be kept in a separate "mediation service".
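To make the kind of state concrete, here is a minimal sketch in Python of what the mediation service might hold for one job. All names (CheckpointRecord, JobCheckpointState, the location strings) are my own assumptions for illustration, not part of the architecture under discussion.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class CheckpointRecord:
    location: str     # where the checkpoint was stored (hypothetical format)
    timestamp: float  # when the checkpoint was recorded


@dataclass
class JobCheckpointState:
    job_id: str
    checkpoints: list = field(default_factory=list)
    terminated: bool = False

    def latest(self):
        """Return the most recent checkpoint, or None if there is none."""
        if not self.checkpoints:
            return None
        return max(self.checkpoints, key=lambda c: c.timestamp)


state = JobCheckpointState(job_id="job-1")
state.checkpoints.append(CheckpointRecord("site-a:/ckpt/0", 100.0))
state.checkpoints.append(CheckpointRecord("site-b:/ckpt/1", 200.0))

# The state serializes independently of the application process,
# so it can live in a separate service and survive an application failure.
serialized = json.dumps(asdict(state))
```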
Consider the following exchange:
- Application service receives a request for a checkpointed job;
- Application service sends an "open" request to the mediator;
- Mediator returns to the Application the locations available for the checkpoints, the checkpoint id (unique, I assume?), security credentials etc.;
- Application starts the job, and informs the Checkpoint Services (your box named "service");
- Application records checkpoints;
- Checkpoint Service notifies Mediator of recorded checkpoints;
- Application terminates the job;
- Application releases the Checkpoint Services;
- Checkpoint Service notifies Mediator of job termination.
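The exchange above can be sketched as a small in-memory simulation. This is only an illustration under my own assumptions: the class and method names (Mediator.open, notify_checkpoint, notify_termination, CheckpointService.attach/record/release) and the random credential are hypothetical, and real services would of course talk over the network.

```python
import secrets


class Mediator:
    """Holds checkpoint metadata outside the application (hypothetical API)."""

    def __init__(self, locations):
        self.locations = list(locations)
        self.jobs = {}  # checkpoint id -> metadata for one job

    def open(self):
        # Answer an "open" request: checkpoint id, locations, credential.
        ckpt_id = secrets.token_hex(8)
        credential = secrets.token_hex(16)
        self.jobs[ckpt_id] = {
            "credential": credential,
            "checkpoints": [],
            "terminated": False,
        }
        return {
            "checkpoint_id": ckpt_id,
            "locations": list(self.locations),
            "credential": credential,
        }

    def notify_checkpoint(self, ckpt_id, location, timestamp):
        self.jobs[ckpt_id]["checkpoints"].append((timestamp, location))

    def notify_termination(self, ckpt_id):
        self.jobs[ckpt_id]["terminated"] = True


class CheckpointService:
    """Stores checkpoints and reports them to the Mediator."""

    def __init__(self, mediator, location):
        self.mediator = mediator
        self.location = location
        self.active = set()  # checkpoint ids currently served

    def attach(self, ckpt_id):
        # The Application informs the service that a job has started.
        self.active.add(ckpt_id)

    def record(self, ckpt_id, timestamp):
        if ckpt_id not in self.active:
            raise KeyError("job not attached to this service")
        self.mediator.notify_checkpoint(ckpt_id, self.location, timestamp)

    def release(self, ckpt_id):
        # The Application frees the service; the service tells the Mediator.
        self.active.discard(ckpt_id)
        self.mediator.notify_termination(ckpt_id)


# The Application's side of the exchange.
mediator = Mediator(["site-a", "site-b"])
grant = mediator.open()
service = CheckpointService(mediator, grant["locations"][0])
service.attach(grant["checkpoint_id"])
service.record(grant["checkpoint_id"], timestamp=1.0)
service.record(grant["checkpoint_id"], timestamp=2.0)
service.release(grant["checkpoint_id"])
```

Note that the Application never writes to the Mediator after "open": all later updates come from the Checkpoint Service, as in the list above.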
In case of failure/migration:
- the new/moved Application receives the credentials of the old job;
- Application submits a "restart" request to the Mediator;
- Application restarts the job, and informs the Checkpoint Services;
- the exchange then continues as above.
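A minimal sketch of the "restart" request follows. I am assuming here that the Mediator checks the old job's credential and answers with the most recent recorded checkpoint; the message does not fix the reply format, so the names (register, restart) and the return value are hypothetical.

```python
import hmac


class Mediator:
    """Only the restart path is shown (hypothetical API)."""

    def __init__(self):
        self.jobs = {}  # checkpoint id -> credential and recorded checkpoints

    def register(self, ckpt_id, credential, checkpoints):
        # Stands in for the earlier "open" request and checkpoint notifications.
        self.jobs[ckpt_id] = {
            "credential": credential,
            "checkpoints": list(checkpoints),
        }

    def restart(self, ckpt_id, credential):
        job = self.jobs.get(ckpt_id)
        # Constant-time comparison of the credential presented by the new job.
        if job is None or not hmac.compare_digest(job["credential"], credential):
            raise PermissionError("unknown job or bad credential")
        if not job["checkpoints"]:
            return None  # nothing recorded yet: restart from scratch
        # Tuples are (timestamp, location), so max() picks the latest one.
        return max(job["checkpoints"])


mediator = Mediator()
mediator.register("job-1", "s3cret", [(1.0, "site-a"), (2.0, "site-b")])

# The new/moved application presents the credentials of the old job.
latest = mediator.restart("job-1", "s3cret")
```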
Notes:
- I have also taken some security aspects into consideration:
it is wiser to consider them from the beginning.
- The Mediator is not necessarily unique, and should be replicated.
I hope this helps the further discussion.
Augusto