
Re: architecture diagram



Hi all!

I have looked at the architecture under discussion and checked it against a couple of experiments I did in the past. It appears that a further module is needed; let me explain what and why.

A checkpointed job usually has some internal state related to checkpointing: for instance, it may record the locations of the checkpoints, their timestamps, and other metadata. Such data cannot be kept within the application itself, since it would be lost together with the application, which would prevent recovery. It might be distributed among the "checkpointing services", but this appears to be complex (although it is probably the most robust option). In my view, this information should be held in a separate "mediation service".

Consider the following exchange:

  1. Application service receives a request for a checkpointed job;
  2. Application service sends an "open" request to the Mediator;
  3. Mediator returns to the Application the locations available for the checkpoints, the checkpoint id (which I assume is unique), security credentials, etc.;
  4. Application starts the job, and informs the Checkpoint Services (the box named "service" in your diagram);
  5. Application records checkpoints;
  6. Checkpoint Service notifies Mediator of recorded checkpoints;
  7. Application terminates the job;
  8. Application frees the Checkpoint Services;
  9. Checkpoint Service notifies Mediator of job termination.
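
To make the exchange above more concrete, here is a minimal sketch of the Mediator's side of it in Python. Everything in it is an assumption of mine for illustration: the class and method names (Mediator, open, checkpoint_recorded, job_terminated), the use of random hex strings as ids and credentials, and the in-memory dictionary standing in for whatever persistent store the real service would need.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class JobRecord:
    """Checkpoint metadata kept outside the application (hypothetical layout)."""
    job_id: str
    credential: str
    locations: list
    checkpoints: list = field(default_factory=list)  # (location, timestamp) pairs
    terminated: bool = False


class Mediator:
    """In-memory stand-in for the mediation service."""

    def __init__(self, locations):
        self._locations = list(locations)
        self._jobs = {}

    def open(self):
        """Steps 2-3: hand out a unique id, a credential and the usable locations."""
        record = JobRecord(uuid.uuid4().hex, uuid.uuid4().hex, list(self._locations))
        self._jobs[record.job_id] = record
        return record.job_id, record.credential, record.locations

    def checkpoint_recorded(self, job_id, location, timestamp):
        """Step 6: a checkpoint service reports a stored checkpoint."""
        self._jobs[job_id].checkpoints.append((location, timestamp))

    def job_terminated(self, job_id):
        """Step 9: a checkpoint service reports that the job has ended."""
        self._jobs[job_id].terminated = True

    def checkpoints(self, job_id):
        """Return a copy of the checkpoints recorded so far for a job."""
        return list(self._jobs[job_id].checkpoints)

    def is_terminated(self, job_id):
        return self._jobs[job_id].terminated


# Walk through steps 2, 3, 6 and 9 with made-up locations and timestamps.
mediator = Mediator(["store-a", "store-b"])
job_id, credential, locations = mediator.open()
mediator.checkpoint_recorded(job_id, locations[0], 100.0)
mediator.checkpoint_recorded(job_id, locations[1], 200.0)
mediator.job_terminated(job_id)
```

The point of the sketch is only that the application holds nothing but the id and the credential; the checkpoint list lives in the Mediator and is fed by the checkpoint services, not by the application.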

In case of failure/migration:

  1. The new/moved Application receives the credentials of the old job;
  2. Application submits a "restart" request to the Mediator;
  3. Application restarts the job, and informs the Checkpoint Services;
  4. The exchange continues as in steps 5 to 9 above.
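
The restart path can be sketched in the same spirit. Again, this is only an illustration under my own assumptions: the names (RestartMediator, register, restart), the plain string comparison standing in for a real credential check, and the choice to answer a restart request with the most recent checkpoint are all hypothetical and not part of the diagram.

```python
class RestartMediator:
    """Hypothetical mediator fragment covering only the restart path."""

    def __init__(self):
        # job_id -> {"credential": str, "checkpoints": [(location, timestamp)]}
        self._jobs = {}

    def register(self, job_id, credential):
        """Stand-in for the state created by the earlier "open" request."""
        self._jobs[job_id] = {"credential": credential, "checkpoints": []}

    def checkpoint_recorded(self, job_id, location, timestamp):
        """A checkpoint service reports a stored checkpoint."""
        self._jobs[job_id]["checkpoints"].append((location, timestamp))

    def restart(self, job_id, credential):
        """Step 2: check the old job's credential, return its latest checkpoint.

        Returns None when the job never recorded a checkpoint. A real service
        would verify proper security credentials instead of comparing strings.
        """
        job = self._jobs.get(job_id)
        if job is None or job["credential"] != credential:
            raise PermissionError("unknown job or wrong credential")
        if not job["checkpoints"]:
            return None
        return max(job["checkpoints"], key=lambda entry: entry[1])


# The old job ran and recorded two checkpoints before failing.
restart_mediator = RestartMediator()
restart_mediator.register("job-1", "secret")
restart_mediator.checkpoint_recorded("job-1", "store-a", 100.0)
restart_mediator.checkpoint_recorded("job-1", "store-b", 200.0)

# Steps 1-2: the new application presents the old credential.
latest = restart_mediator.restart("job-1", "secret")
```

Here latest is the (location, timestamp) pair the new application would resume from; a request with the wrong credential is refused with PermissionError.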

Notes:

I hope this helps with further discussion.

Augusto
