Re: Draft architecture for grid scheduling
Thank you Steve for giving us something to base the discussion on.
In general, your document reflects my own point of view quite well.
However, I would like to make some remarks and to hear
some comments from the rest of you:
1) Just to clarify: a resource is anything that is needed by a task (job)
in order to run. Resources do not have to be related to hardware. Some
kind of service (e.g. database, input data, etc.) can also be referred
to as a resource, right?
2) I have some difficulties in understanding the concept of jobs and tasks:
a) A job is basically a tree, consisting of a root job and possibly some
subjobs. I.e. a tree with height >= 1, right?
b) The leaves of such a tree are called "tasks"
c) What exactly is a "task"? The most atomic thing is a machine code
instruction. Everything more abstract can in principle be divided into
subjobs. Clearly, we do not want to schedule machine code instructions.
Hence, we have to stop somewhere and define our atomic entities.
But what is it? I would suggest: "A task is a (sub-)job that can be
handled by a deployment agent". (see f)
d) Do we require the whole tree structure to be known in advance (i.e.
before the scheduler can act) ? I personally would not like to make
such a limitation at this early stage.
e) Does a mapping have to assign resources (and probably times) to
all nodes of such a tree? As with d), I do not think so.
f) If we do not forbid jobs to spawn subjobs during runtime, the leaf-based
definition of "task" in b) no longer works and we have to use something
like what I suggested in c).
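To make the questions in 2) a bit more concrete, here is a minimal Python sketch of a job tree. It assumes that subjobs may be spawned at runtime and that a "task" is any node a deployment agent can handle, as suggested in c). The class `Job` and its fields (`deployable`, `resource`, `subjobs`) are purely illustrative and do not come from Steve's draft.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Job:
    """A node in the job tree (hypothetical model for discussion only)."""
    name: str
    deployable: bool = False          # True if a deployment agent can handle this node
    resource: Optional[str] = None    # set by a mapping; may stay None, see e)
    subjobs: List["Job"] = field(default_factory=list)

    def spawn(self, child: "Job") -> "Job":
        """Attach a subjob; nothing prevents calling this while the job runs."""
        self.subjobs.append(child)
        return child

    def is_leaf(self) -> bool:
        return not self.subjobs

    def tasks(self) -> List["Job"]:
        """Tasks under the definition in c): all deployable nodes, leaf or not."""
        found = [self] if self.deployable else []
        for sub in self.subjobs:
            found.extend(sub.tasks())
        return found


root = Job("simulation")
pre = root.spawn(Job("preprocess", deployable=True))
root.spawn(Job("solve", deployable=True))
tasks_before = [t.name for t in root.tasks()]

# "preprocess" spawns a subjob at runtime: it stops being a leaf,
# but it is still a task under the deployment-agent definition.
pre.spawn(Job("fetch-input", deployable=True))
tasks_after = [t.name for t in root.tasks()]
```

With the leaf-based definition from b), "preprocess" would silently stop being a task once it spawns "fetch-input". With the definition from c) its status does not depend on what the tree looks like at any given moment.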
3) Job Control Agent (JCA) and monitor
a) I do not think that a JCA has to be persistent. In the MOL metacomputer,
the responsibility for a job can move around in the system in order to
improve fault tolerance. Hence, I suggest something like "at each time,
there has to be a daemon/agent/server/whatever that {does all the things
that were identified as responsibilities of the JCA}". But maybe that was
what you meant anyway...
b) Is there a one-to-one relationship between monitors and JCAs? If so,
it is basically the same thing, right? In that case, we should name it as
such.
4) We should not deny schedulers the possibility to employ queues. We should
simply not require them to use queues. (In Paderborn, we are working on
metacomputer... oops... grid schedulers that use queues.)
5) A deployment agent might fail to fulfill an order (such as Riker
sometimes does). Schedulers should be prepared for that.
6) If a resource manager supports advance reservation, there might still
be a (slight) chance that a reservation cannot be fulfilled as planned
(e.g. due to HW or network trouble). Maybe it is a good idea to require
schedulers to be prepared for that case?
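To illustrate what "being prepared" in 5) and 6) could mean, here is a minimal Python sketch. It assumes a scheduler that simply falls back to the next candidate resource when an order or reservation is not fulfilled. The names `DeploymentFailed`, `place`, and `flaky_deploy` are made up for this example and are not part of Steve's draft.

```python
from typing import Callable, Iterable, Optional


class DeploymentFailed(Exception):
    """Raised by a (hypothetical) deployment agent that cannot fulfill an order."""


def place(task: str,
          candidates: Iterable[str],
          deploy: Callable[[str, str], None]) -> Optional[str]:
    """Try candidate resources in order.

    Returns the resource that accepted the task, or None if all failed.
    """
    for resource in candidates:
        try:
            deploy(task, resource)
            return resource
        except DeploymentFailed:
            # Order or reservation was not fulfilled: try the next candidate.
            continue
    return None


def flaky_deploy(task: str, resource: str) -> None:
    """Stand-in deployment agent: 'node-a' always fails, everything else works."""
    if resource == "node-a":
        raise DeploymentFailed(f"{resource} could not honour its reservation")


chosen = place("solve", ["node-a", "node-b"], flaky_deploy)
unplaced = place("solve", ["node-a"], flaky_deploy)
```

A real scheduler would probably re-plan instead of walking a fixed list, but the point is the same: a failed order is an expected outcome, not an exceptional one.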
7) When a job enters the grid: who decides which scheduler is going to
handle it? I think it can either be the user or some kind of ...well...
"scheduler broker", which is currently not part of the layout. I propose
to add such brokers to the picture (possibly as an option).
Or did I miss something here?
8) The layout provided by Steve defines a two-level hierarchy (control
domains + schedulers and their helpers).
I propose not to limit the depth of the hierarchy. If we put another box
(i.e. control domain) around the whole picture, this would allow us to
structure the whole grid more nicely. We had a similar limitation in our
early version of CCS and it took a lot of effort to get rid of it. If we
can make a more flexible definition (which we can), we should do it.
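As a rough illustration of 8), the following Python sketch models control domains recursively, so that the depth of the hierarchy is not fixed. It assumes a domain may contain schedulers as well as other domains. `ControlDomain`, `Scheduler`, and all instance names are invented for this example.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Scheduler:
    name: str


@dataclass
class ControlDomain:
    """A control domain that may contain schedulers and nested domains."""
    name: str
    members: List[Union["ControlDomain", Scheduler]] = field(default_factory=list)

    def depth(self) -> int:
        """Number of nested domain levels, counting this one."""
        nested = [m.depth() for m in self.members if isinstance(m, ControlDomain)]
        return 1 + max(nested, default=0)

    def schedulers(self) -> List[str]:
        """Names of all schedulers reachable from this domain, depth-first."""
        names: List[str] = []
        for m in self.members:
            if isinstance(m, ControlDomain):
                names.extend(m.schedulers())
            else:
                names.append(m.name)
        return names


# The "box around the whole picture": one outer domain wrapping two sites,
# one of which contains a further nested domain.
grid = ControlDomain("grid", [
    ControlDomain("site-a", [Scheduler("s1")]),
    ControlDomain("site-b", [
        ControlDomain("cluster-1", [Scheduler("s2")]),
        Scheduler("s3"),
    ]),
])
```

Here `grid.depth()` is 3, and the two-level layout is just the special case where no domain contains another domain.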
I would be happy to see any comments on the above.
Joern
----------------------------------------------------------------------------
Joern Gehring office: F0.404
Paderborn Center for voice: +49 5251 60-6327
Parallel Computing fax: +49 5251 60-6297
Fuerstenallee 11 mail: joern@upb.de
33095 Paderborn, Germany http://www.uni-paderborn.de/pc2/
----------------------------------------------------------------------------