ATD: Abstract Task Descriptor
Version 1.01, January 2000
Author: T. Haupt, Syracuse University
Introduction
A computational task requested by the user may involve many
steps. Some steps can be performed concurrently, but typically
there are data dependencies that force execution of the steps
in some particular order. In many cases, it might be convenient
to divide a particular step into smaller, "atomic" operations.
The task descriptor is abstract in the sense that it may
not describe all resources needed for completion of the task.
The final resolution and actual resource allocation are left to
the discretion of the middle-tier services.
Atomic task
An atomic task is described by its
- name,
- descriptor (Application Descriptor, AD),
- input and output ports.
The name of the task must be unique within the document.
The AD is a separate XML document that provides all necessary
information on how to install and run the application, as well
as on its input and output files. The task is submitted by
invoking a method of the middle-tier proxy module. A method that
can be used for submitting the task is called an input port. Each
task must define at least one input port. Upon completing the
task (successfully or not), the middle-tier proxy fires an event.
Each event type signaling the end of task processing is called an
output port. Each task must define at least one output port.
Example of an atomic task:
<Task>
  <TaskName name="task1" descriptor="task1.xml" />
  <InputPort method="run" />
  <OutputPort event="done" />
</Task>
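To make the structure of an atomic task concrete, the following minimal sketch reads such a descriptor with Python's standard xml.etree.ElementTree module and checks the "at least one input port and one output port" rule. The function name describe_atomic_task and the returned dictionary layout are illustrative assumptions, not part of ATD; the XML is the example above written in well-formed form.

```python
import xml.etree.ElementTree as ET

# Well-formed version of the atomic-task example from the text.
ATOMIC_TASK = """
<Task>
  <TaskName name="task1" descriptor="task1.xml" />
  <InputPort method="run" />
  <OutputPort event="done" />
</Task>
"""


def describe_atomic_task(xml_text):
    """Return name, descriptor, input methods and output events of a task.

    Hypothetical helper for illustration; ATD itself defines no such API.
    """
    task = ET.fromstring(xml_text)
    task_name = task.find("TaskName")
    if task_name is None or not task_name.get("name"):
        raise ValueError("a task needs a TaskName element with a name")
    methods = [port.get("method") for port in task.findall("InputPort")]
    events = [port.get("event") for port in task.findall("OutputPort")]
    # The text requires at least one input port and one output port.
    if not methods or not events:
        raise ValueError("a task needs at least one input and one output port")
    return {
        "name": task_name.get("name"),
        "descriptor": task_name.get("descriptor"),
        "methods": methods,
        "events": events,
    }


info = describe_atomic_task(ATOMIC_TASK)
print(info)
```

A descriptor without ports, such as a Task element containing only a TaskName, makes the helper raise ValueError.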
Building complex tasks from atomic tasks
Atomic tasks can be grouped together to form a complex task.
Optionally, their dependencies can be defined by creating a
computational graph. The graph is constructed by connecting output
and input ports of atomic tasks: an event fired by one task
(output port) causes invocation of a method of the other task
(input port).
As in the case of an atomic task, a complex task must define
at least one input port and one output port. However, instead of
specifying events and methods, it uses input and output ports of
its constituent tasks to define its own ports.
Example of a complex task built from atomic tasks:
<Task>
  <TaskName name="ComplexTask" />
  <Task>
    <TaskName name="atomic_task1" descriptor="task1.xml" />
    <InputPort method="run" />
    <OutputPort event="done" />
  </Task>
  <Task>
    <TaskName name="atomic_task2" descriptor="task2.xml" />
    <InputPort method="run" />
    <OutputPort event="done" />
  </Task>
  <connection>
    <output task="atomic_task1" />
    <input task="atomic_task2" />
  </connection>
  <InputPort task="atomic_task1" />
  <OutputPort task="atomic_task2" />
</Task>
In this example, event "done" fired by atomic_task1 results in
invoking method "run" of atomic_task2. The complex task can be
submitted by invoking the input port of atomic_task1 (because
of <InputPort task="atomic_task1" />), and event "done" of
atomic_task2 signals completion of the complex task.
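The computational graph described above can be recovered mechanically from the connection tags. The sketch below is one possible way to do it with Python's standard library; the helper name connection_edges is an assumption made for illustration, and the XML is a well-formed version of the example in which both atomic tasks use method="run" as input port and event="done" as output port.

```python
import xml.etree.ElementTree as ET

COMPLEX_TASK = """
<Task>
  <TaskName name="ComplexTask" />
  <Task>
    <TaskName name="atomic_task1" descriptor="task1.xml" />
    <InputPort method="run" />
    <OutputPort event="done" />
  </Task>
  <Task>
    <TaskName name="atomic_task2" descriptor="task2.xml" />
    <InputPort method="run" />
    <OutputPort event="done" />
  </Task>
  <connection>
    <output task="atomic_task1" />
    <input task="atomic_task2" />
  </connection>
  <InputPort task="atomic_task1" />
  <OutputPort task="atomic_task2" />
</Task>
"""


def connection_edges(task):
    """List (source task, target task) pairs of the task's own connections.

    Only direct <connection> children are inspected; nested complex
    tasks would need a separate call (hypothetical helper).
    """
    edges = []
    for conn in task.findall("connection"):
        for out in conn.findall("output"):
            for inp in conn.findall("input"):
                edges.append((out.get("task"), inp.get("task")))
    return edges


root = ET.fromstring(COMPLEX_TASK)
edges = connection_edges(root)
# Ports of the complex task itself name the constituent tasks they reuse.
entry_tasks = [p.get("task") for p in root.findall("InputPort")]
exit_tasks = [p.get("task") for p in root.findall("OutputPort")]
print(edges, entry_tasks, exit_tasks)
```

Because findall with a plain tag name matches direct children only, the ports of the two atomic tasks are not mistaken for the ports of the complex task.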
Hierarchy of tasks
Complex tasks can be grouped and connected to build an arbitrarily
deep hierarchy of tasks.
<Task>
  <TaskName name="example_task" />
  <Task>
    <TaskName name="A" />
    <Task>
      <TaskName name="A1" descriptor="A1.xml" />
      <InputPort method="run" />
      <OutputPort event="done" />
    </Task>
    <Task>
      <TaskName name="A2" descriptor="A2.xml" />
      <InputPort method="run" />
      <OutputPort event="done" />
    </Task>
    <connection>
      <output task="A1" />
      <input task="A2" />
    </connection>
    <OutputPort application="A" event="done" />
  </Task>
  <Task>
    <TaskName name="B" descriptor="B.xml" />
    <InputPort method="run" />
    <OutputPort application="B" event="done" />
  </Task>
  <Task>
    <TaskName name="C" descriptor="C.xml" />
    <InputPort application="C" method="run" />
    <OutputPort application="C" event="done" />
  </Task>
  <Task>
    <TaskName name="D" />
    <Task>
      <TaskName name="D1" descriptor="D1.xml" />
      <InputPort method="run" />
      <OutputPort event="done" />
    </Task>
    <Task>
      <TaskName name="D2" descriptor="D2.xml" />
      <InputPort method="run" />
      <OutputPort event="done" />
    </Task>
    <connection>
      <output task="D1" />
      <input task="D2" />
    </connection>
    <InputPort task="D1" />
    <OutputPort task="D2" />
  </Task>
  <connection>
    <output task="A" />
    <output task="B" />
    <input task="C" />
  </connection>
  <connection>
    <output task="C" />
    <input task="D" />
  </connection>
</Task>
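Since tasks nest to arbitrary depth and task names must be unique within the document, a recursive walk is a natural way to inspect a hierarchy. The following is a minimal sketch under stated assumptions: the helpers walk and duplicate_names are hypothetical, and the XML is a shortened hierarchy with ports and connections omitted for brevity (so it is not a complete ATD document).

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Shortened hierarchy; ports and connections are left out on purpose.
HIERARCHY = """
<Task>
  <TaskName name="example_task" />
  <Task>
    <TaskName name="A" />
    <Task><TaskName name="A1" descriptor="A1.xml" /></Task>
    <Task><TaskName name="A2" descriptor="A2.xml" /></Task>
  </Task>
  <Task><TaskName name="B" descriptor="B.xml" /></Task>
</Task>
"""


def walk(task, depth=0):
    """Yield (depth, name) for a task and every task nested below it."""
    yield depth, task.find("TaskName").get("name")
    for child in task.findall("Task"):
        yield from walk(child, depth + 1)


def duplicate_names(root):
    """Return the sorted task names that occur more than once."""
    counts = Counter(name for _, name in walk(root))
    return sorted(name for name, count in counts.items() if count > 1)


root = ET.fromstring(HIERARCHY)
tree = list(walk(root))
for depth, name in tree:
    print("  " * depth + name)  # indented outline of the hierarchy
print("duplicates:", duplicate_names(root))
```

An empty list from duplicate_names means the uniqueness rule holds for the whole document, not just within one nesting level.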
More on connecting tasks
The complex-task examples above follow a simple dataflow
paradigm, but the model presented here is more general.
A proxy module representing an atomic task can define more than
one input and output port. For example, the module can fire two
types of events: one signaling successful completion of the
task, the other signaling failure. Hence, a different action (a
different task, or a different method of the same task) can be
defined depending on the outcome of processing the task. Note that
submitting a task is more than submitting a job. It may involve
selecting a host, file transfers, database access, mass storage
access, compilation, setting environment variables, generating
batch scripts, generating Globus RSL strings, and more. Different
methods of the proxy module may implement different procedures for
preparing a job for submission and/or for postprocessing; none of
them requires any modification of the code to be run at the back end.
Connecting modules by matching events and methods also allows
loops to be constructed: completion of one task results in
submission of the other until some stopping criterion is satisfied
(say, all input files are processed). If the back-end code is
capable of setting flags at runtime, the flags can be used to
generate custom events, which in turn can be used for asynchronous
communication, or even message passing, between concurrent tasks
(latency permitting). At this time, we have no mechanism for
specifying high-performance connections between tasks representing
tightly coupled codes.
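The loop construction can be illustrated with a toy event dispatcher. This is only a sketch of the idea, not the middle-tier implementation: the Dispatcher class, the task name "worker", the event names "more" and "done", and the file names are all hypothetical. A task whose "more" event is connected to its own "run" method is resubmitted until the list of input files is exhausted.

```python
from collections import deque


class Dispatcher:
    """Toy router from (task, event) to (task, method) pairs.

    Mimics what <connection> tags express; purely illustrative.
    """

    def __init__(self):
        self.routes = {}    # (source task, event) -> [(target task, method)]
        self.handlers = {}  # (task, method) -> callable returning an event
        self.queue = deque()

    def connect(self, source, event, target, method):
        self.routes.setdefault((source, event), []).append((target, method))

    def register(self, task, method, handler):
        self.handlers[(task, method)] = handler

    def submit(self, task, method):
        self.queue.append((task, method))

    def run(self):
        # Process submissions until no connected event triggers another one.
        while self.queue:
            task, method = self.queue.popleft()
            event = self.handlers[(task, method)]()
            for target in self.routes.get((task, event), []):
                self.queue.append(target)


pending = ["a.dat", "b.dat", "c.dat"]  # hypothetical input files
processed = []


def process_next():
    """Process one file; fire "more" while files remain, else "done"."""
    processed.append(pending.pop(0))
    return "more" if pending else "done"


dispatcher = Dispatcher()
dispatcher.register("worker", "run", process_next)
# The loop: event "more" of the task re-invokes its own method "run".
dispatcher.connect("worker", "more", "worker", "run")
dispatcher.submit("worker", "run")
dispatcher.run()
print(processed)
```

The stopping criterion lives in the task itself: once it fires "done", which is not connected to anything here, no further submission happens.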
If a task defines more than one input port, it is potentially
ambiguous when to submit the task: when at least one of the events
is fired, or only when all of them are. We follow the convention
that all events defined within a single <connection> tag have an
AND relationship, while events defined in different <connection>
tags are OR related.
Examples:
<connection>
  <output task="a" />
  <output task="b" />
  <input task="c" />
</connection>
In the above example, both tasks a AND b must complete to
trigger task c.
<connection>
  <output task="a" />
  <input task="c" />
</connection>
<connection>
  <output task="b" />
  <input task="c" />
</connection>
Here, by contrast, completion of either a OR b results in
submitting task c.
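The AND/OR convention can be written down as a small predicate: a task is triggered when, for at least one connection feeding it, all of that connection's output tasks have completed. The sketch below assumes the two fragments are wrapped in an enclosing Task element so that they parse as XML; the wrapper name "demo" and the helper names trigger_groups and is_triggered are illustrative only.

```python
import xml.etree.ElementTree as ET

AND_EXAMPLE = """
<Task>
  <TaskName name="demo" />
  <connection>
    <output task="a" />
    <output task="b" />
    <input task="c" />
  </connection>
</Task>
"""

OR_EXAMPLE = """
<Task>
  <TaskName name="demo" />
  <connection>
    <output task="a" />
    <input task="c" />
  </connection>
  <connection>
    <output task="b" />
    <input task="c" />
  </connection>
</Task>
"""


def trigger_groups(xml_text, target):
    """Return one set of source tasks per <connection> feeding `target`."""
    root = ET.fromstring(xml_text)
    groups = []
    for conn in root.findall("connection"):
        if any(i.get("task") == target for i in conn.findall("input")):
            groups.append({o.get("task") for o in conn.findall("output")})
    return groups


def is_triggered(groups, completed):
    """OR across connections, AND within a single connection."""
    return any(group <= completed for group in groups)


and_groups = trigger_groups(AND_EXAMPLE, "c")
or_groups = trigger_groups(OR_EXAMPLE, "c")
print(is_triggered(and_groups, {"a"}), is_triggered(or_groups, {"a"}))
```

With only task a completed, the first call reports that c is not yet triggered under the AND form, while the second reports that it is under the OR form.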
Since each task may define more than one input port and more
than one output port, the <input> and <output> tags within a
<connection> tag have optional attributes, method and event
respectively, to resolve possible ambiguities, as shown in the
example below:
Example (multiple ports):
<Task>
  <TaskName name="ComplexTask" />
  <Task>
    <TaskName name="task1" descriptor="task1.xml" />
    <InputPort method="run" />
    <OutputPort event="done" />
    <OutputPort event="failure" />
  </Task>
  <Task>
    <TaskName name="task2" descriptor="task2.xml" />
    <InputPort method="run" />
    <OutputPort event="done" />
  </Task>
  <Task>
    <TaskName name="task3" descriptor="task3.xml" />
    <InputPort method="run" />
    <InputPort method="restart" />
    <OutputPort event="done" />
  </Task>
  <connection>
    <output task="task1" event="done" />
    <input task="task2" />
  </connection>
  <connection>
    <output task="task1" event="failure" />
    <input task="task3" method="restart" />
  </connection>
  <InputPort task="task1" />
  <OutputPort task="task2" />
</Task>
In this example, if task1 fires event "done", then method "run"
of task2 is invoked. If it fires event "failure" instead, method
"restart" of task3 is invoked.
ATD.dtd
<!ELEMENT Task (TaskName, (Task|connection)*, InputPort+,
OutputPort+)>
<!ELEMENT TaskName EMPTY>
<!ATTLIST TaskName
name CDATA #REQUIRED
descriptor CDATA #IMPLIED>
<!ELEMENT connection (output+,input+)>
<!ELEMENT output EMPTY>
<!ATTLIST output
task CDATA #REQUIRED
event CDATA #IMPLIED>
<!ELEMENT input EMPTY>
<!ATTLIST input
task CDATA #REQUIRED
method CDATA #IMPLIED>
<!ELEMENT InputPort EMPTY>
<!ATTLIST InputPort
task CDATA #IMPLIED
method CDATA #IMPLIED>
<!ELEMENT OutputPort EMPTY>
<!ATTLIST OutputPort
task CDATA #IMPLIED
event CDATA #IMPLIED>