more on Workflow
Oops! I underestimated the power of Netscape.
In my previous mail I intended to respond to Hugh and to add my old
document as an appendix illustrating some of the ideas in the body of
the mail. I was quite surprised to see in my cc: copy that Netscape
had made the appendix (which is an HTML file) the body of the message
and added my text as an attachment!
It should be the other way round. To avoid confusion, I am sending the
body of the message again below, because this is the real message.
Sorry for the confusion,
Tomasz
-----------------------------------------------------------------------------------------------------
Thanks, Hugh! This is the first white paper coming from the GCE working
group. It looks like we are making progress. It is my understanding
that this draft is based on an actual implementation, which is very
important. To keep the ball rolling, let me offer a few comments.
1. Do we really want to use the term "workflow"? During our meeting in
San Diego I got the impression that the majority was against it, as
this term is used in a somewhat different sense outside our
community. Within my framework (Mississippi Computational Web
Portal - MCWP) I use the term "complex task". Even you use this term
in your abstract: "(...) a standard for the sequencing of complex
high-performance computational tasks within a Grid".
2. Is "sequencing" not too restrictive? I am hoping for a standard
that describes a computational graph (as AVS and other visualization
packages often implement). Let's assume that the complex task is
composed of "modules" or "atomic tasks". It seems to me that
sequencing means processing one module after another. Can't we
generalize it to a more complex graph, in which the results of one
module can be fed to several modules running concurrently, or one
module can be fed with data coming from two or more modules (or
depend on them in any other way)? Actually, you acknowledge this
problem in section 6.
3. It seems to me you are suggesting an enumeration of "atomic tasks":
computation, resource query, data transfer, etc. Again, is it
not too restrictive? Actually, I have several problems with that.
i. Can we hope to get a complete list of atomic tasks? Isn't it an
invitation for nonstandard extensions?
ii. Is it the purpose of this document to define terms such as
"computation" or "data transfer"? I think we should focus on
describing how to compose a complex task from constituents, and not
on describing how to process the constituents.
iii. What about the capability of hierarchical composition of complex
tasks? Say we want to build a task from applications A, B, and
C. Each of them is performed in steps: identify resource, stage,
compile, preprocess, run, etc. Hiding complexity in this case
would mean building the final task descriptor from the task
descriptors for each application (A, B, C), while these are built from
atomic tasks such as compile, run, transfer data. An additional
advantage of such an approach is that the task descriptor for, say,
application B can be created by a domain specialist and then
"published" so it can be reused by less savvy users. It is my
experience from working with Climate, Weather and Ocean modeling that
setting up a model to run involves far too many details for a
physicist/oceanographer/meteorologist to fully comprehend. Defining
a task descriptor for a particular model helps dramatically.
What would be the remedy? Instead of having an enumeration of tasks,
let us introduce two generic terms: "atomic task" and "complex task".
The atomic task contains a reference to the application descriptor
(a GCE-WG white paper in preparation), and the complex task
contains references to its constituents (atomic tasks or, recursively,
other complex tasks).
A simple example (just to illustrate the idea, not to suggest syntax):
atomic task:
  <task>
    <taskName name="aTask" descriptor="aDescriptor.xml" />
  </task>
complex task:
  <task>
    <taskName name="complexTask" />
    <task>
      <taskName name="aTask1" descriptor="aDescriptor1.xml" />
    </task>
    <task>
      <taskName name="aTask2" descriptor="aDescriptor2.xml" />
    </task>
  </task>
Note the hierarchical/recursive definition of a task: the <task> tag
describes both a simple task and a complex task. The difference is
that an atomic task carries a descriptor attribute, while a complex
task contains nested <task> elements. Again, take it as a concept and
not as suggested syntax.
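
The recursive idea can be sketched in a few lines of Python. This is only an illustration under assumptions: the Task class, the parse_task helper and the atomic_tasks method are hypothetical names introduced here, and the XML is the concept example from above, not a proposed syntax.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Task:
    """Hypothetical in-memory form of a <task>: atomic or complex."""
    name: str
    descriptor: Optional[str] = None          # expected on atomic tasks only
    subtasks: List["Task"] = field(default_factory=list)

    @property
    def is_atomic(self) -> bool:
        # A task with no nested tasks is treated as atomic.
        return not self.subtasks

    def atomic_tasks(self) -> List["Task"]:
        """Flatten the hierarchy into its atomic constituents, depth first."""
        if self.is_atomic:
            return [self]
        flat: List["Task"] = []
        for sub in self.subtasks:
            flat.extend(sub.atomic_tasks())
        return flat


def parse_task(element: ET.Element) -> Task:
    """Recursively turn a <task> element into a Task."""
    name_el = element.find("taskName")        # direct child only
    if name_el is None:
        raise ValueError("<task> without <taskName>")
    return Task(
        name=name_el.get("name"),
        descriptor=name_el.get("descriptor"),
        subtasks=[parse_task(child) for child in element.findall("task")],
    )


COMPLEX_TASK_XML = """
<task>
  <taskName name="complexTask" />
  <task>
    <taskName name="aTask1" descriptor="aDescriptor1.xml" />
  </task>
  <task>
    <taskName name="aTask2" descriptor="aDescriptor2.xml" />
  </task>
</task>
"""

root = parse_task(ET.fromstring(COMPLEX_TASK_XML))
for task in root.atomic_tasks():
    print(task.name, task.descriptor)
```

Because parse_task calls itself on every nested <task>, a published descriptor for one application could be dropped into a larger task without the parser needing to know how deep the hierarchy goes.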
What is missing here is the relationship between the tasks. In my work
I use the concept of a port (in a sense similar to that used in
AVS). Each module (or task) defines input ports and output
ports. One composes a complex task by associating output ports with
input ports. New information on an input port triggers processing
of the module (the condition can be .AND. or .OR. if there is more
than one port), in the classical dataflow way. In the implementation I
made a couple of years ago, an output port was an event fired by the
module, and an input port was the method to be invoked (a particular
event listener). Admittedly, this is very implementation specific, but
I am sure that we can work out a more general model along these
lines.
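
A minimal sketch of the port idea, in Python rather than the original implementation, and with every name (Module, connect, receive, fire, the "run" and "done" port names) being an assumption made for illustration. Events are delivered synchronously here, so modules fed by the same output run one after another; a real middle tier would presumably dispatch them concurrently.

```python
from typing import Callable, Dict, Iterable, List, Set


class Module:
    """Hypothetical dataflow module: input ports are listeners,
    output ports are events."""

    def __init__(self, name: str, action: Callable[[], None],
                 inputs: Iterable[str] = ("run",), mode: str = "AND"):
        if mode not in ("AND", "OR"):
            raise ValueError("mode must be 'AND' or 'OR'")
        self.name = name
        self.action = action
        self.inputs: Set[str] = set(inputs)
        self.mode = mode
        self._arrived: Set[str] = set()
        self._listeners: Dict[str, List[Callable[[], None]]] = {}

    def connect(self, output_port: str, target: "Module",
                input_port: str) -> None:
        """Associate one of our output ports with a target's input port."""
        if input_port not in target.inputs:
            raise ValueError(f"{target.name} has no input port {input_port!r}")
        self._listeners.setdefault(output_port, []).append(
            lambda: target.receive(input_port))

    def receive(self, input_port: str) -> None:
        """An event arrived on an input port; run when the trigger is met."""
        self._arrived.add(input_port)
        ready = self.mode == "OR" or self._arrived == self.inputs
        if ready:
            self._arrived.clear()
            self.action()
            self.fire("done")

    def fire(self, output_port: str) -> None:
        for listener in self._listeners.get(output_port, []):
            listener()


log: List[str] = []
a = Module("A", lambda: log.append("A"))
b = Module("B", lambda: log.append("B"))
c = Module("C", lambda: log.append("C"))
# D waits for both B and C (.AND.) before it runs.
d = Module("D", lambda: log.append("D"),
           inputs=("from_b", "from_c"), mode="AND")

a.connect("done", b, "run")       # fan-out: A feeds B and C
a.connect("done", c, "run")
b.connect("done", d, "from_b")    # fan-in: D depends on B and C
c.connect("done", d, "from_c")

a.receive("run")
print(log)
```

The same wiring expresses both generalizations asked for in comment 2: one output feeding several modules, and one module depending on several others.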
This object-oriented approach does not preclude working with legacy
codes. First of all, my middle tier operates on application proxies,
and these are Java objects. Then, in the simplest case, a dusty
Fortran deck is represented by an object that has a method run (input
port) and fires the event "done" (output port). If you bother to check
return codes, you can easily fire the event "failure" when that is the
case. And you can do much more in such a paradigm.
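
As an illustration only, here is what such a proxy might look like in Python (the proxies described above are Java objects; this is not that code). The LegacyProxy class and its on and run methods are hypothetical, and the Python interpreter itself stands in for the legacy executable so that the sketch runs anywhere.

```python
import subprocess
import sys
from typing import Callable, Dict, List, Sequence

Listener = Callable[[str, int], None]


class LegacyProxy:
    """Hypothetical proxy wrapping a legacy executable.

    run() plays the role of the input port; the "done" and "failure"
    events play the role of output ports.
    """

    def __init__(self, name: str, command: Sequence[str]):
        self.name = name
        self.command = list(command)
        self._listeners: Dict[str, List[Listener]] = {"done": [], "failure": []}

    def on(self, event: str, listener: Listener) -> None:
        """Register a listener for the 'done' or 'failure' event."""
        self._listeners[event].append(listener)

    def run(self) -> str:
        """Run the wrapped command and fire an event based on its return code."""
        completed = subprocess.run(self.command)
        event = "done" if completed.returncode == 0 else "failure"
        for listener in self._listeners[event]:
            listener(self.name, completed.returncode)
        return event


events: List[tuple] = []

# Stand-ins for a legacy code: one exits with 0, the other with 3.
good = LegacyProxy("model", [sys.executable, "-c", "raise SystemExit(0)"])
bad = LegacyProxy("broken", [sys.executable, "-c", "raise SystemExit(3)"])

for proxy in (good, bad):
    proxy.on("done", lambda name, code: events.append((name, "done", code)))
    proxy.on("failure", lambda name, code: events.append((name, "failure", code)))
    proxy.run()

print(events)
```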
It is important to note that, unlike in AVS, my ports do not
represent data. I am not sending data from module to
module. Instead I am sending events. You may envision a data
transfer module for moving the output of one module to another. But
wait, there is more. I envision that the complex task descriptor is
passed to a metascheduler that would be capable of optimizing the
task. As a byproduct, the metascheduler can automatically determine
the need for a file transfer, so the file transfer module is not
needed at all!
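
To make the last point concrete, here is a deliberately naive sketch of how a scheduler could derive transfers from the task graph once tasks have been placed on hosts. Everything in it is an assumption made for illustration: the infer_transfers function, the placement and dependencies structures, and the task and host names. No existing metascheduler is being described.

```python
from typing import Dict, List, Tuple

# (producer task, source host, consumer task, destination host)
Transfer = Tuple[str, str, str, str]


def infer_transfers(placement: Dict[str, str],
                    dependencies: List[Tuple[str, str]]) -> List[Transfer]:
    """Return the transfers implied by a placement.

    placement maps each task name to the host it was assigned to;
    dependencies lists (producer, consumer) pairs from the task graph.
    A transfer is needed only when the two tasks sit on different hosts.
    """
    transfers: List[Transfer] = []
    for producer, consumer in dependencies:
        src, dst = placement[producer], placement[consumer]
        if src != dst:
            transfers.append((producer, src, consumer, dst))
    return transfers


# Hypothetical three-step task, placed on two hypothetical hosts.
placement = {"preprocess": "hostA", "run": "hostB", "plot": "hostB"}
dependencies = [("preprocess", "run"), ("run", "plot")]

needed = infer_transfers(placement, dependencies)
print(needed)
```

The user's descriptor contains only the three tasks and their dependencies; the one transfer from hostA to hostB falls out of the placement decision.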
To summarize, I would not define an enumeration of atomic tasks, but
instead define a generic task that can be defined recursively. In
addition, instead of defining properties of the atomic tasks (such as
attributes of "Computation"), I would use references to application
descriptors. I feel very strongly that these belong in another
GCE-WG white paper. Finally, I would recommend that we look at
defining relationships between tasks within a complex task.
I would appreciate your comments on my thoughts about the complex task
descriptor. To be a little more constructive, I am appending the task
descriptors that I used in my work a couple of years ago and intend to
use in the near future while developing MCWP. Again, do not look at
them as a mature draft of the standard, but rather as a bunch of ideas
to be considered.
Tomasz