RE: more on Workflow
Thanks Tomasz, I will look at your comments this evening, and get back to
you tomorrow.
-Hugh
> -----Original Message-----
> From: Tomasz Haupt [mailto:haupt@erc.msstate.edu]
> Sent: June 09, 2001 10:00 AM
> To: gce-wg@gridforum.org
> Subject: more on Workflow
>
>
> Oops! I underestimated the power of Netscape.
> In my previous mail I intended to respond to Hugh and add my old
> document as an appendix to illustrate some of the ideas in the body of
> the mail. I was quite surprised when I saw my cc: Netscape had made the
> appendix (which is an HTML file) the body of the message, and added
> my text as an attachment!
> It should be the other way round. To avoid confusion, I am attaching
> the body of the message again because this is the real message.
>
> Sorry for the confusion,
> Tomasz
> ------------------------------------------------------------------------
>
> Thanks Hugh! This is the first white paper coming from the GCE work
> group. Looks like we are making progress. And it is my understanding
> that this draft is based on an actual implementation, which is very
> important. To keep the ball rolling, let me offer a few comments.
>
> 1. Do we really want to use the term "workflow"? During our meeting in
> San Diego I got the impression that the majority was against it, as
> this term is being used in a somewhat different sense outside our
> community. Within my framework (Mississippi Computational Web
> Portal - MCWP) I use the term "complex task". Even you use this term
> in your abstract: "(...) a standard for the sequencing of complex
> high-performance computational tasks within a Grid".
>
> 2. Is "sequencing" not too restrictive? I am hoping for a standard
> that describes a computational graph (as AVS or other visualization
> packages often implement). Let's assume that the complex task is
> composed of "modules" or "atomic tasks". It seems to me that
> sequencing means processing one module after another. Can't we
> generalize it to a more complex graph, such that the results of one
> module can be fed to several modules running concurrently, or one
> module can be fed with data coming from two or more modules (or be
> dependent on them in any other way)? Actually, you admit this
> problem in section 6.
>
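A minimal sketch of the kind of computational graph point 2 asks for, with one module feeding two concurrent modules and a third module depending on both. The module names are hypothetical and the standard-library `graphlib` module stands in for whatever engine would actually process the graph; this is an illustration, not a proposed format.

```python
from graphlib import TopologicalSorter

# Hypothetical module names; each key maps to the set of modules it depends on.
dependencies = {
    "preprocess": set(),
    "model_a": {"preprocess"},        # fan-out: two modules consume
    "model_b": {"preprocess"},        # the results of "preprocess"
    "merge": {"model_a", "model_b"},  # fan-in: waits for both models
}

def execution_waves(deps):
    """Group modules into waves; modules in one wave could run concurrently."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves

waves = execution_waves(dependencies)
print(waves)
```

A purely sequential description is the special case in which every wave contains exactly one module.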
>
> 3. It seems to me you are suggesting an enumeration of "atomic tasks":
> computation, resource query, data transfer, etc. Again, is it
> not too restrictive? Actually I have several problems with that.
>
> i. Can we hope to get a complete list of atomic tasks? Isn't it an
> invitation for nonstandard extensions?
>
> ii. Is it the purpose of this document to define terms such as
> "computation" or "data transfer"? I think we should focus on
> describing how to compose a complex task from constituents, and not
> on describing how to process constituents.
>
> iii. What about the capability of hierarchical composition of complex
> tasks? Say we want to build a task from applications A, B, and
> C. Now, each of them is performed in steps: identify resource, stage,
> compile, preprocess, run, etc. Hiding complexity in this case
> would mean building the final task descriptor from task descriptors
> for each application (A, B, C), while these are built from atomic
> tasks such as compile, run, transfer data. An additional advantage of
> such an approach is that the task descriptor for, say, application
> B can be created by the domain specialist and may then be
> "published" so it can be reused by less savvy users. It is my
> experience working with Climate, Weather and Ocean modeling that
> setting up a model to run involves far too many details for a
> physicist/oceanographer/meteorologist to fully comprehend. Defining
> a task descriptor for a particular model helps dramatically.
>
> What would be the remedy? Instead of having an enumeration of tasks,
> let us introduce two generic terms: "atomic task" and "complex task".
> The atomic task contains a reference to the application descriptor
> (a GCE-WG white paper in preparation), and the complex task
> contains references to its constituents (atomic tasks).
>
> A simple example (just to illustrate the idea, not to suggest syntax):
>
> atomic task:
> <task>
>   <taskName name="aTask" descriptor="aDescriptor.xml" />
> </task>
>
> complex task:
> <task>
>   <taskName name="complexTask" />
>   <task>
>     <taskName name="aTask1" descriptor="aDescriptor1.xml" />
>   </task>
>   <task>
>     <taskName name="aTask2" descriptor="aDescriptor2.xml" />
>   </task>
> </task>
>
> Note the hierarchical/recursive definition of a task: the <task> tag
> describes both a simple task and a complex task; the difference is
> in the tag attributes. Again, take it as a concept and not as
> suggested syntax.
>
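A minimal sketch of how the recursive `<task>` idea above could be read by a program, using the illustrative XML from the message as-is. The function names `parse_task` and `atomic_descriptors` are hypothetical, and the rule "a task without nested tasks is atomic" is an assumption made for this sketch only.

```python
import xml.etree.ElementTree as ET

# The complex task from the example above (a concept, not a suggested syntax).
COMPLEX_TASK = """
<task>
  <taskName name="complexTask" />
  <task>
    <taskName name="aTask1" descriptor="aDescriptor1.xml" />
  </task>
  <task>
    <taskName name="aTask2" descriptor="aDescriptor2.xml" />
  </task>
</task>
"""

def parse_task(element):
    """Recursively turn a <task> element into a nested dictionary."""
    name_tag = element.find("taskName")
    return {
        "name": name_tag.get("name"),
        # Atomic tasks carry a descriptor reference; complex tasks do not.
        "descriptor": name_tag.get("descriptor"),
        # Only direct <task> children are constituents of this task.
        "subtasks": [parse_task(child) for child in element.findall("task")],
    }

def atomic_descriptors(task):
    """Collect the descriptor references of all atomic tasks, depth first."""
    if not task["subtasks"]:
        return [task["descriptor"]]
    found = []
    for sub in task["subtasks"]:
        found.extend(atomic_descriptors(sub))
    return found

tree = parse_task(ET.fromstring(COMPLEX_TASK))
print(atomic_descriptors(tree))
```

Because `parse_task` calls itself on nested `<task>` elements, the same code handles a complex task built from other complex tasks to any depth.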
> What is missing here is the relationship between the tasks. In my work
> I use the concept of a port (in a sense similar to that in
> AVS). Each module (or task) defines input ports and output
> ports. One composes a complex task by associating output ports with
> input ports. New information on the input port triggers processing
> of the module (can be .AND. or .OR., if more than one port), in a
> classical dataflow way. In the implementation I made a couple of
> years ago, an output port was an event fired by the module, and the
> input port was the method to be invoked (a particular event
> listener). Admittedly, this is very implementation specific, but I
> am sure that we can work out a more general model along these
> lines.
>
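A minimal sketch of the port idea described above: output ports are named events, input ports are named slots, and a module runs once its inputs are satisfied under an AND or OR rule. The `Module` class and all names in it are hypothetical; this is not the implementation mentioned in the message.

```python
class Module:
    """A task with named input ports and event-style output ports."""

    def __init__(self, name, input_names=(), mode="AND", action=None):
        self.name = name
        self.mode = mode  # "AND": all inputs needed; "OR": any input suffices
        self.inputs = {n: False for n in input_names}
        self.listeners = {}  # output event -> list of (module, input port)
        self.action = action or (lambda: "done")
        self.runs = 0

    def connect(self, event, target, input_name):
        """Associate one of this module's output ports with an input port."""
        self.listeners.setdefault(event, []).append((target, input_name))

    def receive(self, input_name):
        """An event arrived on an input port; run if the trigger rule holds."""
        self.inputs[input_name] = True
        states = self.inputs.values()
        ready = all(states) if self.mode == "AND" else any(states)
        if ready:
            self.inputs = {n: False for n in self.inputs}
            self.run()

    def run(self):
        self.runs += 1
        event = self.action()  # the action returns the name of the event to fire
        for target, input_name in self.listeners.get(event, []):
            target.receive(input_name)

a, b = Module("a"), Module("b")
both = Module("both", ["from_a", "from_b"], mode="AND")
either = Module("either", ["from_a", "from_b"], mode="OR")
for source, port in ((a, "from_a"), (b, "from_b")):
    source.connect("done", both, port)
    source.connect("done", either, port)

a.run()
runs_after_a = (both.runs, either.runs)
b.run()
runs_after_b = (both.runs, either.runs)
print(runs_after_a, runs_after_b)
```

Only events travel along the connections here; no data is passed from module to module.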
> This OO approach does not preclude working with legacy
> codes. First of all, my middle tier operates on application proxies
> and these are Java objects. Then, in the simplest case, a dusty
> Fortran deck is represented by an object that has a method run (input
> port) and fires the event "done" (output port). If you bother to check
> return codes, you can easily fire the event "failure", if this is the
> case. And you can do much more in such a paradigm.
>
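A minimal sketch of a proxy object for a legacy code, written in Python rather than the Java used in the middle tier described above. The `LegacyProxy` class is hypothetical, and the commands are stand-ins that launch the Python interpreter instead of a real Fortran executable.

```python
import subprocess
import sys

class LegacyProxy:
    """Wraps an external program: run() is the input port, events the outputs."""

    def __init__(self, command):
        self.command = command
        self.listeners = {"done": [], "failure": []}

    def on(self, event, callback):
        """Register a listener for the "done" or "failure" event."""
        self.listeners[event].append(callback)

    def run(self):
        result = subprocess.run(self.command)
        # The return code decides which event is fired.
        event = "done" if result.returncode == 0 else "failure"
        for callback in self.listeners[event]:
            callback(result.returncode)
        return event

log = []

# Stand-in for a legacy code that finishes normally.
ok = LegacyProxy([sys.executable, "-c", "pass"])
ok.on("done", lambda code: log.append(("done", code)))
ok.run()

# Stand-in for a legacy code that exits with a non-zero return code.
bad = LegacyProxy([sys.executable, "-c", "import sys; sys.exit(3)"])
bad.on("failure", lambda code: log.append(("failure", code)))
bad.run()

print(log)
```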
> It is important to note that, unlike in AVS, my ports do not
> represent data. I am not sending data from module to
> module. Instead I am sending events. You may envision a data
> transfer module for moving the output of one module to another. But
> wait, there is more. I envision that the complex task descriptor is
> passed to a metascheduler that would be capable of optimizing the
> task. As a byproduct, the metascheduler can automatically determine
> the need for a file transfer, so the file transfer module is not
> needed at all!
>
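A minimal sketch of how a scheduler could infer file transfers from the task graph alone, assuming it already knows which host each task is placed on. The function `plan_transfers`, the task names, and the host names are all hypothetical; no actual metascheduler is described in the message.

```python
def plan_transfers(edges, placement):
    """List the producer/consumer pairs whose tasks run on different hosts."""
    transfers = []
    for producer, consumer in edges:
        src, dst = placement[producer], placement[consumer]
        if src != dst:  # output must be moved before the consumer can start
            transfers.append((producer, consumer, src, dst))
    return transfers

# Hypothetical complex task: preprocess -> model -> visualize.
edges = [("preprocess", "model"), ("model", "visualize")]
placement = {"preprocess": "hostA", "model": "hostB", "visualize": "hostB"}

transfers = plan_transfers(edges, placement)
print(transfers)
```

In this sketch the user never declares a transfer step; it follows from the connections between tasks and the placement chosen by the scheduler.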
> To summarize, I would not define an enumeration of atomic tasks, and
> would instead define a generic task that can be defined recursively. In
> addition, instead of defining properties of the atomic tasks (such as
> attributes of "Computation"), I would use references to application
> descriptors. I feel very strongly that these belong in another
> GCE-WG white paper. Finally, I would recommend that we look at
> defining relationships between tasks within a complex task.
>
> I would appreciate your comments on my thoughts about the complex task
> descriptor. To be a little more constructive, I am appending task
> descriptors that I used in my work a couple of years ago, and intend to
> use in the near future while developing MCWP. Again, do not look at it
> as a mature draft of the standard, but rather as a bunch of ideas to be
> considered.
>
> Tomasz
>