RE: more on Workflow



Thanks Tomasz, I will look at your comments this evening, and get back to
you tomorrow. 

-Hugh

> -----Original Message-----
> From: Tomasz Haupt [mailto:haupt@erc.msstate.edu]
> Sent: June 09, 2001 10:00 AM
> To: gce-wg@gridforum.org
> Subject: more on Workflow
> 
> 
> Oops! I underestimated the power of Netscape.
> In my previous mail I intended to respond to Hugh and add my old
> document as an appendix to illustrate some of the ideas in the body of
> the mail. I was quite surprised to see my cc: Netscape made the
> appendix (which is an HTML file) the body of the message and added
> my text as an attachment!
> It should be the other way round. To avoid confusion, I am attaching the
> body of the message again, because this is the real message.
> 
> Sorry for the confusion,
> Tomasz
> ------------------------------------------------------------------------
> 
> Thanks Hugh! This is the first white paper coming from the GCE work
> group. Looks like we are making progress. And it is my understanding
> that this draft is based on an actual implementation, which is very
> important. To keep the ball rolling, let me offer a few comments.
> 
> 1. Do we really want to use the term "workflow"? During our meeting in
>   San Diego I got the impression that the majority was against it, as
>   this term is used in a somewhat different sense outside our
>   community. Within my framework (Mississippi Computational Web
>   Portal - MCWP) I use the term "complex task". Even you use this term
>   in your abstract: "(...) a standard for the sequencing of complex
>   high-performance computational tasks within a Grid".
> 
> 2. Is "sequencing" not too restrictive? I am hoping for a standard
>   that describes a computational graph (as AVS and other visualization
>   packages often implement). Let's assume that the complex task is
>   composed of "modules" or "atomic tasks". It seems to me that
>   sequencing means processing one module after another. Can't we
>   generalize it to a more complex graph, in which the results of one
>   module can be fed to several modules running concurrently, or one
>   module can be fed with data coming from two or more modules (or
>   depend on them in any other way)? Actually, you admit this
>   problem in section 6.
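
The graph in point 2 (one module feeding several concurrent modules, and one module fed by several) can be sketched as a dependency map. This is only an illustration, assuming Python's standard `graphlib`; the module names are hypothetical and do not come from the white paper or from MCWP.

```python
from graphlib import TopologicalSorter

# Hypothetical modules; each key maps to the set of modules it depends on.
dependencies = {
    "preprocess": set(),
    "model_a": {"preprocess"},         # fan-out: both models consume the
    "model_b": {"preprocess"},         # result of "preprocess"
    "merge": {"model_a", "model_b"},   # fan-in: waits for both models
}


def execution_stages(deps):
    """Group modules into stages; modules in one stage may run concurrently."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose inputs are done
        stages.append(ready)
        sorter.done(*ready)
    return stages


stages = execution_stages(dependencies)
```

A pure sequence is the special case in which every stage holds exactly one module.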
> 
> 
> 3. It seems to me you are suggesting an enumeration of "atomic tasks":
>   computation, resource query, data transfer, ..., etc. Again, is it
>   not too restrictive? Actually I have several problems with that.
> 
>   i. Can we hope to get a complete list of atomic tasks? Isn't it an
>   invitation for nonstandard extensions?
> 
>   ii. Is it the purpose of this document to define terms such as
>   "computation" or "data transfer"? I think we should focus on
>   describing how to compose a complex task from constituents, and not
>   on describing how to process the constituents.
> 
>   iii. What about the capability of hierarchical composition of complex
>   tasks? Say we want to build a task from applications A, B, and
>   C. Each of them is performed in steps: identify resource, stage,
>   compile, preprocess, run, etc. Hiding complexity in this case
>   would mean building the final task descriptor from task descriptors
>   for each application (A, B, C), while these are built from atomic
>   tasks such as compile, run, transfer data. An additional advantage of
>   such an approach is that the task descriptor for, say, application
>   B can be created by a domain specialist and then
>   "published" so it can be reused by less savvy users. It is my
>   experience from working with climate, weather and ocean modeling that
>   setting up a model to run involves far too many details for a
>   physicist/oceanographer/meteorologist to fully comprehend. Defining
>   a task descriptor for a particular model helps dramatically.
> 
>   What would be the remedy? Instead of having an enumeration of tasks,
>   let us introduce generic terms: "atomic task" and "complex task".
>   The atomic task contains a reference to the application descriptor
>   (a GCE-WG white paper in preparation), and the complex task
>   contains references to its constituents (atomic tasks).
> 
>   A simple example (just to illustrate the idea, not to suggest syntax):
>  
>   atomic task:
>   <task>
>    <taskName name="aTask" descriptor="aDescriptor.xml" />
>   </task> 
> 
>   complex task:
>   <task>
>    <taskName name="complexTask" />
>    <task>
>      <taskName name="aTask1" descriptor="aDescriptor1.xml" />
>    </task> 
>    <task>
>      <taskName name="aTask2" descriptor="aDescriptor2.xml" />
>    </task> 
>   </task> 
> 
>   Note the hierarchical/recursive definition of a task: the <task> tag
>   describes both a simple task and a complex task; the difference is
>   in the tag attribute. Again, take it as a concept and not as a
>   suggested syntax.
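
To make the recursive definition concrete, here is a minimal sketch that reads the complex-task example above with Python's standard `xml.etree.ElementTree`. It assumes that a task is atomic when it has no nested `<task>` elements; the dictionary layout and the helper names `parse_task` and `atomic_names` are hypothetical, not part of any proposed standard.

```python
import xml.etree.ElementTree as ET

# The complex-task example from the message, reproduced verbatim.
DESCRIPTOR = """
<task>
  <taskName name="complexTask" />
  <task>
    <taskName name="aTask1" descriptor="aDescriptor1.xml" />
  </task>
  <task>
    <taskName name="aTask2" descriptor="aDescriptor2.xml" />
  </task>
</task>
"""


def parse_task(element):
    """Recursively turn a <task> element into a nested dict."""
    name_tag = element.find("taskName")
    return {
        "name": name_tag.get("name"),
        # Only atomic tasks reference an application descriptor.
        "descriptor": name_tag.get("descriptor"),
        "subtasks": [parse_task(child) for child in element.findall("task")],
    }


def atomic_names(task):
    """List the atomic tasks (the leaves) in document order."""
    if not task["subtasks"]:
        return [task["name"]]
    return [name for sub in task["subtasks"] for name in atomic_names(sub)]


tree = parse_task(ET.fromstring(DESCRIPTOR))
```

Because `parse_task` calls itself on nested `<task>` elements, a published descriptor for one application could be embedded in a larger task without changing the reader.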
> 
>   What is missing here is the relationship between the tasks. In my
>   work I use the concept of a port (in a sense similar to that in
>   AVS). Each module (or task) defines input ports and output
>   ports. One composes a complex task by associating output ports with
>   input ports. New information on an input port triggers processing
>   of the module (the condition can be .AND. or .OR. if there is more
>   than one port), in a classical dataflow way. In the implementation
>   I made a couple of years ago, an output port was an event fired by
>   the module, and an input port was the method to be invoked (a
>   particular event listener). Admittedly, this is very implementation
>   specific, but I am sure that we can work out a more general model
>   along these lines.
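
The port model described above can be sketched in a few lines. This is not the MCWP implementation (which the message says used Java objects); it is a Python illustration under the assumption that each module has a single output event, and the class and port names are hypothetical.

```python
class Module:
    """A task whose input ports are listeners and whose output port is an event."""

    def __init__(self, name, inputs, mode="AND", log=None):
        self.name = name
        self.inputs = set(inputs)   # names of the input ports
        self.mode = mode            # "AND": all ports must fire; "OR": any port
        self.pending = set()        # input ports that have received an event
        self.listeners = []         # (module, port) pairs wired to the output
        self.log = log if log is not None else []

    def connect(self, target, port):
        """Associate this module's output port with an input port of target."""
        self.listeners.append((target, port))

    def notify(self, port):
        """Input port: invoked when an upstream module fires its event."""
        if port not in self.inputs:
            raise ValueError(f"{self.name} has no input port {port!r}")
        self.pending.add(port)
        if self.mode == "OR" or self.pending == self.inputs:
            self.pending.clear()
            self.run()

    def run(self):
        self.log.append(self.name)          # stand-in for the real work
        for target, port in self.listeners:
            target.notify(port)             # fire the output event


log = []
a = Module("a", inputs=[], log=log)
b = Module("b", inputs=[], log=log)
c = Module("c", inputs=["from_a", "from_b"], mode="AND", log=log)
a.connect(c, "from_a")
b.connect(c, "from_b")
a.run()   # c has seen only one of its two ports, so it does not run yet
b.run()   # both ports have now fired, so c runs
```

With `mode="OR"`, `c` would run after each upstream event instead of waiting for both.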
> 
>   This OO approach does not preclude working with legacy
>   codes. First of all, my middle tier operates on application proxies,
>   and these are Java objects. Then, in the simplest case, a dusty
>   Fortran deck is represented by an object that has a method "run"
>   (input port) and fires the event "done" (output port). If you bother
>   to check return codes, you can easily fire the event "failure" if
>   this is the case. And you can do much more in such a paradigm.
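
A proxy of this kind might look like the sketch below. It is written in Python rather than Java for brevity, and the class name `LegacyProxy`, its `on` method, and the use of the Python interpreter as a stand-in for a legacy executable are all assumptions made for illustration.

```python
import subprocess
import sys


class LegacyProxy:
    """Wraps an external executable behind a run() method and two events."""

    def __init__(self, command):
        self.command = command  # argv list for the legacy code
        self.listeners = {"done": [], "failure": []}

    def on(self, event, callback):
        """Register a listener for the "done" or "failure" output port."""
        self.listeners[event].append(callback)

    def run(self):
        """Input port: run the executable, then fire "done" or "failure"."""
        result = subprocess.run(self.command)
        event = "done" if result.returncode == 0 else "failure"
        for callback in self.listeners[event]:
            callback(result.returncode)
        return event


# Stand-ins for a legacy binary: the interpreter exiting with status 0 or 3.
fired = []
ok = LegacyProxy([sys.executable, "-c", "raise SystemExit(0)"])
ok.on("done", lambda code: fired.append(("done", code)))
ok.run()

bad = LegacyProxy([sys.executable, "-c", "raise SystemExit(3)"])
bad.on("failure", lambda code: fired.append(("failure", code)))
bad.run()
```

The legacy code itself is untouched; only its return code is mapped to an event.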
> 
>   It is important to note that, unlike in AVS, my ports do not
>   represent data. I am not sending data from module to
>   module. Instead I am sending events. You may envision a data
>   transfer module for moving the output of one module to another. But
>   wait, there is more. I envision that the complex task descriptor is
>   passed to a metascheduler that would be capable of optimizing the
>   task. As a byproduct, the metascheduler can automatically determine
>   the need for a file transfer, so the file transfer module is not
>   needed at all!
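
One way a metascheduler could derive file transfers as a byproduct is sketched below. The message does not describe an algorithm, so this is purely an assumption: a transfer is taken to be needed whenever a producer and its consumer are placed on different hosts. The task names, host names, and file names are hypothetical.

```python
def required_transfers(edges, placement, outputs):
    """Return (file, source_host, target_host) for each edge that crosses hosts."""
    transfers = []
    for producer, consumer in edges:
        src, dst = placement[producer], placement[consumer]
        if src != dst:  # same host: the output is already where it is needed
            transfers.append((outputs[producer], src, dst))
    return transfers


# Dependencies between tasks, as (producer, consumer) pairs.
edges = [("ocean", "coupler"), ("atmosphere", "coupler")]
# Hosts chosen by the (hypothetical) scheduler for each task.
placement = {"ocean": "hostA", "atmosphere": "hostB", "coupler": "hostA"}
# Output file written by each producing task.
outputs = {"ocean": "ocean.out", "atmosphere": "atmos.out"}

transfers = required_transfers(edges, placement, outputs)
```

Here only the output of "atmosphere" has to move, and the task descriptor never mentions a transfer step.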
> 
> To summarize, I would not define an enumeration of atomic tasks, and
> would instead define a generic task that can be defined recursively. In
> addition, instead of defining properties of the atomic tasks (such as
> attributes of "Computation"), I would use references to application
> descriptors. I feel very strongly that these belong in another
> GCE-WG white paper. Finally, I would recommend that we look at
> defining relationships between tasks within a complex task.
> 
> I would appreciate your comments on my thoughts about the complex task
> descriptor. To be a little more constructive, I am appending task
> descriptors that I used in my work a couple of years ago, and intend to
> use in the near future while developing MCWP. Again, do not look at it
> as a mature draft of the standard, but rather as a bunch of ideas to be
> considered.
> 
> Tomasz
>  
>  
>  
> 
>