RE: DAIS Data Sets
Hi Jim,
I was impressed by the talk and demonstrations that
you gave at Edinburgh today. What is also impressive is that
all of this was done using Web Services. I have a general
question - would using OGSI buy you anything over a purely
Web services solution? Also, in your view, what should/could
DAIS do to facilitate the type of work that you guys are
doing with the sky server? I, and possibly the group, would
be interested in your take on this.
Mario
On Fri, 4 Jul 2003, Jim Gray wrote:
> The Internet (and I suspect the Grid) works by having very little state
> in the middle.
> This gives a scalable and fault-tolerant design -- and is a VERY simple
> design pattern.
> All the state is in the client or the database (beans get persistence by
> saving state to a DBMS or file system).
> It is sometimes a pain to design for this "queue-oriented" "loosely
> coupled" world
> -- but the resulting designs are generally more scalable than
> connection-oriented schemes.
>
> HTTP and SMTP follow this model: they are stateless.
> This is the emissary - fiefdom model.
> All state in the "middle" is soft.
>
> Anyway, the .NET folks observe that a DataSet is an answer to a
> question.
> It is self-consistent (transactional) but transactions do not span
> questions.
> If the client wants to persist the dataset, fine!
> But, it is emissary state (a stale copy of the fiefdom's data).
> They provide diffgrams (optimistic concurrency control) if the client
> wants to update sets or make inserts or deletes.
> But, being mostly web-centric they encourage you to use web methods to
> update objects (transfer funds).
> This should be a familiar song to the EJB folks.
> Transactions are fine within a service,
> but operations that span multiple autonomous services are probably best
> done with WS-Transaction sagas with compensation
> driving queue-oriented services.
>
> That is the mindset that motivates the design of the .NET dataset.
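
[Editorial sketch, not part of the original message.] The disconnected
"answer to a question" plus optimistic update described above can be
illustrated roughly as follows. This is a minimal sketch under stated
assumptions: the names Fiefdom, Row, ConflictError and apply_diff are
hypothetical, and a per-row version number stands in for the original
values that a real diffgram would carry. It is not the .NET DataSet API.

```python
from dataclasses import dataclass


@dataclass
class Row:
    value: str
    version: int = 0


class ConflictError(Exception):
    """Raised when the authoritative row changed after the client read it."""


class Fiefdom:
    """Hypothetical service that owns the authoritative data."""

    def __init__(self):
        self._rows = {}

    def insert(self, key, value):
        self._rows[key] = Row(value)

    def query(self, key):
        # The answer is a detached copy: no connection or lock is held.
        row = self._rows[key]
        return Row(row.value, row.version)

    def apply_diff(self, key, original, new_value):
        # Optimistic check: accept the change only if nobody else
        # updated the row since the client's copy was taken.
        current = self._rows[key]
        if current.version != original.version:
            raise ConflictError(f"{key!r} changed since it was read")
        self._rows[key] = Row(new_value, current.version + 1)


def demo():
    server = Fiefdom()
    server.insert("acct", "100")
    copy_a = server.query("acct")  # client A's emissary copy
    copy_b = server.query("acct")  # client B's emissary copy
    server.apply_diff("acct", copy_a, "90")  # A's update is accepted
    try:
        server.apply_diff("acct", copy_b, "80")  # B's copy is now stale
        outcome = "applied"
    except ConflictError:
        outcome = "conflict"
    return server.query("acct").value, outcome
```

Calling demo() shows the second writer being rejected rather than
silently overwriting the first; the client would then re-query and retry.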
>
> This is one place where the ODBC model (and the file open/read/write
> model) is VERY different than the internet model.
> ODBC tightly couples client and server.
> You can look at WebDAV to see how the internet-folks do file access
> (similar ideas are in CIFS/SAMBA operations packages).
> There are sometimes leases (think shopping carts), but no real
> connections.
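
[Editorial sketch, not part of the original message.] A lease in the
shopping-cart sense can be pictured as soft state that simply expires
unless it is used again. The class LeaseTable and its methods are
hypothetical names, and the clock is passed in explicitly so that the
behaviour is easy to follow; a real service would read its own clock.

```python
class LeaseTable:
    """Soft state: a cart disappears unless the client keeps using it."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._expiry = {}  # lease id -> time at which the lease lapses
        self._items = {}   # lease id -> list of items in the cart

    def put(self, lease_id, item, now):
        self._reap(now)
        self._items.setdefault(lease_id, []).append(item)
        # Every use renews the lease; there is no connection to close.
        self._expiry[lease_id] = now + self.ttl

    def get(self, lease_id, now):
        self._reap(now)
        return list(self._items.get(lease_id, []))

    def _reap(self, now):
        # Drop every lease whose expiry time has been reached.
        expired = [k for k, t in self._expiry.items() if t <= now]
        for k in expired:
            del self._expiry[k]
            del self._items[k]
```

If the client vanishes, nothing has to be cleaned up on its behalf: the
entry lapses by itself, which is what keeps the state in the middle soft.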
>
> You may be skeptical that large or complex apps can be built in this
> way.
> If so, I encourage you to look at IBM's IMS (it is a queue-driven
> system), or look at .NET apps like SkyQuery.
>
> Scalable designs require loose coupling and only soft state in the
> middle.
> Much of the brittleness of some current middleware stems from forgetting
> this hard-learned lesson.
>
> -----Original Message-----
> From: Simon Laws [mailto:simon_laws@uk.ibm.com]
> Sent: Wednesday, July 02, 2003 12:34 PM
> To: Jim Gray
> Cc: dais-wg@gridforum.org; Jim Gray; szalay@pha.jhu.edu; Tamas Budavari;
> Maria A. Nieto-Santisteban
> Subject: Re: DAIS Data Sets & Transformations
>
> Hi Jim
>
> I'm becoming attached to "data set" so I agree but we have, to date,
> combined data identification with data access into data set and this is
> causing confusion.
>
> So, when we considered data set originally it was aligned with the "part
> of the file you asked for" idea that you suggest below. However many
> people see it simply as a mechanism for accessing data represented as a
> data resource. I.e. there is no implication of caching in a data set. In
> fact we must provide both of these functions but we haven't yet achieved
> a consensus on what component is responsible for what.
>
> Malcolm generated a list of orthogonal properties that could be applied
> to a data set as it stands, for example, data type, materialization
> policy (on-demand, eager, ...), security policy, delivery policy
> (synchronous, asynchronous), unit of access (all at once, iteration),
> lifetime (use once, use many) etc. This gives us a challenge as to how
> our interfaces should be factored out. For those services that represent
> data it is natural to build a hierarchy of interfaces based on data
> type. So, if data set represents data, you can imagine the hierarchy
> you suggest:
>
> Data Set
>     file
>     xml doc
>     ODBC rowset
>     .NET style dataset
>     cube
>     HDF
>     FITS
>     VOtable
>     CSV
>
> If data set were simply representing access to data resources a
> different hierarchy of operations and properties emerges, for example,
>
> Data Set
>     Materialization
>         On demand
>         Eager
>         Parallel
>     AccessMode
>         Pull
>         PushToOne
>         PushToSubscribers
>     AccessModel
>         Full
>         Incremental
>     Unit of access
>     Lifetime
>         UseOnce
>         UseMany
>     Etc...
>
> These are all fairly general except for "unit of access" which is
> probably related to the type of the data. The next job is to come up
> with a proposal for how these are positioned in the DAIS model in the
> context of the debate around what a data set really is. I don't know
> the answer to this but it feels like the way that we type data and the
> way that we access it should be separated, as is the case, for example, with
> file descriptors and read/write operations in Unix and File and
> associated streams in Java.
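
[Editorial sketch, not part of the original message.] One way to picture
keeping the type of the data apart from the way it is accessed is shown
below. All names (CsvData, AccessPolicy, DataSetHandle) are hypothetical
and only two of the listed properties are modelled; this is an
illustration of the factoring, not a proposal for the DAIS interfaces.

```python
from dataclasses import dataclass
from enum import Enum


class Materialization(Enum):
    ON_DEMAND = "on demand"
    EAGER = "eager"


class Lifetime(Enum):
    USE_ONCE = "use once"
    USE_MANY = "use many"


@dataclass(frozen=True)
class AccessPolicy:
    """Access properties, independent of what kind of data is behind them."""
    materialization: Materialization
    lifetime: Lifetime


class CsvData:
    """Data type: knows how to parse itself, not how it is delivered."""

    def __init__(self, text):
        self.text = text

    def rows(self):
        for line in self.text.splitlines():
            yield line.split(",")


class DataSetHandle:
    """Access: applies a policy to any object that offers rows()."""

    def __init__(self, data, policy):
        self._data = data
        self._policy = policy
        self._used = False
        self._cache = None
        if policy.materialization is Materialization.EAGER:
            self._cache = list(data.rows())  # materialize up front

    def read(self):
        if self._policy.lifetime is Lifetime.USE_ONCE and self._used:
            raise RuntimeError("use-once data set already consumed")
        self._used = True
        if self._cache is not None:
            return iter(self._cache)
        return self._data.rows()  # materialize only when asked
```

A new data type (say, a VOtable reader with its own rows()) could be
added without touching AccessPolicy, and a new access property could be
added without touching any data type.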
>
> On the transformation point I agree that transformation can be
> considered to be something that falls outside of the drm, dr, das, ds
> structure. For example, you can apply a transformation to a data set
> and obtain a new data set but the transformation itself does not have to
> be defined by DAIS. I do think that users of this technology will want
> to specialize the components to present clients with tailored
> interfaces, for example, specialized query languages or results in
> consistent formats across data resources, without having to chain many
> grid services together to achieve the effect.
>
> Regards
>
> Simon
>
> Simon Laws
> IBM Hursley Services and Technology
>
> "Jim Gray" <gray@microsoft.com> on 07/02/2003 09:34:20 AM
>
> To: Simon Laws/UK/IBM@IBMGB, <dais-wg@gridforum.org>
> cc: "Jim Gray" <gray@microsoft.com>, <szalay@pha.jhu.edu>, "Tamas
> Budavari" <budavari@pha.jhu.edu>, "Maria A. Nieto-Santisteban"
> <nieto@skysrv.pha.jhu.edu>
> Subject: DAIS Data Sets & Transformations
>
>
> "DataSet" seems like a perfectly fine name.
> It will need many sub-classes (file, xml doc, ODBC rowset, .NET style
> dataset, cube, HDF, FITS, VOtable, CSV,....) as things progress.
>
>
> For example:
> DataManager == FileServer
> DataResource == File
> DataActivity == FileHandle
> DataSet == the part of the file you asked for.
> Many other sub-classes (for these 4 data super-classes) will be defined
> as different groups define their world inside this framework.
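
[Editorial sketch, not part of the original message.] The file-server
analogy above can be written out as four small classes. The method names
(add, open, read) and the in-memory byte content are assumptions made for
illustration; nothing here is taken from the DAIS specification.

```python
class DataSet:
    """The part of the file that was asked for, as a detached copy."""

    def __init__(self, payload):
        self.payload = payload


class DataResource:
    """Plays the role of a file."""

    def __init__(self, name, content):
        self.name = name
        self.content = content


class DataActivity:
    """Plays the role of a file handle on one resource."""

    def __init__(self, resource):
        self._resource = resource

    def read(self, offset, length):
        # Slicing bytes yields a copy, so the result is independent
        # of the resource it came from.
        return DataSet(self._resource.content[offset:offset + length])


class DataManager:
    """Plays the role of a file server."""

    def __init__(self):
        self._resources = {}

    def add(self, name, content):
        self._resources[name] = DataResource(name, content)

    def open(self, name):
        return DataActivity(self._resources[name])
```

For example, a manager holding b"ra,dec\n1,2\n" under the name "sky.csv"
would hand back a DataSet whose payload is b"ra,dec" for
open("sky.csv").read(0, 6).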
>
>
> I am particularly interested in the ".NET dataset equivalent" sub-class
> since that is what we are using in the Virtual Observatory, and that is
> what is needed by portals that want relational metadata in a single
> response package (tell me all your tables, columns, indices, ...).
>
>
> Adding transformations beyond the (odbc-speak) commands presented to the
> data activity seems overkill.
> The commands have lots of transformations already, adding more is
> orthogonal to the data access issues.
>
>
>
>
>
> -----Original Message-----
> From: owner-dais-wg@gridforum.org [mailto:owner-dais-wg@gridforum.org]
> On Behalf Of Simon Laws
> Sent: Monday, June 30, 2003 8:50 AM
> To: dais-wg@gridforum.org
> Subject: DAIS GGF8 Session 3 and Data Sets
>
>
> Thank you to those who attended and contributed to the DAIS sessions at
> GGF8. This is just a short note on the conversation during Session 3
> where we debated the role of Data Set. This is not the official minute
> but I wanted to put my recollection (and the data I captured on slides
> during the meeting) out there.
>
>
> In the specification to date we have set out a position where the "data
> set" artifact describes data that is logically disconnected from a data
> resource. A data set can be produced by a data activity session, moved,
> copied, transformed and then consumed by another data activity session
> so updating a data resource.
>
>
> In DAIS session 3 at GGF8 we discussed this and started debating
> data set as an interface, i.e. the data resource is a data container.
> The data activity session is a transformation. The data sets are handles
> to input and output interfaces. Data set becomes a mechanism for
> packaging standard data access techniques. If there is a requirement to
> represent a collection of physical data then this is the job of the data
> resource.
>
>
> Steve Tuecke gave an example where a client wishing to use GridFTP to
> move data can ask for a GridFTP compatible ds to be constructed to
> provide access to the data to be moved. This differs from the current
> position where the data set would BE the data to be moved rather than
> just an interface to it.
>
>
> We started capturing likely properties and possible alternative names
> for a data set.
>
>
> Properties Of a Data Set:
>
>
> - Control / Properties
>     - Use exclusive
>     - Isolation levels
> - Lifetime
> - Open OR Use OR Connect OR Bind
> - Close
> - Get next item
> - Type
>
>
> Possible Names for a Data Set:
>
>
> - Physical
>     - Data resource
> - Logical
>     - Data service
> - Data interface
>     - Data set interface
>     - Data handler
>     - Data provider / Data consumer
>
>
> We didn't make any definitive decisions about the future of data set and
> I am not making a judgment here but a clarification of the position and
> role of data set is clearly required.
>
>
> As the topic holder for "The Model" I propose to work over the next few
> weeks to develop the debate into a proposition and a new revision of the
> model section of the DAIS specification. Any thoughts, comments, ideas
> at this stage are of course most welcome.
>
>
> Regards
>
>
> Simon
>
>
> Simon Laws
> IBM Hursley Services and Technology
>
-------------------------------------------------------------------------
|Mario Antonioletti:EPCC,JCMB,The King's Buildings,Edinburgh EH9 3JZ. |
|Tel:0131 650 5141:mario@epcc.ed.ac.uk:http://www.epcc.ed.ac.uk/~mario/ |
-------------------------------------------------------------------------