Sorry I wasn't able to attend the call: I'm trying to clear the
decks to become more actively involved in this group.
Just to correct one
misunderstanding on BTP. (I write as one of the co-authors of the OASIS BTP
specification.)
BTP allows complete freedom to implementers of services
participating in a business transaction: if they wish to use a DO-COMPENSATE
model, then they can do so. If they wish to use a PROVISIONAL-FINAL model, or a
VALIDATE-DO, then they can do so. These three models form a spectrum of
participant patterns for business transactions (i.e. transactions that link
related state-changing operations in autonomous applications or
services).
BTP permits this by sending one of two messages to
participants or sub-coordinators: CONFIRM or CANCEL. Each of these messages can
be faulted, if a technical failure or business event has prevented or superseded
delivering on the promise of an earlier PREPARED message. The application has
complete freedom to define its actions prior to sending PREPARED, and in
reaction to either CONFIRM or CANCEL. The "application" could be a generic
resource manager operating under conventional ACID rules; it could be a business
service that treats CANCEL as "charge 2.5% for the cancelled reservation" and
CONFIRM as "charge 100%". Any application or business rules can apply.
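To make that freedom concrete, here is a minimal sketch of a participant whose business rules match the reservation example above. The class name, method names and reply strings are my own illustrative assumptions, not the BTP wire format; the only point is that the coordinator sends CONFIRM or CANCEL, and what the service does in response is its own affair.

```python
from dataclasses import dataclass


@dataclass
class ReservationService:
    """Hypothetical participant: charges a fee on cancel, full price on confirm."""
    price: float
    cancel_fee_rate: float = 0.025  # the 2.5% from the example above
    state: str = "active"
    charged: float = 0.0

    def prepare(self) -> str:
        # Application-defined work happens here, before the promise is made.
        self.state = "prepared"
        return "PREPARED"

    def confirm(self) -> str:
        if self.state != "prepared":
            return "FAULT"  # the earlier promise can no longer be honoured
        self.charged = self.price
        self.state = "confirmed"
        return "CONFIRMED"

    def cancel(self) -> str:
        if self.state != "prepared":
            return "FAULT"
        self.charged = self.price * self.cancel_fee_rate
        self.state = "cancelled"
        return "CANCELLED"
```

A conventional ACID resource manager would fit the same three entry points; it would simply roll back on cancel rather than charge a fee.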
In
our company, Choreology, we have developed the pretty firm view that the
canonical model is PROVISIONAL-FINAL (where some kind of application resource
reservation algorithm is used to put a service into a pending state with respect
to a request). Examples from the business world that use this pattern very
naturally are quote-to-order, or reserve credit (authorize) followed by do trade
(trigger payment). The service transits from a tentative or provisional state to a final
state (either confirmed or cancelled) under the control of the business
transaction coordination service, using a distributed coordination protocol like
BTP.
One way or the other, this implies the use of some kind of application-level lock (often as simple as a pending status flag). However, and this can be of critical value in some applications, we can allow observation of provisional states. It may be very useful to see potential consumption of a resource, including in deciding when to allow reservation of more of that resource. A probabilistic approach to reservation can allow more sophisticated use of inventory. The use of an application-determined locking strategy can tune the competing demands of concurrency and accuracy in a very knowledgeable way. This frees systems from the rigorous, but limiting, constraints imposed by general-purpose data-level resource managers such as DBMS and message queues.
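As a rough sketch of what I mean by an application-determined lock with observable provisional state: the inventory below keeps pending reservations in a simple table, lets anyone read the provisional demand, and admits new reservations against the expected (not worst-case) consumption. The class, its methods and the `expected_confirm_rate` parameter are all hypothetical, chosen only to illustrate the idea.

```python
class Inventory:
    """Hypothetical PROVISIONAL-FINAL service with observable pending state."""

    def __init__(self, stock: int, expected_confirm_rate: float = 1.0):
        self.stock = stock
        # Assumed fraction of provisional reservations that end up confirmed.
        self.expected_confirm_rate = expected_confirm_rate
        self.pending = {}  # txn_id -> quantity held provisionally
        self.final = 0     # quantity already confirmed

    def provisional_demand(self) -> int:
        # Provisional state is visible, not hidden behind a data-level lock.
        return sum(self.pending.values())

    def reserve(self, txn_id: str, qty: int) -> bool:
        expected = self.final + self.expected_confirm_rate * (
            self.provisional_demand() + qty
        )
        if expected > self.stock:
            return False
        self.pending[txn_id] = qty
        return True

    def confirm(self, txn_id: str) -> bool:
        qty = self.pending.get(txn_id)
        if qty is None or self.final + qty > self.stock:
            return False  # overbooked or unknown: the confirm is faulted
        del self.pending[txn_id]
        self.final += qty
        return True

    def cancel(self, txn_id: str) -> None:
        self.pending.pop(txn_id, None)
```

With a confirm rate of 1.0 this behaves like a strict reservation. Lowering the rate trades accuracy for concurrency, at the price of an occasional faulted confirm.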
Both VALIDATE-DO and DO-COMPENSATE are afflicted, to differing degrees, by "promise degradation" over time. When a system goes prepared with respect to a business transaction, it promises to obey the final decision of the coordinating application. These two patterns create wobbly promises: the longer the decision is delayed, the less likely the system is to be able to honour it.
(Parenthetically: coordinating applications should be able
to decide that some participants are to be confirmed, while others are to be
cancelled: what BTP calls a Cohesion, or cohesive business transaction, where
the uniform outcome rule of classic atomic transactions is relaxed. This allows
several viable outcomes, from the most desirable to the minimally acceptable, to
be created. "All-or-nothing" is not a natural feature of many business
transactions.)
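A Cohesion can be pictured with a very small sketch: several suppliers have gone prepared on a quote, and the coordinating application picks which to confirm and cancels the rest. The `Quote` class and both functions are hypothetical names of my own, and the selection rule (take the cheapest) is just one example of application logic.

```python
class Quote:
    """Hypothetical prepared participant holding a quoted price."""

    def __init__(self, price: float):
        self.price = price
        self.state = "prepared"

    def confirm(self) -> str:
        self.state = "confirmed"
        return self.state

    def cancel(self) -> str:
        self.state = "cancelled"
        return self.state


def choose_cheapest(quotes: dict) -> set:
    # Application rule: confirm only the supplier with the lowest price.
    return {min(quotes, key=lambda name: quotes[name].price)}


def complete_cohesion(quotes: dict, confirm_set: set) -> dict:
    # Non-uniform outcome: confirm the chosen participants, cancel the others.
    return {
        name: (q.confirm() if name in confirm_set else q.cancel())
        for name, q in quotes.items()
    }
```

An atomic transaction is then just the special case where the confirm set is either everyone or no one.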
VALIDATE-DO: a valid potential operation now may become an
invalid operation in the future. This may lead to recoil: the attempted
confirmation must be faulted. A familiar pattern of optimistic concurrency
control, which can create relatively localized damage (failure of an interaction
between the two parties, breach of a globally uniform outcome in a single
business transaction).
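The recoil can be sketched in a few lines, assuming a hypothetical account service that only validates at prepare time and does the work at confirm time. Nothing is held between the two steps, so unrelated activity can invalidate the promise. Names and reply strings are illustrative only.

```python
class Account:
    """Hypothetical VALIDATE-DO participant: checks first, debits on confirm."""

    def __init__(self, balance: float):
        self.balance = balance

    def validate(self, amount: float) -> str:
        # Only a check: no funds are reserved, so the promise is optimistic.
        return "PREPARED" if amount <= self.balance else "CANCELLED"

    def withdraw(self, amount: float) -> None:
        # Unrelated activity outside the business transaction.
        self.balance -= amount

    def confirm(self, amount: float) -> str:
        if amount > self.balance:
            return "FAULT"  # valid when prepared, invalid now
        self.balance -= amount
        return "CONFIRMED"
```

If a withdrawal lands between `validate` and `confirm`, the confirm is faulted even though the participant had gone prepared.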
DO-COMPENSATE: much worse. If one's
"provisional" action is identical to one's "confirmed" action, then the system
which enacts DO really enacts it. And when I say "really" I mean: right the way
through to physical actions such as manufacture-to-order, inventory
picking/movement, invoice generation, payments etc. The "effect explosion" gets
worse and worse, the longer you leave the newly unleashed business process
running. COMPENSATE operations then become very complex indeed: they must
accommodate a plethora of time-dependent states, affecting numerous internal
systems and potentially external organizations. This effect explosion problem
can lead to system designers breaking natural business transactions in two:
first do a quote, then do an order. The recoverable correlation offered by the
simplifying transaction abstraction is then lost.
WS-BusinessActivity
(first edition, August 2002, revised 2004) from BEA/IBM/Microsoft is very
similar to BTP (not surprising, as it post-dates the fundamental conceptual work
of BTP in 2001). But it suffers from a particular defect, which is small but
very limiting. This defect arises from its original
defining use-case: BPEL Long-running Transactions (LRT).
BPEL
processes are formed of nested scopes. Scopes can be defined to be "reversible"
(compensatable). Service invocations from such scopes could use WS-BA to
distribute the implied transactional context to remote service operations,
allowing them in turn to become "reversible". Reversibility is expressed in the
ability of a BPEL scope to have a compensation handler. But BPEL scopes (and
transitively, WS-BA-infected services) cannot define a "confirmation" or
"finalization" handler. This means that BPEL can only employ one model on the BT
participant spectrum: DO-COMPENSATE. My hope would be that this limitation will
be lifted in a future version of BPEL, allowing scopes to be defined as
"contingent", implying presence of both confirm and cancel ("compensation")
handlers. (Another way of putting this is: currently BPEL uses the Open Nested
Transaction model, which has not gained wide practical support because of the
problems I have outlined with its implications for participant
behaviour.)
In WS-BA this limitation is expressed in the fact that a CLOSE
message (akin to a BTP CONFIRM) cannot be faulted: it is assumed that the only
effect of CLOSE is to drop the compensation handler (in transaction terms, to
logically delete the log entry which allows recoverable memory of the business
transaction to be retained), and it is further assumed that failure to logically
delete the log is a non-fatal technical error that can be passed over silently
and cleaned up by some form of garbage collection.
We have proposed to the authors of
WS-BA that this restriction be lifted. This is a very simple change to WS-BA,
but a liberating one. If CLOSE can be faulted, then the full spectrum of BT
participant patterns can be used, as the service writer desires. And it should
be noted: each, relatively autonomous, service drawn into a coordinated business
transaction may choose to use a different participant model, concurrently. It
should also be noted that the BPEL LRT (do-compensate) model simply becomes a
particular (perfectly valid) use of a more general facility.
The
following diagram shows the change required to the WS-BA
protocol.

The explicit
recognition of the effect of time on "options" or "reservations" in BTP is
another useful feature, which we would like to see introduced into any other
protocol (such as WS-BA) that is intended for use in application-level
coordination.
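To illustrate the effect of time on a reservation, here is a minimal sketch of a promise that carries its own expiry. This does not reproduce how BTP actually conveys time limits on the wire; the `TimedReservation` class, its fields and the `prepare` helper are hypothetical, and the clock is passed in explicitly so the behaviour is easy to follow.

```python
from dataclasses import dataclass


@dataclass
class TimedReservation:
    """Hypothetical reservation whose promise lapses after expires_at."""
    expires_at: float
    state: str = "prepared"

    def confirm(self, now: float) -> str:
        if self.state == "prepared" and now > self.expires_at:
            self.state = "cancelled"  # the option has lapsed; resource released
        if self.state != "prepared":
            return "FAULT"
        self.state = "confirmed"
        return "CONFIRMED"


def prepare(now: float, hold_seconds: float) -> TimedReservation:
    # The participant states up front how long it will honour the promise.
    return TimedReservation(expires_at=now + hold_seconds)
```

A coordinator that knows the expiry can decide in time, or expect a fault if it does not.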
At some point in the near future we intend to publish our currently private feedback to the authors of WS-Coordination, WS-AtomicTransaction and WS-BusinessActivity, and we will post the link to this group, inter alia.
I hope that the group has a productive meeting in Hawaii.
Alastair