Short and Long-Running Processes in SOA-part1Ahren Stevens-Taylor
Short and Long-Running Processes
As a process moves from activity to activity, it consumes time, and each activity adds to the overall duration. But different sorts of activities have different durations, and it’s not uncommon to observe a ten-step process that outpaces, say, a five-step one. It depends, of course, on what those activities are doing.
In SOA, process cycle times range from one second or less to one or more years! The latter sort need not have a large number of activities. The pyramids might have been built rock-by-rock over several decades, but protracted SOA processes typically span only a few dozen tasks, a handful of which consume almost the entire interval.
As we discuss in this article, most of that time is spent waiting. The disputes process often requires several months to complete, because at various times it sits idle waiting for information from the customer, the merchant, or the back office. Business processes crawl along at human speed, it often makes sense to let SOA manage the end-to-end flow.
It’s not easy to build an SOA process engine that can simultaneously blaze through a sub-second process but keep on top of a one that hasn’t moved in weeks. On the other hand, when a long-running process rouses, we expect the engine to race very quickly to the next milestone. The central argument of this article is that both long-running and short-running processes run in very quick bursts, but whereas a short-running process runs in a single burst, a long-running process might have several bursts, separated by long waits. To support long-running processes, the process engine needs a strategy to keep state.
In this article, we examine the fundamental differences between long-running and short-running processes. We discuss how to model state, and demonstrate how to build a long-running process as a combination of several short-running processes tied together by state. We also show how to compile short-running BPEL processes to improve the execution speed of a burst.
Process Duration—the Long and Short of It
SOA processes have the following types of activities:
- Tasks to extract, manipulate, or transform process data
- Scripts or inline code snippets
- Calls to systems and services, both synchronous and asynchronous
- Events, including timed events, callbacks, and unsolicited notifications from systems
The first three sorts of activities execute quickly, the first two in the order of milliseconds, the third often sub-second but seldom more than a few seconds (in the case of a synchronous call to a slow system). These activities are active: as the process navigates through them, it actively performs work, and in doing so ties up the process engine. Event times are generally much longer and more variable. Events come from other systems, so (with the exception of timed events) the process cannot control how quickly they arrive. The process passively waits for events, in effect going to sleep until they come.
An event can occur at the beginning of a process—indeed, every SOA process starts with an event—or in the middle. An event in the middle is called an intermediate event. The segment of a process between two events is called a burst. In the following figure, events are drawn as circles, activities as boxes, and bursts as bounding boxes that contain activities. Process (a), for example, starts with an event and is followed by two activities—Set Data and Sync Call—which together form a burst. Process (b) starts with an event, continues with a burst (consisting of the activities Set Data and Call System Async), proceeds to an intermediate event (Fast Response), and concludes with a burst containing the activity Sync Call. Process (c) has two intermediate events and three bursts, and (d) has a single intermediate event and two bursts.
Processes are classified by duration as follows:
- Short-running: The process runs comparatively quickly, for not more than a few seconds. Most short-running processes run in single burst (as in process (a) in the figure), but some have intermediate events with fast arrival times—as in (b), where the intermediate event, a response to an asynchronous system call, arrives in about two seconds—and thus run in multiple bursts. TIBCO’s BusinessWorks and the BPEL compiler described later in the article are optimized to run both single-burst and multiple-burst short-running processes. BEA’s Weblogic Integration can run single-burst, short-running processes with limited overhead, but, as discussed further next, treats cases like (b) as long-running.
- Long-running: The process has multiple bursts, and the waiting times of its intermediate events are longer than the process engine itself is expected to run before its next restart! In process (d), for example, the engine is restarted for maintenance while the process waits two days for a human action. The process survives the restart because its state is persisted. At the end of its first burst (that is, after the Assign Work step), the engine writes the state to a database, recording the fact that the process is now waiting on an event for a human action. When the engine comes back up, it fetches the state from the database to remember where it left off. Most BPEL processes are long-running. In Weblogic Integration, stateful processes can run for arbitrarily long durations.
- Mid-running: The process has multiple bursts, but the waiting times of its intermediate events last no more than a few minutes, and do not need to be persisted. Stakeholders accept the risk that if the process engine goes down, in-flight processes are lost. Chordiant’s Foundation Server uses mid-running processes to orchestrate the interaction between agent and customer when the customer dials into a call center. The call is modeled as a conversation, somewhat like a sequence of questions and answers. A burst, in this design, processes the previous answer (for example, the Process Answer activity in (c)) and prepares the next question (Prepare Question). Intermediate events (Get Answer) wait for the customer to answer. State is held in memory