The work flow repair system.

Case study

[Figure: repair plan for the trip planning case study (labels: trip database, short security advice, security advice, get dates, transiently faulty)]

This plan describes the sequence of repair actions that must be applied to the faulty work flow.
Time points 1 to 9 represent the original work flow execution; at time point 9 a fault was detected and diagnosed.

The repair begins at time point 10. Activity get_dates can simply be redone, because it does not read any object and the diagnoser did not blame it as a permanently faulty activity.
The repair plan shows that at time points 11 and 12 there is no need to redo the activities get_destination and short_sec_advice: they were executed correctly and do not depend, through any data dependency, on a faulty activity or on data produced by a faulty or faultily executed activity. Therefore, objects such as destination and security_advice can simply be reused.
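The reuse decisions above follow from propagating "infection" forward along the data dependencies of the original execution. A minimal Python sketch of that propagation is shown here; the `Activity` record, the `infected` function and the read/write sets in `TRACE` are hypothetical simplifications of the trip planning work flow (the exact objects each activity reads and writes are assumed), not the reasoning module's own representation.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Activity:
    """Hypothetical record of one executed activity and its data accesses."""
    name: str
    reads: set[str] = field(default_factory=set)
    writes: set[str] = field(default_factory=set)


def infected(trace: list[Activity], faulty: set[str]) -> set[str]:
    """Return the names of activities whose results cannot be reused.

    An activity is infected if the diagnoser blamed it, or if it read an
    object whose current state was produced by an infected activity.
    """
    bad_objects: set[str] = set()
    bad_activities: set[str] = set()
    for act in trace:  # original execution order
        if act.name in faulty or act.reads & bad_objects:
            bad_activities.add(act.name)
            bad_objects |= act.writes
        else:
            # A correct activity gives the objects it writes correct states.
            bad_objects -= act.writes
    return bad_activities


# Assumed data accesses of the original (long-trip) execution.
TRACE = [
    Activity("get_dates", writes={"duration"}),
    Activity("get_destination", writes={"destination"}),
    Activity("short_sec_advice", reads={"destination"}, writes={"security_advice"}),
    Activity("book_hotel", reads={"destination", "duration"}, writes={"hotel"}),
    Activity("detailed_sec_advice", reads={"destination", "security_advice"},
             writes={"security_advice"}),
    Activity("store_trip_data", reads={"duration", "trip_database"},
             writes={"trip_database"}),
]

print(sorted(infected(TRACE, {"get_dates"})))
# ['book_hotel', 'get_dates', 'store_trip_data']
```

With get_dates as the only faulty activity, get_destination, short_sec_advice and detailed_sec_advice stay clean under these assumed dependencies, which agrees with the plan's treatment of time points 11, 12 and 22.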

The XOR choosing block used the state of the object duration, which was produced by a faulty activity. This means that the XOR split must now be redone. This is possible because the new, correct state of the object duration was obtained at time point 10.

Redoing the XOR choosing block at time point 13 results in branching within the repair plan. Time points 14 to 18 correspond to the case where the duration of the trip is less than 1 day, and time points 20 to 26 correspond to the case where it is greater than 1 day (according to the work flow definition, the hotel must then be booked and new detailed security advice obtained).
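Two checks are involved at time point 13: whether the split has to be redone at all (its condition read an object with an infected state) and which branch the corrected duration selects. The Python sketch below illustrates both; the function names, the `BRANCHES` table and the encoding of the one-day rule are assumptions drawn from the description above, not the system's code.

```python
SHORT, LONG = "short_trip", "long_trip"

# Hypothetical branch contents, following the work flow definition in the text.
BRANCHES = {SHORT: [], LONG: ["book_hotel", "detailed_sec_advice"]}


def split_needs_redo(condition_reads: set[str], bad_objects: set[str]) -> bool:
    """A split must be re-evaluated if its condition read an infected object."""
    return bool(condition_reads & bad_objects)


def select_branch(duration_days: float) -> str:
    """Trips of at most one day need neither a hotel nor detailed advice."""
    return LONG if duration_days > 1 else SHORT


print(split_needs_redo({"duration"}, {"duration"}))  # True
print(BRANCHES[select_branch(3)])  # ['book_hotel', 'detailed_sec_advice']
```

Treating exactly one day as a short trip follows the wording "not longer than a day" used for the short-trip case.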

If the trip takes no longer than a day, no hotel is needed and no detailed security advice has to be obtained. However, both were already done during the original work flow execution. That is why these actions have to be compensated at time points 14, 15 and 17: the hotel reservation is canceled, the security advice information is deleted and the trip data is restored. The trip data must be restored at time point 17 so that the store_trip_data activity can be redone: this activity reads and modifies trip_database, which was incorrectly changed during the original work flow execution at time point 8.
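The compensation step for the short-trip branch can be sketched as a set difference: every activity that ran in the original execution but is not required by the newly selected branch gets its compensating action. In this Python sketch, `COMPENSATION` and `compensate_abandoned` are hypothetical names, and keeping the original execution order is an assumption rather than a documented rule of the planner.

```python
# Hypothetical compensating actions for activities of the long-trip branch.
COMPENSATION = {
    "book_hotel": "cancel_hotel_reservation",
    "detailed_sec_advice": "delete_detailed_sec_advice",
}


def compensate_abandoned(executed: list[str], required: list[str]) -> list[str]:
    """Return compensating actions for activities that were executed in the
    original run but are not required by the newly selected branch.

    The result keeps the original execution order (an assumption; a real
    planner may order compensations differently).
    """
    return [COMPENSATION[act] for act in executed if act not in required]


# Short trip: the original run booked a hotel and fetched detailed advice,
# but the re-evaluated split requires neither.
print(compensate_abandoned(["book_hotel", "detailed_sec_advice"], []))
# ['cancel_hotel_reservation', 'delete_detailed_sec_advice']
```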

If the trip takes longer than a day, the hotel must be booked and detailed security advice must be obtained. Both actions were already performed during the original work flow execution, but not all of them can be reused: the hotel was booked using incorrect information about the duration of the trip. That is why this booking has to be canceled at time point 20 and a new booking made at time point 21.
For obtaining the detailed security advice, however, only destination and the short security advice were used. According to the data dependencies within the work flow, neither of them was affected by the fault. That is why there is no need to compensate detailed_sec_advice at time point 22.
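For the branch that is kept, each required activity falls into one of three cases: it never ran and must be done, it ran on faulty data and must be compensated and redone, or it ran correctly and can be reused. The Python sketch below encodes these cases; `plan_branch` and its step labels are hypothetical, and the `tainted` argument stands for whatever set of activities the dependency analysis found to depend on faulty data.

```python
def plan_branch(required: list[str], executed: set[str],
                tainted: set[str]) -> list[tuple[str, str]]:
    """Decide, for each activity the selected branch requires, whether it is
    done for the first time, compensated and redone, or simply reused."""
    steps = []
    for act in required:
        if act not in executed:
            steps.append(("do", act))          # never ran in the original execution
        elif act in tainted:
            steps.append(("compensate", act))  # undo the effect based on bad data
            steps.append(("redo", act))        # then run again with correct inputs
        else:
            steps.append(("reuse", act))       # correct result, nothing to repair
    return steps


# Long trip: book_hotel read the faulty duration, detailed_sec_advice did not.
for step in plan_branch(["book_hotel", "detailed_sec_advice"],
                        executed={"book_hotel", "detailed_sec_advice"},
                        tainted={"book_hotel"}):
    print(step)
# ('compensate', 'book_hotel')
# ('redo', 'book_hotel')
# ('reuse', 'detailed_sec_advice')
```

The three resulting steps line up with time points 20, 21 and 22 of the long-trip branch described above.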

The trip data must be restored for the same reason as at time point 17.
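One plausible reading of why the restore is needed: store_trip_data both reads and updates trip_database, so redoing it on the incorrectly changed database would build on the wrong state instead of replacing it. The Python sketch below shows restore followed by redo; the in-memory `store`, the `snapshot` of the old state and the toy `store_trip_data` are hypothetical and do not reflect how the actual system keeps object states.

```python
import copy


def restore_and_redo(store: dict, obj: str, snapshot, activity) -> None:
    """Restore `obj` to its state before the faulty run, then redo `activity`.

    Needed when the activity both reads and updates `obj`: running it again
    on the already modified state would stack a second update on the wrong one.
    """
    store[obj] = copy.deepcopy(snapshot)  # compensation: bring back the old state
    activity(store)                       # redo with the corrected inputs


def store_trip_data(store: dict) -> None:
    # Hypothetical activity: appends the current trip to the trip database.
    store["trip_database"].append({"duration": store["duration"]})


snapshot = []  # trip_database before the original execution
# Corrected duration, but the database still holds the entry from the faulty run.
store = {"duration": 1, "trip_database": [{"duration": 3}]}
restore_and_redo(store, "trip_database", snapshot, store_trip_data)
print(store["trip_database"])  # [{'duration': 1}]
```

Without the restore, the same redo would leave both the faulty entry and the corrected one in the database.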

Executing this repair plan therefore avoids several redundant repair actions (at time points 11, 12, 22 and 23) and guarantees that at the end of the work flow execution (in both branches of the XOR split) all objects have correct states.

Examples

The folder with the "trip planning" example and randomly generated work flows can be accessed from here. For each work flow, the folder contains:

the work flow model with diagnosis and initial state (partial execution);

the graphical representation of this work flow;

the repair plan generated using the reasoning module (see Source code);

the graphical representation of this repair plan;

the result of the simulated execution of this work flow, i.e. the state of each object at each time point;

the BPEL source code generated for this work flow;

the result of the simulated execution of the repair plan.

A partially completed work flow with faults

Corresponding repair plan
