About
Source code
Case study
Examples
Evaluation

About

Our approach aims at successfully completing faulty work flow instances using knowledge-based planning methods. Applying such methods has the advantage of providing a correct and complete search procedure. In particular, such a planning system finds a successful completion of a work flow if one exists, and it enables a complete search for an optimal completion. We show how such a repair system can be constructed by exploiting a planning system and a cautious extension of a given work flow specification.
On this web page you can find the information related to the paper "Model-based repair of web service processes", submitted to the ECAI 2008 conference. Here you can access the complete formalization of our approach, the specification of the example used in the paper, and more than 1000 randomly generated test examples that demonstrate the correctness of our approach.
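To make the completeness claim concrete, here is a minimal Python sketch. It is not our repair reasoner, which is written in DLV; it swaps in plain breadth-first search over a hypothetical STRIPS-style action model (the `Action` class, the `plan` function and all facts and action names in the example are assumptions for illustration). Over a finite state space, breadth-first search returns a plan whenever one exists, returns one with the fewest actions, and answers None only when no completion exists.

```python
from collections import deque
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Action:
    """Hypothetical STRIPS-style action (not the DLV encoding of the reasoner)."""
    name: str
    pre: frozenset     # facts that must hold before the action
    add: frozenset     # facts made true by the action
    delete: frozenset  # facts made false by the action


def plan(init: frozenset, goal: frozenset, actions: List[Action]) -> Optional[List[str]]:
    """Breadth-first search: a shortest action sequence reaching `goal`, or None."""
    frontier = deque([(init, [])])
    seen = {init}
    while frontier:
        state, steps = frontier.popleft()
        if goal <= state:  # every goal fact holds
            return steps
        for a in actions:
            if a.pre <= state:
                nxt = (state - a.delete) | a.add
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, steps + [a.name]))
    return None  # search space exhausted: no completion exists


F = frozenset
# Hypothetical repair actions loosely inspired by the trip planning case study.
repair_actions = [
    Action("redo_get_dates", F(), F({"duration_ok"}), F({"duration_bad"})),
    Action("cancel_hotel", F({"hotel_bad"}), F({"no_hotel"}), F({"hotel_bad"})),
    Action("book_hotel", F({"duration_ok", "no_hotel"}), F({"hotel_ok"}), F({"no_hotel"})),
]
init = F({"duration_bad", "hotel_bad"})
goal = F({"duration_ok", "hotel_ok"})
print(plan(init, goal, repair_actions))      # ['redo_get_dates', 'cancel_hotel', 'book_hotel']
print(plan(init, goal, repair_actions[:2]))  # None
```

Without the `book_hotel` action the goal is unreachable, and the search terminates with None: a complete procedure must be able to report that no completion exists. "Optimal" here simply means fewest actions; the optimality criterion of the actual reasoner may differ.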

Source code

The source code of the repair reasoner module (the complete formalization of our approach) in DLV format can be accessed here; an HTML version can be found here.

Case study

As the case study, we chose the trip planning work flow.
The input to our example WF is a trip database, provided by the activity STARTFLOW, which also initializes all other WF objects. An employee inputs the trip data, and a security agent immediately performs the activity short security advice, which returns standard security advice. Since the trip takes longer than a day, a hotel is booked and detailed security advice is generated. Then the trip database is updated. Let us assume that an exception is thrown after the trip data was stored (before ENDFLOW is executed) and that a diagnosis indicates that the activity get dates is transiently faulty. The repair plan generated by our repair system for this case can be represented as follows:

This plan specifies the sequence of repair actions that must be applied to the faulty work flow.
Time points 1 to 9 represent the original work flow execution; at time point 9 the fault was detected and diagnosed.

The repair begins at time point 10. The activity get_dates can simply be redone, because it does not read any object and the diagnoser did not blame it as a permanently faulty activity.
The repair plan shows that at time points 11 and 12 there is no need to redo the activities get_destination and short_sec_advice: they were executed correctly and do not depend, through any data dependency, on a faulty activity or on data produced by a faulty or faultily executed activity. Hence objects such as destination and security_advice can simply be reused.

The XOR choosing block used the state of the object duration, which was produced by a faulty activity. This means that the XOR split must now be redone. This is possible because a new, correct state of the object duration was obtained at time point 10.
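The reuse and redo decisions above follow the data dependencies between activities. The Python sketch below approximates them with a simple forward propagation over the executed trace. It is a simplified, hypothetical model reconstructed from the case-study text, not the actual work flow specification: the read/write sets, the names xor_duration and book_hotel, and the objects hotel and detailed_advice are our assumptions, and the sketch ignores the distinction between transient and permanent faults that the real reasoner makes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Activity:
    name: str
    reads: frozenset = frozenset()   # objects the activity reads
    writes: frozenset = frozenset()  # objects the activity writes


def activities_to_redo(trace, faulty):
    """Names of executed activities whose results cannot be reused.

    An activity is infected if the diagnosis blames it or if it read an
    object whose current state was produced by an infected activity.
    """
    tainted = set()  # objects whose current state is incorrect
    infected = []
    for act in trace:  # `trace` lists activities in execution order
        if act.name in faulty or act.reads & tainted:
            infected.append(act.name)
            tainted |= act.writes
        else:
            # a correct activity with correct inputs overwrites with a correct state
            tainted -= act.writes
    return infected


F = frozenset
# Hypothetical, simplified model of the executed part of the trip planning work flow.
trip_trace = [
    Activity("get_dates", writes=F({"duration"})),
    Activity("get_destination", writes=F({"destination"})),
    Activity("short_sec_advice", reads=F({"destination"}), writes=F({"security_advice"})),
    Activity("xor_duration", reads=F({"duration"})),
    Activity("book_hotel", reads=F({"duration", "destination"}), writes=F({"hotel"})),
    Activity("detailed_sec_advice", reads=F({"destination", "security_advice"}),
             writes=F({"detailed_advice"})),
    Activity("store_trip_data", reads=F({"trip_database", "duration"}),
             writes=F({"trip_database"})),
]
print(activities_to_redo(trip_trace, faulty={"get_dates"}))
# ['get_dates', 'xor_duration', 'book_hotel', 'store_trip_data']
```

Under these assumptions, get_destination, short_sec_advice and detailed_sec_advice come out as reusable, while the XOR split, the hotel booking and the trip data update must be reconsidered, which agrees with the repair plan described here.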

Redoing the XOR choosing block at time point 13 causes the repair plan to branch. Time points 14 to 18 correspond to the case in which the trip duration is less than 1 day, and time points 20 to 26 correspond to the case in which the duration of the trip is greater than 1 day (in this case, according to the work flow definition, the hotel must be booked and new detailed security advice obtained).


If the trip takes no longer than a day, no hotel is needed and no detailed security advice has to be obtained. During the original work flow execution, however, both were already done. That's why at time points 14, 15 and 17 these actions have to be compensated: the hotel reservation is canceled, the information about the security advice is deleted, and the trip data is restored. The trip data must be restored at time point 17 so that the store_trip_data activity can be redone: this activity reads and modifies the trip_database, which was incorrectly changed during the original work flow execution at time point 8.

If the trip takes longer than a day, the hotel must be booked and detailed security advice must be obtained. Both actions were already performed during the original work flow execution, but not all of their results can be reused: the hotel was booked using incorrect information about the duration of the trip. That's why this booking has to be canceled at time point 20 and a new booking performed at time point 21.
The detailed security advice, in contrast, was obtained using only destination and the short security advice, neither of which was infected according to the data dependencies within the work flow. That's why there is no need to compensate detailed_sec_advice at time point 22.

The trip data must be restored for the same reason as at time point 17.
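The per-branch decisions at time points 14 to 26 can be summarized by a small rule set. The Python sketch below is a hypothetical reconstruction that reproduces the actions described for this case study; it is not the reasoner's actual encoding. The function name, the action labels, the branch contents, and the name book_hotel for the hotel booking activity are assumptions. Compensations are emitted in original execution order only because that matches this example, not because that order is correct in general.

```python
def branch_repair_actions(executed_after_split, new_path, infected):
    """Hypothetical rule set for one branch of a redone XOR split.

    executed_after_split: activities originally run after the split, in order
    new_path: activities on the branch chosen by the redone split, in order
    infected: activities whose original results cannot be reused
    """
    actions = []
    # Undo work from the original branch that the new branch does not contain.
    for name in executed_after_split:
        if name not in new_path:
            actions.append(("compensate", name))
    # Walk the new branch: reuse clean results, undo and redo infected ones.
    for name in new_path:
        if name not in executed_after_split:
            actions.append(("execute", name))
        elif name in infected:
            actions.append(("compensate", name))
            actions.append(("redo", name))
        else:
            actions.append(("reuse", name))
    return actions


executed = ["book_hotel", "detailed_sec_advice", "store_trip_data"]
infected = {"book_hotel", "store_trip_data"}

# Trip of at most one day: only store_trip_data lies on the new branch.
print(branch_repair_actions(executed, ["store_trip_data"], infected))
# Trip longer than one day: the new branch contains all three activities.
print(branch_repair_actions(executed, executed, infected))
```

For the short branch, this yields compensations of book_hotel, detailed_sec_advice and store_trip_data followed by a redo of store_trip_data; for the long branch, it yields a compensation and redo of book_hotel, a reuse of detailed_sec_advice, and a compensation and redo of store_trip_data.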


As we see, executing this repair plan avoids several redundant repair actions (at time points 11, 12, 22 and 23) and guarantees that at the end of the work flow execution (in both branches of the XOR split) all objects have correct states.

Examples

In order to show the correctness of our approach, we have tested it on hundreds of randomly generated work flows. For each of them, a diagnosis, an initial execution and the states of objects were also generated randomly. The repair system was applied to generate a repair plan, and the execution of the obtained plan was simulated using a work flow simulator, which simulates a primitive business logic behind each activity. The resulting states of objects were compared to those obtained from simulating the normal work flow execution. As the execution logs show, these results are equal, which confirms the correctness of our approach.
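The comparison of simulated executions can be illustrated with a toy harness in Python. The primitive business logic here (each written object records the activity and the inputs that produced it, and a faulty activity writes "ERR"), the reduced four-activity trace, and the names `Activity` and `simulate` are our own assumptions; this is not the work flow simulator used in our experiments. The example also shows why the trip database must be restored before store_trip_data is redone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Activity:
    name: str
    reads: tuple = ()
    writes: tuple = ()


def simulate(activities, state, faulty=frozenset()):
    """Primitive business logic: each written object records who wrote it from what."""
    state = dict(state)  # do not mutate the caller's state
    for act in activities:
        inputs = ",".join(state[obj] for obj in sorted(act.reads))
        for obj in act.writes:
            # a faulty activity stores a corrupted value instead of the expected one
            state[obj] = "ERR" if act.name in faulty else f"{act.name}({inputs})"
    return state


# Hypothetical reduced trace of the trip planning work flow.
trace = [
    Activity("get_dates", writes=("duration",)),
    Activity("get_destination", writes=("destination",)),
    Activity("book_hotel", reads=("duration",), writes=("hotel",)),
    Activity("store_trip_data", reads=("trip_database", "duration"), writes=("trip_database",)),
]
initial = {"trip_database": "db0"}

reference = simulate(trace, initial)                     # normal execution
broken = simulate(trace, initial, faulty={"get_dates"})  # execution with a transient fault

redo = [a for a in trace if a.name != "get_destination"]  # destination can be reused
naive = simulate(redo, broken)                            # redo without compensation
restored = {**broken, "trip_database": initial["trip_database"]}  # compensate store_trip_data
repaired = simulate(redo, restored)

print(naive == reference)     # False
print(repaired == reference)  # True
```

Comparing complete object states, rather than a single result, is what lets such a harness detect the stale trip database in the naive repair.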

The folder with the "trip planning" example and the randomly generated work flows can be accessed here.
Each sub-folder contains:
  • the work flow model with diagnosis and initial state (partial execution);
  • the graphical representation of this work flow;
  • the repair plan generated using the reasoning module (see Source code);
  • the graphical representation of this repair plan;
  • the result of the simulated execution of this work flow: the state of each object at each time point;
  • the BPEL source code generated for this work flow;
  • the result of the simulated execution of the repair plan.
For instance:
[Figures: a partially completed work flow with faults; the corresponding repair plan]

Evaluation

In order to evaluate our approach, we have performed a set of tests on randomly generated work flows.
The tests differ in:
  • the number of activities;
  • the number of objects;
  • the number of choosing blocks;
  • the number of faulty activities within the executed part of the work flow.
We show how the repair planning time depends on each of these four dimensions.
The full table with evaluation results (in Excel format) can be accessed here.
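A test instance along these four dimensions could be generated roughly as follows. This Python sketch is hypothetical: the flat sequence of activities, the limit of two reads and one write per activity, the choice of choosing blocks among activities, and the names `RandomWorkflow`, `generate` and `time_planner` are all our assumptions, not the generator used to produce the evaluation results.

```python
import random
import time
from dataclasses import dataclass


@dataclass
class RandomWorkflow:
    activities: list  # activity names, simplified to a single sequence
    reads: dict       # activity -> set of objects it reads
    writes: dict      # activity -> set of objects it writes
    xor_splits: list  # activities acting as choosing blocks
    executed: list    # prefix of `activities` that has already run
    faulty: set       # diagnosis: faulty activities within `executed`


def generate(n_activities, n_objects, n_xor, n_faulty, seed=0):
    """Hypothetical random instance along the four evaluation dimensions."""
    if n_activities < 1 or n_objects < 1:
        raise ValueError("need at least one activity and one object")
    if not (0 <= n_xor <= n_activities and 0 <= n_faulty <= n_activities):
        raise ValueError("too many choosing blocks or faulty activities")
    rng = random.Random(seed)  # fixed seed, so instances are reproducible
    objects = [f"o{i}" for i in range(n_objects)]
    activities = [f"a{i}" for i in range(n_activities)]
    reads = {a: set(rng.sample(objects, rng.randint(0, min(2, n_objects)))) for a in activities}
    writes = {a: {rng.choice(objects)} for a in activities}
    xor_splits = sorted(rng.sample(activities, n_xor), key=activities.index)
    # the executed prefix must be long enough to contain all faulty activities
    executed = activities[: rng.randint(max(n_faulty, 1), n_activities)]
    faulty = set(rng.sample(executed, n_faulty))
    return RandomWorkflow(activities, reads, writes, xor_splits, executed, faulty)


def time_planner(planner, workflow):
    """Wall-clock seconds that `planner` (any callable) spends on one instance."""
    start = time.perf_counter()
    planner(workflow)
    return time.perf_counter() - start


# Dimension 1: vary the number of activities, keep the other three fixed.
for n in (10, 20, 40):
    wf = generate(n_activities=n, n_objects=10, n_xor=2, n_faulty=1, seed=n)
    print(n, time_planner(lambda w: None, wf))  # placeholder: plug in a real repair planner
```

Varying one parameter while fixing the other three mirrors the structure of the four test sets; a fixed seed per instance keeps each measurement reproducible.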

1. The first set of tests shows how the time required to calculate a repair plan depends on the number of activities in the work flow. The following table presents these results:

The corresponding work flows generated within this test are accessible here.

2. The second set of tests shows how the time required to calculate a repair plan depends on the number of objects in the work flow. The following table presents these results:

The corresponding work flows generated within this test are accessible here.

3. The third set of tests shows how the time required to calculate a repair plan depends on the number of choosing blocks (XOR splits) in the work flow. The following table presents these results:

The corresponding work flows generated within this test are accessible here.

4. The fourth set of tests shows how the time required to calculate a repair plan depends on the number of infected activities within the executed part of the work flow. The following table presents these results:

The corresponding work flows generated within this test are accessible here.


The results show that our approach is feasible for the medium-sized orchestrated web services used in practice.