Process Mining. Data science in action

Julia Rudnitckaia

Brno, University of Technology, Faculty of Information Technology,


Abstract. In recent decades, people have had to accumulate more and more data in many areas. Today, many organizations have solved the problem of storage capacity and therefore store “Big data”. However, they often struggle to extract important knowledge from large amounts of information. Under constant competition, entrepreneurs are forced to act quickly to stay afloat. Modern mathematical methods and algorithms help to quickly answer questions that can lead to higher efficiency, productivity, and quality of the services provided. Many tools can process information in a short time, and the trend is that even inexperienced users will be able to apply such software and interpret its results correctly.

One fairly simple yet very powerful approach is Process Mining. It not only allows organizations to fully benefit from the information stored in their systems, but it can also be used to check the conformance of processes, detect bottlenecks, and predict execution problems.

This paper presents only the main insights of Process Mining. It also explains the key analysis techniques in process mining that can be used to automatically learn process models from raw event data. Most of the information here is based on the Massive Open Online Course "Process Mining: Data science in Action".

Keywords: Big Data, Data Science, Process Mining, Operational processes, Workflow, BPM, Process Models

1. Introduction

Data science is the profession of the future, because organizations that are unable to use (big) data in a smart way will not survive. Clive Humby even proposed the metaphor “Data is the new oil” to emphasize how important a role data plays nowadays [7]. It is not sufficient to focus on data storage and data analysis. The data scientist also needs to relate data to process analysis. Process mining bridges the gap between traditional model-based process analysis (e.g., simulation and other business process management techniques) and data-centric analysis techniques such as machine learning and data mining.

Figure 1. Positioning of PM

More and more information about business processes is recorded by information systems in the form of so-called “event logs”, which are the starting point for process mining.

Although event data are omnipresent, organizations lack a good understanding of their actual processes. Management decisions tend to be based on PowerPoint diagrams, local politics, or management dashboards rather than careful analysis of event data. The knowledge hidden in event logs cannot be turned into actionable information. Advances in data mining made it possible to find valuable patterns in large datasets and to support complex decisions based on such data. However, classical data mining problems such as classification, clustering, regression, association rule learning, and sequence/episode mining are not process-centric.

Process mining seeks the confrontation between event data (i.e., observed behavior) and process models (hand-made or discovered automatically). This technology has become available only recently, but it can be applied to any type of operational process (organizations and systems). Example applications include: analyzing treatment processes in hospitals, improving customer service processes in a multinational, understanding the browsing behavior of customers using a booking site, analyzing failures of a baggage handling system, and improving the user interface of an X-ray machine. All of these applications have in common that dynamic behavior needs to be related to process models.

2. Overview of Process Mining

The following overview, based on [1, 2, 3], summarizes the main concerns of PM.

2.1. Process Mining. Basic concept

Process mining (PM) techniques are able to extract knowledge from event logs commonly available in today's information systems. These techniques provide new means to discover, monitor, and improve processes in a variety of application domains. There are two main drivers for the growing interest in process mining:

1) more and more events are being recorded, thus, providing detailed information about the history of processes;

2) there is a need to improve and support business processes in competitive and rapidly changing environments.

Process Mining provides an important bridge between BI and BPM, Data Mining and Workflow. It includes (automated) process discovery (i.e., extracting process models from an event log), conformance checking (i.e., monitoring deviations by comparing model and log), social network/ organizational mining, automated construction of simulation models, model extension, model repair, case prediction, and history-based recommendations.

The table below lists various directions of analysis and the key questions that PM can answer [6].

[Table not recovered from the source text.]

There are three main types of process mining (Fig. 2, 3).

1. The first type of process mining is discovery. A discovery technique takes an event log and produces a process model without using any a-priori information. An example is the Alpha-algorithm that takes an event log and produces a process model (a Petri net) explaining the behavior recorded in the log.

2. The second type of process mining is conformance. Here, an existing process model is compared with an event log of the same process. Conformance checking can be used to check if reality, as recorded in the log, conforms to the model and vice versa.

3. The third type of process mining is enhancement. The main idea is to extend or improve an existing process model using information about the actual process recorded in some event log. Whereas conformance checking measures the alignment between model and reality, this third type of process mining aims at changing or extending the a-priori model. An example is the extension of a process model with performance information, e.g., showing bottlenecks.

Figure 2 shows the PM types in terms of inputs and outputs.

[Figure 2. PM types in terms of inputs and outputs]

Orthogonal to the three types of mining, different perspectives can be defined.

• The control-flow perspective. Focuses on the ordering of activities. The goal of mining is to find a good characterization of all possible paths (expressed as a Petri net or in some other notation such as EPCs, BPMN, or UML).

• The organizational perspective. Focuses on the information about resources hidden in the log. The goal is either to structure the organization by classifying people in terms of roles and organizational units, or to show the social network.

• The case perspective. Focuses on properties of cases.

• The time perspective. Concerned with the timing and frequency of events. It makes it possible to discover bottlenecks, measure service levels, monitor the utilization of resources, and predict the remaining processing time of running cases.
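As an illustration of the time perspective, the following minimal Python sketch averages the waiting time between directly consecutive activities and reports the slowest hand-over as a candidate bottleneck. The event data, the tuple layout, and the function name `mean_waiting_hours` are hypothetical and chosen only for this example.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical event log: (case id, activity, ISO timestamp).
EVENTS = [
    ("c1", "register", "2024-01-01T09:00"),
    ("c1", "check",    "2024-01-01T09:30"),
    ("c1", "decide",   "2024-01-01T12:30"),
    ("c2", "register", "2024-01-01T10:00"),
    ("c2", "check",    "2024-01-01T10:10"),
    ("c2", "decide",   "2024-01-01T15:10"),
]

def mean_waiting_hours(events):
    """Average hours between each pair of directly consecutive activities."""
    by_case = defaultdict(list)
    for case, activity, ts in events:
        by_case[case].append((datetime.fromisoformat(ts), activity))

    gaps = defaultdict(list)
    for rows in by_case.values():
        rows.sort()  # order the events of one case by timestamp
        for (t1, a1), (t2, a2) in zip(rows, rows[1:]):
            gaps[(a1, a2)].append((t2 - t1).total_seconds() / 3600)
    return {pair: sum(v) / len(v) for pair, v in gaps.items()}

waits = mean_waiting_hours(EVENTS)
# The hand-over with the largest average waiting time is a candidate bottleneck.
bottleneck = max(waits, key=waits.get)
```

In this toy log the step from "check" to "decide" dominates the waiting time; real analyses would also consider frequencies and resource utilization.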

PM focuses on the relationship between business process models and event data. Inspired by the terminology used by David Harel in the context of Live Sequence Charts [8], there are three types of such relations, which determine the types of analysis.

[Figure 3. Relations between process models and event data: Play-Out, Play-In, and Replay]

1. Play-Out. The input is a finished process model. Different scenarios of the process can then be simulated according to the model, filling an event log with the events recorded during the simulation.

[Figure 4. Example of Play-Out]

The figure above shows an example of a finished model used to simulate a working process (workflow). The process model is expressed in BPMN. Red dots mark the steps along one of the possible paths through the process, and the log at the bottom is filled with event data in the order in which the events are registered during the process.

Play-Out is used to validate developed process models by checking whether the expected data (sequences of events) agree with reality.
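Play-Out can be sketched in a few lines of Python. The sketch below assumes a deliberately simplified model, a transition system in which each activity lists its allowed successors, rather than a BPMN model or Petri net; the model contents and the name `play_out` are hypothetical.

```python
import random

# Hypothetical model: each activity maps to the activities that may follow it.
# An empty list marks the end of the process.
MODEL = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}

def play_out(model, start, rng):
    """Generate one trace by walking the model from `start` until it ends."""
    trace = [start]
    while model[trace[-1]]:
        trace.append(rng.choice(model[trace[-1]]))
    return trace

rng = random.Random(0)  # fixed seed so the simulation is repeatable
simulated_log = [play_out(MODEL, "a", rng) for _ in range(5)]
```

Each simulated trace is one possible run of the model, so the resulting log contains only behavior that the model allows.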

2. Play-In. The starting point is existing data in an event log. From it, a process model is derived that can reproduce the behavior recorded in the event log (that is, the process model is learned from the data).

Figure 5. Example of Play-In

All sequences of events in the figure above start with step a and end with step g or h.

The resulting process model matches these observed characteristics exactly, which illustrates the basic principle of deriving a model from data.

Play-In is useful for the formal description of processes that generate known data.
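The idea of Play-In can be illustrated with a minimal sketch that learns a directly-follows summary from traces: which activity pairs occur in sequence, and which activities start and end a case. The traces below are hypothetical (they are not the ones in Figure 5), and a directly-follows summary is a much simpler model than the Petri nets produced by real discovery algorithms.

```python
from collections import Counter

# Hypothetical traces; each list is the ordered activities of one case.
TRACES = [
    ["a", "b", "d", "e", "g"],
    ["a", "d", "b", "e", "h"],
    ["a", "c", "d", "e", "g"],
]

def play_in(traces):
    """Learn directly-follows counts plus start and end activities."""
    follows = Counter()
    for trace in traces:
        follows.update(zip(trace, trace[1:]))
    starts = {t[0] for t in traces}
    ends = {t[-1] for t in traces}
    return follows, starts, ends

follows, starts, ends = play_in(TRACES)
```

Here every case starts with a and ends with g or h, and the pair (d, e) is observed twice, which is the kind of regularity a discovery technique turns into model structure.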

3. Replay. Figure 6 shows an attempt to replay an existing sequence of events on a finished process model. The attempt fails because the model requires step d to happen before step e (understanding the underlying cause of the failure requires studying BPMN gateways).

Figure 6. Example of Replay

Replay makes it possible to find deviations between models and real processes, and it can also be used to analyze process performance.
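A failing replay of this kind can be sketched as follows. The sketch assumes a simplified model given as precedence requirements (each activity lists the activities that must already have occurred), which is an assumption made for illustration and not the BPMN semantics of Figure 6; the table `REQUIRES` and the function `replay` are hypothetical.

```python
# Hypothetical model: activity -> activities that must have occurred before it.
REQUIRES = {
    "a": set(),
    "b": {"a"},
    "d": {"a"},
    "e": {"b", "d"},
    "g": {"e"},
}

def replay(trace, requires):
    """Replay a trace and return (position, activity, missing predecessors)."""
    seen = set()
    deviations = []
    for position, activity in enumerate(trace):
        # Activities unknown to the model are treated as unconstrained here.
        missing = requires.get(activity, set()) - seen
        if missing:
            deviations.append((position, activity, sorted(missing)))
        seen.add(activity)
    return deviations

fitting = replay(["a", "b", "d", "e", "g"], REQUIRES)
deviating = replay(["a", "b", "e", "d", "g"], REQUIRES)
```

The second trace executes e before d, so the replay reports a deviation at that position, while the first trace replays without problems.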

In short, Play-Out generates behavior from an existing model; it is used with Petri nets, workflow engines, simulation engines, and management games. Play-In creates a process model from a given event log; examples are the α-algorithm and most data mining techniques.

Finally, Replay takes both an event log and a process model as input and is needed for conformance checking, extending the model with frequencies and temporal information, constructing predictive models, and operational support. The positioning of these three relations is shown in Figure 3.

2.3. Event Log

An event log is a collection of events, where each event refers to a case, an activity, and a point in time (timestamp).

Sources of event data can be found everywhere: for instance, a database system, a transaction log (e.g. of a trading system), a business suite/ERP system (SAP, Oracle…), a message log (e.g. from IBM middleware), an open API providing data from websites or social media, or a CSV (comma-separated values) file or spreadsheet.

When extracting an event log, you may face the following challenges:

1) Correlation. Events in an event log are grouped per case. This simple requirement can be quite challenging as it requires event correlation, i.e., events need to be related to each other.

2) Timestamps. Events need to be ordered per case. Typical problems: only dates, different clocks, delayed logging.

3) Snapshots. Cases may have a lifetime extending beyond the recorded period, e.g. a case was started before the beginning of the event log.

4) Scoping. How to decide which tables to incorporate?

5) Granularity. The events in the event log are at a different level of granularity than the activities relevant for end users.
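The correlation and timestamp challenges above can be illustrated with a minimal Python sketch that groups raw records per case and orders them in time. The records, the field names, and the function `build_traces` are hypothetical; the example also deliberately uses date-only timestamps, so events on the same day keep their input order and their true order is unknown.

```python
from collections import defaultdict

# Hypothetical raw records, e.g. rows exported from an order table.
RAW = [
    {"order": "o2", "activity": "pay",    "time": "2024-03-02"},
    {"order": "o1", "activity": "create", "time": "2024-03-01"},
    {"order": "o1", "activity": "ship",   "time": "2024-03-03"},
    {"order": "o2", "activity": "create", "time": "2024-03-01"},
    {"order": "o1", "activity": "pay",    "time": "2024-03-02"},
]

def build_traces(rows, case_key, time_key="time"):
    """Correlate rows by case identifier and order each case by timestamp."""
    cases = defaultdict(list)
    for row in rows:
        cases[row[case_key]].append(row)
    # ISO-formatted dates sort correctly as plain strings; sorted() is stable.
    return {
        case: [r["activity"] for r in sorted(events, key=lambda r: r[time_key])]
        for case, events in cases.items()
    }

traces = build_traces(RAW, case_key="order")
```

Choosing `case_key` is the correlation decision: the same rows grouped by customer instead of order would describe a different process.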

Additionally, event logs that have not been preprocessed suffer from so-called noise and incompleteness. Noise means that the event log contains rare and infrequent behavior that is not representative of the typical behavior of the process. Incompleteness means that the event log contains too few events to discover some of the underlying control-flow structures. There are many methods to “clean” the data and keep only the useful part, such as filtering and data mining techniques.
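One simple filtering approach is to drop trace variants that occur too rarely. The following sketch assumes that noise can be approximated by variant frequency, which is only one of several possible heuristics; the log contents, the threshold, and the name `filter_rare_variants` are hypothetical.

```python
from collections import Counter

# Hypothetical log: each tuple is the activity sequence (variant) of one case.
LOG = [("a", "b", "c")] * 8 + [("a", "c", "b")] * 6 + [("a", "c")] * 1

def filter_rare_variants(traces, min_share):
    """Keep only traces whose variant covers at least `min_share` of the log."""
    counts = Counter(traces)
    total = len(traces)
    return [t for t in traces if counts[t] / total >= min_share]

# Variants covering less than 10% of all cases are treated as noise.
filtered = filter_rare_variants(LOG, 0.1)
```

The threshold is a judgment call: too low keeps noise, too high removes rare but legitimate behavior.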

Every event log must contain certain fields, without which PM is impossible.

Figure 7 shows the basic attributes of events in a log:

• Case ID: the instance (object) by which the events in the log are grouped into sequences.

• Activity name: the action performed in the event.

• Timestamp: the date and time at which the event was recorded.

• Resource: the key actor of the event (the one who performs the action).

Figure 7. Example of Event log (the selection of the attributes depends on the purpose of analysis)

Using Figure 7 we can list some assumptions about event logs:

• A process consists of cases

• A case consists of events such that each event relates to precisely one case.

• Events within a case are ordered.

• Events can have attributes

• Examples of typical attribute names are activity, time, costs, and resource.

Finally, it is worth mentioning some extensions of event logs:

• Transactional information on activity instances: an event can represent the start, completion, suspension, resumption, or abort of an activity.

• Case versus event attributes: case attributes do not change, e.g. the birth date or gender, whereas event attributes are related to a particular step in the process.
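Transactional information makes it possible to measure how long an activity actually takes. The sketch below pairs start and complete events; it assumes that events arrive in time order and that each activity occurs at most once per case. The data and the function name `service_minutes` are hypothetical.

```python
from datetime import datetime

# Hypothetical events: (case id, activity, lifecycle transition, timestamp).
EVENTS = [
    ("c1", "examine", "start",    "2024-05-01T09:00"),
    ("c1", "examine", "complete", "2024-05-01T09:45"),
    ("c1", "decide",  "start",    "2024-05-01T10:00"),
    ("c1", "decide",  "complete", "2024-05-01T10:20"),
]

def service_minutes(events):
    """Duration in minutes between the start and complete event of an activity."""
    started = {}
    durations = {}
    for case, activity, transition, ts in events:
        key = (case, activity)
        moment = datetime.fromisoformat(ts)
        if transition == "start":
            started[key] = moment
        elif transition == "complete" and key in started:
            durations[key] = (moment - started.pop(key)).total_seconds() / 60
    return durations

durations = service_minutes(EVENTS)
```

Without the start events, only the time between completions would be visible, and service time could not be separated from waiting time.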

All process mining techniques assume that it is possible to sequentially record events such that each event refers to an activity (i.e., a well-defined step in some process) and is related to a particular case (i.e., a process instance). Event logs may store additional information about events. In fact, whenever possible, process mining techniques use extra information such as the resource (i.e., person or device) executing or initiating the activity, the timestamp of the event, or data elements recorded with the event (e.g., the size of an order).

2.4. Process Discovery

This type of PM helps to find out what the actual process model is. Based solely on an event log, a process model is constructed that captures the behavior seen in the log.

The most popular algorithms used for this goal are:

• Alpha Miner;

• Alpha+, Alpha++, Alpha#;

• Fuzzy miner;

• Heuristic miner;

• Multi-phase miner;

• Genetic process mining

• Region-based process mining (State-based regions and Language based regions);

• Classical approaches not dealing with concurrency (Inductive inference (Mark Gold, Dana Angluin et al.) and Sequence mining).
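To give a flavor of how such algorithms work, the sketch below computes the ordering relations (the "footprint") from which the Alpha Miner starts: causality (->), parallelism (||), and no relation (#), all derived from the directly-follows pairs in the log. It covers only this first step and does not construct a Petri net; the small log and the function name `footprint` are chosen for illustration.

```python
from itertools import product

# Small illustrative log; each tuple is the activity sequence of one case.
LOG = [("a", "b", "c", "d"), ("a", "c", "b", "d"), ("a", "e", "d")]

def footprint(traces):
    """Classify every ordered activity pair by its ordering relation."""
    activities = sorted({a for t in traces for a in t})
    # x directly precedes y somewhere in the log
    df = {(x, y) for t in traces for x, y in zip(t, t[1:])}
    table = {}
    for x, y in product(activities, repeat=2):
        if (x, y) in df and (y, x) in df:
            table[(x, y)] = "||"   # both orders observed: parallel
        elif (x, y) in df:
            table[(x, y)] = "->"   # only x before y: causality
        elif (y, x) in df:
            table[(x, y)] = "<-"   # only y before x: reverse causality
        else:
            table[(x, y)] = "#"    # never adjacent: no relation
    return table

fp = footprint(LOG)
```

In this log b and c appear in both orders and are therefore classified as parallel, whereas b and e never follow each other directly, which suggests a choice between them.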

A good notation is also needed to present the resulting process model to the end user. Commonly used notations are Workflow Nets, Petri Nets, Transition Systems, YAWL, BPMN, UML, Causal nets (C-nets), and Event-driven Process Chains (EPCs).
