A Domain Specific Language for Digital Libraries’ Interoperability
João Edmundo, José Borbinha
INESC-ID, Rua Alves Redol 9, Apartado 13069,
1000-029 Lisboa, Portugal
IST – Department of Information Science and Engineering, Instituto Superior Técnico,
Lisbon Technical University, Portugal
Abstract. Humans, in order to create, share and improve knowledge on business processes, need a common, readable and preferably visual notation. In addition, since the Internet is increasingly a platform for global application sharing, the requirements for Web applications concerning workflow execution, interaction, aesthetics and Web service integration are steadily increasing. In this paper we propose a Domain Specific Language, implemented as an extension of the standard BPMN 2.0 language specific to the domain of digital libraries’ interoperability, and use it to define and execute data harvesting processes. To support it, we designed an architecture and implemented it in a real computational environment based on the jBPM and GWT frameworks.
This approach allowed us to provide a web-based, natural and flexible way not only to create and manage domain specific processes, but also to monitor each process execution, allowing process managers to understand the process history, its current state and its possible future execution. In our opinion, creating languages for specific domains and placing the activity of interest in its visual context give users a more comprehensive understanding of their processes.
Keywords: Process Orchestration; BPMN 2.0; Specialization; Workflow Infrastructure; Domain Specific Language
1 Introduction
Libraries, archives, museums and other cultural heritage organizations need to share their resource description metadata in international initiatives such as Europeana¹, TEL² and EuDML³. In these scenarios, a system that does not natively support the commonly required OAI-PMH⁴ protocol faces an important constraint.
Commercial and open-source solutions to this problem exist, but the former imply investments that are not always possible, and using open-source software might require technical expertise for local customization that is often not found in the staff of those organizations. Also, new emerging scenarios for transferring not only data sets but also the contents referenced by these data sets (for example, harvesting the full text of the documents described in the data sets) require support for more sophisticated harvesting and aggregation processes.
¹ Europeana - http://dev.europeana.eu/
² TEL - The European Library - http://www.theeuropeanlibrary.org
³ EuDML - European Digital Mathematics Library - http://www.eudml.eu/
⁴ OAI-PMH - Protocol for Metadata Harvesting - http://www.openarchives.org/pmh/
REPOX is an open-source framework that addresses this problem. It was designed and developed to be convenient for both data providers and service providers, requiring little technical knowledge and effort and thus supporting a fast start (installation and configuration). It focuses on common metadata interoperability scenarios, offering not only metadata publication and harvesting but also support for metadata transformation, which makes REPOX a convenient tool for service providers as well. However, the initial releases of REPOX only provide metadata harvesting services that are used within built-in processes. These processes cannot be shared between REPOX installations, nor can they be edited or extended (to harvest objects’ contents, for example) without programming knowledge. A new system design is therefore needed, one that not only explores new frameworks for developing web applications and interfaces, but also finds new ways to define and orchestrate these harvesting processes.
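To make the harvesting step concrete, the following Python sketch shows the shape of an OAI-PMH ListRecords exchange, the kind of request a metadata harvester issues against a data provider. It relies only on the verb and argument names defined by OAI-PMH 2.0 (verb, metadataPrefix, resumptionToken) and the standard OAI-PMH XML namespace; the endpoint URL and the helper names `list_records_url` and `parse_page` are hypothetical, and this is not REPOX's implementation.

```python
# Minimal OAI-PMH ListRecords sketch (illustrative only, not REPOX code).
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL.

    In OAI-PMH 2.0, resumptionToken is an exclusive argument, so it replaces
    metadataPrefix on follow-up requests.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"

def parse_page(xml_text):
    """Return (record identifiers, next resumption token or None) for one response page."""
    root = ET.fromstring(xml_text)
    ids = [header.findtext("oai:identifier", namespaces=OAI_NS)
           for header in root.iterfind(".//oai:record/oai:header", namespaces=OAI_NS)]
    token = root.findtext(".//oai:resumptionToken", namespaces=OAI_NS)
    # An absent or empty resumptionToken element marks the last page.
    return ids, (token or None)
```

A harvester would loop, requesting the next page with the returned token until `parse_page` yields `None`; the network call itself is left out to keep the sketch self-contained.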
Overall, this research seeks to develop a solution to orchestrate business processes using a Domain Specific Language (DSL) based on BPMN 2.0, in scenarios of data aggregation and systems interoperability in digital libraries.
2 Related Work
Many people consider Business Process Management (BPM) to be the “next step” after the workflow wave of the nineties. BPM is a management approach focused on aligning all aspects of an organization with the wants and needs of clients. It promotes business effectiveness and efficiency while striving for innovation, flexibility and integration with technology. As a means to visually represent business processes, the Business Process Modeling Notation (BPMN) was created; it is based on a flowcharting technique very similar to UML activity diagrams, tailored for creating graphical models of business process operations. BPMN thus provides a standardized bridge over the gap between process design and process implementation. The current version of the BPMN specification is 2.0⁵. Unlike BPMN 1.x, which only defines a standard for graphically representing a business process, it also includes execution semantics for the elements it defines and an XML format for storing process definitions.
⁵ BPMN 2.0 - http://www.omg.org/spec/BPMN/2.0/
Flexible and innovative business processes are one of the key elements that enable modern organizations to succeed. According to Janis Barzdins et al. and Marjan Mernik et al., there is a growing need to consider new issues when implementing tools for domain specific languages oriented to business process management. Although general-purpose languages can cover a large variety of cases, they add unnecessary complexity to specialized systems. Therefore, specific languages for narrow business domains are required. The solutions presented by Steen Brahe et al. and Momotko show a set of guidelines for defining a DSL, and present techniques for creating BPMN-based DSLs through the use of colors and custom icons.
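As an illustration of the XML format that BPMN 2.0 provides for storing process definitions, the sketch below parses a minimal, hypothetical process (a start event, one task and an end event joined by sequence flows) with Python's standard ElementTree and follows the flows from the start event. The namespace is the BPMN 2.0 model namespace; the process content is invented for the example, and the traversal assumes a single linear path, which a real engine does not.

```python
import xml.etree.ElementTree as ET

BPMN_NS = {"bpmn": "http://www.omg.org/spec/BPMN/20100524/MODEL"}

# Hypothetical minimal process: start -> one task -> end.
PROCESS_XML = """<definitions xmlns="http://www.omg.org/spec/BPMN/20100524/MODEL"
             targetNamespace="http://example.org/bpmn">
  <process id="harvest" isExecutable="true">
    <startEvent id="start"/>
    <task id="harvestRecords" name="Harvest records"/>
    <endEvent id="end"/>
    <sequenceFlow id="f1" sourceRef="start" targetRef="harvestRecords"/>
    <sequenceFlow id="f2" sourceRef="harvestRecords" targetRef="end"/>
  </process>
</definitions>"""

def execution_order(xml_text):
    """Follow sequence flows from the start event.

    Assumes each node has at most one outgoing flow (a linear process).
    """
    process = ET.fromstring(xml_text).find("bpmn:process", BPMN_NS)
    next_of = {flow.get("sourceRef"): flow.get("targetRef")
               for flow in process.iterfind("bpmn:sequenceFlow", BPMN_NS)}
    node = process.find("bpmn:startEvent", BPMN_NS).get("id")
    order = [node]
    while node in next_of:
        node = next_of[node]
        if node in order:
            raise ValueError("cycle detected; this sketch handles only linear processes")
        order.append(node)
    return order
```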
To find a suitable approach to define and execute our DSL, we studied several workflow technologies. Although commercial products such as Microsoft’s BizTalk⁶ and IBM’s WebSphere are widely used in the BPM domain, open-source solutions have also started to appear. Solutions such as jBPM⁷ and Activiti use BPMN 2.0⁸ as the core process definition and execution language of their process engines, which gives them the flexibility to define complex processes. However, they still lack process execution monitoring interfaces and modeling input interfaces, which are important for process managers.
Finally, to create our web framework for process orchestration we searched for the right tool for the job. After comparing several web development frameworks, namely Prototype+script.aculo.us, jQuery, ExtJS, MooTools, Dojo, Google Web Toolkit (GWT) and ZK, we concluded that GWT, although its learning curve and ease of use are only moderate, provides a high-performance, extensible framework for building complex UI interactions, making it the best fit for our problem.
3 Proposed DSL
After analyzing the goals and requirements established for a digital libraries’ interoperability and aggregation system in the Europeana Libraries project, drawing on the knowledge obtained while developing a new web visual interface for REPOX 2.0, and applying some of the DSL definition principles proposed by Steen Brahe et al., we were able to design our DSL.
To characterize each concept within the DSL, a set of standard visual representations was used, as explained in this section.
The information entities, each representing a piece of data or a group of pieces of data with a unique semantic definition (Table 1), allow information to be exchanged between the DSL tasks described later in this section. Table 2 then represents the operations/actions of the DSL using standard visual symbols.
Table 2. Description of Operations/Actions in the DSL.
Finally, the Technology concepts (the names of the protocols applied in data harvesting) used in this DSL are visually represented by their written names, since they have no standard symbol: OAI, Z39.50 and Folder.
These previously described concepts lead to the visual representation of our tasks in the DSL, presented in the next section.
After describing the main concepts of the domain, defining the information entities and choosing the visual representation of the concepts in our DSL, we propose a set of tasks. Table 3 presents the tasks related to Data Providers, Table 4 the tasks regarding Data Sources and, finally, Table 5 the tasks for Data Records.
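One way to read the role of the information entities is as the types of the data passed between tasks. The sketch below, a hypothetical illustration rather than the DSL's actual definition, models the three entity families named above (Data Providers, Data Sources, Data Records) and checks that each task in a sequence consumes the entity produced by the previous one. The task names are placeholders, not the entries of Tables 3 to 5.

```python
from dataclasses import dataclass
from enum import Enum

class Entity(Enum):
    DATA_PROVIDER = "Data Provider"
    DATA_SOURCE = "Data Source"
    DATA_RECORD = "Data Record"

@dataclass(frozen=True)
class Task:
    name: str          # hypothetical task name, not taken from Tables 3-5
    consumes: Entity   # entity the task expects as input
    produces: Entity   # entity the task hands to the next task

def check_chain(tasks):
    """List mismatches where a task's input differs from the previous task's output."""
    errors = []
    for prev, nxt in zip(tasks, tasks[1:]):
        if prev.produces is not nxt.consumes:
            errors.append(f"{prev.name} produces {prev.produces.value}, "
                          f"but {nxt.name} consumes {nxt.consumes.value}")
    return errors
```

Checking connections against entity types at modeling time is one design option for catching invalid processes before they reach the engine.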
Finally, in order to define and execute processes using this DSL, we developed a BPMN 2.0-based architecture which will be described in the next section.
4 Implementation and Solution
In this section we describe a possible solution to create an extensible architecture that enables the orchestration of business processes based on the DSL described in the previous section. Additionally, we explain how the development process was managed and why some of the choices were made.
4.1 Process Orchestration Architecture
After some analysis we decided to use the jBPM framework because of, among other features, its capability of running processes defined in BPMN 2.0 and its ease of extension with new types of tasks. The web application was developed with GWT, Ext GWT⁹ and a GWT-SVG library developed by us. Figure 1 represents the architecture defined for the process orchestrator.
The core of our Process Engine is the Process Orchestrator, composed of a Process Manager, which launches new processes in the jBPM engine, and a Process State Manager, which monitors the state of each process (Figure 1). Both managers share access to a common list of processes, managed by the Process Planning component.
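The division of responsibilities described above can be sketched as follows. The class names mirror the components in Figure 1, but their methods are assumptions for illustration; `start_fn` and `query_fn` are placeholders standing in for calls to the jBPM engine, whose actual API is not shown. The shared process list is guarded by a lock because both managers access it.

```python
import threading

class ProcessPlanning:
    """Owns the common list of processes shared by both managers (hypothetical)."""
    def __init__(self):
        self._lock = threading.Lock()
        self._states = {}  # process id -> state string

    def add(self, pid, state):
        with self._lock:
            self._states[pid] = state

    def snapshot(self):
        with self._lock:
            return dict(self._states)

class ProcessManager:
    def __init__(self, planning, start_fn):
        # start_fn is a placeholder for launching a process in the engine.
        self.planning, self.start_fn = planning, start_fn

    def launch(self, pid):
        self.start_fn(pid)
        self.planning.add(pid, "running")

class ProcessStateManager:
    def __init__(self, planning, query_fn):
        # query_fn is a placeholder for asking the engine for a process state.
        self.planning, self.query_fn = planning, query_fn

    def refresh(self):
        # Iterate over a copy so updates do not disturb the iteration.
        for pid in self.planning.snapshot():
            self.planning.add(pid, self.query_fn(pid))
```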
Each process consists of an orchestration of a set of Web Services registered within jBPM. The Process Orchestrator constantly monitors all running processes and provides an interface through which new processes can be added. These processes can be defined visually using the DSL based on the BPMN 2.0 notation, and are then encoded in XML according to the format defined by the OMG (Object Management Group).
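A process defined as an orchestration of a set of Web Services can be pictured as an ordered list of service invocations, each receiving the previous one's output. In this hypothetical sketch, local Python functions stand in for the registered Web Services, and `ServiceRegistry` is an illustrative name, not a jBPM class.

```python
from typing import Callable, Dict, List

ServiceFn = Callable[[dict], dict]

class ServiceRegistry:
    """Hypothetical registry of named services (stand-ins for Web Services)."""
    def __init__(self) -> None:
        self._services: Dict[str, ServiceFn] = {}

    def register(self, name: str, fn: ServiceFn) -> None:
        self._services[name] = fn

    def run(self, steps: List[str], data: dict) -> dict:
        """Call each registered service in order, feeding it the previous output."""
        for name in steps:
            if name not in self._services:
                raise KeyError(f"service not registered: {name}")
            data = self._services[name](data)
        return data
```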
The client side of the web application contains a Process Editor, which supports the visual definition of new processes, and a Process Instance Viewer, which allows the runtime monitoring of each process instance. The processes defined on the client side are persisted in an extended BPMN 2.0 format through the Process Definitions Manager. Process instances are initialized by the Process Instances Manager, which starts them through the Process Orchestrator.
⁹ Ext GWT - http://www.sencha.com/products/extgwt/
Figure 1. Process Engine architecture.
This architecture allowed us to create an extensible web process framework that enables process orchestration from its definition to execution and runtime monitoring.
4.2 BPMN 2.0 Extension
Since this DSL is an extension of BPMN 2.0, we started by choosing which main BPMN 2.0 components to use in defining our processes. Following Michael Muehlen et al.’s analysis of BPMN usage in different areas, we noticed that the Sequence Flow, the Parallel Gateway, and the Start and End Events were the BPMN components most used by BPMN modelers, so we decided to include them alongside the components we created (Section 3).
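Of the reused BPMN components, the Parallel Gateway carries the least obvious execution semantics: as a join it waits until a token has arrived on every incoming sequence flow, and as a fork it emits one token on every outgoing flow. The sketch below illustrates this token-counting behavior; it is a simplified model of the BPMN semantics, not jBPM's implementation, and the flow identifiers are hypothetical.

```python
from collections import Counter

class ParallelGateway:
    """Simplified token model of a BPMN parallel gateway (join + fork)."""
    def __init__(self, incoming, outgoing):
        self.incoming = list(incoming)   # ids of incoming sequence flows
        self.outgoing = list(outgoing)   # ids of outgoing sequence flows
        self.arrived = Counter()         # tokens waiting on each incoming flow

    def receive(self, flow_id):
        """Record a token on flow_id; fire once every incoming flow holds a token."""
        if flow_id not in self.incoming:
            raise ValueError(f"unknown incoming flow: {flow_id}")
        self.arrived[flow_id] += 1
        if all(self.arrived[f] > 0 for f in self.incoming):
            for f in self.incoming:
                self.arrived[f] -= 1     # consume one token per incoming flow
            return list(self.outgoing)   # one new token on every outgoing flow
        return []
```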
In order to create a DSL executable in the jBPM engine, we developed an extended BPMN 2.0 language based on the BPMN 2.0 semantic definition¹⁰. In addition to the standard BPMN 2.0 data, every process component stores a position and a state, used for runtime process monitoring, and each task stores additional input data.
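BPMN 2.0 elements accept attributes and `extensionElements` content from other namespaces, which is one way to carry the extra position, state and input data described above. The sketch assumes that mechanism; the `dl` namespace URI, the `x`, `y` and `state` attribute names and the `input` element are hypothetical, not the actual schema of our extension.

```python
import xml.etree.ElementTree as ET

BPMN = "http://www.omg.org/spec/BPMN/20100524/MODEL"
DL = "http://example.org/dl-dsl"  # hypothetical extension namespace

TASK_XML = f"""<process xmlns="{BPMN}" xmlns:dl="{DL}" id="p1">
  <task id="t1" name="Harvest" dl:x="120" dl:y="80" dl:state="running">
    <extensionElements>
      <dl:input name="set" value="books"/>
    </extensionElements>
  </task>
</process>"""

def read_task(xml_text, task_id):
    """Return the position, state and inputs of one task, or None if absent."""
    root = ET.fromstring(xml_text)
    for task in root.iter(f"{{{BPMN}}}task"):
        if task.get("id") == task_id:
            # ElementTree exposes namespaced attributes as "{uri}name".
            inputs = {i.get("name"): i.get("value")
                      for i in task.iter(f"{{{DL}}}input")}
            return {"position": (int(task.get(f"{{{DL}}}x")),
                                 int(task.get(f"{{{DL}}}y"))),
                    "state": task.get(f"{{{DL}}}state"),
                    "inputs": inputs}
    return None
```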
To define a process, the user is presented with a standard BPMN editor interface, with the drag-and-drop grid in the middle, the available tasks on the left side and the selected task’s properties on the right side. In the properties panel, our solution adds a customized input interface for each task type, allowing the user to specify the input of each task while defining the process.
¹⁰ BPMN 2.0 XSD - http://www.omg.org/spec/BPMN/2.0/20090502/Semantic.xsd
To monitor more than one process effectively, we present a table in which the user can see all running process instances and their state of execution, given by the state of each task. To represent the current state of a task within a process, colors and stroke patterns were used (Figure 2). This approach enables process managers to check the current status of a process quickly, improving their monitoring performance.
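The color and stroke-pattern encoding can be sketched as a lookup from task state to SVG stroke attributes, rendered here as a plain SVG `rect` string. The state names and the specific colors and dash patterns are assumptions chosen for illustration; they do not reproduce Figure 2.

```python
# Hypothetical state styles; not the actual colors of Figure 2.
STATE_STYLE = {
    "pending":   {"stroke": "#999999", "stroke-dasharray": "4 2"},
    "running":   {"stroke": "#1f77b4", "stroke-dasharray": "none"},
    "completed": {"stroke": "#2ca02c", "stroke-dasharray": "none"},
    "failed":    {"stroke": "#d62728", "stroke-dasharray": "1 2"},
}

def task_rect_svg(task_id, state, x, y, w=100, h=40):
    """Render one task as an SVG rect whose stroke color and pattern encode its state."""
    style = STATE_STYLE[state]
    return (f'<rect id="{task_id}" x="{x}" y="{y}" width="{w}" height="{h}" '
            f'fill="white" stroke="{style["stroke"]}" '
            f'stroke-dasharray="{style["stroke-dasharray"]}"/>')
```

Combining color with a stroke pattern keeps states distinguishable even when colors are hard to tell apart.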
Figure 2. State color representation.
A more detailed view of each process instance is also available, presenting a single process together with its log data in chronological order, for a more complete analysis of what is happening in the process.
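The chronological log view amounts to ordering an instance's log entries by timestamp before display. The `LogEntry` fields and the output format below are hypothetical, as is the sample data in the usage check.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class LogEntry:
    timestamp: datetime
    task_id: str
    message: str

def instance_history(entries):
    """Return one instance's log entries as printable lines, oldest first."""
    return [f"{e.timestamp:%Y-%m-%d %H:%M:%S} [{e.task_id}] {e.message}"
            for e in sorted(entries, key=lambda e: e.timestamp)]
```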
Overall, the proposed solution for our process engine uses the jBPM and GWT frameworks. jBPM runs our extended BPMN 2.0 processes, which represent the DSL proposed in Section 3 and are defined through an extension of the BPMN 2.0 XML schema and visual notation. The GWT application is split into a server side, which communicates with jBPM and therefore manages process execution and persistence, and a client side, which manages the interfaces used for process modeling (where some BPMN 2.0 base components are used and several interface techniques for custom input during modeling are applied), process definition and instance management, and process runtime monitoring (made more efficient through a system of colors and stroke patterns that represents the state of each component).