Leonid Kof
Institute of Informatics
Technische Universität München
Text Analysis for Requirements Engineering
Complete reprint of the dissertation approved by the Faculty of Informatics of the
Technische Universität München for the award of the academic degree of
Doctor of Natural Sciences (Dr. rer. nat.)
Chair: Univ.-Prof. Nassir Navab, Ph. D.
Examiners of the dissertation:
1. Univ.-Prof. Dr. Dr. h.c. Manfred Broy
2. Univ.-Prof. Michael Beetz, Ph. D.
The dissertation was submitted to the Technische Universität München on 5.07.2005 and accepted by the Faculty of Informatics on 21.11.2005.
Abstract (German version)
Requirements engineering is the Achilles' heel of the entire software development process. It requires the interaction of many stakeholders and involves not only technical but also sociological and psychological activities. Even when all stakeholders reach a consensus, the resulting requirements document is usually informal. In the early project phases, the functionality of the software to be built is not yet understood precisely enough, which turns the formalization of requirements into a learning process.
As the study by Mich et al. [MFN04] shows, the overwhelming majority of requirements are written in natural language. In practice, such documents are mostly vague and contain many inconsistencies. Misunderstandings and errors from the requirements engineering phase affect later project phases and can potentially lead to the failure of the entire project.
To get misunderstandings under control and to support the step from informal requirements to a formal model, this dissertation proposes a new approach to extracting the domain-specific ontology from requirements documents. An ontology consists of a set of terms and relations between these terms. It gives …
Abstract
Requirements engineering is the Achilles’ heel of the whole software development process. It involves many stakeholders and includes not only technical but also sociological and psychological activities. Even when all the stakeholders reach a consensus, the resulting requirements are rather informal. In the early project phases, the functionality of the prospective software is not yet understood with the precision necessary for formalization, which makes requirements formalization not only a refinement but also a learning process.
As the survey by Mich et al. [MFN04] shows, the overwhelming majority of requirements are written in natural language. In practice, these documents are often vague and contain many ambiguities, which cause misunderstandings between project stakeholders. Misunderstandings and errors from the requirements engineering phase propagate to later development phases and can potentially lead to project failure.
To alleviate misunderstandings and to support the step from informal requirements to a formal model, this thesis proposes a novel approach to extracting a domain ontology from requirements documents in order to establish a common language for the project stakeholders. An ontology consists of a set of terms and relations between these terms. Compared with a glossary, a domain-specific ontology gives a more explicit definition of terms and of the relations between them. Once the ontology has been extracted, a domain expert validates it. The validated ontology becomes both the common language for all project stakeholders and a valuable resource for later development steps.
The thesis makes two key contributions to ontology extraction as a part of …:
• It implements a semiautomatic method for extracting an ontology from a requirements document and validating the extracted ontology.
• It shows how the traditional requirements analysis process should be modified to include ontology extraction and validation.
The feasibility of the proposed approach was evaluated in three comprehensive case studies.
Acknowledgements
This thesis was made possible by the help and cooperation of many people, and I want to use this opportunity to thank them. Without their help and support this thesis would never have been written.
First of all, I want to thank Professor Manfred Broy, who offered me a position at his chair. Discussions with colleagues in this great research group gave me many helpful suggestions for my research. Furthermore, I want to thank Professor Manfred Broy for supporting research that lies outside the research mainstream of the chair. Without his support and invaluable feedback I could never have carried out the research presented in this thesis.
My thanks also go to Professor Michael Beetz, who readily agreed to serve on the dissertation committee when asked. His extremely short review cycles and constructive reviews enabled almost on-the-fly improvements in the final stage of thesis writing. Fruitful discussions with him contributed a lot to the final structure of the thesis and to a clear line of argument.
Some phases of my work were really disappointing, as it seemed almost impossible to publish my ideas. It was Markus Pizka who encouraged me not to give up during such phases and gave me valuable tips on how to write papers. I want to thank him here.
I am also obliged to the colleagues who agreed to review the almost final versions of my dissertation. Daniel Ratiu, Tilman Seifert, Jorge Fox, David Cruz, and Stefan Wagner contributed a lot to polishing the thesis and improving its understandability.
Case studies on different requirements documents were an important part of my work. It was really difficult to find documents that were suitable for case studies and, at the same time, not completely secret. I am really grateful to the people who provided ideas and documents for case studies: Alexander Pretschner, Franz Huber, Jan Philipps, Markus Pister, Jewgenij Botaschanjan, Andreas Fleischmann, and Brian Berenbach. Without their case study ideas this thesis could not have been completed.
The case studies raised the question of whether the documents prepared for automated text analysis are still human-readable. Some of my helpful colleagues agreed to read and evaluate different versions of different documents.
For this painstaking work I want to thank Stefan Berghofer, Martin Deubler, Norbert Diernhofer, Ulrike Hammerschall, Jan Jürjens, Michael Meisinger, Jan Philipps, Yuri Riabov, Maurice Schoenmakers, Tilman Seifert, Oscar Slotosch, Martin Strecker, Stefan Wagner, Martin Wildmoser, Guido Wimmel, and Alexander Ziegler.
The tools that I used during my research were all research tools, some of them not available off-the-shelf. It is the cooperation of the tool authors that made certain parts of my work possible. I am deeply grateful for this cooperation to Helmut Schmid, Sabine Schulte im Walde, David Faure, Claire Nédellec, Alaine Pierre Manine, Philipp Cimiano, Johanna Völker, Thomas Büchner, Tobias Hain, Alexander Klitni, and Armand Wendt. My contact with most of these people was solely by e-mail, which makes their readiness to cooperate all the more valuable.
I also owe special thanks to Barbara Kalter, who helped a lot with the formalities of the dissertation submission and diminished my chaotic tendencies.
Last but not least, I want to thank my friends and my parents for their continuous support and encouragement during the work.
6.3.1 Application of the Extraction Technique to German
6.3.2 Potential Improvement of the Extraction Technique
6.4 Perspective: Enterprise Ontology
List of Figures
6.1 Ontology Building Procedure, as presented in the thesis
6.2 Integrated Ontology Extraction Approach

Chapter 1 Introduction

Construction of software systems is a non-trivial and error-prone task. In spite of a general understanding of which steps are necessary in the development process (requirements engineering, architecture design, etc.), proper execution of these steps remains problematic. This problem becomes especially acute when constructing large software systems.
The recognition that the development of large software systems requires a systematic approach, as opposed to ad-hoc programming, gave rise to the research field of software engineering. Software engineering is
1. The application of a systematic, disciplined, quantifiable approach to the development, operation, and maintenance of software; that is, the application of engineering to software.
2. The study of approaches as in (1).
(IEEE Standard 610-1990, see also [IEE05]). Software engineering traditionally subdivides the software development process into several phases, such as requirements engineering, design, implementation, and testing.
Although it makes little sense to rank the phases by importance, requirements engineering is clearly a crucial one: errors made in the requirements engineering phase propagate to all later stages. For this reason, correcting requirements engineering errors is also extremely expensive: according to Boehm [Les05], the cost of correcting an error increases by a factor of 10 for each later project phase in which the error is detected. Thus, correcting a requirements engineering error in the design phase is 10 times more expensive than correcting it directly in the requirements engineering phase, and correcting it in the implementation phase is even 100 times more expensive. The later in the development process an error is detected, the higher the cost of correcting it.
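The factor-of-10 rule can be written as a tiny cost model. The sketch below is a simplification under stated assumptions: it uses only the three phases named in the paragraph, applies exactly a factor of 10 per phase boundary, and measures cost relative to fixing the error in the phase where it was introduced. The function name `relative_correction_cost` is hypothetical.

```python
# Simplified cost model (assumption): exactly a factor of 10 per phase boundary.
PHASES = ["requirements engineering", "design", "implementation"]
FACTOR = 10


def relative_correction_cost(introduced: str, detected: str) -> int:
    """Cost of a fix relative to fixing the error in the phase that introduced it."""
    distance = PHASES.index(detected) - PHASES.index(introduced)
    if distance < 0:
        raise ValueError("an error cannot be detected before it is introduced")
    return FACTOR ** distance


for phase in PHASES:
    cost = relative_correction_cost("requirements engineering", phase)
    print(f"requirements error found in {phase}: {cost}x")
# prints 1x, 10x and 100x for the three phases
```

The exponential form makes the argument of the paragraph explicit: each phase an error survives multiplies the cost, so detection effort is best spent as early as possible.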
Zave [Zav97] deﬁnes requirements engineering as
“... the branch of software engineering concerned with the real-world goals for functions of and constraints on software systems. It is also concerned with the relationship of these factors to precise specifications of software behavior, and to their evolution over time and across software families.”

The requirements engineering process poses manifold challenges because it involves not only technical but also psychological and sociological aspects, such as the interaction of different stakeholders and requirements negotiation. As Jackson puts it, requirements engineering is “where informal meets formal” (as cited by Berry [Ber03]). Supporting the step from informal to formal is one of the goals of this work.
The result of the early requirements engineering phases, namely requirements elicitation and negotiation, is a requirements document. As the survey by Mich et al. [MFN04] shows, the overwhelming majority of requirements are written in natural language. In practice, natural language requirements documents usually contain many inconsistencies. In the requirements engineering phase it is vital to detect these inconsistencies and, at the very least, to establish an inconsistency-free common project language.
One possible form of a common project language is a glossary of domain-specific terms. A glossary gives an informal natural language definition for each term. However, such definitions still leave room for interpretation.
(An example of a possible misinterpretation will be given later, see Section 1.1.3.) A better, more explicit form of term definition is an ontology. In contrast to a glossary, which is basically a plain list of terms, an ontology contains explicit relations between concepts. This thesis proposes ontology engineering as a promising way to define the terms specific to the application domain. Wikipedia, the free encyclopedia, defines an ontology as follows [Wik05b]:
In computer science an ontology is an attempt to formulate an exhaustive and rigorous conceptual schema within a given domain, a typically hierarchical data structure containing all the relevant entities and their relationships and rules (theorems, regulations) within that domain.
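To make the glossary/ontology distinction concrete: a glossary is essentially a mapping from each term to a free-text definition, whereas an ontology also stores relations between terms as data. The following is a minimal Python sketch of such a structure, not the representation used in the thesis; the class name `Ontology`, the single-parent is-a hierarchy, and the example sensor terms are assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Ontology:
    """Terms plus explicit relations (a glossary would be just dict[str, str])."""
    terms: set[str] = field(default_factory=set)
    # Assumption: each term has at most one parent in the is-a hierarchy.
    is_a: dict[str, str] = field(default_factory=dict)
    # Named associations as (source term, relation, target term) triples.
    associations: set[tuple[str, str, str]] = field(default_factory=set)

    def add_term(self, term: str, parent: str | None = None) -> None:
        self.terms.add(term)
        if parent is not None:
            self.terms.add(parent)
            self.is_a[term] = parent

    def relate(self, source: str, relation: str, target: str) -> None:
        self.terms.update((source, target))
        self.associations.add((source, relation, target))

    def ancestors(self, term: str) -> list[str]:
        """Walk up the is-a hierarchy, nearest parent first."""
        chain: list[str] = []
        while term in self.is_a:
            term = self.is_a[term]
            if term in chain:
                raise ValueError(f"cycle in is-a hierarchy at {term!r}")
            chain.append(term)
        return chain


# Hypothetical example terms, not taken from the thesis.
onto = Ontology()
onto.add_term("temperature sensor", parent="sensor")
onto.add_term("sensor", parent="device")
onto.relate("controller", "reads", "temperature sensor")
print(onto.ancestors("temperature sensor"))  # ['sensor', 'device']
```

The explicit `is_a` and `associations` entries are what makes an ontology stricter than a glossary: a statement such as "a temperature sensor is a sensor" becomes a data item that a domain expert can accept or reject, instead of being buried in a prose definition.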
The goal of this thesis is to build an application domain ontology on the basis of requirements documents.
Defining an ontology is a first step towards a uniform project language.
For the ontology to be usable as a common project language, it must be validated. In this context, validation means that an application domain expert approves the extracted terms and associations. The constructed ontology can be validated in two ways: either manually by a domain expert, or by building an initial system model on the basis of the ontology and validating that model. The validated ontology then becomes the common language for all project stakeholders. Furthermore, ontology validation indirectly contributes to the validation of the requirements document.
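The manual validation variant can be pictured as filtering the extracted items through the expert's decisions. This sketch assumes the expert simply marks terms and associations as approved; the names `validate` and `ValidationResult`, and the rule that an association is kept only if both of its terms are approved as well, are illustrative assumptions rather than part of the described approach. Rejected items are returned so they can be fed back to the requirements analyst.

```python
from __future__ import annotations

from dataclasses import dataclass

Relation = tuple[str, str, str]  # (term, relation name, term)


@dataclass
class ValidationResult:
    terms: set[str]
    relations: set[Relation]
    rejected_terms: set[str]          # returned to the analyst as feedback
    rejected_relations: set[Relation]


def validate(terms: set[str], relations: set[Relation],
             approved_terms: set[str],
             approved_relations: set[Relation]) -> ValidationResult:
    """Keep what the domain expert approved; report everything else."""
    kept_terms = terms & approved_terms
    # Assumption: an association survives only if it was approved itself
    # and both of the terms it connects were approved as well.
    kept_relations = {r for r in relations & approved_relations
                      if r[0] in kept_terms and r[2] in kept_terms}
    return ValidationResult(kept_terms, kept_relations,
                            terms - kept_terms, relations - kept_relations)


result = validate(
    terms={"controller", "pump", "pumps"},
    relations={("controller", "stops", "pump"), ("controller", "stops", "pumps")},
    approved_terms={"controller", "pump"},
    approved_relations={("controller", "stops", "pump"),
                        ("controller", "stops", "pumps")},
)
print(result.rejected_terms)      # {'pumps'}
print(result.rejected_relations)  # {('controller', 'stops', 'pumps')}
```

In the example, the expert rejects the variant spelling "pumps", and the association that uses it is rejected with it, even though the association itself was approved.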
Ontology extraction, as proposed in this thesis, is based on the following scenario:
1. Requirements engineering usually starts with rather vague ideas, with different stakeholders having different ideas about the prospective project. Then the goals of the different stakeholders are discussed and goal conflicts are detected. The conflicts must be negotiated and eliminated. The final result of this elicitation stage is a requirements document agreed upon by all stakeholders.
2. An ontology is extracted from this document. The process of ontology extraction consists of three steps:
(a) term extraction (glossary extraction)
(b) term classification, building of the term hierarchy
(c) relation extraction
The second and third steps are interactive and give the requirements analyst feedback on terminology inconsistencies. It is important to eliminate these inconsistencies before they find their way into the ontology (the requirements engineering process then returns to the step of requirements elicitation, negotiation, and writing). This interactive process of ontology extraction and document correction has the invaluable side effect of validating the terminology that will be used in later project phases.
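The three extraction steps, together with the inconsistency feedback, can be illustrated with a deliberately simplified pipeline. This is not the extraction technique developed in the thesis. It assumes that each requirement sentence has already been reduced to a (subject, verb, object) triple by a parser (not shown), treats subjects and objects as terms, groups terms that fill the same grammatical role of the same verb as candidates for a common parent in the hierarchy, and flags terms that differ only in case or a trailing "s" as possible naming inconsistencies. All function names and the example requirements are hypothetical.

```python
from collections import defaultdict

# Hypothetical input format: one (subject, verb, object) triple per sentence,
# assumed to come from a syntactic parser that is not shown here.
Triple = tuple[str, str, str]


def extract_terms(triples: list[Triple]) -> set[str]:
    """Step (a): subjects and objects become candidate domain terms."""
    return {term for subj, _, obj in triples for term in (subj, obj)}


def classify_terms(triples: list[Triple]) -> dict[tuple[str, str], set[str]]:
    """Step (b): group terms that fill the same role of the same verb."""
    clusters: dict[tuple[str, str], set[str]] = defaultdict(set)
    for subj, verb, obj in triples:
        clusters[(verb, "subject")].add(subj)
        clusters[(verb, "object")].add(obj)
    # Only contexts shared by at least two terms say anything about grouping.
    return {ctx: group for ctx, group in clusters.items() if len(group) > 1}


def extract_relations(triples: list[Triple]) -> set[Triple]:
    """Step (c): each subject-verb-object triple becomes a named association."""
    return set(triples)


def naming_conflicts(terms: set[str]) -> list[set[str]]:
    """Feedback: terms that differ only in case or a trailing 's'."""
    groups: dict[str, set[str]] = defaultdict(set)
    for term in terms:
        key = term.lower()
        if key.endswith("s"):
            key = key[:-1]  # crude plural stripping, enough for the sketch
        groups[key].add(term)
    return [group for group in groups.values() if len(group) > 1]


requirements = [
    ("controller", "reads", "temperature sensor"),
    ("controller", "reads", "pressure sensor"),
    ("operator", "starts", "pump"),
    ("controller", "stops", "pumps"),
]
print(classify_terms(requirements))
print(naming_conflicts(extract_terms(requirements)))
```

On this sample input, step (b) groups the two sensors as objects of "reads", and the feedback function reports "pump" and "pumps". That is the kind of terminology inconsistency the analyst would resolve in the document before the terms enter the ontology.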