Open-Domain Question Answering

Mark Andrew Greenwood

Submitted in partial fulfilment of the requirements

for the degree of Doctor of Philosophy

Department of Computer Science

University of Sheffield, UK

September 2005

Dedicated to the memory of

David Lowndes (1979-2003)

“Life is hard somehow”

C & R Macdonald (1999)

Table of Contents

Preface

I What Is Question Answering?

1 Question Answering: An Overview

1.1 Questions

1.2 Answers

1.3 The Process of Question Answering

1.4 The Challenges of Question Answering

1.5 Thesis Aims and Objectives

1.6 Thesis Structure

2 A Brief History of Question Answering

2.1 Natural Language Database Systems

2.2 Dialogue Systems

2.3 Reading Comprehension Systems

2.4 Open-Domain Factoid Question Answering

2.5 Definition Questions

3 Evaluating Question Answering Systems

3.1 End-to-End Evaluation

3.1.1 Factoid Questions

3.1.2 List Questions

3.1.3 Definition Questions

3.2 Evaluating IR Systems for Question Answering

–  –  –

12.1 Indirect evaluation of phrase removal techniques

D.1 Summary of TREC 2003 factoid performance

E.1 Summary of TREC 2004 factoid performance

F.1 Examples of combining questions and their associated target

F.2 Summary of TREC 2005 factoid performance

–  –  –

9.1 AnswerFinder: an open-domain factoid question answering system

9.2 Comparison of AnswerBus, AnswerFinder, IONAUT, and PowerAnswer

Abstract

Question answering aims to develop techniques that can go beyond the retrieval of relevant documents in order to return exact answers to natural language questions, such as “How tall is the Eiffel Tower?”, “Which cities have a subway system?”, and “Who is Alberto Tomba?”. Answering natural language questions requires more complex processing of text than employed by current information retrieval systems. A number of question answering systems have been developed which are capable of carrying out the processing required to achieve high levels of accuracy. However, little work has been reported on techniques for quickly finding exact answers.

This thesis investigates a number of novel techniques for performing open-domain question answering. Investigated techniques include: manual and automatically constructed question analysers, document retrieval specifically for question answering, semantic type answer extraction, answer extraction via automatically acquired surface matching text patterns, principled target processing combined with document retrieval for definition questions, and various approaches to sentence simplification which aid in the generation of concise definitions.

The novel techniques in this thesis are combined to create two end-to-end question answering systems which allow answers to be found quickly. AnswerFinder answers factoid questions such as “When was Mozart born?”, whilst Varro builds definitions for terms such as “aspirin”, “Aaron Copland”, and “golden parachute”. Both systems allow users to find answers to their questions using web documents retrieved by Google™. Together, these two systems demonstrate that the techniques developed in this thesis can be successfully used to provide quick, effective open-domain question answering.

Preface

Since before the dawn of language humans have hungered after knowledge. We have explored the world around us, asking questions about what we can see and feel. As time progressed we became more and more interested in acquiring knowledge; constructing libraries to hold a permanent record of our ever expanding knowledge and founding schools and universities to teach each new generation things their forefathers could never have imagined. From the walls of caves to papyrus, from clay tablets to the finest parchment we have recorded our thoughts and experiences for others to share. With modern computer technology it is now easier to access that information than at any point in the history of human civilization.

When the World Wide Web (WWW) exploded onto the scene during the late 1980s and early 1990s, it allowed access to a vast number of predominantly unstructured electronic documents. Effective search engines were rapidly developed to allow a user to find a ‘needle’ in this ‘electronic haystack’.

The continued increase in the amount of electronic information available shows no sign of abating, with the WWW effectively tripling in size between the years 2000 and 2003 to approximately 167 terabytes of information (Lyman and Varian, 2003). Although modern search engines are able to cope with this volume of text, they are most useful when a query returns only a handful of documents which the user can then quickly read to find the information they are looking for. It is, however, becoming more and more the case that giving a simple query to a modern search engine will result in hundreds if not thousands of documents being returned; more than can possibly be searched by hand – even ten documents is often too many for the time people have available to find the information they are looking for. Clearly a new approach is needed to allow easier and more focused access to this vast store of information.

With this explosive growth in the number of available electronic documents we are entering an age where effective question answering technology will become an essential part of everyday life. In an ideal world a user could ask a question such as “What is the state flower of Hawaii?”, “Who was Aaron Copland?” or “How do you cook a Christmas Pudding?”, and instead of being presented with a list of possibly relevant documents, question answering technology would simply return the answer or answers to the questions, with a link back to the most relevant documents for those users who want further information or explanation.



The Gigablast[1] web search engine has started to move towards question answering with the introduction of what it refers to as Giga bits – essentially these Giga bits are concepts which are related to the user’s search query. For example, in response to the search query “Who invented the barometer?” Gigablast, as well as returning possibly relevant documents, lists a number of concepts which it believes may answer the question. The first five of these (along with a confidence level) are Torricelli (80%), mercury barometer (64%), Aneroid Barometer (63%), Italian physicist Evangelista Torricelli (54%) and 1643 (45%). Whilst the first Giga bit is indeed the correct answer to the question it is clear that many of the other concepts are not even of the correct semantic type to be answers to the question. Selecting one of these Giga bits does not result in a single document justifying the answer but rather adds the concept to the original search query in the hope that the documents retrieved will be relevant to both the question and answer. While this approach seems to be a step in the right direction, it is unclear how far using related concepts can move towards full question answering.

One recent addition to the set of available question answering systems, aimed squarely at the average web user, is BrainBoost[2]. BrainBoost presents short sentences as answers to questions; although like most question answering (QA) systems it is not always able to return an answer. From the few implementation details that are available (Rozenblatt, 2003) it appears that BrainBoost works like many other QA systems in that it classifies the questions based upon ‘lexical properties’ of the expected answer type. This enables it to locate possible answers in documents retrieved using up to four web search engines.

Whilst such systems are becoming more common, none has yet appeared which is capable of returning exact answers to every question imaginable. The natural language processing (NLP) community has experience of numerous techniques which could be applied to the problem of providing effective question answering. This thesis reports the results of research investigating a number of approaches to QA with a view to advancing the current state-of-the-art and, in time, along with the research of many other individuals and organizations, will hopefully lead to effective question answering technology being made available to the millions of people who would benefit from it.

[1] http://www.gigablast.com

[2] http://www.brainboost.com

Acknowledgements

Working towards my PhD in the Natural Language Processing (NLP) group at The University of Sheffield has been an enjoyable experience and I am indebted to my supervisor Robert Gaizauskas, not only for his continued support but also for giving me the opportunity to carry out this research. My thanks also to my two advisors, Mark Hepple and Mike Holcombe, for making sure I kept working and for asking those questions aimed at making me think that little bit harder.

Although the NLP research group at Sheffield University is quite large, the number of people actively involved in question answering research is relatively small and I owe a large debt of thanks to all of them for always having the time to discuss my research ideas. They are also the people with whom I have collaborated on a number of academic papers and the annual QA evaluations held as part of the Text Retrieval Conference (TREC); without them this research would not have been as enjoyable or as thorough. They are: Robert Gaizauskas, Horacio Saggion, Mark Hepple, and Ian Roberts.

I owe a huge thank you to all those members of the NLP group who over a number of years have worked hard to develop the GATE framework[3]. Using this framework made it possible for me to push my research further as I rarely had to think about the lower level NLP tasks which are a standard part of the framework.

The TREC QA evaluations have been an invaluable resource of both data and discussion and I am indebted to the organisers, especially Ellen Voorhees.

I’d like to say a special thank you to a number of researchers from around the globe with whom I’ve had some inspiring conversations which have led to me trying new ideas or approaches to specific problems: Tiphaine Dalmas, Donna Harman, Jimmy Lin, and Matthew Bilotti.

I would also like to thank Lucy Lally for her administrative help during my PhD; I have no idea how I would have made it to most of the conferences I attended without her help.

Of course, none of the research presented in this thesis would have been carried out without the financial support provided by the UK’s Engineering and Physical Sciences Research Council[4] (EPSRC) as part of their studentship programme, for which I am especially grateful.

Any piece of technical writing as long as this thesis clearly requires a number of people who are willing to proofread various drafts in an effort to remove all the technical and language mistakes. In this respect I would like to thank Robert Gaizauskas, Mark Stevenson, Pauline Greenwood, John Edwards, Angus Roberts, Bryony Edwards, Horacio Saggion, and Emma Barker. I am also grateful to my examiners, Louise Guthrie and John Tait, for their constructive criticism which led to immeasurable improvements in this thesis. I claim full responsibility for any remaining mistakes.

On a more personal note I would like to thank the family and friends who have given me encouragement and provided support during my time at University. I would specifically like to thank my parents, without whose help I would never have made it to University in the first place. Their continued support and encouragement has helped me maintain my self-confidence throughout the four years of this research. Without the unwavering support of my fiancée I do not know if I would have made it this far and I am eternally grateful for her belief in my ability to finish this thesis.

[3] http://gate.ac.uk

[4] http://www.epsrc.co.uk/

1 Question Answering: An Overview

We all know what a question is and often we know what an answer is. If, however, we were asked to explain what questions are or how we go about answering them then many people would have to stop and think about what to say. This chapter gives an introduction to what we mean by question answering and hence the challenges that the approaches introduced in this thesis are designed to overcome.

1.1 Questions

One definition of a question could be ‘a request for information’. But how do we recognise such a request? In written language we often rely on question marks to denote questions. However, this clue is misleading: rhetorical questions do not require an answer but are often terminated by a question mark, while statements asking for information may not be phrased as questions. For example, the question “What cities have underground railways?” could also be written as the statement “Name cities which have underground railways”. Both ask for the same information, but one is a question and the other an instruction.

People can easily handle these different expressions as we tend to focus on the meaning (semantics) of an expression and not the exact phrasing (syntax). We can, therefore, use the full complexities of language to phrase questions knowing that when they are asked other people will understand them and may be able to provide an answer.
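The unreliability of the question mark as a cue can be illustrated with a rough sketch. The Python below is only an illustration, not a method taken from this thesis: the cue word lists and both function names are assumptions introduced here. It compares the naive punctuation test with a check on the opening word of the sentence.

```python
import re

# Hypothetical cue lists chosen for illustration; they are not from the thesis.
WH_WORDS = ("what", "which", "who", "whom", "whose", "when", "where", "why", "how")
REQUEST_VERBS = ("name", "list", "give", "tell", "describe", "define")


def ends_with_question_mark(text: str) -> bool:
    """The naive surface cue: does the text end in a question mark?"""
    return text.rstrip().endswith("?")


def is_information_request(text: str) -> bool:
    """Guess whether `text` asks for information, ignoring punctuation."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return False
    # An interrogative opener or an imperative request verb both count,
    # whether or not the sentence ends in a question mark.
    return words[0] in WH_WORDS or words[0] in REQUEST_VERBS


for sentence in (
    "What cities have underground railways?",
    "Name cities which have underground railways",
    "Isn't it obvious?",
):
    print(sentence, ends_with_question_mark(sentence), is_information_request(sentence))
```

Under these assumptions both phrasings of the underground railway request are recognised, while the rhetorical example passes the punctuation test but not the opening-word check. The sketch is deliberately shallow: a rhetorical question that opens with a wh-word, such as “Who cares?”, would still be misclassified, which is one reason a focus on meaning rather than surface form is needed.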
