Data Mining Methodologies for Supporting Engineers During System Identification

Thesis No. 4056 (2008)







By Sandro SAITTA, computer science engineer (EPF diploma), of Swiss nationality and originating from Bavois (VD)

Accepted on the recommendation of the jury:

Prof. B. Moret, president of the jury
Prof. I. Smith, Dr B. Raphael, thesis directors
Prof. B. Faltings, examiner
Prof. P. Struss, examiner
Dr E. Viennet, examiner

Switzerland

Acknowledgments

This work was funded by the Swiss National Science Foundation under grant #200020-109257.

My first acknowledgment goes to my co-advisor, Prof. Ian Smith, who guided my work throughout my PhD. He was also invaluable in explaining to me crucial aspects of thesis work. Benny Raphael, my other co-advisor and an Assistant Professor at the National University of Singapore, followed my thesis from the very beginning to the end. Working autonomously for four years is not always straightforward. When I faced an obstacle, Benny was always there to help me. I will always remember a sentence he wrote to me: “Remember, you can make something good come out of everything if you do it in the right spirit”. Prakash Kripakaran, a postdoctoral researcher at EPFL, is the kind of person that you only meet once in your life. I think my work would never have reached its present state without his help. I had many fruitful discussions with him regarding difficult issues in my research. Instead of only giving a simple answer, Prakash always suggested a new way to face a particular problem. François Fleuret, a researcher at IDIAP, is an expert in machine learning. I had many general discussions with him about data mining, and they were really good food for thought. He is also the first person who made me understand what research is really about. I also thank the examiners for their time and interest in my work: Prof. Boi Faltings (EPFL), Prof. Younès Bennani (Université Paris 13) and Prof. Peter

–  –  –

Data alone are worth almost nothing. While data collection is increasing exponentially worldwide, a clear distinction between retrieving data and obtaining knowledge has to be made. Data are retrieved while measuring phenomena or gathering facts. Knowledge refers to data patterns and trends that are useful for decision making. Data interpretation creates a challenge that is particularly present in system identification, where thousands of models may explain a given set of measurements. Manually interpreting such data is not reliable. One solution is to use data mining. This thesis thus proposes an integration of techniques from data mining, a field of research where the aim is to find knowledge from data, into an existing multiple-model system identification methodology.

It is shown that, within a framework for decision support, data mining techniques constitute a valuable tool for engineers performing system identification. For example, clustering techniques group similar models together in order to guide subsequent decisions since they might indicate possible states of a structure. A main issue concerns the number of clusters, which, usually, is unknown.

For determining the correct number of clusters in data and estimating the quality of a clustering algorithm, a score function is proposed. The score function is a reliable index for estimating the number of clusters in a given data set, thus increasing understanding of results. Furthermore, useful information for engineers who perform system identification is obtained through the use of feature selection techniques. These techniques allow selection of relevant parameters that explain candidate models. The core algorithm is a feature selection strategy based on global search.
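As a rough illustration of estimating the number of clusters with a validity index, the sketch below clusters one-dimensional values with K-means and keeps the cluster count that maximizes the mean silhouette width. The silhouette is a stand-in chosen for illustration; it is not the score function proposed in this thesis, which is not defined in this excerpt. The data values and the function names (`kmeans_1d`, `silhouette`, `estimate_cluster_count`) are assumptions.

```python
def kmeans_1d(values, k, iterations=20):
    """Plain K-means on 1-D data with deterministic, quantile-based initial centers."""
    ordered = sorted(values)
    n = len(ordered)
    centers = [ordered[int((i + 0.5) * n / k)] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in ordered:
            nearest = min(range(k), key=lambda j: abs(v - centers[j]))
            clusters[nearest].append(v)
        # An empty cluster keeps its previous center.
        centers = [sum(c) / len(c) if c else centers[j]
                   for j, c in enumerate(clusters)]
    return [c for c in clusters if c]


def silhouette(clusters):
    """Mean silhouette width; points in singleton clusters score 0."""
    if len(clusters) < 2:
        return 0.0
    scores = []
    for i, own in enumerate(clusters):
        for idx, v in enumerate(own):
            if len(own) == 1:
                scores.append(0.0)
                continue
            # a: mean distance to the other points of the same cluster
            a = sum(abs(v - w) for j, w in enumerate(own) if j != idx) / (len(own) - 1)
            # b: smallest mean distance to the points of any other cluster
            b = min(sum(abs(v - w) for w in other) / len(other)
                    for m, other in enumerate(clusters) if m != i)
            denom = max(a, b)
            scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


def estimate_cluster_count(values, k_values):
    """Return the k whose K-means partition has the highest mean silhouette."""
    return max(k_values, key=lambda k: silhouette(kmeans_1d(values, k)))


# Made-up predicted responses of nine hypothetical candidate models.
responses = [1.0, 1.1, 0.9, 5.0, 5.2, 4.9, 9.0, 9.1, 8.8]
best_k = estimate_cluster_count(responses, range(2, 6))
```

Here the index is evaluated for each trial value of k and the best-scoring partition is retained; any other cluster validity index could be substituted for `silhouette` without changing the surrounding logic.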

In addition to providing information about the candidate model space, data mining is found to be a valuable tool for supporting decisions related to subsequent sensor placement. When integrated into a methodology for iterative sensor placement, clustering provides a rational basis for deciding where to place additional sensors on existing structures. Greedy and global search strategies should be selected according to the context. Experiments show that whereas global search is more efficient for initial sensor placement, a greedy strategy is more suitable for iterative sensor placement.

Keywords: data mining, machine learning, correlation, PCA, clustering, K-means, cluster validity, feature selection, PGSL, SVM, system identification, decision support, sensor placement, measurement system design.


–  –  –

λi      i-th Lagrange multiplier
C       SVM tuning parameter representing the penalty of misclassifying training examples
Pi      i-th probability
H(X)    Entropy of variable X


–  –  –

“The capacity of digital data storage worldwide has doubled every nine months for at least a decade, at twice the rate predicted by Moore’s Law for the growth of computing power during the same period.” (Fayyad and Uthurusamy, 2002)

–  –  –

This chapter introduces the context of the thesis. It briefly describes related topics such as data mining, system identification and sensor placement. The last section presents research questions as well as the research methodology for achieving the objectives of this thesis.

1.1 Context

Data alone is worth almost nothing. While the amount of data is increasing exponentially, people in some fields are “starving” for knowledge, and the gap between data and knowledge may be huge. These days, the word data is often confused with knowledge, yet knowledge is obtained only through the understanding of data. The rapid increase in data worldwide brings several challenges. The larger the amount of data, the more difficult it is to understand. It is sometimes assumed that knowledge increases in proportion to data. The reason for such an assumption might be a lack of appreciation of the difference between obtaining and understanding data.

The increase of data is a challenge that is particularly present in engineering. The number of sensors is increasing while costs are decreasing. In many domains, engineers are saturated with data of many types. Good examples of tasks affected by this are model-based diagnosis (de Kleer and Williams, 1987) and system identification (Ljung, 1999). Recently, a new methodology (Robert-Nicoud, 2003) has been developed in which system identification is treated as a constraint satisfaction problem (CSP) instead of the more traditional optimization problem. This approach results in a set of several candidate models instead of a single model.

When there are many models, engineers need sophisticated tools to interpret them. Data mining (Tan et al., 2006) may provide help. In this work, data mining techniques are used to identify characteristics of candidate models. Better system identification is possible by integrating data mining into the overall process. No previous work has been done on mining models.

More specifically, data mining techniques have never been used for identifying characteristics of candidate models that explain observations (Chapter 2 provides more details). The present work is an attempt to fill this knowledge gap by developing an overall methodology for multiple-model system identification that integrates data mining to provide support for engineers.

1.2 Data Mining

Data mining techniques are becoming important in the context of the increasing trend in data worldwide, as explained in Section 1.1. There are more and more sensors capturing changes in our environment and our infrastructure. Therefore, a growing challenge involves determining the meaning of data. As written in Piatetsky-Shapiro (2007), “[...] as long as the world keeps producing data of all kinds [...] at an ever increasing rate, the demand for data mining will continue to grow.” Data mining is a field which is concerned with understanding data. In other words, the aim is to look for patterns in data (Pal and Mitra, 2004). As these patterns may be very difficult to find, data mining is sometimes compared to gold mining in rivers (Figure 1.1); gravel represents the enormous amount of data and gold nuggets are the hidden patterns to find.

Although civil engineers were among the first of all traditional engineering disciplines to use the power of computers five decades ago, they are now lagging behind other professions in the use of advanced techniques such as data mining. Indeed, data mining techniques have proven their efficiency in domains such as handwritten digit recognition, image and speech recognition, DNA sequences, financial time series and web mining. Although data mining has been used in engineering, most of this work takes advantage of the predictive abilities of data mining methods. Very little work applies data mining techniques to tasks such as describing the structure of data. To the author's knowledge, no attempt has been made to apply data mining to models in system identification. This work is thus a new application for data mining.


–  –  –

1.3 System Identification

Several years after construction, structures may no longer fulfill their intended functions. As written in Levy and Salvadori (2002), “It is the destiny of the man-made environment to vanish [...] ”. People outside of civil engineering domains have the misconception that civil engineers know exactly how structures behave in service. The complexity of both the structures and the materials involved makes the understanding of exact structural behavior impossible. One way to learn about the state of a structure before it collapses or, as frequently happens, before it reaches a stage where repair costs increase by orders of magnitude, is through diagnosis. When the goal of diagnosis is to determine models that reasonably explain measured responses, the approach is commonly known as system identification. Although system identification is closely related to diagnosis, the focus of this work is on helping engineers identify the system, not diagnose it. The aim is not to propose a way to repair the system, as is the case in diagnosis, but rather to find the state of the system (even if it is not damaged) in order to improve management of artifacts that are expected to last more than one hundred years.

The goal of system identification is to determine the state of a system and values of system parameters through comparisons of predicted with observed responses. Traditionally, this is treated as an optimization problem in which the best combination of values of model parameters is selected such that differences between model predictions and measurements are minimal.

Recent work has brought out the different types of errors that can occur in system identification processes (Robert-Nicoud, 2003). These errors make optimization in system identification unreliable since the global optimum may not correspond to the true state of the system due to compensating modeling and measurement errors. In such situations, treating the task as a constraint satisfaction problem (CSP) is more appropriate (see Section 2.5). It is noted that recent work proposes a distributed version of the constraint programming approach (Faltings, 2006).

Since measurements are indirect, the use of models is necessary. Even though a design model may be the most appropriate for designing and analyzing the structure prior to construction, it often cannot be used for system identification. This is usually because design models are conservative. On the other hand, diagnosis models have to be as accurate as possible in order to avoid wrong diagnoses. The current work is a combination of model-based reasoning concepts from computer science (de Kleer and Williams, 1987) and traditional model updating techniques used in engineering (Ljung, 1999). A correct understanding of the output of such techniques is an important challenge.

A difficulty associated with system identification is that, since many model predictions might match observations within certain limits, the best matching model may not be the correct model.
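The contrast between the optimization view and the constraint satisfaction view can be sketched as follows. This is a minimal illustration under stated assumptions: the one-parameter response model `predict`, the measurement values, the candidate stiffness values and the threshold are all made up for the example and are not taken from the thesis.

```python
def predict(stiffness, load, positions):
    """Toy response model (an assumption): response grows with load and position."""
    return [load * x / stiffness for x in positions]


def residual(predicted, measured):
    """Largest absolute difference between predictions and measurements."""
    return max(abs(p - m) for p, m in zip(predicted, measured))


def best_fit(candidates, load, positions, measured):
    """Optimization view: return the single parameter value with the smallest residual."""
    return min(candidates,
               key=lambda k: residual(predict(k, load, positions), measured))


def candidate_set(candidates, load, positions, measured, threshold):
    """CSP view: keep every parameter value whose residual is within the threshold."""
    return [k for k in candidates
            if residual(predict(k, load, positions), measured) <= threshold]


positions = [1.0, 2.0]                      # hypothetical sensor locations
load = 10.0                                 # hypothetical applied load
measured = [1.05, 1.95]                     # made-up measurements
stiffness_values = [8.0, 9.0, 10.0, 11.0, 12.0]

single_model = best_fit(stiffness_values, load, positions, measured)
plausible_models = candidate_set(stiffness_values, load, positions, measured,
                                 threshold=0.3)
```

With these numbers, the optimization view reports one stiffness value, while the threshold-based view retains four of the five values as consistent with the measurements. The threshold stands for the combined modeling and measurement error that the engineer is willing to accept.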

In this work, the reliability of identification is defined as the probability that the candidate model(s) obtained through system identification correspond to reality. Reliability is poor when many models predict similar responses at measured locations. Factors that affect the reliability of system identification have been studied in previous research (Robert-Nicoud et al., 2004). The present work is an extension of this research and uses data mining techniques for a better estimation of the reliability of identification.

1.4 Sensor Placement

A basic assumption of system identification is that there is a set of sensors measuring an effect. There are thousands of ways to measure physical phenomena in structures and many new technologies are emerging. Although their development has been the result of significant scientific effort, decisions related to the choice of measurement technology, specifications of performance and positioning of measurement locations are often not based on systematic and rational methodologies. While use of engineering experience and judgment may often result in measurement systems that provide useful results, a poorly designed measurement system can waste time and money.

When placing sensors on a structure, the analogy with medical diagnosis is relevant. People usually go to the doctor for a diagnosis of their conditions. They want to know what is wrong. For that, the doctor measures physiological parameters such as temperature and pulse rate. They try to infer causes from what is measured. The way doctors conduct the measurements is iterative.
