Kernel Methods in Computer Vision:

Object Localization, Clustering,

and Taxonomy Discovery

submitted by

Matthew Brian Blaschko, M.S.

from La Jolla

to Fakultät IV - Elektrotechnik und Informatik

of the Technische Universität Berlin

for the award of the academic degree

Doktor der Naturwissenschaften

Dr. rer. nat.

approved dissertation


Chair: Prof. Dr. O. Hellwich

Reviewer: Prof. Dr. T. Hofmann

Reviewer: Prof. Dr. K.-R. Müller

Reviewer: Prof. Dr. B. Schölkopf

Date of the scientific defense: 23.03.2009

Berlin 2009

D83

Zusammenfassung (Summary)

In this thesis we study three fundamental computer vision problems using kernel methods.

We first investigate the problem of localizing objects in natural images, which we formalize as the task of predicting the bounding box of an object to be detected. In Chapter II we develop a branch-and-bound framework for this purpose that allows us to efficiently and optimally find the bounding box that maximizes a given quality function. This function can be either the decision function of a kernel-based classifier or a nearest-neighbor distance measure. We show that this method already achieves excellent localization results when used with a simple linear quality function found by training a support vector machine.

In Chapter III we investigate how to learn kernel-based quality functions that are optimally suited to the task of object localization. In particular, we show that structured output regression makes this possible: in contrast to support vector machines, structured output regression can not only make binary decisions but can predict arbitrary elements of an output space. In the case of object localization, the output space consists of all possible bounding boxes within the target image. Structured output regression learns a function that can measure the compatibility between inputs and outputs, and predicts

–  –  –

Abstract

In this thesis we address three fundamental problems in computer vision using kernel methods. We first address the problem of object localization, which we frame as the problem of predicting a bounding box around an object of interest. We develop a framework in Chapter II for applying a branch and bound optimization strategy to efficiently and optimally detect a bounding box that maximizes objective functions including kernelized functions and proximity to a prototype. We demonstrate that this optimization can achieve state of the art results when applied to a simple linear objective function trained by a support vector machine. In Chapter III, we then examine how to train a kernelized objective function that is optimized for the task of object localization. In particular, this is achieved by the use of structured output regression. In contrast to a support vector machine, structured output regression does not simply predict binary outputs but rather predicts an element in some output space. In the case of object localization the output space is the space of all possible bounding boxes within an image. Structured output regression learns a function that measures the compatibility of inputs and outputs, and the best output is predicted by maximizing the compatibility over the space of outputs. This maximization turns out to be exactly the same branch and bound optimization as developed in Chapter II. Furthermore, a variant of this branch and bound optimization is also utilized during training as part of a constraint generation step.

We then turn our focus to the problem of clustering images in Chapter IV. We first report results from a large scale evaluation of clustering algorithms, for which we measure how well the partition predicted by the clustering algorithm matches a known semantically correct partition of the data. In this study, we see particularly strong results from spectral clustering algorithms, which use the eigenvectors of an appropriately normalized kernel matrix to cluster the data. Motivated by this success, we develop a generalization of spectral clustering to data that appear in more than one modality, the primary example being images with associated text.

As spectral clustering algorithms can be interpreted as the application of kernel principal components analysis followed by a reclustering step, we use the generalization of regularized kernel canonical correlation analysis followed by a reclustering step. The resulting algorithm, correlational spectral clustering, partitions the data significantly better than spectral clustering, and allows for the projection of unseen data that is only present in one modality (e.g. an image with no text caption).

Finally, in Chapter V, we address the problem of discovering taxonomies in data.

Given a sample of data, we wish to partition the data into clusters, and to find a taxonomy that relates the clusters. Our algorithm, numerical taxonomy clustering, works by maximizing a kernelized dependence measure between the data and an abstracted kernel matrix that is constructed from a partition matrix that defines the clusters and a positive definite matrix that represents the relationship between clusters. By appropriately constraining the latter matrix to be generated by an additive metric, we are able to interpret the result as a taxonomy. We make use of the well studied field of numerical taxonomy to efficiently optimize this constrained problem, and show that we not only achieve an interpretable result, but that the quality of clustering is improved for datasets that have a taxonomic structure.


Acknowledgments

This thesis would not be possible without the support and help I’ve received from many people. Christoph Lampert, Arthur Gretton, Thomas Hofmann, Tinne Tuytelaars, and Wray Buntine have been wonderful coauthors. It is a privilege to learn by working with such excellent scientists. I owe special mention to Christoph Lampert, Bernhard Schölkopf, and Thomas Hofmann for advising me throughout my PhD. Christoph additionally translated my abstract into German.

I could not have asked for a better environment to do a PhD than the Max Planck Institute for Biological Cybernetics. The computer vision group consisted of several very strong researchers, and I enjoyed working with and learning from Guillaume Charpait, Matthias Franz, Peter Gehler, Wolf Kienzle, Kwang In Kim, and Sebastian Nowozin. I’d like to thank all of my colleagues, and especially Sabrina Nielebock for all her help.

During my doctoral work, I was funded in part by a Marie Curie fellowship through the PerAct project (EST 504321), and by the EU funded CLASS project (IST 027978). Through the CLASS project, I was able to learn about the research being done at a consortium of five leading European research institutions, and to get feedback on my own work. Thanks are due to the participants for helping to provide insight into the big issues facing the field of computer vision, and the role that statistical learning can play in solving them.

Klaus-Robert Müller has given very valuable feedback, and is responsible for having suggested several experiments that have improved the scientific content of this work. I especially thank him for reading my thesis, and for giving his comments during a marathon three-hour phone call between California and Berlin, all while recovering in bed from surgery for his broken leg. I’d also like to thank Gabriela Ernst at the Technische Universität Berlin for all her help throughout the process of arranging the defense.

A PhD isn’t all work, and I’d like to take the time to mention my friends who made my time in Tübingen so enjoyable, whether it was just a coffee break, or a night out. Among others: Yasemin Altun, Andreas Bartels, Matthias Bethge, Olivier Chapelle, Guillaume Charpait, Jan Eichhorn, Ayse Naz Erkan, Jason Farquhar, Peter Gehler, Elisabeth Georgii, Arthur Gretton, Moritz Grosse-Wentrup, Jez Hill, Matthias Hofmann, Reshad Hosseini, Stefanie Jegelka, Wolf Kienzle, Kwang In Kim, Jens Kober, Lukasz Konieczny, Oliver Barnabas Kroemer, Shih-pi Ku, Christoph Lampert, Luise Liebig, Markus Maier, Suzanne Martens, Betty Mohler, Sebastian Nowozin, Jan Peters, Justus Piater, Hiroto Saigo, Gabriele Schweikert, Matthias Seeger, Jacquelyn Shelton, Suvrit Sra, Julia Veit, Lena Veit, Ulrike von Luxburg, Christian Walder, Anne Wessendorf, and Mingrui Wu. Extra special thanks go to Elena Prokofyeva.

Finally, I’d like to thank my family for all their support.

Chapter I

Introduction

Computer vision is the process of automatically understanding visual information and abstracting meaningful representations that can be used in subsequent data processing and organization. It is a relatively immature field: the goal of enabling computers to interact with visual information with similar sophistication to a human is far from achieved. Furthermore, the tasks which have been approached by the research community are fragmented and not always well defined. Nevertheless, there has been significant progress in recent years, especially in the areas of object classification and localization (the more classical tasks of three-dimensional reconstruction and tracking have approached a relatively high level of sophistication, and have not been addressed in this work). This work improves on the state of the art in several important computer vision tasks, and does so by leveraging the power of statistical learning theory and the flexibility of representing data with domain specific kernels, positive definite functions that are equivalent to an inner product in some Hilbert space (Aizerman et al., 1964; Schölkopf and Smola, 2002). Statistical learning theory allows us to pose the problem of learning functions that map raw image data to their meaningful representations as the problem of generalizing from observed examples. Rather than engineer the solution using hand tuning, we utilize observed data directly in order to more quickly, flexibly, and accurately learn the function. While the problem of supervised classification has been shown to be especially suited to the computer vision setting, we attempt to move beyond this relatively well studied area and propose additional solutions from statistical learning theory for problems in the computer vision domain. Specifically, we have addressed three problems of interest to the computer vision community, each of which has been the subject of recent attention due to its importance in the automatic understanding of visual scenes on a semantic level: object localization, clustering, and taxonomy discovery.

In this chapter, we introduce several basic concepts from statistical learning theory and introduce the notation for kernels that we will use throughout this thesis (Section I.1). In particular, we will see that the representer theorem allows us to easily kernelize certain classes of optimization problems. We then review several recent advances in machine learning that will be applicable to problems in computer vision. Once we have finished our overview of machine learning, we will discuss in Section I.2 the basic concepts from computer vision used throughout the thesis. We will explore how the incorporation of invariances can be treated naturally within the framework of kernel methods, discuss methods for learning task specific image representations, and give an overview of the state of the art in the learning of semantic information from image data. Finally, in Section I.3, we introduce the main contributions of this thesis, and place them within the context of the current state of the art.

I.1 Kernel Methods in Machine Learning

Kernel methods have increased in popularity in the past two decades due to their solid mathematical foundation, tendency toward easy geometric interpretation, and strong empirical performance in a wide variety of domains (Vapnik, 1995, 1998; Burges, 1998; Müller et al., 2001; Schölkopf and Smola, 2002; Shawe-Taylor and Cristianini, 2004; Hofmann et al., 2008). While traditional linear methods are well founded mathematically, and existing algorithms tend to be optimized for performance, real world data often have significant non-linearities. By adopting an appropriate non-linear kernel, the mathematical foundations and often significant portions of the algorithmic analysis of linear algorithms can be transferred to the non-linear case. Though the feature space implicit in the kernel function can be very high dimensional, by appropriately formulating the problem so that the (implicit) feature vectors are only accessed through kernel evaluations, we can avoid the computational cost imposed by the size of the space.
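As a minimal illustration of accessing feature vectors only through kernel evaluations, the sketch below compares a homogeneous degree-2 polynomial kernel with the inner product of its explicit feature map. The choice of kernel and the names `poly2_features` and `poly2_kernel` are assumptions made for this example and are not taken from the thesis.

```python
import numpy as np

def poly2_features(x):
    """Explicit feature map for k(x, z) = (x . z)^2: all pairwise products x_i * x_j."""
    return np.outer(x, x).ravel()

def poly2_kernel(x, z):
    """Kernel evaluation in the input space; never builds the d^2-dimensional features."""
    return float(np.dot(x, z)) ** 2

rng = np.random.default_rng(0)
x, z = rng.normal(size=5), rng.normal(size=5)

# Inner product of the explicit 25-dimensional feature vectors.
explicit = float(np.dot(poly2_features(x), poly2_features(z)))
# Same quantity obtained from a single 5-dimensional dot product.
implicit = poly2_kernel(x, z)
```

For inputs of dimension d, the explicit map has d^2 coordinates while the kernel evaluation costs only a d-dimensional dot product, which is the saving the paragraph above describes.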

Given a sample$^1$ of training points $(x_1, y_1), \ldots, (x_i, y_i), \ldots, (x_n, y_n) \in X \times Y$, where $X$ is some input space and $Y$ is an output space, we wish to learn a function $f : X \to Y$ such that the expected loss, $\mathbb{E}_{p_{xy}}[l(x, f(x), y)]$, is minimized for some loss function $l : X \times Y \times Y \to \mathbb{R}$. Ignoring the computational issues of finding the minimizing $f \in F$, for some class of functions $F$, we are faced with the problem that we do not know the underlying data distribution, $p_{xy}$, of sample points in $X \times Y$.

We can of course substitute the empirical loss on the training sample,

$$\frac{1}{n} \sum_{i=1}^{n} l(x_i, f(x_i), y_i), \tag{I.1}$$

–  –  –

where the parameter C controls the level of regularization.
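To make the role of C concrete, the following sketch evaluates the empirical loss of Equation (I.1) and one common regularized objective for a linear function f(x) = ⟨w, x⟩. The squared loss, the linear form of f, and the placement of C as a weight on the empirical loss (as in typical support vector machine formulations) are all assumptions of this example; the names `squared_loss`, `empirical_loss`, and `regularized_objective` are hypothetical.

```python
import numpy as np

def squared_loss(x, fx, y):
    """Loss l(x, f(x), y); this illustrative choice ignores x."""
    return (fx - y) ** 2

def empirical_loss(w, X, y):
    """Mean loss over the training sample, (1/n) * sum_i l(x_i, f(x_i), y_i)."""
    preds = X @ w
    return float(np.mean([squared_loss(xi, fi, yi) for xi, fi, yi in zip(X, preds, y)]))

def regularized_objective(w, X, y, C):
    """Assumed form: ||w||^2 + C * empirical loss (larger C weights the data fit more)."""
    return float(np.dot(w, w)) + C * empirical_loss(w, X, y)

X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])
w = np.array([1.0, 1.0])

risk = empirical_loss(w, X, y)                      # residuals are 0, -1, -1
objective = regularized_objective(w, X, y, C=3.0)   # ||w||^2 = 2, plus 3 * risk
```

Under this assumed form, a small C favors functions with small norm, while a large C favors functions that fit the training sample closely.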

Let $F$ be a reproducing kernel Hilbert space (RKHS) of functions from $X$ to $\mathbb{R}$. To each point $x \in X$ there corresponds an element $\phi(x) \in F$ (we call $\phi : X \to F$ the feature map) such that $\langle \phi(x), \phi(x') \rangle_F = k(x, x')$, where $k : X \times X \to \mathbb{R}$ is a unique positive definite kernel. For optimization problems of the form given in Equation (I.2), the well known Representer Theorem (e.g. (Schölkopf and Smola,

$^1$ Samples are usually assumed to be i.i.d.


–  –  –
