
Building a 3D map from RGB-D sensors


Master’s Thesis in Computer Science

Supervisor: Alper Aydemir

Examiner: Stefan Carlsson

Computer Vision and Active Perception Laboratory

Royal Institute of Technology (KTH), Stockholm, Sweden

TRITA xxx yyyy-nn


For a mobile robot exploring an unknown static environment, localizing itself and building a map at the same time is a chicken-and-egg problem, known as Simultaneous Localization And Mapping (SLAM). When a GPS receiver cannot be used, such as in indoor environments, measurements are generally provided by laser rangefinders and stereo cameras; however, these are expensive, and standard laser rangefinders offer only 2D cross sections. Recently, there has been great interest in processing data acquired with depth-measuring sensors, due to the availability of cheap and capable RGB-D cameras. For instance, the Kinect, developed by PrimeSense and Microsoft, has considerably changed the situation by providing a 3D camera at a very affordable price.

In this study, we will see how a 3D map based on a graphical model can be built by tracking visual features such as SIFT/SURF, computing geometric transformations with RANSAC, and applying non-linear optimization techniques to estimate the trajectory. This can be done from a sequence of video frames combined with depth information, using exclusively the Kinect, so the field of application can be wider than robotics.

Referat: Building a 3D map from an RGB-D camera. A robot exploring an unknown static environment, where it must not only localize itself but also build a map at the same time, has to handle the problem known as Simultaneous Localization And Mapping (SLAM). When a GPS receiver cannot be used, typically in an indoor environment, laser rangefinders and cameras are usually used as sensors. A laser rangefinder is expensive and provides information only in 2D. Cameras require a lot of processing since they do not provide depth information. Recently, there has been great interest in processing data from RGB-D cameras, that is, cameras which in addition to the ordinary image provide depth information. The Kinect, developed by PrimeSense and Microsoft, recently became available, offering RGB-D data at a very attractive price.

In this report, we study how a 3D map based on a graphical model can be created by tracking visual landmarks from e.g. SIFT/SURF, computing geometric transformations with RANSAC, and applying non-linear optimization techniques to estimate how the sensor has moved. We show how a standard Kinect can be used for this, without any other sensors on the robot, as is otherwise often the case.

This means that the field of application can be broader than robotics.

Acknowledgments “The most exciting phrase to hear in science, the one that heralds new discoveries, is not ’Eureka!’ but ’That’s funny...’” — Isaac Asimov. My first thanks go to my supervisor Alper Aydemir, for carefully following my progress and always giving me useful suggestions to solve the problems raised throughout this work. I am grateful for all the time he spent on discussions, experiments and practical issues. I also want to warmly thank Giorgio Grisetti for his most valuable advice, Patric Jensfelt and John Folkesson for their help, and my examiner Stefan Carlsson. My deepest thoughts go to my family and friends, especially to my father Hugues, for constantly showing interest in my activities and for his wise support.

This work was carried out at the Computer Vision and Active Perception Lab at the Royal Institute of Technology in Stockholm, a very pleasant environment to work in, for the quality and atmosphere of the school itself, and above all for the people I encountered, thanks to their availability, involvement, and willingness to share their knowledge. Special thanks to André Susano Pinto for his friendship, his smart ideas, and the nice moments spent together during his stay. Staff, students, and visitors: this time would not have been the same without Alessandro Pieropan, Ali Mosavian, Andrzej Pronobis, Cheng Zhang, Christian Smith, Gert Kootstra, Johan Ekekrantz, Josephine Sullivan, Magnus Burènius, Miroslav Kobetski, Niklas Bergström, Oscar Danielsson, Renaud Detry, Lazaros Nalpantidis, Victoria Matute Arribas, and many others.

Contents

Acknowledgments
Contents


Introduction To navigate in an unknown environment, a mobile robot needs to build a map of the environment and localize itself in that map at the same time. The process addressing this dual problem is called Simultaneous Localization And Mapping (SLAM). In an outdoor environment, this can generally be solved with GPS, which provides sufficient accuracy for the tasks the robot can take on. However, when moving indoors, or in places where GPS data is unavailable or not reliable enough, it becomes difficult to estimate the robot’s position precisely, and other solutions have to be found.

The main problem raised with SLAM comes from the uncertainty of the measurements, due to the sensory noise or technical limitations. Probabilistic models are widely used to reduce the inherent errors and provide satisfying estimations.

While this process is generally based on data provided by sensors such as laser scanners, combined with odometry, Visual Simultaneous Localization And Mapping (VSLAM) focuses on the use of cameras, as illustrated in figure 1.1.

Figure 1.1: Concept of Visual SLAM.

The poses of the camera (and hence of the robot, for a rigidly mounted camera) are determined from video data. The estimations generally drift with respect to the real trajectory, and the uncertainty grows over time.
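This drift can be illustrated with a toy dead-reckoning simulation (an illustration only, not part of the thesis): each relative motion measurement carries a small error, and composing the measurements into global poses accumulates those errors.

```python
import math
import random

def integrate_odometry(steps, noise_std, seed=0):
    """Compose relative 2D motions (distance, turn) into global poses.

    With noise_std > 0, each relative measurement is corrupted by
    Gaussian noise, so the integrated trajectory drifts away from
    the true one, and the error grows with the number of steps.
    """
    rng = random.Random(seed)
    x = y = theta = 0.0
    poses = [(x, y, theta)]
    for dist, dtheta in steps:
        # Noisy relative measurement...
        dist_meas = dist + rng.gauss(0.0, noise_std)
        dtheta_meas = dtheta + rng.gauss(0.0, noise_std)
        # ...composed into the global pose; errors accumulate here.
        theta += dtheta_meas
        x += dist_meas * math.cos(theta)
        y += dist_meas * math.sin(theta)
        poses.append((x, y, theta))
    return poses

# Drive straight ahead for 100 unit steps.
steps = [(1.0, 0.0)] * 100
ideal = integrate_odometry(steps, noise_std=0.0)
noisy = integrate_odometry(steps, noise_std=0.05)

def position_error(i):
    """Distance between the noisy estimate and the true pose at step i."""
    return math.hypot(noisy[i][0] - ideal[i][0], noisy[i][1] - ideal[i][1])
```

Without a corrective mechanism such as loop closing, this accumulated error is unbounded, which is precisely what the pose-graph optimization described later addresses.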


1.1 Context Currently, most robotic mapping is performed using sensors that offer only a 2D cross section of the environment around them. One reason is that acquiring high-quality 3D data was either very expensive or imposed hard constraints on the robot’s movements. Therefore, research has mainly focused on laser scanners to solve the SLAM problem, although there are some methods making use of stereo and monocular cameras [14] [23]. However, recently there has been great interest in processing data acquired with depth-measuring sensors, due to the availability of cheap and efficient RGB-D cameras. For instance, the Kinect camera developed by PrimeSense and Microsoft has considerably changed the situation, providing a 3D camera at a very affordable price. Primarily designed for entertainment, it has received a warm welcome in the research community, especially in robotics.

In the past, to solve the inherent problem of drift and provide a reliable estimation of the camera poses, most of the projects have used techniques such as Extended Kalman Filtering (EKF) or particle filters [25]. More recently, several methods rely on pose graphs to model and improve the estimations [24] [9] [18] [8].

Some projects making use of both RGB-D data and graph optimization [11] [5] illustrate the interest in such an approach well.

The work presented in this thesis is done at the Computer Vision and Active Perception Lab (CVAP) at KTH, the Royal Institute of Technology, in Stockholm.

Since 1982, research at CVAP has focused on the topics of computer vision and robotics.


1.2 Goals The main goal of this thesis is to build a 3D map from RGB and depth information provided by a camera, considering a 6 Degrees-of-Freedom (DOF) motion system.

The hardware device used for the experiments is the Microsoft Kinect, but this work could be extended to any system providing video and depth data.

The VSLAM process can be described as estimating the poses of the camera from its data stream (video and depth), in order to reconstruct the entire environment while the camera is moving. As sensory noise leads to deviations of each estimated camera pose from the real motion, the goal is to build a 3D map that is as close as possible to the real environment.

One objective is to get an overview of the different methods and techniques, so that the problems can be better identified and then analyzed more deeply after some experiments. However, this work will focus on the use of visual features.

Though a VSLAM system is intended to be used in real time, some of the processing may be done without performance as the main priority, and could therefore be deferred. The rendering of the scene is also outside the scope of this study.

Figure 1.2: Microsoft Kinect mounted on a mobile robot platform at CVAP (KTH)


1.3 Thesis outline

The rest of the document is structured as follows:

Chapter 2 presents the background and underlying concepts of this area: features and methods commonly used in computer vision, and general notions about SLAM.

Chapter 3 presents feature matching between pairs of frames, and how a 3D transformation can be computed from these associations using the depth data.

Chapter 4 describes how a map can be built, by estimating the camera poses through the use of a pose graph, refining them through the detection of loop closures, and finally performing a 3D reconstruction.

Chapter 5 presents the experiments, the software, how the data was acquired, and finally the results, with examples of maps generated from different datasets.

Chapter 6 presents the conclusions and future work, with some suggestions for possible improvements.

Chapter 2 Background In this chapter, we first present the Kinect camera, describe some of the features and methods commonly used in computer vision, and finally give a short introduction about SLAM concepts.

2.1 Microsoft Kinect As mentioned in the introduction, the hardware used in this work is the Kinect, a device developed by PrimeSense, initially for the Microsoft Xbox 360, and released in November 2010. It is composed of an RGB camera, a 3D depth sensing system, a multi-array microphone and a motorized tilt. In this work, only the RGB and depth sensors are used to provide the input data.

Figure 2.1: The Kinect sensor device (courtesy of Microsoft)


• The depth sensing system is composed of an IR emitter projecting structured light, which is captured by a CMOS image sensor and decoded to produce the depth image of the scene. Its range is specified as 0.7 to 6 meters, although the best results are obtained between 1.2 and 3.5 meters. Its output has 12-bit depth, and the depth sensor resolution is 320x240 pixels at a rate of 30 Hz.

• The field of view is 57° horizontal, 43° vertical, with a tilt range of ± 27°.
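From these nominal specifications, approximate pinhole intrinsics can be derived and a depth pixel back-projected into 3D camera coordinates. The sketch below is an illustration under the stated assumptions (symmetric pinhole model, principal point at the image center); it is not the calibration used by the thesis.

```python
import math

def focal_length_px(image_width_px, fov_deg):
    """Pinhole model: half the image width subtends half the field of view,
    so f = (w/2) / tan(fov/2), expressed in pixels."""
    return (image_width_px / 2.0) / math.tan(math.radians(fov_deg) / 2.0)

def backproject(u, v, depth_m, fx, fy, cx, cy):
    """Back-project pixel (u, v) with metric depth z into camera coordinates."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)

# Nominal 320x240 depth stream with a 57° x 43° field of view.
fx = focal_length_px(320, 57.0)
fy = focal_length_px(240, 43.0)

# A pixel at the image center, 2 m away, projects onto the optical axis.
point = backproject(160, 120, 2.0, fx, fy, cx=160.0, cy=120.0)
```

Applying this to every pixel of a depth image yields the point cloud used for 3D reconstruction.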

The drivers used are those developed by OpenNI (Open Natural Interface), an organization established in November 2010, launched by PrimeSense and joined by Willow Garage, among others. The OpenNI framework offers high-level functions mainly oriented toward gaming, such as gesture recognition and motion tracking. It aims to provide an abstraction layer with a generic interface to hardware devices. In the case of the Kinect, one advantage is that the calibration of the RGB sensor with respect to the IR sensor is ensured, so the resulting RGB and depth data are correctly mapped to a unique viewpoint.

2.2 Features In this work, we focus on tracking some parts of the scene observed by the camera.

In computer vision, and more specifically in object recognition, many techniques are based on the detection of points of interest on objects or surfaces. This is done through the extraction of features. In order to track these points of interest during a motion of the camera and/or the robot, a reliable feature has to be invariant to image location, scale and rotation. A few methods are briefly presented here:

Harris Corner: A corner detector, by Harris and Stephens [10]
SIFT: Scale-Invariant Feature Transform, by David Lowe [16]
SURF: Speeded-Up Robust Features [1]
NARF: Normal Aligned Radial Feature [22]
BRIEF: Binary Robust Independent Elementary Features [3]

There are two aspects to a feature: the detection of a keypoint, which identifies an area of interest, and its descriptor, which characterizes its region. Typically, the detector identifies a region containing a strong variation of intensity, such as an edge or a corner, and its center is designated as a keypoint. The descriptor is generally computed by measuring the main orientations of the surrounding points, leading to a multidimensional feature vector which identifies the given keypoint.

Given a set of features, matching can then be performed in order to associate pairs of keypoints between a couple of frames.


The features listed previously can be summarized in table 2.1.


2.2.1 Harris Corner Known as the Harris corner operator, this is one of the earliest detectors, proposed in 1988 by Harris and Stephens [10]. The notion of corner should be taken in a wide sense, as it allows detecting not only corners but also edges and, more generally, keypoints. This is done by computing the second moment matrix (or auto-correlation matrix) of the image intensities, describing their local variations. One of the main limitations of the Harris operator, at least in its original version, concerns scale invariance, as the matrix must be recomputed for each scale. Therefore, we will not give further details about this method.
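For completeness, the response can be sketched in a few lines: the second moment matrix M is accumulated from image gradients over a local window, and the Harris score R = det(M) - k·trace(M)² is large at corners, near zero in flat regions, and negative on edges. This is a minimal illustration on a synthetic image, not a production detector (no smoothing weights, no non-maximum suppression).

```python
def harris_response(img, k=0.04):
    """Harris corner response from the second moment matrix of the
    image gradients, summed over a 3x3 window around each pixel."""
    h, w = len(img), len(img[0])
    # Central-difference gradients (left as zero at the border).
    ix = [[0.0] * w for _ in range(h)]
    iy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix[y][x] = (img[y][x + 1] - img[y][x - 1]) / 2.0
            iy[y][x] = (img[y + 1][x] - img[y - 1][x]) / 2.0
    resp = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Entries of M = [[a, b], [b, c]] accumulated over the window.
            a = b = c = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    gx, gy = ix[y + dy][x + dx], iy[y + dy][x + dx]
                    a += gx * gx
                    b += gx * gy
                    c += gy * gy
            det = a * c - b * b
            trace = a + c
            resp[y][x] = det - k * trace * trace
    return resp

# 8x8 test image: a bright square in the lower-right quadrant
# creates a corner at pixel (4, 4) and flat regions elsewhere.
img = [[1.0 if (y >= 4 and x >= 4) else 0.0 for x in range(8)] for y in range(8)]
R = harris_response(img)
```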

2.2.2 SIFT feature The Scale-Invariant Feature Transform (SIFT) is a method presented by David Lowe [16], now widely used in robotics and computer vision. It detects distinctive, invariant image feature points, which can easily be matched between images to perform tasks such as object detection and recognition, or to compute geometric transformations between images.

The main idea of the SIFT method is to define a cascade of operations of increasing complexity, so that the most expensive operations are only performed on the most probable candidates.

1. The first step relies on a pyramid of Difference-of-Gaussian (DoG) images in order to be invariant to scale and orientation.
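The DoG idea can be illustrated in one dimension: the same signal is blurred with two Gaussians of different widths, and their difference acts as a band-pass filter whose response peaks at structures near the chosen scale. This is a toy sketch of the principle (SIFT applies it to 2D images across a whole pyramid of scales).

```python
import math

def gaussian_kernel(sigma):
    """Discrete, normalized Gaussian kernel of radius ~3*sigma."""
    radius = int(3 * sigma)
    vals = [math.exp(-(i * i) / (2.0 * sigma * sigma))
            for i in range(-radius, radius + 1)]
    total = sum(vals)
    return [v / total for v in vals]

def blur(signal, sigma):
    """Convolve a 1-D signal with a Gaussian (edges clamped)."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, weight in enumerate(k):
            idx = min(max(i + j - r, 0), len(signal) - 1)
            acc += weight * signal[idx]
        out.append(acc)
    return out

def dog(signal, sigma, k=1.6):
    """Difference of Gaussians: subtracting a narrow blur from a wide
    one leaves only structure near the scale sigma."""
    wide = blur(signal, sigma * k)
    narrow = blur(signal, sigma)
    return [a - b for a, b in zip(wide, narrow)]

# A step edge: the DoG response is strongest near the discontinuity
# and vanishes on a constant signal.
signal = [0.0] * 16 + [1.0] * 16
response = dog(signal, sigma=1.0)
flat = dog([1.0] * 32, sigma=1.0)
peak_idx = max(range(len(response)), key=lambda i: abs(response[i]))
```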


