


Ahmed Adnan Aqrawi

Three Dimensional Convolution of Large Data Sets on Modern GPUs

Specialization Project, Fall 2009

TDT4590 Complex Computer Systems

Supervisor: Dr. Anne C. Elster, IDI

Co-supervisor: Victor Aarre, Schlumberger Stavanger

Trondheim, December, 2009


Norwegian University of Science and Technology

Faculty of Information Technology, Mathematics and Electrical Engineering

Department of Computer and Information Science


Problem Description

In Petrel, Schlumberger’s seismic software, one often comes across large seismic cubes that need to be filtered in order to generate clearer images. The seismic cubes are viewed in three dimensions, which implies that the filtering must also be performed in all three dimensions. However, this filtering is computationally demanding. The goal of this project is to implement a Gaussian filter for a large three-dimensional data set on the GPU using NVIDIA CUDA, in order to off-load the CPU. A CPU version will also be developed for comparison and analysis. Since the data set is too large to be transferred to GPU memory at once, the calculations need to be performed on sub-cubes. This implies that one must account for border data between cubes to avoid an edge effect. The implementations developed will be benchmarked and compared to evaluate performance gains.

Abstract

In the petroleum and gas industry, one of the main foci is using seismic processing to find new oil and gas reservoirs. Recordings of seismic waves can be used to create images representing the surface of the earth. To do so, one has to filter the data collected. One of the methods for filtering this data is convolution in the spatial domain, which is done in three dimensions (3D) because of the 3D nature of the data collected. The data collected can cover surfaces several kilometers in length and is therefore very large in size.

This project focuses on implementing a 3D convolution algorithm on modern CPUs and GPUs with non-separable filters for large data sets, in the spatial domain. Our results demonstrate that the filtering mask should be placed in constant memory rather than shared memory, because there is an overhead associated with the use of shared memory per kernel launch. The data in constant memory must be read coalesced for it to be efficient. Shared memory should not be used for the filtered data either, due to the lack of communication between the threads in the convolution kernel. Again, the overhead of reading into shared memory only slows down the process. To compare our results, implementations on the CPU were written in C. The platforms tested are a single-core CPU and a quad-core CPU, as well as a single GPU and a system with up to 4 GPUs. The CPU used is an AMD Phenom X4, whereas the GPUs used are the NVIDIA Tesla C1060 and NVIDIA Tesla S1070. Our work includes determining how to process large amounts of data most efficiently on both the CPU and GPU, using different blocking methods when accessing the disk.

Our results also show that the I/O time, which one would expect to be a bottleneck, is only 1-2% of the total execution time on a single CPU. This means that convolution is a computationally demanding task, but fortunately a very parallelizable one.

Our results indicate that, compared to a single core, a speedup of 3.57 is achieved on the Phenom X4, a speedup of 17 is achieved on the Tesla C1060 (single GPU) and a speedup of 62 is achieved on the Tesla S1070 (4 GPUs). This led to the computation percentage being reduced by 5%, 25% and 90%, respectively, for the three platforms.

Further work regarding optimizations should hence focus on I/O.


This report, together with the prototype, is the result of a project assigned by the course TDT4590 at the Norwegian University of Science and Technology.

I would like to thank my supervisor Dr. Anne Cathrine Elster for invaluable support and feedback throughout the entire project. She has been an inspiration with her great understanding and dedication to the field. Given her generosity and encouragement, all the resources needed for this project were made available. I would like to thank Victor Aarre of Schlumberger for his support in providing me with new ideas, example source code and a set of seismic data. I would especially like to thank NVIDIA for sponsoring our group and our HPC-lab, and for making high-end graphics cards such as the Tesla C1060 and Tesla S1070 available. I would also like to thank the entire HPC group for their support, encouragement and enthusiasm for this project, with special thanks to Jan Christian Meyer, Thorvald Natvig, and Holger Ludvigsen for all their help.


Introduction

In the oil and gas industry, there is always an interest in investigating potential oil and gas reservoirs. There are several ways to do this, and one of them is seismic data collection. Seismic data is gathered by recording seismic waves (waves of force that travel through the earth). This data is used in the petroleum field to discover the geological structures of the earth and to find natural resources such as oil and gas. To help in this search, seismic data is processed by many filters and filtering methods to obtain a clearer subsurface image and to reveal more relevant information such as faults and reservoirs; see Figure 1.1 for an example of seismic data. Like other image filtering processes, these filters are well suited to the graphical processing unit (GPU), but are currently run on the central processing unit (CPU).

Figure 1.1: Figure illustrating seismic data from [?], with permission from Schlumberger

In recent years, it has been shown that the performance of the GPU has, in some cases, exceeded that of the CPU, which in turn motivated the development of the general-purpose graphical processing unit (GPGPU). This has led to the use of the GPU not only in graphics applications, but also in scientific calculations. These trends have created a boom in graphical processing architectures, and manufacturers have started introducing new product lines specifically for scientific calculations. Figure 1.2 shows the trend in computational power, measured in floating-point operations per second (FLOPS), over the past five years. Another aspect worth noting is that using the GPU as an accelerator leaves the CPU free to perform other tasks in parallel.

Given these advancements, one is now often interested in whether the GPU can be used for calculations to gain performance for certain tasks. Tasks such as image processing, seismic processing and other physical modeling, as well as linear programming applications, have proven to parallelize well on the GPU. This is the foundation of this project, in which we perform an image enhancement task on seismic data on the GPU.

Figure 1.2: Figure illustrating CPU and GPU performance trends from [?], with permission from NVIDIA


The aim of this project is to implement convolution for non-separable filters in the spatial domain in CUDA, for large three-dimensional data sets. A large data set is defined as a data set that does not fit into modern system buffers, currently at sizes between 8 and 12 GB. The Gaussian filter is a filter used in seismic processing, and implementing it on the GPU would introduce new possibilities in the field of pre-processing seismic data. The challenge here is how to handle large data sets: one must process the data set in sub-sets and must account for border information to compute the filter correctly. For comparison, implementations will be developed for both single-core and quad-core CPUs. The goal is to benchmark the convolution implementations on a modern CPU and GPU with different filter sizes and compare the two to see which is most efficient when it comes to large data sets. Possibilities to run on several GPUs to accelerate performance will also be explored, and speedup will be assessed.
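The handling of border information between sub-sets can be sketched as follows. This is a minimal illustration in Python, not the project's C or CUDA code: the function names `convolve3d` and `convolve3d_blocked`, the nested-list layout `vol[z][y][x]`, the cubic odd-sized mask, the zero padding at the outer edge of the volume, and the choice to block along z only are all assumptions made for the example. Each slab is loaded together with r extra slices on each side (r being the mask radius), filtered, and only its interior slices are kept, so the blocked result should equal the result of filtering the whole volume at once.

```python
from itertools import product


def convolve3d(vol, mask):
    """Direct 3D convolution of vol[z][y][x] with a cubic, odd-sized mask.

    Samples outside the volume are treated as zero (an assumption of this sketch).
    """
    nz, ny, nx = len(vol), len(vol[0]), len(vol[0][0])
    r = len(mask) // 2  # mask radius
    offsets = list(product(range(-r, r + 1), repeat=3))
    out = [[[0.0] * nx for _ in range(ny)] for _ in range(nz)]
    for z, y, x in product(range(nz), range(ny), range(nx)):
        acc = 0.0
        for dz, dy, dx in offsets:
            zz, yy, xx = z - dz, y - dy, x - dx
            if 0 <= zz < nz and 0 <= yy < ny and 0 <= xx < nx:
                acc += mask[dz + r][dy + r][dx + r] * vol[zz][yy][xx]
        out[z][y][x] = acc
    return out


def convolve3d_blocked(vol, mask, block_depth):
    """Filter the volume in slabs of block_depth slices along z.

    Each slab is extended by r halo slices per side (clipped at the volume
    edge) so that the slices kept from it see the same neighbours as they
    would in the full volume.
    """
    nz = len(vol)
    r = len(mask) // 2
    out = []
    for z0 in range(0, nz, block_depth):
        z1 = min(z0 + block_depth, nz)
        lo, hi = max(z0 - r, 0), min(z1 + r, nz)
        filtered = convolve3d(vol[lo:hi], mask)
        out.extend(filtered[z0 - lo:z1 - lo])  # drop the halo slices
    return out


if __name__ == "__main__":
    # Small synthetic volume (6 x 4 x 4) and an asymmetric 3 x 3 x 3 mask.
    vol = [[[float((7 * z + 3 * y + x) % 5) for x in range(4)]
            for y in range(4)] for z in range(6)]
    mask = [[[float(1 + dz + 2 * dy + 3 * dx) for dx in range(3)]
             for dy in range(3)] for dz in range(3)]
    full = convolve3d(vol, mask)
    blocked = convolve3d_blocked(vol, mask, block_depth=2)
    print("blocked result matches full result:", blocked == full)
```

With a 3 x 3 x 3 mask and slabs of two slices, each interior slab is loaded as four slices; the two extra slices are the border data that would otherwise cause an edge effect at the slab boundaries.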


1.2 Project Contributions

There are three main contributions in this project. The first is to perform convolution in three dimensions with non-separable filters. It is rare to find convolution performed in three dimensions, let alone on the GPU. This should be useful for anyone aiming to use the GPU for similar tasks.
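To give a feel for why a non-separable filter is computationally demanding, the sketch below compares the number of multiply-add operations per output voxel for a direct k x k x k mask with the number for a filter that could be split into three one-dimensional passes. The function name and the mask sizes are illustrative assumptions; the counts follow directly from the mask dimensions and ignore memory traffic.

```python
def multiply_adds_per_voxel(k, separable=False):
    """Multiply-add count for one output voxel with a mask of side length k.

    A non-separable mask touches all k * k * k neighbours; a separable one
    can be applied as three 1D passes of k taps each.
    """
    return 3 * k if separable else k ** 3


if __name__ == "__main__":
    for k in (3, 5, 7, 9):
        direct = multiply_adds_per_voxel(k)
        passes = multiply_adds_per_voxel(k, separable=True)
        print(f"k={k}: non-separable {direct}, separable {passes}, "
              f"ratio {direct / passes:.1f}")
```

The gap grows with the square of the mask size, which is one reason the non-separable case benefits from a massively parallel platform.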

The second is to look at handling large data sets. This introduces many problems, from disk access to the transfer of data to memory. In this project the focus is on the retrieval of data for the GPU by blocking across different dimensions. The combination of large data sets and the GPU is also a rare occurrence, since the data used is usually sized to fit in the system's main buffer.
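Why the blocking dimension matters for disk access can be illustrated with a simple count. The sketch assumes, hypothetically, that the volume is stored as one flat file with x varying fastest, then y, then z; the layout actually used in the project may differ. Under that assumption, the hypothetical helper `contiguous_runs` reports how many separate contiguous reads a block needs and how many samples each read covers.

```python
def contiguous_runs(shape, block):
    """Return (number of contiguous reads, samples per read) for one block.

    shape and block are (nz, ny, nx) and (bz, by, bx); the file is assumed
    to be laid out with x fastest, then y, then z.
    """
    nz, ny, nx = shape
    bz, by, bx = block
    if bx == nx and by == ny:
        return 1, bz * by * bx      # whole z-slices: one long read
    if bx == nx:
        return bz, by * bx          # full rows: one read per z-slice
    return bz * by, bx              # partial rows: one read per row


if __name__ == "__main__":
    shape = (512, 512, 512)
    # Three blocks, each holding one eighth of the volume.
    for name, block in (("split along z", (64, 512, 512)),
                        ("split along y", (512, 64, 512)),
                        ("split along x", (512, 512, 64))):
        runs, length = contiguous_runs(shape, block)
        print(f"{name}: {runs} reads of {length} samples")
```

All three blocks hold the same amount of data, but under the assumed layout the block split along z is a single sequential read, while the block split along x is broken into many short reads.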

The third significant contribution is the use of the GPU and CUDA in seismic processing, and how it can accelerate that process, by experimenting with the different memories in the CUDA hierarchy (for example constant, shared and global memory). There have been some studies on accelerating seismic processing, but in our project the focus is on using convolution as a filtering method and on using CUDA to program the GPU. Our project also considers the use of multiple GPUs to accelerate the process, which is rare, and it is interesting to see how the algorithm scales on several hundred cores.

1.3 Outline

The rest of this report is structured as follows:

Chapter 2: Relevant researched background material and related work is emphasized and explained such that the reader has all the presumed knowledge to understand the rest of the work. It is also a way to show how this project builds upon existing work in the same field.

Chapter 3: A short introduction to which hardware and software is used in the project. A description of how the implementations in this project were performed, and why certain implementation choices were made are explained. Here one will also find the thoughts put behind each optimization and what the expectations are as to how they will perform and test.

Chapter 4: Results regarding I/O tests are presented and discussed. The main focus is the blocking techniques used to achieve good disk access times and explaining why they are so efficient. Results regarding convolution tests on various platforms are presented and discussed as well. The main focus here is on comparing the implementations and presenting speedup and computation percentage. An in-depth analysis of the comparisons and traits is also presented.

Chapter 5: Here one will find the conclusion of the work performed and suggested further work in the field.


Appendix: Tables of results gathered during the benchmarking process are included in the appendix. Some of these results are summarized in graphs in Chapter 4.


Background and Related Work

In this chapter, the focus is on introducing some of the main sources the reader might need to understand our work. The following sections summarize the main references read. Section 2.1 introduces related work done in similar fields. Section 2.2 concerns spatial filtering. Section 2.3 explains the concept of a filtering mask and the Gaussian filter. Section 2.4 is a practical example of how the Gaussian filter is used in seismic processing. Section 2.5 is about general parallel programming. Section 2.6 introduces OpenMP and the concepts of multithreading. Section 2.7 explains the main aspects of the CPU and GPU architectures. Finally, Section 2.8 gives a short introduction to the CUDA programming model.

2.1 Related Work

This section introduces the papers and theses chosen for discussion, with the intention of emphasizing earlier work in similar fields and how this project builds upon it. The main fields focused on here are image processing, convolution, GPU acceleration, three-dimensional data and multiple-GPU systems. All these topics are relevant to this project, and have been researched to lay a foundation for the implementations performed.

Image Convolution with CUDA, 2007 [?]

This is a paper written by NVIDIA to show how CUDA can be used to perform convolution in image processing. This is related to this project in that it also concerns convolution in the spatial domain and it is also implemented in CUDA. In contrast to this paper, the image processing performed in this project is on three dimensional data, and the data to be filtered does not fit in memory and so one must perform several communications in the memory hierarchy.


Accelerating 3D Convolution using Graphics Hardware, 1999 [?]

This is a paper published at IEEE Visualization in 1999 that approaches the subject of 3D convolution performed on a GPU. The main idea is to use graphics hardware to accelerate the convolution process. This work predates CUDA, and this is where it differs from this project: before the CUDA architecture, shared memory was not available, and its use can be a good enhancement/optimization. Other than the use of CUDA, this project differs in that it also considers the use of multiple GPUs to accelerate the process and is concerned with larger data sets.

Modeling Communication on Multi-GPU Systems, 2009 [?]

This is a master's thesis concerning communication and calculation on several GPUs simultaneously. Another important subject taken into account is the partitioning of data such that calculations can be done on several GPUs. This is relevant to this project because of the large amount of data to be filtered and the advantage of using multiple GPUs. It is also interesting to see how one can partition data such that communication between GPUs is optimal. In contrast to this thesis, the problem solved here is one of image processing and not the solution of partial differential equations. Another difference is again the consideration of large data sets.
