FREE ELECTRONIC LIBRARY - Abstract, dissertation, book

Pages:   || 2 |

«Simon Green sdkfeedback July 2012 Document Change History Version Date Responsible Reason for Change 1.0 Sept 19 2007 Simon Green Initial ...»

-- [ Page 1 ] --

Particle Simulation

using CUDA

Simon Green


July 2012

Document Change History

Version Date Responsible Reason for Change

1.0 Sept 19 2007 Simon Green Initial draft

1.1 Nov 3 2007 Simon Green Fixed some mistakes, added detail.

1.2 June 10 2007 Simon Green Updated to match code in CUDA 2.0 release

1.3 May 10 2010 Simon Green Updated for CUDA 3.1

July 2012 Page ii of 12 Abstract Particle systems [1] are a commonly used technique for simulating physical systems. In this document we will describe how to efficiently implement a particle system in CUDA, including interactions between particles using a uniform grid data structure.

Figure 1. Particle simulation in CUDA.

July 2012 Page 1 of 12 Particle Simulation using CUDA Introduction Particle-based techniques are used in many applications - from interactive simulation of fluids and smoke for games to astrophysics simulations and molecular dynamics. Recent research has also applied particle methods to soft body and cloth simulation [4], and there is some hope that one day these techniques will allow an efficient unification of rigid, soft body and fluid simulations where everything can interact with everything else seamlessly.

There are two basic types of simulation – Eulerian (grid-based) methods, which calculate the properties of the simulation at a set of fixed points in space, and Lagrangian (particle) methods, which calculate the properties of a set of particles as they move through space.

Particle-based methods have several advantages over grid-based methods.

 They only perform computation where necessary.

 They require less storage and bandwidth since the model properties are only stored at the particle positions, rather than at every point in space.

 They are not necessarily constrained to a finite box.

 Conservation of mass is simple (since each particle represents a fixed amount of mass).

The main disadvantage of particle-based methods is that they require a very large number of particles to obtain realistic results. There are also techniques (such as the particle level-set method) that attempt to combine the strengths of both methods.

Fortunately, it is relatively easy to parallelize particle systems and the massive parallel computation capabilities of modern GPUs now makes it possible to simulate large systems at interactive rates.

This document describes how to implement a simple particle system in CUDA, including particle collisions using a uniform grid data structure.

The accompanying code is intended to provide a framework to which more complicated particle interactions such as smoothed particle hydrodynamics [2, 3] or soft body simulation can be added.

–  –  –

Demo Usage

Press ‗v‘ to enter view mode. In view mode:

 Hold down the left mouse button to rotate the camera.

 Hold down the middle mouse button to translate the camera.

 Hold down the left and middle buttons and move up and down to zoom.

Press ‗m‘ to change to move mode, where you can move the collision sphere to interact with the particles.

Press ‗1‘ or ‗2‘ to reset the particles. ‗3‘ drops a ball of particles into the system.

Press ‗h‘ to display the sliders, which enable you to modify the simulation parameters interactively.

For an interesting effect, try turning the gravity to zero, and the collision attraction to 0.1. This will cause the particles to group together like a sticky substance.

If you have a compute capability 1.1 GPU, you can enable the atomic processing path by editing the file ―particles_kernel.cuh‖ so that USE_SORT is set to 0, and changing the custom build setup on ―particleSystem.cu‖ so that nvcc is called with ―-arch=sm_11‖.

–  –  –


There are three main steps to the performing the simulation:

1. Integration.

2. Building the grid data structure

3. Processing collisions Rendering of the particles is performed using OpenGL, making use of point sprites and a GLSL pixel shader that makes the points appear spherical.

Integration The integration step is the simplest step. It integrates the particle attributes (position and velocity) to move the particles through space. We use Euler integration for simplicity - the velocity is updated based on applied forces and gravity, and then the position is updated based on the velocity. Damping and interactions with the bounding cube are also applied in this stage.

The particle positions and velocities are both stored in float4 arrays. The positions are actually allocated in an OpenGL vertex array object (VBO) so that they can be rendered from directly. This VBO memory is mapped for use by CUDA using ―cudaGLMapBufferObject”. The arrays are double-buffered so that updating the new values will not affect particles not yet processed.

Particle-Particle Interactions It is relatively simple to implement a particle system where particles do not interact with each other.

Most particle systems used in games today fall into this category. In this case each particle is independent and they can be simulated trivially in parallel.

The ―nbody‖ sample included in the CUDA SDK includes interactions in the form of gravitational attraction between bodies. It demonstrates that it is possible to get excellent performance for n-body gravitational simulation using CUDA when performing the interaction calculations in a brute-force manner – computing all n2 interactions for n bodies. The use of shared memory means that this method does not become bound by memory bandwidth.

However, for local interactions (such as collisions) we can improve performance by using spatial subdivision.

The key insight here is that for many types of interaction, the interaction force drops off with distance. This means that we can compute the force for a given particle by only comparing it with all its neighbors within a certain radius.

Spatial subdivision techniques divide the simulation space so that it is easier to find the neighbors of a given particle.

–  –  –

Uniform Grids In this sample we use a uniform grid [11], which is the simplest possible spatial subdivision. (The techniques described could be extended to more sophisticated structures such as hierarchical grids, but we don‘t discuss this here.) A uniform grid subdivides the simulation space into a grid of uniformly sized cells. For simplicity, we use a grid where the cell size is the same as the size of the particle (double its radius). This means that each particle can cover only a limited number of grid cells (8 in 3 dimensions). Also, if we assume no inter-penetration between particles, there is a fixed upper bound on the number of particles per grid cell (4 in 3 dimensions).

We use a so-called ―loose‖ grid, where each particle is assigned to only one grid cell based on its center point. Since each particle can potentially overlap several grid cells, this means that when processing collisions we must also examine the particles in the neighboring cells (3 x 3 x 3 = 27 in total) to see if they are touching the particle in question. This method allows us to bin the particles into the grid cells simply by sorting them by their grid index.

The alternative approach, where particles are stored in every cell that they touch, requires less work when processing collisions, but more work when building the grid, and is generally more expensive on the GPU in our experience.

The grid data structure is generated from scratch each time step. It is possible to perform incremental updates to the grid structure on the GPU, but this approach is simple and the performance is constant regardless of the movement of the particles.

We examine two different methods for generating the grid structure.

Building the Grid using Atomic Operations

On GPUs that support atomic operations (compute capability 1.1), there is a relatively simple algorithm for building the grid. Atomic operations allow multiple threads to update the same value in global memory simultaneously without conflicts.

We use 2 arrays in global memory:

gridCounters – this stores the number of particles in each cell so far. It is initialized to zero at the start of each frame.

gridCells – this stores the particle indices for each cell, and has room for a fixed maximum number of particles per cell.

The ―updateGrid‖ kernel function updates the grid structure. It runs with one thread per particle.

Each particle calculates which grid cell it is in, and uses the atomicAdd function to atomically increment the cell counter corresponding to this location. It then writes its index into the grid array at the correct position (using a scattered global write). We clamp the cell particle count so that it doesn‘t exceed the maximum number of particles per cell.

Figure 2 shows a simple example with 6 particles in a 2D grid.

–  –  –

Figure 2 – Uniform Grid using Atomics Note that the global memory writes in this pass are essentially random (depending on the position of the particles), and so will not be coalesced. In addition, if multiple particles write to the same cell location simultaneously, the writes will be serialized, causing further performance degradation.

This method assumes a fixed maximum number of particles per grid cell. It can be extended to support a variable number of particles by using a two-pass approach. In the first pass, we count the number of particles per grid cell using an atomic increment as above. We then perform a parallel prefix sum (scan) operation to calculate the destination addresses for each particle (see the ―scan‖ sample in the SDK). In the final pass we examine all the particles again, and write them to contiguous locations in the grid array using the results of the scan. Note that this is a very similar algorithm to a single pass of a parallel radix sort.

Building the Grid using Sorting

An alternative approach which does not require atomic operations is to use sorting.

The algorithm consists of several kernels. The first kernel ―calcHash‖ calculates a hash value for each particle based on its cell id. In this example we simply use the linear cell id as the hash, but it may be beneficial to use other functions such the Z-order curve [8] to improve the coherence of memory accesses. The kernel stores the results to the ―particleHash‖ array in global memory as a uint2 pair (cell hash, particle id).

We then sort the particles based on their hash values. The sorting is performed using the fast radix sort provided by the CUDPP library, which uses the algorithm described in [12]. This creates a list of particle ids in cell order.

In order for this sorted list to useful, we need to be able to find the start of any given cell in the sorted list. This is achieved by running another kernel ―findCellStart”, which uses a thread per particle and compares the cell index of the current particle with the cell index of the previous particle in the sorted list. If the index is different, this indicates the start of a new cell, and the start address is written to another array using a scattered write. The current code also finds the index of the end of each cell in a similar way.

–  –  –

Note that this method was not possible on pre-CUDA architectures because of the lack of scattered memory writes and a binary search would have to be used instead [9].

Figure 3 demonstrates creating the grid using the sorting method.

–  –  –

As an additional optimization, we re-order the position and velocity arrays into sorted order to improve the coherence of the texture lookups during the collision processing stage.

Particle Collisions Once we have built the grid structure we can use it to accelerate particle-particle interactions. In the sample code we perform simple collisions between particles using the DEM method [5]. This collision model consists of several forces, including a spring force which forces the particles apart, and a dashpot force which causes damping.

Each particle calculates which grid cell it is in. It then loops over the neighboring 27 grid cells (3x3x3 cells) and checks for collisions with each of the particles in these cells. If there is a collision the particle‘s velocity is modified.

Performance On most hardware the sorting-based algorithm achieves the highest performance. This is because the sorting improves the memory access coherency when performing the collisions, and also tends to reduce warp divergence (particles in the same warp tend to be close together in space and therefore have similar numbers of neighbors).

The atomic-based method is also more sensitive to the distribution of the particles, with random distributions being significantly slower.

–  –  –

Memory accesses to fetch the neighboring particles‘ positions and velocities will typically be noncoalesced. For this reason we bind the global memory arrays to textures and use texture lookups (tex1Dfetch) instead, which improves performance by up to 45% since texture reads are cached.

The code included in the CUDA SDK can simulate 65,536 colliding particles at about 175 frames per second (fps) in the steady state on a GeForce GTX 280. Improvements in the Fermi architecture mean that the same code on a GeForce GTX 480 runs at around 460 frames per second, more than two and a half times faster.

Conclusion The ability of CUDA to perform scattered memory writes makes it possible to build dynamic data structures directly on the GPU. Sorting can be used to bin particles into a uniform grid, and also improves memory coherence when accessing the grid. This combined with the computational power of the GPU makes it possible to simulate large systems of interacting particles at interactive rates.

The code included in this sample is by no means optimal and there are many possible further optimizations to this algorithm [see 10 and 13 for details].

July 2012 Page 8 of 12 Particle Simulation using CUDA Bibliography

1. Reeves, W. T. 1983. Particle Systems—a Technique for Modeling a Class of Fuzzy Objects.

Pages:   || 2 |

Similar works:

«DR. WOLFGANG SCHMIDT FROM HISTORICAL RESPONSIBILITY, MORAL OBLIGATION, AND POLITICAL CONVICTION. GERMAN CHANCELLOR WILLY BRANDT’S EFFORTS ON BEHALF OF ISRAEL AND PEACE IN THE MIDDLE EAST Just in time for the 40th anniversary of the first visit to Israel of a sitting German Chancellor, there appeared on June 9th, 2013 an article in the “Welt am Sonntag” (“WamS”) newspaper authored by Michael Wolffsohn and Hagai Tsoref under the heading “Wie Willy Brandt die Friedenskarte...»

«FRIENDLY SKIES? A CULTURAL HISTORY AIR TRAVEL OF IN POSTWAR AMERICA Anke Ortlepp Research Fellow, GHI “Pretzels and Pillows Take a Back Seat to First-Class Perks.”1 “At Dulles and National, the Wait Gets Worse.”2 When air travelers read today’s newspaper headlines announcing the constantly diminishing features of airlines’ in-flight service and the tightening of airport security checks, few may remember the time when travel by airplane was considered an activity that promised fun...»

«KIRSTEN WOLF Education: University College London, Ph.D., 1982-1987. (The degree consisted of a dissertation, “Gyñinga saga,” under the supervision of Peter G. Foote. Submitted 1986, defended 1987.) University College London, M.A. in Scandinavian languages (West Norse studies), 1981-1982. (The degree consisted of four courses: Palaeography and philology; Medieval Latin; Icelandic; Danish.) University of Iceland, B.A. Hons. In Icelandic (major) and Norwegian (minor). First class, 1978-1981....»

«International Symposium on “Old Worlds-New Worlds”: The History of Colonial Cartography 1750-1950 Utrecht University, Utrecht, The Netherlands, 21 to 23 August 2006 Working Group on the History of Colonial Cartography in the 19th and 20 th centuries International Cartographic Association (ICA-ACI) The Colonial Survey Committee and the Mapping of Africa Peter Collier University of Portsmouth, peter.collier@port.ac.uk Abstract Following the ‘Scramble for Africa’, Britain controlled large...»

«KULTURA I WARTOŚCI ISSN 2299-7806 NR 4(12)/ 2014 ARTYKUŁY, s. 87–104 TO PHILOSOPHIZE IS TO REVISE, OR, HOW GERMAN IDEALISM BECAME HISTORICAL IN THE WORK OF ONE SECLUDED AMERICAN THINKER Katie Terezakis In the works of relatively unknown twentieth century American thinker, John William Miller, Kantian idealism is both utilized and transformed into a historical, linguistically focused philosophy of symbolic action. I argue that Miller’s system should be understood as native to the...»

«EPHEMERIS NAPOCENSIS XXII ROMANIAN ACADEMY INSTITUTE OF ARCHAEOLOGY AND HISTORY OF ART CLUJ-NAPOCA EDITORIAL BOARD Editor: Coriolan Horaţiu Opreanu Members: Sorin Cociş, Vlad-Andrei Lăzărescu, Ioan Stanciu ADVISORY BOARD Alexandru Avram (Le Mans, France); Mihai Bărbulescu (Rome, Italy); Alexander Bursche (Warsaw, Poland); Falko Daim (Mainz, Germany); Andreas Lippert (Vienna, Austria); Bernd Päffgen (Munich, Germany); Marius Porumb (Cluj-Napoca, Romania); Alexander Rubel (Iași, Romania);...»

«THE LANDS OF FAUSTO COPPI AROUND NOVI LIGURE, OVADA, TORTONA A “small world”, on the border of several regions, with ancient hilly countryside rising from the sea, of mountains and streams that turn into rivers and flow towards the plains. Alongside cities with ancient hearts, like the Roman Tortona and the Baroque Novi and Ovada, small villages rise, nestling amongst the hills, made noble by ancient manors or lying around old parish churches of the thousandyear history. From Imperial Rome...»

«CSANA Bibliography, February 2012 I. Festschriften, Proceedings, and Collected Essays Adams, Lesley, and James P. Carley, eds. The Archaeology and History of Glastonbury Abbey: Essays in Honour of the Ninetieth Birthday of C. A. Ralegh Radford. W oodbridge: The Boydell Press, 1991. Rev. Richard Gem. Medieval Archaeology 36 (1992), 344-345. [I. Festschriften, Proceedings, and Collected Essays; XIII. Art and Archaeology; XII.A. History] Adams, Sam, ed. Seeing W ales W hole: Essays on the...»

«ORNITHOLIDAYS' TOUR TO JORDAN Birds & History 19 – 28 March 2015 Leader: Mitko Petrakiev ORNITHOLIDAYS' TOUR TO JORDAN Birds & History 19 – 28 March 2015 A Personal Diary Jordan is a country with surprisingly varied terrain and stunningly beautiful landscapes. From the deserts of Rum and the red rocks of Petra to the Mediterranean climate of the north and the tropical climate of the south, it has it all. The end of March is the time of the massive migration of birds moving from Africa to...»

«469 Structural Studies, Repairs and Maintenance of Heritage Architecture XIV The movable and immovable heritage of the Aran Valley and its management E. Ros Barbosa Airau de Patrimoni Cultural, Conselh Generau d’Aran, Spain Abstract Aran Valley, on the north-west side of Catalonia, is located on the septentrional gradient of the central Pyrenees, constituting the source of the Garona River. It borders with France in the north and Aragon in the west. This particular locality, and its singular...»

«Media Anthropology: An Overview Mihai Coman University of Bucharest, Romania mcoman53@yahoo.com ABSTRACT The meeting between cultural anthropology and mass media is, in fact, a meeting between an object of research and a scientific discipline. In such a situation, the discipline brings with it certain delineations, a number of investigating methods, and a group of relatively specific concepts and theories. In the case of media anthropology, the fact that various researchers do not assume that...»

«CURRICULUM VITAE DAVID F. GRAF Department of Religious Studies Contact Information: 704 Ashe Building, POB 248264 Office Phone (direct): (305) 284-4914 University of Miami Department phone: (305) 284-4733 Coral Gables, Florida 33124-4672 E-Mail: DGraf@Miami.edu PROFESSIONAL EXPERIENCE Professor, Department of Religious Studies, University of Miami, Coral Gables, Fl (2004). Director, Program in Classical Antiquity, University of Miami, Coral Gables, FL (2001-2003) Professor, Department of...»

<<  HOME   |    CONTACTS
2016 www.abstract.xlibx.info - Free e-library - Abstract, dissertation, book

Materials of this site are available for review, all rights belong to their respective owners.
If you do not agree with the fact that your material is placed on this site, please, email us, we will within 1-2 business days delete him.