BART: A Benchmark for Animated Ray Tracing
Jonas Lext, Ulf Assarsson, and Tomas Möller
Department of Computer Engineering
Chalmers University of Technology
SE-412 96 Göteborg, Sweden
{lext, uffe, tomasm}@ce.chalmers.se
Technical Report 00-14
Abstract

Due to the advent of ray tracing at interactive speeds, and because there is no established way to measure and compare the performance and quality of animated, ray traced scenes, we present an organized way to do so objectively and accurately in this proposal for BART: A Benchmark for Animated Ray Tracing. BART is a suite of test scenes, placed in the public domain, designed to stress ray tracing algorithms; both the camera and the objects are animated parametrically. Equally important, BART is also a set of rules on how to measure rendering performance. We also propose how to measure and compare the error in the rendered images when approximating algorithms are used.
1 Introduction

In order to measure performance, benchmarks are needed, as they allow people to compare performance in a more accurate and objective way. A benchmark is often a suite of test cases or executables, together with a specified procedure for how to report performance with that benchmark. Benchmarks are needed in a variety of areas of computer graphics, such as radiosity, global illumination, collision detection, animation, image-based rendering, polygon rendering, and any other area where performance must be measured and compared.
Currently, there are only benchmarks in a few of these areas, and our effort is an attempt to bridge that gap. We saw the need for BART: A Benchmark for Animated Ray Tracing, because there currently is no benchmark for this, and because at least two groups [1, 2] have been ray tracing fairly complex and realistic scenes at interactive speeds (by “interactive speeds”, we mean any rate above one frame per second).
Another reason is that acceleration data structures for animated ray tracing have not been studied much, but probably will be in the future. The main contributions of BART are: a set of parametrically animated test scenes that are designed to stress ray tracing algorithms and that are easy for anyone to use, and a set of reliable performance measurements that allows the user of BART to compare the performance of different ray tracing algorithms. For approximating algorithms (that is, algorithms that might not produce entirely correct values for each pixel, but rather approximate values), we also define how to measure the quality of the approximated images.
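One natural way to quantify the quality of an approximating algorithm is a per-frame error metric against a fully ray-traced reference image. As a minimal sketch of the idea (the specific metric, RMS error over all pixels and channels, is our illustrative choice here, not necessarily the one BART prescribes):

```python
import numpy as np

def frame_rms_error(approx, reference):
    """RMS error between an approximated frame and a fully ray-traced
    reference frame. Both are H x W x 3 arrays of channel values."""
    diff = approx.astype(np.float64) - reference.astype(np.float64)
    return float(np.sqrt(np.mean(diff ** 2)))

# A benchmark run would report one error value per animation frame:
# errors = [frame_rms_error(a, r) for a, r in zip(approx_frames, ref_frames)]
```

Reporting the error per frame, rather than averaged over the whole animation, also exposes frames where an approximating algorithm degrades badly.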
The organization of this paper is as follows. In the next section we briefly review the different, currently available benchmarks for graphics. Section 3 identifies events and scenarios that potentially stress existing ray tracing algorithms. These are then implemented in the test scenes, described in section 4. In section 5, we discuss and motivate the measurements that we propose should be reported when the benchmark is used. Short implementation notes follow in section 6. Finally, we conclude and present some future work.
2 Related Work

For graphics hardware vendors and implementors there are many different benchmarks. The Standard Performance Evaluation Corporation (SPEC) has a subgroup called the Graphics Performance Characterization Group (GPC), which has a set of graphics benchmarks. Their SPECglperf measures the performance of rendering low-level primitives, such as points, lines, and triangles, with OpenGL. In contrast, GPC's SPECviewperf for OpenGL measures the rendering performance of a set of real models. Both SPECglperf and SPECviewperf are targeted towards vendors and implementors of graphics hardware, since it is not allowed to alter the program (e.g., implement an occlusion algorithm) to make the execution more efficient. GPC also has benchmarks for a few commercial programs, but these require fully licensed versions of the programs and are thus not available to everyone. For PCs there are also several other benchmarks for measuring the performance of the graphics subsystem: 3D WinBench, 3DMark, and QuakeIII, to mention a few.
Would it be possible to use the scenes from these benchmarks, which are targeted towards graphics hardware systems, for evaluating the performance of ray tracing algorithms? We believe that this would be unsuitable, or even impossible in some cases. The scenes in these benchmarks may be surrounded by license agreements, which limit their use for other purposes than the original ones. For example, a user of the SPEC benchmarks must report measured results strictly in accordance with the rules published by SPEC. As different performance parameters might be interesting for ray tracing algorithms and graphics hardware, this would rule out the use of SPEC models for comparing the performance of ray tracing algorithms. Naturally, these benchmarks are also constructed for the single purpose of testing graphics hardware specific features, or specific applications (e.g., CAD applications or game engines) well suited for graphics hardware. For example, CAD applications often deal with single static objects hanging in free space. However, to really stress ray tracing algorithms one would want scenes representing complete environments. Another example is the Quake scenes, which are optimized to make rendering as fast as possible using the Quake game engine; the scene description is in a binary format and has an acceleration data structure hard-coded into it. In contrast, the scenes presented in our benchmark are specifically designed with the intent of stressing ray tracing algorithms, and are described using a simple and readable format for greater flexibility.
In , a framework for a performance evaluation system for real-time rendering algorithms in Virtual Reality is presented. This benchmark is also targeted at polygon rendering hardware, and no scenes seem to be available on the Internet.
Also worth mentioning are Pete Shirley's eleven scenes for testing radiosity algorithms: http://radsite.lbl.gov/mgf/scenes.html. These were made for benchmarking quality rather than speed, and are not animated.
For ray tracing there is, to our knowledge, only one recognized benchmark: the Standard Procedural Database (SPD) by Haines  from 1987. The SPD is targeted at ray tracing algorithms for single static images. Other drawbacks are that the images are not necessarily realistic, and that almost the entire geometry of each scene is located in the view frustum of the camera, which is not normally the case.
While the SPD has been perfectly valid and widely used for over a decade, progress in computer architecture and algorithms has moved beyond what it was first intended for. For example, there are at least two projects that have ray traced fairly complex and realistic images at interactive speeds [1, 2].
Formella and Gill  present an alternative benchmark that measures the performance of tracing rays in a ray tracer. That benchmark is also for single static images, and their scenes consist of a cube and a set of distributed spheres or parallelograms. To our knowledge, this benchmark has not been widely used.
A vast number of ray tracing algorithms and variants have been developed over the years. In this paper, we will only reference a few of those that we think apply.
3 Potential Stresses

In order to construct a benchmark with a relatively long lifetime, we first set out to identify what stresses existing ray tracing algorithms and thus decreases performance. The goal was then to implement each of these potential stresses in the benchmark. The following scenarios or events tend to stress different efficiency schemes for ray tracing:
1. Hierarchical animation using translation, rotation, and scaling.
2. Unorganized animation of objects (i.e., not just combinations of translation, rotation, and scaling).
3. “Teapot in the stadium” problem.
4. Low frame-to-frame coherency.
5. Large working-set sizes.
6. Overlap of bounding volumes or overlap of their projections.
7. Changing object distribution.
Below we describe each of the items above, why we believe that they should in some way be incorporated in a benchmark for ray tracing, and which ray tracing acceleration scheme(s) in particular they stress.
Stress 1: Hierarchical animation

During modeling, the easiest and most natural approach is to model each object in its own frame of reference. When building a scene from such objects, a hierarchical representation offers a simple and flexible way to express how objects are positioned and oriented, and how they move relative to each other.
However, in an animated or interactive ray tracer, the hierarchy of local coordinate systems and hierarchical animation may also stress the scene rendering. To test a ray against an object in the scene, both must be expressed in the same frame of reference. A straightforward solution is to continuously transform rays into the coordinate systems of the individual objects. However, the transformation involves matrix multiplication, which is floating-point intensive compared to, e.g., checking a bounding volume (BV) for intersection, and thus increases rendering time. An alternative is to transform the camera and all the objects in the scene into the global coordinate system once, before spawning any rays. However, this may require a lot of extra memory; an advantage of keeping the hierarchy is that if a complex object appears multiple times in a scene, memory can be saved by keeping only one copy of the object model in memory and referencing it through pointers in the hierarchy.
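The ray-to-local-frame transformation described above can be sketched as follows. This is a minimal illustration, assuming 4x4 homogeneous model matrices; the function name and representation are our own, not part of BART:

```python
import numpy as np

def transform_ray(origin, direction, world_from_object):
    """Bring a world-space ray into an object's local frame by applying
    the inverse of the object's 4x4 model matrix. The origin is a point
    (w = 1); the direction is a vector (w = 0), so it is not translated."""
    object_from_world = np.linalg.inv(world_from_object)
    o = object_from_world @ np.append(origin, 1.0)
    d = object_from_world @ np.append(direction, 0.0)
    return o[:3], d[:3]
```

Note the cost the text refers to: each such call performs two matrix-vector products (plus an inversion, which can be precomputed per object per frame), whereas a BV rejection test needs only a handful of comparisons.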
When animation is added to the scene, all or parts of the acceleration data structures will most likely have to be rebuilt between frames. Depending on the amount of change in the scene, this can seriously stress the reconstruction phase when using octrees , uniform grids , recursive grids , hierarchical grids [10, 11, 12], BSP trees [13, 14], or bounding volume hierarchies (BVHs) [15, 16]. In our benchmark, we have excluded the animation of light sources, because it simplified our animations and the same stress can be achieved with animated objects. Therefore, this is also a serious stress for light buffers .
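To make the per-frame reconstruction cost concrete, here is a sketch of rebuilding a uniform grid from scratch, which an animated ray tracer may have to do every frame. Primitives are represented simply by their axis-aligned bounding boxes; the hash-map-of-cells representation is an illustrative assumption, not a prescribed implementation:

```python
from collections import defaultdict

def rebuild_uniform_grid(primitives, cell_size):
    """Rebuild a uniform grid from scratch. Each primitive is given by
    its axis-aligned bounding box (lo, hi), and is inserted into every
    cell that its box overlaps."""
    grid = defaultdict(list)
    for prim_id, (lo, hi) in enumerate(primitives):
        c_lo = [int(lo[k] // cell_size) for k in range(3)]
        c_hi = [int(hi[k] // cell_size) for k in range(3)]
        for x in range(c_lo[0], c_hi[0] + 1):
            for y in range(c_lo[1], c_hi[1] + 1):
                for z in range(c_lo[2], c_hi[2] + 1):
                    grid[(x, y, z)].append(prim_id)
    return grid
```

The work is linear in the number of (primitive, cell) overlaps, so a large animated scene pays this cost on every frame in which the grid is invalidated.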
Stress 2: Unorganized animation

To cope with transforms, ray tracers often transform the ray with the inverse transform instead of transforming the object itself and its efficiency data structures. Thus, for some acceleration schemes (e.g., a static grid or a BVH around an object), the efficiency data structures do not have to be rebuilt each frame. This is easily done for translations, rotations, and scalings, but other, “less organized” kinds of animation are often used, and then this approach cannot be applied. Since, to our knowledge, all currently available acceleration schemes for ray tracing (e.g., [7, 8, 9, 12, 14, 16]) must rebuild their efficiency structures for such animations, this is a serious stress on all ray tracing algorithms.
Stress 3: Teapot in the stadium

The “teapot in the stadium” problem  refers to the situation where a small, detailed object (the teapot) is located inside a relatively large surrounding object (the stadium). This tends to stress uniform grid-based  algorithms and octree-based schemes : because the uniform grid has voxels of finite size and the octree has finite depth, the teapot will be located in one or only a few voxels or octree nodes. For example, if the viewer looks at the teapot such that it covers most of the screen, only one or a few voxels/octree nodes will be traversed, each containing many primitives, which degrades performance enormously.
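A back-of-the-envelope calculation illustrates the effect. All concrete numbers below (stadium extent, grid resolution, teapot size and triangle count) are hypothetical, chosen only to show how a scene-sized grid collapses a dense small object into a single cell:

```python
# Grid resolution is chosen from the scene (stadium) extent, so a
# small dense object (teapot) falls entirely inside one cell.
scene_extent = 200.0                     # stadium diameter (hypothetical)
grid_res = 64                            # cells per axis (hypothetical)
cell_size = scene_extent / grid_res      # 3.125 units per cell
teapot_extent = 0.3                      # teapot fits well inside one cell
teapot_triangles = 10_000                # hypothetical triangle count

cells_covered = max(1, int(teapot_extent / cell_size))
triangles_per_cell = teapot_triangles // cells_covered ** 3
# All 10 000 teapot triangles land in a single cell, so any ray
# entering that cell must be tested against every one of them.
```

An adaptive structure (e.g., a recursive or hierarchical grid) avoids this by refining resolution locally around the teapot.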
Stress 4: Low frame-to-frame coherency

Situations where the frame-to-frame coherency is low tend to stress reprojection algorithms [19, 20, 21, 22], since those use information from previous frames. If the difference between two frames is too large, either the performance of such algorithms or the quality of the rendered images degrades. Similarly, frameless rendering techniques [23, 24] produce images of poorer quality when the frame-to-frame coherency is low. In this paper, we use the name approximating algorithms for all algorithms that may generate images that are not entirely correct for every frame they generate.
Stress 5: Large working-set sizes

An important problem in computer architecture is the increasing gap between the computational speed of the processor and the speed with which the memory system can feed the processor with data. The conventional solution to this problem is to use a cache hierarchy between the processor and memory, relying on spatial and temporal coherence in the data access patterns . Current computer architectures are usually equipped with two levels of caches (L1 and L2) between the processor and main memory.
Typical sizes for the L1 and L2 caches range from 16–64 kB and 128 kB–2 MB, respectively. This means that the scenes of BART should be significantly larger than the cache sizes found in contemporary microprocessors. However, a scene size much larger than a cache size does not necessarily imply a problem. Only the size of the scene data actually used when rendering an individual frame in an animation, the so-called working set , matters. For example, if the working set of one frame in an animation is larger than the L2 cache, it will most likely expunge data that could have been reused in the following frame, raising the L2 cache miss ratio. This would be the case if the majority of the primitives of a large scene are potentially visible in a single frame. Note that this can also happen indirectly; due to reflecting objects, many more scene primitives might be used than those that actually occur in the view frustum. Thus, the reflection of rays most likely increases the working set encountered in a frame.
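A rough estimate shows how quickly a frame's working set outgrows the cache sizes quoted above. The 48 bytes per triangle used here is a hypothetical figure (three 3-component float vertices plus padding), not a measured one:

```python
def working_set_bytes(n_triangles, bytes_per_triangle=48):
    """Rough working-set estimate: geometry touched while rendering one
    frame, at a hypothetical 48 bytes per triangle."""
    return n_triangles * bytes_per_triangle

L2_CACHE = 2 * 1024 * 1024  # 2 MB, the upper end of the sizes cited above

# A frame touching 100 000 triangles already exceeds the L2 cache,
# so data reuse between consecutive frames is unlikely:
exceeds_l2 = working_set_bytes(100_000) > L2_CACHE
```

Under these assumptions, fewer than 50 000 visible triangles already fill a 2 MB L2 cache, before counting shading data, texture accesses, or the acceleration structure itself.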
The same situation applies to the L1 cache. Here the working set might be the set of primitives visible during a set of consecutive scan lines. If the number of primitives visible is much larger than what fits in the L1 cache, the L1 cache miss ratio will be high. Therefore, it might be desirable to develop algorithms that try to exploit data coherency as much as possible, to increase the achieved performance of the memory system.