Discovery in Complex or Massive Datasets:

Common Statistical Themes

A Workshop funded by the National Science Foundation

October 16-17, 2007

[Figure 1, top panel: slide “Bayes Meets SUSY”, L. Roszkowski, PhyStat 2007, Statistical Issues at the LHC. The slide text, on the 5-D mSUGRA parameter space, Markov Chain Monte Carlo, and the choice of prior, is only partially recoverable.]

Figure 1: Top: Bayes meets SUSY: a projection to 2 dimensions of a 5-dimensional posterior density, computed for a 5-dimensional model of possible physics beyond the current Standard Model of particle physics. Bottom: Complex data types needed to predict function in the human genome (fragment shows ≥ 10^6 basepairs).

Executive Summary

We report on a workshop, “Discovery in Complex or Massive Data Sets: Common Statistical Themes”, held in Washington, October 16-17, 2007, funded by NSF’s Division of Mathematical Sciences. We connect with a later workshop, “Data Enabled Science in the Mathematical and Physical Sciences”, held in Washington, March 29-30, 2010, funded by NSF’s Directorate of Mathematical and Physical Sciences.

Research responding to important scientific and societal questions now requires the generation and understanding of vast amounts of often highly complex data. The 2007 workshop dealt with crosscutting issues arising in the analysis of such data sets, with a particular focus on the role of statistical analysis. This was done through selected examples matching scientific and societal interests. In particular there were sessions on:

• Genomics and other areas of the biosciences that play a key role both in fundamental biology and in our current efforts to cure human diseases.

• Computer models with an emphasis on modeling in the atmospheric sciences that plays a critical role in climate change forecasting.

• Finance, economics, and risk management focusing on problems of financial and other economic forecasting and also on analysis of the flow of potential new regulatory data.

• Particle and astrophysics pointing to a plethora of needs and issues, including scientific questions such as solving massive inverse problems as they arise in the study of dark energy, statistical modeling of galactic filamentary structures, and policy issues such as determining resource allocation among expensive experiments.

• Network modeling pointing to an old type of data appearing with new complexity and size from many sources: the Internet, ecological networks, biochemical pathways, etc.

In addition, there were two crosscutting sessions:

• Sparsity, which reflects how simply we can represent information, has been recognized as the key feature that the new massive data sets must have for us to analyze them at all. Sparsity figures prominently in compressed sensing, now a major topic as the number and types of detectors and the amount of data they can generate have grown exponentially.

• Machine Learning developed in computer science and statistics to integrate computational considerations with data modeling. Methods such as clustering look for sparsity or more generally structure in the data. The field’s principles are entirely statistical. Its methods play an important role in speech recognition, document retrieval, web-search, computer vision, bioinformatics, neuroscience, and many other areas.

In its treatment of analysis, the 2007 workshop foreshadowed the 2010 Data Enabled Science Workshop¹, although the latter examined, and gave policy recommendations for, all divisions in the directorate rather than focusing on the nature of the science in one subdiscipline. The same themes nevertheless came up: all or most divisional sections stressed the growth in size and complexity of data, interdisciplinary collaboration as key to modern progress, and the need to develop common large databases for analysis. The use of such existing databases in the biomedical sciences and astrophysics was implicit in the presentations of the 2007 workshop. More broadly, advances in statistics and mathematics will be crucial for the development of data-enabled science (DES) in other disciplines.

In their respective ways both workshops point to the need to support organization and analysis of our massive and high-dimensional data sets as a key to future advances.

¹ [...] and a 2010 E.U. report, “Riding the wave: How Europe can gain from the rising tide of scientific data”.

1 Background

This document is the report of a Workshop on Discovery in Complex or Massive Datasets: Common Statistical Themes, held October 16-17, 2007 in Washington, D.C. The idea and funding for the workshop came from Dr. Peter March, Director of the Division of Mathematical Sciences (DMS) at the National Science Foundation (NSF).

The impetus for the meeting was the observation that interdisciplinary research in statistics engages with so many fields of science that it is neither possible, nor perhaps appropriate, for DMS to fund all of it, either alone or through partnerships, though successful examples of the latter certainly exist. At the same time, DMS is the primary disciplinary home for statistics within NSF, and so in particular is the primary locus within the Foundation for workforce development efforts in statistics. In such an environment, what ideas might guide DMS in its funding of statistics research?

The workshop and report develop the notion of “intersections” – that part of statistical methods and theory that has, or seems likely to have, impact in multiple scientific domains.

The intent for the short workshop was to be illustrative rather than encyclopedic. This is not, therefore, a report on the “future of statistics”, and it deliberately does not contain formal consensus recommendations. However, we hope that the sampling of research areas in this short report illustrates the existence of these intersectional topics and the importance of research into their development.

2 Introduction

The amount and complexity of data generated to support contemporary scientific investigation continue to grow rapidly, following their own type of Moore’s Law [11]. In domains from genomics to climate science, statisticians are actively engaged in interdisciplinary research teams. In some areas, automated processes collect and process huge amounts of information; in others, simulations of complex systems are designed to generate information about large-scale behavior; and in still other areas, the very sources of data are products of the information age.

There is substantial current activity to develop statistical ideas, methods and software in many of these domains, which include astronomy, genomics, climate science, financial market analysis and sensor networks. Statisticians are engaged in (often large) interdisciplinary teams, and frequently receive significant research support from the relevant scientific discipline.

The history of statistics shows that, while frequently initially arising in response to challenges in specific scientific domains, statistical methods and associated theory often achieve broader success and power by being subsequently applied to subjects far remote from those of origin. Well known examples include the analysis of variance, proportional hazard models and the application of sparsity ideas in signal recovery.

We see enormous opportunity, then, in advancing the study of the “intersections” arising from statistical research in today’s Age of Information – statistical problems, theories (including probabilistic models), tools and methods that arise in or are relevant to multiple domains of scientific enquiry, and as such, are moving or should move into the “core”.

The workshop aimed to enumerate some of today’s most intellectually compelling challenges arising out of these intersections, and was guided by the hope of stimulating future research advances that will extend and enhance our data analytic toolkit for scientific discovery.

In order to have a title that is focused yet broadly inclusive, we chose “Discovery in Complex or Massive Datasets: Common Statistical Themes”. Here “massive” means large relative to existing capability in some way, including, but not restricted to, many cases (sample size), many variables (dimension), or many datasets (sensor networks).

The workshop took a broad view of research in statistics, and included researchers who may not identify themselves as statisticians yet who feel that advances in statistics are central to advances in science and society.

The body of the report contains short summaries of each of the sessions at the workshop.

In this introduction, we illustrate three of the themes with brief paragraphs, indicating in brackets the sessions in which these themes come up explicitly or implicitly. We conclude with some reflections on national needs that will be served by a focus on statistical intersections.

Sparsity. [§3.1, 3.2, 3.3, 3.4, 3.6] A preference for parsimony in scientific theories, captured in principles such as “Occam’s razor”, has long influenced statistical modeling and estimation. The size of contemporary datasets and the number of variables collected make the search for, and exploitation of, sparsity even more important. For example, out of a huge list of proteins or genes, only an (unknown) few may be active in a particular metabolic or disease process, or sharp changes in a generally smooth signal or image may occur at a small number of points or boundaries. The sparsity of representation may be “hidden”: revealed only with the use of new function systems such as wavelets or curvelets.

The theme of sparsity draws upon and stimulates research in many areas of mathematics, statistics and computing: harmonic analysis and approximation theory (for the development and properties of representations), numerical analysis and scientific computation (the associated algorithms), statistical theory and methods (techniques and properties when applied to noisy data).
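As a rough illustration of how sparsity can be exploited with noisy data, the following sketch applies soft thresholding to a coefficient vector in which only a few entries are truly nonzero. The function names, the signal sizes, and the choice of the universal threshold sigma * sqrt(2 log n) are assumptions made for this example; the workshop report does not prescribe a particular method.

```python
import math
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero; anything within the threshold becomes 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def denoise(coefficients, threshold):
    """Apply soft thresholding to every coefficient."""
    return [soft_threshold(c, threshold) for c in coefficients]


def universal_threshold(noise_sd, n):
    """Universal threshold sigma * sqrt(2 log n) for n noisy coefficients."""
    return noise_sd * math.sqrt(2.0 * math.log(n))


if __name__ == "__main__":
    # Hypothetical example: 200 coefficients, only three of them "active".
    rng = random.Random(0)
    n, noise_sd = 200, 1.0
    truth = [0.0] * n
    for index in (10, 50, 120):
        truth[index] = 8.0
    noisy = [t + rng.gauss(0.0, noise_sd) for t in truth]
    estimate = denoise(noisy, universal_threshold(noise_sd, n))
    kept = [i for i, e in enumerate(estimate) if e != 0.0]
    print("nonzero coefficients after thresholding:", kept)
```

Thresholding costs one pass over the data, which is part of why sparsity-based methods scale to very large problems. In practice the coefficients would typically come from a transform (for example a wavelet transform) in which the signal is sparse.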

Sparsity ideas have recently given birth to a new circle of ideas and technologies known collectively as “Compressed Sensing”. It is common experience that many images can be compressed greatly without significant loss of information. So, why not design a data collection, or sensing, mechanism that need collect only roughly the number of bits required for the compressed representation? It has recently been shown that this can be done in a variety of settings in which sparsity is present, by a judicious introduction of random sampling.

A number of intellectual trends in mathematics and statistics have pointed toward and culminated in the articulation of the Compressed Sensing phenomenon: approximation theory, geometric functional analysis, random matrices and polytopes, robust statistics and statistical decision theory. Once articulated mathematically, CS has stimulated development of new algorithms in fields ranging from magnetic resonance imaging to analog-to-digital conversion to seismic imaging.
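To make the compressed sensing idea concrete, here is a minimal sketch in which a sparse vector is observed through a small number of random linear measurements and then recovered greedily. Orthogonal matching pursuit is used here only as one convenient recovery algorithm; it is a stand-in chosen for brevity, and the Gaussian measurement matrix, the problem sizes, and all function names are assumptions of this example.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def least_squares(cols, y):
    """Coefficients c minimizing ||sum_j c[j] * cols[j] - y|| (normal equations)."""
    k = len(cols)
    # Augmented system [G | b] with G = cols^T cols and b = cols^T y.
    aug = [[dot(cols[i], cols[j]) for j in range(k)] + [dot(cols[i], y)]
           for i in range(k)]
    for p in range(k):
        # Partial pivoting, then eliminate below the pivot.
        best = max(range(p, k), key=lambda r: abs(aug[r][p]))
        aug[p], aug[best] = aug[best], aug[p]
        for r in range(p + 1, k):
            factor = aug[r][p] / aug[p][p]
            for c in range(p, k + 1):
                aug[r][c] -= factor * aug[p][c]
    coef = [0.0] * k
    for p in reversed(range(k)):
        tail = sum(aug[p][c] * coef[c] for c in range(p + 1, k))
        coef[p] = (aug[p][k] - tail) / aug[p][p]
    return coef


def omp(columns, y, sparsity):
    """Orthogonal matching pursuit.

    columns[j] is the j-th column of the measurement matrix (length m),
    y is the measurement vector (length m), sparsity is the number of
    nonzero entries to recover.
    """
    residual = list(y)
    support, coef = [], []
    for _ in range(sparsity):
        # Score each column by its normalized correlation with the residual.
        scores = [abs(dot(col, residual)) / dot(col, col) ** 0.5 for col in columns]
        for j in support:
            scores[j] = -1.0  # never pick the same column twice
        support.append(max(range(len(columns)), key=scores.__getitem__))
        chosen = [columns[j] for j in support]
        coef = least_squares(chosen, y)
        residual = [yi - sum(c * col[r] for c, col in zip(coef, chosen))
                    for r, yi in enumerate(y)]
    x = [0.0] * len(columns)
    for j, c in zip(support, coef):
        x[j] = c
    return x


if __name__ == "__main__":
    # Hypothetical sizes: 100 unknowns, 30 random measurements, 3 nonzeros.
    rng = random.Random(0)
    m, n = 30, 100
    columns = [[rng.gauss(0.0, 1.0) for _ in range(m)] for _ in range(n)]
    truth = {5: 1.5, 40: -2.0, 77: 3.0}
    y = [sum(v * columns[j][r] for j, v in truth.items()) for r in range(m)]
    estimate = omp(columns, y, sparsity=3)
    print("recovered support:", [j for j, v in enumerate(estimate) if v != 0.0])
```

The point of the sketch is the ratio of measurements to unknowns: far fewer measurements than unknowns are taken, and sparsity is what makes recovery plausible. Recovery with a greedy method is not guaranteed for every random draw.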

Computer and Simulation-Based Models. [§3.1, 3.3, 3.4, 3.5] Mathematical models intended for computational simulation of complex real-world processes are a crucial ingredient in virtually every field of science, engineering, medicine, and business, and in everyday life as well. Cellular telephones attempt to meet a caller’s needs by optimizing a network model that adapts to local data, and people threatened by hurricanes decide whether to stay or flee depending on the predictions of a continuously updated computational model.

Growth in computing power and matching gains in algorithmic speed and accuracy have vastly increased the applicability and reliability of simulation—not only by drastically reducing simulation time, thus permitting solution of larger and larger problems, but also by allowing simulation of previously intractable problems.

The intellectual content of computational modeling comes from a variety of disciplines, including statistics and probability, applied mathematics, operations research, and computer science, and the application areas are remarkably diverse. Despite this diversity of methodology and application, there are a variety of common challenges in developing, evaluating and using complex computer models of processes. In trying to predict reality (with uncertainty bounds), some of the key issues that have arisen are: use of model approximations (emulators) as surrogates for expensive simulators, for calibration/prediction tasks and in optimization or decision support; dealing with high dimensional input spaces; validation and utilization of computer models in situations with very little data, and/or functional (possibly multivariate) outputs; non-homogeneity, including jumps and phase changes as we move around the input space; implementation and transference methodology to current practice; efficient MCMC algorithms and prior assessments; optimization and design.
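The notion of an emulator mentioned above can be sketched as follows: a handful of runs of an expensive simulator are used to fit a cheap surrogate that can then be evaluated anywhere in the input space. The sketch uses the posterior mean of a zero-mean Gaussian process with a squared-exponential kernel, which is one common choice rather than the only one. The simulator here is a hypothetical one-line stand-in, and the length scale, jitter, and function names are assumptions of this example.

```python
import math


def kernel(a, b, length_scale):
    """Squared-exponential covariance between two scalar inputs."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def solve(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for p in range(n):
        best = max(range(p, n), key=lambda r: abs(aug[r][p]))
        aug[p], aug[best] = aug[best], aug[p]
        for r in range(p + 1, n):
            factor = aug[r][p] / aug[p][p]
            for c in range(p, n + 1):
                aug[r][c] -= factor * aug[p][c]
    x = [0.0] * n
    for p in reversed(range(n)):
        tail = sum(aug[p][c] * x[c] for c in range(p + 1, n))
        x[p] = (aug[p][n] - tail) / aug[p][p]
    return x


def fit_emulator(inputs, outputs, length_scale=0.2, jitter=1e-10):
    """Return a cheap predictor: the Gaussian-process posterior mean."""
    gram = [[kernel(a, b, length_scale) + (jitter if i == j else 0.0)
             for j, b in enumerate(inputs)]
            for i, a in enumerate(inputs)]
    weights = solve(gram, outputs)

    def predict(x):
        return sum(w * kernel(x, a, length_scale) for w, a in zip(weights, inputs))

    return predict


def expensive_simulator(x):
    """Hypothetical stand-in for a costly computer model."""
    return math.sin(2.0 * math.pi * x)


if __name__ == "__main__":
    design = [i / 5 for i in range(6)]            # six simulator runs
    runs = [expensive_simulator(x) for x in design]
    emulator = fit_emulator(design, runs)
    for x in (0.1, 0.5, 0.9):
        print(x, emulator(x), expensive_simulator(x))
```

A full treatment would also report the predictive variance, which is what supplies the uncertainty bounds the text refers to; this sketch returns only the mean.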

Clustering. [§3.1, 3.6, 3.7] Clustering is another important core problem in data analysis. It is analogous to sparsity in that (1) it involves statistically sound methods for reducing the dimensionality of data, and (2) it is a nexus for the research efforts of multiple overlapping communities. One general motivation for clustering is that there are often limitations on resources available for data analysis, an issue that is particularly pertinent for massive data sets. Most statistical algorithms run in time that is at least proportional to the number of data points, and many algorithms run in quadratic or cubic time (e.g., linear regression).
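As one example of clustering used to summarize data, the sketch below implements Lloyd's k-means iteration: each point is assigned to its nearest center, and each center is then moved to the mean of its assigned points. k-means is chosen here only because it is short and widely known; the function names and the requirement that the caller supply initial centers are assumptions of this example.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, initial_centers, max_iterations=100):
    """Lloyd's algorithm; returns (centers, labels)."""
    centers = [tuple(c) for c in initial_centers]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: nearest center for every point.
        new_labels = [
            min(range(len(centers)), key=lambda k: squared_distance(p, centers[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for k in range(len(centers)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                centers[k] = tuple(sum(coord) / len(members)
                                   for coord in zip(*members))
    return centers, labels


if __name__ == "__main__":
    points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    centers, labels = kmeans(points, initial_centers=[(0, 0), (10, 10)])
    print(centers, labels)
```

Each iteration costs time proportional to the number of points times the number of centers. Once the centers are found, later analysis can work with a few representatives instead of the full data set, which is the resource argument made above.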
