
Using Aperiodic Reinforcement for Directed

Self-Organization During Development

PR Montague P Dayan SJ Nowlan A Pouget TJ Sejnowski

CNL, The Salk Institute

10010 North Torrey Pines Rd.

La Jolla, CA 92037, USA

read@helmholtz.sdsc.edu

Abstract

We present a local learning rule in which Hebbian learning is conditional on an incorrect prediction of a reinforcement signal. We propose a biological interpretation of such a framework and display its utility through examples in which the reinforcement signal is cast as the delivery of a neuromodulator to its target. Three examples are presented which illustrate how this framework can be applied to the development of the oculomotor system.

1 INTRODUCTION

Activity-dependent accounts of the self-organization of the vertebrate brain have relied ubiquitously on correlational (mainly Hebbian) rules to drive synaptic learning. In the brain, a major problem for any such unsupervised rule is that many different kinds of correlations exist at approximately the same time scales and each is effectively noise to the next. For example, relationships within and between the retinae among variables such as color, motion, and topography may mask one another and disrupt their appropriate segregation at the level of the thalamus or cortex.

It is known, however, that many of these variables can be segregated both within and between cortical areas, suggesting that certain sets of correlated inputs are somehow separated from the temporal noise of other inputs. Some form of supervised learning appears to be required. Unfortunately, detailed supervision and selection in a brain region is not a feasible mechanism for the vertebrate brain. The question thus arises: What kind of biological mechanism or signal could selectively bias synaptic learning toward a particular subset of correlations? One answer lies in the possible role played by diffuse neuromodulatory systems.

It is known that multiple diffuse modulatory systems are involved in the self-organization of cortical structures (e.g. Bear and Singer, 1986) and some of them appear to deliver reward and/or salience signals to the cortex and other structures to influence learning in the adult. Recent data (Ljunberg et al., 1992) suggest that this latter influence is qualitatively similar to that predicted by Sutton and Barto's (1981, 1987) classical conditioning theory. These systems innervate large expanses of cortical and subcortical turf through extensive axonal projections that originate in midbrain and basal forebrain nuclei and deliver such compounds as dopamine, serotonin, norepinephrine, and acetylcholine to their targets. The small number of neurons comprising these subcortical nuclei relative to the extent of the territory their axons innervate suggests that the nuclei are reporting scalar signals to their target structures.

In this paper, these facts are synthesized into a single framework which relates the development of brain structures and conditioning in adult brains. We postulate a modification to Hebbian accounts of self-organization: Hebbian learning is conditional on an incorrect prediction of future delivered reinforcement from a diffuse neuromodulatory system. This reinforcement signal can be derived both from externally driven contingencies, such as proprioception from eye movements, and from internal pathways leading from cortical areas to subcortical nuclei.

The next section presents our framework and proposes a specific model for how predictions about future reinforcement could be made in the vertebrate brain utilizing the firing in a diffuse neuromodulatory system (figure 1). Using this model we illustrate the framework with three examples suggesting how mappings in the oculomotor system may develop. The first example shows how eye movement commands could become appropriately calibrated in the absence of visual experience (figure 3). The second example demonstrates the development of a mapping from a selected visual target to an eye movement which acquires the target. The third example describes how our framework could permit the development and alignment of multimodal maps (visual and auditory) in the superior colliculus. In this example, the transformation of auditory signals from head-centered to eye-centered coordinates results implicitly from the development of the mapping from parietal cortex onto the colliculus.

$$\Delta w_t = \alpha\, x_t\, y_t\, r_t$$

where, all at time t, $w_t$ is a connection weight, $x_t$ an input measure, $y_t$ an output measure, $r_t$ a reinforcement measure, and $\alpha$ is the learning rate.

In this case, $r$ can be driven by either external events in the world or by cortical projections (internal events) and it picks out those correlations between $x$ and $y$ about which the system learns. Learning is shut down if nothing occurs that is independently judged to be significant, i.e. for events for which $r$ is 0.
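A minimal sketch of such a reinforcement-gated rule, assuming the weight change is the plain product of learning rate, input, output, and reinforcement. The function name and all numeric values are illustrative and not taken from the paper.

```python
def gated_hebbian_update(w, x, y, r, alpha=0.1):
    """Return the weight after one reinforcement-gated Hebbian step.

    Assumption: the update is alpha * x * y * r, so the reinforcement r
    gates an otherwise plain Hebbian term x * y.
    """
    return w + alpha * x * y * r


# Correlated pre- and postsynaptic activity with reinforcement present.
w_reinforced = gated_hebbian_update(w=0.5, x=1.0, y=1.0, r=1.0)

# The same activity with r = 0: no weight change, learning is shut down.
w_unreinforced = gated_hebbian_update(w=0.5, x=1.0, y=1.0, r=0.0)
```

Under this assumed form, correlations between x and y change a weight only while r is nonzero, which is the selective bias described in the text.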

–  –  –

3 MAKING PREDICTIONS IN THE BRAIN

In our account of RL in the brain, the cortex is the structure that makes predictions of future reinforcement. This reinforcement is envisioned as the output of subcortical nuclei which deliver various neuromodulators to the cortex that permit Hebbian learning. Experiments have shown that various of these nuclei, which have access to cortical representations of complex sensory input, are necessary for instrumental and classical conditioning to occur (Ljunberg et al., 1992).





Figure 1: Making predictions about future reinforcement. Layer I is an array of units that projects topographically onto layer II. (A) Weights from I onto II develop according to equation 3 and represent the value function $V_t$. (B) The weights from II onto R are fixed. The prediction of future reward by the weights onto II is a scalar because the highly convergent excitatory drive from II to the reinforcement nucleus (R) effectively sums the input. (C) External events in the world provide independent excitatory drive to the reinforcement nucleus. (D) Scalar signal which results from the output firing of R and is broadcast throughout layer II. This activity delivers to layer II the neuromodulator required for Hebbian learning. The output firing of R is controlled by temporal changes in its excitatory input and habituates to constant or slowly varying input. This makes for learning in layer II according to equation 3 (see text).

Figure 1 shows one TD scenario in which a pattern of activity in a region of cortex makes a prediction about future expected reinforcement. At time t, the prediction of future reward $V_t$ is viewed as an excitatory drive from the cortex onto one or more subcortical nuclei (pathway B). The high degree of convergence in B ensures that this drive predicts only a scalar output of the nucleus R. Consider a pattern of activity onto layer II which provides excitatory drive to R and concomitantly causes some output, say a movement, at time t + 1. This movement provides a separate source of excitatory drive $r_{t+1}$ to the same nucleus through independent connections conveying information from sensory structures such as stretch receptors (pathway C). Hence, at time t + 1, the excitatory input to R is the sum of the 'immediate reward' $r_{t+1}$ and the new prediction of future reward $V_{t+1}$. If the reinforcement nucleus is driven primarily by changes in its input over some time window, then its output reflects the difference between the excitatory drive at times t and t + 1, i.e. $[(r_{t+1} + V_{t+1}) - V_t]$.

The output is distributed throughout a region of cortex (pathway D) and permits Hebbian weight changes at the individual connections which determine the value function $V_t$. The example hinges on two assumptions: 1) Hebbian learning in the cortex is contingent upon delivery of the neuromodulator, and 2) the reinforcement nucleus is sensitive to temporal changes in its input and otherwise habituates to constant or slowly varying input.
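The second assumption can be illustrated with a small sketch in which the nucleus output is simply the difference between successive values of its total excitatory drive. This differencing model, the function name, and the numbers are assumptions made for illustration. The paper only states that the nucleus responds to temporal changes and habituates to constant input.

```python
def nucleus_output(drive):
    """Differences between successive total excitatory drives onto R.

    drive[t] is the summed input at time step t. A constant drive gives
    zero output, mimicking habituation (an assumed, idealized model).
    """
    return [later - earlier for earlier, later in zip(drive, drive[1:])]


# Constant drive: the nucleus habituates and stays silent.
silent = nucleus_output([0.5, 0.5, 0.5, 0.5])

# Prediction V_t = 0.25 at time t, then r_{t+1} + V_{t+1} = 1.0 + 0.0.
surprised = nucleus_output([0.25, 1.0])
```

In the second call the output equals (r_{t+1} + V_{t+1}) - V_t for the chosen values, the quantity that the text identifies with the nucleus output.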

Initially, before the system is capable of predicting future delivery of reinforcement correctly, the arrival of $r_{t+1}$ causes a large learning signal because the prediction error $[(r_{t+1} + V_{t+1}) - V_t]$ is large. This error drives weight changes at synaptic connections with correlated pre- and postsynaptic elements until the predictions come to approximate the actual future delivered reinforcement. Once these predictions become accurate, learning abates. At that point, the system has learned about whatever contingencies are currently controlling reinforcement delivery. For the case in which the delivery of reinforcement is not controlled by any predictable contingencies, Hebbian learning can still occur if the fluctuations of the prediction error have a positive mean.
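The way learning abates as predictions improve can be sketched with a single weight. The sketch assumes that the weight change is the Hebbian product of pre- and postsynaptic activity scaled by the prediction error, that the prediction is V_t = w * x, and that no further reward is predicted after delivery (V_{t+1} = 0). These choices, along with all names and values, are illustrative assumptions rather than the paper's simulation.

```python
def td_error(r_next, v_next, v_now):
    """Prediction error (r_{t+1} + V_{t+1}) - V_t."""
    return (r_next + v_next) - v_now


def run_trials(n_trials, alpha=0.2, reward=1.0):
    """Train one weight to predict a reward delivered one step after a cue."""
    w = 0.0
    errors = []
    for _ in range(n_trials):
        x = 1.0        # presynaptic activity at time t (cue present)
        y = 1.0        # postsynaptic activity, assumed co-active with x
        v_now = w * x  # prediction V_t
        v_next = 0.0   # assumed: nothing is predicted after the reward
        delta = td_error(reward, v_next, v_now)
        w += alpha * x * y * delta  # Hebbian term gated by the error
        errors.append(delta)
    return w, errors


w_final, errors = run_trials(50)
```

The error starts at the full reward value and shrinks on every trial, so the weight changes become negligible once the prediction matches the delivered reinforcement.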


Figure 2: Upper layer is a 64 by 64 input array with 3 by 3 center-surround filters at each position which projects topographically onto the middle layer. The middle layer projects randomly to four 4 x 4 motoneuron layers which code for an equilibrium eye position signal, for example, through setting equilibrium muscle tensions in the 4 muscles. Reinforcement signals originate from either eye movement (muscle 'stretch') or foveation. The eye is moved according to $h = (r - l)g$, $v = (u - d)g$, where $r$, $l$, $u$, $d$ are respectively the average activities on the right, left, up, and down motoneuron layers and $g$ is a fixed gain parameter. $h$ and $v$ are linearly combined to give the eye position.
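The eye movement readout in the caption can be sketched as follows, assuming the horizontal and vertical components are (r - l)g and (u - d)g with r, l, u, d the mean activities of the four motoneuron layers. The symbol v for the vertical component, the helper `uniform_layer`, and the activity values are assumptions for illustration.

```python
def mean_activity(layer):
    """Average activity over a 2-D motoneuron layer given as a list of rows."""
    values = [a for row in layer for a in row]
    return sum(values) / len(values)


def eye_displacement(right, left, up, down, gain):
    """Return (h, v) = ((r - l) * g, (u - d) * g) from the four layers."""
    r, l = mean_activity(right), mean_activity(left)
    u, d = mean_activity(up), mean_activity(down)
    return (r - l) * gain, (u - d) * gain


def uniform_layer(value, size=4):
    """Hypothetical helper: a size x size layer with constant activity."""
    return [[value] * size for _ in range(size)]


# Balanced horizontal drive, stronger upward than downward drive.
h, v = eye_displacement(
    uniform_layer(0.5), uniform_layer(0.5),
    uniform_layer(0.75), uniform_layer(0.25),
    gain=2.0,
)
```

With balanced right and left layers the horizontal component vanishes, while the unequal up and down layers produce a net vertical command scaled by the gain.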

In the presence of multiple statistically independent sources of control of the reinforcement signal (pathways onto R), the system can separately 'learn away' the contingencies for each of these sources. This passage of control of reinforcement delivery can allow the development of connections in a region to be staged. Hence, control of reinforcement can be passed between contingencies without supervision. In this manner, a few nuclei can be used to deliver information globally about many different circumstances. We illustrate this point below with development of a sensorimotor mapping.

4 EXAMPLES

4.1 Learning to calibrate without sensory experience

Figure 2 illustrates the architecture for the next two examples. Briefly, cortical layers drive four 'motor' layers of units which each provide an equilibrium command to one of four extraocular muscles. The mapping from the cortical layers onto these four layers is random and sparse (15%-35% connectivity) and is plastic according to the learning rule described above. Two external events control the delivery of reinforcement: eye movement and foveation of high contrast objects in the visual input. The minimum eye movement necessary to cause a reinforcement is a change of two pixels in any direction (see figure 3).
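Two ingredients of this setup can be sketched under stated assumptions: a random sparse connection mask with a chosen connection probability, and a movement-driven reinforcement that fires when the eye shifts by at least two pixels. Reading "two pixels in any direction" as a shift of two or more pixels along either axis is an interpretation, and the function names and layer sizes are hypothetical.

```python
import random


def sparse_mask(n_inputs, n_outputs, connectivity, rng):
    """Random binary mask; each connection exists with the given probability."""
    return [[1 if rng.random() < connectivity else 0 for _ in range(n_outputs)]
            for _ in range(n_inputs)]


def movement_reinforcement(old_pos, new_pos, min_pixels=2):
    """Return 1.0 if the eye moved at least min_pixels along either axis."""
    dx = abs(new_pos[0] - old_pos[0])
    dy = abs(new_pos[1] - old_pos[1])
    return 1.0 if max(dx, dy) >= min_pixels else 0.0


rng = random.Random(0)  # fixed seed so the sketch is repeatable
mask = sparse_mask(n_inputs=200, n_outputs=16, connectivity=0.25, rng=rng)
density = sum(map(sum, mask)) / (200 * 16)
```

Only connections present in the mask would be eligible for weight changes, and an eye that is stuck or barely moving yields no movement-related reinforcement.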

Figure 3: Learning to calibrate eye movement commands. This example illustrates how a reinforcement signal could help to organize an appropriate balance in the sensorimotor mapping before visual experience. The dark bounding box represents the 64x64 pixel working area over which an 8x8 fovea can move. (A) Foveal position during the first 400 cycles of learning. The architecture is as in figure 2, but the weights onto the right/left and up/down pairs are not balanced. Random activity in the layer providing the drive to the motoneurons initially drives the eye to an extreme position at the upper right. From this position, no movement of the eye can occur and thus no reinforcement can be delivered from the proprioceptive feedback, causing all the weights to begin to decrease. With time, the weights onto the motoneurons become balanced and the eye moves. (B) Foveal position after 400 cycles of learning and after increasing the gain $g$ to 10 times its initial value. After the weights onto antagonistic muscles become balanced, the net excursions of the eye are small, thus requiring an increase in $g$ in order to allow the eye to explore its working range. (C) Size of the foveal region relative to the working range of the eye. The fovea covered an 8x8 region of the working area of the eye and the learning rate $\alpha$ was varied from 0.08 to 0.25 without changing the result.

We begin by demonstrating how an unbalanced mapping onto the motoneuron layers can be automatically calibrated in the absence of visual experience. Imagine that the weights onto the right/left and up/down pairs are initially unbalanced, as might happen if one or more muscles are weak or the effective drives to each muscle are unequal. Figure 3, which shows the position of the fovea during learning, indicates that the initially unbalanced weights cause the eye to move immediately to an extreme position (figure 3, A).

Since the reinforcement is controlled only by eye movement and foveation and neither is occurring in this state, $r_{t+1}$ is roughly 0. This is despite the (randomly generated) activity in the motoneurons continually making predictions that reinforcement from eye movement should be delivered. Therefore all the weights begin to decrease, with those mediating the unbalanced condition decreasing the fastest, until balance is achieved (see path A). Once the eye reaches equilibrium, further random noise will cause no mean net eye movement since the mappings onto each of the four motoneuron layers are balanced. The larger amplitude eye movements shown in the center of figure 3 (labeled B) are the result of increasing the gain $g$ (figure 2).

Figure 4: Development of foveation map. The map after 2000 learning cycles shows the approximate eye movement vector from stimulation of each position in the visual field. Lengths were normalized to the size of the largest movement. The undisplayed quadrants were qualitatively similar. Note that this scheme does not account for activity or contrast differences in the input and assumes that these have already been normalized. Learning rate = 0.12. Connectivity from the middle layer to the motoneurons was 35% and was randomized. Unlike the previous example, the weights onto the four layers of motoneurons were initially balanced.


