Learning Partially Observable Models Using Decision Trees

Erik Talvitie
Department of Mathematics and Computer Science
Franklin & Marshall College
Lancaster, PA 17604

Abstract

This paper introduces timeline trees, which are partial models of partially observable environments. Timeline trees are given some specific predictions to make and learn a decision tree over history. The main idea of timeline trees is to use temporally abstract features to identify and split on features of key events, spread arbitrarily far apart in the past (whereas previous decision-tree-based methods have been limited to a finite suffix of history). Experiments demonstrate that timeline trees can learn to make high quality predictions in complex, partially observable environments with high-dimensional observations (e.g. an arcade game).

1 Introduction

Learning a model of a high-dimensional environment can pose a significant challenge, but the ability to make predictions about future events is key to good decision making. One common approach is to avoid learning a complete, monolithic model of the environment, and to instead focus on learning partial models that are only capable of making a restricted set of predictions (for instance, predictions about some particular aspect of the environment, or predictions about future rewards). Partial models can often be simpler to learn than a complete model. In some cases they can be combined to form complete, structured models, which can then be used for planning purposes (e.g. factored MDPs [1], collections of partial models [2]). In other cases, partial models can be directly useful for control (e.g. U-Tree [3], prediction profile models [4]).

This paper introduces timeline trees, which are partial models for partially observable environments.

Timeline trees are focused on capturing a particular kind of partial observability; they assume that their predictions can be made by recalling a (finite) sequence of events in the past that may have occurred far apart from each other in time. While not all partially observable phenomena take this form, a good deal of everyday partial observability has this flavor. For instance, you may know that your keys are in the next room because you remember putting them there. Most of the experiences since that event are probably irrelevant to making predictions about the location of your keys.

The main idea of timeline trees is to build a decision tree over history. As with similar approaches, the decision tree can split on features of observations in recent history. However, a timeline tree may also establish new timestamps in the past and is able to split on features of observations surrounding those events as well. For instance, there could be a timestamp representing the last time the agent saw its keys, and then features of the neighboring observations could identify the keys’ location. In this way, timeline trees can make use of information arbitrarily spread out in history.

2 Partial Models

This paper will focus on discrete dynamical systems. Specifically, time proceeds in discrete steps t = 1, 2, 3, . . .. At every step t, the agent selects an action at from a finite set A and the environment (stochastically) emits an observation ot, taken from a finite set O. The history at time

t is the sequence of actions and observations from the beginning of time, up through time t:

ht ≝ a1 o1 a2 o2 . . . at ot. In the general partially observable case, the observation emitted at each step may depend upon the entire history (and the agent’s action). So, an agent wishing to predict the next observation must model the conditional probability distribution Pr(Ot+1 | Ht, At+1).
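To make the objects above concrete, here is a minimal sketch (not from the paper) that estimates Pr(Ot+1 | Ht, At+1) by counting, with histories represented as tuples of (action, observation) pairs. Keying on the entire history also illustrates why complete models are hard to learn: long histories rarely repeat, so counts rarely generalize. All class and variable names are illustrative assumptions.

```python
from collections import Counter, defaultdict

# A minimal sketch of the conditional distribution Pr(O_{t+1} | H_t, A_{t+1}),
# estimated by counting and keyed on the *whole* history.
class TabularNextObsModel:
    def __init__(self):
        self.counts = defaultdict(Counter)

    def update(self, history, action, next_obs):
        # history: tuple of (action, observation) pairs through time t
        self.counts[(history, action)][next_obs] += 1

    def predict(self, history, action):
        c = self.counts[(history, action)]
        total = sum(c.values())
        return {o: n / total for o, n in c.items()} if total else {}

model = TabularNextObsModel()
h = (("a1", "o1"),)
for obs in ("oX", "oX", "oY"):
    model.update(h, "a2", obs)
print(model.predict(h, "a2"))  # probabilities: 2/3 for 'oX', 1/3 for 'oY'
```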

If one is able to predict the next observation at any history and for any action (that is, if one has access to this conditional distribution), one can compute the probability of any future sequence of observations given any future sequence of actions and the history [5]. Such a model is called a complete model because in any situation, it is capable of making any prediction about the future.

Examples in the partially observable setting include POMDPs [6, 7] and PSRs [5, 8]. A partial model is any model that does not represent this full conditional distribution.

This paper will focus on partial models that make conditional predictions about abstract features of the next observation, though many of the ideas can be straightforwardly adapted to work with predictions of other forms. Formally, let ω and κ be many-to-one mappings over the set of observations O. The task of the partial model at time t will be to predict the value of ω(ot+1 ), conditioned on the value of κ(ot+1 ). So it represents the distribution Pr(ω(Ot+1 ) | Ht, At+1, κ(Ot+1 )). For example, in the experiment in Section 5.3, observations are images and multiple partial models are learned, each predicting the color of a single pixel, conditioned on the colors of pixels above and to the left.
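As a hypothetical instance of the many-to-one mappings ω and κ, the following sketch mirrors the pixel-prediction example: the observation is a small image, ω extracts the target pixel’s color, and κ extracts the colors above and to the left of it. The flat-tuple image representation and all names are illustrative assumptions, not the paper’s implementation.

```python
# Sketch of the abstraction mappings omega and kappa for a 3x3 image stored
# as a flat tuple of colors ('w' = white, 'b' = black).
WIDTH = 3

def omega(obs, target=4):
    # abstract feature to predict: the target pixel's color
    return obs[target]

def kappa(obs, target=4):
    # conditioning context: the colors above and to the left of the target
    return (obs[target - WIDTH], obs[target - 1])

image = ("w", "w", "b",
         "w", "b", "w",
         "b", "w", "w")
print(omega(image))  # 'b'
print(kappa(image))  # ('w', 'w')
```

A partial model then only represents Pr(ω(Ot+1) | Ht, At+1, κ(Ot+1)), a far smaller object than the full next-observation distribution over images.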

2.1 Related Work: Partial Models in Partially Observable Environments

McCallum’s U-Tree [3] learns a decision tree over history, where the leaves of the tree map to the expected discounted sum of rewards at the associated history (though the method could be adapted to make other predictions, as in [9]). McCallum used binary features of the form “Feature X takes value Y at time-step t − k,” where t is the current time-step. Thus the decision tree learns an abstraction both over observations (which could be high-dimensional in their own right) and over the sequence of observations (by using features from multiple time-steps). However, because it can only consider a finite number of such features, U-Tree has a finite memory horizon; events that occur before some arbitrary cutoff in the past cannot be taken into account when making predictions.
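McCallum’s feature form can be sketched as a closure over (X, Y, k); this is an illustrative reconstruction, not U-Tree’s actual code. With any fixed bound on k, only a finite suffix of history is visible to the tree:

```python
# Sketch of the McCallum-style binary feature "Feature X takes value Y at
# time-step t - k". Observations are dicts mapping feature names to values
# (an illustrative representation).
def suffix_feature(x, y, k):
    def f(history):
        t = len(history)
        if t - k < 1:
            return 0                 # offset falls before the start of time
        _, obs = history[t - k - 1]  # observation at time-step t - k
        return int(obs.get(x) == y)
    return f

f = suffix_feature("color", "red", 1)
h = [("a", {"color": "red"}), ("a", {"color": "blue"})]
print(f(h))  # 1: at time-step t - 1 = 1 the color was red
```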

Timeline trees are an extension of U-Tree that allows it to use features of observations arbitrarily far back in the past, though they are not the first attempt to address this issue. Looping predictive suffix trees (LPSTs) [10] are prediction suffix trees [11] that allow nodes to loop back to their ancestors.

Local agent state representations (LASR) [12] map histories to a real number, and then learn a direct mapping from that number to the target predictions. McCallum [13] and Mahmud [14] both developed incremental hill-climbing algorithms to learn finite state machines (FSMs), where each state is associated with predictions about future rewards, and the transitions depend on both the action taken and the observation received. Prediction profile models [4] are similar FSMs, but rather than hill-climbing, they are learned by pre-processing the data and then applying standard complete model learning methods (they were demonstrated using POMDPs and LPSTs).

All of these approaches can, in principle, represent arbitrarily long-range dependencies in time.

However, unlike U-Tree, they all treat observations as atomic, which limits their applicability to truly high-dimensional systems. Furthermore, despite their theoretical representational capacity, their learning algorithms have difficulty discovering long-range dependencies in practice.

The learning algorithm for LPSTs first learns a full suffix tree, and then adds loops as appropriate. Thus, to capture very long-range dependencies, one must first build a very deep suffix tree.

McCallum reported that his FSM-learning method was often unable to detect long-range temporal dependencies (since this typically involves multiple elaborations of the FSM, none of which would individually seem valuable to the hill-climbing algorithm). Mahmud’s similar approach would likely suffer a similar limitation. The learning algorithms for LASR and prediction profile models both rely on estimating predictions at particular histories. Because estimates will only be accurate for histories that appear many times, these algorithms can only be effectively applied to data consisting of many short trajectories, which limits their ability to discover long-range dependencies in practice.

It should be noted that prediction profile models have been combined with an additional preprocessing step that learns an abstraction before the prediction profile model is learned [15]. Because of this, and because their formulation most closely fits the setting of this paper, the experiments in Section 5 will directly compare against the performance of prediction profile models.

3 Timeline Trees

The goal of timeline trees is to combine the strengths of U-Tree with the ability to attend to events arbitrarily far apart in history (rather than being limited to a finite suffix). Unlike several of the above approaches, timeline trees are not arbitrarily recurrent (they do not contain loops except in a limited, implicit sense), which does restrict their representational capacity. However, in exchange they retain the straightforward decision tree training of U-Tree, which allows them to simultaneously learn an abstraction over both the history sequence and high-dimensional observations, and which furthermore allows them to discover long-range temporal dependencies in practice (and not just in principle).

3.1 Timestamps

The decision tree built by U-Tree splits on features of observations at some temporal offset from the current timestep. The key idea of timeline trees is to allow multiple timestamps in history and to allow splits on features of observations at temporal offsets relative to any of these timestamps.

Timeline trees take a set F of binary features, where each feature f (ht, k) takes the history at time t, ht, and a timestep 0 ≤ k ≤ t + 1,¹ and returns 1 or 0. For example, if the observations are images, f could return 1 if a black pixel existed anywhere at step k − 1 but did not exist at step k. It is assumed that f (ht, k) makes use of no information after step k (though it may access timestep k or before).
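The example feature above might look like the following sketch, where an observation is represented (as an assumption) by the set of pixel colors it contains, and steps are 1-indexed as in the text:

```python
# A sketch of one binary history feature f(h_t, k) from the set F: it fires
# when a black pixel was present at step k-1 but absent at step k. Note that
# it uses no information after step k, as the text requires.
def black_pixel_vanished(history, k):
    """history: list of (action, observation) pairs; steps are 1-indexed."""
    if k < 2 or k > len(history):
        return 0
    obs_prev = history[k - 2][1]  # observation at step k - 1
    obs_k = history[k - 1][1]     # observation at step k
    return int("black" in obs_prev and "black" not in obs_k)

h = [("a", {"black", "white"}), ("a", {"white"}), ("a", {"white"})]
print(black_pixel_vanished(h, 2))  # 1: black at step 1, gone at step 2
print(black_pixel_vanished(h, 3))  # 0: no black at step 2 either
```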

For a fixed vector τ of timestamps, the model is a standard decision tree, and only a small extension of U-Tree (which fixed the number of timestamps to 1: the current timestep). Each internal node in the tree is associated with a feature f, a timestamp index i, and a temporal offset δ (which may be negative) and has two children representing histories where the value of f (ht, τ [i] + δ) is 0 or 1, respectively. The leaves of the tree are associated with estimates of Pr(ω(Ot+1 ) | ht, at+1, κ(Ot+1 )).

To use the timeline tree to make a prediction, one simply follows a path from the root to a leaf in the tree, choosing the appropriate child at each node according to the feature value f (ht, τ [i] + δ).
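For a fixed timestamp vector τ, prediction is ordinary decision-tree traversal. A minimal sketch, where the node classes and the `saw_key` feature are invented for illustration:

```python
# Sketch of prediction with a timeline tree over a *fixed* timestamp vector
# tau: each internal node stores (feature, timestamp index i, offset delta)
# and two children; leaves store the estimated conditional distribution.
class Internal:
    def __init__(self, feature, i, delta, child0, child1):
        self.feature, self.i, self.delta = feature, i, delta
        self.children = (child0, child1)

class Leaf:
    def __init__(self, distribution):
        self.distribution = distribution

def predict(node, history, tau):
    while isinstance(node, Internal):
        k = tau[node.i] + node.delta           # timestep the feature examines
        node = node.children[node.feature(history, k)]
    return node.distribution

# Tiny example: split on whether the observation at tau[0] - 1 (i.e. the
# current step) is "key".
def saw_key(history, k):
    return int(0 < k <= len(history) and history[k - 1][1] == "key")

tree = Internal(saw_key, 0, -1,
                Leaf({"no-key": 1.0}),
                Leaf({"key": 0.9, "no-key": 0.1}))
h = [("stay", "key")]
print(predict(tree, h, [len(h) + 1]))  # {'key': 0.9, 'no-key': 0.1}
```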

Timeline trees’ real strength lies in their ability to add new timestamps. They do this via a special type of feature. For every feature f ∈ F, there is an additional timestamp feature ξf. The feature ξf (ht, j, k) = 1 if there is some timestep m with j < m ≤ k such that f (ht, m) = 1. More importantly, the greatest such m (that is, the time of the most recent occurrence of f ), call it mf, is added as a timestamp to all nodes in the subtree where ξf = 1.

When making a prediction for ht, one maintains a growing vector τ of timestamps (in order from least to most recent). Beginning at the root there is only one timestamp: τ[1] = t + 1, where t is the current timestep. As one travels from the root to a leaf, one may encounter a node associated with a timestamp feature ξf. Such a node is also associated with an index i into the current timestamp vector. If ξf (ht, τ[i − 1], τ[i]) = 1 (where, by definition, τ[0] = −1), the path moves to the corresponding child and adds mf to τ. Nodes further down in the tree may refer to this new timestamp. As such, the tree is able to establish timestamps based on the occurrence of key events (the presence of some feature).
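The timestamp feature ξf can be sketched as a backwards scan from k down to j + 1, returning both whether it fired and the new timestamp mf; the tuple return and all names are illustrative choices:

```python
# Sketch of a timestamp feature xi_f: fires if f occurred at some step m with
# j < m <= k; the *most recent* such m (m_f) becomes a new timestamp.
def timestamp_feature(f, history, j, k):
    for m in range(k, j, -1):  # scan from most recent to oldest
        if f(history, m):
            return 1, m        # m_f: latest occurrence of f
    return 0, None

def saw_key(history, m):
    return int(0 < m <= len(history) and history[m - 1][1] == "key")

h = [("move", "key"), ("move", "empty"), ("stay", "key"), ("move", "empty")]
fired, m_f = timestamp_feature(saw_key, h, 0, len(h))
print(fired, m_f)  # 1 3 -- the key was last seen at step 3
```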

Timestamp features are a form of temporal abstraction; they refer to an event in the past, but abstract away how long ago it was. They are limited, however: there are systems that would require an infinite timeline tree, but that the approaches in Section 2.1 can capture easily (see Section 5.2). Nevertheless, timeline trees do capture a natural and intuitive form of partial observability, as can be seen by example.

3.2 An Example

As a simple illustrative example, consider an agent that must keep track of its key. The agent’s key can be in room A or B, or in the agent’s pocket (where it starts). The agent has three actions: move (which switches its location), stay (which does nothing), and pocket. The last action transfers the key between the agent’s pocket and the current room (in either direction) unless the key is in neither (in which case it does nothing). The agent can observe its location and whether the key is in the room. A diagram is shown in the left of Figure 1 (missing arrows are self-loops).

¹For simplicity’s sake, the notation f (ht, k) hides the fact that the features may also depend on at+1 and κ(ot+1). For this discussion, assume that k may equal t + 1 if the feature makes use of only these aspects of time t + 1. If k = t + 1 and the feature refers to other information, assume f (ht, k) = 0.

Figure 1: The Key World example.

On the right of Figure 1 an example timeline tree is shown that can predict whether the agent will see the key in the next timestep. At the root, there is only one timestamp: t+1, where t is the current step. The root checks if the agent can currently see the key. If so, the agent will only see the key in the next step if it stays. Otherwise, the agent must remember where the key was last seen. The square-shaped node is meant to indicate a timestamp feature, which checks if the agent has ever seen the key before the only timestamp. If not, the key is in the agent’s pocket. If so, a new timestamp m is added that marks the last time the key was seen. If the agent put the key in its pocket after m, it must take the key out to see it. Otherwise, it must be in the other room.
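The Key World dynamics described above can be sketched as a tiny simulator (state representation and names are assumptions, not the paper’s code):

```python
# Sketch of the Key World: state is (agent_room, key_location), with
# key_location in {"A", "B", "pocket"}. Observations are the agent's room
# plus whether the key is visible there.
def step(state, action):
    room, key = state
    if action == "move":
        room = "B" if room == "A" else "A"
    elif action == "pocket":
        if key == "pocket":
            key = room        # take the key out into this room
        elif key == room:
            key = "pocket"    # pick the key up
        # if the key is in the *other* room, pocket does nothing
    # "stay" changes nothing
    obs = (room, key == room)
    return (room, key), obs

state = ("A", "pocket")
state, obs = step(state, "pocket")  # drop the key in room A
state, obs = step(state, "move")    # go to room B
state, obs = step(state, "move")    # come back to room A
print(obs)  # ('A', True) -- the key is still where we left it
```

The tree in Figure 1 makes exactly the predictions this simulator would: the only history information it needs is the last time the key was seen and what happened around that event.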

4 Learning Timeline Trees
