FREE ELECTRONIC LIBRARY - Abstract, dissertation, book

Pages:   || 2 |

«Abstract. Changes of gene ordering under rearrangements have been extensively used as a signal to reconstruct phylogenies and ancestral genomes. ...»

-- [ Page 1 ] --

Reconstructing Ancestral Genomic Orders Using

Binary Encoding and Probabilistic Models

Fei Hu1,2, Lingxi Zhou2, and Jijun Tang1,2,

School of Computer Science and Technology, Tianjin University, China

Department of Computer Science & Engineering, Univ. of South Carolina, USA


Abstract. Changes of gene ordering under rearrangements have been

extensively used as a signal to reconstruct phylogenies and ancestral

genomes. Inferring the gene order of an extinct species has the potential in revealing a more detailed evolutionary history of species descended from it. Current tools used in ancestral reconstruction may fall into parsimonious and probabilistic methods according to the criteria they follow.

In this study, we propose a new probabilistic method called PMAG to infer the ancestral genomic orders by calculating the conditional probabilities of gene adjacencies using Bayes’ theorem. The method incorporates a transition model designed particularly for genomic rearrangement scenarios, a reroot procedure to relocate the root to the target ancestor that is inferred as well as a greedy algorithm to connect adjacencies with high conditional probabilities into valid gene orders.

We conducted a series of simulation experiments to assess the performance of PMAG and compared it against previously existing probabilistic methods (InferCARsPro) and parsimonious methods (GRAPPA).

As we learned from the results, PMAG can reconstruct more correct ancestral adjacencies and yet run several orders of magnitude faster than InferCARsPro and GRAPPA.

Keywords: ancestral genome, gene order, probabilistic method.

1 Introduction 1.1 Overview Evolutionary biologists have had a long tradition in reconstructing genomes of extinct ancestral species. Mutations in a genomic sequence are made up not only at the level of base-pair changes but also by rearrangement operations on chromosomal structures such as inversions, transpositions, fissions and fusions [1].

Over the past few years, ancestral gene-order inference has brought profound predictions of protein functional shift and positive selection [2].

Methods for ancestral genome reconstruction either assume a given phylogeny that represents the evolutionary history among given species, known as the small Corresponding author.

Z. Cai et al. (Eds.): ISBRA 2013, LNBI 7875, pp. 17–27, 2013.

c Springer-Verlag Berlin Heidelberg 2013 18 F. Hu, L. Zhou, and J. Tang phylogeny problem (SPP); or search the most appropriate tree along with a set of ancestral genomes to fit the observed data, called the big phylogeny problem (BPP). Most of parsimony methods (such as GRAPPA [3], MGR [4,5]) typically solve the SPP exactly by searching a set of ancestral gene orders to minimize the sum of the rearrangement distance over the entire edges of the phylogeny. Ma proposed another method for the SPP in the probabilistic framework (InferCARsPro [6]) by approximating the conditional probabilities for all possible gene adjacencies in an ancestral genome.

Current methods such as GRAPPA and InferCARsPro are capable to handle modern whole-genome data due to their intrinsic high complexity. In this paper, we propose a new probabilistic method called PMAG to reconstruct ancestral genomic orders given a phylogeny. We conducted extensive experiments to evaluate the performance of PMAG with other existing methods. According to the results, our new method can outperform all the other methods under study and still run at least hundreds of times faster than GRAPPA and InferCARsPro.

1.2 Genome Rearrangement Given a set of n genes labeled as {1, 2, · · ·, n}, a genome can be represented by an ordering of these genes. Each gene is assigned with an orientation that is either positive, written i, or negative, written −i. Two genes i and j form an adjacency if i is immediately followed by j, or, equivalently, −j is immediately followed by −i. An breakpoint of two genomes is defined as an adjacency appears in one but not in the other.

Genome rearrangement operations can change the ordering of genes. An inversion operation (also called reversal ) reverses a segment of a chromosome. A transposition is an operation that swaps two adjacent segments of a chromosome. In the case of multiple chromosomes, translocation breaks a chromosome and reattaches a portion of it to another chromosome. Later Yancopoulos et al. [8] proposed a universal double-cut-and-join (DCJ) operation that accounts for all common events which resulted in a new genomic distance that can be computed in linear time.

1.3 Parsimony Methods for Ancestral Gene-Order Reconstruction To find a solution for SPP, parsimony algorithms typically iterate over each internal node to solve for the median genomes until the sum of all edge distances (tree score) is minimized. The median problem can be formalized as follows: give a set of m genomes with permutations {xi }1≤i≤m and a distance measurement d, find m another permutation xt such that the median score defined as i=1 d(xi, xt ) is minimized. GRAPPA and MGR (as well as their recently enhanced versions) are two widely-referenced methods that implemented a selection of median solvers for phylogeny and ancestral gene-order inference. However solving even the simplest case of median problem when m equals to three is NP-hard for most distance measurements.

Reconstructing Ancestral Genomic Orders 19 Exact solutions to the problem of finding a median of three genomes can be obtained for the inversion, breakpoint and DCJ distances. Among all the median solvers, the best one is the DCJ median solver proposed by Xu and Sankoff (ASMedian [9]) based on the concept of adequate subgraph. Adequate subgraphs allow decompositions of an multiple breakpoint graph into smaller and easier graphs. Though the ASMedian solver could remarkably scale down the computational expenses of median searching, it yet runs very slow when the genomes are distant.

–  –  –

where priors are assumed equal and the likelihood P (Dx |Xi in x) can be calculated recursively in a post-order traversal fashion summed over q possible congurations. Its transition matrix is defined as an extension of the Jukes-Cantor model such that probability of transition from any character to any different character is always equal.

Let sx (·) denote the successor of a gene and px (·) denote the predecessor of a gene, an adjacency pair Ax (i, j) can be viewed as sx (i) = j and px (j) = i simultaneously. After finishing the calculation of conditional probabilities for every successor and predecessor relationships, the conditional probability of an adjacency Ax (i, j) in genome x can be approximated as

P (Ax (i, j)|Dx ) = P (px (j) = i|Dx ) × P (sx (i) = j|Dx )

Finally a fast greedy algorithm is adopted to connect adjacencies into contiguous ancestral regions. Although InferCARsPro showed good results and speedup over parsimonious methods, it is still too slow and inaccurate when dealing with even small number of distant genomes.

We investigated the following intrinsic characteristics of InferCARsPro that account for its difficulties in handling complex datasets, which in turn motivated us to propose our new method.

– InferCARsPro uses a neutral model accounting for all changes of adjacencies, however biased model for phylogeny reconstruction has been successfully applied for genome rearrangement scenarios [11].

– The total number of states for each gene is exactly equal to 2×n−2 where n is the number of genes. Thus computing the likelihood score on such excessive number of states clearly incurs huge computational burden.

20 F. Hu, L. Zhou, and J. Tang – The conditional probability of an adjacency is approximated from the predecessor and successor relations. Although such approximation is intuitive, it is more desirable to directly calculate the conditional probability of an adjacency.

– InferCARsPro requires branch lengths of a given phylogeny as part of its inputs, but it is not always handy to obtain in practice.

2 Algorithm Detail

Given the topology of a model tree and a collection of gene orders at the leaves, our approach first encodes the gene orders into binary sequences and estimates the parameters in the transition model for adjacency changes. Ancestral nodes in the model tree are inferred independently and in each inference, we reroot the model tree to have the target ancestor as the root of a new tree. Then we utilize a probabilistic inference tool to compute the conditional probabilities of all the adjacencies encoded in the binary sequence of the target ancestor. At last we use a greedy algorithm as used in Ma’s work to connect the adjacencies into contiguous regions. We call our new approach Probabilistic Method of Ancestral Genomics (PMAG).

2.1 Encoding Gene Orders into Binary Sequences

A gene order can be expressed as a sequence of adjacency information that specifies presence or absence of all the adjacencies [10,11]. Denote the head of a gene i by ih and its tail by it. We refer +i as an indication of direction from head to tail (ih → it ) and otherwise −i as (it → ih ). There are a total of four scenarios for two consecutive genes a and b in forming an adjacency: {at, bt }, {ah, bt }, {at, bh }, and {ah, bh }. If gene c is at the first or last place of a linear chromosome, then we have a corresponding singleton set, {ct } or {ch }, called a telomere. A genome can then be expressed as a multiset of adjacencies and telomeres. For instance, a linear chromosome consists of four genes, (+1,+2,can be represented by the multiset of adjacencies and telomeres {{1h}, {1t, 2h }, {2t, 3t }, {3h, 4t }, {4h }}. We further write 1 (0) to indicate presence (absence) of an adjacency and we consider only those adjacencies and telomeres that appear at least once in the input genomes. Table 1 shows an example of encoding two artificial genomes into binary sequences.

Given a dataset D with m species and each of n genes, let k indicate the total number of linear chromosomes in D, then there are up to 2n+2 distinct adjacencies and telomeres. However in reality if the length of the binary sequences extracted from D is l, then l is typically far smaller. In fact, in the extreme case when genomes in D share no adjacency and telomere, l equals at most to n×m+k, and since m and k are commonly much smaller than n, thus the length of the binary sequences for a dataset is usually linear rather than quadratic to the number of genes.

Reconstructing Ancestral Genomic Orders 21 Table 1. Example of encoding gene orders into binary sequences

–  –  –

2.2 Estimating Transition Parameters Since we are handling binary sequences with two characters, we use a general time-reversible framework to simulate the transitions from presence (1) to absence (0) and vice versa. Thus the rate matrix is

–  –  –

The matrix involves 3 parameters: the relative rate a, and two frequencies π0 and π1.

Severl models have been proposed to probabilistically characterize the changes of gene adjacencies by common types of rearrangement operations such as inversion, transposition as well as DCJ [7,11]. In this study, we use the model that has been successfully applied for phylogeny reconstruction in the context of genome rearrangement as suggested in [11]. In particular, every DCJ operation breaks two random adjacencies uniformly chosen from the gene-order string and subsequently creates two new ones. Since each genome contains n + O(1) adjacencies and telomeres where n is the gene number and O(1) equals to the number of linear chromosomes in the genome, thus the probability that an adjacency changes from presence (1) to absence (0) in the sequence is n+O(1) under one operation. Since there are up to 2n+2 possible adjacencies and telomeres, the probability for an adjacency changing from absence (0) to presence (1) is 2n2 +O(n). Therefore we come to the conclusion that the transition from 1 to 0 is roughly 2n times more likely than that from 0 to 1.

2.3 Inferring the Probabilities of Ancestral Adjacencies for the Root Node In principle, our probabilistic inference is categorized as marginal reconstruction which assigns characters to a single ancestral genome at a time. Once we have the tree topology and binary sequences encoding the input gene orders, we use 22 F. Hu, L. Zhou, and J. Tang the extended probabilistic approach for sequence data described by Yang [12] to infer the ancestral gene orders at the root node. In the binary sequences, each site represents an adjacency with character either 0 (absence) or 1 (presence) and for each site we seek to calculate the conditional probability of observing that adjacency. As the true branch lengths are not available, we take advantage of the widely-used maximum-likelihood estimation from the binary sequences at the leaves to estimate the branch length.

Suppose x is the root of a model tree, then the conditional probability that node x has the character sx at the site, given Dx representing the observed data at the site in all leaves of the subtree rooted at x, is P (sx )P (Dx |sx ) πsx Lx (sx ) P (sx |Dx ) = = P (Dx ) sx πsx Lx (sx )

–  –  –

where f and g are the two direct descendants of x. pij (t) defines the transition probability that character i changes to j after an evolutionary distance t. Following the deduction of transition probability in [13], our transition-probability matrix can be written as pij (t) = πj + e−t (δij − πj ) Here the δij is 1 if i = j, otherwise δij is 0. In order to set up the 2n ratio, we simply set the rate a to 1 and add a direct assignment of the two frequencies in the code. For instance, if the character frequencies are π0 = 0.1 and π1 = 0.9, then the rate of 0 to 1 transitions is 10 times as high as the rate of transitions in the other direction under the same evolutionary distance.

Pages:   || 2 |

Similar works:

«BERBER, A “LONG-FORGOTTEN” LANGUAGE OF FRANCE By Salem CHAKER, Professor of Berber Language, INALCO (Paris) (Translated by Laurie and Amar Chaker) I. Some basic facts about the Berber language: geographical distribution, demographics, status II. Berber speakers in France: Early and principally Berber immigration by North Africans (Maghribi): historic and sociological facts Some numbers: a difficult evaluation but a large and diverse Berber-speaking population, predominantly Kabyle A strong...»

«Science, Technology, and Know-How: Exploitation of German Science and the Challenges of Technology Transfer in the Postwar World By Douglas Michael O'Reagan A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in History in the Graduate Division of the University of California, Berkeley Committee in charge: Professor Cathryn Carson, Chair Professor John Connelly Professor Daniel Sargent Professor Deirdre Mulligan Spring 2014 Abstract...»

«MARTIN LUTHER ON MARRIAGE AND THE FAMILY Carter Lindberg Social historians of the Reformation have been reluctant to recognize theology as an agent of social change beyond endorsing an elite political agenda for social control and social disciplining.1 And popular portrayals of Luther2 tend to paint him as a conservative patriarch bent on limiting women to “church, kitchen, and kids” echo the conclusions of scholars who claim that the Reformation had a negative effect upon the position of...»

«e-Journal THE HISTORY OF AN ANACHRONISM. MONTAIGNE, Philosophie der AUGUSTINE, AND THE BECOMING OF AUTOBIOGRAPHY Psychologie von Nicholas D. Paige (Berkeley) Historians do not care much for anachronisms, and for good reason: on the one hand, anachronism strikes at what they hold dearest, namely, rigorous attention to historical specificity; on the other hand, it serves as a disquieting reminder that interpretations of the past can be articulated only from the perspective of an inescapable...»

«History and Analysis of: THE SCHOOLMASTER'S ASSISTANT by THOMAS DILWORTH The following document reproduces a single entry comprising pages 479-86 in: Rick Grunder, Mormon Parallels: A Bibliographic Source (Lafayette, New York: Rick Grunder Books, 2014), a PDF file of 2,307 pages published digitally only, (ISBN 978-0-9814708-1-8) described at www.mormonparallels.com Note that links in the following pages will not be operative in this individual document, which is available at:...»

«Clay, Rotha Mary., The Hermits and Anchorites of England. Methuen & Co. London, 1914. Larger images available on Historyfish.net Public Domain text transcribed and prepared “as is” for HTML and PDF by Richenda Fairhurst, historyfish.net. September 2008. No commercial permissions granted. Text may contain errors. (Report errors to molly@historyfish.net, or check historyfish.org for current address.) IX. CONCERNING THE BODY A bird sometimes alighteth on the earth, to seek his food for the...»

«That Gentle Strength Historical Perspectives On Women In Christianity Against a payout, their behaviour unleashes the company if a first act, until which, offering day years are expressed downloaded down through your access talents. Adversely, the locked possibility's free to show friends in state replacement, of because all another attention services, better. Save financial, necessary ThinkBIGsites that graduate transferred on depending the than their worth strategies for print. That Gentle...»

«“Our Sister” by Mary Joan Cook, RSM, ’47, Ph.D. Sister Mary Consolata was a model. With that beautiful smile she could have been a professional one, but I mean that she was a model Sister of Mercy, an example for all. She was just 22 and a June graduate of Saint Joseph College when she chose to enter the Mercy community in 1939. During the 1940s, she taught in Hartford’s parochial schools. Then, selected for graduate study, she entered and completed a doctoral program in History at...»

«Международная федерация по теории механизмов и машин (IFToMM) Санкт-Петербургский политехнический университет Петра Великого Московский государственный технический университет им. Баумана 14th Working Meeting IFToMM Permanent Commission for the History of Mechanism and Machine Science Workshop HMMS-2015 Saint-Petersburg, Russia, May...»

«Chapter Fifteen WIND CAVE NATIONAL PARK IN COSMOLOGY AND HISTORY Whether or not one chooses to acknowledge the sacred meanings the Lakotas, Cheyennes, and other tribal nations attach to the Black Hills, it is impossible to ignore their cultural importance as metaphors and even representations of practical observations. One persistent theme stretching back to written accounts from the early half of the nineteenth century is the notion that the Black Hills was a place where tribal nations...»

«Vom Osmanismus zum Separatismus: Religiöse und ethnische Hintergründe der Rebellion des Scheich Said Martin van Bruinessen Die kurdische Rebellion, die Anfang 1925 von dem Nakşibendi-Scheich Said angeführt wurde, markiert einen wichtigen Wendepunkt in der Geschichte der republikanischen Türkei. Sie bot Mustafa Kemal und Ismet Pascha die Gelegenheit, sich nahezu diktatorische Machtbefugnisse anzueignen. Von diesem Zeitpunkt an wurde das kemalistische Reformprogramm in noch grösserer Eile...»

«SEEDS OF AGRIBUSINESS: GRANT WOOD AND THE VISUAL CULTURE OF GRAIN FARMING, 1862-1957 by Travis Earl Nygard BA, Gustavus Adolphus College, 2002 MA, University of Pittsburgh, 2005 Submitted to the Graduate Faculty of the School of Arts and Sciences in partial fulfillment of the requirements for the degree of Doctor of Philosophy University of Pittsburgh UNIVERSITY OF PITTSBURGH SCHOOL OF ARTS AND SCIENCES This dissertation was presented by Travis Earl Nygard It was defended on December 4, 2009...»

<<  HOME   |    CONTACTS
2016 www.abstract.xlibx.info - Free e-library - Abstract, dissertation, book

Materials of this site are available for review, all rights belong to their respective owners.
If you do not agree with the fact that your material is placed on this site, please, email us, we will within 1-2 business days delete him.