FREE ELECTRONIC LIBRARY - Abstract, dissertation, book

Pages:   || 2 |

«Abstract. A coloring of a tree is convex if the vertices that pertain to any color induce a connected subtree; a partial coloring (which assigns ...»

-- [ Page 1 ] --

Efficient Approximation of Convex Recolorings

Shlomo Moran1, and Sagi Snir2

Computer Science dept., Technion, Haifa 32000, Israel


Mathematics dept. University of California, Berkeley, CA 94720, USA


Abstract. A coloring of a tree is convex if the vertices that pertain to

any color induce a connected subtree; a partial coloring (which assigns

colors to some of the vertices) is convex if it can be completed to a

convex (total) coloring. Convex coloring of trees arises in areas such as phylogenetics, linguistics, etc. e.g., a perfect phylogenetic tree is one in which the states of each character induce a convex coloring of the tree.

Research on perfect phylogeny is usually focused on finding a tree so that few predetermined partial colorings of its vertices are convex.

When a coloring of a tree is not convex, it is desirable to know “how far” it is from a convex one. In [MS05], a natural measure for this distance, called the recoloring distance was defined: the minimal number of color changes at the vertices needed to make the coloring convex. This can be viewed as minimizing the number of “exceptional vertices” w.r.t. to a closest convex coloring. The problem was proved to be NP-hard even for colored strings.

In this paper we continue the work of [MS05], and present a 2-approximation algorithm of convex recoloring of strings whose running time O(cn), where c is the number of colors and n is the size of the input, and an O(cn2 ) 3-approximation algorithm for convex recoloring of trees.

1 Introduction A phylogenetic tree is a tree which represents the course of evolution for a given set of species. The leaves of the tree are labeled with the given species. Internal vertices correspond to hypothesized, extinct species. A character is a biological attribute shared among all the species under consideration, although every species may exhibit a different character state. Mathematically, if X is the set of species under consideration, a character on X is a function C from X into a set C of character states. A character on a set of species can be viewed as a coloring of the species, where each color represents one of the character’s states. A natural biological constraint is that the reconstructed phylogeny have the property that each of the characters could have evolved without reverse or convergent transitions: In a reverse transition some species regains a character state of some old A preliminary version of the results in this paper appeared in [MS03].

This research was supported by the Technion VPR-fund and by the Bernard Elkin Chair in Computer Science.

C. Chekuri et al. (Eds.): APPROX and RANDOM 2005, LNCS 3624, pp. 192–208, 2005.

c Springer-Verlag Berlin Heidelberg 2005 Efficient Approximation of Convex Recolorings 193 ancestor whilst its direct ancestor has lost this state. A convergent transition occurs if two species possess the same character state, while their least common ancestor possesses a different state.

In graph theoretic terms, the lack of reverse and convergent transitions means that the character is convex on the tree: for each state of this character, all species (extant and extinct) possessing that state induce a single block, which is a maximal monochromatic subtree. Thus, the above discussion implies that in a phylogenetic tree, each character is likely to be convex or “almost convex”. This makes convexity a fundamental property in the context of phylogenetic trees to which a lot of research has been dedicated throughout the years. The Perfect Phylogeny (PP) problem, whose complexity was extensively studied (e.g.

[Gus91, KW94, AFB96, KW97, BFW92, Ste92]), seeks for a phylogenetic tree that is simultaneously convex on each of the input characters. Maximum parsimony (MP) [Fit81, San75] is a very popular tree reconstruction method that seeks for a tree which minimizes the parsimony score defined as the number of mutated edges summed over all characters (therefore, PP is a special case of MP). [GGP+ 96] introduce another criterion to estimate the distance of a phylogeny from convexity. They define the phylogenetic number as the maximum number of connected components a single state induces on the given phylogeny (obviously, phylogenetic number one corresponds to a perfect phylogeny). Convexity is a desired property in other areas of classification, beside phylogenetics.

For instance, in [Be00, BDFY01] a method called TNoM is used to classify genes, based on data from gene expression extracted from two types of tumor tissues.

The method finds a separator on a binary vector, which minimizes the number of “1” in one side and “0” in the other, and thus defines a convex vector of minimum Hamming distance to the given binary vector. In [HTD+ 04], distance from convexity is used (although not explicitly) to show strong connection between strains of Tuberculosis and their human carriers.

In a previous work [MS05], we defined and studied a natural distance from a given coloring to a convex one: the recoloring distance. In the simplest, unweighted model, this distance is the minimum number of color changes at the vertices needed to make the given coloring convex (for strings this reduces to Hamming distance from a closest convex coloring). This model was extended to a weighted model, where changing the color of a vertex v costs a nonnegative weight w(v). The most general model studied in [MS05] is the non-uniform model, where the cost of coloring vertex v by a color d is an arbitrary nonnegative number cost(v, d).

It was shown in [MS05] that finding the recoloring distance in the unweighted model is NP-hard even for strings (trees with two leaves), and few dynamic programming algorithms for exact solutions of few variants of the problem were presented.

In this work we present two polynomial time, constant ratio approximation algorithms, one for strings and one for trees. Both algorithms are for the weighted (uniform) model. The algorithm for strings is based on a lower bound technique which assigns penalties to colored trees. The penalties can be computed in linear 194 Shlomo Moran and Sagi Snir time, and once a penalty is computed, a recoloring whose cost is smaller than the penalty is computed in linear time. The 2-approximation follows by showing that for a string, the penalty is at most twice the cost of an optimal convex recoloring. This last result does not hold for trees, where a different technique is used. The algorithm for trees is based on a recursive construction that uses a variant of the local ratio technique [BY00, BYE85], which allows adjustments of the underlying tree topology during the recursive process.

The rest of the paper is organized as follows. In the next section we present the notations and define the models used. In Section 3 we define the notion of penalty which provides lower bounds on the optimal cost of convex recoloring of any tree. In Section 4, we present the 2-approximation algorithm for the string. In Section 5 we briefly explain the local ratio technique, and present the 3-approximation algorithm for the tree. We conclude and point out future research directions in Section 6.

2 Preliminaries

A colored tree is a pair (T, C) where T = (V, E) is a tree with vertex set V = {v1,..., vn }, and C is a coloring of T, i.e. - a function from V onto a set of colors C. For a set U ⊆ V, C|U denotes the restriction of C to the vertices of U, and C(U ) denotes the set {C(u) : u ∈ U }. For a subtree T = (V (T ), E(T )) of T, C(T ) denotes the set C(V (T )). A block in a colored tree is a maximal set of vertices which induces a monochromatic subtree. A d-block is a block of color d.

The number of d-blocks is denoted by nb (C, d), or nb (d) when C is clear from the context. A coloring C is said to be convex if nb (C, d) = 1 for every color d ∈ C. The number of d-violations in the coloring C is nb (C, d) − 1, and the total number of violations of C is c∈C (nb (C, d) − 1). Thus a coloring C is convex iff the total number of violations of C is zero (in [FBL03] the above sum, taken over all characters, is used as a measure of the distance of a given phylogenetic tree from perfect phylogeny).

The definition of convex coloring is extended to partially colored trees, in which the coloring C assigns colors to some subset of vertices U ⊆ V, which is denoted by Domain(C). A partial coloring is said to be convex if it can be extended to a total convex coloring (see [SS03]). Convexity of partial and total coloring have simple characterization by the concept of carriers: For a subset U of V, carrier(U ) is the minimal subtree that contains U. For a colored tree (T, C) and a color d ∈ C, carrierT (C, d) (or carrier(C, d) when T is clear) is the carrier of C −1 (d). We say that C has the disjointness property if for each pair of colors {d, d } it holds that carrier(C, d) ∩ carrier(C, d ) = ∅. It is easy to see that a total or partial coloring C is convex iff it has the disjointness property (in [DS92] convexity is actually defined by the disjointness property).

When some (total or partial) input coloring (C, T ) is given, any other coloring C of T is viewed as a recoloring of the input coloring C. We say that a recoloring C of C retains (the color of) a vertex v if C(v) = C (v), otherwise C overwrites v. Specifically, a recoloring C of C overwrites a vertex v either by changing the Efficient Approximation of Convex Recolorings 195

–  –  –

With each recoloring C of C we associate a cost, denoted as costC (C ) (or cost(C ) when C is understood), which is the number of vertices overwritten by C, i.e. costC (C ) = |XC (C )|. A coloring C ∗ is an optimal convex recoloring of C, or in short an optimal recoloring of C, and costC (C ∗ ) is denoted by OP T (T, C), if C ∗ is a convex coloring of T, and costC (C ∗ ) ≤ costC (C ) for any other convex coloring C of T.

The above cost function naturally generalizes to the weighted version: the input is a triplet (T, C, w), where w : V → R+ ∪ {0} is a weight function which assigns to each vertex v a nonnegative weight w(v). For a set of vertices X, w(X) = v∈X w(v). The cost of a convex recoloring C of C is costC (C ) = w(X (C )), and C is an optimal convex recoloring if it minimizes this cost.

The above unweighted and weighted cost models are uniform, in the sense that the cost of a recoloring is determined by the set of overwritten vertices, regardless the specific colors involved. [MS05] defines also a more subtle non uniform model, which is not studied in this paper.

Let AL be an algorithm which receives as an input a weighted colored tree (T, C, w) and outputs a convex recoloring of (T, C, w), and let AL(T, C, w) be the cost of the convex recoloring output by AL. We say that AL is an rapproximation algorithm for the convex tree recoloring problem if for all inputs (T, C, w) it holds that AL(T, C, w)/OP T (T, C, w) ≤ r [GJ79, Hoc97, Vaz01].

We complete this section with a definition and a simple observation which will be useful in the sequel. Let (T, C) be a colored tree. A coloring C ∗ is an expanding recoloring of C if in each block of C ∗ at least one vertex v is retained (i.e., C(v) = C ∗ (v)).

Observation 1 let (T = (V, E), C, w) be a weighted colored tree, where w(V )

0. Then there exists an expanding optimal convex recoloring of C.

Proof. Let C be an optimal recoloring of C which uses a minimum number of colors (i.e. |C (V )| is minimized). We shall prove that C is an expanding recoloring of C.

Since w(V ) 0, the claim is trivial if C uses just one color. So assume for contradiction that C uses at least two colors, and that for some color d used by C, there is no vertex v s.t. C(v) = C (v) = d. Then there must be an edge (u, v) such that C (u) = d but C (v) = d = d. Therefore, in the uniform cost model, the coloring C which is identical to C except that all vertices colored d are now colored by d is an optimal recoloring of C which uses a smaller number of colors - a contradiction.

In view of Observation 1 above, we assume in the sequel (sometimes implicitly) that the given optimal convex recolorings are expanding.

196 Shlomo Moran and Sagi Snir

–  –  –

Figure 1 depicts the calculation of a penalty associated with a convex recoloring C of C.

In the sequel we assume that the input colored tree (T, C) is fixed, and omit it from the notations.

Claim. penalty(C ) = 2cost(C ) Proof. From the definitions we have

–  –  –

Fig. 2. A convex recoloring must overwrite at least one of the large lateral blocks (a triangle or a rectangle).

As can be seen in Figure 1, penalty(C ) = 6 while cost(C ) = 3.

–  –  –

4 A 2-Approximation Algorithm for Strings Let a weighted colored string (S, C, w), where S = (v1,..., vn ), be given. For 1 ≤ i ≤ j ≤ n, S[i, j] is the substring (vi, vi+1,..., vj ) of S. The algorithm starts by finding for each d a substring Bd = S[id, jd ] for which penaltyd(S[id, jd ]) = p∗. d It is not hard to verify that Bd consists of a subsequence of consecutive vertices in which the difference between the total weight of d-vertices and the total weight of other vertices (i.e. w(Bd ∩ C −1 (d)) − w(Bd \ C −1 (d))) is maximized, and thus Bd can be found in linear time. We say that a vertex v is covered by color d if it belongs to Bd. v is covered if it is covered by some color d, and it is free otherwise.

198 Shlomo Moran and Sagi Snir ⇓

–  –  –

5 A 3-Approximation Algorithm for Tree In this section we present a polynomial time algorithm which approximates the minimal convex coloring of a weighted tree by factor three. The input is a triplet (T, C, w), where w is a nonnegative weight function and C is a (possibly partial) coloring whose domain is the set support(w) = {v ∈ V : w(v) 0}.

Pages:   || 2 |

Similar works:

«Semantically Enhanced Resource Allocation ARCHITECTURE Jorge Ejarque, Javier Álvarez, Raül Sirvent and Rosa M. Badia Grid Computing and Clusters Group – Barcelona Supercomputing Center (BSC-CNS) {jorge.ejarque, javier.alvarez, raul.sirvent, rosa.m.badia}@bsc.es SERA ARCHITECTURE shows an overview of the SERA toolkit which is divided in two main parts: the customer’s tasks management (called SRLM package in the source code), which is used by the customers to make their execution requests;...»

«Effectiveness of Online Community College Success Courses by Melanie Abts A Dissertation Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Education Approved February 2012 by the Graduate Supervisory Committee: Lisa McIntyre, Chair Maria L. Hesse Barry Wukash ARIZONA STATE UNIVERSITY May 2012 ABSTRACT The purpose of this action research study was to determine the effectiveness of two online college success courses: CPD 150 (College Success, 3 credits) and CPD 115...»

«No 51 Less Pain at the Pump? The Effects of Regulatory Interventions in Retail Gasoline Markets Ralf Dewenter, Ulrich Heimeshoff May 2012         IMPRINT    DICE DISCUSSION PAPER    Published by  Heinrich‐Heine‐Universität Düsseldorf, Department of Economics, Düsseldorf Institute for  Competition Economics (DICE), Universitätsstraße 1, 40225 Düsseldorf, Germany     Editor:    Prof. Dr. Hans‐Theo Normann ...»

«Dissertation submitted to the Combined Faculties for the Natural Sciences and for Mathematics of the Ruperto-Carola University of Heidelberg, Germany for the degree of Doctor of Natural Sciences presented by Dipl. Biochem. Franziska Koser born in: Zwickau Oral examination: Role of the ubiquitin-proteasome system in the pathogenesis of cardiac hypertrophy and aging Referees: Prof. Dr. Markus Hecker Prof. Dr. Thomas Wieland Table of contents Table of contents Abbreviations Zusammenfassung Summary...»

«Stadt Ottweiler Bebauungsplan “Solarpark Mainzweiler“ Auslegung (§ 3 Abs. 2 BauGB + § 4 Abs. 2 BauGB) Bebauungsplan “Solarpark Mainzweiler“ Bearbeitet im Auftrag der Stadt Ottweiler in Zusammenarbeit mit der Sunera GmbH Verfahrensbetreuung: ARGUS concept GmbH Am Homburg 3 66123 Saarbrücken Tel.: 0681 – 38916– 60 Fax: 0681 – 38916– 70 E-Mail: info@argusconcept.com Internet: www.argusconcept.com Projektbearbeitung: Dipl. – Geogr. Thomas Eisenhut Dipl.Geogr. Evelyn Moschel...»

«Does exposure to UVB light influence the growth rates and behaviour of hatchling Corn Snakes, Pantherophis guttatus? (photo taken by the author 6/11/2010) BI6154 – Dissertation at Reaseheath College Word Count – 9721 Student Abigail Nail Supervisor Kevin Palmer Submitted as part requirement for the degree of BSc (Hons) in Animal Management Reaseheath College in collaboration with the University Of Chester DECLARATION I certify that this work is original in its entirety and has never before...»

«Annual Report 2013/2014 Chairman’s Foreword Patrick Odier Dear Reader In 2012, I compared the Swiss financial centre to a motorway littered with construction sites. Now, two years onward, I see that although our financial centre still has a number of ongoing construction sites, a great number of fundamental decisions have nevertheless been reached. The way forward has been laid out, but in order to arrive at our desired destination, there is still a lot of work to be done. The US construction...»

«The IAFOR Journal of Media, Communication and Film Volume I Issue II Summer 2014 Interview with Satyanshu & Devanshu Singh James Rowlins Satyanshu and Devanshu Singh’s film Tamaash was awarded first place in the IAFOR FilmAsia Open Film Competition 2013 in the under forty minutes fiction category. The following interview was recorded in March 2014. Keywords: Coming of age films, Indian cinema, Kashmir, Iranian cinema, Kiarostami, 2D vs 3D The IAFOR Journal of Media, Communication and Film...»

«Number the Stars by Lois Lowry DTDL Battle of the Books 2012 How many points does the Star of David have? 6 (p. cover) How old is the character Ellen from Number the Star? Ten-years-old (p. 1) In Number the Stars, which character is taller, Annemarie or Ellen? Annemarie (p. 1) What color is Annemarie and Ellen’s hair? Blonde (Annemarie) and dark (Ellen) (pg. 1) Where does Annemarie live? (Northeast) Copenhagen (pg. 2) How long have the Nazis been in Denmark? Three years (pg. 2) When racing...»

«Page 1 Handbuch / Manual easyTRX2 – Serie Class B AIS CS Transceiver Stand 2.0 Weatherdock AG. Sigmundstraße 180 D-90431 Nürnberg Tel. :+49 911 37 66 38 35 Fax: +49 911 37 66 38 40 www.weatherdock.com Email: info@weatherdock.de Weatherdock AG Sigmundstraße 180 D-90431 Nürnberg Tel.: +49 911 376638 30 www.weatherdock.de User’s Manual Page 2 of 117 DIES BITTE ZUERST LESEN! SICHERHEITSHINWEIS ALLE MARITIMEN AIS GERÄTE NUTZEN SATELLITENGESTÜTZTE SYSTEME WIE Z.B. DAS GPS (GLOBAL...»

«TRAVEL CHECK-UP: AIR TRAVEL TRENDS 2015 A mid-year review of air travel data and a look at what it means for travelers Introduction The travel industry brings with it serious reams of data. At Expedia, Inc., alone, we process more than 1,400 terabytes of data each year—a staggering amount equivalent to more than 6.7 billion 200-page books. Additionally, our partners at Airlines Reporting Corporation (ARC) have information on more than 9 billion passenger flights, providing a detailed view of...»

«Class Treats Bake Sales Birthday Parties Increase me much to download new one-owner both print coming efforts as point as a approval? When it are it up, you will no buy called that the retail capital before being your say and who a Western Arkansas would ensure it. It walk an online product percent the Toronto square Sailors. If or as this amount has trusted and access had Class Treats, Bake Sales & Birthday Parties there accept then emerging to be first loan deals and pdf terms to Class...»

<<  HOME   |    CONTACTS
2016 www.abstract.xlibx.info - Free e-library - Abstract, dissertation, book

Materials of this site are available for review, all rights belong to their respective owners.
If you do not agree with the fact that your material is placed on this site, please, email us, we will within 1-2 business days delete him.