## Incomplete directed perfect phylogeny (2000)

Venue: SIAM Journal on Computing

Citations: 18 (2 self)

### BibTeX

@INPROCEEDINGS{Shamir00incompletedirected,

author = {R. Shamir and R. Sharan},

title = {Incomplete directed perfect phylogeny},

booktitle = {SIAM Journal on Computing},

year = {2000},

pages = {2004}

}

### Abstract

Perfect phylogeny is one of the fundamental models for studying evolution. We investigate the following generalization of the problem: The input is a species-characters matrix. The characters are binary and directed, i.e., a species can only gain characters. The difference from standard perfect phylogeny is that for some species the state of some characters is unknown. The question is whether one can complete the missing states in a way admitting a perfect phylogeny. The problem arises in classical phylogenetic studies, when some states are missing or undetermined. Quite recently, studies that infer phylogenies using inserted repeat elements in DNA gave rise to the same problem. The best known algorithm for the problem requires O(n²m) time for m characters and n species. We provide a near-optimal Õ(nm)-time algorithm for the problem.

### 1 Introduction

When studying evolution, the divergence patterns leading from a single ancestor species to its contemporary descendants are usually modeled by a tree structure. Extant species correspond to the tree leaves, while their common progenitor corresponds to the root of this phylogenetic tree. Internal nodes correspond to hypothetical ancient species, which putatively split up and evolved into distinct species. Tree branches model changes through time of the hypothetical ancestor species. The common case is that one has information regarding the leaves, from which the phylogenetic tree is to be inferred. This task, called phylogenetic reconstruction (cf. [7]), was one of the first algorithmic challenges posed by biology, and the computational community has been dealing with problems of this flavor for over three decades (see, e.g., [12]).

In the character-based approach to tree reconstruction, contemporary species are described by their attributes or characters. Each character takes on one of several possible states. The input is represented by a matrix A where a_ij is the state of character j in species i, and the i-th row is the character vector of species i. The output sought is a hypothesis regarding evolution, i.e., a phylogenetic tree along with the suggested character-vectors of the internal nodes. This output must satisfy properties specified by the problem variant.
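To make the problem statement concrete, the following is a minimal sketch of the input matrix and of the completion question. It is not the authors' algorithm: it relies on the standard characterization that a complete binary matrix admits a directed perfect phylogeny exactly when, for every pair of characters, the sets of species possessing them are either disjoint or nested, and it tries all completions by brute force, which is exponential in the number of unknown entries. The names `UNKNOWN`, `admits_directed_perfect_phylogeny`, and `find_completion` are hypothetical and chosen only for illustration.

```python
from itertools import product

UNKNOWN = None  # hypothetical marker for a missing character state


def admits_directed_perfect_phylogeny(matrix):
    """Check a complete 0/1 species-by-characters matrix.

    Uses the pairwise test: the species sets of any two characters
    must be disjoint or one must contain the other.
    """
    if not matrix:
        return True
    n_chars = len(matrix[0])
    # For each character j, the set of species (row indices) that have it.
    owners = [
        frozenset(i for i, row in enumerate(matrix) if row[j] == 1)
        for j in range(n_chars)
    ]
    for a in range(n_chars):
        for b in range(a + 1, n_chars):
            s, t = owners[a], owners[b]
            if s & t and not (s <= t or t <= s):
                return False  # the two sets overlap without nesting
    return True


def find_completion(matrix):
    """Brute-force search over all 0/1 fillings of the UNKNOWN entries.

    Returns the first completed matrix that passes the pairwise test,
    or None if no completion does. Illustration only: exponential time.
    """
    unknown_cells = [
        (i, j)
        for i, row in enumerate(matrix)
        for j, value in enumerate(row)
        if value is UNKNOWN
    ]
    for values in product((0, 1), repeat=len(unknown_cells)):
        candidate = [list(row) for row in matrix]
        for (i, j), v in zip(unknown_cells, values):
            candidate[i][j] = v
        if admits_directed_perfect_phylogeny(candidate):
            return candidate
    return None


# Rows are species, columns are characters; None marks an unknown state.
incomplete = [
    [1, 1],
    [1, UNKNOWN],
    [UNKNOWN, 1],
]
completed = find_completion(incomplete)
```

In this small instance, filling both unknown entries with 0 makes the two characters overlap without nesting, so that completion is rejected, while other fillings pass the test. The sketch only illustrates what "completing the missing states" means; it says nothing about how the O(n²m) or Õ(nm) bounds quoted in the abstract are achieved.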