Genome-wide analysis of core cell cycle genes in Arabidopsis. Plant Cell 14: 903–916 (2002)
Citations: 44 (4 self)
BibTeX
@MISC{Vandepoele02genome-wideanalysis,
author = {Klaas Vandepoele and Jeroen Raes and Lieven De Veylder and Pierre Rouzé and Stephane Rombauts and Dirk Inzé},
title = {Genome-wide analysis of core cell cycle genes in Arabidopsis. Plant Cell 14: 903–916},
year = {2002}
}
Abstract
Cyclin-dependent kinases and cyclins, with the help of different interacting proteins, regulate progression through the eukaryotic cell cycle. A high-quality, homology-based annotation protocol was applied to determine the core cell cycle genes in the recently completed Arabidopsis genome sequence. In total, 61 genes were identified belonging to seven selected families of cell cycle regulators, of which 30 are new or corrections of the existing annotation. A new class of putative cell cycle regulators was found that probably are competitors of E2F/DP transcription factors, which mediate the G1-to-S progression. In addition, the existing nomenclature for cell cycle genes of Arabidopsis was updated, and the physical positions of all genes were compared with segmentally duplicated blocks in the genome, showing that 22 core cell cycle genes emerged through block duplications. This genome-wide analysis illustrates the complexity of the plant cell cycle machinery and provides a tool for elucidating the function of new family members in the future.
INTRODUCTION
Cell proliferation is controlled by a universally conserved molecular machinery in which the core key players are Ser/Thr kinases, known as cyclin-dependent kinases (CDKs). CDK activity is regulated in a complex manner, including phosphorylation/dephosphorylation by specific kinases/phosphatases and association with regulatory proteins. Although many cell cycle genes of plants have been identified in the last decade (for review, see ...) ... Nevertheless, a genome-wide inventory of all core cell cycle genes is possible only when the available raw sequence data are annotated correctly.
1 To whom correspondence should be addressed. E-mail diinz@gengenp.rug.ac.be; fax 32-9-2645349. Article, publication date, and citation information can be found at www.plantcell.org/cgi/doi/10.1105/tpc.010445.
Although genome-wide annotations of organisms sequenced by large consortia have produced huge amounts of information that benefits the scientific community, this automated high-throughput annotation is far from optimal. Generally, annotation is performed in two steps: first, structural annotation, which aims to find and characterize biologically relevant elements within the raw sequence (such as exons and translation starts); and second, functional annotation, in which biological information is attributed to the gene or its elements. Unfortunately, there are some problems inherent to both. When structural annotation is performed, the first problem occurs when no cDNA or expressed sequence tag (EST) information is available, which is the case for 60% of all Arabidopsis genes (Arabidopsis Genome Initiative, 2000). Then, one has to resort to intrinsic gene prediction software, which remains limited, although much improvement has been made in the last few years. Errors range from wrongly determined splice sites or start codons, to so-called spliced (one gene predicted as two) or fused (two genes predicted as one) genes, to completely missed or nonexistent predicted genes. In addition, no general and well-defined prediction protocol is used by the different annotation centers, which results in the generation of redundant, nonuniform structural annotations. Furthermore, clear information is lacking on the methods and programs used as well as the motivation for applying special protocols, making it impossible to trace the annotation process. The problem with functional annotation is related to the difficulty of linking biological knowledge to a gene.
Such a link is generally made on the basis of sequence similarity that is derived either from full-length sequence comparisons or by means of multiple alignments, patterns, and domain searches. Of major concern is the origin of the assigned function, because the transfer of low-quality or faulty functional annotation information propagates incorrect annotations in the public databases. Even correct annotations can be disseminated erroneously: one can easily imagine the transfer of a good functional assignment from a multidomain protein to a protein that has only one of the domains. This problem can be avoided using only experimentally derived information to predict a gene's structure and function unambiguously.
Here, we applied a homology-based annotation using experimental references to build a full catalog with 61 core cell cycle genes of Arabidopsis. In total, 30 genes are either new or genes for which the previous annotation was incorrect. Based on phylogenetic analysis, we updated and rationalized their nomenclature. Furthermore, relations between gene family members were correlated with large segmental duplications.
RESULTS
Strategy
To correctly annotate all core cell cycle genes, a strategy was defined that uses as much reliable information as possible, combining experimentally derived data with the best prediction tools available for Arabidopsis (see Methods). First, experimental representatives for each family were used as bait to locate regions of interest on the different chromosomes. For these selected regions, genes were predicted and candidate genes were validated; the presence of mandatory domains in their gene products was determined by aligning them with the experimental representatives; if necessary, the predicted gene structure was modified using the family-related characteristics or ESTs.
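The validation step described above (checking each predicted gene product for the domains its family requires) can be pictured as a simple triage. The following is a minimal sketch only, not the protocol from the paper's Methods: the gene identifiers, the `Candidate` class, the `triage` function, and the contents of `REQUIRED_DOMAINS` are all hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical table of mandatory domains per family (illustrative names only).
REQUIRED_DOMAINS = {
    "cyclin": {"cyclin_box"},
    "DP": {"DNA_binding", "dimerization"},
}


@dataclass
class Candidate:
    gene_id: str  # placeholder identifier, not a real locus
    family: str   # family of the bait that located the region
    domains: set = field(default_factory=set)  # domains seen in the alignment


def triage(candidates):
    """Split candidates into validated genes and those needing manual review."""
    validated, needs_review = [], []
    for cand in candidates:
        missing = REQUIRED_DOMAINS[cand.family] - cand.domains
        if missing:
            # Incomplete domain set: the gene structure may need correction.
            needs_review.append((cand.gene_id, sorted(missing)))
        else:
            validated.append(cand.gene_id)
    return validated, needs_review


candidates = [
    Candidate("geneA", "cyclin", {"cyclin_box"}),
    Candidate("geneB", "DP", {"DNA_binding"}),
]
validated, needs_review = triage(candidates)
print(validated)
print(needs_review)
```

In this toy setup a candidate with an incomplete domain set is not discarded but queued for review, mirroring the idea that a predicted gene structure may be modified rather than rejected.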
In some cases, however, this approach did not allow us to conclude whether a region of interest really coded for a potential gene or whether a candidate gene was a core cell cycle gene. To clarify such situations, a more integrated analysis was performed. First, the members of every family were used to build a profile for that specific family. By taking the new predicted genes into account when creating the profile, a more "flexible" (i.e., all diversity within a class/subclass being represented) and plant-specific profile could be established. With this new profile, novel family members were sought within a collection of genome-wide predicted Arabidopsis proteins. Subsequently, the predicted gene products were again validated or modified by comparing them with those of other family members in a multiple alignment. With this additional approach, we could determine clearly whether the predicted genes were similar to a certain class of cell cycle genes.
To characterize subclasses within the gene families, phylogenetic trees were generated that included reference cell cycle genes from other plants and known genes from Arabidopsis. By different methods and statistical analysis of nodes, the significance of the derived classification was tested. Based on the position on the tree and the presence of class-specific signatures, genes were named according to the proposed nomenclature rules for cell cycle genes.
Annotation and Nomenclature
CDK
In yeast, one CDK is sufficient to drive cells through all cell cycle phases, whereas multicellular organisms evolved to use a family of related CDKs, all with specific functions. In plants, two major classes of CDKs, known as A-type and B-type CDKs, have been studied to date.
The A-type CDKs regulate both the G1-to-S and G2-to-M transitions, whereas the B-type CDKs seem to control the G2-to-M checkpoint only.
The previously described CAK homolog of Arabidopsis (cak1At) differs substantially from the known rice CAK, R2.
Cyclins
Monomeric CDKs have no kinase activity and must associate with regulatory proteins called cyclins to be activated. Because cyclin protein levels fluctuate in the cell cycle, cyclins are the major factors that determine the timing of CDK activation. Cyclins can be grouped into mitotic cyclins (designated A- and B-type cyclins in higher eukaryotes and CLBs in budding yeast) and G1-specific cyclins (designated D-type cyclins in mammals and CLNs in budding yeast). H-type cyclins regulate the activity of the CAKs. All four types of cyclins known in plants were identified, mostly by analogy to their human counterparts. For Arabidopsis, at present, four A-type, five B-type, five D-type, but no H-type cyclins have been described.
B-type cyclins are subdivided into two subclasses, B1 and B2. In total, Arabidopsis contains nine B-type cyclins, of which four belong to the B1 class (CYCB1;1, CYCB1;2, CYCB1;3, and CYCB1;4) and four belong to the B2 class (CYCB2;1, CYCB2;2, CYCB2;3, and CYCB2;4). One gene could not be attributed to either the B1 or the B2 class, although it clearly contained a B-type-like cyclin box in combination with the B-type-specific HxKF signature. On the other hand, no B1- or B2-like destruction box was detected. The phylogenetic position of this gene within the B cluster depended on the number of positions used for the analysis. Because cyclin sequences are known to be saturated with substitutions ...
In addition to the five D-type cyclins described previously (CYCD1;1, CYCD2;1, CYCD3;1, CYCD3;2, and CYCD4;1), five new D-type genes were detected.
Based on their phylogenetic positions, two of these genes were assigned to the D3 class (CYCD3;3 and CYCD3;4) and one was assigned to the D4 class (CYCD4;2). The remaining new D-type cyclins were subdivided further into classes CYCD5, CYCD6, and CYCD7 according to their phylogenetic positions. It is remarkable that CYCD4;2 and CYCD6;1 do not contain the LxCxE retinoblastoma (Rb) binding motif, whereas CYCD5;1 contains a divergent Rb binding motif (FxCxE) located at the N terminus. The biological functions of cyclins lacking the conserved Rb binding motif remain unclear. One Arabidopsis gene was found with high sequence similarity to cyclin H of poplar (71%) and rice (66%). Aligning all cyclins allowed us to identify the cyclin and destruction box consensus sequences for A-, B-, D-, and H-type cyclins.
In addition to the cyclins described above, two presumed pseudogenes were predicted that were very similar to B-type cyclins. The precise number of pseudogenes for the seven selected families remains unclear, because the detection of pseudogenes depends on the degree of conservation in the gene structure and on how well prediction tools detect these degenerated structures.
e EST BE528080 found for the first exon completes the structural annotation.
f Gene structure was determined using partial mRNA L27224 and AV546264.
g Gene structure was determined using two cDNA sequences, confirming the manual annotation.
CDK/Cyclin Interactors and Regulatory Proteins
CDK subunit (CKS) proteins act as docking factors that mediate the interaction of CDKs with putative substrates and regulatory proteins.
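The short signatures named in the cyclin section above (the LxCxE Rb binding motif, its divergent FxCxE form, and the B-type-specific HxKF signature) translate directly into regular expressions, with "x" read as any amino acid. The sketch below is an illustration under that assumption, not the method used in the paper; the function name `find_motifs` and the toy sequence are hypothetical.

```python
import re

# Signatures named in the text; "x" stands for any amino acid.
MOTIFS = {
    "LxCxE": "L.C.E",  # Rb binding motif
    "FxCxE": "F.C.E",  # divergent Rb binding motif
    "HxKF": "H.KF",    # B-type-specific signature
}


def find_motifs(protein):
    """Map each motif name to the 0-based start positions of its matches."""
    protein = protein.upper()
    hits = {}
    for name, pattern in MOTIFS.items():
        # A lookahead reports overlapping occurrences as well.
        regex = re.compile(f"(?={pattern})")
        hits[name] = [m.start() for m in regex.finditer(protein)]
    return hits


# Toy sequence, not a real cyclin.
print(find_motifs("MALLCAEKKHAKF"))
```

A pattern hit of this kind is only a candidate: a five-residue pattern can occur by chance, so position within the protein and conservation in a multiple alignment would still need to be checked.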
Besides the CDK subunit gene in Arabidopsis described previously (Arath;CKS1; De Veylder et al., 1997), a second CKS gene was found (Arath;CKS2) with sequence (83% identical and 90% similar amino acids) and gene structure (number and size of exons and introns) very similar to those of Arath;CKS1.
Upon the occurrence of stress or the perception of antiproliferation agents, the CDK/cyclin complexes are repressed by the CDK inhibitor (CKI) proteins. In mammals, two different classes of CKIs exist (the INK4 and the Kip/Cip families), each with its own CDK binding specificity and protein structure. Seven CKI genes belonging to the group of Kip/Cip CKIs have been described previously for Arabidopsis, designated KRP1 to KRP7. No extra KRPs were detected in the complete genome, and no plant counterparts of the INK4 family were found.
CDK/cyclin activity is regulated negatively by phosphorylation of the CDK subunit by the WEE1 kinase and positively when the inhibitory phosphate groups are removed by the CDC25 phosphatase. A single WEE1 gene was identified on chromosome 1. The WEE1 kinase was annotated using two cDNA sequences that were at our disposal (L. De Veylder, unpublished results) and has its highest homology with the WEE1 kinase of maize, showing 56% similarity to the gene product of a partial mRNA ...
Rb and E2F/DP
Rb and the E2F/DP proteins are key regulators that control the start of DNA replication. When the E2F/DP transcription factors are bound to Rb, they are inactive, but they become active when Rb is phosphorylated by G1-specific CDK/cyclin complexes, stimulating the transcription of genes needed for G1-to-S and S-phase progression. Only one Rb could be identified in the Arabidopsis genome; it was located on chromosome 3.
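Identity percentages such as the 83% reported above for Arath;CKS2 versus Arath;CKS1 are computed over aligned positions. The text does not say which denominator was used, so the sketch below makes an explicit assumption: identity is counted over alignment columns where neither sequence has a gap. The function name `percent_identity` and the toy alignment are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identical residues over gap-free columns of a pairwise alignment.

    Assumption: columns with a gap in either sequence are excluded from
    both the numerator and the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for res_a, res_b in zip(aligned_a.upper(), aligned_b.upper()):
        if res_a == gap or res_b == gap:
            continue  # skip gapped columns
        compared += 1
        if res_a == res_b:
            identical += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * identical / compared


# Toy alignment, not real CKS sequences: 4 of 5 gap-free columns match.
print(percent_identity("MKT-LV", "MRTALV"))
```

A similarity percentage (such as the 90% quoted alongside the identity value) would additionally need a rule for which substitutions count as conservative, which this sketch deliberately leaves out.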
E2F genes are known for tobacco, carrot, and wheat.
The third group contains three new genes with an internal similarity of 59% and a sequence similarity with both E2F (21%) and DP genes (18%), initially indicating some kind of relation with the E2F/DP genes. When the boxes present in the E2F genes (DNA binding, dimerization, Marked, and Rb binding boxes) and the DP genes (DNA binding and dimerization boxes) were compared with those in the three new genes, only a DNA binding domain was found, but in duplicate (