With breakthroughs in sequencing technology, the amount of raw biological sequence information is growing exponentially. However, our understanding of its biological meaning lags behind, and the function of many of these sequences remains to be identified and characterized. Classical molecular biology approaches can provide concrete evidence and lead to reliable conclusions. Nevertheless, with the huge amount of raw sequence data already available, computational approaches are showing their strength in decoding information to form new hypotheses. Taking advantage of modern computing power, bioinformatic approaches allow rapid extraction of knowledge from databases. Information extracted in this way is often "predictive" or "putative"; it depends on wet lab experiments to yield solid knowledge. Research in our lab reflects a combined computational and experimental approach.
Genomic Analysis of Functional Interactions Involving Transposable Elements (TEs)
My lab studies TE activity and the role of TEs in genome function and evolution using a combination of computational and experimental approaches. Both plant and animal genomes are the subjects of my studies. TEs are an integral part of a genome; they interact with each other and with other components of the genome. Understanding these interactions is key to understanding genome function and evolution. I intend to build genome-wide interaction networks of TEs in plant and animal genomes. I will focus on three types of interactions: (1) between TEs and host genome defense systems such as DNA methylation and RNAi; (2) among TEs belonging to the same superfamily; (3) between TE insertions and host gene regulation. An integrative approach involving computational modeling will be used to synthesize these data for systems-level studies of how these interactions contribute to the functions of organisms. In addition to quality publications, I expect to obtain TEs that can potentially be used for gene discovery, transgenesis, and gene therapy.
Transposition Mechanism of Animal Miniature Inverted Repeat Transposable Elements (MITEs)
MITEs are short (normally < 500 bp) class II elements with remarkably high copy numbers in plant and animal genomes. Their discovery in the early 1990s attracted much attention because of their mysterious origin, their transposition and amplification mechanisms, and their influence on genome function and evolution. Despite the abundance of MITEs in animals, genome-wide analyses of animal MITEs are currently limited to database mining. Thus, based on my experience in plant MITE studies, I plan to perform functional analysis of MITEs in animal genomes. These studies will not only complement and extend our knowledge of plant MITEs but also deepen our understanding of animal genome evolution.
In addition to Tourist and Stowaway elements, I also plan to study noncanonical MITEs to understand how different types of MITEs are able to attain high copy numbers. To achieve these research objectives, I plan to: (1) identify MITE candidates for analysis, (2) identify their related autonomous elements, (3) screen for functional partners between the MITEs and autonomous elements, and (4) perform detailed dissection of the requirements for MITE transposition by site-directed mutagenesis.
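Step (1), identifying MITE candidates, can be illustrated with a minimal sketch. It assumes a candidate is any stretch of 50 to 500 bp whose two ends are perfect inverted repeats of each other. The function names, the 10 bp repeat length, and the length bounds are hypothetical choices for illustration only; a real screen would also need to tolerate mismatches and to check copy number.

```python
# Translation table for complementing DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_tir_candidates(seq, tir_len=10, min_len=50, max_len=500):
    """Return (start, end) pairs whose two ends are perfect inverted repeats.

    Assumption: the first tir_len bases of seq[start:end] must equal the
    reverse complement of its last tir_len bases. All thresholds are
    illustrative, not values taken from the research plan.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    for start in range(n - min_len + 1):
        # The right end must match the reverse complement of the left end.
        target = reverse_complement(seq[start:start + tir_len])
        last_end = min(n, start + max_len)
        for end in range(start + min_len, last_end + 1):
            if seq[end - tir_len:end] == target:
                hits.append((start, end))
    return hits


# Toy sequence: a 60 bp element with 10 bp inverted ends, flanked by poly-T.
left_tir = "GAGCATTCGT"
element = left_tir + "A" * 40 + reverse_complement(left_tir)
genome = "TTTTT" + element + "TTTTT"
candidates = find_tir_candidates(genome)
```

This brute-force scan is quadratic in the worst case, so it only suits short test sequences; a genome-scale search would need an indexed approach.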
Development and Implementation of Computational Algorithms for Customized Solutions
(1) Automated identification of novel TE families and superfamilies
Despite the abundance of TEs in the genomes of higher eukaryotes, there are very few TE superfamilies. To date, only 10 DNA transposon superfamilies have been identified. However, this number has been growing slowly but steadily, largely due to the availability of genome sequence databases and improved methods to search for TEs. Based on my programming skills and experience in data analysis, I plan to develop and improve TE mining algorithms for automated genome-wide sequence analysis in order to discover novel TEs in higher eukaryotic genomes. Because TEs are often not well characterized in newly deposited genome sequences, I will also continue to develop programs that automatically screen new sequences for novel TE types whenever they are deposited in a database. Upon identification of novel families or superfamilies, emphasis will shift to understanding their transposition mechanisms.
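One common starting point for automated repeat discovery is to look for short words that occur more often than expected. The sketch below is a generic illustration of that idea and not the mining algorithm proposed here; the function name, word length, and copy threshold are all hypothetical.

```python
from collections import Counter


def repeated_kmers(seq, k=12, min_copies=3):
    """Return {kmer: count} for k-mers seen at least min_copies times.

    Assumption: over-represented k-mers serve as seeds for candidate
    repeats. k and min_copies are illustrative values only.
    """
    seq = seq.upper()
    # Count every overlapping window of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return {
        kmer: count
        for kmer, count in counts.items()
        # Skip windows containing ambiguous bases such as N.
        if count >= min_copies and set(kmer) <= set("ACGT")
    }


# Toy sequence: three copies of a 12 bp unit separated by different spacers.
unit = "ACGTTGCAAGCT"
toy = unit + "GG" + unit + "CC" + unit
seeds = repeated_kmers(toy)
```

Seeds found this way would still need to be extended, clustered, and checked for structural features of TEs before any family could be called novel.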
(2) Modeling of genome evolution
Eukaryotic genomes are being sequenced at a historic pace. Foreseeable advances in DNA sequencing technology within the coming decade will make genome sequences available to biologists in a volume comparable to the number of species available to Charles Darwin during his voyage on the Beagle about 170 years ago. General rules governing the molecular mechanisms of genome evolution await discovery. I am interested in performing a large-scale survey of the genomes of a variety of species to understand how genomes attained their current state. This survey requires extensive analysis of genome sequences and involves a number of computational automation tasks.