The research interests of the group focus on two major themes: (1) the evolution and variability of the genes encoding cancer-testis (CT) antigens and (2) the development of in silico techniques for the efficient analysis of novel high-throughput genomics data.
CT genes are normally expressed only in immune-privileged cells of the germ line, but are re-expressed in a variable proportion of cancers. Most human genes with a strict CT expression pattern are located on the X chromosome (CT-X genes) and belong to families that have undergone recent expansion in the primate lineage. One focus of our research is to trace the evolutionary history of CT-X genes. To this end, we are establishing a comprehensive catalogue of CT-X genes in the human genome, a challenging task because many of them lie in regions of segmental duplication that have been assembled incorrectly or contain gaps. This work has led us to identify new CT-X families and to produce more detailed genomic maps of known families. We are also searching for CT homologues in the ever-increasing collection of available genomes, with the goal of tracing the emergence and evolution of each family and thereby inferring possible function. Finally, we are studying copy number variation (CNV) of CT-X genes at both the genetic (inter-individual differences) and the somatic (cancer-specific) levels.
On the methodological side, we have developed tools for the efficient use of short reads generated by “next-generation” sequencing machines. The “Rolexa” algorithm makes it possible to interpret and map a significantly larger proportion of individual reads from Solexa/Illumina sequencers than the manufacturer’s proprietary software does. The “fetchGWI” software implements a very efficient method for matching large collections of short sequences to genome-size databases, which facilitates SNP inference. We have also evaluated methodologies for extracting reliable CNV information from genome-wide hybridization data in very large cohorts.