Because our work is so diverse, we have several projects available for interested graduate students:

  1. Evaluation and deployment of splice-site prediction algorithms for pathogenic variant identification: genetic variants affecting RNA splicing are implicated in multiple hereditary conditions. Variants in our sequencing pipeline are not currently automatically annotated with splice-site modification prediction. You could help us change this, and increase our ability to detect disease-causing variants in patients.
  2. Evaluation of “bowtie” and other alignment tools: most next-generation sequencing platforms generate millions of short “reads” (snippets of DNA) that must be assembled in the correct order to represent a patient’s genome. This is accomplished most easily by aligning the reads against the reference human genome sequence, and we use a particular tool (GATK) to do this. Like any software, GATK has limitations. You could help us evaluate and potentially deploy alternatives to GATK, thereby increasing the efficiency of our genome-sequencing pipeline.
  3. Functional assay development: Our whole-exome and whole-genome sequencing generates thousands of “variants of uncertain clinical significance” (VUS), i.e. variants that might cause disease but have not been shown to do so, and thus cannot be used clinically. Some currently undiagnosed patients could probably be diagnosed if we could validate, via a functional assay that demonstrates pathogenicity, that particularly suspicious VUSes do in fact cause disease.
  4. Gene discovery: We have cohorts of patients who, from phenotypic features and other metrics, are highly likely to have a monogenic cause for their disease. However, whole-genome sequencing and analysis have not identified a Mendelian cause in any gene known to be associated with their disease. These patients may have novel mutations in genes not currently associated with disease, and you can help us find them.
  5. Machine learning to find cancer “signatures”: Who needs IBM Watson when we’ve got you? We are sequencing the tumors of early-onset breast cancer patients and any surrounding normal breast tissue, in order to identify those mutations in hereditary cancer susceptibility genes that may provide a positive molecular diagnosis for the patient’s cancer. You will work with experts at the Renaissance Computing Institute to deploy machine-learning tools such as Principal Component Analysis, as well as others, to understand patterns of mutations found across patients, thus helping identify which “signatures” correspond to early-onset cancer.
  6. Workflow management for genomic samples: We’re collaborating with the Renaissance Computing Institute to build workflow management applications that track the physical locations of patient samples as they move from lab to lab and through each stage of the analysis pipeline. You can help us manage and streamline our sequencing workflow by developing a web application to track samples from our NCGENES project. In doing this, you’ll gain commercially valuable experience in web application design and deployment and in relational databases, as well as a strong grounding in the logistics of genome sequencing.
  7. Variant interpretation automation: with the explosion of next-generation sequencing, there’s been a corresponding explosion in variants that need interpretation and classification as disease-causing or not. The American College of Medical Genetics has released a set of variant interpretation guidelines in an effort to standardize this process, but following them can be time-consuming for the thousands of variants we would like to classify. In this lab, you can create an algorithm that mimics the variant interpretation guidelines to automatically classify every genomic variant we find, thereby significantly reducing the burden on molecular analysts.
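
To give a flavor of the variant interpretation automation project (item 7), here is a minimal Python sketch of a rule-based classifier. It is only an illustration, not our pipeline: the function name `classify`, the evidence-strength labels, and the thresholds are assumptions made for this example. It covers a small, simplified subset of evidence-combining rules, and any real implementation would need to follow the published guidelines in full.

```python
from collections import Counter


def classify(pathogenic_evidence, benign_evidence):
    """Combine evidence strengths into a five-tier call.

    Both arguments are lists of strength labels (hypothetical names):
      pathogenic: "very_strong", "strong", "moderate", "supporting"
      benign:     "standalone", "strong", "supporting"
    The thresholds are a simplified subset chosen for illustration.
    """
    p = Counter(pathogenic_evidence)  # missing labels count as 0
    b = Counter(benign_evidence)

    # Benign side: a stand-alone criterion or two strong criteria.
    if b["standalone"] >= 1 or b["strong"] >= 2:
        benign_call = "benign"
    elif (b["strong"] >= 1 and b["supporting"] >= 1) or b["supporting"] >= 2:
        benign_call = "likely benign"
    else:
        benign_call = None

    # Pathogenic side: only a few combinations are modeled here.
    if (p["very_strong"] >= 1 and p["strong"] >= 1) or p["strong"] >= 2:
        path_call = "pathogenic"
    elif (p["very_strong"] >= 1 and p["moderate"] >= 1) or p["moderate"] >= 3:
        path_call = "likely pathogenic"
    else:
        path_call = None

    # Conflicting or insufficient evidence falls back to uncertain.
    if path_call and benign_call:
        return "uncertain significance"
    return path_call or benign_call or "uncertain significance"


if __name__ == "__main__":
    print(classify(["very_strong", "moderate"], []))
    print(classify(["supporting"], ["supporting"]))
```

The hard part of the project is not this final combining step but deciding, for each variant, which evidence criteria actually apply. That is where annotation sources and analyst judgment come in.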