Hong Dang, PhD - Bioinformatician
Genomic-scale data analysis and integrative data mining of annotation databases, to help understand and interpret high-throughput genomic, transcriptomic, and proteomic data sets generated by CF Center laboratories and collaborators. Primary gene and microRNA expression data from next-generation RNA sequencing (RNA-seq) and DNA microarrays were processed and normalized using appropriate methods. Differential gene expression between treatment and control groups was then assessed statistically to obtain lists of genes that passed chosen statistical filters. To summarize the biological meaning of changes in gene expression patterns, functional annotations from genomic databases and the literature were evaluated using a number of enrichment analyses. These analyses ask whether regulated genes sharing a common functional annotation occur more often than expected by chance; such over-representation is statistically unlikely to arise at random and is therefore likely due to the biological manipulation. The genes of interest were further mined in the context of microRNA expression, proteomic data where available, and existing knowledge of biological pathways and interaction networks.
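The over-representation idea above is commonly formalized as a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test on a 2×2 table). The Python sketch below computes that p-value from four gene counts. The function name and parameters are hypothetical, the counts in the usage example are made up, and this is not necessarily the statistic used in the original analyses; tools such as DAVID apply their own variants.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided (over-representation) hypergeometric p-value, P(X >= k).

    Hypothetical helper for illustration:
      N -- number of genes in the background universe
      K -- background genes annotated with the term
      n -- genes in the differentially expressed list
      k -- genes in that list annotated with the term
    """
    if not (0 <= n <= N and 0 <= K <= N and 0 <= k <= min(n, K)):
        raise ValueError("inconsistent gene counts")
    # Probability of drawing k or more annotated genes when n genes are
    # sampled at random, without replacement, from the N-gene universe.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)


if __name__ == "__main__":
    # Made-up counts: 500 regulated genes, 20 of them carrying a term that
    # annotates 200 of 20,000 background genes (about 5 expected by chance).
    p = hypergeom_enrichment_p(k=20, n=500, K=200, N=20000)
    print(f"enrichment p-value: {p:.3g}")
```

In practice many annotation terms are tested at once, so such p-values are usually adjusted for multiple testing before a term is called enriched.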
Use of emerging technologies and computational approaches to solve difficult molecular biology problems, such as de novo genomic assembly of the large, highly repetitive and polymorphic exons of mucin genes using PacBio single-molecule long sequencing reads. A hallmark of many mucin genes is long stretches of repetitive sequence coding for Ser/Thr-rich amino acid repeats, which are sites of O-glycosylation responsible for the “sticky” physical properties of the mature mucin products. Some of these repetitive regions have been recalcitrant to first-generation (Sanger) sequencing and to second-generation (Next-Gen) high-throughput sequencing and assembly efforts, leaving gaps in the current reference genome. Third-generation platforms such as PacBio Single Molecule Real Time (SMRT®) sequencing produce reads long enough to encompass an entire repetitive region, which enabled us to obtain de novo contigs of the complete repetitive region of the gel-forming mucin MUC5AC.
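The key requirement described above, a read that spans the whole repeat, can be illustrated with a simple coverage check. The Python sketch below is a hypothetical illustration and not part of the SMRT-analysis workflow. It assumes read placements are already known as half-open coordinate intervals on the locus and that a read must also extend `flank` bp into unique sequence on each side; the 100 bp default and the example coordinates are arbitrary assumptions.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) coordinates on the locus


def spanning_reads(reads: List[Interval], repeat_start: int, repeat_end: int,
                   flank: int = 100) -> List[Interval]:
    """Return reads covering the whole repeat plus `flank` bp on both sides.

    A single read anchored in unique sequence on both sides of a tandem
    repeat captures the repeat's full length and content in one molecule,
    which short reads drawn from inside the repeat cannot reliably provide.
    """
    lo, hi = repeat_start - flank, repeat_end + flank
    return [(s, e) for s, e in reads if s <= lo and e >= hi]


if __name__ == "__main__":
    # Hypothetical 5 kb repeat at [1000, 6000) with one short and one long read.
    reads = [(950, 1100), (800, 6300)]
    print(spanning_reads(reads, 1000, 6000))  # prints [(800, 6300)]
```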
New developments in sequencing, microRNA and non-coding RNA biology, network and literature data mining, knowledge representation, and machine learning are all very exciting to me, and are sources of new ideas and methods to apply in our current research on lung diseases.
Frequently Used Tools and Databases
Linux (Ubuntu, Debian, Fedora, CentOS), Perl, SQL, R; Bowtie/Bowtie2, BWA, SAMtools, TopHat/Cufflinks, MIRA, SMRT-analysis, IGV, GSEA, DAVID, Cytoscape, Partek, IPA; NCBI, UCSC, ENSEMBL, Gene Ontology, miRBase, ...
Education
- Bachelor of Medicine (1987): Peking University, Health Sciences Center, Beijing, P. R. China.
- Ph.D. (1991): Department of Biochemistry, University of Louisville School of Medicine, Louisville, KY.
- Professional Certificate (1999): Relational Database Management Systems, UCLA, Los Angeles, CA.
Professional Experience
- 2010-Present: Bioinformatician, Cystic Fibrosis and Pulmonary Diseases Research and Treatment Center, UNC Chapel Hill, Chapel Hill, NC 27599.
- 2009-2010: Senior Bioinformatician, Almac Diagnostics, LTD, 4238 Technology Drive, Durham, NC 27704.
- 2002-2009: Senior Bioinformatician, Alpha-Gamma Technologies, Inc. (AGTI), 3301 Benson Dr, Suite 535, Raleigh, NC 27609.
- 2001-2002: Bioinformaticist, Incellico, Inc., 2327 Englert Dr., Durham, NC 27713.
- 2000-2001: Sr. Scientist/Oracle DBA, Center for Genome Research and the Alliance for Cellular Signaling (Dr. Mel Simon lab), California Institute of Technology (Caltech).
- 1996-2000: Senior Research Fellow, lab of Dr. Henry Lester, Division of Biology, Caltech.
- 1991-1996: Postdoctoral Fellow, lab of Dr. James Patrick, Division of Neuroscience, Baylor College of Medicine.
- 1987-1991: Graduate Student, laboratory of Dr. Steven R. Ellis, Department of Biochemistry, University of Louisville.