The initial intent of the PGC was to investigate the common single nucleotide polymorphisms (SNPs) genotyped on commercial arrays. Our focus has since expanded to include structural variation (copy number variation, CNV) and uncommon or rare genetic variation.
The initial efforts of the PGC were funded by NIMH grant U01 MH085520. The rationale and specific aims of that grant are given below.
By the end of 2008, there will be genome-wide association study (GWAS) data on 47 samples of individuals with attention-deficit hyperactivity disorder (ADHD), autism (AUT), bipolar disorder (BIP), major depressive disorder (MDD), or schizophrenia (SCZ). Taken together, these GWAS constitute the largest biological experiment ever conducted in psychiatry: over 80,000 subjects (59,000 independent cases and controls, plus over 7,700 family trios), ~500,000 SNP genotypes per subject, and ~40 billion genotypes in total. Although the ready availability of GWAS data is highly attractive, there is a real risk of conflicting claims and confusion if analyses are not coordinated. GWAS meta-analysis is complex and requires considerable care and expertise to be done validly. For psychiatric phenotypes, there is the additional challenge of working with disease entities that show both substantial clinical variation within a diagnosis and some overlap between diagnoses. Given the urgent need to know whether there are replicable associations, and the importance of avoiding conflicting claims that damage the field, a new type of collaboration is required.
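As a rough consistency check on these totals, count each family trio as three genotyped individuals (an affected offspring and both parents; this is the usual meaning of a trio and is assumed here rather than stated above), then multiply the subject count by the approximate number of SNPs per subject:

```latex
\begin{aligned}
N_{\text{subjects}} &\approx 59{,}000 + 3 \times 7{,}700 = 59{,}000 + 23{,}100 = 82{,}100 > 80{,}000 \\
N_{\text{genotypes}} &\approx 80{,}000 \times 500{,}000 = (8 \times 10^{4}) \times (5 \times 10^{5}) = 4 \times 10^{10} \approx 40\ \text{billion}
\end{aligned}
```

Both results agree with the stated subject and genotype totals.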
To accomplish these ends, we began the Psychiatric GWAS Consortium (PGC) in early 2007 to conduct rigorous and comprehensive within- and cross-disorder GWAS meta-analyses. The overall philosophy of the PGC is to be as inclusive, democratic, and rapid as possible. The PGC is fully established, with a coordinating committee, five disease working groups, a cross-disorder group, a statistical analysis and computational group, and a computing cluster for data warehousing and statistical analysis. Remarkably, almost all psychiatric GWAS investigators who were approached agreed to participate. Most of the effort is donated.
The PGC currently has these Specific Aims.
- Aim 1. Dataset harmonization. This aim has two components.
  - Harmonize genetic data: (i) upload de-identified, individual-level GWA genotype data to a high-performance computing cluster; (ii) process the datasets through a quality control (QC) pipeline that conforms to current best practices, to minimize the chance of false-positive results (e.g., due to population stratification); and (iii) impute 2.6M SNPs using the HapMap2 CEU panel and recent extensions. These procedures will allow direct comparison of the individual datasets and will be conducted by the analysis group in close cooperation with the primary investigative team.
  - Harmonize phenotype data: (i) ensure that studies use comparable diagnostic constructs and, where they do not, provide sufficient data for the differences to be modeled in the analyses; and (ii) construct databases of all items recorded for each subject, including the item-level data. This will allow meaningful binary and quantitative traits to be defined for Aims 2 and 3.
- Aim 2. Within-disorder meta-analyses. Conduct separate meta-analyses of all available GWAS data for ADHD, AUT, BIP, MDD, and SCZ in an attempt to identify convincing genotype-phenotype associations.
- Aim 3. Cross-disorder meta-analyses. The clinically derived DSM-IV and ICD-10 definitions may not have “carved nature at its joints” with respect to the fundamental genetic architecture [4, 5].
  - Conduct meta-analyses in an attempt to identify convincing genotype-phenotype associations shared by two or more of ADHD, AUT, BIP, MDD, and SCZ, as defined by traditional diagnoses.
  - Convene an expert working group to translate epidemiological and genetic epidemiological evidence into rigorous, explicit hypotheses about overlap among these disorders, and then conduct meta-analyses based on these expert definitions. Examples include combining MDD cases with BIP cases who have a preponderance of depressive episodes, or combining SCZ cases with BIP cases who have psychotic features.
- Aim 4. Data sharing. Communicate pre-publication results widely. Deposit de-identified phenotype and GWA genotype data in controlled-access repositories (i.e., the NIMH, dbGaP, or WTCCC repositories), thereby making these data available to the international scientific community.
Whatever the results, these historically large efforts will yield hard facts about ADHD, AUT, BIP, MDD, and SCZ to guide the next era of psychiatric research.
Building on the success of “PGC1”, we created “PGC2” to extend our scope to new scientific aims.
We created the Psychiatric GWAS Consortium (PGC) in 2007 to conduct field-wide mega-analyses of individual-level data for attention-deficit hyperactivity disorder (ADHD), autism (AUT), bipolar disorder (BIP), major depressive disorder (MDD), and schizophrenia (SCZ). A special one-year NIMH grant established “PGC1”.
PGC1 achieved its initial aims. We united the field for the first time: the PGC is the largest consortium (165 scientists from 68 institutions in 19 countries) and the largest biological experiment in the history of psychiatry. We have (1) produced high-quality mega-analyses for five disorders, (2) delivered ~10 strong associations, and (3) provided evidence that many more associations remain undetected. PGC1 has been highly successful and has delivered new knowledge and hypotheses about these idiopathic disorders.
Given these successes, we propose “PGC2”, a new collaborative R01 whose overarching goal is to pursue the next logical set of aims. We propose to expand our work from common SNP variation to CNVs, rare variation, and cross-disorder analyses, and to conduct well-powered replication genotyping. Our aims are:
- Aim 1. CNVs. CNVs are of proven importance for psychiatric disorders. An unfunded PGC CNV working group has been preparing for a systematic evaluation of the role of rare CNVs and common copy number polymorphisms (CNPs). We propose to implement these analyses fully to identify disease associations. (1a) Process array intensity data from all PGC1 samples through best-practice calling and QC pipelines. (1b) Conduct mega-analyses of CNV data within and between disorders (cf. Aim 3). (1c) Experimentally validate and fine-map disease-associated CNVs, and select CNV regions for Aim 4.
- Aim 2. Next-generation sequencing (NGS). At least 8,000 exomes from cases with AUT, BIP, and SCZ will be available before the end of 2011 through ongoing projects. These numbers will increase, and studies will soon expand to whole-genome sequencing. We propose to be proactive and to get ahead of the curve by developing a PGC pipeline for the integrated QC, analysis, and bioinformatics of NGS data. (2a) Implement a PGC2 pipeline for integrated alignment, QC, and analysis of sequence data. (2b) Conduct mega-analyses of exome variation data within and across disorders (cf. Aim 3). (2c) Select genomic regions for Aim 4.
- Aim 3. Cross-disorder analyses. Psychiatric disorders are defined by observed signs and reported symptoms, without recourse to biological validators. Comorbidity is prevalent, even normative. Genetics holds great promise for elucidating the similarities and differences across disorders. (3a) Using existing disease definitions, determine whether any type of genetic variant is associated with more than one disorder, in order to understand etiological overlap between disorders. (3b) By integrating epidemiological and genetic epidemiological data, test a priori hypotheses regarding sub-phenotypes across disorders. (3c) Select candidate loci for Aim 4.
- Aim 4. Replication and validation. Aims 1-3 will identify loci that require genotyping for (a) replication in independent PGC2 samples and (b) validation in already-genotyped samples (e.g., CNV calls from GWAS chips, imputed loci, and newly discovered exome variants). (4a) Integrate and prioritize results from Aims 1-3 and develop the “PsychChip”, a custom Illumina array containing 20,000 SNP/CNV probes (the PsychChip can also be purchased from Illumina by anyone). (4b) Genotype 60,000 subjects and conduct the final validation and replication analyses for Aims 1-3 (N = 115,082).
For Aim 4b, we propose a public-private partnership. We request funding to genotype 60,000 subjects, and we have obtained firm commitments from two trusted private sources to fund simultaneous PsychChip genotyping of the remaining subjects. Thus, the final analysis will include 115,082 individuals.
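The number of subjects covered by the private commitments follows from the Aim 4b totals, assuming the 115,082 individuals consist only of the 60,000 grant-funded subjects plus the privately funded remainder:

```latex
N_{\text{private}} = N_{\text{total}} - N_{\text{requested}} = 115{,}082 - 60{,}000 = 55{,}082
```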
Our intention is to rapidly generate a high-confidence list of associations across the allelic spectrum for these disorders of first-rank public health significance. For psychiatric genetics to advance, we urgently require a comprehensive “map” of how different types of genetic variation (rare and common, SNP and CNV) alter risk for these critically important biomedical disorders.