dbGaP hosts many cohorts, grouped into 4 collections:
  • Individual-Level Genomic Data
  • NIH Autism -omics studies
  • Open Translational Science in Schizophrenia (OPTICS)
  • Psychiatric Genomics Consortium (PGC)

Autism Genome Project (AGP) Consortium

Parent-offspring, ASD-specific trio study of 2,611 families, totaling 7,880 participants.

Genomic info: 7,880 SNP genotype arrays, which yielded 3,957 CNVs. Samples were genotyped using the Illumina Human 1M-single Infinium BeadChip, which covers over 1,000,000 SNPs.

Phenotypic info: Families were grouped into two nested diagnostic classes (strict and spectrum ASD) based on proband diagnostic measures. To qualify for the strict class, affected individuals met criteria for autism on both primary diagnostic instruments, the ADI-R and the ADOS. In addition to individuals meeting criteria for autism, a spectrum class included all individuals who were classified as ASD on both the ADI-R and ADOS or who were not evaluated on one of the instruments but were diagnosed with autism on the other instrument.
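The two nested classes described above can be read as a small decision rule. The following is a minimal sketch only, not the consortium's actual procedure: the function name `diagnostic_class`, the string labels, and the use of `None` for "not evaluated" are all hypothetical. It also assumes that an individual scoring autism on one instrument and ASD on the other falls in the spectrum class, which the description above does not state explicitly.

```python
from typing import Optional

# Hypothetical labels for an instrument result (ADI-R or ADOS).
# None means the individual was not evaluated on that instrument.
AUTISM = "autism"
ASD = "asd"


def diagnostic_class(adi_r: Optional[str], ados: Optional[str]) -> Optional[str]:
    """Return the narrowest class an individual qualifies for.

    'strict'   : autism on both the ADI-R and the ADOS.
    'spectrum' : ASD on both instruments, or not evaluated on one
                 instrument and autism on the other.
    None       : qualifies for neither class.

    The classes are nested, so every 'strict' individual also belongs
    to the spectrum class.
    """
    results = (adi_r, ados)

    # Strict class: autism criteria met on both instruments.
    if all(r == AUTISM for r in results):
        return "strict"

    # Spectrum class: at least ASD on both instruments.
    # Assumption: a mixed autism/ASD result counts as spectrum.
    if all(r in (AUTISM, ASD) for r in results):
        return "spectrum"

    # Spectrum class: one instrument missing, autism on the other.
    if (adi_r is None and ados == AUTISM) or (ados is None and adi_r == AUTISM):
        return "spectrum"

    return None
```

For example, `diagnostic_class("autism", None)` returns `"spectrum"` under these assumptions, while an individual with a missing instrument and only an ASD result on the other qualifies for neither class.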

Rationale: Autism spectrum disorders (ASDs) are highly heritable (~90%), yet the underlying genetic determinants are largely unknown. To understand the genetic and phenotypic heterogeneity in ASDs, we analyzed 2,611 strictly defined ASD families with over 1,000,000 single nucleotide polymorphisms (SNPs), and applied multiple analytical strategies to examine these families for SNPs and copy number variants (CNVs) affecting risk for ASDs. Secondary analyses examined associations in more homogeneous subgroups. Furthermore, the use of large control datasets permitted contrasting case and control samples and addressed the potential increased burden of rare CNVs in ASD. Our data have allowed us to discern key features of the ASD genomic architecture, find new susceptibility loci, and chart a course for future studies in ASDs.

Data Access: Data are available on our CEDAR repository. For support, contact Martineau Jean-Louis.

ARRA Autism Sequencing Collaboration

Case-control study of 3,548 ASD cases and 848 controls (4,396 participants in total).

Genomic info: All participants have SNP/CNV genotype data, while 3,363 ASD cases and 847 controls (4,210 participants in total) also have NGS whole-exome sequencing (WXS) data.

Phenotypic info: The subject phenotype data table includes ADI-R diagnosis, age at disease onset, sex, and race for 4,440 participants.

Rationale: The root causes of autism remain unknown, limiting efforts to understand disease heterogeneity, diagnose cases, and prevent and treat disease. Epidemiological findings have repeatedly and unequivocally determined that heritable variation in DNA plays a substantial role in the etiology of autism and autism spectrum disorders, yet traditional efforts to identify the genetic basis of this striking heritability have met with very limited success to date and have therefore provided limited insight into disease biology. We propose here an unprecedented partnership between expert large-scale sequencing centers (at the Baylor College of Medicine and the Broad Institute of MIT and Harvard) and a collaborative network of research labs focused on the genetics of autism (brought together by the Autism Genome Project and the Autism Consortium). These groups will work together to utilize dramatic new advances in DNA sequencing technology to reveal the genetic architecture of autism, first through a comprehensive examination of the exonic sequence of all genes (that is, the coding part of the genome). The goal is to conclusively identify which genes harbor individual or collections of rare DNA variants that predispose to autism, and thus translate the abstract heritability into solid biological clues about disease pathogenesis that can be studied molecularly and approached therapeutically. These efforts and their follow-up, which will be performed on thousands of autism families collected by the autism research groups and being provided with phenotype data to NIMH repositories, will form the cornerstone of autism genetic research going forward.

Data Access: As of June 2018, we have access to the phenotype and WXS data, which are located on our CEDAR repository. We are currently working toward acquiring the NGS data. For support, contact Martineau Jean-Louis.

Philadelphia Neurodevelopmental Cohort (PNC)

General research cohort of youths aged 8-21 years: the cohort includes 9,267 genotyped participants, of whom 1,445 underwent neuroimaging, and 9,496 phenotyped subjects.

Genomic info: Different platforms were used for whole-genome genotyping; they overlap in over 500,000 SNPs. The compressed size of a single subject’s data is approximately 250 MB.

Phenotypic info: Demographic, medical, and psychopathology history were assessed using a structured computerized instrument called GOASSESS. It was developed from a modified version of the Kiddie-Schedule for Affective Disorders and Schizophrenia. In addition to standard demographic data, the psychopathology screener allows symptom and criterion-related assessment of mood, anxiety, behavioral, and eating disorders, psychosis spectrum symptoms, and substance use history. Psychopathology data from GOASSESS are represented in dbGaP as nearly 600 individual item-level responses.

Neuroimaging info: Multimodal neuroimaging was performed on a subsample of 1,445 participants who were cognitively and clinically assessed. The current data release includes over 9,700 MRI images. Although neuroimaging screening sought to exclude individuals with a history of major medical problems, subsequent analysis of this medical inventory revealed that a small percentage of the imaged sample did have a history of medical disorders that could impact brain function, or had a clinical abnormality of brain structure that was encountered incidentally.

Rationale: The cohort consists of youths aged 8-21 years who consulted the CHOP network and volunteered to participate in genomic studies of complex pediatric disorders. All participants underwent clinical assessment, including a neuropsychiatric structured interview and review of electronic medical records. They were also administered a neuroscience-based computerized neurocognitive battery (CNB), and a subsample underwent neuroimaging.

Data Access: An application for access is in process as of June 2018.