Abstract: Advances in DNA sequencing technologies have greatly facilitated the discovery of rare genetic variants in the human genome, many of which may contribute to common disease risk. However, evaluating their individual or even collective effects on disease risk requires very large sample sizes, which involves study designs that are often prohibitively expensive. We present an alternative approach for determining genotypes in large numbers of individuals for all variants discovered in the sequence of relatively few individuals. Specifically, we developed a new imputation algorithm that utilizes whole-exome sequencing data from 25 members of the South Dakota Hutterite population, and genome-wide single nucleotide polymorphism (SNP) genotypes from >1,400 individuals from the same founder population. The algorithm relies on identity-by-descent sharing of phased haplotypes, a different strategy from the linkage disequilibrium methods found in most imputation algorithms. We imputed genotypes discovered in the sequence data to, on average, ∼77% of chromosomes among the 1,400 individuals. Median R² between imputed and directly genotyped data was >0.99. As expected, many variants that are vanishingly rare in European populations have risen to larger frequencies in the founder population and would be amenable to single-SNP analyses.
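The idea of imputing through identity-by-descent (IBD) sharing can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the `IBDSegment` record, the haplotype IDs, and the rule of leaving a call missing when donors disagree are all assumptions made for illustration. An allele is copied to an unsequenced haplotype only when that haplotype shares an IBD segment covering the variant with a sequenced haplotype.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class IBDSegment:
    """Hypothetical record of one IBD segment between two phased haplotypes."""
    target: str  # haplotype ID from an individual without sequence data
    donor: str   # haplotype ID from a sequenced individual
    start: int   # segment start in base pairs (inclusive)
    end: int     # segment end in base pairs (inclusive)


def impute_allele(
    target: str,
    pos: int,
    segments: List[IBDSegment],
    donor_alleles: Dict[str, int],
) -> Optional[int]:
    """Copy the allele at `pos` from sequenced haplotypes that are IBD with `target`.

    Returns None when no IBD segment covers the position or when the
    covering donors disagree (an assumed, conservative rule).
    """
    calls = {
        donor_alleles[s.donor]
        for s in segments
        if s.target == target and s.start <= pos <= s.end and s.donor in donor_alleles
    }
    return calls.pop() if len(calls) == 1 else None


def imputed_fraction(
    targets: List[str],
    pos: int,
    segments: List[IBDSegment],
    donor_alleles: Dict[str, int],
) -> float:
    """Fraction of target haplotypes that receive an imputed allele at `pos`."""
    called = sum(
        impute_allele(t, pos, segments, donor_alleles) is not None for t in targets
    )
    return called / len(targets)
```

Under these assumptions, the fraction returned by `imputed_fraction` plays the role of the share of chromosomes to which a variant could be imputed; haplotypes with no IBD link to a sequenced individual stay missing rather than being guessed from population-level correlation.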
Abstract: The decreasing cost of whole genome and exome sequencing has resulted in a renaissance for identifying Mendelian disease mutations, and for the first time it is possible to survey the distribution and characteristics of these mutations in large population samples. We conducted carrier screening for all autosomal recessive mutations known to be present in members of a founder population and revealed surprisingly high carrier frequencies for many of these mutations. By utilizing the rich demographic, genetic, and phenotypic data available on these subjects and simulations in the exact pedigree to which these individuals belong, we show that the majority of mutations were likely introduced into the population by a single founder and then drifted to the high carrier frequencies observed. We further show an increasing incidence of autosomal recessive diseases overall and that the mean carrier burden in this population is likely to be lower than in non-founder populations. Finally, based on simulations, we predict the presence of 30 or more undiscovered recessive mutations among these subjects, which would at least double the number of autosomal recessive diseases that have been reported in this isolated population.
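Pedigree simulations of this kind are often done by "gene dropping", and a minimal Python sketch of that general technique follows. It is an illustration under stated assumptions, not the authors' simulation: the pedigree format (a dict listing parents before children), the function names, and the choice of a single heterozygous founder are all hypothetical.

```python
import random
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical pedigree format: person -> (father, mother), or None for a founder.
# Parents must appear before their children in the dict.
Pedigree = Dict[str, Optional[Tuple[str, str]]]
Genotype = Tuple[int, int]  # (paternal allele, maternal allele); 1 = mutation


def gene_drop(
    pedigree: Pedigree, carrier_founder: str, rng: random.Random
) -> Dict[str, Genotype]:
    """Drop one founder's mutation through the pedigree by Mendelian segregation."""
    genotypes: Dict[str, Genotype] = {}
    for person, parents in pedigree.items():
        if parents is None:
            # Exactly one founder is assumed to be a heterozygous carrier.
            genotypes[person] = (1, 0) if person == carrier_founder else (0, 0)
        else:
            father, mother = parents
            # Each parent transmits one of their two alleles with equal probability.
            genotypes[person] = (
                rng.choice(genotypes[father]),
                rng.choice(genotypes[mother]),
            )
    return genotypes


def carrier_frequency(genotypes: Dict[str, Genotype], sample: Iterable[str]) -> float:
    """Fraction of sampled individuals who carry exactly one copy of the mutation."""
    people = list(sample)
    carriers = sum(1 for p in people if sum(genotypes[p]) == 1)
    return carriers / len(people)
```

Repeating `gene_drop` many times with different seeds gives a distribution of carrier frequencies expected from drift alone, against which an observed carrier frequency could be compared.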