ORCID Profile
0000-0002-1480-6115
Current Organisation
university of
In Research Link Australia (RLA), "Research Topics" refer to ANZSRC FOR and SEO codes. These topics are either sourced from ANZSRC FOR and SEO codes listed in researchers' related grants or generated by a large language model (LLM) based on their publications.
Biostatistics | Genetics | Biological Mathematics | Population, Ecological and Evolutionary Genetics | Quantitative Genetics (incl. Disease and Trait Mapping Genetics)
Public Health (excl. Specific Population Health) not elsewhere classified | Animal Production and Animal Primary Products not elsewhere classified | Flora, Fauna and Biodiversity of environments not elsewhere classified | Plant Production and Plant Primary Products not elsewhere classified | Expanding Knowledge in the Biological Sciences | Health not elsewhere classified
Publisher: Springer Science and Business Media LLC
Date: 03-2016
DOI: 10.1038/NCOMMS10815
Abstract: We report a genome-wide association scan in over 6,000 Latin Americans for features of scalp hair (shape, colour, greying, balding) and facial hair (beard thickness, monobrow, eyebrow thickness). We found 18 signals of association reaching genome-wide significance (P values 5 × 10⁻⁸ to 3 × 10⁻¹¹⁹), including 10 novel associations. These include novel loci for scalp hair shape and balding, and the first reported loci for hair greying, monobrow, eyebrow and beard thickness. A newly identified locus influencing hair shape includes a Q30R substitution in the Protease Serine S1 family member 53 (PRSS53). We demonstrate that this enzyme is highly expressed in the hair follicle, especially the inner root sheath, and that the Q30R substitution affects enzyme processing and secretion. The genome regions associated with hair features are enriched for signals of selection, consistent with proposals regarding the evolution of human hair.
Publisher: Elsevier BV
Date: 06-1997
Abstract: We consider nonadaptive pooling designs for unique-sequence screening of a 1530-clone map of Aspergillus nidulans. The map has the properties that the clones are, with possibly a few exceptions, ordered and no more than 2 of them cover any point on the genome. We propose two subdesigns of the Steiner system S(3, 5, 65), one with 65 pools and approximately 118 clones per pool, the other with 54 pools and about 142 clones per pool. Each design allows 1 or 2 positive clones to be detected, even in the presence of substantial experimental error rates. More efficient designs are possible if the overlap information in the map is exploited, if there is no constraint on the number of clones in a pool, and if no error tolerance is required. An information theory lower bound requires at least 12 pools to satisfy these minimal criteria, and an "interleaved binary" design can be constructed on 20 pools, with about 380 clones per pool. However, the designs with more pools have important properties of robustness to various possible errors and general applicability to a wider class of pooling experiments.
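Some of the numbers in this abstract can be checked with a little arithmetic. The sketch below rests on two assumptions of ours that the abstract does not spell out: the information bound counts as distinguishable outcomes every single clone plus every pair of adjacent, overlapping clones in the ordered map (since at most 2 clones cover any point), and in the Steiner-system designs each clone corresponds to a block of S(3, 5, 65) and so lies in exactly 5 pools. The function names are hypothetical.

```python
import math

N_CLONES = 1530
# Assumption: each clone is a block of size 5 in S(3, 5, 65), so it sits in 5 pools.
POOLS_PER_CLONE = 5


def info_lower_bound(n_clones: int) -> int:
    """Minimum number of yes/no pool results needed to tell apart every possible
    positive set, assuming positives are either one clone or one pair of
    adjacent (overlapping) clones in the ordered map."""
    n_outcomes = n_clones + (n_clones - 1)  # single clones + adjacent pairs
    return math.ceil(math.log2(n_outcomes))


def mean_pool_size(n_clones: int, n_pools: int, pools_per_clone: int) -> float:
    """Average number of clones per pool when every clone appears in the same number of pools."""
    return n_clones * pools_per_clone / n_pools


print(info_lower_bound(N_CLONES))                              # 12
print(round(mean_pool_size(N_CLONES, 65, POOLS_PER_CLONE)))    # 118
print(round(mean_pool_size(N_CLONES, 54, POOLS_PER_CLONE)))    # 142
```

Under these assumptions the sketch reproduces the 12-pool bound and the approximate pool sizes of 118 and 142 quoted in the abstract.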
Publisher: Springer Science and Business Media LLC
Date: 10-2009
DOI: 10.1038/NRG2615
Abstract: Bayesian statistical methods have recently made great inroads into many areas of science, and this advance is now extending to the assessment of association between genetic variants and disease or other phenotypes. We review these methods, focusing on single-SNP tests in genome-wide association studies. We discuss the advantages of the Bayesian approach over classical (frequentist) approaches in this setting and provide a tutorial on basic analysis steps, including practical guidelines for appropriate prior specification. We demonstrate the use of Bayesian methods for fine mapping in candidate regions, discuss meta-analyses and provide guidance for refereeing manuscripts that contain Bayesian analyses.
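One widely used way to obtain a single-SNP Bayes factor from standard summary output is an approximate Bayes factor in the style of Wakefield, which treats the estimated log odds ratio as normally distributed. The sketch below shows that calculation and the conversion to a posterior probability of association. It is an illustration, not necessarily the exact procedure recommended in the review; the prior standard deviation of the effect and the prior probability of association are analyst choices (assumptions), not values from the source, and the function names are hypothetical.

```python
import math


def approx_bayes_factor(beta_hat: float, se: float, prior_sd: float) -> float:
    """Approximate Bayes factor for association (H1) versus no association (H0).

    The estimate beta_hat is taken as N(beta, V) with V = se**2.
    Under H0 beta = 0; under H1 beta ~ N(0, W) with W = prior_sd**2,
    so marginally beta_hat ~ N(0, V + W). The ratio of the two normal
    densities simplifies to the expression returned below.
    """
    v = se ** 2
    w = prior_sd ** 2
    z2 = (beta_hat / se) ** 2
    return math.sqrt(v / (v + w)) * math.exp(0.5 * z2 * w / (v + w))


def posterior_prob_association(bf: float, prior_prob: float) -> float:
    """Posterior probability of association from a Bayes factor and a prior probability."""
    prior_odds = prior_prob / (1 - prior_prob)
    post_odds = bf * prior_odds
    return post_odds / (1 + post_odds)
```

For example, a log odds ratio of 0.2 with standard error 0.05 and an assumed prior SD of 0.2 gives a Bayes factor of roughly 450, yet with an assumed prior probability of association of 1e-4 the posterior probability is still below 5%. This is the kind of explicit trade-off between evidence and prior plausibility that the Bayesian approach makes visible.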
Publisher: Cold Spring Harbor Laboratory
Date: 14-12-2006
DOI: 10.1101/GR.4346306
Abstract: We conduct an extensive simulation study to compare the merits of several methods for using null (unlinked) markers to protect against false positives due to cryptic substructure in population-based genetic association studies. The more sophisticated “structured association” methods perform well but are computationally demanding and rely on estimating the correct number of subpopulations. The simple and fast “genomic control” approach can lose power in certain scenarios. We find that procedures based on logistic regression that are flexible, computationally fast, and easy to implement also provide good protection against the effects of cryptic substructure, even though they do not explicitly model the population structure.
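For comparison, the genomic control adjustment mentioned in the abstract is simple enough to sketch in a few lines. This minimal version assumes 1-df chi-squared association statistics. It estimates the inflation factor λ as the median statistic at the null (unlinked) markers divided by the median of a χ²₁ distribution (about 0.455), floors λ at 1, and divides the test statistics by it. It does not implement the logistic-regression procedures that the study favours, and the function name is hypothetical.

```python
import statistics

# Median of a chi-squared distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_control(null_stats: list[float], test_stats: list[float]) -> tuple[float, list[float]]:
    """Estimate lambda from 1-df chi-squared statistics at null markers and
    deflate the test statistics by it. lambda is floored at 1 so that
    statistics are never inflated when there is no evidence of substructure."""
    lam = max(1.0, statistics.median(null_stats) / CHI2_1DF_MEDIAN)
    return lam, [s / lam for s in test_stats]
```

Because a single λ rescales every statistic, markers whose true association is strong lose power along with the rest, which is one reason the simple approach can lose power in some scenarios.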
Publisher: Cold Spring Harbor Laboratory
Date: 24-06-2014
Abstract: BLUP (best linear unbiased prediction) is widely used to predict complex traits in plant and animal breeding, and increasingly in human genetics. The BLUP mathematical model, which consists of a single random effect term, was adequate when kinships were measured from pedigrees. However, when genome-wide SNPs are used to measure kinships, the BLUP model implicitly assumes that all SNPs have the same effect-size distribution, which is a severe and unnecessary limitation. We propose MultiBLUP, which extends the BLUP model to include multiple random effects, allowing greatly improved prediction when the random effects correspond to classes of SNPs with distinct effect-size variances. The SNP classes can be specified in advance, for example, based on SNP functional annotations, and we also provide an adaptive procedure for determining a suitable partition of SNPs. We apply MultiBLUP to genome-wide association data from the Wellcome Trust Case Control Consortium (seven diseases), and from much larger studies of celiac disease and inflammatory bowel disease, finding that it consistently provides better prediction than alternative methods. Moreover, MultiBLUP is computationally very efficient: for the largest data set, which includes 12,678 individuals and 1.5M SNPs, the total analysis can be run on a single desktop PC in less than a day and can be parallelized to run even faster. Tools to perform MultiBLUP are freely available in our software LDAK.
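MultiBLUP itself is implemented in LDAK, and the code below is not LDAK code. It is a minimal pure-Python sketch of the prediction formula for a model with several random effects, y = Σₖ gₖ + e with gₖ ~ N(0, σₖ² Kₖ), where each kinship matrix Kₖ would be computed from one class of SNPs: ĝ = Σₖ σₖ² Kₖ V⁻¹ y, with V = Σₖ σₖ² Kₖ + σₑ² I. It assumes the variance components are already known (estimating them is the hard part in practice), a mean-centred phenotype and a small sample. All function names are hypothetical.

```python
def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a x = b for square a by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def matvec(k: list[list[float]], v: list[float]) -> list[float]:
    return [sum(kij * vj for kij, vj in zip(row, v)) for row in k]


def multiblup_predict(y, kinships, sigma2_g, sigma2_e):
    """BLUP of the total genetic value when each SNP class k contributes a
    random effect with covariance sigma2_g[k] * kinships[k]."""
    n = len(y)
    v = [[(sigma2_e if i == j else 0.0)
          + sum(s * k[i][j] for s, k in zip(sigma2_g, kinships))
          for j in range(n)] for i in range(n)]
    alpha = solve(v, y)  # alpha = V^{-1} y
    g_hat = [0.0] * n
    for s, k in zip(sigma2_g, kinships):
        for i, gi in enumerate(matvec(k, alpha)):
            g_hat[i] += s * gi
    return g_hat
```

With a single kinship matrix this reduces to ordinary BLUP; setting one class's variance to zero drops that class of SNPs from the prediction, which is how distinct effect-size variances per class change the result.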
Publisher: Springer Science and Business Media LLC
Date: 18-02-2003
DOI: 10.1038/NG1100
Publisher: Springer Science and Business Media LLC
Date: 18-11-2015
DOI: 10.1038/NRG3821
Abstract: Relatedness is a fundamental concept in genetics but is surprisingly hard to define in a rigorous yet useful way. Traditional relatedness coefficients specify expected genome sharing between individuals in pedigrees, but actual genome sharing can differ considerably from these expected values, which in any case vary according to the pedigree that happens to be available. Nowadays, we can measure genome sharing directly from genome-wide single-nucleotide polymorphism (SNP) data; however, there are many such measures in current use, and we lack good criteria for choosing among them. Here, we review SNP-based measures of relatedness and criteria for comparing them. We discuss how useful pedigree-based concepts remain today and highlight opportunities for further advances in quantitative genetics, with a focus on heritability estimation and phenotype prediction.
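As a concrete example of one of the many SNP-based relatedness measures in use, the sketch below computes a standardised genomic relationship matrix. For each pair of individuals it averages, over SNPs, the product of their genotypes after centring by twice the sample allele frequency and scaling by the binomial standard deviation. This is one common choice, not a recommendation from the review; the function name and the genotype encoding (0, 1 or 2 copies of a reference allele) are assumptions.

```python
def grm(genotypes: list[list[int]]) -> list[list[float]]:
    """Standardised SNP-based relatedness matrix K = Z Z^T / m.

    genotypes[i][l] is the allele count (0, 1 or 2) of individual i at SNP l.
    Each SNP is centred by 2p and scaled by sqrt(2p(1-p)), where p is the
    sample allele frequency; monomorphic SNPs carry no information and are skipped.
    """
    n = len(genotypes)
    cols = []
    for l in range(len(genotypes[0])):
        counts = [genotypes[i][l] for i in range(n)]
        p = sum(counts) / (2 * n)
        if p in (0.0, 1.0):
            continue
        sd = (2 * p * (1 - p)) ** 0.5
        cols.append([(x - 2 * p) / sd for x in counts])
    m = len(cols)  # number of informative SNPs (assumed > 0)
    return [[sum(col[i] * col[j] for col in cols) / m for j in range(n)]
            for i in range(n)]
```

Because allele frequencies are estimated from the same sample, each row of K sums to zero: relatedness is measured relative to the sample average, which is one reason SNP-based values are not directly comparable to pedigree coefficients.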
Publisher: Wiley
Date: 15-07-2010
Publisher: Wiley
Date: 13-03-2022
Abstract: Complex‐trait genetics has advanced dramatically through methods to estimate the heritability tagged by SNPs, both genome‐wide and in genomic regions of interest such as those defined by functional annotations. The models underlying many of these analyses are inadequate, and consequently many SNP‐heritability results published to date are inaccurate. Here, we review the modelling issues, both for analyses based on individual genotype data and association test statistics, highlighting the role of a low‐dimensional model for the heritability of each SNP. We use state‐of‐the‐art models to present updated results about how heritability is distributed with respect to functional annotations in the human genome, and how it varies with allele frequency, which can reflect purifying selection. Our results give finer detail to the picture that has emerged in recent years of complex trait heritability widely dispersed across the genome. Confounding due to population structure remains a problem that summary statistic analyses cannot reliably overcome. Also see the video abstract here: youtu.be/WC2u03V65MQ
Publisher: Public Library of Science (PLoS)
Date: 20-08-2015
Publisher: Wiley
Date: 07-2010
DOI: 10.1002/JGM.1473
Publisher: Springer Science and Business Media LLC
Date: 21-01-2019
DOI: 10.1038/S41467-018-08147-0
Abstract: We report a genome-wide association scan in ,000 Latin Americans for pigmentation of skin and eyes. We found eighteen signals of association at twelve genomic regions. These include one novel locus for skin pigmentation (in 10q26) and three novel loci for eye pigmentation (in 1q32, 20q13 and 22q12). We demonstrate the presence of multiple independent signals of association in the 11q14 and 15q13 regions (comprising the GRM5/TYR and HERC2/OCA2 genes, respectively) and several epistatic interactions among independently associated alleles. Strongest association with skin pigmentation at 19p13 was observed for a Y182H missense variant (common only in East Asians and Native Americans) in MFSD12, a gene recently associated with skin pigmentation in Africans. We show that the frequency of the derived allele at Y182H is significantly correlated with lower solar radiation intensity in East Asia and infer that MFSD12 was under selection in East Asians, probably after their split from Europeans.
Publisher: Public Library of Science (PLoS)
Date: 09-03-2022
DOI: 10.1371/JOURNAL.PCBI.1009960
Abstract: We present a novel algorithm, implemented in the software ARGinfer, for probabilistic inference of the Ancestral Recombination Graph under the Coalescent with Recombination. Our Markov Chain Monte Carlo algorithm takes advantage of the Succinct Tree Sequence data structure that has allowed great advances in simulation and point estimation, but not yet probabilistic inference. Unlike previous methods, which employ the Sequentially Markov Coalescent approximation, ARGinfer uses the Coalescent with Recombination, allowing more accurate inference of key evolutionary parameters. We show using simulations that ARGinfer can accurately estimate many properties of the evolutionary history of the sample, including the topology and branch lengths of the genealogical tree at each sequence site, and the times and locations of mutation and recombination events. ARGinfer approximates posterior probability distributions for these and other quantities, providing interpretable assessments of uncertainty that we show to be well calibrated. ARGinfer is currently limited to tens of DNA sequences of several hundreds of kilobases, but has scope for further computational improvements to increase its applicability.
Publisher: Elsevier BV
Date: 05-2019
DOI: 10.1016/J.FSIGEN.2019.02.014
Abstract: We compare two open-source programs for the evaluation of evidential weight arising from complex DNA profiles recovered in a crime investigation. Here, "complex" means one or more of: low-template, degraded and mixed-source. Although software for complex DNA profile analysis has made great strides in recent years, the ability of courts to effectively scrutinise and challenge the reliability of the resulting evidence remains problematic. One key step is to compare different software on the same evidence, but there are currently few published comparisons, in part because of the problems posed by restricted access to commercial software. We present here an extensive comparison between two open-source programs, LikeLTD and EuroForMix. We find that despite different modelling assumptions the two programs generate similar results. The differences that we do identify can inform future improvements and can provide a benchmark for acceptable discrepancies between alternative programs.
Publisher: Cambridge University Press (CUP)
Date: 12-1988
DOI: 10.2307/3214294
Abstract: One-dimensional, periodic and annihilating systems of Brownian motions and random walks are defined and interpreted in terms of sizeless particles which vanish on contact. The generating function and moments of the number of pairs of particles which have vanished, given an arbitrary initial arrangement, are derived in terms of known two-particle survival probabilities. Three important special cases are considered: Brownian motion with the particles initially (i) uniformly distributed and (ii) equally spaced on a circle and (iii) random walk on a lattice with initially each site occupied. Results are also given for the infinite annihilating particle systems obtained in the limit as the number of particles and the size of the circle or lattice increase. Application of the results to the theory of diffusion-limited reactions is discussed.
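The results above are analytical, but the lattice model itself is easy to simulate, which gives a way to sanity-check such formulas numerically. A minimal sketch, assuming a ring of n_sites ≥ 2 lattice sites and random-sequential dynamics (one uniformly chosen particle moves one site left or right at a time, so particles cannot pass through each other): when a particle steps onto an occupied site, both particles vanish. The function name and parameters are hypothetical, and this is our illustration rather than the paper's method.

```python
import random


def annihilating_walk(n_sites: int, occupied, steps: int, seed: int = 0) -> int:
    """Annihilating random walks on a ring of n_sites sites.

    At each step one surviving particle, chosen uniformly at random, moves one
    site left or right; if the target site is occupied, both particles vanish.
    Returns the number of pairs that have vanished after `steps` moves.
    """
    rng = random.Random(seed)
    alive = set(occupied)
    n0 = len(alive)
    for _ in range(steps):
        if len(alive) < 2:  # a lone particle can never vanish
            break
        pos = rng.choice(sorted(alive))
        target = (pos + rng.choice((-1, 1))) % n_sites
        alive.discard(pos)
        if target in alive:
            alive.discard(target)  # contact: both particles vanish
        else:
            alive.add(target)
    return (n0 - len(alive)) // 2
```

Averaging the returned count over many seeds gives a Monte Carlo estimate of the mean number of vanished pairs, for instance with every site initially occupied as in case (iii); matching it to the analytical results requires accounting for the choice of time scale, since here time is counted in individual moves.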
Publisher: Oxford University Press (OUP)
Date: 20-04-2010
DOI: 10.1093/BIOINFORMATICS/BTQ157
Abstract: Motivation: Copy number variations (CNVs) are increasingly recognized as a substantial source of individual genetic variation, and hence there is a growing interest in investigating the evolutionary history of CNVs as well as their impact on complex disease susceptibility. CNV/SNP haplotypes are critical for this research, but although many methods have been proposed for inferring integer copy number, few have been designed for inferring CNV haplotypic phase and none of these are applicable at genome-wide scale. Here, we present a method for inferring missing CNV genotypes, predicting CNV allelic configuration and inferring CNV haplotypic phase from SNP/CNV genotype data. Our method, implemented in the software polyHap v2.0, is based on a hidden Markov model, which models the joint haplotype structure between CNVs and SNPs. Thus, the haplotypic phases of CNVs and SNPs are inferred simultaneously. A sampling algorithm is employed to obtain a measure of confidence/credibility of each estimate. Results: We generated diploid phase-known CNV–SNP genotype datasets by pairing male X chromosome CNV–SNP haplotypes. We show that polyHap provides accurate estimates of missing CNV genotypes, allelic configuration and CNV haplotypic phase on these datasets. We applied our method to a non-simulated dataset: a region on Chromosome 2 encompassing a short deletion. The results confirm that polyHap's accuracy extends to real-life datasets. Availability: Our method is implemented in version 2.0 of the polyHap software package and can be downloaded from www.imperial.ac.uk/medicine/people/l.coin Contact: l.coin@imperial.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.
Publisher: Informa UK Limited
Date: 11-2013
DOI: 10.4161/EPI.26407
Abstract: Many human diseases are multifactorial, involving multiple genetic and environmental factors impacting on one or more biological pathways. Much of the environmental effect is believed to be mediated through epigenetic changes. Although many genome-wide genetic and epigenetic association studies have been conducted for different diseases and traits, it is still far from clear to what extent the genomic loci and biological pathways identified in the genetic and epigenetic studies are shared. There is also a lack of statistical tools to assess these important aspects of disease mechanisms. In the present study, we describe a protocol for the integrated analysis of genome-wide genetic and epigenetic data based on permutation of a sum statistic for the combined effects in a locus or pathway. The method was then applied to published type 1 diabetes (T1D) genome-wide- and epigenome-wide-association studies data to identify genomic loci and biological pathways that are associated with T1D genetically and epigenetically. Through combined analysis, novel loci and pathways were also identified, which could add to our understanding of disease mechanisms of T1D as well as complex diseases in general.
Publisher: Public Library of Science (PLoS)
Date: 19-01-2023
DOI: 10.1371/JOURNAL.PGEN.1010054
Abstract: We introduce a fast, new algorithm for inferring from allele count data the FST parameters describing genetic distances among a set of populations and/or unrelated diploid individuals, and a tree with branch lengths corresponding to FST values. The tree can reflect historical processes of splitting and divergence, but seeks to represent the actual genetic variance as accurately as possible with a tree structure. We generalise two major approaches to defining FST, via correlations and mismatch probabilities of sampled allele pairs, which measure shared and non-shared components of genetic variance. A diploid individual can be treated as a population of two gametes, which allows inference of coancestry coefficients for individuals as well as for populations, or a combination of the two. A simulation study illustrates that our fast method-of-moments estimation of FST values, simultaneously for multiple populations/individuals, gains statistical efficiency over pairwise approaches when the population structure is close to tree-like. We apply our approach to genome-wide genotypes from the 26 worldwide human populations of the 1000 Genomes Project. We first analyse at the population level, then a subset of individuals and in a final analysis we pool individuals from the more homogeneous populations. This flexible analysis approach gives advantages over traditional approaches to population structure/coancestry, including visual and quantitative assessments of long-standing questions about the relative magnitudes of within- and between-population genetic differences.
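The abstract contrasts its tree-based estimator with pairwise approaches. The simplest pairwise version of the mismatch-probability definition is sketched below for two populations at biallelic loci: FST = 1 − H_within / H_between, where H_within is the average probability that two alleles sampled from the same population differ and H_between is the probability that one allele from each population differ. This is not the paper's algorithm; it ignores sample-size corrections, sums numerator and denominator over loci before taking the ratio, and the function name is hypothetical.

```python
def pairwise_fst(p1: list[float], p2: list[float]) -> float:
    """Pairwise FST between two populations from allele frequencies at biallelic loci.

    Within-population mismatch probability at a locus is 2p(1-p) (two alleles
    drawn with replacement); between-population mismatch is p1(1-p2) + p2(1-p1).
    Both are summed over loci (a ratio of averages) before forming 1 - H_w / H_b.
    """
    h_within = sum((2 * a * (1 - a) + 2 * b * (1 - b)) / 2 for a, b in zip(p1, p2))
    h_between = sum(a * (1 - b) + b * (1 - a) for a, b in zip(p1, p2))
    return 1 - h_within / h_between
```

Identical allele frequencies give FST = 0 and fixed differences give FST = 1. Applied separately to every pair of populations, such an estimator does not share information about ancestral allele frequencies across pairs, which is the efficiency the abstract's joint approach aims to recover.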
Publisher: Cold Spring Harbor Laboratory
Date: 15-08-2019
DOI: 10.1101/736496
Abstract: There is currently much debate regarding the best way to model how heritability varies across the genome. The authors of GCTA recommend the GCTA-LDMS-I Model, the authors of LD Score Regression recommend the Baseline LD Model, while we have instead recommended the LDAK Model. Here we provide a statistical framework for assessing heritability models using summary statistics from genome-wide association studies. Using data from studies of 31 complex human traits (average sample size 136,000), we show that the Baseline LD Model is the most realistic of the existing heritability models, but that it can be improved by incorporating features from the LDAK Model. Our framework also provides a method for estimating the selection-related parameter α from summary statistics. We find strong evidence (P e-6) of negative genome-wide selection for traits including height, systolic blood pressure and college education, and that the impact of selection is stronger inside functional categories such as coding SNPs and promoter regions.
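The selection-related parameter α enters heritability models through the way expected per-SNP heritability scales with allele frequency. A minimal sketch, assuming the commonly used form E[h²ⱼ] ∝ [fⱼ(1 − fⱼ)]^(1+α) and ignoring the other per-SNP weights (such as LD or annotation weights) that models like the LDAK and Baseline LD Models include: α = −1 removes the dependence on allele frequency, while negative α corresponds to rarer SNPs having larger per-allele effect sizes, the pattern expected under negative selection. The function name is hypothetical.

```python
def relative_snp_heritability(freqs: list[float], alpha: float) -> list[float]:
    """Share of total heritability expected from each SNP, assuming
    E[h2_j] proportional to [f_j (1 - f_j)]^(1 + alpha).

    f_j may be the minor or reference allele frequency (f(1-f) is symmetric).
    Shares are normalised to sum to 1; no LD or annotation weights are applied.
    """
    raw = [(f * (1 - f)) ** (1 + alpha) for f in freqs]
    total = sum(raw)
    return [r / total for r in raw]
```

For a SNP with frequency 0.05 and one with frequency 0.5, α = −1 splits the heritability equally, whereas α = 0 gives the rarer SNP a much smaller share; intermediate negative values fall between these two cases.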
Publisher: Oxford University Press (OUP)
Date: 21-12-2017
DOI: 10.1093/BIOINFORMATICS/BTW805
Abstract: Sequencing pools of individuals (Pool-Seq) is a cost-effective way to gain insight into the genetics of complex traits, but as yet no parametric method has been developed to both test for genetic effects and estimate their magnitude. Here, we propose GWAlpha, a flexible method to obtain parametric estimates of genetic effects genome-wide from Pool-Seq experiments. We showed that GWAlpha powerfully replicates the results of Genome-Wide Association Studies (GWAS) from model organisms. We perform simulation studies that illustrate the effect on power of sample size and number of pools and test the method on different experimental data. GWAlpha is implemented in python, designed to run on Linux operating system and tested on Mac OS. It is freely available at flevel/GWAlpha. Supplementary data are available at Bioinformatics online.
Publisher: Springer Science and Business Media LLC
Date: 2012
DOI: 10.1186/GM360
Publisher: SAGE Publications
Date: 09-07-2016
Abstract: This article theorizes the functional relationship between the human components (i.e., scholars) and non-human components (i.e., structural configurations) of academic domains. It is organized around the following question: in what ways have scholars formed and been formed by the structural configurations of their academic domain? The article uses as a case study the academic domain of education and technology to examine this question. Its authorship approach is innovative, with a worldwide collection of academics (99 authors) collaborating to address the proposed question based on their reflections on daily social and academic practices. This collaboration followed a three-round process of contributions via email. Analysis of these scholars’ reflective accounts was carried out, and a theoretical proposition was established from this analysis. The proposition is of a mutual (yet not necessarily balanced) power (and therefore political) relationship between the human and non-human constituents of an academic realm, with the two shaping one another. One implication of this proposition is that these non-human elements exist as political ‘actors’, just like their human counterparts, having ‘agency’ – which they exercise over humans. This turns academic domains into political (functional or dysfunctional) ‘battlefields’ wherein both humans and non-humans engage in political activities and actions that form the identity of the academic domain. For more information about the authorship approach, please see Al Lily AEA (2015) A crowd-authoring project on the scholarship of educational technology. Information Development. doi: 10.1177/0266666915622044.
Publisher: AMPCo
Date: 1986
Publisher: Cold Spring Harbor Laboratory
Date: 20-07-2018
DOI: 10.1101/373423
Abstract: We recently introduced a new approach to the evaluation of weight of evidence (WoE) for Y-chromosome profiles. Rather than attempting to calculate match probabilities, which is particularly problematic for modern Y-profiles with high mutation rates, we proposed using simulation to describe the distribution of the number of males in the population with a matching Y-profile, both the unconditional distribution and conditional on a database frequency of the profile. Here we further validate the new approach by showing that our results are robust to assumptions about the allelic ladder and the founder haplotypes, and we extend the approach in two important directions. Firstly, forensic databases are not the only source of background data relevant to the evaluation of Y-profile evidence: in many cases the Y-profiles of one or more relatives of the accused are also available. To date it has been unclear how to use this additional information, but in our simulation-based approach its effect is readily incorporated. We describe this approach and illustrate how the WoE that a man was the source of an observed Y-profile changes when the Y-profiles of some of his male-line relatives are also available. Secondly, we extend our new approach to mixtures of Y-profiles from two or more males. Surprisingly, our simulation-based approach reveals that observing a 2-male mixture that includes an alleged contributor’s profile is almost as strong evidence as observing a matching single-contributor evidence sample, and even 3-male and 4-male mixtures are only slightly weaker.
Publisher: Public Library of Science (PLoS)
Date: 25-07-2008
Publisher: Cold Spring Harbor Laboratory
Date: 28-01-2022
DOI: 10.1101/2022.01.28.478138
Abstract: We introduce a fast, new algorithm for inferring jointly the FST parameters describing genetic distances among a set of populations and/or unrelated diploid individuals, and a tree representing their genetic structure, from allele count data. While the inferred tree typically reflects historical processes of splitting and divergence, its aim is to represent the actual genetic variance, with FST values specified by branch lengths. We generalise two major approaches to defining FST, via correlations and mismatch probabilities of sampled allele pairs, which measure shared and non-shared components of genetic variance. A diploid individual can be treated as a population of two gametes, which allows inference of coancestry coefficients for individuals as well as for populations, or a combination of the two. A simulation study illustrates that our fast method-of-moments estimation of FST values, simultaneously for multiple populations/individuals, gains statistical efficiency over pairwise approaches by pooling information about ancestral allele frequencies. We apply our approach to genome-wide genotypes from the 26 worldwide human populations of the 1000 Genomes Project. We first analyse at the population level, then a subset of individuals and in a final analysis we pool individuals from the more homogeneous populations. This flexible analysis approach gives many advantages over traditional approaches to population structure/coancestry, including visual and quantitative assessments of long-standing questions about the relative magnitudes of within- and between-population genetic differences.
We propose new ways to measure, and visualise in a tree, the genetic distances among a set of populations using allele frequency data. The two genomes within a diploid individual can be treated as a small population, which allows a flexible framework for investigating genetic variation within and between populations. Genetic structure can be accurately and efficiently represented in a tree with nodes representing either homogeneous populations or genetically diverse individuals, for example due to admixture. We first generalise the long-established measure of genetic distance, FST, to tree-structured populations and individuals, finding that two measures are required for each pair of populations, corresponding to their shared and non-shared genetic variation. We show using a simulation study that our novel tree-based estimators are more efficient than current pairwise estimators, and we illustrate the potential for novel ways to explore and visualise genetic variation within and between populations using a worldwide human genetic dataset.
Publisher: Future Medicine Ltd
Date: 12-2007
DOI: 10.2217/14622416.8.12.1715
Abstract: Introduction: Approximately 30% of patients with epilepsy are resistant to treatment with anti-epileptic drugs (AEDs). The ABC drug transporter proteins are hypothesized to mediate drug resistance in epilepsy. More recently, a non-ABC putative transporter, RLIP76, has also been proposed to be involved in the mechanism of pharmacoresistance. One previous association study of six polymorphisms in RLIP76 failed to find any association with drug resistance in a retrospective cohort of epilepsy patients. We aimed to look for an association with outcomes reflecting drug response in a larger prospective cohort, with gene-wide coverage. Patients and methods: We investigated the role of common polymorphisms in RLIP76 in epilepsy pharmacoresistance by genotyping 23 common RLIP76 polymorphisms in a prospective cohort of 503 epilepsy patients, from the standard and new anti-epileptic drugs (SANAD) prospective study of new and old AEDs. A total of 13 of these were tested for association with four outcomes reflecting response to drugs: time to first seizure, time to 12-month remission, time to withdrawal due to inadequate seizure control, and time to withdrawal due to unacceptable adverse drug events. Results: No significant associations, allowing for multiple testing, were found in the whole cohort. There was also no effect in a subgroup of patients on carbamazepine, which is thought to be an RLIP76 substrate, although two polymorphisms were associated with time to first seizure (p = 0.007). Discussion: We failed to demonstrate any association between RLIP76 polymorphisms and four different measures of drug response in the larger cohort, but a subgroup analysis of patients receiving carbamazepine suggested an association that should be investigated further. Conclusions: Our data suggest that common variants in RLIP76 are unlikely to contribute to epilepsy drug response.
Publisher: MDPI AG
Date: 05-08-2021
Abstract: Y chromosome and mitochondrial DNA profiles have been used as evidence in courts for decades, yet the problem of evaluating the weight of evidence has not been adequately resolved. Both are lineage markers (inherited from just one parent), which presents different interpretation challenges compared with standard autosomal DNA profiles (inherited from both parents). We review approaches to the evaluation of lineage marker profiles for forensic identification, focussing on the key roles of profile mutation rate and relatedness (extending beyond known relatives). Higher mutation rates imply fewer individuals matching the profile of an alleged contributor, but they will be more closely related. This makes it challenging to evaluate the possibility that one of these matching individuals could be the true source, because relatives may be plausible alternative contributors, and may not be well mixed in the population. These issues reduce the usefulness of profile databases drawn from a broad population: larger populations can have a lower profile relative frequency because of lower relatedness with the alleged contributor. Many evaluation methods do not adequately take account of distant relatedness, but its effects have become more pronounced with the latest generation of high-mutation-rate Y profiles.
Publisher: Springer Science and Business Media LLC
Date: 2003
Abstract: The authors present ELB, an easy-to-programme and computationally fast algorithm for inferring gametic phase in population samples of multilocus genotypes. Phase updates are made on the basis of a window of neighbouring loci, and the window size varies according to the local level of linkage disequilibrium. Thus, ELB is particularly well suited to problems involving many loci and/or relatively large genomic regions, including those with variable recombination rate. The authors have simulated population samples of single nucleotide polymorphism genotypes with varying levels of recombination and marker density, and find that ELB provides better local estimation of gametic phase than the PHASE or HTYPER programs, while its global accuracy is broadly similar. The relative improvement in local accuracy increases both with increasing recombination and with increasing marker density. Short tandem repeat (STR, or microsatellite) simulation studies demonstrate ELB's superiority over PHASE both globally and locally. Missing data are handled by ELB; simulations show that phase recovery is virtually unaffected by up to 2 per cent of missing data, but that phase estimation is noticeably impaired beyond this amount. The authors also applied ELB to datasets obtained from random pairings of 42 human X chromosomes typed at 97 diallelic markers in a 200 kb low-recombination region. Once again, they found ELB to have consistently better local accuracy than PHASE or HTYPER, while its global accuracy was close to the best.
Publisher: American Association for the Advancement of Science (AAAS)
Date: 18-12-2009
Publisher: Springer Science and Business Media LLC
Date: 1991
Publisher: Elsevier BV
Date: 02-1994
DOI: 10.1016/0379-0738(94)90222-4
Abstract: In DNA profile analysis, uncertainty arises due to a number of factors such as sampling error, single bands and correlations within and between loci. One of the most important of these factors is kinship: criminal and innocent suspect may share one or more bands through identity by descent from a common ancestor. Ignoring this uncertainty is consistently unfair to innocent suspects. The effect is usually small, but may be important in some cases. The report of the US National Research Committee proposed a complicated, ad-hoc and overly-conservative method of dealing with some of these problems. We propose an alternative approach which addresses directly the effect of kinship. Whilst remaining conservative, it is simple, logically coherent and makes efficient use of the data.
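A standard way to let shared ancestry enter a DNA-profile match probability is through a coancestry coefficient θ. The sketch below gives single-locus conditional match probabilities in the form usually attributed to Balding and Nichols: for a heterozygote with allele frequencies pᵢ and pⱼ, 2[θ + (1 − θ)pᵢ][θ + (1 − θ)pⱼ] / [(1 + θ)(1 + 2θ)], and for a homozygote with allele frequency p, [2θ + (1 − θ)p][3θ + (1 − θ)p] / [(1 + θ)(1 + 2θ)]. Treat this as an illustration under our assumptions rather than a transcription of the paper; the function names are hypothetical.

```python
def match_prob_het(p_i: float, p_j: float, theta: float) -> float:
    """Conditional probability that an unknown person shares a heterozygous
    genotype (allele frequencies p_i, p_j), given coancestry coefficient theta."""
    denom = (1 + theta) * (1 + 2 * theta)
    return 2 * (theta + (1 - theta) * p_i) * (theta + (1 - theta) * p_j) / denom


def match_prob_hom(p: float, theta: float) -> float:
    """Same for a homozygous genotype with allele frequency p."""
    denom = (1 + theta) * (1 + 2 * theta)
    return (2 * theta + (1 - theta) * p) * (3 * theta + (1 - theta) * p) / denom
```

Setting θ = 0 recovers the familiar 2pᵢpⱼ and p² values, while any θ > 0 increases the match probability, consistent with the abstract's point that ignoring kinship is unfair to innocent suspects. Multi-locus values are often obtained by multiplying across loci, which is itself an approximation.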
Publisher: Elsevier BV
Date: 07-2012
Publisher: Informa UK Limited
Date: 09-1995
Publisher: American Association for the Advancement of Science (AAAS)
Date: 05-02-2021
Abstract: We carried out a genome-wide association study in Latin Americans and identified novel face morphology loci.
Publisher: The Royal Society
Date: 29-06-1994
Abstract: Mathematical and statistical aspects of constructing ordered-clone physical maps of chromosomes are reviewed. Three broad problems are addressed: analysis of fingerprint data to identify configurations of overlapping clones, prediction of the rate of progress of a mapping strategy and optimal design of pooling schemes for screening large clone libraries.
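For the second problem (predicting the rate of progress of a mapping strategy), a standard starting point is the Lander-Waterman model of random clone fingerprinting. The sketch below assumes that model: equal-length clones placed uniformly at random, with an overlap detected only when it exceeds a fraction theta of the clone length. It is an illustration, not necessarily the formulation used in this review, and the function name `lander_waterman` is hypothetical.

```python
import math

def lander_waterman(n_clones, clone_len, genome_len, theta=0.0):
    """Expected progress of a random-clone mapping project (Lander-Waterman model).

    theta is the minimum fractional overlap needed to detect that two clones
    overlap (0 <= theta < 1). Lengths only need to share the same units.
    """
    c = n_clones * clone_len / genome_len        # redundancy (fold coverage)
    sigma = 1.0 - theta
    return {
        "coverage": c,
        "islands": n_clones * math.exp(-c * sigma),        # expected number of islands (contigs)
        "clones_per_island": math.exp(c * sigma),          # expected clones per island
        "fraction_covered": 1.0 - math.exp(-c),            # expected fraction of genome in clones
    }
```

The expected number of islands first rises and then falls as clones are added, peaking at a coverage of c = 1/(1 - theta); this is why progress in assembling contigs slows as a project nears completion.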
Publisher: Mary Ann Liebert Inc
Date: 2015
Publisher: American Diabetes Association
Date: 13-08-2009
DOI: 10.2337/DB08-1805
Abstract: Fasting plasma glucose and risk of type 2 diabetes are higher among Indian Asians than among European and North American Caucasians. Few studies have investigated genetic factors influencing glucose metabolism among Indian Asians. We carried out genome-wide association studies for fasting glucose in 5,089 nondiabetic Indian Asians genotyped with the Illumina Hap610 BeadChip and 2,385 Indian Asians (698 with type 2 diabetes) genotyped with the Illumina 300 BeadChip. Results were compared with findings in 4,462 European Caucasians. We identified three single nucleotide polymorphisms (SNPs) associated with glucose among Indian Asians at P < 5 × 10−8, all near melatonin receptor MTNR1B. The most closely associated was rs2166706 (combined P = 2.1 × 10−9), which is in moderate linkage disequilibrium with rs1387153 (r2 = 0.60) and rs10830963 (r2 = 0.45), both previously associated with glucose in European Caucasians. Risk allele frequency and effect sizes for rs2166706 were similar among Indian Asians and European Caucasians: frequency 46.2 versus 45.0%, respectively (P = 0.44); effect 0.05 (95% CI 0.01–0.08) versus 0.05 (0.03–0.07) mmol/l higher glucose per allele copy, respectively (P = 0.84). SNP rs2166706 was associated with type 2 diabetes in Indian Asians (odds ratio 1.21 [95% CI 1.06–1.38] per copy of risk allele; P = 0.006). SNPs at the GCK, GCKR, and G6PC2 loci were also associated with glucose among Indian Asians. Risk allele frequencies of rs1260326 (GCKR) and rs560887 (G6PC2) were higher among Indian Asians compared with European Caucasians. Common genetic variation near MTNR1B influences blood glucose and risk of type 2 diabetes in Indian Asians. Genetic variation at the MTNR1B, GCK, GCKR, and G6PC2 loci may contribute to abnormal glucose metabolism and related metabolic disturbances among Indian Asians.
Publisher: Oxford University Press (OUP)
Date: 18-05-2010
DOI: 10.1111/J.1740-9713.2010.00419.X
Abstract: Can DNA analysis really be used to screen asylum seekers by identifying their country of origin? The immigration authority apparently believed so, and almost put such a scheme into immediate action — without, it seems, consulting academic scientists on the matter. David Balding, Michael Weale, Michael Richards and Mark Thomas examine a worrying story.
Publisher: Springer Science and Business Media LLC
Date: 06-1997
DOI: 10.1038/HDY.1997.97
Abstract: Although the effect of population differentiation on the forensic use of DNA profiles has been the subject of controversy for some years now, the debate has largely failed to focus on the genetical questions directly relevant to the forensic context. We re-analyse two published data sets and find that they convey much the same message for forensic inference, in contrast with the dramatically differing conclusions of the original authors. The analysis is likelihood-based and combines information across loci and across populations without assuming constant genetic differentiation. Our results suggest that the relevant genetic correlation coefficients are too large to be ignored in forensic work: although DNA profile evidence is typically very strong, the effect of genetic correlations can be important in some cases. Such correlations can, however, be accommodated in an appropriate assessment of evidential strength so that population genetic issues should not present a barrier to the efficient and fair use of DNA profile evidence.
Publisher: Springer Science and Business Media LLC
Date: 16-01-2008
Publisher: Oxford University Press (OUP)
Date: 09-1998
DOI: 10.1093/GENETICS/150.1.499
Abstract: Ease and accuracy of typing, together with high levels of polymorphism and widespread distribution in the genome, make microsatellite (or short tandem repeat) loci an attractive potential source of information about both population histories and evolutionary processes. However, microsatellite data are difficult to interpret, in particular because of the frequency of back-mutations. Stochastic models for the underlying genetic processes can be specified, but in the past they have been too complicated for direct analysis. Recent developments in stochastic simulation methodology now allow direct inference about both historical events, such as genealogical coalescence times, and evolutionary parameters, such as mutation rates. A feature of the Markov chain Monte Carlo (MCMC) algorithm that we propose here is that the likelihood computations are simplified by treating the (unknown) ancestral allelic states as auxiliary parameters. We illustrate the algorithm by analyzing microsatellite samples simulated under the model. Our results suggest that a single microsatellite usually does not provide enough information for useful inferences, but that several completely linked microsatellites can be informative about some aspects of genealogical history and evolutionary processes. We also reanalyze data from a previously published human Y chromosome microsatellite study, finding evidence for an effective population size for human Y chromosomes in the low thousands and a recent time since their most recent common ancestor: the 95% interval runs from ~15,000 to 130,000 years, with most likely values around 30,000 years.
Publisher: Elsevier BV
Date: 09-2002
DOI: 10.1016/S0379-0738(02)00232-3
Abstract: Previous analyses of Australian samples have suggested that populations of the same broad racial group (Caucasian, Asian, Aboriginal) tend to be genetically similar across states. This suggests that a single national Australian database for each such group may be feasible, which would greatly facilitate casework. We have investigated samples drawn from each of these groups in different Australian states, and have quantified the genetic homogeneity across states within each racial group in terms of the "coancestry coefficient" F(ST). In accord with earlier results, we find that F(ST) values, as estimated from these data, are very small for Caucasians and Asians, usually <0.5%. We find that "declared" Aborigines (which includes many with partly Aboriginal genetic heritage) are also genetically similar across states, although they display some differentiation from a "pure" Aboriginal population (almost entirely of Aboriginal genetic heritage).
Publisher: Wiley
Date: 17-09-2014
DOI: 10.1111/AHG.12081
Abstract: We estimate the population genetics parameter FST (also referred to as the fixation index) from short tandem repeat (STR) allele frequencies, comparing many worldwide human subpopulations at approximately the national level with continental-scale populations. FST is commonly used to measure population differentiation, and is important in forensic DNA analysis to account for remote shared ancestry between a suspect and an alternative source of the DNA. We estimate FST comparing subpopulations with a hypothetical ancestral population, which is the approach most widely used in population genetics, and also compare a subpopulation with a sampled reference population, which is more appropriate for forensic applications. Both estimation methods are likelihood-based, in which FST is related to the variance of the multinomial-Dirichlet distribution for allele counts. Overall, we find low FST values, with posterior 97.5 percentiles < 3% when comparing a subpopulation with the most appropriate population, and even for inter-population comparisons we find FST < 5%. These are much smaller than single nucleotide polymorphism-based inter-continental FST estimates, and are also about half the magnitude of STR-based estimates from population genetics surveys that focus on distinct ethnic groups rather than a general population. Our findings support the use of FST up to 3% in forensic calculations, which corresponds to some current practice.
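The link between FST and the multinomial-Dirichlet distribution can be made concrete: if subpopulation allele fractions are Dirichlet with parameters lambda * p_k, where lambda = (1 - F)/F and p_k are reference frequencies, then allele counts follow a Dirichlet-multinomial whose overdispersion is governed by F. The sketch below is a simplified, single-locus maximum-likelihood grid search assuming p is known and strictly positive; the paper reports Bayesian posterior summaries across many loci, which this does not reproduce. Function names are hypothetical.

```python
import math

def dm_loglik(counts, p, F):
    """Dirichlet-multinomial log-likelihood of allele counts, up to a constant
    that does not depend on F. p: reference allele frequencies (all > 0)."""
    lam = (1.0 - F) / F                       # Dirichlet precision parameter
    n = sum(counts)
    ll = math.lgamma(lam) - math.lgamma(n + lam)
    for k, pk in zip(counts, p):
        ll += math.lgamma(k + lam * pk) - math.lgamma(lam * pk)
    return ll

def estimate_fst(counts, p, grid_size=999):
    """Grid-search maximum-likelihood estimate of F on (0, 1)."""
    grid = [(i + 1) / (grid_size + 1) for i in range(grid_size)]
    return max(grid, key=lambda F: dm_loglik(counts, p, F))
```

A sample whose allele proportions sit close to the reference frequencies gives an estimate near zero, while a strongly deviating sample pushes the estimate up, since only overdispersion relative to multinomial sampling is attributed to F.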
Publisher: Hindawi Limited
Date: 04-10-2016
DOI: 10.1002/HUMU.23121
Publisher: Wiley
Date: 04-2000
Publisher: Springer Science and Business Media LLC
Date: 04-05-2008
DOI: 10.1038/NG.156
Abstract: We carried out a genome-wide association study (318,237 SNPs) for insulin resistance and related phenotypes in 2,684 Indian Asians, with further testing in 11,955 individuals of Indian Asian or European ancestry. We found associations of rs12970134 near MC4R with waist circumference (P = 1.7 × 10−9) and, independently, with insulin resistance. Homozygotes for the risk allele of rs12970134 have approximately 2 cm increased waist circumference. Common genetic variation near MC4R is associated with risk of adiposity and insulin resistance.
Publisher: Wiley
Date: 31-10-2006
DOI: 10.1111/J.1529-8817.2005.00189.X
Abstract: We introduce a procedure for association based analysis of nuclear families that allows for dichotomous and more general measurements of phenotype and inclusion of covariate information. Standard generalized linear models are used to relate phenotype and its predictors. Our test procedure, based on the likelihood ratio, unifies the estimation of all parameters through the likelihood itself and yields maximum likelihood estimates of the genetic relative risk and interaction parameters. Our method has advantages in modelling the covariate and gene-covariate interaction terms over recently proposed conditional score tests that include covariate information via a two-stage modelling approach. We apply our method in a study of human systemic lupus erythematosus and the C-reactive protein that includes sex as a covariate.
Publisher: Elsevier BV
Date: 08-2006
Publisher: Oxford University Press (OUP)
Date: 05-07-2021
DOI: 10.1111/BJD.20436
Abstract: Genome-wide association studies (GWASs) have identified genes influencing skin ageing and mole count in Europeans, but little is known about the relevance of these (or other genes) in non-Europeans. We aimed to conduct a GWAS for facial skin ageing and mole count in adults < 40 years old of mixed European, Native American and African ancestry, recruited in Latin America. Skin ageing and mole count scores were obtained from facial photographs of over 6000 individuals. After quality control checks, three wrinkling traits and mole count were retained for genetic analyses. DNA samples were genotyped with Illumina's HumanOmniExpress chip. Association testing was performed on around 8 703 729 single-nucleotide polymorphisms (SNPs) across the autosomal genome. Genome-wide significant association was observed at four genome regions: two were associated with wrinkling (in 1p13·3 and 21q21·2), one with mole count (in 1q32·3) and one with both wrinkling and mole count (in 5p13·2). Associated SNPs in 5p13·2 and in 1p13·3 are intronic within SLC45A2 and VAV3, respectively, while SNPs in 1q32·3 are near the SLC30A1 gene, and those in 21q21·2 occur in a gene desert. Analyses of SNPs in IRF4 and MC1R are consistent with a role of these genes in skin ageing. We replicate the association of wrinkling with variants in SLC45A2, IRF4 and MC1R reported in Europeans. We identify VAV3 and SLC30A1 as two novel candidate genes impacting on wrinkling and mole count, respectively. We provide the first evidence that SLC45A2 influences mole count, in addition to variants in this gene affecting melanoma risk in Europeans.
Publisher: Wiley
Date: 21-01-2009
Publisher: Public Library of Science (PLoS)
Date: 11-2018
Publisher: Elsevier BV
Date: 10-1999
Publisher: Elsevier BV
Date: 12-2013
Publisher: Cold Spring Harbor Laboratory
Date: 28-04-2017
DOI: 10.1101/131920
Abstract: The introduction of forensic autosomal DNA profiles was controversial, but the problems were successfully addressed, and DNA profiling has gone on to revolutionise forensic science. Y-chromosome profiles are valuable when there is a mixture of male-source and female-source DNA, and interest centres on the identity of the male source(s) of the DNA. The problem of evaluating evidential weight is even more challenging for Y profiles than for autosomal profiles. Numerous approaches have been proposed, but they fail to deal adequately with the fact that men with matching Y-profiles are related in extended patrilineal clans, many of which may not be represented in available databases. This problem has been exacerbated by recent profiling kits with high mutation rates. Because the relevant population is difficult to define, yet the number of matching relatives is fixed as population size varies, it is typically infeasible to derive population-based match probabilities relevant to a specific crime. We propose a conceptually simple solution, based on a simulation model and software to approximate the distribution of the number of males with a matching Y profile. We show that this distribution is robust to different values for the variance in reproductive success and the population growth rate. We also use importance sampling reweighting to derive the distribution of the number of matching males conditional on a database frequency, finding that this conditioning typically has only a modest impact. We illustrate the use of our approach to quantify the value of Y profile evidence for a court in a way that is both scientifically valid and easily comprehensible by a judge or juror.
Publisher: Oxford University Press (OUP)
Date: 10-2002
Publisher: Elsevier BV
Date: 2023
Publisher: Wiley
Date: 20-09-2019
DOI: 10.1002/GEPI.22259
Abstract: Linkage disequilibrium SCore regression (LDSC) has become a popular approach to estimate confounding bias, heritability, and genetic correlation using only genome-wide association study (GWAS) test statistics. SumHer is a newly introduced alternative with similar aims. We show using theory and simulations that both approaches fail to adequately account for confounding bias, even when the assumed heritability model is correct. Consequently, these methods may estimate heritability poorly if there was an inadequate adjustment for confounding in the original GWAS analysis. We also show that the choice of a summary statistic for use in LDSC or SumHer can have a large impact on resulting inferences. Further, covariate adjustments in the original GWAS can alter the target of heritability estimation, which can be problematic for test statistics from a meta-analysis of GWAS with different covariate adjustments.
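For readers unfamiliar with LDSC, its core model regresses each SNP's association chi-squared statistic on its LD score l_j: E[chi2_j] = N h2 l_j / M + N a + 1, where N is the GWAS sample size, M the number of SNPs, h2 the SNP heritability and N a + 1 the intercept that is meant to absorb confounding. The sketch below fits that regression by unweighted least squares; the LDSC software additionally uses regression weights and a block jackknife, which are omitted here. It illustrates the estimator being critiqued, not the paper's analysis, and `ldsc_fit` is a hypothetical name.

```python
def ldsc_fit(chi2, ld_scores, n, m):
    """Unweighted least-squares LD score regression.

    chi2: per-SNP association chi-squared statistics
    ld_scores: per-SNP LD scores; n: GWAS sample size; m: number of SNPs.
    Returns (h2_estimate, intercept).
    """
    k = len(chi2)
    mean_l = sum(ld_scores) / k
    mean_c = sum(chi2) / k
    sxy = sum((l - mean_l) * (c - mean_c) for l, c in zip(ld_scores, chi2))
    sxx = sum((l - mean_l) ** 2 for l in ld_scores)
    slope = sxy / sxx                  # estimates N * h2 / M
    intercept = mean_c - slope * mean_l  # estimates N * a + 1
    return slope * m / n, intercept
```

Because the intercept and slope are estimated jointly, any confounding that scales with LD score rather than adding a constant is absorbed into the slope and hence into h2, which is one way the adjustment described in the abstract can fail.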
Publisher: ASTM International
Date: 07-1996
DOI: 10.1520/JFS13961J
Publisher: Wiley
Date: 16-07-2022
DOI: 10.1002/GEPI.22492
Abstract: The inclusion of ancestrally diverse participants in genetic studies can lead to new discoveries and is important to ensure equitable health care benefit from research advances. Here, members of the Ethical, Legal, and Social Implications (ELSI) committee of the International Genetic Epidemiology Society (IGES) offer perspectives on methods and analysis tools for the conduct of inclusive genetic epidemiology research, with a focus on admixed and ancestrally diverse populations in support of reproducible research practices. We emphasize the importance of distinguishing socially defined population categorizations from genetic ancestry in the design, analysis, reporting, and interpretation of genetic epidemiology research findings. Finally, we discuss the current state of genomic resources used in genetic association studies, functional interpretation, and clinical and public health translation of genomic findings with respect to diverse populations.
Publisher: Oxford University Press (OUP)
Date: 16-06-2013
DOI: 10.1093/HMG/DDT284
Publisher: Springer Science and Business Media LLC
Date: 12-2007
DOI: 10.1038/NRG1916-C2
Publisher: BMJ
Date: 10-03-2014
Publisher: Springer Science and Business Media LLC
Date: 2015
Publisher: Oxford University Press (OUP)
Date: 02-1997
DOI: 10.1093/GENETICS/145.2.505
Abstract: The paper is concerned with methods for the estimation of the coalescence time (time since the most recent common ancestor) of a sample of intraspecies DNA sequences. The methods take advantage of prior knowledge of population demography, in addition to the molecular data. While some theoretical results are presented, a central focus is on computational methods. These methods are easy to implement, and, since explicit formulae tend to be either unavailable or unilluminating, they are also more useful and more informative in most applications. Extensions are presented that allow for the effects of uncertainty in our knowledge of population size and mutation rates, for variability in population sizes, for regions of different mutation rate, and for inference concerning the coalescence time of the entire population. The methods are illustrated using recent data from the human Y chromosome.
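As a minimal illustration of the computational approach, the prior distribution of the coalescence time can be obtained by simulation rather than formulae. The sketch assumes the standard coalescent for a constant-size population: while k lineages remain, the waiting time to the next coalescence is exponential with rate k(k-1)/2, with time measured in units of 2N generations. It ignores the molecular data and demographic extensions described in the abstract, and the function names are hypothetical.

```python
import random

def simulate_tmrca(n, rng):
    """Time to the most recent common ancestor of n sampled lineages under the
    standard constant-size coalescent (time in units of 2N generations)."""
    t = 0.0
    for k in range(n, 1, -1):
        t += rng.expovariate(k * (k - 1) / 2.0)   # waiting time while k lineages remain
    return t

def mean_tmrca(n, reps=20000, seed=1):
    """Monte Carlo estimate of E[TMRCA]; the exact value is 2 * (1 - 1/n)."""
    rng = random.Random(seed)
    return sum(simulate_tmrca(n, rng) for _ in range(reps)) / reps
```

Multiplying a simulated time by 2N and an assumed generation length converts it to years; conditioning on observed sequence data (for example by rejection or importance weighting) then turns these prior draws into a posterior for the coalescence time.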
Publisher: Institute of Mathematical Statistics
Date: 05-1994
Publisher: Wiley
Date: 02-2010
Publisher: Oxford University Press (OUP)
Date: 23-02-2008
DOI: 10.1093/BIOINFORMATICS/BTN071
Abstract: Motivation: Most genome-wide association studies rely on single nucleotide polymorphism (SNP) analyses to identify causal loci. The increased stringency required for genome-wide analyses (with per-SNP significance threshold typically ≈ 10−7) means that many real signals will be missed. Thus it is still highly relevant to develop methods with improved power at low type I error. Haplotype-based methods provide a promising approach; however, they suffer from statistical problems such as abundance of rare haplotypes and ambiguity in defining haplotype block boundaries. Results: We have developed an ancestral haplotype clustering (AncesHC) association method which addresses many of these problems. It can be applied to biallelic or multiallelic markers typed in haploid, diploid or multiploid organisms, and also handles missing genotypes. Our model is free from the assumption of a rigid block structure but recognizes a block-like structure if it exists in the data. We employ a Hidden Markov Model (HMM) to cluster the haplotypes into groups of predicted common ancestral origin. We then test each cluster for association with disease by comparing the numbers of cases and controls with 0, 1 and 2 chromosomes in the cluster. We demonstrate the power of this approach by simulation of case-control status under a range of disease models for 1500 outcrossed mice originating from eight inbred lines. Our results suggest that AncesHC has substantially more power than single-SNP analyses to detect disease association, and is also more powerful than the cladistic haplotype clustering method CLADHC. Availability: The software can be downloaded from www.imperial.ac.uk/medicine eople/l.coin Contact: I.coin@imperial.ac.uk Supplementary Information: Supplementary data are available at Bioinformatics online.
Publisher: Oxford University Press (OUP)
Date: 07-10-2008
DOI: 10.1093/BIOINFORMATICS/BTN514
Abstract: Summary: Genetic data obtained on population samples convey information about their evolutionary history. Inference methods can extract part of this information but they require sophisticated statistical techniques that have been made available to the biologist community (through computer programs) only for simple and standard situations typically involving a small number of samples. We propose here a computer program (DIY ABC) for inference based on approximate Bayesian computation (ABC), in which scenarios can be customized by the user to fit many complex situations involving any number of populations and samples. Such scenarios involve any combination of population divergences, admixtures and population size changes. DIY ABC can be used to compare competing scenarios, estimate parameters for one or more scenarios and compute bias and precision measures for a given scenario and known values of parameters (the current version applies to unlinked microsatellite data). This article describes key methods used in the program and provides its main features. The analysis of one simulated and one real dataset, both with complex evolutionary scenarios, illustrates the main possibilities of DIY ABC. Availability: The software DIY ABC is freely available at www.montpellier.inra.fr/CBGP/diyabc. Contact: j.cornuet@imperial.ac.uk Supplementary information: Supplementary data are also available at www.montpellier.inra.fr/CBGP/diyabc
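The basic idea behind ABC, replacing likelihood computation with simulation, can be shown with the simplest rejection algorithm. The toy example below is not DIY ABC's method (which handles multi-population scenarios and microsatellite data); it assumes a single population, the infinite-sites model, a uniform prior on the scaled mutation rate theta = 4N*mu, and the number of segregating sites S as the only summary statistic. Given total branch length L (in units of 2N generations), S is Poisson with mean theta * L / 2. All names and settings are hypothetical.

```python
import math
import random

def poisson(mean, rng):
    """Knuth's multiplication method; adequate for the moderate means used here."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def simulate_seg_sites(theta, n, rng):
    """Segregating sites in n sequences under the standard coalescent
    with infinite-sites mutation at scaled rate theta."""
    total_length = 0.0
    for k in range(n, 1, -1):
        total_length += k * rng.expovariate(k * (k - 1) / 2.0)  # k branches in this interval
    return poisson(theta * total_length / 2.0, rng)

def abc_rejection(s_obs, n, n_sims=10000, tol=1, prior_max=20.0, seed=7):
    """Rejection ABC: keep prior draws whose simulated S is within tol of s_obs."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_sims):
        theta = rng.uniform(0.0, prior_max)
        if abs(simulate_seg_sites(theta, n, rng) - s_obs) <= tol:
            accepted.append(theta)
    return accepted
```

The accepted values approximate the posterior for theta. Shrinking `tol` improves the approximation at the cost of fewer acceptances, which is the basic trade-off that more elaborate ABC software tries to manage.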
Publisher: American Physical Society (APS)
Date: 10-1989
Publisher: Wiley
Date: 09-2011
DOI: 10.1111/J.1469-1809.2011.00670.X
Abstract: Rheumatoid arthritis (RA) is strongly associated with the human leukocyte antigen (HLA) genomic region, most notably with a group of HLA-DRB1 alleles termed the shared epitope (SE). There is also substantial evidence of other risk loci in the HLA region, but refinement has been hampered by extensive linkage disequilibrium (LD). Using genotype imputation, we analysed 6575 RA cases and controls with genotypes at 6180 HLA SNPs; about half the subjects had four-digit DRB1 genotypes. Single-SNP tests revealed hundreds of strong associations across the HLA region, even after adjusting for DRB1. We implemented penalised logistic regression in a multi-SNP association analysis using the double-exponential (DE) penalty term on the regression coefficients and the normal-exponential-gamma (NEG) penalty. The penalised approaches identified sparse sets of SNPs that could collectively explain most of the association with RA over the whole HLA region. The HLA-DPB1 SNP rs3117225 was consistently identified in our analyses and was confirmed by results from the North American Rheumatoid Arthritis Consortium study (NARAC). We conclude that SNP selection using penalised regression shows a substantial benefit over single-SNP analyses in identifying risk loci in regions of high LD, and the flexibility of the NEG conveys additional advantages.
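The double-exponential penalty corresponds to an L1 (lasso-type) penalty, which sets many SNP coefficients exactly to zero and so yields the sparse SNP sets mentioned above. The sketch below fits L1-penalised logistic regression by proximal gradient descent on hypothetical simulated genotypes; the NEG penalty and the fitting algorithm used in the paper are not reproduced, and all names and settings are assumptions.

```python
import math
import random

def soft_threshold(z, t):
    """Proximal operator of t * |z| (the L1 / double-exponential penalty)."""
    return math.copysign(max(abs(z) - t, 0.0), z)

def lasso_logistic(X, y, lam, step=1.0, iters=500):
    """Minimise mean negative log-likelihood + lam * sum(|beta_j|) by proximal
    gradient descent. X: rows of centred genotypes; y: 0/1 case status.
    The intercept b0 is not penalised."""
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(iters):
        g0, g = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            eta = b0 + sum(bj * xj for bj, xj in zip(beta, xi))
            r = 1.0 / (1.0 + math.exp(-eta)) - yi   # derivative of the loss w.r.t. eta
            g0 += r
            for j in range(p):
                g[j] += r * xi[j]
        b0 -= step * g0 / n                         # plain gradient step for the intercept
        beta = [soft_threshold(bj - step * gj / n, step * lam)
                for bj, gj in zip(beta, g)]         # gradient step, then shrinkage
    return b0, beta

def demo(seed=3, n=300, p=5, effect=1.2, lam=0.05):
    """Hypothetical simulated data in which only SNP 0 affects case status."""
    rng = random.Random(seed)
    geno = [[(rng.random() < 0.5) + (rng.random() < 0.5) for _ in range(p)]
            for _ in range(n)]
    means = [sum(row[j] for row in geno) / n for j in range(p)]
    X = [[row[j] - means[j] for j in range(p)] for row in geno]
    y = [1 if rng.random() < 1.0 / (1.0 + math.exp(-effect * x[0])) else 0 for x in X]
    return lasso_logistic(X, y, lam)
```

Centring the genotype columns decouples the intercept from the SNP effects and speeds convergence; in a region of high LD, correlated SNPs compete under the L1 penalty, so typically only one of a correlated group is retained.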
Publisher: Springer Science and Business Media LLC
Date: 06-1996
DOI: 10.1007/BF02432124
Abstract: Pharmacologic agents are frequently utilized for management of intensive care unit (ICU) delirium, yet prescribing patterns and the impact of medication choices on patient outcomes are poorly described. We sought to describe prescribing practices for management of ICU delirium and investigate the independent association of medication choice with key in-hospital outcomes, including delirium resolution, in-hospital mortality, and days alive and free of the ICU or hospital. This was a retrospective study of delirious adult ICU patients at a tertiary academic medical center. Data regarding daily mental status (normal, delirious, or comatose), pharmacologic treatment, hospital course, and survival were obtained from the electronic health record. Daily transition models were constructed to assess the independent association of previous-day mental status and medication administration with mental status the following day and with in-hospital mortality, after adjusting for prespecified covariates. Linear regression models investigated the association of medication administration with days alive and free of the ICU or the hospital during the first 30 days after ICU admission. We identified 8591 encounters of ICU delirium. Nearly half (45.6%) of patients received pharmacologic treatment for delirium, including 45.4% receiving antipsychotics, 2.2% guanfacine, and 0.84% valproic acid. Median highest Richmond Agitation-Sedation Scale (RASS) score was 1 (0, 1) in patients initiated on medications and 0 (-1, 0) in nonrecipients. Haloperidol, olanzapine, and quetiapine comprised >97% of antipsychotics used, with 48% receiving 2 or more and 20.6% continued on antipsychotic medications at hospital discharge. Haloperidol and olanzapine were associated with greater odds of continued delirium (odds ratio [OR], 1.48; 95% confidence interval [95% CI], 1.30-1.65; P < .001 and OR, 1.37; 95% CI, 1.20-1.56; P = .003, respectively) and an increased hazard of in-hospital mortality (hazard ratio [HR], 1.46; 95% CI, 1.10-1.93; P = .01 and HR, 1.67; 95% CI, 1.14-2.45; P = .01, respectively), while quetiapine was associated with a decreased hazard of in-hospital mortality (HR, 0.58; 95% CI, 0.40-0.84; P = .01). Haloperidol, olanzapine, and quetiapine were associated with fewer days alive and free of hospitalization (all P < .001). There was no significant association of any antipsychotic medication with days alive and free of the ICU. Neither guanfacine nor valproic acid was associated with the in-hospital outcomes examined. Pharmacologic interventions for management of ICU delirium are common, most often with antipsychotics, and are frequently continued at hospital discharge. These medications may not confer benefit, may introduce additional harm, and should be used with caution for delirium management. Continuation of these medications through hospitalization and discharge calls into question their safety and role in patient recovery.
Publisher: Elsevier BV
Date: 05-1985
DOI: 10.1016/S0022-5193(85)80255-1
Abstract: The corneal limbal vessels of an animal host respond to the presence of a source of Tumour Angiogenesis Factor (TAF) implanted in the cornea by the formation of new capillaries which grow towards the source. This neovasculature can be easily seen and studied and this paper describes a mathematical model of some of the important features of the growth. The model includes the diffusion of TAF, the formation of sprouts from pre-existing vessels and models the movement of these sprouts to form new capillaries as a chemotactic response to the presence of TAF. Numerical results are produced for various values of the parameters which characterize the model and it is suggested that the model might form the framework for further theoretical work on related phenomena such as wound healing or to develop strategies for the investigation of anti-angiogenesis.
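One ingredient of the model, the diffusion of TAF from the implanted source towards the limbus, can be illustrated numerically. The sketch below is only that component: a one-dimensional explicit finite-difference (FTCS) solution of dc/dt = D d2c/dx2 on [0, 1] with hypothetical boundary conditions (concentration held at 1 at the source, x = 0, and at 0 at the limbal vessels, x = 1) and no TAF initially. Sprout formation and chemotactic migration are not modelled, and the parameter values are assumptions.

```python
def diffuse(n_points=51, D=1.0, t_end=1.0):
    """FTCS solution of dc/dt = D * d2c/dx2 on [0, 1].

    Hypothetical setup: c = 1 at the TAF source (x = 0), c = 0 at the limbal
    vessels (x = 1), and c = 0 everywhere else at t = 0.
    Returns concentrations at the grid points at time >= t_end.
    """
    dx = 1.0 / (n_points - 1)
    dt = 0.4 * dx * dx / D            # respects the stability limit D*dt/dx^2 <= 1/2
    r = D * dt / (dx * dx)
    c = [0.0] * n_points
    c[0] = 1.0
    t = 0.0
    while t < t_end:
        interior = [c[i] + r * (c[i + 1] - 2.0 * c[i] + c[i - 1])
                    for i in range(1, n_points - 1)]
        c = [c[0]] + interior + [c[-1]]   # boundary values stay fixed
        t += dt
    return c
```

With these boundary conditions the profile relaxes to the linear steady state c(x) = 1 - x, so the TAF gradient that drives chemotaxis is steepest, relative to the local concentration, near the limbus.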
Publisher: Springer Science and Business Media LLC
Date: 08-09-2008
Publisher: Wiley
Date: 09-11-2007
DOI: 10.1002/GEPI.20183
Abstract: Genetic association analyses of family-based studies with ordered categorical phenotypes are often conducted using methods either for quantitative or for binary traits, which can lead to suboptimal analyses. Here we present an alternative likelihood-based method of analysis for single nucleotide polymorphism (SNP) genotypes and ordered categorical phenotypes in nuclear families of any size. Our approach, which extends our previous work for binary phenotypes, permits straightforward inclusion of covariate, gene-gene and gene-covariate interaction terms in the likelihood, incorporates a simple model for ascertainment and allows for family-specific effects in the hypothesis test. Additionally, our method produces interpretable parameter estimates and valid confidence intervals. We assess the proposed method using simulated data, and apply it to a polymorphism in the C-reactive protein (CRP) gene typed in families collected to investigate human systemic lupus erythematosus. By including sex interactions in the analysis, we show that the polymorphism is associated with anti-nuclear autoantibody (ANA) production in females, while there appears to be no effect in males.
Publisher: Springer Science and Business Media LLC
Date: 28-01-2019
DOI: 10.1007/S00285-018-01325-0
Abstract: In population genetics, the Dirichlet (also called the Balding-Nichols) model has for 20 years been considered the key model to approximate the distribution of allele fractions within populations in a multi-allelic setting. It has often been noted that the Dirichlet assumption is approximate because positive correlations among alleles cannot be accommodated under the Dirichlet model. However, the validity of the Dirichlet distribution has never been systematically investigated in a general framework. This paper attempts to address this problem by providing a general overview of how allele fraction data under the most common multi-allelic mutational structures should be modeled. The Dirichlet and alternative models are investigated by simulating allele fractions from a diffusion approximation of the multi-allelic Wright-Fisher process with mutation, and applying a moment-based analysis method. The study shows that the optimal modeling strategy for the distribution of allele fractions depends on the specific mutation process. The Dirichlet model is an exceptionally good approximation only for the pure drift, Jukes-Cantor and parent-independent mutation processes with small mutation rates. Alternative models are required and proposed for the other mutation processes, such as a Beta-Dirichlet model for the infinite alleles mutation process, and a Hierarchical Beta model for the Kimura, Hasegawa-Kishino-Yano and Tamura-Nei processes. Finally, a novel Hierarchical Beta approximation, the Pyramidal Hierarchical Beta model, is developed for the generalized time-reversible and single-step mutation processes.
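Under the Dirichlet (Balding-Nichols) model, the allele fractions in a subpopulation are Dirichlet with parameters lambda * p_k, where p_k are ancestral frequencies and lambda = (1 - F)/F; each fraction then has mean p_k and variance F * p_k * (1 - p_k). The sketch below draws Dirichlet vectors via normalised gamma variates and checks those moments empirically. It illustrates only the Dirichlet model itself, not the diffusion simulations or alternative models studied in the paper; function names are hypothetical.

```python
import random

def sample_dirichlet(alpha, rng):
    """One Dirichlet(alpha) draw via independent gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def balding_nichols_fractions(p, F, reps, seed=11):
    """Allele-fraction vectors for `reps` subpopulations under the Dirichlet
    (Balding-Nichols) model with ancestral frequencies p and FST = F."""
    rng = random.Random(seed)
    lam = (1.0 - F) / F
    return [sample_dirichlet([lam * pk for pk in p], rng) for _ in range(reps)]

def component_moments(fractions, k):
    """Sample mean and variance of allele k across subpopulations."""
    xs = [f[k] for f in fractions]
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)
    return mean, var
```

Because the Dirichlet makes every pair of allele fractions negatively correlated, this sampler cannot reproduce the positive correlations that the abstract notes arise under some mutation processes.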
Publisher: SAGE Publications
Date: 05-01-2011
Publisher: Springer Science and Business Media LLC
Date: 11-1991
DOI: 10.1007/BF02461488
Publisher: Wiley
Date: 09-07-2010
Publisher: Walter de Gruyter GmbH
Date: 06-01-2010
Abstract: How best to summarize large and complex datasets is a problem that arises in many areas of science. We approach it from the point of view of seeking data summaries that minimize the average squared error of the posterior distribution for a parameter of interest under approximate Bayesian computation (ABC). In ABC, simulation under the model replaces computation of the likelihood, which is convenient for many complex models. Simulated and observed datasets are usually compared using summary statistics, typically in practice chosen on the basis of the investigator's intuition and established practice in the field. We propose two algorithms for automated choice of efficient data summaries. Firstly, we motivate minimisation of the estimated entropy of the posterior approximation as a heuristic for the selection of summary statistics. Secondly, we propose a two-stage procedure: the minimum-entropy algorithm is used to identify simulated datasets close to that observed, and these are each successively regarded as observed datasets for which the mean root integrated squared error of the ABC posterior approximation is minimized over sets of summary statistics. In a simulation study, we both singly and jointly inferred the scaled mutation and recombination parameters from a population sample of DNA sequences. The computationally-fast minimum entropy algorithm showed a modest improvement over existing methods while our two-stage procedure showed substantial and highly-significant further improvement for both univariate and bivariate inferences. We found that the optimal set of summary statistics was highly dataset specific, suggesting that more generally there may be no globally-optimal choice, which argues for a new selection for each dataset even if the model and target of inference are unchanged.
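Minimising the entropy of the posterior approximation requires estimating entropy from the accepted ABC parameter values. A common choice for this is a k-th nearest-neighbour estimator; the one-dimensional Kozachenko-Leonenko form used below is H = psi(N) - psi(k) + log 2 + (1/N) * sum_i log(eps_i), where eps_i is the distance from sample i to its k-th nearest neighbour and psi is the digamma function. This is a sketch under that assumption, not necessarily the exact estimator or settings used in the paper; names are hypothetical and the samples are assumed to contain no ties.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329

def digamma_int(n):
    """Digamma function at a positive integer n: -gamma + sum_{j<n} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, n))

def knn_entropy_1d(samples, k=4):
    """k-nearest-neighbour (Kozachenko-Leonenko) estimate of differential
    entropy, in nats, for one-dimensional samples without ties."""
    x = sorted(samples)
    n = len(x)
    total_log = 0.0
    for i in range(n):
        # in sorted 1-D data, the k nearest neighbours lie within k positions of i
        lo, hi = max(0, i - k), min(n, i + k + 1)
        dists = sorted(abs(x[j] - x[i]) for j in range(lo, hi) if j != i)
        total_log += math.log(dists[k - 1])      # distance to the k-th neighbour
    return digamma_int(n) - digamma_int(k) + math.log(2.0) + total_log / n
```

In a minimum-entropy search, this estimate would be computed on the accepted parameter values for each candidate set of summary statistics, and the set giving the smallest value retained.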
Publisher: Elsevier BV
Date: 07-2004
Publisher: Elsevier BV
Date: 09-2018
DOI: 10.1016/J.FSIGEN.2018.06.019
Abstract: In forensic genetics, the likelihood ratio (LR), measuring the value of DNA profile evidence, is computed from a database of allele frequencies. Here, we address the choice of database and adjustments for population structure and sample size in the context of Brazil. The Brazilian population underwent a complex process of colonization, migration and mating, which created an admixed genetic composition that makes it difficult to obtain an appropriate database for a given case. National databases are now available, as well as databases for many Brazilian states. However, those databases are not statistically random samples, and state boundaries may not accurately reflect the sub-structuring of genetic diversity. We compared the LR calculated using the relevant state-specific database with the statistics calculated when a national database and when international databases were used. We evaluated two methods of adjustment for population structure, due to Wright [13] and Balding and Nichols [14]. We also considered two adjustments for database sample size: the Balding size bias correction [15] and a minimum allele frequency [16]. Our results show that the use of a national database with the Balding and Nichols adjustment and θ = 0.002 generated lower LR values than did the state-specific database in more than 50% of the profiles simulated using the state-based allele frequencies, while θ = 0.01 produced lower LRs for more than 90% of the profiles. We conclude that the utilization of a national database for Brazilian cases can be justified in association with the appropriate adjustment for population structure.
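The Balding and Nichols adjustment enters the LR through per-locus conditional match probabilities that depend on the coancestry coefficient theta. The sketch below uses the widely cited Balding-Nichols formulas: for a homozygote AA, [2θ + (1-θ)p_A][3θ + (1-θ)p_A] / [(1+θ)(1+2θ)]; for a heterozygote AB, 2[θ + (1-θ)p_A][θ + (1-θ)p_B] / [(1+θ)(1+2θ)]. The LR for a single-source profile matching the suspect is then taken as the reciprocal of the product over loci. The allele frequencies and profile are hypothetical, the size bias correction and minimum allele frequency are not applied, and function names are assumptions.

```python
def bn_match_probability(p_a, p_b, theta, homozygous):
    """Balding-Nichols conditional match probability at one locus.
    p_a, p_b: allele frequencies (p_b is ignored for a homozygote);
    theta: coancestry coefficient (FST) used for the adjustment."""
    denom = (1.0 + theta) * (1.0 + 2.0 * theta)
    if homozygous:
        return ((2 * theta + (1 - theta) * p_a)
                * (3 * theta + (1 - theta) * p_a)) / denom
    return 2.0 * (theta + (1 - theta) * p_a) * (theta + (1 - theta) * p_b) / denom

def profile_lr(loci, theta):
    """LR = 1 / product of per-locus match probabilities.
    loci: list of (allele1, allele2, allele_frequency_table)."""
    mp = 1.0
    for a1, a2, freqs in loci:
        mp *= bn_match_probability(freqs[a1], freqs[a2], theta, homozygous=(a1 == a2))
    return 1.0 / mp

# Hypothetical three-locus profile with illustrative allele frequencies.
EXAMPLE_LOCI = [
    ("12", "14", {"12": 0.10, "14": 0.05}),
    ("9", "9", {"9": 0.08}),
    ("21", "23", {"21": 0.15, "23": 0.04}),
]
```

With theta = 0 the formulas reduce to p_A^2 and 2 p_A p_B; for rare alleles the match probability grows with theta, so raising theta from 0.002 to 0.01 lowers the LR, consistent with the direction of the comparisons reported above.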
Publisher: Cold Spring Harbor Laboratory
Date: 23-07-2018
DOI: 10.1101/374686
Abstract: Mitochondrial DNA (mtDNA) is useful to assist with identification of the source of a biological sample, or to confirm matrilineal relatedness. Although the autosomal genome is much larger, for forensic applications mtDNA has the advantage of a high copy number per cell, allowing better recovery of sequence information from degraded samples. In addition, biological samples such as fingernails, old bones, teeth and hair have mtDNA but little or no autosomal DNA. The relatively low mutation rate of the mitochondrial genome (mitogenome) means that there can be large sets of matrilineal-related individuals sharing a common mitogenome. Here we present the mitolina simulation software that we use to describe the distribution of the number of mitogenomes in a population that match a given mitogenome, and investigate its dependence on population size and growth rate, and on a database count of the mitogenome. Further, we report on the distribution of the number of meioses separating pairs of individuals with matching mitogenomes. Our results have important implications for assessing the weight of mtDNA profile evidence in forensic science, but mtDNA analysis has many non-human applications, for example in tracking the source of ivory. Our methods and software can also be used for simulations to validate models of population history in human or non-human populations. The maternally inherited mitochondrial DNA (mtDNA) represents only a small fraction of the human genome, but mtDNA profiles are important in forensic science, for example when a biological evidence sample is degraded or when maternal relatedness is questioned. For forensic mtDNA analysis, it is important to know how many individuals share an mtDNA profile. We present a simulation model of mtDNA profile evolution, implemented in open-source software, and use it to describe the distribution of the number of individuals with matching mitogenomes, and their matrilineal relatedness.
The latter is measured as the number of mother-child pairs in the lineage linking two matching individuals. We also describe how these distributions change when conditioning on a count of the profile in a frequency database.
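The number of individuals sharing a given mitogenome can be explored with a deliberately simple forward simulation. This sketch is not mitolina: it assumes a constant-size haploid Wright–Fisher population of women, each daughter choosing her mother uniformly at random, and an infinite-alleles mutation model in which every mutation creates a new mitogenome label. The population size, number of generations and mutation probability are illustrative values; the sketch tracks neither meioses nor a database count, and the function names are hypothetical.

```python
import numpy as np


def simulate_mitogenomes(pop_size: int, generations: int, mu: float, rng) -> np.ndarray:
    """Return the mitogenome label carried by each woman in the final generation."""
    haps = np.zeros(pop_size, dtype=np.int64)  # everyone starts with label 0
    next_label = 1
    for _ in range(generations):
        mothers = rng.integers(0, pop_size, size=pop_size)  # each daughter picks a random mother
        haps = haps[mothers]                                 # maternal inheritance (copies the array)
        mutated = rng.random(pop_size) < mu                  # per-lineage, per-generation mutation
        n_mut = int(mutated.sum())
        haps[mutated] = np.arange(next_label, next_label + n_mut)  # each mutation is a new label
        next_label += n_mut
    return haps


def match_counts(haps: np.ndarray) -> np.ndarray:
    """For each woman, the number of other women sharing her mitogenome."""
    _, inverse, counts = np.unique(haps, return_inverse=True, return_counts=True)
    return counts[inverse] - 1


rng = np.random.default_rng(2)
haps = simulate_mitogenomes(2000, 400, 0.01, rng)
matches = match_counts(haps)
print("fraction with no match:", np.mean(matches == 0))
print("median number of matches:", np.median(matches))
```

Growth in population size, which the abstract identifies as important, could be added by letting pop_size vary per generation.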
Publisher: Springer Science and Business Media LLC
Date: 19-12-2018
DOI: 10.1038/S41467-018-07748-Z
Abstract: Historical records and genetic analyses indicate that Latin Americans trace their ancestry mainly to the intermixing (admixture) of Native Americans, Europeans and Sub-Saharan Africans. Using novel haplotype-based methods, here we infer sub-continental ancestry in over 6,500 Latin Americans and evaluate the impact of regional ancestry variation on physical appearance. We find that Native American ancestry components in Latin Americans correspond geographically to the present-day genetic structure of Native groups, and that sources of non-Native ancestry, and admixture timings, match documented migratory flows. We also detect South/East Mediterranean ancestry across Latin America, probably stemming mostly from the clandestine colonial migration of Christian converts of non-European origin (Conversos). Furthermore, we find that ancestry related to highland (Central Andean) versus lowland (Mapuche) Natives is associated with variation in facial features, particularly nose morphology, and detect significant differences in allele frequencies between these groups at loci previously associated with nose morphology in this sample.
Publisher: Elsevier BV
Date: 06-2005
Publisher: Elsevier BV
Date: 07-2008
Publisher: American Association for the Advancement of Science (AAAS)
Date: 31-05-1996
DOI: 10.1126/SCIENCE.272.5266.1359
Abstract: BBX transcription factors are a kind of zinc finger transcription factor with one or two B-box domains, which participate in plant growth, development and responses to abiotic or biotic stress. The BBX family has been identified in Arabidopsis, rice, tomato and some other model plant genomes. Here, 24 CaBBX genes were identified in pepper (Capsicum annuum L.), and their phylogeny, gene structures, chromosomal locations, gene expression patterns and subcellular localizations were analysed to understand the evolution and function of CaBBX genes. All these CaBBXs were divided into five classes, and 20 of them were distributed unevenly across 11 of the 12 pepper chromosomes. Most duplication events occurred in subgroup I. Quantitative RT-PCR indicated that several CaBBX genes were induced by abiotic stress and hormones, and that some had tissue-specific expression profiles or were differentially expressed at different developmental stages. Most CaBBX members were predicted to be nucleus-localized, consistent with the transient expression assay in onion inner epidermis of the three tested CaBBX members (CaBBX5, 6 and 20). Several CaBBX genes were induced by abiotic stress and exogenous phytohormones, and some were expressed in a tissue-specific manner and varied across developmental stages. The detected CaBBXs act as nucleus-localized transcription factors. Our data may provide a foundation for the identification of CaBBX genes and for a further understanding of their biological function in future studies.
Publisher: Annual Reviews
Date: 03-01-2014
DOI: 10.1146/ANNUREV-STATISTICS-022513-115602
Abstract: The evaluation of weight of evidence for forensic DNA profiles has been a subject of controversy since their introduction over 20 years ago. Substantial progress has been made for standard DNA profiles, but new issues have arisen in recent years with the advent of more sensitive profiling techniques, allowing profiles to be recovered from minuscule amounts of possibly degraded DNA. These low-template DNA profiles suffer from enhanced stochastic effects, including dropin, dropout, and stutter, which pose problems for DNA profile evaluation. These problems are now beginning to be overcome with the emergence of several statistical models and software. We first review the general principles of statistical evaluation of DNA profile evidence, and we then focus on low-template DNA profiles, briefly reviewing the main statistical models and software. We cover methods that use allele presence/absence and those that use electropherogram peak heights, focusing on the likelihood ratio as measure of evidential weight.
Publisher: Wiley
Date: 2008
DOI: 10.1002/GEPI.20292
Abstract: The problem of multiple testing is an important aspect of genome-wide association studies, and will become more important as marker densities increase. The problem has been tackled with permutation and false discovery rate procedures and with Bayes factors, but each approach faces difficulties that we briefly review. In the current context of multiple studies on different genotyping platforms, we argue for the use of truly genome-wide significance thresholds, based on all polymorphisms whether or not typed in the study. We approximate genome-wide significance thresholds in contemporary West African, East Asian and European populations by simulating sequence data, based on all polymorphisms as well as for a range of single nucleotide polymorphism (SNP) selection criteria. Overall we find that significance thresholds vary by a factor of >20 over the SNP selection criteria and statistical tests that we consider and can be highly dependent on sample size. We compare our results for sequence data to those derived by the HapMap Consortium and find notable differences which may be due to the small sample sizes used in the HapMap estimate.
Publisher: Springer Science and Business Media LLC
Date: 10-2006
DOI: 10.1038/NRG1916
Abstract: Although genetic association studies have been with us for many years, even for the simplest analyses there is little consensus on the most appropriate statistical procedures. Here I give an overview of statistical approaches to population association studies, including preliminary analyses (Hardy-Weinberg equilibrium testing, inference of phase and missing data, and SNP tagging), and single-SNP and multipoint tests for association. My goal is to outline the key methods with a brief discussion of problems (population structure and multiple testing), avenues for solutions and some ongoing developments.
Publisher: Elsevier BV
Date: 11-2022
Publisher: Wiley
Date: 24-08-2008
Publisher: Wiley
Date: 04-03-2022
DOI: 10.1002/CSC2.20692
Abstract: Association mapping using crop cultivars allows identification of genetic loci of direct relevance to breeding. Here, 150 U.K. wheat (Triticum aestivum L.) cultivars genotyped with 23,288 single nucleotide polymorphisms (SNPs) were used for genome‐wide association studies (GWAS) using historical phenotypic data for grain protein content, Hagberg falling number (HFN), test weight, and grain yield. Power calculations indicated that the experimental design would enable detection of quantitative trait loci (QTL) explaining ≥20% of the variation (PVE) at a relatively high power of %, falling to 40% for detection of a SNP with an R² ≥ 0.5 with the same QTL. Genome‐wide association studies identified marker‐trait associations for all four traits. For HFN (h² = 0.89), six QTL were identified, including a major locus on chromosome 7B explaining 49% PVE and reducing HFN by 44 s. For protein content (h² = 0.86), 10 QTL were found on chromosomes 1A, 2A, 2B, 3A, 3B, and 6B, together explaining 48.9% PVE. For test weight, five QTL were identified (one on 1B and four on 3B; 26.3% PVE). Finally, 14 loci were identified for grain yield (h² = 0.95) on eight chromosomes (1A, 2A, 2B, 2D, 3A, 5B, 6A, 6B; 68.1% PVE), of which five were located within 16 Mbp of genetic regions previously identified as under breeder selection in European wheat. Our study demonstrates the utility of exploiting historical crop datasets for identifying genomic targets for independent validation and, ultimately, for wheat genetic improvement.
Publisher: Wiley
Date: 03-2002
DOI: 10.1111/J.0006-341X.2002.00241.X
Abstract: A recent article in Biometrics (Stockmarr, 1999, 55, 671-677) has generated correspondence (56, 1274-1277; 57, 976-980) reigniting a controversy started by a 1996 report on DNA profile evidence issued by the U.S. National Research Council (NRC). The issue concerns the evidential weight of a DNA profile match when the match results from a search through a profile database. The views of both Stockmarr and the NRC report conflict with those of many statisticians working in the area, and the differing viewpoints lead to dramatically different assessments of evidence. I outline reasons why Stockmarr and the NRC report are wrong. I also briefly discuss possible reasons why forensic applications tend to be problematic for statisticians.
Publisher: American Association for the Advancement of Science (AAAS)
Date: 31-05-1996
DOI: 10.1126/SCIENCE.272.5266.1356
Abstract: Sequential Organ Failure Assessment (SOFA) and other illness prognostic scores predict adverse outcomes in critical patients. Their validation as a decision-making tool in the emergency department (ED) of secondary hospitals is not well established. The aim of this study was to compare SOFA, NEWS2, APACHE II, and SAPS II scores as predictors of adverse outcomes and as decision-making tools in the ED. Data of 121 patients (age 73 ± 10 years, 58% males, Charlson Comorbidity Index 5.7 ± 2.1) with confirmed sepsis were included in a retrospective study between January 2017 and February 2020. Scores were computed within the first 24 h after admission. The primary outcome was the occurrence of either in-hospital death or mechanical ventilation within 7 days. The secondary outcome was 30-day all-cause mortality. Patients older than 64 years (elderly) represented 82% of the sample. Primary and secondary outcomes occurred in 40% and 44% of patients, respectively. Median 30-day survival time of patients who died was 4 days (interquartile range 1-11). The best predictive score based on the area under the receiver operating characteristic curve (AUROC) was SAPS II (0.823, 95% confidence interval, CI, 0.744-0.902), followed by APACHE II (0.762, 95% CI 0.673-0.850), NEWS2 (0.708, 95% CI 0.616-0.800), and SOFA (0.650, 95% CI 0.548-0.751). A SAPS II cut-off of 49 showed the lowest false-positive rate (12, 95% CI 5-20) and the highest positive predictive value (80, 95% CI 68-92), whereas a NEWS2 cut-off of 7 showed the lowest false-negative rate (10, 95% CI 2-19) and the highest negative predictive value (86, 95% CI 74-97). By combining NEWS2 and SAPS II cut-offs, we accurately classified 64% of patients. In survival analysis, the SAPS II cut-off showed the largest difference in 30-day mortality (hazard ratio, HR, 5.24, 95% CI 2.99-9.21, P < 0.001). The best independent negative predictors of 30-day mortality were body temperature, mean arterial pressure, arterial oxygen saturation, and hematocrit levels.
Positive predictors were male sex, heart rate and serum sodium concentration. SAPS II is a good prognostic tool for discriminating high-risk patients suitable for sub-intensive/intensive care units, whereas NEWS2 is a good tool for discriminating low-risk patients suitable for low-intensive units. Our results should be applied only to cohorts with a high prevalence of elderly patients or comorbidities.
Publisher: Walter de Gruyter GmbH
Date: 2013
Publisher: Oxford University Press (OUP)
Date: 04-2022
Abstract: Throughout human evolutionary history, large-scale migrations have led to intermixing (i.e., admixture) between previously separated human groups. Although classical and recent work has shown that studying admixture can yield novel historical insights, the extent to which this process contributed to adaptation remains underexplored. Here, we introduce a novel statistical model, specific to admixed populations, that identifies loci under selection while determining whether the selection likely occurred post-admixture or prior to admixture in one of the ancestral source populations. Through extensive simulations, we show that this method is able to detect selection, even in recently formed admixed populations, and to accurately differentiate between selection occurring in the ancestral or admixed population. We apply this method to genome-wide SNP data of ∼4,000 individuals in five admixed Latin American cohorts from Brazil, Chile, Colombia, Mexico, and Peru. Our approach replicates previous reports of selection in the human leukocyte antigen region that are consistent with selection post-admixture. We also report novel signals of selection in genomic regions spanning 47 genes, reinforcing many of these signals with an alternative, commonly used local-ancestry-inference approach. These signals include several genes involved in immunity, which may reflect responses to endemic pathogens of the Americas and to the challenge of infectious disease brought by European contact. In addition, some of the strongest signals inferred to be under selection in the Native American ancestral groups of modern Latin Americans overlap with genes implicated in energy metabolism phenotypes, plausibly reflecting adaptations to novel dietary sources available in the Americas.
Publisher: Elsevier BV
Date: 02-2011
Publisher: Proceedings of the National Academy of Sciences
Date: 21-04-2014
Abstract: Our knowledge of the domestication of animal and plant species comes from a diverse range of disciplines, and interpretation of patterns in data from these disciplines has been the dominant paradigm in domestication research. However, such interpretations are easily steered by subjective biases that typically fail to account for the inherent randomness of evolutionary processes, and which can be blind to emergent patterns in data. The testing of explicit models using computer simulations, and the availability of powerful statistical techniques to fit models to observed data, provide a scientifically robust means of addressing these problems. Here we outline the principles and argue for the merits of such approaches in the context of domestication-related questions.
Publisher: Oxford University Press (OUP)
Date: 26-09-2007
DOI: 10.1093/IJE/DYM159
Abstract: Established guidelines for causal inference in epidemiological studies may be inappropriate for genetic associations. A consensus process was used to develop guidance criteria for assessing cumulative epidemiologic evidence in genetic associations. A proposed semi-quantitative index assigns three levels for the amount of evidence, extent of replication, and protection from bias, and also generates a composite assessment of 'strong', 'moderate' or 'weak' epidemiological credibility. In addition, we discuss how additional input and guidance can be derived from biological data. Future empirical research and consensus development are needed to develop an integrated model for combining epidemiological and biological evidence in the rapidly evolving field of investigation of genetic factors.
Publisher: Wiley
Date: 16-07-2021
Abstract: Mapping the genes underlying ecologically relevant traits in natural populations is fundamental to developing a molecular understanding of species adaptation. Current sequencing technologies enable the characterization of a species’ genetic diversity across the landscape or even over its whole range. The relevant capture of the genetic diversity across the landscape is critical for a successful genetic mapping of traits, and there are no clear guidelines on how to achieve optimal sampling and which sequencing strategy to implement. Here we determine, through simulation, the sampling scheme that maximizes the power to map the genetic basis of a complex trait in an outbreeding species across an idealized landscape and draw genomic predictions for the trait, comparing individual and pool sequencing strategies. Our results show that quantitative trait locus detection power and prediction accuracy are higher when more populations over the landscape are sampled, and this is more cost‐effectively done with pool sequencing than with individual sequencing. Additionally, we recommend sampling populations from areas of high genetic diversity. As progress in sequencing enables the integration of trait‐based functional ecology into landscape genomics studies, these findings will guide study designs allowing direct measures of genetic effects in natural populations across the environment.
Publisher: Walter de Gruyter GmbH
Date: 14-07-2016
Abstract: In recent years statistical models for the analysis of complex (low-template and/or mixed) DNA profiles have moved from using only presence/absence information about allelic peaks in an electropherogram, to quantitative use of peak heights. This is challenging because peak heights are very variable and affected by a number of factors. We present a new peak-height model with important novel features, including over- and double-stutter, and a new approach to dropin. Our model is incorporated in open-source R code likeLTD. We apply it to 108 laboratory-generated crime-scene profiles and demonstrate techniques of model validation that are novel in the field. We use the results to explore the benefits of modeling peak heights, finding that it is not always advantageous, and to assess the merits of pre-extraction replication. We also introduce an approximation that can reduce computational complexity when there are multiple low-level contributors who are not of interest to the investigation, and we present a simple approximate adjustment for linkage between loci, making it possible to accommodate linkage when evaluating complex DNA profiles.
Publisher: Public Library of Science (PLoS)
Date: 30-11-2009
Publisher: Springer Science and Business Media LLC
Date: 09-02-2015
DOI: 10.1038/NG.3215
Publisher: Elsevier BV
Date: 11-2014
Publisher: Elsevier BV
Date: 07-2021
Publisher: Wiley
Date: 15-01-2003
Publisher: Oxford University Press (OUP)
Date: 09-2017
DOI: 10.1530/EJE-17-0293
Abstract: Mutations in the aryl hydrocarbon receptor-interacting protein (AIP) gene are associated with pituitary adenoma, acromegaly and gigantism. Identical alleles in unrelated pedigrees could be inherited from a common ancestor or result from recurrent mutation events. Observational, inferential and experimental study, including: AIP mutation testing; reconstruction of 14 AIP-region (8.3 Mbp) haplotypes; coalescent-based approximate Bayesian estimation of the time to most recent common ancestor (tMRCA) of the derived allele; forward population simulations to estimate the current number of allele carriers; proposal of a mutation mechanism; protein structure predictions; and co-immunoprecipitation and cycloheximide chase experiments. Nine European-origin, unrelated c.805_825dup-positive pedigrees (four familial, five sporadic from the UK, USA and France) included 16 affected (nine gigantism/four acromegaly/two non-functioning pituitary adenoma patients and one prospectively diagnosed acromegaly patient) and nine unaffected carriers. All pedigrees shared a 2.79 Mbp haploblock around AIP with additional haploblocks privately shared between subsets of the pedigrees, indicating the existence of an evolutionarily recent common ancestor, the ‘English founder’, with an estimated median tMRCA of 47 generations (corresponding to 1175 years) with a confidence interval (9–113 generations, equivalent to 225–2825 years). The mutation occurred in a small tandem repeat region predisposed to slipped strand mispairing. The resulting seven amino-acid duplication disrupts interaction with HSP90 and leads to a marked reduction in protein stability. The c.805_825dup allele, originating from a common ancestor, associates with a severe clinical phenotype and a high frequency of gigantism. The mutation is likely to be the result of slipped strand mispairing and affects protein–protein interactions and AIP protein stability.
Publisher: Ovid Technologies (Wolters Kluwer Health)
Date: 03-2007
Publisher: Elsevier BV
Date: 2021
Publisher: Elsevier BV
Date: 11-2016
DOI: 10.1016/J.FSIGEN.2016.09.004
Abstract: Many DNA profiles recovered from crime scene samples are of a quality that does not allow them to be searched against, nor entered into, databases. We propose a method for the comparison of profiles arising from two DNA samples, one or both of which can have multiple donors and be affected by low DNA template or degraded DNA. We compute likelihood ratios to evaluate the hypothesis that the two samples have a common DNA donor, and hypotheses specifying the relatedness of two donors. Our method uses a probability distribution for the genotype of the donor of interest in each sample. This distribution can be obtained from a statistical model, or we can exploit the ability of trained human experts to assess genotype probabilities, thus extracting much information that would be discarded by standard interpretation rules. Our method is compatible with established methods in simple settings, but is more widely applicable and can make better use of information than many current methods for the analysis of mixed-source, low-template DNA profiles. It can accommodate uncertainty arising from relatedness instead of or in addition to uncertainty arising from noisy genotyping. We describe a computer program GPMDNA, available under an open source licence, to calculate LRs using the method presented in this paper.
Publisher: Springer Science and Business Media LLC
Date: 19-05-2016
DOI: 10.1038/NCOMMS11616
Abstract: We report a genome-wide association scan for facial features in ∼6,000 Latin Americans. We evaluated 14 traits on an ordinal scale and found significant association (P values × 10⁻⁸) at single-nucleotide polymorphisms (SNPs) in four genomic regions for three nose-related traits: columella inclination (4q31), nose bridge breadth (6p21) and nose wing breadth (7p13 and 20p11). In a subsample of ∼3,000 individuals we obtained quantitative traits related to 9 of the ordinal phenotypes and, also, a measure of nasion position. Quantitative analyses confirmed the ordinal-based associations, identified SNPs in 2q12 associated with chin protrusion, and replicated the reported association of nasion position with SNPs in PAX3. Strongest association in 2q12, 4q31, 6p21 and 7p13 was observed for SNPs in the EDAR, DCHS2, RUNX2 and GLI3 genes, respectively. Associated SNPs in 20p11 extend to PAX1. Consistent with the effect of EDAR on chin protrusion, we documented alterations of mandible length in mice with modified Edar function.
Publisher: S. Karger AG
Date: 2009
DOI: 10.1159/000215727
Abstract: Objectives: The objective of this study was to investigate the possible association of clinical variables and apolipoprotein (APOE, APOCI and APOB) polymorphisms with the development of myocardial infarction (MI) and coronary heart disease (CHD) in Kuwaitis. Subjects and Methods: APOE, APOCI and APOB genotypes were determined by polymerase chain reaction followed by restriction fragment length polymorphism in 143 Kuwaiti CHD patients with (n = 88) and without (n = 55) MI and in 122 controls matched for gender and age. Statistical and genetic analyses of the genotype, allele and haplotype frequencies, as well as regression analyses of genetic and clinical variables, were performed. Results: There was a statistically significant association between CHD and medical history of diabetes mellitus (p 0.001), hypertension (p 0.01), high cholesterol (p 0.05) and family history of CHD (p 0.001). A highly significant association (p 0.001) was found, with an adjusted odds ratio of 9.32, for family history and the development of MI. No significant differences were found for allele or genotype frequencies between CHD patients and controls. Conclusion: The strong effect of family history suggests a major genetic component for the development of CHD in Kuwaitis, but this association does not appear to be related to the APO genes studied here. The results of this study encourage future research into these and other polymorphisms and their potential association with MI and CHD in the Kuwaiti population.
Publisher: Springer Science and Business Media LLC
Date: 2007
Publisher: Proceedings of the National Academy of Sciences
Date: 29-11-2010
Abstract: Although commonplace in human disease genetics, genome-wide association (GWA) studies have only relatively recently been applied to plants. Using 32 phenotypes in the inbreeding crop barley, we report GWA mapping of 15 morphological traits across ∼500 cultivars genotyped with 1,536 SNPs. In contrast to the majority of human GWA studies, we observe high levels of linkage disequilibrium within and between chromosomes. Despite this, GWA analysis readily detected common alleles of high penetrance. To investigate the potential of combining GWA mapping with comparative analysis to resolve traits to candidate polymorphism level in unsequenced genomes, we fine-mapped a selected phenotype (anthocyanin pigmentation) within a 140-kb interval containing three genes. Of these, resequencing the putative anthocyanin pathway gene HvbHLH1 identified a deletion resulting in a premature stop codon upstream of the basic helix-loop-helix domain, which was diagnostic for lack of anthocyanin in our association and biparental mapping populations. The methodology described here is transferable to species with limited genomic resources, providing a paradigm for reducing the threshold of map-based cloning in unsequenced crops.
Publisher: The Royal Society
Date: 12-2018
Abstract: We present a new Markov chain Monte Carlo algorithm, implemented in the software Arbores, for inferring the history of a sample of DNA sequences. Our principal innovation is a bridging procedure, previously applied only for simple stochastic processes, in which the local computations within a bridge can proceed independently of the rest of the DNA sequence, facilitating large-scale parallelization.
Publisher: Cold Spring Harbor Laboratory
Date: 27-01-2019
DOI: 10.1101/532069
Abstract: LD SCore regression (LDSC) has become a popular approach to estimate confounding bias, heritability and genetic correlation using only genome wide association study (GWAS) test statistics. SumHer is a newly-introduced alternative with similar aims. We show using theory and simulations that both approaches fail to adequately account for confounding bias, even when the assumed heritability model is correct. Consequently, these methods may estimate heritability poorly if there was inadequate adjustment for confounding in the original GWAS analysis. We also show that choice of summary statistic for use in LDSC or SumHer can have a large impact on resulting inferences. Further, covariate adjustments in the original GWAS can alter the target of heritability estimation, which can be problematic when LDSC or SumHer is applied to test statistics from a meta-analysis of GWAS with different covariate adjustments.
Publisher: Elsevier BV
Date: 09-2020
Publisher: Public Library of Science (PLoS)
Date: 25-09-2014
Publisher: Oxford University Press (OUP)
Date: 19-08-2013
DOI: 10.1093/HMG/DDT403
Publisher: Springer Science and Business Media LLC
Date: 23-03-2020
Publisher: Elsevier BV
Date: 03-1995
DOI: 10.1016/0888-7543(95)80078-Z
Abstract: We describe efficient methods for screening clone libraries, based on pooling schemes that we call "random k-sets designs." In these designs, the pools in which any clone occurs are equally likely to be any possible selection of k from the v pools. The values of k and v can be chosen to optimize desirable properties. Random k-sets designs have substantial advantages over alternative pooling schemes: they are efficient, flexible, and easy to specify, require fewer pools, and have error-correcting and error-detecting capabilities. In addition, screening can often be achieved in only one pass, thus facilitating automation. For design comparison, we assume a binomial distribution for the number of "positive" clones, with parameters n, the number of clones, and c, the coverage. We propose the expected number of resolved positive clones--clones that are definitely positive based upon the pool assays--as a criterion for the efficiency of a pooling design. We determine the value of k that is optimal, with respect to this criterion, as a function of v, n, and c. We also describe superior k-sets designs called k-sets packing designs. As an illustration, we discuss a robotically implemented design for a 2.5-fold-coverage, human chromosome 16 YAC library of n = 1298 clones. We also estimate the probability that each clone is positive, given the pool-assay data and a model for experimental errors.
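A small Monte Carlo sketch of the efficiency criterion described above: each clone is assigned to k of v pools chosen uniformly at random, each clone is positive independently with probability c/n (so the number of positives is binomial with mean c), pool assays are assumed error-free, and resolved positives are counted. The decoding rule (a clone is resolved if it lies in no negative pool and is the only such clone in at least one of its pools) is our reading of "definitely positive" under error-free assays. The pool count v = 24 is a hypothetical choice; n = 1298 and c = 2.5 come from the example above. Function names are hypothetical.

```python
import numpy as np


def random_k_sets_design(n_clones: int, v: int, k: int, rng) -> np.ndarray:
    """Boolean n_clones x v incidence matrix: each clone lies in k distinct random pools."""
    pools = np.argsort(rng.random((n_clones, v)), axis=1)[:, :k]  # random k-subset per clone
    design = np.zeros((n_clones, v), dtype=bool)
    design[np.arange(n_clones)[:, None], pools] = True
    return design


def resolved_positives(design: np.ndarray, positive: np.ndarray) -> int:
    """Count clones that error-free pool results prove to be positive."""
    pool_pos = (design & positive[:, None]).any(axis=0)    # pool positive iff it holds a positive clone
    candidate = ~(design & ~pool_pos[None, :]).any(axis=1)  # clone lies in no negative pool
    cands_in_pool = (design & candidate[:, None]).sum(axis=0)
    # A candidate that is the only candidate in one of its (positive) pools must be positive.
    resolved = candidate & (design & (cands_in_pool == 1)[None, :]).any(axis=1)
    return int(resolved.sum())


def expected_resolved(n_clones: int, v: int, k: int, coverage: float, rng, reps: int = 100) -> float:
    """Monte Carlo estimate of the expected number of resolved positive clones."""
    total = 0
    for _ in range(reps):
        design = random_k_sets_design(n_clones, v, k, rng)
        positive = rng.random(n_clones) < coverage / n_clones  # Binomial(n, c/n) positives
        total += resolved_positives(design, positive)
    return total / reps


rng = np.random.default_rng(0)
for k in range(1, 7):
    print(k, expected_resolved(1298, 24, k, 2.5, rng))
```

Scanning k in this way mimics choosing the optimal k for given v, n and c; it ignores the experimental-error model and the packing designs mentioned in the abstract.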
Publisher: Springer Science and Business Media LLC
Date: 10-12-2018
DOI: 10.1038/S41467-018-07524-Z
Abstract: The epilepsies affect around 65 million people worldwide and have a substantial missing heritability component. We report a genome-wide mega-analysis involving 15,212 individuals with epilepsy and 29,677 controls, which reveals 16 genome-wide significant loci, of which 11 are novel. Using various prioritization criteria, we pinpoint the 21 most likely epilepsy genes at these loci, with the majority in genetic generalized epilepsies. These genes have diverse biological functions, including coding for ion-channel subunits, transcription factors and a vitamin-B6 metabolism enzyme. Converging evidence shows that the common variants associated with epilepsy play a role in epigenetic regulation of gene expression in the brain. The results show an enrichment for monogenic epilepsy genes as well as known targets of antiepileptic drugs. Using SNP-based heritability analyses we disentangle both the unique and overlapping genetic basis to seven different epilepsy subtypes. Together, these findings provide leads for epilepsy therapies based on underlying pathophysiology.
Publisher: Cold Spring Harbor Laboratory
Date: 05-07-2022
DOI: 10.1101/2022.07.01.22277161
Abstract: We present LDAK-GBAT, a novel tool for gene-based association testing using summary statistics from genome-wide association studies. We first evaluate LDAK-GBAT using ten phenotypes from the UK Biobank. We show that LDAK-GBAT is computationally efficient, taking approximately 30 minutes to analyze imputed data (2.9M common, genic SNPs), and requiring less than 10 GB of memory. In total, LDAK-GBAT finds 680 genome-wide significant genes (P ≤ 2.8 × 10⁻⁶), which is at least 25% more than each of five existing tools (MAGMA, GCTA-fastBAT, sumFREGAT-SKAT-O, sumFREGAT-PCA and sumFREGAT-ACAT), and 48% more than found by single-SNP analysis. We then analyze 99 additional phenotypes from the UK Biobank, the Million Veterans Project and the Psychiatric Genetics Consortium. In total, LDAK-GBAT finds 7957 significant genes, which is at least 24% more than the best existing tools, and 42% more than found by single-SNP analysis.
Publisher: Elsevier BV
Date: 2013
DOI: 10.1016/J.FSIGEN.2012.06.001
Abstract: We consider the comparison of hypotheses "parent-child" or "full siblings" against the alternative of "unrelated" for pairs of individuals for whom DNA profiles are available. This is a situation that occurs repeatedly in familial database searching. A decision rule that uses both the kinship index (KI), also known as the likelihood ratio, and the identity-by-state statistic (IBS) was advocated in a recent report as superior to the use of KI alone. Such a proposal appears to conflict with the Neyman-Pearson Lemma of statistics, which states that the likelihood ratio alone provides the most powerful criterion for distinguishing between any two simple hypotheses. We therefore performed a simulation study that was two orders of magnitude larger than in the previous report, and our results corroborate the theoretical expectation that KI alone provides a better decision rule than KI combined with IBS.
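To make the two statistics concrete, the sketch below computes a single-locus kinship index using the standard identity-by-descent (IBD) formulation, KI = k0 + k1·P(G2 | G1, one allele IBD)/P(G2) + k2·P(G2 | G1, two alleles IBD)/P(G2), alongside the IBS count. Hardy-Weinberg genotype frequencies and the toy allele frequencies are assumptions; under independent loci, the multi-locus KI is the product over loci.

```python
from collections import Counter

PARENT_CHILD = (0.0, 1.0, 0.0)  # IBD probabilities (k0, k1, k2)
FULL_SIBS = (0.25, 0.5, 0.25)


def genotype_prob(g, freqs):
    """Hardy-Weinberg probability of an unordered genotype (assumption)."""
    a, b = g
    return freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]


def prob_one_ibd(g1, g2, freqs):
    """P(G2 | G1, one allele IBD): a random allele of G1 is shared, the other is drawn from the population."""
    c, d = g2
    total = 0.0
    for t in g1:  # each allele of G1 is the shared one with probability 1/2
        if c == d:
            total += 0.5 * (freqs[c] if t == c else 0.0)
        else:
            total += 0.5 * ((freqs[d] if t == c else 0.0) + (freqs[c] if t == d else 0.0))
    return total


def kinship_index(g1, g2, freqs, k):
    """Likelihood ratio for a relationship with IBD probabilities k versus 'unrelated'."""
    k0, k1, k2 = k
    p0 = genotype_prob(g2, freqs)
    p1 = prob_one_ibd(g1, g2, freqs)
    p2 = 1.0 if sorted(g1) == sorted(g2) else 0.0
    return k0 + k1 * p1 / p0 + k2 * p2 / p0


def ibs(g1, g2):
    """Number of alleles shared identical by state (0, 1 or 2)."""
    return sum((Counter(g1) & Counter(g2)).values())


freqs = {"A": 0.1, "B": 0.3, "C": 0.6}  # hypothetical allele frequencies
g1, g2 = ("A", "B"), ("A", "C")
print(kinship_index(g1, g2, freqs, PARENT_CHILD), kinship_index(g1, g2, freqs, FULL_SIBS), ibs(g1, g2))
```

A KI-only rule declares "related" when the product of per-locus KIs exceeds a threshold; the abstract's point is that adding the IBS count to such a rule cannot, in theory, improve on the likelihood ratio for two simple hypotheses.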
Publisher: Springer Science and Business Media LLC
Date: 04-1991
DOI: 10.1038/HDY.1991.37
Abstract: DNA fingerprints are used in forensic sciences to identify individuals. However, current analyses could underestimate the probability of two individuals sharing the same profile because the effect of population structure is not incorporated. An alternative analysis is proposed to take into account population stratification. The analysis uses studies of inbreeding in human populations to obtain an empirical upper bound on the magnitude of the effect.
Publisher: Oxford University Press (OUP)
Date: 11-2007
DOI: 10.1534/GENETICS.106.069088
Abstract: Simulation is an invaluable tool for investigating the effects of various population genetics modeling assumptions on resulting patterns of genetic diversity, and for assessing the performance of statistical techniques, for example those designed to detect and measure the genomic effects of selection. It is also used to investigate the effectiveness of various design options for genetic association studies. Backward-in-time simulation methods are computationally efficient and have become widely used since their introduction in the 1980s. The forward-in-time approach has substantial advantages in terms of accuracy and modeling flexibility, but at greater computational cost. We have developed flexible and efficient simulation software and a rescaling technique to aid computational efficiency that together allow the simulation of sequence-level data over large genomic regions in entire diploid populations under various scenarios for demography, mutation, selection, and recombination, the latter including hotspots and gene conversion. Our forward evolution of genomic regions (FREGENE) software is freely available from www.ebi.ac.uk/projects/BARGEN together with an ancillary program to generate phenotype labels, either binary or quantitative. In this article we discuss limitations of coalescent-based simulation, introduce the rescaling technique that makes large-scale forward-in-time simulation feasible, and demonstrate the utility of various features of FREGENE, many not previously available.
Publisher: Springer Science and Business Media LLC
Date: 24-06-2015
DOI: 10.1038/NCOMMS8500
Abstract: Here we report a genome-wide association study for non-pathological pinna morphology in over 5,000 Latin Americans. We find genome-wide significant association at seven genomic regions affecting: lobe size and attachment, folding of antihelix, helix rolling, ear protrusion and antitragus size (linear regression P values 2 × 10^-8 to 3 × 10^-14). Four traits are associated with a functional variant in the Ectodysplasin A receptor (EDAR) gene, a key regulator of embryonic skin appendage development. We confirm expression of Edar in the developing mouse ear and that Edar-deficient mice have an abnormally shaped pinna. Two traits are associated with SNPs in a region overlapping the T-Box Protein 15 (TBX15) gene, a major determinant of mouse skeletal development. The strongest association in this region is observed for SNP rs17023457, located in an evolutionarily conserved binding site for the transcription factor Cartilage paired-class homeoprotein 1 (CART1), and we confirm that rs17023457 alters in vitro binding of CART1.
Publisher: Proceedings of the National Academy of Sciences
Date: 03-11-2003
Abstract: Single-nucleotide polymorphism (SNP) genotypes were recently examined in an 890-kb region flanking the human gene CYP2D6. Single-marker and haplotype-based analyses identified, with genomewide significance (P < 10^-7), a 403-kb interval displaying strong linkage disequilibrium (LD) with predicted poor-metabolizer phenotype. However, the width of this interval makes the location of causal variants difficult: for example, the interval contains seven known or predicted genes in addition to CYP2D6. We have developed the Bayesian fine-mapping software coldmap, which, applied to these genotype data, yields a 95% location interval covering only 185 kb and establishes genomewide significance for a causal locus within the region. Strikingly, our interval correctly excludes four SNPs, which individually display association with genomewide significance, including the SNP showing strongest LD (P < 10^-34). In addition, coldmap distinguishes homozygous cases for the major CYP2D6 mutation from those bearing minor mutations. We further investigate a selection of SNP subsets and find that previously reported methods lead to a 38% savings in SNPs at the cost of an increase of % in the width of the location interval.
Publisher: Oxford University Press (OUP)
Date: 05-2001
DOI: 10.1093/BIOINFORMATICS/17.5.479
Abstract: Summary: MAC5 implements MCMC sampling of the posterior distribution of tree topologies from DNA sequences containing gaps by using a five state model of evolution (the four nucleotides and the gap character). Availability: MAC5 is available from www.reading.ac.uk/statistics/genetics/ under the software link. Contact: gmcguire@hgmp.mrc.ac.uk
Publisher: Oxford University Press (OUP)
Date: 2001
DOI: 10.1093/GENETICS/157.1.413
Abstract: We describe a Bayesian approach to analyzing multilocus genotype or haplotype data to assess departures from gametic (linkage) equilibrium. Our approach employs a Markov chain Monte Carlo (MCMC) algorithm to approximate the posterior probability distributions of disequilibrium parameters. The distributions are computed exactly in some simple settings. Among other advantages, posterior distributions can be presented visually, which allows the uncertainties in parameter estimates to be readily assessed. In addition, background knowledge can be incorporated, where available, to improve the precision of inferences. The method is illustrated by application to previously published datasets; implications for multilocus forensic match probabilities and for simple association-based gene mapping are also discussed.
Publisher: Cold Spring Harbor Laboratory
Date: 04-2002
DOI: 10.1101/GR.214902
Abstract: Previous studies have reported that about 85% of human diversity at Short Tandem Repeat (STR) and Restriction Fragment Length Polymorphism (RFLP) autosomal loci is due to differences between individuals of the same population, whereas differences among continental groups account for only 10% of the overall genetic variance. These findings conflict with popular notions of distinct and relatively homogeneous human races, and may also call into question the apparent usefulness of ethnic classification in, for example, medical diagnostics. Here, we present new data on 21 Alu insertions in 32 populations. We analyze these data along with three other large, globally dispersed data sets consisting of apparently neutral biallelic nuclear markers, as well as with a β-globin data set possibly subject to selection. We confirm the previous results for the autosomal data, and find a higher diversity among continents for Y-chromosome loci. We also extend the analyses to address two questions: (1) whether differences between continental groups, although small, are nevertheless large enough to confidently assign individuals to their continent on the basis of their genotypes; and (2) whether the observed genotypes naturally cluster into continental or population groups when the sample source location is ignored. Using a range of statistical methods, we show that classification errors are at best around 30% for autosomal biallelic polymorphisms and 27% for the Y chromosome. Two data sets suggest the existence of three and four major groups of genotypes worldwide, respectively, and the two groupings are inconsistent. These results suggest that, at random biallelic loci, there is little evidence, if any, of a clear subdivision of humans into biologically defined groups.
Publisher: Elsevier BV
Date: 07-2000
DOI: 10.1086/302956
Publisher: Wiley
Date: 24-08-2007
Publisher: Oxford University Press (OUP)
Date: 13-01-2022
Abstract: Whole-genome sequencing has facilitated genome-wide analyses of association, prediction and heritability in many organisms. However, such analyses in bacteria are still in their infancy, being limited by difficulties including genome plasticity and strong population structure. Here we propose a suite of methods including linear mixed models, elastic net and LD-score regression, adapted to bacterial traits using innovations such as frequency-based allele coding, both insertion/deletion and nucleotide testing and heritability partitioning. We compare and validate our methods against the current state of the art using simulations, and analyse three phenotypes of the major human pathogen Streptococcus pneumoniae, including the first analyses of minimum inhibitory concentrations (MIC) for penicillin and ceftriaxone. We show that the MIC traits are highly heritable with high prediction accuracy, explained by many genetic associations under good population structure control. In ceftriaxone MIC, this is surprising because none of the isolates are resistant as per the inhibition zone criteria. We estimate that half of the heritability of penicillin MIC is explained by a known drug-resistance region, which also contributes a quarter of the ceftriaxone MIC heritability. For the within-host carriage duration phenotype, no associations were observed, but the moderate heritability and prediction accuracy indicate a moderately polygenic trait.
Publisher: Oxford University Press (OUP)
Date: 12-2002
DOI: 10.1093/GENETICS/162.4.2025
Abstract: We propose a new method for approximate Bayesian statistical inference on the basis of summary statistics. The method is suited to complex problems that arise in population genetics, extending ideas developed in this setting by earlier authors. Properties of the posterior distribution of a parameter, such as its mean or density curve, are approximated without explicit likelihood calculations. This is achieved by fitting a local-linear regression of simulated parameter values on simulated summary statistics, and then substituting the observed summary statistics into the regression equation. The method combines many of the advantages of Bayesian statistical inference with the computational efficiency of methods based on summary statistics. A key advantage of the method is that the nuisance parameters are automatically integrated out in the simulation step, so that the large numbers of nuisance parameters that arise in population genetics problems can be handled without difficulty. Simulation results indicate computational and statistical efficiency that compares favorably with those of alternative methods previously proposed in the literature. We also compare the relative efficiency of inferences obtained using methods based on summary statistics with those obtained directly from the data using MCMC.
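The regression step described above can be sketched on a toy problem. Everything model-specific here is an assumption chosen so the answer is known: data are 20 draws from Normal(θ, 1), the summary statistic is the sample mean, the prior is Uniform(-5, 5), and the kernel is Epanechnikov. The sketch simulates (θ, s) pairs, keeps the draws closest to the observed summary, fits a weighted linear regression of θ on s, and adjusts each accepted θ to the observed summary.

```python
import random
import statistics


def simulate_summary(theta, n, rng):
    """Toy model: n draws from Normal(theta, 1); the summary statistic is the sample mean."""
    return statistics.fmean(rng.gauss(theta, 1.0) for _ in range(n))


def abc_regression(s_obs, n_sims=20000, n=20, accept_frac=0.1, seed=1):
    """Rejection ABC followed by a weighted local-linear regression adjustment."""
    rng = random.Random(seed)
    thetas = [rng.uniform(-5.0, 5.0) for _ in range(n_sims)]  # draws from a flat prior
    sims = [simulate_summary(t, n, rng) for t in thetas]
    # Tolerance delta: keep roughly the accept_frac closest simulations.
    delta = sorted(abs(s - s_obs) for s in sims)[int(accept_frac * n_sims)]
    # Each accepted draw: (theta, s - s_obs, Epanechnikov kernel weight).
    acc = [(t, s - s_obs, 1.0 - ((s - s_obs) / delta) ** 2)
           for t, s in zip(thetas, sims) if abs(s - s_obs) < delta]
    w_sum = sum(w for _, _, w in acc)
    x_bar = sum(w * x for _, x, w in acc) / w_sum
    t_bar = sum(w * t for t, _, w in acc) / w_sum
    # Weighted least-squares slope of theta on the summary statistic.
    beta = (sum(w * (x - x_bar) * (t - t_bar) for t, x, w in acc)
            / sum(w * (x - x_bar) ** 2 for _, x, w in acc))
    # Adjust each accepted theta to what it would be had its summary equalled s_obs.
    adjusted = [t - beta * x for t, x, _ in acc]
    post_mean = sum(w * a for (_, _, w), a in zip(acc, adjusted)) / w_sum
    return post_mean, beta, adjusted


post_mean, beta, adjusted = abc_regression(s_obs=1.0)
print(round(post_mean, 2), round(beta, 2), len(adjusted))
```

No likelihood is evaluated anywhere: only the simulator is called, which is why nuisance parameters (absent in this toy, but common in population genetics) are integrated out simply by simulating them.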
Publisher: Oxford University Press (OUP)
Date: 14-08-2009
DOI: 10.1093/BIOINFORMATICS/BTP487
Abstract: Summary: PopABC is a computer package for inferring the pattern of demographic divergence of closely related populations and species. The software performs coalescent simulation in the framework of approximate Bayesian computation (ABC). PopABC can also be used to perform Bayesian model choice to discriminate between different demographic scenarios. The program can be used either for research or for education and teaching purposes. Availability and Implementation: Source code and binaries are freely available at www.reading.ac.uk/∼sar05sal/software.htm. The program was implemented in C and can run on UNIX, MacOSX and Windows operating systems. Contact: joao.lopes@reading.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.
Publisher: The Royal Society
Date: 22-09-1992
Abstract: The effects of gene conversion can be detected in the DNA sequences of multigene families. We develop a permutation test of the significance of patterns of sequence mismatches, and apply it to the sequences of the red- and green-sensitive visual pigment genes of human and the diana monkey. Whereas conventional tests of the rate of sequence divergence are equivocal, the permutation test convincingly excludes divergence in the absence of gene conversion (p = 10^-6).
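The general logic of a permutation test on mismatch patterns can be sketched as follows. The test statistic (the largest number of mismatches within any window of consecutive sites, a crude measure of the clustering that a conversion tract would produce) and the window size are hypothetical choices for illustration; the abstract does not state which statistic the authors used.

```python
import random


def max_window_count(positions, window):
    """Largest number of mismatch positions inside any run of `window` consecutive sites."""
    pos = sorted(positions)
    best, left = 0, 0
    for right in range(len(pos)):
        while pos[right] - pos[left] >= window:  # shrink until the span fits in the window
            left += 1
        best = max(best, right - left + 1)
    return best


def permutation_p_value(mismatches, n_sites, window, n_perm=999, seed=0):
    """Compare the observed clustering with mismatches placed uniformly at random among the sites."""
    rng = random.Random(seed)
    observed = max_window_count(mismatches, window)
    hits = sum(
        max_window_count(rng.sample(range(n_sites), len(mismatches)), window) >= observed
        for _ in range(n_perm)
    )
    return (1 + hits) / (1 + n_perm)  # add-one correction keeps p strictly positive


clustered = list(range(10))           # hypothetical: 10 mismatches packed into sites 0-9
spread = list(range(0, 100, 10))      # hypothetical: 10 mismatches evenly spaced
print(permutation_p_value(clustered, 100, 10), permutation_p_value(spread, 100, 10))
```

With 999 permutations the smallest attainable p-value is 0.001; reaching a value such as the 10^-6 reported above would require on the order of a million permutations or an exact calculation.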
Publisher: Wiley
Date: 2006
DOI: 10.1002/GEPI.20134
Abstract: We propose an algorithm for analysing SNP-based population association studies, which is a development of that introduced by Molitor et al. [2003: Am J Hum Genet 73:1368-1384]. It uses clustering of haplotypes to overcome the major limitations of many current haplotype-based approaches. We define a between-haplotype score that is simple, yet appears to capture much of the information about evolutionary relatedness of the haplotypes in the vicinity of a (unobserved) putative causal locus. Haplotype clusters can then be defined via a putative ancestral haplotype and a cut-off distance. The number of an individual's two haplotypes that lie within the cluster predicts the individual's genotype at the causal locus. This predicted genotype can then be investigated for association with the phenotype of interest. We implement our approach within a Markov-chain Monte Carlo algorithm that, in effect, searches over locations and ancestral haplotypes to identify large, case-rich clusters. The algorithm successfully fine-maps a causal mutation in a test analysis using real data, and achieves almost 98% accuracy in predicting the genotype at the causal locus. A simulation study indicates that the new algorithm is substantially superior to alternative approaches, and it also allows us to identify situations in which multi-point approaches can substantially improve over single-SNP analyses. Our algorithm runs quickly and there is scope for extension to a wide range of disease models and genomic scales.
Publisher: Elsevier BV
Date: 09-2014
Publisher: Proceedings of the National Academy of Sciences
Date: 07-2013
Abstract: Enhancements in sensitivity now allow DNA profiles to be obtained from only tens of picograms of DNA, corresponding to a few cells, even for samples subject to degradation from environmental exposure. However, low-template DNA (LTDNA) profiles are subject to stochastic effects, such as “dropout” and “dropin” of alleles, and highly variable stutter peak heights. Although the sensitivity of the newly developed methods is highly appealing to crime investigators, courts are concerned about the reliability of the underlying science. High-profile cases relying on LTDNA evidence have collapsed amid controversy, including the case of Hoey in the United Kingdom and the case of Knox and Sollecito in Italy. I argue that rather than the reliability of the science, courts and commentators should focus on the validity of the statistical methods of evaluation of the evidence. Even noisy DNA evidence can be more powerful than many traditional types of evidence, and it can be helpful to a court as long as its strength is not overstated. There have been serious shortcomings in statistical methods for the evaluation of LTDNA profile evidence, however. Here, I propose a method that allows for multiple replicates with different rates of dropout, sporadic dropins, different amounts of DNA from different contributors, relatedness of suspected and alternate contributors, “uncertain” allele designations, and degradation. R code implementing the method is open source, facilitating wide scrutiny. I illustrate its good performance using real cases and simulated crime scene profiles.
Publisher: Ovid Technologies (Wolters Kluwer Health)
Date: 06-2006
Publisher: Springer Science and Business Media LLC
Date: 22-03-2011
DOI: 10.1038/IJO.2011.22
Publisher: Springer Science and Business Media LLC
Date: 10-09-2009
DOI: 10.1007/S00125-009-1504-7
Abstract: Insulin resistance and related metabolic disturbances are more common among Asian Indians than European whites. Little is known about the heritability of insulin resistance traits in Asian Indians. Our objective was to estimate heritabilities and genetic correlations in Asian Indian families. Phenotypic data were assembled for 181 UK Asian Indian probands with premature CHD, and their 1,454 first-, second- and third-degree relatives. We calculated (narrow-sense) heritabilities and genetic correlations for insulin resistance traits, and common environmental effects using all study participants and a multivariate model. The analysis was repeated in a subsample consisting of individuals not on drug therapy. Heritability estimates (SE) for individuals not on drug therapy were: BMI 0.31 (0.04), WHR 0.27 (0.04), systolic BP 0.29 (0.03), triacylglycerol 0.40 (0.04), HDL-cholesterol 0.53 (0.04), glucose 0.37 (0.03), HOMA of insulin resistance (HOMA-IR) 0.22 (0.04), and HbA(1c) 0.60 (0.04). We observed many significant genetic correlations between the traits, in particular between HOMA-IR and BMI. Heritability estimates were lower for all phenotypes when analysed among all participants. Genetic factors contribute to a significant proportion of the total variance in insulin resistance and related metabolic disturbances in Asian Indian CHD families.
Publisher: Oxford University Press (OUP)
Date: 12-2000
Abstract: The analysis of infectious disease data presents challenges arising from the dependence in the data and the fact that only part of the transmission process is observable. These difficulties are usually overcome by making simplifying assumptions. The paper explores the use of Markov chain Monte Carlo (MCMC) methods for the analysis of infectious disease data, with the hope that they will permit analyses to be made under more realistic assumptions. Two important kinds of data sets are considered, containing temporal and non-temporal information, from outbreaks of measles and influenza. Stochastic epidemic models are used to describe the processes that generate the data. MCMC methods are then employed to perform inference in a Bayesian context for the model parameters. The MCMC methods used include standard algorithms, such as the Metropolis–Hastings algorithm and the Gibbs sampler, as well as a new method that involves likelihood approximation. It is found that standard algorithms perform well in some situations but can exhibit serious convergence difficulties in others. The inferences that we obtain are in broad agreement with estimates obtained by other methods where they are available. However, we can also provide inferences for parameters which have not been reported in previous analyses.
Publisher: Oxford University Press (OUP)
Date: 24-02-2005
DOI: 10.1111/J.1740-9713.2005.00079.X
Abstract: The weight to be attached to DNA profile evidence was the centre of a huge scientific controversy in the early-mid-1990s, with scientists, including statistical scientists, criticising each other over both science and conduct in the editorials of prestigious journals, on television and radio, and in newspapers. Today, only the occasional shot rings out on the DNA evidence front. David Balding looks back and reflects on the causes of the dispute, its evolution and resolution, and the role in it of statistics and statisticians.
Publisher: Public Library of Science (PLoS)
Date: 15-07-2020
Publisher: Springer Science and Business Media LLC
Date: 03-07-2007
DOI: 10.1007/S00439-007-0391-6
Abstract: Recently it has been reported that recombination hotspots appear to be highly variable between humans and chimpanzees, and there is evidence for between-person variability in hotspots, and evolutionary transience. To understand the nature of variation in human recombination rates, it is important to describe patterns of variability across populations. Direct measurement of recombination rates remains infeasible on a large scale, and population-genetic approaches can be imprecise, and are affected by demographic history. Reports to date have suggested broad similarity in recombination rates at large genomic scales and across human populations. Here, we examine recombination rate estimates at a finer population and genomic scale: 28 worldwide populations and 107 SNPs in a 1 Mb stretch of chromosome 22q. We employ analysis of variance of recombination rate estimates, corrected for differences in effective population size using genome-wide microsatellite mutation rate estimates. We find substantial variation in fine-scale rates between populations, but reduced variation within continental groups. All effects examined (SNP-pair, region, population and interactions) were highly significant. Adjustment for effective population size made little difference to the conclusions. Observed hotspots tended to be conserved across populations, albeit at varying intensities. This holds particularly for populations from the same region, and also to a considerable degree across geographical regions. However, some hotspots appear to be population-specific. Several results from studies on the population history of humans are in accordance with our analysis. Our results suggest that between-population variation in DNA sequences may underlie recombination rate variation.
Publisher: Wiley
Date: 24-06-2015
Publisher: Springer Science and Business Media LLC
Date: 29-08-2011
DOI: 10.1038/NG.918
Publisher: Elsevier BV
Date: 12-2014
Publisher: Ovid Technologies (Wolters Kluwer Health)
Date: 03-2013
Publisher: The Royal Society
Date: 05-10-2011
Publisher: Cold Spring Harbor Laboratory
Date: 06-08-2023
DOI: 10.1101/2023.08.04.551407
Abstract: Population genomics has revolutionised our ability to study bacterial evolution by enabling data-driven discovery of the genetic architecture of trait variation. Genome-wide association studies (GWAS) have more recently become accompanied by genome-wide epistasis and co-selection (GWES) analysis, which offers a phenotype-free approach to generating hypotheses about selective processes that simultaneously impact multiple loci across the genome. However, existing GWES methods only consider associations between distant pairs of loci within the genome due to the strong impact of linkage-disequilibrium (LD) over short distances. Based on the general functional organisation of genomes it is nevertheless expected that the majority of co-selection and epistasis will act within relatively short genomic proximity, on co-variation occurring within genes and their promoter regions, and within operons. Here we introduce LDWeaver, which enables an exhaustive GWES across both short- and long-range LD, to disentangle likely neutral co-variation from selection. We demonstrate the ability of LDWeaver to efficiently generate hypotheses about co-selection using large genomic surveys of multiple major human bacterial pathogen species and validate several findings using functional annotation and phenotypic measurements. Our approach will facilitate the study of bacterial evolution in the light of rapidly expanding population genomic data.
Publisher: Springer Science and Business Media LLC
Date: 06-1998
DOI: 10.1046/J.1365-2540.1998.00360.X
Abstract: Many well-established statistical methods in genetics were developed in a climate of severe constraints on computational power. Recent advances in simulation methodology now bring modern, flexible statistical methods within the reach of scientists having access to a desktop workstation. We illustrate the potential advantages now available by considering the problem of assessing departures from Hardy-Weinberg (HW) equilibrium. Several hypothesis tests of HW have been established, as well as a variety of point estimation methods for the parameter which measures departures from HW under the inbreeding model. We propose a computational, Bayesian method for assessing departures from HW, which has a number of important advantages over existing approaches. The method incorporates the effects of uncertainty about the nuisance parameters (the allele frequencies) as well as the boundary constraints on f (which are functions of the nuisance parameters). Results are naturally presented visually, exploiting the graphics capabilities of modern computer environments to allow straightforward interpretation. Perhaps most importantly, the method is founded on a flexible, likelihood-based modelling framework, which can incorporate the inbreeding model if appropriate, but also allows the assumptions of the model to be investigated and, if necessary, relaxed. Under appropriate conditions, information can be shared across loci and, possibly, across populations, leading to more precise estimation. The advantages of the method are illustrated by application both to simulated data and to data analysed by alternative methods in the recent literature.
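For a single biallelic locus, the inbreeding model gives genotype probabilities P(AA) = p² + fpq, P(Aa) = 2pq(1 − f) and P(aa) = q² + fpq, so f is constrained to −min(p/q, q/p) ≤ f ≤ 1. The sketch below approximates the marginal posterior of f on a grid, integrating out the nuisance allele frequency p and respecting that p-dependent boundary. The flat prior over the valid region and the grid resolution are assumptions; the paper's own computation is simulation-based and more general.

```python
import math
from collections import defaultdict


def posterior_f(n_AA, n_Aa, n_aa, step=0.01):
    """Grid approximation to the marginal posterior of f, with a flat prior on the valid (p, f) region."""
    counts = (n_AA, n_Aa, n_aa)
    points = []  # (f, log-likelihood) for each valid grid point
    for i in range(int(round(1 / step))):
        p = (i + 0.5) * step          # frequency of allele A (nuisance parameter)
        q = 1.0 - p
        f_min = -min(p / q, q / p)    # keeps P(AA) and P(aa) non-negative
        for j in range(int(round(2 / step)) + 1):
            f = -1.0 + j * step
            if f < f_min:
                continue
            probs = (p * p + f * p * q, 2 * p * q * (1 - f), q * q + f * p * q)
            if any(c > 0 and pr <= 0 for c, pr in zip(counts, probs)):
                continue              # zero likelihood for the observed counts
            ll = sum(c * math.log(pr) for c, pr in zip(counts, probs) if c > 0)
            points.append((round(f, 10), ll))
    max_ll = max(ll for _, ll in points)
    post = defaultdict(float)
    for f, ll in points:
        post[f] += math.exp(ll - max_ll)  # summing over p integrates it out
    total = sum(post.values())
    return {f: m / total for f, m in sorted(post.items())}


def posterior_mean(post):
    return sum(f * m for f, m in post.items())


print(round(posterior_mean(posterior_f(25, 50, 25)), 3))  # counts consistent with HW
print(round(posterior_mean(posterior_f(40, 20, 40)), 3))  # heterozygote deficit
```

Because the whole posterior is returned, it can be plotted directly, which is the kind of visual summary the abstract emphasises over a single test statistic.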
Publisher: Wiley
Date: 23-02-2004
DOI: 10.1111/J.1365-294X.2004.02125.X
Abstract: The identification of signatures of natural selection in genomic surveys has become an area of intense research, stimulated by the increasing ease with which genetic markers can be typed. Loci identified as subject to selection may be functionally important, and hence (weak) candidates for involvement in disease causation. They can also be useful in determining the adaptive differentiation of populations, and exploring hypotheses about speciation. Adaptive differentiation has traditionally been identified from differences in allele frequencies among different populations, summarised by an estimate of FST. Low outliers relative to an appropriate neutral population-genetics model indicate loci subject to balancing selection, whereas high outliers suggest adaptive (directional) selection. However, the problem of identifying statistically significant departures from neutrality is complicated by confounding effects on the distribution of FST estimates, and current methods have not yet been tested in large-scale simulation experiments. Here, we simulate data from a structured population at many unlinked, diallelic loci that are predominantly neutral but with some loci subject to adaptive or balancing selection. We develop a hierarchical-Bayesian method, implemented via Markov chain Monte Carlo (MCMC), and assess its performance in distinguishing the loci simulated under selection from the neutral loci. We also compare this performance with that of a frequentist method, based on moment-based estimates of FST. We find that both methods can identify loci subject to adaptive selection when the selection coefficient is at least five times the migration rate. Neither method could reliably distinguish loci under balancing selection in our simulations, even when the selection coefficient is twenty times the migration rate.
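As a simple point of reference for "differences in allele frequencies among different populations, summarised by an estimate of FST", the sketch below computes Nei's G_ST = (H_T − H_S)/H_T at a biallelic locus. This is a swapped-in, textbook moment-style summary for illustration only; it is neither the hierarchical-Bayesian method nor necessarily the frequentist estimator compared in the paper, and it ignores sample-size corrections.

```python
def gst(freqs):
    """Nei's G_ST at a biallelic locus from per-population allele frequencies."""
    h_s = sum(2 * p * (1 - p) for p in freqs) / len(freqs)  # mean within-population heterozygosity
    p_bar = sum(freqs) / len(freqs)
    h_t = 2 * p_bar * (1 - p_bar)                           # heterozygosity of the pooled population
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


# Hypothetical loci: allele frequencies in three populations.
loci = {"locus1": [0.30, 0.32, 0.29], "locus2": [0.10, 0.50, 0.90], "locus3": [0.45, 0.55, 0.50]}
for name, fr in loci.items():
    print(name, round(gst(fr), 3))
```

Ranking loci by such a statistic only suggests candidates; as the abstract stresses, judging whether a high or low value is a significant outlier needs an appropriate neutral population-genetics model.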
Publisher: Ovid Technologies (Wolters Kluwer Health)
Date: 05-2015
Publisher: Elsevier BV
Date: 05-2004
DOI: 10.1086/420773
Publisher: Public Library of Science (PLoS)
Date: 02-09-2016
Publisher: Springer Science and Business Media LLC
Date: 02-10-2014
DOI: 10.1007/S00122-014-2403-Y
Abstract: We show the application of association mapping and genomic selection for key breeding targets using a large panel of elite winter wheat varieties and a large volume of agronomic data. The heightening urgency to increase wheat production in line with the needs of a growing population, and in the face of climatic uncertainty, means new approaches, including association mapping (AM) and genomic selection (GS), need to be validated and applied in wheat breeding. Key adaptive responses are the cornerstone of regional breeding. There is evidence that new ideotypes for long-standing traits such as flowering time may be required. In order to detect targets for future marker-assisted improvement and validate the practical application of GS for wheat breeding we genotyped 376 elite wheat varieties with 3,046 DArT, single nucleotide polymorphism and gene markers and measured seven traits in replicated yield trials over 2 years in France, Germany and the UK. The scale of the phenotyping exceeds the breadth of previous AM and GS studies in these key economic wheat production regions of Northern Europe. Mixed-linear modelling (MLM) detected significant marker-trait associations across and within regions. Genomic prediction using elastic net gave low to high prediction accuracies depending on the trait, and could be experimentally increased by modifying the constituents of the training population (TP). We also tested the use of differentially penalised regression to integrate candidate gene and genome-wide markers to predict traits, demonstrating the validity and simplicity of this approach. Overall, our results suggest that whilst AM offers potential for application in both research and breeding, GS represents an exciting opportunity to select key traits, and that optimisation of the TP is crucial to its successful implementation.
Publisher: Oxford University Press (OUP)
Date: 25-07-2014
DOI: 10.1093/BRAIN/AWU206
Publisher: Public Library of Science (PLoS)
Date: 04-08-2010
Publisher: Cold Spring Harbor Laboratory
Date: 10-07-2008
Abstract: In recent years, there have been major developments of population genetics methods to estimate both rates of recombination and levels of natural selection. However, genomic variants subject to positive selection are likely to have arisen recently and, consequently, had less opportunity to be affected by recombination. Thus, the two processes have an intimately related impact on genetic variation, and inference of either may be vulnerable to confounding by the other. We illustrate here that even modest levels of positive selection can substantially reduce population-based recombination rate estimates. We also show that genome-wide scans to detect loci under recent selection in humans have tended to highlight loci in regions of low recombination, suggesting that confounding by recombination rate may have reduced the power of these studies. Motivated by these findings, we introduce a new genome-wide approach for detecting selection, based on the ratio of pedigree-based to population-based estimates of recombination rate. Simulations suggest that our “Ped/Pop” method, which is designed to capture completed sweeps, has good power to discriminate between neutral and adaptive evolution. Unusually for a multimarker method, our approach performs well in regions of high recombination and also has good power for many generations after the fixation of an advantageous variant. We apply the method to human HapMap and Perlegen data sets, finding confirmation of reported candidates as well as identifying new loci that may have undergone recent intense selection.
Publisher: Springer Science and Business Media LLC
Date: 06-2005
Publisher: Elsevier BV
Date: 12-2012
Publisher: Elsevier BV
Date: 03-2002
DOI: 10.1086/339271
Publisher: Springer Science and Business Media LLC
Date: 30-05-2010
DOI: 10.1038/NMETH.1466
Abstract: Although genome-wide association studies have uncovered single-nucleotide polymorphisms (SNPs) associated with complex disease, these variants account for a small portion of heritability. Some contribution to this 'missing heritability' may come from copy-number variants (CNVs), in particular rare CNVs, but assessment of this contribution remains challenging because of the difficulty in accurately genotyping CNVs, particularly small variants. We report a population-based approach for the identification of CNVs that integrates data from multiple samples and platforms. Our algorithm, cnvHap, jointly learns a chromosome-wide haplotype model of CNVs and cluster-based models of allele intensity at each probe. Using data for 50 French individuals assayed on four separate platforms, we found that cnvHap correctly detected at least 14% more deleted and 50% more amplified genotypes than PennCNV or QuantiSNP, with an 82% and 115% improvement for aberrations containing <10 probes. Combining data from multiple platforms additionally improved sensitivity.
Publisher: Elsevier BV
Date: 12-2009
DOI: 10.1016/J.FSIGEN.2009.03.003
Abstract: We discuss the interpretation of DNA profiles obtained from low template DNA samples. The most important challenge to interpretation in this setting arises when either or both of "drop-out" and "drop-in" create discordances between the crime scene DNA profile and the DNA profile expected under the prosecution allegation. Stutter and unbalanced peak heights are also problematic, in addition to the effects of masking from the profile of a known contributor. We outline a framework for assessing such evidence, based on likelihood ratios that involve drop-out and drop-in probabilities, and apply it to two casework examples. Our framework extends previous work, including new approaches to modelling homozygote drop-out and uncertainty in allele calls for stutter, masking and near-threshold peaks. We show that some current approaches to interpretation, such as ignoring a discrepant locus or reporting a "Random Man Not Excluded" (RMNE) probability, can be systematically unfair to defendants, sometimes extremely so. We also show that the LR can depend strongly on the assumed value for the drop-out probability, and there is typically no approximation that is useful for all values. We illustrate that ignoring the possibility of drop-in is usually unfair to defendants, and argue that under circumstances in which the prosecution relies on drop-out, it may be unsatisfactory to ignore any possibility of drop-in.
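A single-locus likelihood ratio with drop-out and drop-in can be sketched as follows. This is a deliberately simplified, single-contributor model assumed for illustration, not the framework of the paper: each allele of a heterozygote drops out independently with probability d, a homozygote's allele drops out with probability d2, at most one drop-in allele occurs (with probability c times its population frequency), and there is no stutter, masking or theta correction. The function names are hypothetical.

```python
from itertools import combinations_with_replacement

def prob_evidence(genotype, observed, freqs, d, d2, c):
    """P(observed allele set | contributor genotype) under a simple drop-out/drop-in model."""
    alleles = set(genotype)
    if len(alleles) == 1:
        (a,) = alleles
        prob = (1 - d2) if a in observed else d2  # homozygote drop-out
    else:
        prob = 1.0
        for a in alleles:
            prob *= (1 - d) if a in observed else d  # independent heterozygote drop-out
    extras = observed - alleles  # alleles explicable only by drop-in
    if not extras:
        prob *= 1 - c
    elif len(extras) == 1:
        (x,) = extras
        prob *= c * freqs[x]
    else:
        prob = 0.0  # this sketch allows at most one drop-in allele per locus
    return prob

def likelihood_ratio(suspect, observed, freqs, d, d2, c):
    """LR = P(E | suspect is the source) / P(E | an unknown, unrelated man is the source)."""
    numerator = prob_evidence(suspect, observed, freqs, d, d2, c)
    denominator = 0.0
    for g in combinations_with_replacement(sorted(freqs), 2):
        # Hardy-Weinberg genotype frequency (no theta adjustment in this sketch)
        g_freq = freqs[g[0]] ** 2 if g[0] == g[1] else 2 * freqs[g[0]] * freqs[g[1]]
        denominator += g_freq * prob_evidence(g, observed, freqs, d, d2, c)
    return numerator / denominator

freqs = {"12": 0.1, "13": 0.2, "14": 0.7}
print(likelihood_ratio(("12", "13"), {"12"}, freqs, d=0.2, d2=0.04, c=0.05))
```

With d = d2 = c = 0 and a full heterozygous match, the LR reduces to the familiar 1/(2 p_a p_b). If allele 13 is missing and drop-out is not allowed, the LR is 0, which shows why the drop-out probability has to be modelled explicitly.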
Publisher: Springer Science and Business Media LLC
Date: 09-08-2008
Publisher: Oxford University Press (OUP)
Date: 04-2001
DOI: 10.1093/OXFORDJOURNALS.MOLBEV.A003827
Abstract: Most evolutionary tree estimation methods for DNA sequences ignore or inefficiently use the phylogenetic information contained within shared patterns of gaps. This is largely due to the computational difficulties in implementing models for insertions and deletions. A simple way to incorporate this information is to treat a gap as a fifth character (with the four nucleotides being the other four) and to incorporate it within a Markov model of nucleotide substitution. This idea has been dismissed in the past, since it treats a multiple-site insertion or deletion as a sequence of independent events rather than a single event. While this is true, we have found that under many circumstances it is better to incorporate gap information inadequately than to ignore it, at least for topology estimation. We propose an extension to a class of nucleotide substitution models to incorporate the gap character and show that, for data sets (both real and simulated) with short and medium gaps, these models do lead to effective use of the information contained within insertions and deletions. We also implement an ad hoc method in which the likelihood at columns containing multiple-site gaps is downweighted in order to avoid giving them undue influence. The precision of the estimated tree, assessed using Markov chain Monte Carlo techniques to find the posterior distribution over tree space, improves under these five-state models compared with standard methods which effectively ignore gaps.
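To illustrate the "gap as a fifth character" idea, here is a minimal sketch using the simplest equal-rates (Jukes–Cantor-style) model on the five states A, C, G, T and "-". It estimates the distance between just two aligned sequences by grid search. This is an assumed illustration: the paper extends a broader class of nucleotide substitution models, works on whole trees, and also downweights multiple-site gap columns, none of which is reproduced here.

```python
import math

STATES = "ACGT-"  # the gap is treated as a fifth character

def transition_probs(t, k=len(STATES)):
    """Equal-rates k-state model: returns (P(same state), P(one specific other state)).

    t is the expected number of changes per site.
    """
    decay = math.exp(-k * t / (k - 1))
    same = 1.0 / k + (k - 1) / k * decay
    diff = 1.0 / k - decay / k
    return same, diff

def pairwise_log_likelihood(seq1, seq2, t):
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to equal length")
    k = len(STATES)
    same, diff = transition_probs(t, k)
    ll = 0.0
    for x, y in zip(seq1, seq2):
        if x not in STATES or y not in STATES:
            raise ValueError(f"unexpected character {x!r} or {y!r}")
        # uniform stationary frequency for x, then the transition x -> y
        ll += math.log(1.0 / k) + math.log(same if x == y else diff)
    return ll

def estimate_distance(seq1, seq2, step=0.001, t_max=3.0):
    """Grid-search maximum-likelihood distance between two aligned sequences."""
    grid = [step * i for i in range(1, int(t_max / step) + 1)]
    return max(grid, key=lambda t: pairwise_log_likelihood(seq1, seq2, t))

print(round(estimate_distance("ACGT-ACGTA", "ACGTAACGTA"), 3))  # 0.107
```

Here the single gap/nucleotide mismatch contributes information that a gap-ignoring analysis would discard. One mismatch in ten sites gives the closed-form estimate -(4/5) ln(1 - (5/4)(0.1)) ≈ 0.1068, which the grid search recovers.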
Publisher: Wiley
Date: 06-07-2010
Publisher: Springer Science and Business Media LLC
Date: 18-01-2009
DOI: 10.1038/NG.301
Abstract: We analyzed genome-wide association data from 1,380 Europeans with early-onset and morbid adult obesity and 1,416 age-matched normal-weight controls. Thirty-eight markers showing strong association were further evaluated in 14,186 European subjects. In addition to FTO and MC4R, we detected significant association of obesity with three new risk loci in NPC1 (endosomal/lysosomal Niemann-Pick C1 gene, P = 2.9 x 10(-7)), near MAF (encoding the transcription factor c-MAF, P = 3.8 x 10(-13)) and near PTER (phosphotriesterase-related gene, P = 2.1 x 10(-7)).
Publisher: Springer Science and Business Media LLC
Date: 02-12-2014
DOI: 10.1038/NCOMMS6631
Abstract: In 2012, a skeleton was excavated at the presumed site of the Grey Friars friary in Leicester, the last-known resting place of King Richard III. Archaeological, osteological and radiocarbon dating data were consistent with these being his remains. Here we report DNA analyses of both the skeletal remains and living relatives of Richard III. We find a perfect mitochondrial DNA match between the sequence obtained from the remains and one living relative, and a single-base substitution when compared with a second relative. Y-chromosome haplotypes from male-line relatives and the remains do not match, which could be attributed to a false-paternity event occurring in any of the intervening generations. DNA-predicted hair and eye colour are consistent with Richard’s appearance in an early portrait. We calculate likelihood ratios for the non-genetic and genetic data separately, and combined, and conclude that the evidence for the remains being those of Richard III is overwhelming.
Publisher: MDPI AG
Date: 02-11-2018
Abstract: Direct-to-consumer genetic ancestry testing is a new and growing industry that has gained widespread media coverage and public interest. Its scientific base is in the fields of population and evolutionary genetics and it has benefitted considerably from recent advances in rapid and cost-effective DNA typing technologies. There is a considerable body of scientific literature on the use of genetic data to make inferences about human population history, although publications on inferring the ancestry of specific individuals are rarer. Population geneticists have questioned the scientific validity of some population history inference approaches, particularly those of a more interpretative nature. These controversies have spilled over into commercial genetic ancestry testing, with some companies making sensational claims about their products. One such company—BritainsDNA—made a number of dubious claims both directly to its customers and in the media. Here we outline our scientific concerns, document the exchanges between us, BritainsDNA and the BBC, and discuss the issues raised about media promotion of commercial enterprises, academic freedom of expression, science and pseudoscience and the genetic ancestry testing industry. We provide a detailed account of this case as a resource for historians and sociologists of science, and to shape public understanding, media reporting and scientific scrutiny of the commercial use of population and evolutionary genetics.
Publisher: Oxford University Press (OUP)
Date: 06-05-2003
Abstract: We develop a flexible class of Metropolis–Hastings algorithms for drawing inferences about population histories and mutation rates from deoxyribonucleic acid (DNA) sequence data. Match probabilities for use in forensic identification are also obtained, which is particularly useful for mitochondrial DNA profiles. Our data augmentation approach, in which the ancestral DNA data are inferred at each node of the genealogical tree, simplifies likelihood calculations and permits a wide class of mutation models to be employed, so that many different types of DNA sequence data can be analysed within our framework. Moreover, simpler likelihood calculations imply greater freedom for generating tree proposals, so that algorithms with good mixing properties can be implemented. We incorporate the effects of demography by means of simple mechanisms for changes in population size and structure, and we estimate the corresponding demographic parameters, but we do not here allow for the effects of either recombination or selection. We illustrate our methods by application to four human DNA data sets, consisting of DNA sequences, short tandem repeat loci, single-nucleotide polymorphism sites and insertion sites. Two of the data sets are drawn from the male-specific Y-chromosome, one from maternally inherited mitochondrial DNA and one from the β-globin locus on chromosome 11.
Publisher: Cold Spring Harbor Laboratory
Date: 09-2017
Abstract: Gene panel and exome sequencing have revealed a high rate of molecular diagnoses among diseases where the genetic architecture has proven suitable for sequencing approaches, with a large number of distinct and highly penetrant causal variants identified among a growing list of disease genes. The challenge is, given the DNA sequence of a new patient, to distinguish disease-causing from benign variants. Large samples of human standing variation data highlight regional variation in the tolerance to missense variation within the protein-coding sequence of genes. This information is not well captured by existing bioinformatic tools, but is effective in improving variant interpretation. To address this limitation in existing tools, we introduce the missense tolerance ratio (MTR), which summarizes available human standing variation data within genes to encapsulate population level genetic variation. We find that patient-ascertained pathogenic variants preferentially cluster in low MTR regions (P < 0.005) of well-informed genes. By evaluating 20 publicly available predictive tools across genes linked to epilepsy, we also highlight the importance of understanding the empirical null distribution of existing prediction tools, as these vary across genes. Subsequently integrating the MTR with the empirically selected bioinformatic tools in a gene-specific approach demonstrates a clear improvement in the ability to predict pathogenic missense variants from background missense variation in disease genes. Among an independent test sample of case and control missense variants, case variants (0.83 median score) consistently achieve higher pathogenicity prediction probabilities than control variants (0.02 median score; Mann-Whitney U test, P < 1 × 10^-16). We focus on the application to epilepsy genes; however, the framework is applicable to disease genes beyond epilepsy.
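A sliding-window missense tolerance ratio can be sketched as below. The sketch assumes that the MTR for a window is the observed fraction of missense variants (among observed missense plus synonymous variants) divided by the fraction expected from all possible missense and synonymous single-nucleotide changes in the same codons. The 31-codon default window and the function name are assumptions for illustration. Values below 1 indicate regions depleted of missense variation, which is where the abstract reports that pathogenic variants cluster.

```python
def missense_tolerance_ratio(obs_mis, obs_syn, pos_mis, pos_syn, window=31):
    """Sliding-window MTR per codon (hypothetical helper).

    obs_*: observed missense / synonymous variant counts per codon;
    pos_*: numbers of possible missense / synonymous changes per codon.
    Returns None where a window has no observed variants or no possible missense changes.
    """
    n = len(obs_mis)
    if not (len(obs_syn) == len(pos_mis) == len(pos_syn) == n):
        raise ValueError("all count lists must have the same length")
    half = window // 2
    ratios = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)  # window truncated at gene ends
        om, osyn = sum(obs_mis[lo:hi]), sum(obs_syn[lo:hi])
        pm, psyn = sum(pos_mis[lo:hi]), sum(pos_syn[lo:hi])
        if om + osyn == 0 or pm == 0:
            ratios.append(None)
        else:
            ratios.append((om / (om + osyn)) / (pm / (pm + psyn)))
    return ratios

# Codons 3-5 carry no observed missense variants: a low-MTR (intolerant) region.
mtr = missense_tolerance_ratio([3, 3, 3, 0, 0, 0, 3, 3, 3], [1] * 9, [6] * 9, [2] * 9, window=3)
print(mtr[0], mtr[4])  # 1.0 0.0
```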
Publisher: Elsevier BV
Date: 06-2006
DOI: 10.1016/J.FORSCIINT.2005.07.007
Abstract: Given the DNA profiles of two individuals and one parent (say the mother) of each, we present likelihood ratios (LRs) comparing the hypothesis that they have the same father with the hypothesis of unrelated fathers. If the individuals have the same mother, the problem is to distinguish full- from half-siblings, otherwise we are comparing a half-sibling relationship with unrelated. We simulate STR profiles at up to 60 loci, based on allele proportions observed at 15 loci in three populations, and use them to approximate misclassification rates both for binary classification (e.g. "half-sib" versus "unrelated"), and when a third "cannot say" category is included. We find that reliable inferences in the absence of the mothers' profiles require many more STR loci than the 10-25 loci that are currently routinely available. However, profiling the two mothers conveys more discriminatory power than profiling the same number of additional loci in the individuals themselves. Our likelihood ratio formulas include a theta (or Fst) adjustment to allow for the individuals concerned to have recent shared ancestry (coancestry), relative to the population from which the allele frequency database is drawn. We illustrate that using an appropriate value of theta can reduce the average misclassification rate.
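For orientation, a single-locus sibship likelihood ratio can be sketched using identity-by-descent (IBD) probabilities (k0, k1, k2) for each relationship. This is a simpler setting than the paper's: the mothers' profiles are not used and theta is set to 0, both assumptions of the sketch. LR(relationship vs unrelated) = k0 + k1·R1 + k2·R2, where R1 and R2 compare the probability of the second genotype given one or two alleles IBD with its population frequency. Multi-locus LRs are products over loci, and the full- versus half-sibling LR is the ratio of the two results.

```python
KINSHIP = {  # (k0, k1, k2): probabilities of sharing 0, 1, 2 alleles IBD
    "unrelated": (1.0, 0.0, 0.0),
    "half-sib": (0.5, 0.5, 0.0),
    "full-sib": (0.25, 0.5, 0.25),
}

def genotype_prob(g, freqs):
    u, v = g
    return freqs[u] ** 2 if u == v else 2 * freqs[u] * freqs[v]

def ibd1_conditional(g1, g2, freqs):
    """P(g2 | g1, exactly one allele IBD): one of g1's alleles (prob 1/2 each) is shared."""
    u, v = g2
    total = 0.0
    for a in g1:
        if u == v:
            total += 0.5 * (a == u) * freqs[u]
        else:
            total += 0.5 * ((a == u) * freqs[v] + (a == v) * freqs[u])
    return total

def relationship_lr(g1, g2, freqs, relationship):
    """Single-locus LR for `relationship` versus unrelated (theta = 0, no parental data)."""
    k0, k1, k2 = KINSHIP[relationship]
    p2 = genotype_prob(g2, freqs)
    r1 = ibd1_conditional(g1, g2, freqs) / p2
    r2 = (sorted(g1) == sorted(g2)) / p2  # two alleles IBD requires identical genotypes
    return k0 + k1 * r1 + k2 * r2

freqs = {"a": 0.1, "b": 0.2, "c": 0.7}
full = relationship_lr(("a", "b"), ("a", "b"), freqs, "full-sib")
half = relationship_lr(("a", "b"), ("a", "b"), freqs, "half-sib")
print(full, half, full / half)
```

For two "ab" heterozygotes, this reproduces the textbook full-sibling formula (1 + p_a + p_b + 2 p_a p_b)/(8 p_a p_b).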
Publisher: Oxford University Press (OUP)
Date: 05-2008
DOI: 10.1534/GENETICS.107.084954
Abstract: Dogs are of increasing interest as models for human diseases, and many canine population-association studies are beginning to emerge. The choice of breeds for such studies should be informed by a knowledge of factors such as inbreeding, genetic diversity, and population structure, which are likely to depend on breed-specific selective breeding patterns. To address the lack of such studies we have exploited one of the world's most extensive resources for canine population-genetics studies: the United Kingdom (UK) Kennel Club registration database. We chose 10 representative breeds and analyzed their pedigrees since electronic records were established around 1970, corresponding to about eight generations before present. We find extremely inbred dogs in each breed except the greyhound and estimate an inbreeding effective population size between 40 and 80 for all but 2 breeds. For all but 3 breeds, & % of unique genetic variants are lost over six generations, indicating a dramatic effect of breeding patterns on genetic diversity. We introduce a novel index Ψ for measuring population structure directly from the pedigree and use it to identify subpopulations in several breeds. As well as informing the design of canine population genetics studies, our results have implications for breeding practices to enhance canine welfare.
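Pedigree-based inbreeding coefficients of the kind analysed here can be computed with the standard recursive coancestry (kinship) rule. A dog's inbreeding coefficient is the kinship of its sire and dam. The kinship of two distinct animals is the average kinship of the older one with the younger one's parents, and an animal's kinship with itself is (1 + F)/2. The sketch below assumes the pedigree is a dict listed with parents before offspring and with None for unknown parents. It does not reproduce the paper's effective population size estimates or its Ψ index.

```python
from functools import lru_cache

def make_inbreeding(pedigree):
    """pedigree: dict animal -> (sire, dam), parents listed before offspring, None = unknown."""
    order = {animal: idx for idx, animal in enumerate(pedigree)}

    @lru_cache(maxsize=None)
    def kinship(a, b):
        if a is None or b is None:
            return 0.0  # unknown parents are treated as unrelated founders
        if a == b:
            sire, dam = pedigree[a]
            return 0.5 * (1.0 + kinship(sire, dam))
        if order[a] > order[b]:
            a, b = b, a  # always recurse through the parents of the younger animal
        sire, dam = pedigree[b]
        return 0.5 * (kinship(a, sire) + kinship(a, dam))

    def inbreeding(animal):
        sire, dam = pedigree[animal]
        return kinship(sire, dam)

    return inbreeding

pedigree = {
    "A": (None, None), "B": (None, None),
    "C": ("A", "B"), "D": ("A", "B"),  # full siblings
    "E": ("C", "D"),                   # offspring of a full-sibling mating
}
F = make_inbreeding(pedigree)
print(F("E"))  # 0.25
```

Because the recursion is memoised, repeated queries over a large pedigree reuse earlier kinship values.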
Publisher: Oxford University Press (OUP)
Date: 09-2014
DOI: 10.1534/GENETICS.114.165704
Abstract: Models for genome-wide prediction and association studies usually target a single phenotypic trait. However, in animal and plant genetics it is common to record information on multiple phenotypes for each individual that will be genotyped. Modeling traits individually disregards the fact that they are most likely associated due to pleiotropy and shared biological basis, thus providing only a partial, confounded view of genetic effects and phenotypic interactions. In this article we use data from a Multiparent Advanced Generation Inter-Cross (MAGIC) winter wheat population to explore Bayesian networks as a convenient and interpretable framework for the simultaneous modeling of multiple quantitative traits. We show that they are equivalent to multivariate genetic best linear unbiased prediction (GBLUP) and that they are competitive with single-trait elastic net and single-trait GBLUP in predictive performance. Finally, we discuss their relationship with other additive-effects models and their advantages in inference and interpretation. MAGIC populations provide an ideal setting for this kind of investigation because the very low population structure and large sample size result in predictive models with good power and limited confounding due to relatedness.
Publisher: Massachusetts Medical Society
Date: 06-01-2011
Publisher: Wiley
Date: 30-12-2009
DOI: 10.1002/ART.24138
Abstract: The HLA-DRB1 locus within the major histocompatibility complex (MHC) at 6p21.3 has been identified as a susceptibility gene for rheumatoid arthritis (RA); however, there is increasing evidence of additional susceptibility genes in the MHC region. The aim of this study was to estimate their number and location. A case-control study was performed involving 977 control subjects and 855 RA patients. The HLA-DRB1 locus was genotyped together with 2,360 single-nucleotide polymorphisms in the MHC region. Logistic regression was used to detect DRB1-independent effects. After adjusting for the effect of HLA-DRB1, 18 markers in 14 genes were strongly associated with RA (P<10(-4)). Multivariate logistic regression analysis of these markers and DRB1 led to a model containing DRB1 plus the following 3 markers: rs4678, a nonsynonymous change in the VARS2L locus, approximately 1.7 Mb telomeric of DRB1; rs2442728, upstream of HLA-B, approximately 1.2 Mb telomeric of DRB1; and rs17499655, located in the 5'-untranslated region of DQA2, only 0.1 Mb centromeric of DRB1. In-depth investigation of the DQA2 association, however, suggested that it arose through cryptic linkage disequilibrium with an allele of DRB1. Two non-shared epitope alleles were also strongly associated with RA (P<10(-4)): *0301 with anti-cyclic citrullinated peptide-negative RA and *0701 independently of autoantibody status. These results confirm the polygenic contribution of the MHC to RA and implicate 2 additional non-DRB1 susceptibility loci. The role of the HLA-DQ locus in RA has been a subject of controversy, but in our data, it appears to be spurious.
Publisher: Springer Science and Business Media LLC
Date: 12-07-2011
DOI: 10.1038/NRG3000
Publisher: Cold Spring Harbor Laboratory
Date: 10-12-2020
DOI: 10.1101/2020.12.09.415901
Abstract: We report an evaluation of prediction accuracy for eye, hair and skin pigmentation based on genomic and phenotypic data for over 6,500 admixed Latin Americans (the CANDELA dataset). We examined the impact on prediction accuracy of three main factors: (i) the method of prediction, including classical statistical methods and machine learning approaches; (ii) the inclusion of non-genetic predictors, continental genetic ancestry and pigmentation SNPs in the prediction models; and (iii) the choice between two sets of pigmentation SNPs: the commonly used HIrisPlex-S set (developed in Europeans) and novel SNP sets we defined here based on genome-wide association results in the CANDELA sample. We find that Random Forest and regression are globally the best-performing methods. Although continental genetic ancestry has substantial power for prediction of pigmentation in Latin Americans, the inclusion of pigmentation SNPs increases prediction accuracy considerably, particularly for skin color. For hair and eye color, HIrisPlex-S has a similar performance to the CANDELA-specific prediction SNP sets. However, for skin pigmentation the performance of HIrisPlex-S is markedly lower than that of the SNP set defined here, including for predictions in an independent dataset of Native American data. These results reflect the relatively high variation in hair and eye color among Europeans, for whom HIrisPlex-S was developed, whereas their variation in skin pigmentation is comparatively lower. Furthermore, we show that the dataset used in the training of prediction models strongly affects the portability of these models across Europeans and Native Americans.
Publisher: Cold Spring Harbor Laboratory
Date: 09-09-2016
DOI: 10.1101/074310
Abstract: SNP heritability, the proportion of phenotypic variance explained by SNPs, has been reported for many hundreds of traits. Its estimation requires strong prior assumptions about the distribution of heritability across the genome, but the assumptions in current use have not been thoroughly tested. By analyzing imputed data for a large number of human traits, we empirically derive a model that more accurately describes how heritability varies with minor allele frequency, linkage disequilibrium and genotype certainty. Across 19 traits, our improved model leads to estimates of common SNP heritability on average 43% (SD 3) higher than those obtained from the widely used software GCTA, and 25% (SD 2) higher than those from the recently proposed extension GCTA-LDMS. Previously, DNaseI hypersensitivity sites were reported to explain 79% of SNP heritability; using our improved heritability model, their estimated contribution is only 24%.
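The role of the prior heritability model can be made concrete with a small sketch of expected per-SNP heritability shares. This sketch assumes the LDAK-style form E[h²_j] ∝ w_j [f_j(1 - f_j)]^(1+α) r_j, where f_j is the minor allele frequency, w_j an LD weight, r_j a genotype-certainty (information) score, and α = -0.25 by default. Setting α = -1 with all w_j = r_j = 1 gives every SNP the same expected heritability, which corresponds to the standard GCTA assumption. The function name is hypothetical, and the weights are taken as given rather than computed.

```python
def expected_h2_shares(mafs, ld_weights=None, certainties=None, alpha=-0.25):
    """Fraction of total SNP heritability each SNP is expected to explain a priori."""
    m = len(mafs)
    ld_weights = ld_weights or [1.0] * m    # assumed precomputed LD weights
    certainties = certainties or [1.0] * m  # assumed imputation information scores
    raw = [
        w * (f * (1 - f)) ** (1 + alpha) * r
        for f, w, r in zip(mafs, ld_weights, certainties)
    ]
    total = sum(raw)
    return [x / total for x in raw]

print(expected_h2_shares([0.05, 0.5], alpha=-1.0))  # [0.5, 0.5]
print(expected_h2_shares([0.05, 0.5]))              # common SNP receives the larger share
```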
Publisher: Springer Science and Business Media LLC
Date: 06-1995
DOI: 10.1007/BF01441146
Publisher: Institute of Mathematical Statistics
Date: 11-2009
DOI: 10.1214/09-STS307
Publisher: Public Library of Science (PLoS)
Date: 09-07-2010
Publisher: Public Library of Science (PLoS)
Date: 08-10-2013
Publisher: Elsevier BV
Date: 1988
Publisher: Elsevier BV
Date: 05-2003
DOI: 10.1016/S0040-5809(03)00007-8
Abstract: We review Wright's original definitions of the genetic correlation coefficients F(ST), F(IT), and F(IS), pointing out ambiguities and the difficulties that these have generated. We also briefly survey some subsequent approaches to defining and estimating the coefficients. We then propose a general framework in which the coefficients are defined, their properties established, and likelihood-based inference implemented. Likelihood methods of inference are proposed both for bi-allelic and multi-allelic loci, within a hierarchical model which allows sharing of information both across subpopulations and across loci, but without assuming constancy in either case. This framework can be used, for example, to detect environment-related diversifying selection.
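A quick way to see what F(ST) measures at a bi-allelic locus is to simulate subpopulation allele frequencies from a Beta distribution with mean p and variance p(1 - p)F (the Balding–Nichols parametrisation), then recover F with the naive moment ratio Var(p_i) / (p̄(1 - p̄)). This is an assumed illustration only. The moment estimator shown ignores sampling error in the observed frequencies, and it is not the likelihood-based hierarchical inference proposed in the paper.

```python
import random
from statistics import fmean, pvariance

def simulate_subpop_freqs(p_anc, fst, n_subpops, rng):
    """Draw subpopulation allele frequencies from Beta(p(1-F)/F, (1-p)(1-F)/F)."""
    a = p_anc * (1 - fst) / fst
    b = (1 - p_anc) * (1 - fst) / fst
    return [rng.betavariate(a, b) for _ in range(n_subpops)]

def moment_fst(freqs):
    """Naive moment estimate Var(p_i) / (p_bar (1 - p_bar)); no sampling-error correction."""
    p_bar = fmean(freqs)
    if p_bar in (0.0, 1.0):
        raise ValueError("locus is monomorphic across subpopulations")
    return pvariance(freqs, mu=p_bar) / (p_bar * (1 - p_bar))

rng = random.Random(1)
freqs = simulate_subpop_freqs(p_anc=0.5, fst=0.1, n_subpops=5000, rng=rng)
print(round(moment_fst(freqs), 3))  # close to the simulated value of 0.1
```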
Publisher: Elsevier BV
Date: 04-1996
Publisher: Springer Science and Business Media LLC
Date: 12-2008
Abstract: The power of haplotype-based methods for association studies, identification of regions under selection, and ancestral inference, is well-established for diploid organisms. For polyploids, however, the difficulty of determining phase has limited such approaches. Polyploidy is common in plants and is also observed in animals. Partial polyploidy is sometimes observed in humans (e.g. trisomy 21 Down's syndrome), and it arises more frequently in some human tissues. Local changes in ploidy, known as copy number variations (CNV), arise throughout the genome. Here we present a method, implemented in the software polyHap, for the inference of haplotype phase and missing observations from polyploid genotypes. PolyHap allows each individual to have a different ploidy, but ploidy cannot vary over the genomic region analysed. It employs a hidden Markov model (HMM) and a sampling algorithm to infer haplotypes jointly in multiple individuals and to obtain a measure of uncertainty in its inferences. In the simulation study, we combine real haplotype data to create artificial diploid, triploid, and tetraploid genotypes, and use these to demonstrate that polyHap performs well, in terms of both switch error rate in recovering phase and imputation error rate for missing genotypes. To our knowledge, there is no comparable software for phasing a large, densely genotyped region of chromosome from triploids and tetraploids, while for diploids we found polyHap to be more accurate than fastPhase. We also compare the results of polyHap to SATlotyper on an experimentally haplotyped tetraploid dataset of 12 SNPs, and show that polyHap is more accurate. With the availability of large SNP data in polyploids and CNV regions, we believe that polyHap, our proposed method for inferring haplotypic phase from genotype data, will be useful in enabling researchers analysing such data to exploit the power of haplotype-based analyses.
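The switch error rate used to assess phasing can be sketched for the diploid case. At each heterozygous site, record whether the inferred haplotype carries the same allele as the true haplotype. A switch is a change in that orientation between consecutive heterozygous sites, and the rate is the number of switches divided by the number of adjacent heterozygous pairs. This is the common diploid definition, assumed here. The polyploid evaluation in the paper is more involved and is not reproduced, and the function name is hypothetical.

```python
def switch_error_rate(true_hap, inferred_hap, genotypes):
    """Diploid switch error rate over heterozygous sites.

    true_hap, inferred_hap: one haplotype each, as lists of 0/1 alleles;
    genotypes: allele counts (0, 1 or 2) at the same sites.
    """
    het = [i for i, g in enumerate(genotypes) if g == 1]
    if len(het) < 2:
        return 0.0  # phase is undefined with fewer than two heterozygous sites
    # orientation: does the inferred haplotype match the true one at this site?
    orient = [true_hap[i] == inferred_hap[i] for i in het]
    switches = sum(a != b for a, b in zip(orient, orient[1:]))
    return switches / (len(het) - 1)

print(switch_error_rate([0, 1, 0, 1], [0, 1, 1, 0], [1, 1, 1, 1]))  # one switch in three intervals
print(switch_error_rate([0, 1, 0, 1], [1, 0, 1, 0], [1, 1, 1, 1]))  # 0.0: a global flip is not an error
```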
Publisher: Elsevier BV
Date: 2019
DOI: 10.1016/J.FSIGEN.2018.10.004
Abstract: We recently introduced a new approach to the evaluation of weight of evidence (WoE) for Y-chromosome profiles. Rather than attempting to calculate match probabilities, which is particularly problematic for modern Y-profiles with high mutation rates, we proposed using simulation to describe the distribution of the number of males in the population with a matching Y-profile, both the unconditional distribution and conditional on a database frequency of the profile. Here we further validate the new approach by showing that our results are robust to assumptions about the allelic ladder and the founder haplotypes, and we extend the approach in two important directions. Firstly, forensic databases are not the only source of background data relevant to the evaluation of Y-profile evidence: in many cases the Y-profiles of one or more relatives of the accused are also available. To date it has been unclear how to use this additional information, but in our simulation-based approach its effect is readily incorporated. We describe this approach and illustrate how the WoE that a man was the source of an observed Y-profile changes when the Y-profiles of some of his male-line relatives are also available. Secondly, we extend our new approach to mixtures of Y-profiles from two or more males. Surprisingly, our simulation-based approach reveals that observing a 2-male mixture that includes an alleged contributor's profile is almost as strong evidence as observing a matching single-contributor evidence sample, and even 3-male and 4-male mixtures are only slightly weaker.
Publisher: Cold Spring Harbor Laboratory
Date: 04-10-2021
DOI: 10.1101/2021.10.04.462983
Abstract: Advances in whole-genome genotyping and sequencing have allowed genome-wide analyses of association, prediction and heritability in many organisms. However, the application of such analyses to bacteria is still in its infancy, being limited by difficulties including the plasticity of bacterial genomes and their strong population structure. Here we propose, and validate using simulations, a suite of genome-wide analyses for bacteria. We combine methods from human genetics and previous bacterial studies, including linear mixed models, elastic net and LD-score regression, and introduce innovations such as frequency-based allele coding, testing for both insertion/deletion and nucleotide effects and partitioning heritability by genome region. We then analyse three phenotypes of a major human pathogen Streptococcus pneumoniae, including the first analyses of minimum inhibitory concentrations (MIC) for each of two antibiotics, penicillin and ceftriaxone. We show that these are highly heritable, leading to high prediction accuracy, which is explained by many genetic associations identified under good control of population structure effects. In the case of ceftriaxone MIC, these results are surprising because none of the isolates was resistant according to the inhibition zone diameter threshold. We estimate that just over half of the heritability of penicillin MIC is explained by a known drug-resistance region, which also contributes around a quarter of the heritability of ceftriaxone MIC. For the within-host survival phenotype carriage duration, no reliable associations were found but we observed moderate heritability and prediction accuracy, indicating a polygenic trait. While generating important new results for S. pneumoniae, we have critically assessed existing methods and introduced innovations that will be useful for future large-scale population genomics studies to help decipher the genetic architecture of bacterial traits.
Genome-wide association, prediction and heritability analyses in bacteria are beginning to help unravel the genetic underpinnings of traits such as antimicrobial resistance, virulence, within-host survival and transmissibility. Progress to date is limited by challenges including the effects of strong population structure and variable recombination, and the many gaps in sequence alignments including the absence of entire genes in many isolates. More work is required to critically assess and develop methods for bacterial genomics. We address this task here, using a range of existing methods from bacterial and human genetics, such as linear mixed models, elastic net and LD-score regression. Using simulations, we first validate and then adapt these methods to introduce new analyses, including separate assessment of gap and nucleotide effects, a new allele coding for association analyses and a method to partition heritability into genome regions. We analyse within-host survival and two antimicrobial response traits of Streptococcus pneumoniae, identifying many novel associations while demonstrating good control of population structure and accurate prediction. We present both new results for an important pathogen and methodological advances that will be useful in guiding future studies in bacterial population genomics.
Publisher: Wiley
Date: 29-07-2001
Publisher: Springer Science and Business Media LLC
Date: 03-12-2018
Publisher: Public Library of Science (PLoS)
Date: 03-11-2017
Publisher: Elsevier BV
Date: 03-2017
Publisher: Springer Science and Business Media LLC
Date: 22-05-2017
DOI: 10.1038/NG.3865
Publisher: Wiley
Date: 29-06-2009
DOI: 10.1002/ART.24808
Publisher: Oxford University Press (OUP)
Date: 30-12-2011
DOI: 10.1093/HMG/DDR607
Location: United Kingdom of Great Britain and Northern Ireland
Start Date: 06-2021
End Date: 03-2025
Amount: $405,816.00
Funder: Australian Research Council
Start Date: 06-2019
End Date: 11-2022
Amount: $410,000.00
Funder: Australian Research Council