ORCID Profile
0000-0002-3213-704X
Current Organisations
Northumbria University, Wageningen University & Research
Publisher: American Dairy Science Association
Date: 03-2014
Abstract: Combining data from research herds may be advantageous, especially for difficult or expensive-to-measure traits (such as dry matter intake). Cows in research herds are often genotyped using low-density single nucleotide polymorphism (SNP) panels. However, the precision of quantitative trait loci detection in genome-wide association studies and the accuracy of genomic selection may increase when the low-density genotypes are imputed to higher density. Genotype data were available from 10 research herds: 5 from Europe [Denmark, Germany, Ireland, the Netherlands, and the United Kingdom (UK)], 2 from Australasia (Australia and New Zealand), and 3 from North America (Canada and the United States). Heifers from the Australian and New Zealand research herds were already genotyped at high density (approximately 700,000 SNP). The remaining genotypes were imputed from around 50,000 SNP to 700,000 using 2 reference populations. Although it was not possible to use a combined reference population, which would probably result in the highest accuracies of imputation, differences arising from using 2 high-density reference populations on imputing 50,000-marker genotypes of 583 animals (from the UK) were quantified. The European genotypes (n=4,097) were imputed as 1 data set, using a reference population of 3,150 that included genotypes from 835 Australian and 1,053 New Zealand females, with the remainder being males. Imputation was undertaken using population-wide linkage disequilibrium with no family information exploited. The UK animals were also included in the North American data set (n=1,579) that was imputed to high density using a reference population of 2,018 bulls. After editing, 591,213 genotypes on 5,999 animals from 10 research herds remained. The correlation between imputed allele frequencies of the 2 imputed data sets was high (>0.98) and even stronger (>0.99) for the UK animals that were part of each imputation data set. 
For the UK genotypes, 2.2% were imputed differently in the 2 high-density reference data sets used. Only 0.025% of these were homozygous switches. The number of discordant SNP was lower for animals that had sires that were genotyped. Discordant imputed SNP genotypes were most common when a large difference existed in allele frequency between the 2 imputed genotype data sets. For SNP that had ≥ 20% discordant genotypes, the difference between imputed data sets of allele frequencies of the UK (imputed) genotypes was 0.07, whereas the difference in allele frequencies of the (reference) high-density genotypes was 0.30. In fact, regions existed across the genome where the frequency of discordant SNP was higher. For example, on chromosome 10 (centered on 520,948 bp), 52 SNP (out of a total of 103 SNP) had ≥ 20% discordant SNP. Four hundred and eight SNP had more than 20% discordant genotypes and were removed from the final set of imputed genotypes. We concluded that both discordance of imputed SNP genotypes and differences in allele frequencies, after imputation using different reference data sets, may be used to identify and remove poorly imputed SNP.
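The filtering rule described in the abstract above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the 0/1/2 genotype coding, the function names, and the toy data are all assumptions, and only the 20% threshold comes from the abstract (which gives it both as "≥ 20%" and as "more than 20%"; the sketch removes SNP at or above the threshold).

```python
def discordance_rates(set_a, set_b):
    """Per-SNP fraction of animals whose imputed genotype differs between two sets.

    set_a, set_b: lists of per-animal genotype lists (assumed coding: 0, 1, 2).
    """
    n_animals = len(set_a)
    n_snp = len(set_a[0])
    rates = []
    for j in range(n_snp):
        differing = sum(1 for i in range(n_animals) if set_a[i][j] != set_b[i][j])
        rates.append(differing / n_animals)
    return rates


def snps_to_keep(rates, threshold=0.20):
    """Indices of SNP whose discordance rate is below the threshold."""
    return [j for j, r in enumerate(rates) if r < threshold]


# Toy data (hypothetical): 5 animals x 3 SNP, imputed with two reference sets.
imputed_ref1 = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [0, 2, 2], [1, 1, 1]]
imputed_ref2 = [[0, 1, 2], [1, 2, 0], [2, 0, 1], [0, 0, 2], [1, 1, 1]]

rates = discordance_rates(imputed_ref1, imputed_ref2)
kept = snps_to_keep(rates)
print(rates, kept)
```

In the toy data only the second SNP differs between the two imputations (for 2 of 5 animals), so it is the only one dropped.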
Publisher: Springer Science and Business Media LLC
Date: 2014
Publisher: American Dairy Science Association
Date: 10-2012
Abstract: With the aim of increasing the accuracy of genomic estimated breeding values for dry matter intake (DMI) in dairy cattle, data from Australia (AU), the United Kingdom (UK), and the Netherlands (NL) were combined using both single-trait and multi-trait models. In total, DMI records were available on 1,801 animals, including 843 AU growing heifers with records on DMI measured over 60 to 70 d at approximately 200 d of age, and 359 UK and 599 NL lactating heifers with records on DMI during the first 100 d in milk. The genotypes used in this study were obtained from the Illumina Bovine 50K chip (Illumina Inc., San Diego, CA). The AU, UK, and NL genomic data were matched using the single nucleotide polymorphism (SNP) name. Quality controls were applied by carefully comparing the genotypes of 40 bulls that were available in each data set. This resulted in 30,949 SNP being used in the analyses. Genomic predictions were estimated with genomic REML, using ASReml software. The accuracy of genomic prediction was evaluated in 11 validation sets; that is, at least 3 validation sets per country were defined. The reference set (in which animals had both DMI phenotypes and genotypes) was either AU or Europe (UK and NL) or a multi-country reference set consisting of all data except the validation set. When DMI for each country was treated as the same trait, use of a multi-country reference set increased the accuracy of genomic prediction for DMI in UK, but not in AU and NL. Extending the model to a bivariate (AU-EU) or trivariate (AU-UK-NL) model increased the accuracy of genomic prediction for DMI in all countries. The highest accuracies were estimated for all countries when data were analyzed with a trivariate model, with increases of up to 5.5% compared with univariate models within countries.
Publisher: American Dairy Science Association
Date: 09-2014
Abstract: Breeding values for dry matter intake (DMI) are important to optimize dairy cattle breeding goals for feed efficiency. However, generally, only small data sets are available for feed intake, due to the cost and difficulty of measuring DMI, which makes understanding the genetic associations between traits across lactation difficult, let alone the possibility for selection of breeding animals. However, estimating national breeding values through cheaper and more easily measured correlated traits, such as milk yield and liveweight (LW), could be a first step to predict DMI. Combining DMI data across historical nutritional experiments might help to expand the data sets. Therefore, the objective was to estimate genetic parameters for DMI, fat- and protein-corrected milk (FPCM) yield, and LW across the entire first lactation using a relatively large data set combining experimental data across the Netherlands. A total of 30,483 weekly records for DMI, 49,977 for FPCM yield, and 31,956 for LW were available from 2,283 Dutch Holstein-Friesian first-parity cows between 1990 and 2011. Heritabilities, covariance components, and genetic correlations were estimated using a multivariate random regression model. The model included an effect for year-season of calving, and polynomials for age of cow at calving and days in milk (DIM). The random effects were experimental treatment, year-month of measurement, and the additive genetic, permanent environmental, and residual term. Additive genetic and permanent environmental effects were modeled using a third-order orthogonal polynomial. Estimated heritabilities ranged from 0.21 to 0.40 for DMI, from 0.20 to 0.43 for FPCM yield, and from 0.25 to 0.48 for LW across DIM. Genetic correlations between DMI at different DIM were relatively low during early and late lactation, compared with mid lactation. The genetic correlations between DMI and FPCM yield varied across DIM. 
This correlation was negative (up to -0.5) between FPCM yield in early lactation and DMI across the entire lactation, but highly positive (above 0.8) when both traits were in mid lactation. The correlation between DMI and LW was 0.6 during early lactation, but decreased to 0.4 during mid lactation. The highest correlations between FPCM yield and LW (0.3-0.5) were estimated during mid lactation. However, the genetic correlations between DMI and either FPCM yield or LW were not symmetric across DIM, and differed depending on which trait was measured first. The results of our study are useful to understand the genetic relationship of DMI, FPCM yield, and LW on specific days across lactation.
Publisher: Oxford University Press (OUP)
Date: 19-01-2020
DOI: 10.1093/JAS/SKAA019
Abstract: With an increase in the number of animals genotyped there has been a shift from using pedigree relationship matrices (A) to genomic ones. As the use of genomic relationship matrices (G) has increased, new methods to build or approximate G have developed. We investigated whether the way variance components are estimated should reflect these changes. We estimated variance components for maternal sow traits by solving with restricted maximum likelihood, with four methods of calculating the inverse of the relationship matrix. These methods included using just the inverse of A (A−1), combining A−1 and the direct inverse of G (HDIRECT−1), including metafounders (HMETA−1), or combining A−1 with an approximated inverse of G using the algorithm for proven and young animals (HAPY−1). There was a tendency for higher additive genetic variances and lower permanent environmental variances estimated with A−1 compared with the three H−1 methods, which supports that G−1 is better than A−1 at separating genetic and permanent environmental components, due to a better definition of the actual relationships between animals. There were limited or no differences in variance estimates between HDIRECT−1, HMETA−1, and HAPY−1. Importantly, there were limited differences in variance components, repeatability or heritability estimates between methods. Heritabilities ranged from <0.01 to 0.04 for stayability after second cycle, and farrowing rate, between 0.08 and 0.15 for litter weight variation, maximum cycle number, total number born, total number still born, and prolonged interval between weaning and first insemination, and between 0.39 and 0.44 for litter birth weight and gestation length. The limited differences in heritabilities suggest that there would be very limited changes to estimated breeding values or ranking of animals across models using the different sets of variance components. 
It is suggested that variance estimates continue to be made using A−1, however including G−1 is possibly more appropriate if refining the model, for traits that fit a permanent environmental effect.
Publisher: Springer Science and Business Media LLC
Date: 22-03-2023
DOI: 10.1186/S12711-023-00787-1
Abstract: In genomic prediction, it is common to centre the genotypes of single nucleotide polymorphisms based on the allele frequencies in the current population, rather than those in the base generation. The mean breeding value of non-genotyped animals is conditional on the mean performance of genotyped relatives, but can be corrected by fitting the mean performance of genotyped individuals as a fixed regression. The associated covariate vector has been referred to as a ‘J-factor’, which if fitted as a fixed effect can improve the accuracy and dispersion bias of sire genomic estimated breeding values (GEBV). To date, this has only been performed on populations with a single breed. Here, we investigated whether there was any benefit in fitting a separate J-factor for each breed in a three-way crossbred population, and in using pedigree-based expected or genome-based estimated breed fractions to define the J-factors. For body weight at 7 days, dispersion bias decreased when fitting multiple J-factors, but only with a low proportion of genotyped individuals with selective genotyping. On average, the mean regression coefficients of validation records on those of GEBV increased with one J-factor compared to none, and further increased with multiple J-factors. However, for body weight at 35 days this was not observed. The accuracy of GEBV remained unchanged regardless of the J-factor method used. Differences between the J-factor methods were limited with correlations approaching 1 for the estimated covariate vector, the estimated coefficients of the regression on the J-factors, and the GEBV. Based on our results and in the particular design analysed here, i.e. all the animals with phenotype are of the same type of crossbreds, fitting a single J-factor should be sufficient, to reduce dispersion bias. Fitting multiple J-factors may reduce dispersion bias further but this depends on the trait and genotyping rate. 
For the crossbred population analysed, fitting multiple J-factors has no adverse consequences and if this is done, it does not matter if the breed fractions used are based on the pedigree-expectation or the genomic estimates. Finally, when GEBV are estimated from crossbred data, any observed bias can potentially be reduced by including a straightforward regression on actual breed proportions.
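The centring step mentioned at the start of the abstract above can be illustrated with a short sketch. It assumes genotypes coded as 0/1/2 counts of one allele, and the function names and toy data are hypothetical; it shows centring on current-population frequencies only and does not reproduce the J-factor model itself.

```python
def allele_frequencies(genotypes):
    """Frequency of the counted allele per SNP, from 0/1/2 allele counts."""
    n = len(genotypes)
    n_snp = len(genotypes[0])
    return [sum(row[j] for row in genotypes) / (2 * n) for j in range(n_snp)]


def centre(genotypes, freqs):
    """Subtract the expected allele count 2p from every genotype."""
    return [[g - 2 * p for g, p in zip(row, freqs)] for row in genotypes]


# Toy data (hypothetical): 4 genotyped animals x 3 SNP.
genotypes = [[0, 1, 2], [1, 1, 2], [2, 1, 0], [1, 1, 0]]

# Centring on frequencies observed in the genotyped (current) population
# gives every SNP column a mean of zero.
current_freqs = allele_frequencies(genotypes)
centred = centre(genotypes, current_freqs)
print(current_freqs, centred)
```

Passing base-generation frequencies to `centre` instead would generally leave a non-zero column mean, which is the kind of shift the abstract describes correcting with a fixed regression.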
Publisher: American Dairy Science Association
Date: 02-2019
Publisher: Springer Science and Business Media LLC
Date: 30-10-2017
Publisher: Springer Science and Business Media LLC
Date: 08-05-2015
Publisher: American Dairy Science Association
Date: 06-2010
Abstract: The objective of this study was to investigate the genetic basis of energy balance (EB) and the potential use of genomic selection to enable EB to be incorporated into selection programs. Energy balance provides an essential link between production and nonproduction traits because both depend on a common source of energy. A small number (527) of Dutch Holstein-Friesian heifers with phenotypes for EB were genotyped. Direct genomic values were predicted for these heifers using a model that included the genotypic information. A polygenic model was also applied to predict estimated breeding values using only pedigree information. A 10-fold cross-validation approach was employed to assess the accuracies of the 2 sets of predicted breeding values by correlating them with phenotypes. Because of the small number of phenotypes, accuracies were relatively low (0.29 for the direct genomic values and 0.21 for the estimated breeding values), where the maximum possible accuracy was the square root of heritability (0.57). Despite this, the genomic model produced breeding values with reliability double that of the breeding values produced by the polygenic model. To increase the accuracy of the genomic breeding values and make it possible to select for EB, measurement and recording of EB would need to improve. The study suggests that it may be possible to select for minimally recorded traits, for instance those measured on experimental farms, using genomic selection. Overall, the study demonstrated that genomic selection could be used to select for EB, confirming its genetic background.
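The statement that reliability roughly doubled is consistent with the reported accuracies if reliability is taken as the squared accuracy. That is the usual convention, but it is an assumption here, since the abstract does not define the term. A small check using only figures quoted in the abstract:

```python
acc_genomic = 0.29   # accuracy of direct genomic values (from the abstract)
acc_pedigree = 0.21  # accuracy of pedigree-based breeding values (from the abstract)
max_accuracy = 0.57  # reported upper bound, the square root of heritability

# Assumption: reliability = accuracy squared.
rel_genomic = acc_genomic ** 2
rel_pedigree = acc_pedigree ** 2
ratio = rel_genomic / rel_pedigree

# Heritability implied by the reported upper bound on accuracy.
implied_h2 = max_accuracy ** 2

print(round(rel_genomic, 3), round(rel_pedigree, 3), round(ratio, 2), round(implied_h2, 2))
```

Under that assumption the ratio of reliabilities is about 1.9, close to the "double" reported, and the implied heritability is about 0.32.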
Publisher: American Dairy Science Association
Date: 05-2018
Abstract: Genomic prediction is applicable to individuals of different breeds. Empirical results to date, however, show limited benefits in using information on multiple breeds in the context of genomic prediction. We investigated a multitask Bayesian model, presented previously by others, implemented in a Bayesian stochastic search variable selection (BSSVS) model. This model allowed for evidence of quantitative trait loci (QTL) to be accumulated across breeds or for both QTL that segregate across breeds and breed-specific QTL. In both cases, single nucleotide polymorphism effects were estimated with information from a single breed. Other models considered were a single-trait and multitrait genomic residual maximum likelihood (GREML) model, with breeds considered as different traits, and a single-trait BSSVS model. All single-trait models were applied to each of the 2 breeds separately and to the pooled data of both breeds. The data used included a training data set of 6,278 Holstein and 722 Jersey bulls, as well as 374 Jersey validation bulls. All animals had genotypes for 474,773 single nucleotide polymorphisms after editing and phenotypes for milk, fat, and protein yields. Using the same training data, BSSVS consistently outperformed GREML. The multitask BSSVS, however, did not outperform single-trait BSSVS, which used pooled Holstein and Jersey data for training. Thus, the rigorous assumption that the traits are the same in both breeds yielded a slightly better prediction than a model that had to estimate the correlation between the breeds from the data. Adding the Holstein data significantly increased the accuracy of the single-trait GREML and BSSVS in predicting the Jerseys for milk and protein, in line with estimated correlations between the breeds of 0.66 and 0.47 for milk and protein yields, whereas only the BSSVS model significantly improved the accuracy for fat yield with an estimated correlation between breeds of only 0.05. 
The relatively high genetic correlations for milk and protein yields, and the superiority of the pooling strategy, are likely the result of the observed admixture between both breeds in our data. The Bayesian model was able to detect several QTL in Holsteins, which likely enabled it to outperform GREML. The inability of the multitask Bayesian models to outperform a simple pooling strategy may be explained by the fact that the pooling strategy assumes equal effects in both breeds; furthermore, this assumption may be valid for moderate- to large-sized QTL, which are important for multibreed genomic prediction.
Publisher: American Dairy Science Association
Date: 11-2013
Abstract: In recent years, it has been shown that not only is the phenotype under genetic control, but also the environmental variance. Very little, however, is known about the genetic architecture of environmental variance. The main objective of this study was to unravel the genetic architecture of the mean and environmental variance of somatic cell score (SCS) by identifying genome-wide associations for mean and environmental variance of SCS in dairy cows and by quantifying the accuracy of genome-wide breeding values. Somatic cell score was used because previous research has shown that the environmental variance of SCS is partly under genetic control and reduction of the variance of SCS by selection is desirable. In this study, we used 37,590 single nucleotide polymorphism (SNP) genotypes and 46,353 test-day records of 1,642 cows at experimental research farms in 4 countries in Europe. We used a genomic relationship matrix in a double hierarchical generalized linear model to estimate genome-wide breeding values and genetic parameters. The estimated mean and environmental variance per cow was used in a Bayesian multi-locus model to identify SNP associated with either the mean or the environmental variance of SCS. Based on the obtained accuracy of genome-wide breeding values, 985 and 541 independent chromosome segments affecting the mean and environmental variance of SCS, respectively, were identified. Using a genomic relationship matrix increased the accuracy of breeding values relative to using a pedigree relationship matrix. In total, 43 SNP were significantly associated with either the mean (22) or the environmental variance of SCS (21). The SNP with the highest Bayes factor was on chromosome 9 (Hapmap31053-BTA-111664) explaining approximately 3% of the genetic variance of the environmental variance of SCS. Other significant SNP explained less than 1% of the genetic variance. 
It can be concluded that fewer genomic regions affect the environmental variance of SCS than the mean of SCS, but genes with large effects seem to be absent for both traits.
Publisher: Oxford University Press (OUP)
Date: 02-2013
DOI: 10.1534/GENETICS.112.147983
Abstract: The genomic prediction of phenotypes and breeding values in animals and plants has developed rapidly into its own research field. Results of genomic prediction studies are often difficult to compare because data simulation varies, real or simulated data are not fully described, and not all relevant results are reported. In addition, some new methods have been compared only in limited genetic architectures, leading to potentially misleading conclusions. In this article we review simulation procedures, discuss validation and reporting of results, and apply benchmark procedures for a variety of genomic prediction methods in simulated and real example data. Plant and animal breeding programs are being transformed by the use of genomic data, which are becoming widely available and cost-effective to predict genetic merit. A large number of genomic prediction studies have been published using both simulated and real data. The relative novelty of this area of research has made the development of scientific conventions difficult with regard to description of the real data, simulation of genomes, validation and reporting of results, and forward in time methods. In this review article we discuss the generation of simulated genotype and phenotype data, using approaches such as the coalescent and forward in time simulation. We outline ways to validate simulated data and genomic prediction results, including cross-validation. The accuracy and bias of genomic prediction are highlighted as performance indicators that should be reported. We suggest that a measure of relatedness between the reference and validation individuals be reported, as its impact on the accuracy of genomic prediction is substantial. A large number of methods were compared in example simulated and real (pine and wheat) data sets, all of which are publicly available. 
In our limited simulations, most methods performed similarly in traits with a large number of quantitative trait loci (QTL), whereas in traits with fewer QTL variable selection did have some advantages. In the real data sets examined here all methods had very similar accuracies. We conclude that no single method can serve as a benchmark for genomic prediction. We recommend comparing accuracy and bias of new methods to results from genomic best linear prediction and a variable selection approach (e.g., BayesB), because, together, these methods are appropriate for a range of genetic architectures. An accompanying article in this issue provides a comprehensive review of genomic prediction methods and discusses a selection of topics related to application of genomic prediction in plants and animals.
Publisher: Wiley
Date: 26-06-2020
DOI: 10.1111/JPN.13407
Publisher: Oxford University Press (OUP)
Date: 02-2013
DOI: 10.1534/GENETICS.112.143313
Abstract: Genomic-enabled prediction is becoming increasingly important in animal and plant breeding and is also receiving attention in human genetics. Deriving accurate predictions of complex traits requires implementing whole-genome regression (WGR) models where phenotypes are regressed on thousands of markers concurrently. Methods exist that allow implementing these large-p with small-n regressions, and genome-enabled selection (GS) is being implemented in several plant and animal breeding programs. The list of available methods is long, and the relationships between them have not been fully addressed. In this article we provide an overview of available methods for implementing parametric WGR models, discuss selected topics that emerge in applications, and present a general discussion of lessons learned from simulation and empirical data analysis in the last decade.
Publisher: American Dairy Science Association
Date: 09-2015
Abstract: With the aim of increasing the accuracy of genomic estimated breeding values for dry matter intake (DMI) in Holstein-Friesian dairy cattle, data from 10 research herds in Europe, North America, and Australasia were combined. The DMI records were available on 10,701 parity 1 to 5 records from 6,953 cows, as well as on 1,784 growing heifers. Predicted DMI at 70 d in milk was used as the phenotype for the lactating animals, and the average DMI measured during a 60- to 70-d test period at approximately 200 d of age was used as the phenotype for the growing heifers. After editing, there were 583,375 genetic markers obtained from either actual high-density single nucleotide polymorphism (SNP) genotypes or imputed from 54,001 marker SNP genotypes. Genetic correlations between the populations were estimated using genomic REML. The accuracy of genomic prediction was evaluated for the following scenarios: (1) within-country only, by fixing the correlations among populations to zero, (2) using near-unity correlations among populations and assuming the same trait in each population, and (3) a sharing data scenario using estimated genetic correlations among populations. For these 3 scenarios, the data set was divided into 10 sub-populations stratified by progeny group of sires; 9 of these sub-populations were used (in turn) for the genomic prediction and the tenth was used for calculation of the accuracy (correlation adjusted for heritability). A fourth scenario to quantify the benefit for countries that do not record DMI was investigated (i.e., having an entire country as the validation population and excluding this country in the development of the genomic predictions). The optimal scenario, which was sharing data, resulted in a mean prediction accuracy of 0.44, ranging from 0.37 (Denmark) to 0.54 (the Netherlands). Assuming near-unity among-country genetic correlations, the mean accuracy of prediction dropped to 0.40, and the mean within-country accuracy was 0.30. 
If no records were available in a country, the accuracy based on the other populations ranged from 0.23 to 0.53 for the milking cows, but was only 0.03 and 0.19 for Australian and New Zealand heifers, respectively; the overall mean prediction accuracy was 0.37. Therefore, there is a benefit in collaboration, because phenotypic information for DMI from other countries can be used to augment the accuracy of genomic evaluations of individual countries.
Location: United Kingdom of Great Britain and Northern Ireland
No related grants have been discovered for Mario Calus.