Polygenic risk scores (PRSs) derived from multi-ancestry genome-wide association studies (GWASs) outperform PRSs based on single-ancestry GWASs in underrepresented populations, new research shows.
PRSs, which aggregate the effects of genetic variants across the genome on susceptibility to a heritable disease or trait, are useful tools for disease risk prediction. The investigators noted that while the cohorts for PRSs have recently become more diverse, most data still come from populations of European ancestry.
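For readers unfamiliar with how such a score is built, the sketch below shows the standard additive form: each person's score is the sum of their allele dosages (0, 1, or 2 copies of the effect allele) weighted by per-variant effect sizes estimated in a GWAS. The function name, the dosages, and the effect sizes are hypothetical illustrations, not values or code from the study.

```python
import numpy as np

def polygenic_score(dosages, effect_sizes):
    """Additive PRS: weighted sum of effect-allele dosages per individual.

    dosages      -- array of shape (n_individuals, n_variants), values 0/1/2
    effect_sizes -- array of shape (n_variants,), GWAS effect estimates
    (Both inputs are hypothetical toy data, not taken from the paper.)
    """
    dosages = np.asarray(dosages, dtype=float)
    effect_sizes = np.asarray(effect_sizes, dtype=float)
    return dosages @ effect_sizes

# Two made-up individuals genotyped at three made-up variants.
dosages = np.array([[0, 1, 2],
                    [2, 0, 1]])
betas = np.array([0.10, -0.05, 0.20])

scores = polygenic_score(dosages, betas)
print(scores)  # approximately [0.35, 0.40]
```

Real pipelines typically also handle variant selection, linkage disequilibrium, and strand alignment, all of which this toy example leaves out.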
In a paper published in Cell Genomics, researchers from the Broad Institute of MIT and Harvard and colleagues investigated factors affecting the predictive performance of PRSs in diverse populations. They used a combination of large-scale population genetic simulations and empirical meta-analyses to compare scores derived from multi-ancestry GWASs with those derived from single-ancestry GWASs.
For their study, they used real genomic data from the BioBank Japan and UK Biobank “across traits exhibiting distinct genetic architectures.” They meta-analyzed GWASs from populations of European ancestry together with GWASs from minority populations, varying the ratio of sample sizes to mimic multi-ancestry GWASs with different ancestry compositions, with a particular focus on East Asian and African populations.
Their findings showed that, overall, PRSs derived from multi-ancestry GWASs predicted more accurately in understudied populations than PRSs derived from a single-ancestry GWAS. Their analysis also demonstrated that meta-analyzing datasets from diverse ancestral groups improved PRS accuracy more than linearly combining PRSs did. In addition, accounting for local ancestry enhanced the predictive performance of PRSs in understudied populations.
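The difference between the two strategies can be illustrated with a minimal sketch. It assumes a fixed-effect, inverse-variance-weighted meta-analysis, which is one common way to pool GWAS summary statistics; the article does not specify which meta-analysis method the authors used. The linear combination uses a hypothetical mixing weight that would normally be tuned on held-out data. All function names and numbers are illustrative assumptions.

```python
import numpy as np

def ivw_meta(betas, ses):
    """Fixed-effect inverse-variance-weighted meta-analysis per variant.

    betas, ses -- arrays of shape (n_studies, n_variants) holding effect
    estimates and standard errors (hypothetical toy inputs).
    Returns the pooled effect and its standard error for each variant.
    """
    betas = np.asarray(betas, dtype=float)
    ses = np.asarray(ses, dtype=float)
    weights = 1.0 / ses**2                      # precision of each study
    beta_meta = (weights * betas).sum(axis=0) / weights.sum(axis=0)
    se_meta = np.sqrt(1.0 / weights.sum(axis=0))
    return beta_meta, se_meta

def linear_combine(prs_a, prs_b, mix):
    """Linear combination of two ancestry-specific scores.

    mix is a hypothetical mixing weight in [0, 1]; in practice it would
    be fitted in a validation sample from the target population.
    """
    return mix * np.asarray(prs_a, float) + (1.0 - mix) * np.asarray(prs_b, float)

# Toy summary statistics: row 0 = larger GWAS, row 1 = smaller GWAS.
betas = np.array([[0.10, 0.02],
                  [0.20, 0.04]])
ses = np.array([[0.01, 0.01],
                [0.02, 0.02]])

beta_meta, se_meta = ivw_meta(betas, ses)       # pool first, then score
combined = linear_combine([1.0, 2.0], [3.0, 4.0], mix=0.25)  # score first, then mix
```

In the first strategy the pooled effect sizes feed a single score; in the second, two separately built scores are blended afterward.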
Among the limitations the authors noted were that their analysis was restricted to common variants and that the study focused only on selected methods. They suggest that future studies build on their work, stressing “the necessity of diversifying not only the ancestry but also the phenotypic spectrum when collecting genomic data from global populations.”