Publication
MARS: Improved De Novo Peptide Candidate Selection for Non-Canonical Antigen Target Discovery in Cancer
Publisher:
Research Square Platform LLC
Date:
03-08-2022
DOI:
10.21203/RS.3.RS-1890352/V1
Abstract: Identification of cryptic Human Leukocyte Antigen (HLA)-presented peptides from unannotated genome sources is a priority for target antigen discovery in the development of next-generation immunotherapies in cancer. Current immunopeptidomic approaches integrate transcriptomics data to inform spectral interpretation; however, recent observations that tumour-associated antigen-encoding RNA levels are often low highlight the limitations of such proteogenomic approaches¹. Here we employ a de novo sequencing approach with a refined, MHC-centric analysis strategy to detect non-canonical HLA-associated peptide sequences (HLAp) in cancer without integrating transcript sequence information. Our strategy integrates HLA binding prediction, peptide retention time prediction, and average local confidence scores, culminating in the machine learning model MARS (MHC binding prediction, Average Local Confidence Score, and Retention time integration for improved de novo candidate Selection). We demonstrate increased HLA-I peptide identification sensitivity by benchmarking our model against de novo sequencing alone on a large synthetic HLA-I peptide library dataset. We further define the sensitivity of MARS by reanalysing a published dataset of high-quality non-canonical HLAp identifications in human cancer cell line and tissue datasets, achieving an almost 2-fold improvement in full sequence recall (FSR) for high-quality spectral assignments compared with de novo sequencing alone². We minimize the false discovery rate (FDR) through a step-wise peptide sequence mapping strategy and expand the reported non-canonical peptide space with an assignment accuracy above 85.7%.
Finally, we utilize MARS to detect and validate lncRNA-derived peptides in human cervical tumour resections, demonstrating its suitability to discover novel, non-canonical peptide sequences in primary tumour tissue at reduced FDR, in the absence of transcriptomic sequencing data.