Database search engines and target database features impinge upon the identification of post-translationally cis-spliced peptides in HLA class I immunopeptidomes.

Michele MishtoYehor HorokhovskyiJohn A CormicanXiaoping YangSteven LynhamHenning UrlaubJuliane Liepe

Published in: Proteomics (2022)

Unconventional epitopes presented by HLA class I complexes are emerging targets for T cell targeted immunotherapies. Their identification by mass spectrometry (MS) required development of novel methods to cope with the large number of theoretical candidates. Methods to identify post-translationally spliced peptides led to a broad range of outcomes. We here investigated the impact of three common database search engines - that is, Mascot, Mascot+Percolator, and PEAKS DB - as final identification step, as well as the features of target database on the ability to correctly identify non-spliced and cis-spliced peptides. We used ground truth datasets measured by MS to benchmark methods' performance and extended the analysis to HLA class I immunopeptidomes. PEAKS DB showed better precision and recall of cis-spliced peptides and larger number of identified peptides in HLA class I immunopeptidomes than the other search engine strategies. The better performance of PEAKS DB appears to result from better discrimination between target and decoy hits and hence a more robust FDR estimation, and seems independent to peptide and spectrum features here investigated.

Keyphrases