Annotated Protein Database Using Known Cleavage Sites for Rapid Detection of Secreted Proteins.
Dylan J Harney, Mark Larance. Published in: Journal of Proteome Research (2022)
Liquid chromatography tandem mass spectrometry (LC-MS/MS) analysis of secreted proteins has contributed to our understanding of human disease and physiology but is limited by its need for accurate protein database annotation. The common proteomics assumption of perfect protease specificity is inaccurate for secreted proteins, which are cleaved by numerous endogenous proteases. Here, we describe the generation of an optimized protein database that divides proteins into their individual biological chains and peptides, allowing fast identification of semi-tryptic peptides from secreted proteins using fully tryptic searches. We applied this biologically annotated database to previously published human plasma proteome data sets containing either DIA or DDA data, using Spectronaut, DIA-NN, MaxDIA, and MaxQuant. Using our annotated database, we greatly reduced search times while achieving protein and peptide identifications similar to those obtained from standard approaches using semi-tryptic searches. Furthermore, our database enables the identification of biologically relevant semi-tryptic peptides in data analysis packages that cannot perform semi-tryptic searches. Together, these findings demonstrate that our annotated database is more capable than currently available databases for secreted protein analysis and is particularly useful for large-scale plasma proteome analysis.
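The core idea, that splitting a protein at its known cleavage sites lets a fully tryptic digest produce peptides that would otherwise require a semi-tryptic search, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy sequence, and the feature coordinates are all hypothetical, and the digest rule assumed here is the common convention of cleaving after K or R unless followed by P, with no missed cleavages.

```python
def split_into_chains(sequence, features):
    """Cut a precursor sequence into its annotated chains/peptides.

    `features` is a list of (name, start, end) tuples using 1-based,
    inclusive coordinates (an assumed input format for this sketch).
    """
    return [(name, sequence[start - 1:end]) for name, start, end in features]


def tryptic_digest(sequence):
    """Fully tryptic in-silico digest: cleave after K or R, but not
    before P. Missed cleavages are ignored in this sketch."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        is_last = i == len(sequence) - 1
        if residue in "KR" and not is_last and sequence[i + 1] != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


# Hypothetical toy precursor: residues 1-6 act as a signal peptide,
# residues 7-17 as the mature chain.
precursor = "MALLVSAGTKDEFRGHK"
features = [("signal", 1, 6), ("chain", 7, 17)]

whole = tryptic_digest(precursor)
per_chain = {name: tryptic_digest(seq)
             for name, seq in split_into_chains(precursor, features)}

print(whole)               # ['MALLVSAGTK', 'DEFR', 'GHK']
print(per_chain["chain"])  # ['AGTK', 'DEFR', 'GHK']
```

In this toy case the peptide `AGTK` begins at the signal-peptide cleavage site rather than after K or R, so it is semi-tryptic with respect to the precursor. It is absent from the whole-protein digest but appears once the chain is treated as its own database entry, which is why a fully tryptic search against the split database can find it.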