Protein sequence landscapes are not so simple: on reference-free versus reference-based inference.

Thomas DupicAngela M PhillipsMichael M Desai

Published in: bioRxiv : the preprint server for biology (2024)

In a recent preprint, Park, Metzger, and Thornton reanalyze 20 empirical protein sequence-function landscapes using a "reference-free analysis" (RFA) method they recently developed. They argue that these empirical landscapes are simpler and less epistatic than earlier work suggested, and attribute the difference to limitations of the methods used in the original analyses of these landscapes, which they claim are more sensitive to measurement noise, missing data, and other artifacts. Here, we show that these claims are incorrect. Instead, we find that the RFA method introduced by Park et al. is exactly equivalent to the reference-based least-squares methods used in the original analysis of many of these empirical landscapes (and also equivalent to a Hadamard-based approach they implement). Because the reanalyzed and original landscapes are in fact identical, the different conclusions drawn by Park et al. instead reflect different interpretations of the parameters describing the inferred landscapes; we argue that these do not support the conclusion that epistasis plays only a small role in protein sequence-function landscapes.

Keyphrases