
Differential Consistency Analysis: Which Similarity Measures can be Applied in Drug Discovery?

Ramón Alain Miranda-Quintana, Dávid Bajusz, Anita Rácz, Károly Héberger
Published in: Molecular Informatics (2021)
Similarity measures are widely used in fields ranging from taxonomy to cheminformatics. Accordingly, a large number of similarity and distance measures (collectively, comparative measures) have been introduced, yet only a few studies have aimed to reveal the relationships between them. We present a thorough analytical study of the conditions under which two comparative measures provide equivalent results over a given set of molecules. A key part of this work is a novel way to study the consistency between comparative measures: differential consistency analysis (DCA). This tool shows how consistency can be established analytically, with minimal (or no) assumptions. We found that the consensus between the Tanimoto and Cosine coefficients improved when choosing a reference whose similarity to the rest of the molecules varies less, or when representing the molecules in a way that does not depend strongly on their size (i.e., on the bit frequency in the chosen fingerprint representation). The presented derivations are only generic examples; DCA can be applied widely, to all binary similarity coefficients introduced so far, independently of the molecular representation.
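To make the Tanimoto/Cosine consensus more concrete, here is a minimal Python sketch. It is not DCA itself, which the abstract describes as an analytical framework; the consistency score used here (the fraction of molecule pairs that both coefficients rank in the same order relative to a reference) is an assumption chosen only for illustration. With a = on bits in the reference, b = on bits in the compared molecule and c = shared on bits, Tanimoto is c / (a + b - c) and Cosine is c / sqrt(a * b). For fixed a and b both are strictly increasing in c, so if every molecule has the same bit count the two rankings must agree; once b varies, they can disagree. The helper names (`tanimoto`, `cosine`, `ranking_consistency`) and the toy 8-bit fingerprints are hypothetical.

```python
from itertools import combinations
from math import sqrt
from typing import Callable, Sequence, Tuple

Fingerprint = Sequence[int]  # binary fingerprint as a sequence of 0/1 bits


def bit_counts(x: Fingerprint, y: Fingerprint) -> Tuple[int, int, int]:
    """Return (a, b, c): on bits in x, on bits in y, on bits shared by both."""
    a = sum(x)
    b = sum(y)
    c = sum(1 for xi, yi in zip(x, y) if xi and yi)
    return a, b, c


def tanimoto(x: Fingerprint, y: Fingerprint) -> float:
    """Tanimoto (Jaccard) coefficient: c / (a + b - c)."""
    a, b, c = bit_counts(x, y)
    denom = a + b - c
    return c / denom if denom else 0.0


def cosine(x: Fingerprint, y: Fingerprint) -> float:
    """Cosine (Ochiai) coefficient for binary vectors: c / sqrt(a * b)."""
    a, b, c = bit_counts(x, y)
    return c / sqrt(a * b) if a and b else 0.0


def ranking_consistency(
    reference: Fingerprint,
    molecules: Sequence[Fingerprint],
    sim1: Callable[[Fingerprint, Fingerprint], float],
    sim2: Callable[[Fingerprint, Fingerprint], float],
) -> float:
    """Hypothetical score: fraction of molecule pairs ordered the same way
    (greater, smaller or tied) by sim1 and sim2 relative to the reference."""
    s1 = [sim1(reference, m) for m in molecules]
    s2 = [sim2(reference, m) for m in molecules]
    agree = total = 0
    for i, j in combinations(range(len(molecules)), 2):
        d1, d2 = s1[i] - s1[j], s2[i] - s2[j]
        sign1 = (d1 > 0) - (d1 < 0)
        sign2 = (d2 > 0) - (d2 < 0)
        agree += sign1 == sign2
        total += 1
    return agree / total if total else 1.0


reference = [1, 1, 1, 1, 0, 0, 0, 0]  # a = 4

# Every molecule has 4 on bits: both coefficients depend only on c.
equal_size = [
    [1, 1, 1, 0, 1, 0, 0, 0],  # c = 3
    [1, 1, 0, 0, 1, 1, 0, 0],  # c = 2
    [1, 0, 0, 0, 1, 1, 1, 0],  # c = 1
]

# Bit counts differ: X (b = 1, c = 1), Y (b = 5, c = 2), W (b = 4, c = 3).
varied_size = [
    [1, 0, 0, 0, 0, 0, 0, 0],  # X
    [1, 1, 0, 0, 1, 1, 1, 0],  # Y
    [1, 1, 1, 0, 1, 0, 0, 0],  # W
]

print(ranking_consistency(reference, equal_size, tanimoto, cosine))   # 1.0
print(ranking_consistency(reference, varied_size, tanimoto, cosine))  # 0.6666666666666666
```

In the second set, X has the lower Tanimoto similarity (0.25 vs. 2/7) but the higher Cosine similarity (0.5 vs. about 0.447) compared with Y, illustrating how differing bit counts can break the consensus between the two coefficients.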