Information in morphological characters.
Congyu Yu, Qigao Jiangzuo, Emanuel Tschopp, Haibing Wang, Mark Norell
Published in: Ecology and Evolution (2021)
The construction of morphological character matrices is central to paleontological systematic study, which extracts paleontological information from fossils. Although the word "information" appears repeatedly in a wide array of paleontological systematic studies, its meaning has rarely been clarified or specifically defined. It is important, however, to establish a standard for measuring paleontological information, because fossils are rarely complete, which makes it difficult to recognize homologous and homoplastic structures. Here, based on information theory, we show the deep connections between paleontological systematic study and communication system engineering. Information is defined as the decrease of uncertainty, and it is the information in morphological characters that allows operational taxonomic units (OTUs) to be distinguished and evolutionary history to be reconstructed. We propose that the concepts of source coding and channel coding in communication system engineering correspond, respectively, to the construction of diagnostic features and of entire character matrices in paleontological studies. As in typical communication system engineering, the two coding strategies should be distinguished because they serve distinct purposes. Using character matrices from six different vertebrate groups, we analyze their information properties, including source entropy, mutual information, and channel capacity. Estimates of channel capacity show character saturation in all matrices in transmitting paleontological information, indicating that, because of noise, oversampling characters not only increases the burden of character scoring but may also decrease the quality of matrices. We further test the use of information entropy, which measures how informative a variable is, as a character weighting criterion in parsimony-based systematic studies. The results are highly consistent with existing knowledge and offer both good resolution and interpretability.
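
To make the notion of information entropy of a character concrete, the following is a minimal sketch of the Shannon entropy of each column in a character matrix. It is an illustration under stated assumptions, not the authors' implementation: the toy matrix, the taxon names, the function names, and the choice to skip cells scored `?` or `-` are all hypothetical, and the abstract does not say how entropy values are converted into parsimony weights.

```python
from collections import Counter
from math import log2

# Assumption: "?" and "-" mark missing or inapplicable scores and are skipped.
MISSING = {"?", "-"}


def character_entropy(states):
    """Shannon entropy, in bits, of the observed states of one character."""
    observed = [s for s in states if s not in MISSING]
    if not observed:
        return 0.0
    total = len(observed)
    counts = Counter(observed)
    # H = sum over states of p * log2(1 / p)
    return sum((n / total) * log2(total / n) for n in counts.values())


def column_entropies(matrix):
    """Entropy of every character (column) in a taxon -> state-string mapping."""
    rows = list(matrix.values())
    n_chars = len(rows[0])
    return [character_entropy([row[i] for row in rows]) for i in range(n_chars)]


# Hypothetical toy matrix: rows are OTUs, columns are characters.
matrix = {
    "taxon_A": "0010",
    "taxon_B": "0111",
    "taxon_C": "01?0",
    "taxon_D": "0101",
}

entropies = column_entropies(matrix)
for index, value in enumerate(entropies):
    print(f"character {index}: {value:.3f} bits")
```

In this toy case the first character is invariant across all taxa and has zero entropy, so it cannot help distinguish OTUs, while the last character splits the taxa evenly and reaches the 1-bit maximum for a binary character.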