A fortune cookie problem: A test for nominal data whether two samples are from the same population of equally likely elements.

Jiangtao Gou, Karen Ruth, Stanley Basickes, Samuel Litwin

Published in: Communications in Statistics: Theory and Methods (2022)

This article considers a way to test the hypothesis that two collections of objects are drawn from the same uniform distribution over such objects. The exact p-value is calculated from the distribution of the observed overlaps. In addition, an interval estimate of the number of distinct objects, when all objects are equally likely, is provided.
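The idea of an exact p-value based on overlaps can be illustrated with a minimal sketch. It is not the article's procedure, which the abstract does not spell out; it rests on assumptions made here for illustration: the number of distinct object types (`n_types`) is known, the two collections are independent, and each is summarized by how many distinct types it contains (`a` and `b`). Under the null hypothesis of equally likely types, each collection's set of distinct types is a uniformly random subset of its size, so the number of shared types follows a hypergeometric distribution. Treating a small overlap as evidence against the null is also an assumption of this sketch. The function names are hypothetical.

```python
from math import comb


def overlap_pmf(k, n_types, a, b):
    """P(overlap == k) for two independent, uniformly random subsets
    of sizes a and b drawn from n_types equally likely types.

    Illustrative assumption: n_types is known in advance.
    """
    if not (0 <= a <= n_types and 0 <= b <= n_types):
        raise ValueError("a and b must lie between 0 and n_types")
    # Outside the support of the hypergeometric distribution.
    if k < 0 or k > min(a, b) or b - k > n_types - a:
        return 0.0
    return comb(a, k) * comb(n_types - a, b - k) / comb(n_types, b)


def overlap_p_value(k_obs, n_types, a, b):
    """Lower-tail exact p-value: probability of an overlap of k_obs or fewer.

    Illustrative assumption: a small overlap counts as evidence
    against the two collections sharing one uniform population.
    """
    return sum(overlap_pmf(k, n_types, a, b) for k in range(k_obs + 1))


if __name__ == "__main__":
    # Hypothetical numbers: 10 types, each collection shows 3 distinct types,
    # and the two collections share none of them.
    print(overlap_p_value(0, 10, 3, 3))
```

With these hypothetical inputs the p-value equals C(7,3)/C(10,3) = 35/120, so seeing no shared types would be unremarkable. When the number of types is unknown, as in the interval-estimation setting the abstract mentions, this sketch does not apply directly.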

Keyphrases

electronic health record
big data
density functional theory
machine learning