MaxSketch: Robust Distinct Counting in Streams via Random Projections
arXiv cs.LG
arXiv:2605.15571v1 Announce Type: cross
Estimating the number of distinct elements in a data stream is well understood when repeated elements are identical. In modern settings, however, observations are high-dimensional and noisy, so repeated instances of the same object are only approximately similar -- for example, different images of the same individual may vary significantly at the pixel level. Classical sketches such as HyperLogLog rely on consistent hash values for identical elements and break down in this regime.
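
The failure mode described above can be illustrated with a minimal sketch. It is not MaxSketch and not HyperLogLog itself: it uses an exact set of hash values as an idealized stand-in for any hash-based distinct counter, and the helper names (`hash_vector`, `count_distinct_hashes`, `make_stream`), the vector dimension, and the noise level are all assumptions chosen for illustration.

```python
import hashlib
import random
import struct


def hash_vector(vec):
    """Hash the exact byte representation of a vector of floats."""
    payload = struct.pack(f"{len(vec)}d", *vec)
    return hashlib.sha1(payload).hexdigest()


def count_distinct_hashes(stream):
    """Count distinct hash values (exact stand-in for a hash-based sketch)."""
    return len({hash_vector(v) for v in stream})


def make_stream(num_objects=5, repeats=20, dim=8, noise=0.0, seed=0):
    """Build a stream in which each underlying object is observed `repeats` times.

    Hypothetical setup: each object is a random vector, and every observation
    adds independent Gaussian noise with standard deviation `noise`.
    """
    rng = random.Random(seed)
    bases = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_objects)]
    stream = []
    for base in bases:
        for _ in range(repeats):
            stream.append([x + rng.gauss(0.0, noise) for x in base])
    return stream


# Identical repeats hash to the same value, so the count matches the objects.
clean_count = count_distinct_hashes(make_stream(noise=0.0))

# Tiny perturbations change the bytes, so nearly every observation looks new.
noisy_count = count_distinct_hashes(make_stream(noise=1e-3))

print("clean stream:", clean_count)
print("noisy stream:", noisy_count)
```

With exact repeats the count equals the number of underlying objects, whereas with even small noise the hash-based count tracks the number of observations instead. A real HyperLogLog would show the same behaviour up to its usual estimation error, since it also depends on identical inputs producing identical hashes.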