Looks useful! I like the way this library is documented, very well explained.
alonmln | 6 years ago
Is this a variant of Space Saving or Frequent algorithm?
Space Saving: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.94....
Frequent: https://stackoverflow.com/questions/3260653/algorithm-to-fin...
And there is a variant with a Sketch function, known as "Filtered Space Saving". I cannot find a working link to the paper, but here is a Golang implementation of it:
gtrubetskoy | 6 years ago
It would be useful to have load/save (mmap for faster loads).
visarga | 6 years ago
So like an hyperloglog ?
sametmax | 6 years ago
It is good that someone wrote this. It is embarrassing how bad Python's native performance is.
suff | 6 years ago
The introduction is misleading about what tradeoffs it makes
should give 1. The description should be more upfront that you are getting inaccurate answers.> Bounter implements approximative algorithms using optimized low-level C structures, to avoid the overhead of Python objects.
"approximative algorithms" isn't a commonly used phrase (at least according to Google) and this sentence doesn't say anything about what the actual trade-off is (loss of accuracy in the counts). A possible alternative would be writing to hard disk so what trade-off is made isn't obvious.
I also don't know what the precision and recall percentages in the example table mean.