Bounter: A Python counter that uses limited memory regardless of data size

Radim | 156 points

The introduction is misleading about what tradeoffs it makes

  >>> from bounter import bounter
  >>> bounts = bounter(size_mb=1)
  >>> bounts.update(str(i) for i in xrange(1000000))
  >>> bounts['100']
  0L
should give 1. The description should be more upfront that you are getting inaccurate answers.

> Bounter implements approximative algorithms using optimized low-level C structures, to avoid the overhead of Python objects.

"approximative algorithms" isn't a commonly used phrase (at least according to Google) and this sentence doesn't say anything about what the actual trade-off is (loss of accuracy in the counts). A possible alternative would be writing to hard disk so what trade-off is made isn't obvious.

I also don't know what the precision and recall percentages in the example table mean.

asrp | 6 years ago

Looks useful! I like the way this library is documented, very well explained.

alonmln | 6 years ago

Is this a variant of Space Saving or Frequent algorithm?

Space Saving: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.94....

Frequent: https://stackoverflow.com/questions/3260653/algorithm-to-fin...

And there is a variant with a Sketch function, known as "Filtered Space Saving". I cannot find a working link to the paper, but here is a Golang implementation of it:

https://godoc.org/github.com/dgryski/go-topk

gtrubetskoy | 6 years ago

It would be useful to have load/save (mmap for faster loads).

visarga | 6 years ago

So like an hyperloglog ?

sametmax | 6 years ago

It is good that someone wrote this. It is embarrassing how bad Python's native performance is.

suff | 6 years ago