benchmark: counting uniques

16 Mar 2015 - New York

While working on a Rosalind exercise recently, I came upon an interesting solution for a counting-uniques problem.

A canonical performant solution for a counting-uniques problem is to make a single traversal through the collection of elements and populate a hash table that maps from each unique element to a count of the number of times it appears in the collection.

The hash table then provides constant-time lookup for each count, yielding an $\mathcal{O}(n)$ asymptotic run time.
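
Something like the following (a minimal sketch of mine, not taken from the exercise, using a plain dict):

def count_uniques_with_dict(elements, elts_to_be_counted):
    # single pass: map each element to the number of times it appears
    counts = {}
    for elt in elements:
        counts[elt] = counts.get(elt, 0) + 1
    # constant-time lookup per requested element
    return [counts.get(elt, 0) for elt in elts_to_be_counted]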

So at first glance this solution, which simply calls list.count for each unique element in the given list, seemed like a performance-naive one:

def count_uniques(elements, elts_to_be_counted):
    return list(map(lambda x: elements.count(x), elts_to_be_counted))

This solution has an $\mathcal{O}(m n)$ runtime, where $m$ is the number of unique elements in the list (in this case, passed in via the elts_to_be_counted parameter) and $n$ the number of total elements. (Hence quadratic for a list where each element is unique.)
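
For a concrete (hypothetical) call: with $m = 3$ uniques and $n = 6$ elements, count_uniques makes three full scans of the list:

count_uniques([1, 2, 2, 3, 3, 3], [1, 2, 3])  # => [1, 2, 3], via 3 passes over 6 elements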

The naivete was (perhaps) all mine, however. From the accompanying discussion:

Interestingly, this is much faster (10x in my tests) than tracking the counts manually with a dictionary, even though the dictionary solution only iterates through the string once, rather than four times.

and:

count is a low level C method in python so it is bound to be faster than any loop.

Skeptical, I decided to benchmark it myself.

Python’s list.count(x)

Implementation

The following benchmarks our competing implementations in Python. In the blue corner, we have test_count, which counts uniques using the built-in list.count method. In the red corner, the less tersely named test_count_with_ht, which uses a hash table to do the same. For good measure, test_count_with_counter does the same job with collections.Counter.

We test each method over 100 trials, each on a 1,000-element list, where each list element is a randomly selected integer.

import benchmark as bm
import random as rand

from collections import Counter
from collections import defaultdict


class BenchmarkCount(bm.Benchmark):
    each = 100

    def setUp(self):
        self.list_length = 1000
        self.list = [rand.randint(1, 100) for _ in range(self.list_length)]

    def test_count(self):
        uniques = set(self.list)
        return list(map(lambda x: self.list.count(x), uniques))

    def test_count_with_ht(self):
        counts = defaultdict(lambda: 0)
        for elt in self.list:
            counts[elt] += 1
        return list(map(lambda x: counts[x], counts.keys()))

    def test_count_with_counter(self):
        counts = Counter(self.list)
        return list(map(lambda x: counts[x], counts.keys()))

Results

Sure enough, test_count wins—although not quite with the reported 10x performance advantage, once the cost of generating the list of unique elements is internalized:

              name | rank | runs |      mean |        sd | timesBaseline
 -------------------|------|------|-----------|-----------|--------------
              count |    1 |  100 | 6.971e-05 |  2.83e-05 |           1.0
      count with ht |    2 |  100 | 0.0001138 | 4.082e-05 | 1.63237456821
 count with counter |    3 |  100 | 0.0003266 | 8.942e-05 | 4.68466773829

Still, something feels off about

so it is bound to be faster than any loop

[interpreting charitably: where loop == ‘single-traversal implementation in Python’].

Faster than any loop? Really? I’m not so sure.

The economy gained by performing computations in C may dominate over some range of $n$, but the asymptotic analysis should obtain eventually.

And indeed it does. Let’s see what happens if we replace

self.list = [rand.randint(1, 100) for _ in range(self.list_length)]

with

self.list = list(range(self.list_length))

that is, if we test the worst possible case, in which each element in the list is unique.

Results: $\mathcal{O}\left(n^2\right)$ case

               name | rank | runs |      mean |        sd | timesBaseline
 -------------------|------|------|-----------|-----------|--------------
      count with ht |    1 |  100 | 0.0004774 | 0.0001103 |           1.0
 count with counter |    2 |  100 |  0.000598 | 0.0001219 | 1.25244197195
              count |    3 |  100 |   0.01856 |   0.00281 | 38.8764132476

Here, the hash table and Counter implementations perform markedly better, by a factor of almost 40x. As expected, there is a crossover point past which, if the input is bad enough, the practical advantage of letting C do the counting (even with an inefficient algorithm) ceases to be a net gain.
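
To get a rough sense of where that crossover falls on a particular machine, one way (a sketch of mine, not part of the benchmark above) is to sweep the list length for the all-unique case with timeit, comparing list.count against collections.Counter; the lengths and trial count below are arbitrary:

import timeit
from collections import Counter

for n in (10, 100, 1000, 5000):
    lst = list(range(n))      # worst case: every element unique
    uniques = set(lst)
    t_count = timeit.timeit(lambda: [lst.count(x) for x in uniques], number=20)
    t_counter = timeit.timeit(lambda: Counter(lst), number=20)
    print("%5d  count: %.5fs  counter: %.5fs" % (n, t_count, t_counter))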

Ruby’s Enumerable#count

Out of curiosity, I wanted to check whether we'd see a similar effect in Ruby. Enumerable#count is implemented in C, after all. Does it too exhibit this pattern?

Implementation

require 'benchmark/ips'

def count(list, uniques)
  uniques.map { |entry| list.count(entry) }
end

def count_with_ht(list, uniques)
  counts = list.each_with_object(Hash.new(0)) { |entry, cts| cts[entry] += 1 }
  uniques.map { |entry| counts[entry] }
end

LENGTH    = 1_000
LIST      = (1..LENGTH).map { rand(100) }.freeze
UNIQUES   = LIST.uniq.freeze
UNIQ_LIST = (1..LENGTH).entries.freeze

Benchmark.ips do |x|
  x.report("with count")      { count(LIST, UNIQUES) }
  x.report("with hash table") { count_with_ht(LIST, UNIQUES) }

  x.report("with count (n^2)")      { count(UNIQ_LIST, UNIQ_LIST) }
  x.report("with hash table (n^2)") { count_with_ht(UNIQ_LIST, UNIQ_LIST) }
end

Results

Not so much:

Calculating -------------------------------------
          with count    295.937  (± 6.8%) i/s -      1.479k
     with hash table      5.419k (± 7.0%) i/s -     27.508k
    with count (n^2)     29.980  (± 6.7%) i/s -    150.000
with hash table (n^2)     2.457k (±23.2%) i/s -     10.800k

In Ruby, the hash table implementation is more performant regardless of whether or not it’s working on a collection of all-unique elements. That’s even after giving the Enumerable#count implementation an advantage by parametrizing the list of unique elements rather than generating it within the method.