As you can see, the benchmarking utilities are a collection of command-line scripts that form a UNIX pipeline. To download the tool and view its documentation, check out the repository. The tool is currently quite basic, and its minimal documentation assumes you're running Linux and know how to install Python modules. If anyone has trouble or would like to see a feature, just open an issue on the GitHub repository and I'll try to respond promptly.
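To give a feel for the pipeline design, here is a minimal sketch of the stdin-to-stdout filter pattern that such scripts follow. This is not the tool's actual code; the script name, the edge-list input format, and the use of the python-louvain package are all assumptions for illustration.

```python
# detect_communities.py -- a hypothetical filter in the style of the
# benchmark's pipeline scripts (not the tool's actual code).
import sys

import networkx as nx
import community as community_louvain  # from the python-louvain package

# Read an edge list from stdin, one "u v" pair per line (assumed format).
G = nx.parse_edgelist(sys.stdin, nodetype=str)

# Run Louvain community detection on the graph.
partition = community_louvain.best_partition(G)

# Write "node community_id" pairs to stdout so the next stage
# in the pipeline can consume them.
for node, comm in partition.items():
    print(node, comm)
```

Because each stage reads from stdin and writes to stdout, the scripts can be chained together with ordinary shell pipes, and any one stage (say, the detection algorithm) can be swapped out independently.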
In the example above, the score of 0.36 indicates that the dorm attribute can be inferred from the communities detected by the Louvain method with an accuracy of 36%. In other words, the score falls between zero and one, where higher is better. Note that even a perfect community detection algorithm might not return a score of one; an algorithm's score is only meaningful when compared with the scores of other algorithms. The benchmark is based on the arguments presented in this paper, which Pádraig Cunningham and I recently published in the Journal of Complex Networks. See the paper for details; this previous post also roughly outlines the benchmark and the reasoning behind it.
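To make the idea behind the score concrete, here is a simplified sketch of one way to measure how well an attribute can be inferred from a partition: predict each node's dorm as the most common dorm in its detected community, and report the fraction of nodes predicted correctly. The measure actually used in the paper is more careful than this; the function below is only an illustration.

```python
from collections import Counter, defaultdict

def attribute_inference_accuracy(partition, attribute):
    """A simplified sketch of the benchmark's idea, not its exact measure.

    partition: dict mapping node -> detected community id
    attribute: dict mapping node -> attribute value (e.g. dorm)
    """
    # Group nodes by detected community.
    members = defaultdict(list)
    for node, comm in partition.items():
        members[comm].append(node)

    # Within each community, predict the majority attribute value;
    # count how many nodes that prediction gets right.
    correct = 0
    for nodes in members.values():
        _, count = Counter(attribute[n] for n in nodes).most_common(1)[0]
        correct += count
    return correct / len(partition)
```

Under this simplified measure, a score of 0.36 would mean that 36% of nodes share the majority dorm of their detected community.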
The utility can benchmark an algorithm on 100 Facebook networks (for more on the Facebook100 dataset, see this post), and it's essentially what I used to benchmark several algorithms in the paper mentioned above. Using the tool, I created tables like this one (a sketch of the kind of driver loop involved follows the table):
|Some results from the Facebook100 benchmark described here, as presented in this paper.|
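For a sense of what running the benchmark across the whole dataset involves, here is a hypothetical driver loop. The .mat file layout follows the Facebook100 distribution (a sparse adjacency matrix `A` and a `local_info` metadata matrix), but the directory layout, the dorm column index, and the scoring function (reused from the sketch above) are assumptions, not the tool's actual interface.

```python
import glob

import networkx as nx
import community as community_louvain
from scipy.io import loadmat

scores = {}
for path in sorted(glob.glob("facebook100/*.mat")):  # assumed directory layout
    mat = loadmat(path)
    # "A" is the sparse adjacency matrix in the Facebook100 .mat files.
    G = nx.from_scipy_sparse_array(mat["A"])
    # Dorm is assumed to be column 4 of the local_info metadata matrix;
    # for simplicity this ignores nodes with missing (zero) values.
    dorm = {i: row[4] for i, row in enumerate(mat["local_info"])}
    partition = community_louvain.best_partition(G)
    # attribute_inference_accuracy is the simplified scoring sketch above.
    scores[path] = attribute_inference_accuracy(partition, dorm)

for path, score in sorted(scores.items()):
    print(path, round(score, 2))
```

Tabulating these per-network scores for several algorithms side by side gives tables like the one above.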
The field of community detection in social networks suffers from a lack of benchmarks that can tell us which algorithms work on a given type of data, and which don't. This problem afflicts the much larger field of clustering, and indeed unsupervised machine learning generally. The lack of benchmarks is only partly caused by the fact that the community detection problem is not precisely defined; it's also just not a nice problem to work on, which means it gets little attention.
In my experience, creating new algorithms is interesting, creative, and fun. Benchmarking them is often an afterthought, a pain-in-the-ass task crammed into the short period before a paper submission deadline. Writing a whole paper focused on the messy, imperfect problem of measuring how well an algorithm works on real data was rather boring and frustrating, but my hope is that it will spare others from this task, and at the same time improve our knowledge of which algorithms work well on Facebook-like data.
Note that to run the benchmark, you'll need to download the Facebook data separately; you can find it at archive.org.