- a measure of an edge's embeddedness in a community
- a threshold value for this measure--edges with an embeddedness value below this threshold are removed from the network.
While collaborating with Bobo Nick and Ulrik Brandes in Konstanz, we stumbled upon an edge filtering method that seems to work quite well on some social network datasets we're interested in such as the Facebook100 dataset, which I've posted on several times. The method extracts what we're calling the Simmelian Backbone; the paper introducing this concept (currently under submission) is available here. It's has been implemented and released in version 2.7 of visone; see this page for details on its use. Let me emphasize that this method is not a community detection algorithm, but rather a way of removing many of the edges from a network in order to make the communities more obvious.
Informal definition
I'll now try to explain the gist of the Simmelian backbone in a high-level, informal way (see the papers for a more exact definition). Let's say we start with an undirected, unweighted network G and the task at hand is to create the Simmelian backbone G'. For each ego, we rank the alters by the number of common neighbors, and take the top ten. Thus, each node n is left with a set of at most ten alters that are most strongly connected to the n, let's call this S_n. Now we perform the final step: we look at all pairs of egos and their most strongly connected alters. For each of these pairs, let's call the ego e and the alter a. We measure the size of the overlap between S_e and S_a and if it's larger than five, then we draw a directed edge in G' from e to a.
The method has two parameters--in the example just presented, the max_ranking parameter was set to 10 and the min_overlap parameter was set to five. In the paper we also discuss a parameter free variant of the method, as well as how to handle the case where the input graph is weighted or directed.
Results on the Caltech Facebook network
As I've mentioned here in the past, it's difficult to visualize the networks in the Facebook100 dataset in such a way that meaningful structure is revealed. The Caltech network is the smallest and makes a good case study because the dormitory attribute (which correspond to houses at Caltech) is thought to be very important for social life, and closely related to community structure. Here's a typical attempt to visualize the Caltech network, in which we've colored nodes according to the house they're in.![]() |
| Caltech Facebook network before edge filtering. Nodes colored by dorm (house). |
If we extract the Simmelian Backbone from this network, community structure pops right out, making for a cleaner visualization. Note the clear relationship between clusters and residential houses.
In the paper, we also extract the Simmelian backbone on a larger network (the University of Chicago Facebook network, with over 200k edges) in which the dorm attribute is known to be relevant. The results are similar to those of the Caltech network: the clusters pop out, and are very closely related to the dorm attribute. After years of struggling to get good visualizations of Facebook networks of this size, it is satisfying to finally see some real progress.
Would the GN method work as well?
Above I mentioned the Girvan Newman method, which iteratively filters out edges based on edge betweenness. We wondered whether that approach would also work well on this network. Here's the result of the Girvan Newman filtering method where modularity is maximized:
The GN method doesn't seem to work as well in a few ways. By the time modularity is maximized, most of the nodes have a degree of zero, which in many settings is undesirable because it makes it unclear which cluster they belong to. Also, the clusters visible in this visualization do not correspond as clearly with the house attribute. One might argue that the problem here is the threshold criterion, which is to maximize modularity. Maybe if we had kept removing edges, the clusters would more closely correspond to the house attribute. We tried this, and it didn't work very well: click here to see the results. Basically the clusters do end up aligning well with the house attribute, but by that point nearly all nodes have been filtered out.
Finally, while the GN method is quite expensive, calculating the Simmelian backbone is a much more efficient operation. The main expense is to enumerate all of the triangles in the network which can be done efficiently in sparse networks--see the paper for more details on computational complexity.
![]() |
| The Simmelian Backbone of the Caltech Facebook network -- "non-community edges" have been removed. Nodes colored by dorm (which are called "houses" at Caltech). Isolated nodes have been removed. |
Would the GN method work as well?
Above I mentioned the Girvan Newman method, which iteratively filters out edges based on edge betweenness. We wondered whether that approach would also work well on this network. Here's the result of the Girvan Newman filtering method where modularity is maximized:
![]() |
| Caltech Facebook network after edges have been removed according to the Girvan Newman method. Again, nodes are colored by house. Isolated nodes have been removed. |
Finally, while the GN method is quite expensive, calculating the Simmelian backbone is a much more efficient operation. The main expense is to enumerate all of the triangles in the network which can be done efficiently in sparse networks--see the paper for more details on computational complexity.
Try it out yourself in visone
As I mentioned above, you can extract the Simmelian Backbone of your own network using visone version 2.7 or greater. The visone implementation is reasonably scalable, meaning you can run it on all of the Facebook100 graphs (some of which contain more than 1 million edges). To load some of the larger graphs, you'll probably have to increase the amount of memory you dedicate to java---see this page for details.


