Fennel: Streaming Graph Partitioning for Massive Scale Graphs
Contributions
Graph partitioning is a key problem to enable efficient solving of a wide range of computational tasks and
querying over large-scale graph data. In this paper, we contribute to the graph partitioning problem as follows.
- We introduce a unifying framework for graph partitioning which enables a well principled design of scalable,
streaming graph partitioning algorithms that are amenable to distributed implementation.
- We show that many previously proposed methods are special instances of this framework,
we derive a novel one-pass, streaming graph partitioning algorithm and show that it
yields significant benefits over previous approaches,
using a large set of real-world and synthetic graphs.
- We found its performance to be overall comparable to the de-facto standard offline software METIS, and it even outperforms it on
numerous real-world graphs. For instance, for the Twitter graph with more than 1.4 billion of edges,
our method partitions the graph in about 40 minutes achieving a balanced partition that cuts as few as 6.8% of edges,
whereas it took more than 8.5 hours by METIS to produce a balanced partition that cuts 11.98% of edges.
-
Furthermore, modularity--a popular measure for community detection [Girvan and Newman, 2002; Newman and Girvan, 2004; Newman, 2006]--is also a special
instance of our framework. We establish the first rigorous approximation algorithm, achieving a guarantee of O(log(k)/k) for partitioning into k clusters.
-
Finally, we evaluate the performance gains by using our graph partitioner while solving standard PageRank computation in a graph processing platform, and observe significant gains in terms of the communication cost and runtime.
Publications
-
Fennel: Streaming Graph Partitioning for Massive Scale Graphs
[Co-authors: Christos Gkantsidis, Bozidar Radunovic, Milan Vojnovic]
Microsoft Technical Report
Links
Datanami.com