Fennel: Streaming Graph Partitioning for Massive Scale Graphs


Contributions

Graph partitioning is a key problem to enable efficient solving of a wide range of computational tasks and querying over large-scale graph data. In this paper, we contribute to the graph partitioning problem as follows.
  • We introduce a unifying framework for graph partitioning which enables a well principled design of scalable, streaming graph partitioning algorithms that are amenable to distributed implementation.
  • We show that many previously proposed methods are special instances of this framework, we derive a novel one-pass, streaming graph partitioning algorithm and show that it yields significant benefits over previous approaches, using a large set of real-world and synthetic graphs.
  • We found its performance to be overall comparable to the de-facto standard offline software METIS, and it even outperforms it on numerous real-world graphs. For instance, for the Twitter graph with more than 1.4 billion of edges, our method partitions the graph in about 40 minutes achieving a balanced partition that cuts as few as 6.8% of edges, whereas it took more than 8.5 hours by METIS to produce a balanced partition that cuts 11.98% of edges.
  • Furthermore, modularity--a popular measure for community detection [Girvan and Newman, 2002; Newman and Girvan, 2004; Newman, 2006]--is also a special instance of our framework. We establish the first rigorous approximation algorithm, achieving a guarantee of O(log(k)/k) for partitioning into k clusters.
  • Finally, we evaluate the performance gains by using our graph partitioner while solving standard PageRank computation in a graph processing platform, and observe significant gains in terms of the communication cost and runtime.

Publications

  • Fennel: Streaming Graph Partitioning for Massive Scale Graphs

    [Co-authors: Christos Gkantsidis, Bozidar Radunovic, Milan Vojnovic]
    Microsoft Technical Report

Links

Datanami.com