Large Graph-Mining: Power Tools and a Practitioner's Guide

KDD 2009

a) Friendship network, b) Cheeger Inequality (Combinatorial Laplacian L=D-A) c) Multidimensional Time-Series as a tensor
d) Image Segmentation produced using the Spectral Rounding Algorithm e) Singular Value Decomposition (SVD) - Latent Semantic Analysis (LSA)


Abstract

How to find patterns in large graphs, spanning Giga and Tera bytes? What are the best tools from matrix algebra, and how can they help us solve graph mining problems? These are exactly the goals of this tutorial. Matrix algebra and graph theory can offer powerful tools and theorems, like SVD, spectral analysis, community detection, and more; we single out the most useful tools, we show the intuition behind them, and we give one or more practical settings that each tool performed well. We also cover the emerging map/reduce architecture, and its impact on large graph mining.

Foils

You can download all foils zipped (POWERPOINT-ZIP , PDF-ZIP).
  • Task 0: Introduction (ppt, pdf)
  • Task 1: Node importance (ppt, pdf)
    • SVD
    • HITS
    • PageRank
  • Task 2: Community detection (ppt, pdf)
    • METIS, Spectral partitioning
    • co-clustering, cross-associations
  • Task 3: Recommendations (ppt, pdf)
    • proximity
  • Task 4: Connection sub-graphs (ppt, pdf)
  • Task 5: Mining graphs over time & tensors (ppt, pdf)
    • PARAFAC, Tucker
  • Task 6: Virus/influence propagation (ppt, pdf)
  • Task 7: Spectral Graph Theory (ppt, pdf)
    • Properties of Adjacency Matrix, Laplacian, 
    • Sparsest cut and Cheeger Inequality, Normalized Laplacian
  • Task 8: Tera/peta graph mining: Hadoop (ppt, pdf)
  • Task 9: Conclusions (ppt, pdf)

Target Audience

The target audience are data mining and machine learning professionals who wish to know the most important matrix algebra tools and their applications in large graph mining.

Prerequisites

Computer science background (B.Sc or equivalent); familiarity with undergraduate linear algebra.

Related Tutorials

  1. Graph Mining Techniques and Their Applications Sharma Chakravarthy EDBT '06
  2. Data Mining for Social Network Analysis, Jaideep Srivastava, Nishith Pathak and Sandeep Mane, ICDM '06
  3. Spectral Graph Theory and its Applications, Daniel Spielman, FOCS '07
  4. Research Issues in Protein Location Image Databases, Christos Faloutsos and Bob Murphy SIGMOD '05
  5. Sensor Mining at Work: Principles and a Water Quality Case-Study, C. Faloutsos and Jeanne VanBriesen, KDD '06
  6. Data Mining the Internet, Christos Faloutsos and Michalis Faloutsos, SIGCOMM '02, INFOCOM '03, SIGMETRICS '03
  7. Large graph mining: patterns, tools and case studies, Christos Faloutsos and Hanghang Tong, CIKM '08

References

  1. Sergey Brin and Lawrence Page. The Anatomy of a Large-Scale Hypertextual Web Search Engine, Computer Networks 30(1-7): 107-117, 1998.
  2. Randy Bryant. Data Intensive Scientific Computing, Tech report. available at http://www.math.cmu.edu/~bryant/pubdir/cmu-cs-07-128.pdf.
  3. Deepayan Chakrabarti, Spiros Papadimitriou, Dharmendra S. Modha, and Christos Faloutsos. Fully Automatic Cross-Associations, KDD 2004, Washington, DC.
  4. Fan R. K. Chung: Spectral Graph Theory (AMS)
  5. Inderjit S. Dhillon, Subramanyam Mallela, and Dharmendra S. Modha. Information-theoretic co-clustering. KDD 2003, Washington, DC.
  6. Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. OSDI'04 ,San Francisco, CA
  7. Petros Drineas, Ravi Kannan, and Michael W. Mahoney. Fast monte carlo algorithms for matrices iii: Computing a compressed approximate matrix decomposition, SIAM Journal of Computing, 2005.
  8. Chris Godsil and Gordon Royle: Algebraic Graph Theory (Springer)
  9. Jon Kleinberg. Authoritative sources in a hyperlinked environment, Proc. 9th ACM-SIAM Symposium on Discrete Algorithms, 1998.
  10. Tamara Kolda, Brett Bader, and Joseph Kenny. Higher-order Web link analysis using multilinear algebra, ICDM 2005, Houston, Texas.
  11. Jure Leskovec, Jon Kleinberg and Christos Faloutsos. Graphs over Time: Densification Laws, Shrinking Diameters and Possible Explanations, KDD 2005, Chicago, IL. ("Best Research Paper" award).
  12. Jure Leskovec, Deepayan Chakrabarti, Jon Kleinberg, and Christos Faloutsos. Realistic, Mathematically Tractable Graph Generation and Evolution, Using Kronecker Multiplication, ECML/PKDD 2005, Porto, Portugal.
  13. Jure Leskovec and Christos Faloutsos. Scalable Modeling of Real Graphs using Kronecker Multiplication, ICML 2007, Corvallis, OR, USA
  14. Jure Leskovec, Mary McGlohon, Christos Faloutsos, Natalie S. Glance, and Matthew Hurst. Patterns of Cascading Behavior in Large Blog Graphs, SDM 2007, Minneapolis, Minnesota.
  15. Bojan Mohar and Svatopluk Poljak: Eigenvalues in Combinatorial Optimization, IMA Preprint Series #939
  16. Shashank Pandit, Duen Horng (Polo) Chau, Samuel Wang and Christos Faloutsos. NetProbe: A Fast and Scalable System for Fraud Detection in Online Auction Networks WWW 2007, Banff, Alberta, Canada, May 8-12, 2007.
  17. Gilbert Strang: Introduction to Applied Mathematics (Wellesley-Cambridge Press)
  18. Jimeng Sun, Dacheng Tao, and Christos Faloutsos. Beyond Streams and Graphs: Dynamic Tensor Analysis, KDD 2006, Philadelphia, PA.
  19. Jimeng Sun, Yinglian Xie, Hui Zhang, Christos Faloutsos. Less is More: Compact Matrix Decomposition for Large Sparse Graphs, SDM 2007, Minneapolis, Minnesota. ("Best Research Paper" award)
  20. Jimeng Sun, Spiros Papadimitriou, Philip S. Yu, and Christos Faloutsos. GraphScope: parameter-free mining of large time-evolving graphs, KDD 2007, San Jose, CA.
  21. Hanghang Tong, Christos Faloutsos, and Jia-Yu Pan. Fast Random Walk with Restart and Its Applications, ICDM 2006, Hong Kong. ("Best Research Paper" award)
  22. Hanghang Tong and Christos Faloutsos. Center-Piece Subgraphs: Problem Definition and Fast Solutions, KDD 2006, Philadelphia, PA.
  23. Hanghang Tong, Brian Gallagher, Tina Eliassi-Rad, and Christos Faloutsos. Fast best-effort pattern matching in large attributed graphs, KDD 2007, San Jose, CA.
  24. Hanghang Tong, Yehuda Koren, and Christos Faloutsos. Fast direction-aware proximity for graph mining, KDD 2007, San Jose, CA.
  25. Hanghang Tong, Spiros Papadimitriou, Philip S. Yu and Christos Faloutsos. Proximity Tracking on Time-Evolving Bipartite Graphs. SDM 2008, Atlanta, GA.
  26. Hanghang Tong, Spiros Papadimitriou, Jimeng Sun, Philip S. Yu, and Christos Faloutsos. Colibri: Fast Mining of Large Static and Dynamic Graphs, KDD 2008, Las Vegas, NV.
  27. Charalampos Tsourakakis: Fast Counting of Triangles in Large Real Networks, without counting: Algorithms and Laws, ICDM '08
  28. Yang Wang, Deepayan Chakrabarti, Chenxi Wang and Christos Faloutsos. Epidemic Spreading in Real Networks: an Eigenvalue Viewpoint, SRDS 2003, Florence, Italy.

Instructors



Christos Faloutsos Gary L. Miller Charalampos Tsourakakis


Christos Faloutsos is a Professor at Carnegie Mellon University. He has received the Presidential Young Investigator Award by the National Science Foundation (1989), the Research Contributions Award in ICDM 2006, twelve ``best paper'' awards, and several teaching awards. He has served as a member of the executive committee of SIGKDD;
he has published over 160 refereed articles, 11 book chapters and one monograph. He holds five patents and he has given over 20 tutorials and 10 invited distinguished lectures. His research interests include data mining for streams and graphs, fractals, database performance, and indexing for multimedia and bio-informatics data.

Prof. Gary L. Miller is most famous for his work in algorithm design. He has designed both sequential and parallel algorithms. These new algorithms have appeared in over eighty papers in journals and prestigious conferences. His most recent interest is parallel algorithm design for problems arising from Scientific Computation including parallel algorithms for: partitioners for finite element meshes, parallel preconditioners for iterative linear system solvers, and parallel and sequential algorithms for mesh generation. Computer simulations for highly deformable soft tissues such as individual red blood cells He has been a program committee member on eleven computer science conferences including: the Annual Symposium on Foundations of Computer Science in 1982 1983, 1986, 1990, and 1992, the 1988 VLSI Algorithms and Architectures, the Annual ACM Symposium on Parallel Algorithms and Architectures in 1988, 1993(chair), and 2006 and the Annual ACM Symposium on the Theory of Computation 1996(chair) and 2002. He was the general chair for the annual ACM Symposium on Parallel Algorithms and Architectures conference. He is an ACM Fellow (2002), and he is a co-recipient of the ACM Paris Kanellakis Theory and Practice Award in 2003.

Charalampos Tsourakakis is a Ph.D. student in the School of Computer Science at Carnegie Mellon University supported by a NSF fellowship. He holds a Diploma in Electrical and Diploma Engineering from the National Technical University of Athens. His research interests include spectral graph theory and data mining.