a) Friendship network, b) Cheeger Inequality (Combinatorial Laplacian L=D-A) c) Multidimensional Time-Series as a tensor
d) Image Segmentation produced using the Spectral Rounding Algorithm e) Singular Value Decomposition (SVD) - Latent Semantic Analysis (LSA)
Abstract
How to find patterns in large graphs, spanning Giga and Tera bytes?
What are the best tools from matrix algebra, and how can they help
us solve graph mining problems? These are exactly the goals of this
tutorial. Matrix algebra and graph theory can offer powerful tools
and theorems, like SVD, spectral analysis, community detection, and
more; we single out the most useful tools, we show the intuition
behind them, and we give one or more practical settings that each
tool performed well. We also cover the emerging map/reduce
architecture, and its impact on large graph mining.
Foils
You can download all foils zipped (
POWERPOINT-ZIP ,
PDF-ZIP).
- Task 0: Introduction (ppt, pdf)
- Task 1: Node importance (ppt, pdf)
-
- Task 2: Community detection (ppt, pdf)
-
- METIS, Spectral partitioning
- co-clustering, cross-associations
- Task 3: Recommendations (ppt, pdf)
-
- Task 4: Connection sub-graphs (ppt, pdf)
- Task 5: Mining graphs over time & tensors (ppt, pdf)
-
- Task 6: Virus/influence propagation (ppt, pdf)
- Task 7: Spectral Graph Theory (ppt,
pdf)
-
- Properties of Adjacency Matrix, Laplacian,
- Sparsest cut and Cheeger Inequality, Normalized
Laplacian
- Task 8: Tera/peta graph mining: Hadoop (ppt, pdf)
- Task 9: Conclusions (ppt, pdf)
Target Audience
The target audience are data mining and machine learning
professionals who wish to know the most important matrix algebra
tools and their applications in large graph mining.
Prerequisites
Computer science background (B.Sc or equivalent); familiarity with
undergraduate linear algebra.
Related Tutorials
- Graph Mining Techniques and Their Applications Sharma Chakravarthy EDBT '06
- Data Mining for Social Network Analysis, Jaideep Srivastava, Nishith Pathak and
Sandeep Mane, ICDM '06
- Spectral Graph Theory and its Applications, Daniel Spielman, FOCS '07
- Research Issues in Protein Location Image Databases, Christos Faloutsos and Bob Murphy SIGMOD '05
- Sensor Mining at Work: Principles and a Water Quality Case-Study, C. Faloutsos and Jeanne VanBriesen, KDD '06
- Data Mining the Internet, Christos Faloutsos and Michalis Faloutsos, SIGCOMM '02, INFOCOM '03, SIGMETRICS '03
- Large graph mining: patterns, tools and case studies, Christos Faloutsos and Hanghang Tong, CIKM '08
References
- Sergey Brin and Lawrence Page. The Anatomy of a Large-Scale
Hypertextual Web Search Engine, Computer Networks 30(1-7): 107-117,
1998.
- Randy Bryant. Data Intensive Scientific Computing, Tech report.
available at
http://www.math.cmu.edu/~bryant/pubdir/cmu-cs-07-128.pdf.
- Deepayan Chakrabarti, Spiros Papadimitriou, Dharmendra S.
Modha, and Christos Faloutsos. Fully Automatic Cross-Associations,
KDD 2004, Washington, DC.
- Fan R. K. Chung: Spectral Graph Theory (AMS)
- Inderjit S. Dhillon, Subramanyam Mallela, and Dharmendra S.
Modha. Information-theoretic co-clustering. KDD 2003, Washington,
DC.
- Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data
Processing on Large Clusters. OSDI'04 ,San Francisco, CA
- Petros Drineas, Ravi Kannan, and Michael W. Mahoney. Fast monte
carlo algorithms for matrices iii: Computing a compressed
approximate matrix decomposition, SIAM Journal of Computing,
2005.
- Chris Godsil and Gordon Royle: Algebraic Graph Theory
(Springer)
- Jon Kleinberg. Authoritative sources in a hyperlinked
environment, Proc. 9th ACM-SIAM Symposium on Discrete Algorithms,
1998.
- Tamara Kolda, Brett Bader, and Joseph Kenny. Higher-order Web
link analysis using multilinear algebra, ICDM 2005, Houston,
Texas.
- Jure Leskovec, Jon Kleinberg and Christos Faloutsos. Graphs
over Time: Densification Laws, Shrinking Diameters and Possible
Explanations, KDD 2005, Chicago, IL. ("Best Research Paper"
award).
- Jure Leskovec, Deepayan Chakrabarti, Jon Kleinberg, and
Christos Faloutsos. Realistic, Mathematically Tractable Graph
Generation and Evolution, Using Kronecker Multiplication, ECML/PKDD
2005, Porto, Portugal.
- Jure Leskovec and Christos Faloutsos. Scalable Modeling of Real
Graphs using Kronecker Multiplication, ICML 2007, Corvallis, OR,
USA
- Jure Leskovec, Mary McGlohon, Christos Faloutsos, Natalie S.
Glance, and Matthew Hurst. Patterns of Cascading Behavior in Large
Blog Graphs, SDM 2007, Minneapolis, Minnesota.
- Bojan Mohar and Svatopluk Poljak: Eigenvalues in Combinatorial
Optimization, IMA Preprint Series #939
- Shashank Pandit, Duen Horng (Polo) Chau, Samuel Wang and
Christos Faloutsos. NetProbe: A Fast and Scalable System for Fraud
Detection in Online Auction Networks WWW 2007, Banff, Alberta,
Canada, May 8-12, 2007.
- Gilbert Strang: Introduction to Applied Mathematics
(Wellesley-Cambridge Press)
- Jimeng Sun, Dacheng Tao, and Christos Faloutsos. Beyond Streams
and Graphs: Dynamic Tensor Analysis, KDD 2006, Philadelphia,
PA.
- Jimeng Sun, Yinglian Xie, Hui Zhang, Christos Faloutsos. Less
is More: Compact Matrix Decomposition for Large Sparse Graphs, SDM
2007, Minneapolis, Minnesota. ("Best Research Paper" award)
- Jimeng Sun, Spiros Papadimitriou, Philip S. Yu, and Christos
Faloutsos. GraphScope: parameter-free mining of large time-evolving
graphs, KDD 2007, San Jose, CA.
- Hanghang Tong, Christos Faloutsos, and Jia-Yu Pan. Fast Random
Walk with Restart and Its Applications, ICDM 2006, Hong Kong.
("Best Research Paper" award)
- Hanghang Tong and Christos Faloutsos. Center-Piece Subgraphs:
Problem Definition and Fast Solutions, KDD 2006, Philadelphia,
PA.
- Hanghang Tong, Brian Gallagher, Tina Eliassi-Rad, and Christos
Faloutsos. Fast best-effort pattern matching in large attributed
graphs, KDD 2007, San Jose, CA.
- Hanghang Tong, Yehuda Koren, and Christos Faloutsos. Fast
direction-aware proximity for graph mining, KDD 2007, San Jose,
CA.
- Hanghang Tong, Spiros Papadimitriou, Philip S. Yu and Christos
Faloutsos. Proximity Tracking on Time-Evolving Bipartite Graphs.
SDM 2008, Atlanta, GA.
- Hanghang Tong, Spiros Papadimitriou, Jimeng Sun, Philip S. Yu,
and Christos Faloutsos. Colibri: Fast Mining of Large Static and
Dynamic Graphs, KDD 2008, Las Vegas, NV.
- Charalampos Tsourakakis: Fast Counting of Triangles in Large
Real Networks, without counting: Algorithms and Laws, ICDM '08
- Yang Wang, Deepayan Chakrabarti, Chenxi Wang and Christos
Faloutsos. Epidemic Spreading in Real Networks: an Eigenvalue
Viewpoint, SRDS 2003, Florence, Italy.
Instructors
Christos Faloutsos
is a
Professor at Carnegie Mellon University. He has received the
Presidential Young Investigator Award by the National Science
Foundation (1989), the Research Contributions Award in ICDM 2006,
twelve ``best paper'' awards, and several teaching awards. He has
served as a member of the executive committee of SIGKDD;
he has published over 160 refereed articles, 11 book chapters and
one monograph. He holds five patents and he has given over 20
tutorials and 10 invited distinguished lectures. His research
interests include data mining for streams and graphs, fractals,
database performance, and indexing for multimedia and
bio-informatics data.
Prof. Gary L. Miller
is most famous for his work in algorithm design. He has
designed both sequential and parallel algorithms. These new
algorithms have appeared in over eighty papers in journals and
prestigious conferences. His most recent interest is parallel
algorithm design for problems arising from Scientific Computation
including parallel algorithms for: partitioners for finite element
meshes, parallel preconditioners for iterative linear system
solvers, and parallel and sequential algorithms for mesh
generation. Computer simulations for highly deformable soft tissues
such as individual red blood cells He has been a program committee
member on eleven computer science conferences including: the Annual
Symposium on Foundations of Computer Science in 1982 1983, 1986,
1990, and 1992, the 1988 VLSI Algorithms and Architectures, the
Annual ACM Symposium on Parallel Algorithms and Architectures in
1988, 1993(chair), and 2006 and the Annual ACM Symposium on the
Theory of Computation 1996(chair) and 2002. He was the general
chair for the annual ACM Symposium on Parallel Algorithms and
Architectures conference. He is an ACM Fellow (2002), and he is a
co-recipient of the ACM Paris Kanellakis Theory and Practice Award
in 2003.
Charalampos Tsourakakis
is a Ph.D. student in the School of Computer Science at Carnegie Mellon University supported by a NSF fellowship.
He holds a Diploma in Electrical and Diploma Engineering from the National Technical
University of Athens. His research interests include spectral graph theory and data mining.