GraphMR: Distributed Graph Match Method using MapReduce

  1. Home
  2. Projects
  3. GraphMR: Distributed Graph Match Method using MapReduce

This work is a distributed graph match method on large data sets. This work was submitted to IEEE Transactions on Kowledge and Data Engineering (TKDE).

Data Generation

This work includes a random graph generator tool. The graph generator is an implementation of the random graph model (Erdős and Rényi 1959) in MapReduce. A graph is constructed in a distributed way across a number of cluster nodes. Therefore, we can get large graphs in a scalable manner from this generator. Besides, It enables users to specify various parameters as follows:

  • -directed or undirected graphs
  • -the number of vertices
  • -average of degree
  • -the type of attributed graphs
    • -vertices only has attributes
    • -vertices and edges has attributes
  • -the number of distinct vertex attributes
  • -the number of distinct edge attributes

Random Graph Model

There are two standard models for generating random graphs. One is G(n, m) model, where n is the number of vertices and m is the number of edges. This model was proposed by Erdős and Rényi [1]. In order to generate a random graph, this model randomly chooses m edges among nC2 possible edges. Another one is G(n,p), where n is the number of vertices and p is probability of an edge between two vertices among all possible edges. This model was first introduced by E. Gilbert [2]. Our random graph generator provides both models.

Usages

How to Build

Unpack the downloaded source code and build the source as follows:

$ tar xzvf graphmr-0.1-SNAPSHOT.tar.gz
$ cd graphmr-0.1-SNAPSHOT
$ mvn assembly:assembly -DskipTests
$ ls -l target/graphmr-0.1-SNAPSHOT.jar
target/graphmr-0.1-SNAPSHOT.jar

Options

The following command prints out various options.

$ hadoop jar graphmr-0.1-SNAPSHOT.jar gen
usage: Usage: graphmr.generator.GraphGenerator 
 -a                the number of distinct vertex attributes
 -d,--directed          directed graph or undirected (default: undirected)
    --debug             debug mode (default: disabled:)
 -e                the number of distinct edge attributes
    --expo         a exponent value- this option is only used in zipf
 -k                average of degree
 -l,--labeled      none - no label, v - only vertex label, ve -
                        vertex and edge labels (default)
    --labeldist    random - uniform distribution (default), zipf -
                        zipf distribution
 -m                the number of map tasks
    --model        which model: erdos (Erdos and Reiny model),
                        gilbert (a variation of Erdos and Reiny Model),
                        and zipf
 -o,--out          output filename
 -p,--prob         a probability - this option is only used in
                        gilbert
 -r                the number of reduce tasks
 -t,--text              the program outputs vertices as text format

Examples

generating a undirected graph with 10000 vertices and 5 average degree.

# hadoop jar graphmr-0.1-SNAPSHOT.jar gen -k 5 10000 -o out

generating a undirected graph with 10000 vertices and 10 distinct vertex

# hadoop jar graphmr-0.1-SNAPSHOT.jar gen -k 5 -a 10 10000 -o out

generating a undirected graph with 10000 vertices, 10 distinct vertex attributes, and 20 distinct edge attributes.

# hadoop jar graphmr-0.1-SNAPSHOT.jar gen -k 5 -a 10 -e 20 10000 -o out

generating a directed graph with 10000 vertices

# hadoop jar graphmr-0.1-SNAPSHOT.jar gen -d -o out 10000

Source Code

Download

References

  1. Erdős, P. Rényi, A. “On Random Graphs I” in Publ. Math. Debrecen 6, p. 290-297, 1959.
  2. Gilbert, E.N. “Random Graphs”. Annals of Mathematical Statistics 30: 1141–1144, 1959.
  3. Erdős-Rényi model, Wikipedia
Menu