# GraphMR: Distributed Graph Match Method using MapReduce

This work is a distributed graph match method on large data sets. This work was submitted to IEEE Transactions on Kowledge and Data Engineering (TKDE).

## Data Generation

This work includes a random graph generator tool. The graph generator is an implementation of the random graph model (Erdős and Rényi 1959) in MapReduce. A graph is constructed in a distributed way across a number of cluster nodes. Therefore, we can get large graphs in a scalable manner from this generator. Besides, It enables users to specify various parameters as follows:

- -directed or undirected graphs
- -the number of vertices
- -average of degree
- -the type of attributed graphs
- -vertices only has attributes
- -vertices and edges has attributes

- -the number of distinct vertex attributes
- -the number of distinct edge attributes

### Random Graph Model

There are two standard models for generating random graphs. One is *G*(*n, m*) model, where *n* is the number of vertices and *m* is the number of edges. This model was proposed by Erdős and Rényi [1]. In order to generate a random graph, this model randomly chooses *m* edges among nC2 possible edges. Another one is *G*(*n*,*p*), where *n* is the number of vertices and *p* is probability of an edge between two vertices among all possible edges. This model was first introduced by E. Gilbert [2]. Our random graph generator provides both models.

## Usages

### How to Build

Unpack the downloaded source code and build the source as follows:

$ tar xzvf graphmr-0.1-SNAPSHOT.tar.gz $ cd graphmr-0.1-SNAPSHOT $ mvn assembly:assembly -DskipTests $ ls -l target/graphmr-0.1-SNAPSHOT.jar target/graphmr-0.1-SNAPSHOT.jar

### Options

The following command prints out various options.

$ hadoop jar graphmr-0.1-SNAPSHOT.jar gen usage: Usage: graphmr.generator.GraphGenerator -a the number of distinct vertex attributes -d,--directed directed graph or undirected (default: undirected) --debug debug mode (default: disabled:) -e the number of distinct edge attributes --expo a exponent value- this option is only used in zipf -k average of degree -l,--labeled none - no label, v - only vertex label, ve - vertex and edge labels (default) --labeldist random - uniform distribution (default), zipf - zipf distribution -m the number of map tasks --model which model: erdos (Erdos and Reiny model), gilbert (a variation of Erdos and Reiny Model), and zipf -o,--out output filename -p,--prob a probability - this option is only used in gilbert -r the number of reduce tasks -t,--text the program outputs vertices as text format

## Examples

generating a undirected graph with 10000 vertices and 5 average degree.

# hadoop jar graphmr-0.1-SNAPSHOT.jar gen -k 5 10000 -o out

### generating a undirected graph with 10000 vertices and 10 distinct vertex

# hadoop jar graphmr-0.1-SNAPSHOT.jar gen -k 5 -a 10 10000 -o out

generating a undirected graph with 10000 vertices, 10 distinct vertex attributes, and 20 distinct edge attributes.

# hadoop jar graphmr-0.1-SNAPSHOT.jar gen -k 5 -a 10 -e 20 10000 -o out

generating a directed graph with 10000 vertices

# hadoop jar graphmr-0.1-SNAPSHOT.jar gen -d -o out 10000

## Source Code

## References

- Erdős, P. Rényi, A. “On Random Graphs I” in Publ. Math. Debrecen 6, p. 290-297, 1959.
- Gilbert, E.N. “Random Graphs”. Annals of Mathematical Statistics 30: 1141–1144, 1959.
- Erdős-Rényi model, Wikipedia