Parallel Key Value Pattern Matching Model

IJSRD - International Journal for Scientific Research & Development| Vol. 2, Issue 08, 2014 | ISSN (online): 2321-0613
R. Senthamil Selvi¹, Dr. T. Abdul Razak²
¹Assistant Professor, ²Associate Professor
¹,²Department of Computer Science,
Jamal Mohamed College (Autonomous), Tiruchirappalli
Abstract: Mining frequent itemsets from huge transactional
databases is an important task in data mining. Finding
frequent itemsets is a key step in extracting association
rules, which describe relationships among items in large
datasets. Many algorithms have been developed to find
frequent itemsets. This work presents a summary of existing
approaches and a new parallel key value pattern matching
model, which shards a large-scale mining task into
independent, parallel tasks. The model discovers the
frequent itemsets in the database, and its capability and
efficiency are shown in terms of time consumption. It also
avoids high computational cost.
Keywords: Data Mining, FP-Growth, Frequent Itemset Mining,
Association Rule Mining
I. INTRODUCTION
Data mining is a collection of processes for the efficient
discovery of previously unknown, valid, useful and
understandable patterns in large databases. The patterns
should be actionable, so that they may be used in an
enterprise's decision-making processes. Many software tools
are available for analyzing the data in large databases.
Mining frequent patterns is an important concept in
data mining. An itemset is considered frequent when its
support reaches a minimum support threshold. Association
rule mining discovers relations between variables in large
databases. A maximal frequent itemset is a frequent itemset
that has no frequent proper superset. Its main purpose is to
represent a large number of results compactly as a pattern.
A closed frequent itemset is a frequent itemset that
has no proper superset with the same support, so the closed
itemsets are linked to all frequent itemsets. Items can be
linked to one another to form closed groups. For example,
take four items s1, s2, s3 and s4. If the first three items
are linked to each other as (s1, s2), (s1, s3) and (s2, s3),
and they always occur together, then the three items form a
group and {s1, s2, s3} is a closed itemset. Here too, the
main purpose is to represent a large number of results
compactly as a pattern.
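The notions of support, closed itemset and maximal itemset can be illustrated with a minimal sketch. It assumes the standard definitions (a closed frequent itemset has no proper superset with the same support; a maximal frequent itemset has no frequent proper superset). The toy transactions, the threshold and all function names are hypothetical and are not taken from the paper.

```python
from itertools import combinations

# Hypothetical toy transaction database (not from the paper).
TRANSACTIONS = [
    {"s1", "s2", "s3"},
    {"s1", "s2", "s3"},
    {"s1", "s2"},
    {"s4"},
]
MIN_SUPPORT = 2  # assumed absolute support threshold


def support(itemset, transactions):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def frequent_itemsets(transactions, min_support):
    """Brute-force enumeration; only sensible for tiny examples."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            count = support(set(combo), transactions)
            if count >= min_support:
                result[frozenset(combo)] = count
    return result


def closed_itemsets(frequent):
    """Frequent itemsets with no proper superset of equal support."""
    return {s for s, c in frequent.items()
            if not any(s < t and c == frequent[t] for t in frequent)}


def maximal_itemsets(frequent):
    """Frequent itemsets with no frequent proper superset."""
    return {s for s in frequent if not any(s < t for t in frequent)}


frequent = frequent_itemsets(TRANSACTIONS, MIN_SUPPORT)
closed = closed_itemsets(frequent)
maximal = maximal_itemsets(frequent)
```

In this toy data seven itemsets are frequent, but only {s1, s2} and {s1, s2, s3} are closed and only {s1, s2, s3} is maximal, which shows how both notions summarize many frequent itemsets with few patterns.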
II. MOTIVATION
Business data are stored in computers, which allows users to
navigate through the data in real time. The evolution of
data mining is supported by three technologies: data
collection, high-performance computing and data mining
algorithms. Many algorithms exist for frequent itemset
mining, and each performs well in its own setting. In
particular, the FP-growth algorithm avoids the generation
of large numbers of candidate sets; its main idea is to
maintain a frequent pattern tree. However, most frequent
itemset mining algorithms do not use the concept of a
parallel key value model, which distributes the work across
the program so that frequent pattern data can be searched
and retrieved easily.
III. RELATED WORK
Jiawei Han et al. [1] proposed the FP-growth approach for
mining frequent itemsets without candidate generation. It is
based on the FP-tree, an extended prefix-tree structure for
storing quantitative information about frequent patterns.
Several optimizations are also available to speed up
FP-growth.
Christian Borgelt [2] proposed a C implementation of
the FP-growth algorithm. Pruning is achieved by traversing
the levels of the FP-tree from top to bottom. In this
implementation, the initial FP-tree is built top-down from a
main memory representation of the transaction database as a
simple list of integer arrays. With respect to item
ordering, FP-growth behaves in exactly the opposite way to
Apriori, whose implementation usually runs faster if items
are sorted in ascending order.
Aiman Moyaid Said et al. [3] presented a comparative
study of FP-growth variations. FP-growth is an alternative
to the Apriori-based approach. It represents the frequent
itemsets in a frequent pattern tree, or FP-tree, which
retains the itemset information. Using this compact tree
structure, the FP-growth algorithm mines all the frequent
itemsets.
B. Santhosh Kumar et al. [4] compared the memory
usage and time usage of the Apriori algorithm and the
FP-growth algorithm. FP-growth uses a compact data structure
and eliminates repeated database scans. The algorithm has
advantages such as completeness and compactness.
Haoyuan Li et al. [5] built on FP-growth, which follows
the divide-and-conquer principle: it decomposes a mining
task into smaller tasks and totally avoids candidate
generation. Their paper notes that parallel algorithms had
been developed to reduce memory use and computational cost
on every machine, but that earlier work on parallelizing
FP-growth suffers from high communication cost. They
proposed a MapReduce model of the parallel FP-growth
algorithm (PFP), which slices a large-scale mining task into
autonomous computational tasks and maps them onto MapReduce
jobs, achieving near-linear speedup. The approach is based
on a novel data and computation distribution scheme, which
virtually eliminates communication among computers and uses
the MapReduce model. It is effective in mining tag-tag
associations and webpage-webpage associations to support
query recommendation or related search.
Bharat Gupta et al. [6] analyzed the FP-growth
algorithm, which compresses the database of frequent
itemsets into a frequent pattern tree recursively, in the
same order of magnitude as the number of frequent patterns.
It then divides the compressed database into a set of
conditional databases. The FP-growth technique constructs
conditional frequent pattern trees and conditional pattern
bases from the database for the items that satisfy the
minimum support.

Marek Wojciechowski et al. [7] adapted the common
counting method to work with the FP-Growth algorithm and
evaluated the efficiency of both methods when FP-Growth is
used as the basic mining algorithm. They consider the
problem of optimizing batches of frequent itemset queries.
The paper uses multiple-query optimization methods, namely
common counting and mine merge. These methods reduce the I/O
cost by identifying common execution tasks and executing
them only once for the whole data. The experiments show that
common counting for FP-Growth reduces the overall processing
time.
E. R. Naganathan et al. [8] worked on structured data
mining, a major research topic in data mining. One of the
common representations of structured data is the graph,
which is an alternative approach to modeling objects.
Graph-based data mining (GDM) offers a number of methods to
mine the relational aspects of data; it is the task of
finding novel and understandable graph-theoretic patterns in
a graph representation of data. The paper presents a new
normalization technique for the subgraphs obtained from the
FP-growth model. According to the authors, this process may
serve as a ranking scheme among the mined subgraphs and play
an efficient role in subgraph applications.
IV. EXISTING PARALLEL FP-GROWTH MODEL
Parallel FP-Growth (PFP) mines the complete set of frequent
patterns by pattern fragment growth in parallel. It
generally depends on distributed machines, each of which
executes an independent group of mining tasks. The FP-Growth
algorithm runs much faster than Apriori, and the parallel
FP-Growth algorithm is faster still. PFP converts the
database into new databases of group-dependent transactions,
so that the FP-trees built from different group-dependent
transactions are independent. This eliminates the
computational dependencies between machines. PFP has also
been shown to be promising for supporting query
recommendation for search engines.
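The conversion into group-dependent transactions can be sketched as follows. This is a minimal, single-process illustration of the mapper idea as we read it from the PFP paper [5]: each transaction is ordered by the F-list and, scanning from the least frequent item backwards, one prefix is emitted per item group. The item-to-group table, the F-list and the function name are hypothetical.

```python
# Hypothetical item-to-group assignment; in PFP it is derived from the
# grouped F-list. Items and groups here are made up for illustration.
ITEM_GROUP = {"a": 0, "b": 0, "c": 1, "d": 1}
# Assumed F-list: frequent items in descending order of support.
F_LIST = ["a", "b", "c", "d"]


def group_dependent_transactions(transaction):
    """Return (group_id, prefix) pairs for one transaction.

    Non-frequent items are dropped, the rest are ordered by the F-list,
    and for each group the longest prefix ending in one of its items is
    emitted exactly once.
    """
    rank = {item: pos for pos, item in enumerate(F_LIST)}
    ordered = sorted((i for i in transaction if i in rank), key=rank.get)
    emitted = set()
    pairs = []
    for j in range(len(ordered) - 1, -1, -1):
        gid = ITEM_GROUP[ordered[j]]
        if gid not in emitted:
            emitted.add(gid)
            pairs.append((gid, ordered[: j + 1]))
    return pairs


# "x" is not in the F-list and is therefore removed.
pairs = group_dependent_transactions({"d", "a", "c", "x"})
```

Because each group receives every prefix it needs, an FP-tree can be built per group without consulting the other groups, which is the independence property described above.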
PFP addresses the resource challenges of the FP-Growth
algorithm: storage, computation distribution, costly
communication and the support threshold value. Given a
transaction database, PFP uses three MapReduce phases to
parallelize FP-Growth.
The PFP framework has five stages of computation:
sharding, parallel counting, grouping items, parallel
FP-Growth and aggregating. Parallel counting in PFP is a
classical application of the MapReduce approach.
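The parallel counting stage can be sketched in a few lines. This is only an in-process stand-in for a MapReduce job, assuming a thread pool in place of distributed mappers; the shards and function names are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shards of a transaction database (one list per mapper).
SHARDS = [
    [["a", "b"], ["a", "c"]],
    [["a", "b", "c"], ["b"]],
]


def map_count(shard):
    """Map step: count, per shard, the transactions containing each item."""
    counts = Counter()
    for transaction in shard:
        counts.update(set(transaction))
    return counts


def reduce_counts(partials):
    """Reduce step: sum the per-shard counts item by item."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def parallel_count(shards):
    """Run the map step on all shards concurrently, then reduce."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(map_count, shards))
    return reduce_counts(partials)


counts = parallel_count(SHARDS)
```

The resulting counts are the item supports from which the F-list (items sorted by descending support) would be derived.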
PFP uses the MapReduce approach to shard a large-scale
mining task into independent computational tasks. It is also
able to address the issues of memory use and fault
tolerance. PFP is therefore effective in mining tag-tag
associations and webpage-webpage associations to support
query recommendation. The disadvantage of this method is its
reliance on distributed machines, which increases the cost
of the machines.
V. THE PROPOSED MODEL
This model uses frequent patterns and aims to work faster
than other methods. Two tasks are used, XModel and PModel.
The frequent pattern experiments indicate that the PModel
task works faster than XModel, so computing time is saved in
PModel. The model is intended to be suitable for all
frequent itemset mining algorithms in data mining.
XModel retrieves data from the user interface and
generates a frequent itemset. If the items are equal, the
process ends. Otherwise it returns to the temporary database
and executes the whole process again until the condition is
true.
PModel retrieves data from the user interface and
distributes the work using key-value pairs to prepare
frequent item sets. The intersect operation is executed
between the frequent item sets, producing further frequent
lists, each called an F-list. The group of all F-lists is
called the G-list. The model then checks whether the G-list
contains equal frequent items; if so, the process ends.
Otherwise the whole process is repeated until the frequent
item sets are equal.
A. Searching Algorithm
The searching algorithm illustrated in Fig. 1 sorts the set
of items in descending order and connects to a database
using JdbcOdbcDriver; this process is shown in steps 8 to
11. It then checks whether the record set is null and moves
to the next record set; otherwise the record set is cleared.
To select a frequent item from the database, a query
statement is executed that selects the product from the
transaction table. After the selection process is completed,
the database connection is closed; this is shown in steps 18
to 20. All items are then collected to generate groups.
Fig. 1: The Searching Algorithm
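Since the figure is not reproduced here, the flow of the searching algorithm can be approximated with a short sketch. It is not the authors' code: it swaps the JdbcOdbcDriver connection for Python's built-in sqlite3 module with an in-memory database, and the table name `transaction_table`, its columns and the sample rows are hypothetical.

```python
import sqlite3
from collections import Counter


def frequent_products(rows):
    """Load rows, select the product column, and rank products by count."""
    conn = sqlite3.connect(":memory:")  # stand-in for the JDBC-ODBC connection
    try:
        conn.execute(
            "CREATE TABLE transaction_table (tid INTEGER, product TEXT)"
        )
        conn.executemany("INSERT INTO transaction_table VALUES (?, ?)", rows)
        cursor = conn.execute("SELECT product FROM transaction_table")
        counts = Counter(product for (product,) in cursor)
    finally:
        conn.close()  # the connection is closed after the selection
    # Descending order of frequency; ties broken alphabetically.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical sample data: (transaction id, product).
ROWS = [
    (1, "bread"), (1, "milk"),
    (2, "bread"),
    (3, "bread"), (3, "milk"), (3, "tea"),
]
ranked = frequent_products(ROWS)
```

The ranked list plays the role of the collected items from which groups are generated.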

B. XModel Algorithm
Fig. 2: The XModel Algorithm
The XModel illustrated in Fig. 2 executes over the
whole database, which takes more time. It first reads the
database, then selects all items and processes them, so the
frequent items are displayed at the end of the program.
XModel also prints the starting and ending time; this
process is shown in steps 14 to 18. The total duration is
calculated as the difference between the ending time and the
starting time, and is reported in milliseconds.
C. PModel Algorithm
The PModel algorithm illustrated in Fig. 3 works as
follows:
(1) Scan the transaction database once to find all
frequent items and their supports.
(2) Sort the frequent items in descending order of their
support.
(3) Get the first transaction from the transaction
database. Remove all non-frequent items and list
the remaining items according to the order in the
sorted frequent items.
(4) Get the next transaction from the transaction
database. Remove all non-frequent items and list
the remaining items according to the order in the
sorted frequent items.
(5) Repeat step 4 until all transactions of the
database are processed.
(6) Group all sorted frequent itemsets and display the
start time and end time.
Fig. 3: The PModel Algorithm
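The scanning, sorting and filtering steps listed above can be sketched as follows. This is a minimal illustration under stated assumptions: an absolute support threshold, alphabetical tie-breaking and hypothetical sample transactions. The grouping and timing step is omitted.

```python
from collections import Counter


def order_transactions(transactions, min_support):
    """Return the sorted frequent items and the filtered transactions."""
    # One scan to find the support of every item.
    support = Counter(item for t in transactions for item in set(t))
    # Keep frequent items, sorted by descending support (ties by name).
    frequent = sorted(
        (i for i, c in support.items() if c >= min_support),
        key=lambda i: (-support[i], i),
    )
    rank = {item: pos for pos, item in enumerate(frequent)}
    # For every transaction, drop non-frequent items and reorder the rest.
    ordered = [
        sorted((i for i in set(t) if i in rank), key=rank.get)
        for t in transactions
    ]
    return frequent, ordered


# Hypothetical sample transactions.
TRANSACTIONS = [["b", "a", "d"], ["a", "c"], ["c", "b", "a"]]
frequent, ordered = order_transactions(TRANSACTIONS, min_support=2)
```

Item "d" occurs only once and is removed, while the remaining items appear in every transaction in the same global order, which is what allows shared prefixes to be grouped later.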
D. Architectural Diagram
Fig. 4: Parallel Key Value Pattern Matching Model Using
XModel
In Fig. 4, a single program searches the data in the
database with the help of a single key value, so it produces
a large frequent item set. Obtaining an exact frequent item
set is difficult because the processes take a lot of time to
execute. If the user gets equal frequent item sets, the
process ends. Otherwise the process is repeated until the
user gets the exact frequent item set.
Fig. 5: Parallel Key Value Pattern Matching using PModel
The architectural diagram for the Parallel Key Value
Pattern Matching Model using PModel is illustrated in Fig.
5. The PModel contains several programs, each searching the
data in the database with the help of a key value. Each
program produces some frequent itemsets, denoted frequent
set 1, frequent set 2 and frequent set 3. The model then
performs the intersection between frequent set 1 and
frequent set 2, frequent set 2 and frequent set 3, and
frequent set 3 and frequent set 1. These operations give
further frequent lists, called F-list I, F-list II and
F-list III respectively. All F-lists are then grouped to
give a frequent item set, denoted the “G-List”. If the user
gets equal frequent item sets, the process ends. Otherwise
the process is repeated until the user gets the exact
frequent item set.
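The PModel flow described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that the data is partitioned by key, that each partition is mined concurrently in a thread pool, and that "grouping" the F-lists means taking their union. The partitions, keys and function names are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def frequent_set(transactions, min_support):
    """Items whose support within one partition reaches the threshold."""
    counts = Counter(item for t in transactions for item in set(t))
    return {item for item, c in counts.items() if c >= min_support}


def build_g_list(partitions, min_support):
    """Mine each key's partition in parallel, intersect pairwise, group."""
    keys = sorted(partitions)
    with ThreadPoolExecutor() as pool:
        sets = list(
            pool.map(lambda k: frequent_set(partitions[k], min_support), keys)
        )
    n = len(sets)
    # Set 1 with set 2, set 2 with set 3, ..., last set with set 1.
    f_lists = [sets[i] & sets[(i + 1) % n] for i in range(n)]
    g_list = set().union(*f_lists)  # assumption: grouping means union
    return f_lists, g_list


# Hypothetical key-value partitions of a transaction database.
PARTITIONS = {
    "k1": [["a", "b"], ["a", "b", "c"]],
    "k2": [["a", "c"], ["a", "b", "c"]],
    "k3": [["a", "b"], ["b", "a", "d"]],
}
f_lists, g_list = build_g_list(PARTITIONS, min_support=2)
```

With three partitions the pairwise intersections correspond to F-list I, F-list II and F-list III, and the union of the three is the G-list.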
VI. RESULTS AND DISCUSSION
This section presents the comparison table and graph for
XModel and PModel. The proposed model is applied to
transaction data.
Table 1 summarizes the results.
Number of    Finishing Time (Milliseconds)
Records      XModel      PModel
25           844         194
50           1567        1477
75           3010        2270
100          7929        3509
Table 1: Values for Comparison Graph
The execution times of XModel and PModel, measured
in milliseconds, differ from each other and depend on the
number of records, as shown in Table 1. For example, for 100
records XModel takes 7929 milliseconds to execute, while
PModel takes only 3509 milliseconds.
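The relative gain can be made explicit by dividing the XModel time by the PModel time for each row of Table 1. The paper does not give a formula for speedup, so the ratio used here is an assumption; the timings are those reported in Table 1.

```python
# Timings from Table 1: records -> (XModel ms, PModel ms).
TABLE_1 = {
    25: (844, 194),
    50: (1567, 1477),
    75: (3010, 2270),
    100: (7929, 3509),
}


def speedup(x_ms, p_ms):
    """Assumed definition: XModel time divided by PModel time."""
    return x_ms / p_ms


speedups = {n: round(speedup(x, p), 2) for n, (x, p) in TABLE_1.items()}
```

Under this definition PModel is faster for every record count in Table 1, although the ratio varies from about 1.06 (50 records) to about 4.35 (25 records).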
Fig. 6: XModel Vs PModel
Fig. 6 shows the comparison graph of XModel vs.
PModel. The X-axis represents the number of records and the
Y-axis the time in milliseconds. The bar chart distinguishes
XModel from PModel. The experiments show that the Parallel
Key Value Pattern Matching Model reduces the overall
processing time.
VII. CONCLUSION
The Parallel Key Value Pattern Matching Model is intended to
be suitable for all frequent itemset mining algorithms,
which usually operate on large-scale distributed data. The
results demonstrate that the model is effective for
discovering frequent itemsets. The model contains two
methods, XModel and PModel, which denote the existing model
and the proposed model respectively. XModel takes more time
to execute a program. The comparison is based on speedup and
efficiency, and PModel produced better results than XModel
on both.
REFERENCES
[1] Jiawei Han, Jian Pei, Yiwen Yin, Runying Mao,
“Mining Frequent Patterns without Candidate
Generation: A Frequent-Pattern Tree Approach”,
Data Mining and Knowledge Discovery, 8, 53-87,
Kluwer Academic Publishers, Netherlands, 2004.
[2] Christian Borgelt, “An Implementation of the
FP-Growth Algorithm”, Department of Knowledge
Processing and Language Engineering, Germany,
2005.
[3] Aiman Moyaid Said, Dr. P. D. D. Dominic, Dr.
Azween B. Abdullah, “A Comparative Study of
FP-Growth Variations”, Department of Computer and
Information Sciences, International Journal of
Computer Science and Network Security, Vol. 9,
No. 5, Petronas, May 2009.
[4] B. Santhosh Kumar and K. V. Rukmani,
“Implementation of Web Usage Mining using Apriori
and FP Growth Algorithms”, Department of
Computer Science, Int. J. of Advanced Networking
and Applications, Vol. 1, Issue 06, Pages 400-404,
Ketti, The Nilgiris, Feb-April 2010.
[5] Haoyuan Li, Yi Wang, Dong Zhang, Ming Zhang,
Edward Chang, “PFP: Parallel FP-Growth for Query
Recommendation”, Google Beijing Research, China,
2010.
[6] Bharat Gupta and Dr. Deepak Garg, “FP-Tree Based
Algorithm Analysis: FP-Growth, COFI-Tree and
CT-PRO”, Department of Computer Science,
International Journal on Computer Science and
Engineering (IJCSE), ISSN: 0975-3397, Vol. 3, No.
7, Patiala, India, July 2011.
[7] Marek Wojciechowski, Krzysztof Galecki, and
Krzysztof Gawronek, “Concurrent Processing of
Frequent Itemset Queries using FP-Growth
Algorithm”, Department of Computer Science,
Poland.
[8] E. R. Naganathan, S. Narayanan and K. Ramesh
Kumar, “FP-growth Based new normalization for sub
graph ranking”, Department of Computer
Application, International Journal of Database
Management System (IJDMS), Vol. 3, No. 1, Tamil
Nadu, February 2011.
