An efficient construction and application usefulness of rectangle greedy covers

Ouchi Koji; Nakamura Atsuyoshi; Kudo Mineichi

Title	en An efficient construction and application usefulness of rectangle greedy covers
Creator	en Ouchi, Koji en Nakamura, Atsuyoshi NRID 1000050344487 en Kudo, Mineichi NRID 1000060205101
Accessrights	open access
Subject	Other en Greedy cover Other en Axis-parallel hyperrectangle Other en Classification Other en Data mining NDC 007
Description	Abstract en We develop efficient methods for constructing a rectangle greedy cover (RGC) and evaluate its usefulness in applications. An RGC is a greedy cover of the set of given positive instances by exclusive axis-parallel hyperrectangles, namely, axis-parallel hyperrectangles that exclude all the given negative instances. An RGC is expected to be a compact classification rule with high readability because the number of its component rectangles is expected to be small and it can be seen as a disjunctive normal form, which is one of the most readable representations for humans. We propose two approaches to RGC construction: the enumeration approach and the direct approach. In the enumeration approach, the maximal exclusive positive subsets (MEPSs) are enumerated first, and then an ordinary greedy set covering is performed using the enumerated MEPSs. We clarify the relation between enumeration of the maximal frequent itemsets and enumeration of the MEPSs, and convert LCMmax [1], an efficient enumeration algorithm for maximal frequent itemsets, into LCMmax.R-naive, an enumeration algorithm for MEPSs. We also develop a more efficient version of LCMmax.R-naive, called LCMmax.R, by incorporating effective dynamic reordering of instances using excluded frequency and a bit-parallel exclusiveness check. In the direct approach, each component MEPS of an RGC is searched for not among enumerated MEPSs but directly in the dataset consisting of the remaining uncovered positive instances and all the negative instances. We develop an algorithm called MRF that efficiently finds a maximum-sized MEPS for given positive and negative instances. MRF is obtained from LCMmax.R by modifying it to find only a maximum-sized MEPS. An RGC is constructed by MRF repetition, that is, by repeatedly executing MRF on the remaining uncovered positive instances.
According to our experimental evaluation using UCI-repository datasets, LCMmax.R was about 5-11 times faster than LCMmax.R-naive, which indicates the effectiveness of the two improvements introduced. MRF repetition, however, was significantly faster than LCMmax.R, and it was fast enough for practical use. The experimental results using UCI-repository datasets also showed that the accuracy of a nearest rectangle classifier using an RGC is close to that using the hyperrectangles output by the randomized subclass method (RSM) [2], even though the number of component rectangles of an RGC is significantly smaller than the number of hyperrectangles output by RSM. The performance of RGC was also shown to be comparable to that of six popular classifiers, including logistic regression and support vector machines. The disjunctive normal form representation of the classification rules obtained by RGC was demonstrated to be simpler and more readable than those obtained by RSM and C4.5. (C) 2013 Elsevier Ltd. All rights reserved.
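The direct approach described in the abstract (repeatedly take a maximum-sized exclusive rectangle over the still-uncovered positive instances) can be illustrated with a minimal sketch. This is not the MRF or LCMmax.R algorithm from the paper: the inner search is a brute-force enumeration suitable only for tiny, low-dimensional data, and the function names `inside`, `max_exclusive_rectangle`, and `rectangle_greedy_cover` are hypothetical names chosen for illustration.

```python
from itertools import combinations_with_replacement, product


def inside(point, rect):
    """True if point lies in the closed axis-parallel rectangle rect."""
    return all(lo <= x <= hi for x, (lo, hi) in zip(point, rect))


def max_exclusive_rectangle(uncovered, negatives):
    """Brute-force stand-in for MRF (an assumption, not the paper's method).

    Tries every rectangle whose bounds are coordinates of uncovered
    positives and keeps the one that covers the most uncovered positives
    while containing no negative instance.
    """
    dims = len(uncovered[0])
    per_dim = []
    for d in range(dims):
        vals = sorted({p[d] for p in uncovered})
        # All (lo, hi) pairs with lo <= hi in this dimension.
        per_dim.append(list(combinations_with_replacement(vals, 2)))

    best_covered = []
    for rect in product(*per_dim):
        if any(inside(n, rect) for n in negatives):
            continue  # not exclusive: contains a negative instance
        covered = [p for p in uncovered if inside(p, rect)]
        if len(covered) > len(best_covered):
            best_covered = covered

    if not best_covered:
        return None, []
    # Shrink to the bounding box of the covered positives.
    tight = tuple(
        (min(p[d] for p in best_covered), max(p[d] for p in best_covered))
        for d in range(dims)
    )
    return tight, best_covered


def rectangle_greedy_cover(positives, negatives):
    """Greedy cover of positives by exclusive rectangles."""
    uncovered = list(positives)
    cover = []
    while uncovered:
        rect, covered = max_exclusive_rectangle(uncovered, negatives)
        if rect is None:
            raise ValueError("a positive instance coincides with a negative one")
        cover.append(rect)
        uncovered = [p for p in uncovered if p not in covered]
    return cover


if __name__ == "__main__":
    pos = [(1, 1), (2, 2), (1, 2), (5, 5), (6, 5)]
    neg = [(3, 3), (4, 1)]
    for rect in rectangle_greedy_cover(pos, neg):
        print(rect)
```

Each returned rectangle is a tuple of per-dimension `(lo, hi)` bounds, so the cover reads as a disjunction of conjunctions of interval conditions, which corresponds to the disjunctive normal form mentioned in the abstract.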
Publisher	en Elsevier Science
Date	Issued 2014-03
Language	eng
Resource Type	journal article
Version Type	AM
Identifier	HDL http://hdl.handle.net/2115/54801
Relation	isVersionOf DOI https://doi.org/10.1016/j.patcog.2013.09.008
Journal	PISSN 0031-3203 en Pattern Recognition Volume Number 47 Issue Number 3 Page Start 1459 Page End 1468
File	fulltext pat_rec_r1_c.pdf 220.88 KB (application/pdf) Issued 2014-03
Oaidate	2023-07-26