Fourth International Conference on Knowledge Discovery and Data Mining
Sponsored by the American Association for Artificial Intelligence
Edited by Rakesh Agrawal and Paul Stolorz; Gregory Piatetsky-Shapiro, General Chair
Published by The AAAI Press, Menlo Park, California. This proceedings is also available in book and CD format.
Alsabti, Khaled; Ranka, Sanjay; Singh, Vineet 1998. CLOUDS: A Decision Tree Classifier for Large Datasets. In Proceedings of the Fourth Knowledge Discovery and Data Mining Conference, eds. Rakesh Agrawal and Paul Stolorz, 2-8. Menlo Park, Calif.: AAAI Press.
Contents
Corporate Sponsors / xi
Preface / xii
Technical Papers
CLOUDS: A Decision Tree Classifier for Large Datasets / 2
Khaled Alsabti, Syracuse University; Sanjay Ranka, University of Florida; Vineet Singh, Hitachi America, Ltd.
Scaling Clustering Algorithms to Large Databases / 9
P. S. Bradley, Usama Fayyad and Cory Reina, Microsoft Research
Rule Discovery from Time Series / 16
Gautam Das and King-Ip Lin, University of Memphis; Heikki Mannila, University of Helsinki; Gopal Renganathan, Autozone Inc.; Padhraic Smyth, University of California, Irvine
Similarity of Attributes by External Probes / 23
Gautam Das, University of Memphis; Heikki Mannila and Pirjo Ronkainen, University of Helsinki
Finding Frequent Substructures in Chemical Compounds / 30
Luc Dehaspe, Katholieke Universiteit Leuven; Hannu Toivonen, University of Helsinki; Ross Donald King, The University of Wales, Aberystwyth
Occam’s Two Razors: The Sharp and the Blunt / 37
Pedro Domingos, Instituto Superior Técnico
Algorithms for Characterization and Trend Detection in Spatial Databases / 44
Martin Ester, Alexander Frommelt, Hans-Peter Kriegel and Jörg Sander, University of Munich
Pattern Directed Mining of Sequence Data / 51
Valery Guralnik, Duminda Wijesekera and Jaideep Srivastava, University of Minnesota
An Efficient Approach to Clustering in Large Multimedia Databases with Noise / 58
Alexander Hinneburg and Daniel A. Keim, University of Halle
Mining Audit Data to Build Intrusion Detection Models / 66
Wenke Lee, Salvatore J. Stolfo and Kui W. Mok, Columbia University
Data Mining for Direct Marketing: Problems and Solutions / 73
Charles X. Ling and Chenghui Li, The University of Western Ontario
Integrating Classification and Association Rule Mining / 80
Bing Liu, Wynne Hsu and Yiming Ma, National University of Singapore
Evaluating Usefulness for Dynamic Classification / 87
Gholamreza Nakhaeizadeh, Daimler-Benz Research and Technology; Charles Taylor, University of Leeds; Carsten Lanquillon, Daimler-Benz Research and Technology
A Belief-Driven Method for Discovering Unexpected Patterns / 94
Balaji Padmanabhan, Leonard N. Stern School of Business and Alexander Tuzhilin, Columbia University
Interpretable Boosted Naïve Bayes Classification / 101
Greg Ridgeway, David Madigan, Thomas Richardson and John O'Kane, University of Washington
A Data Mining Support Environment and its Application on Insurance Data / 105
M. Staudt, J.-U. Kietz and U. Reimer, Swiss Life
Coincidence Detection: A Fast Method for Discovering Higher-Order Correlations in Multidimensional Data / 112
Evan W. Steeg, Derek A. Robinson and Ed Willis, Molecular Mining Corporation
Interestingness-Based Interval Merger for Numeric Association Rules / 121
Ke Wang, Soon Hock William Tay and Bing Liu, National University of Singapore
Poster Papers
Online Generation of Profile Association Rules / 129
Charu C. Aggarwal, T. J. Watson Research Center; Zheng Sun, Duke University; Philip S. Yu, T. J. Watson Research Center
ADtrees for Fast Counting and for Fast Learning of Association Rules / 134
Brigham Anderson and Andrew Moore, Carnegie Mellon University
Independence Diagrams: A Technique for Visual Data Mining / 139
Stefan Berchtold and H. V. Jagadish, AT&T Laboratories; Kenneth A. Ross, Columbia University
Direct Marketing Response Models Using Genetic Algorithms / 144
Siddhartha Bhattacharyya, University of Illinois at Chicago
Mining Association Rules in Hypertext Databases / 149
José Borges and Mark Levene, University College London
Blurring the Distinction between Command and Data in Scientific KDD / 154
John Carlis, Elizabeth Shoop and Scott Krieger, University of Minnesota
Probabilistic Modeling for Information Retrieval with Unsupervised Training Data / 159
Ernest P. Chan, Credit Suisse First Boston; Santiago Garcia, Morgan Stanley & Co.; Salim Roukos, IBM T. J. Watson Research Center
Toward Scalable Learning with Non-Uniform Class and Cost Distributions: A Case Study in Credit Card Fraud Detection / 164
Philip K. Chan, Florida Institute of Technology and Salvatore J. Stolfo, Columbia University
Joins that Generalize: Text Classification Using WHIRL / 169
William W. Cohen, AT&T Labs-Research and Haym Hirsh, Rutgers University
Giga-Mining / 174
Corinna Cortes and Daryl Pregibon, AT&T Labs-Research
Interactive Interpretation of Kohonen Maps Applied to Curves / 179
Anne Debregeas and Georges Hebrail, Electricite de France
FlexiMine - A Flexible Platform for KDD Research and Application Construction / 184
C. Domshlak, D. Gershkovich, E. Gudes, N. Liusternik, A. Meisels, T. Rosen and S. E. Shimony, Ben-Gurion University
A Fast Computer Intrusion Detection Algorithm Based on Hypothesis Testing of Command Transition Probabilities / 189
William DuMouchel, AT&T Labs-Research and Matthias Schonlau, AT&T Labs-Research and National Institute of Statistical Sciences
Initialization of Iterative Refinement Clustering Algorithms / 194
Usama Fayyad, Cory Reina and P. S. Bradley, Microsoft Research
Mining in the Presence of Selectivity Bias and its Application to Reject Inference / 199
A. J. Feelders, Tilburg University; Soong Chang and G. J. McLachlan, University of Queensland
On the Efficient Gathering of Sufficient Statistics for Classification from Large SQL Databases / 204
Goetz Graefe, Usama Fayyad and Surajit Chaudhuri, Microsoft Corporation
Coactive Learning for Distributed Data Mining / 209
Dan L. Grecu and Lee A. Becker, Worcester Polytechnic Institute
Mining Segment-Wise Periodic Patterns in Time-Related Databases / 214
Jiawei Han, Wan Gong and Yiwen Yin, Simon Fraser University
Learning to Predict the Duration of an Automobile Trip / 219
Simon Handley and Pat Langley, Daimler-Benz Research and Technology Center; Folke A. Rauscher, Daimler-Benz AG
Fast Computation of 2-Dimensional Depth Contours / 224
Ted Johnson, AT&T Research Center; Ivy Kwok and Raymond Ng, University of British Columbia
Comparing Massive High-Dimensional Data Sets / 229
Theodore Johnson and Tamraparni Dasu, AT&T Labs-Research
Defining the Goals to Optimise Data Mining Performance / 234
Mark G. Kelly, David J. Hand and Niall M. Adams, The Open University
An Enhanced Representation of Time Series Which Allows Fast and Accurate Classification, Clustering and Relevance Feedback / 239
Eamonn J. Keogh and Michael J. Pazzani, University of California, Irvine
Active Templates: Comprehensive Support for the Knowledge Discovery Process / 244
Randy Kerber, Hal Beck, Tej Anand and Bill Smart, NCR Human Interface Technology Center
Targeting Business Users with Decision Table Classifiers / 249
Ron Kohavi and Daniel Sommerfield, Silicon Graphics, Inc.
BAYDA: Software for Bayesian Classification and Feature Selection / 254
Petri Kontkanen, Petri Myllymäki, Tomi Silander and Henry Tirri, University of Helsinki
Approaches to Online Learning and Concept Drift for User Identification in Computer Security / 259
Terran Lane and Carla E. Brodley, Purdue University
Human Performance on Clustering Web Pages: A Preliminary Study / 264
Sofus A. Macskassy, Arunava Banerjee, Brian D. Davison and Haym Hirsh, Rutgers, The State University of New Jersey
Aggregation of Imprecise and Uncertain Information for Knowledge Discovery in Databases / 269
Sally McClean, Bryan Scotney and Mary Shapcott, University of Ulster
Discovering Predictive Association Rules / 274
Nimrod Megiddo and Ramakrishnan Srikant, IBM Almaden Research Center
Reinforcement Learning for Trading Systems and Portfolios / 279
John Moody and Matthew Saffell, Oregon Graduate Institute
Group Bitmap Index: A Structure for Association Rules Retrieval / 284
Tadeusz Morzy and Maciej Zakrzewicz, Poznan University of Technology
Towards the Personalization of Algorithms Evaluation in Data Mining / 289
Gholamreza Nakhaeizadeh, Daimler-Benz AG and Alexander Schnabl, Technical University Vienna
Large Datasets Lead to Overly Complex Models: An Explanation and a Solution / 294
Tim Oates and David Jensen, University of Massachusetts
Analysing Rock Samples for the Mars Lander / 299
Jonathan Oliver, University of California, Berkeley; Ted Roush and Paul Gazis, NASA Ames Research Center; Wray Buntine, Rohan Baxter and Steve Waterhouse, Ultimode Systems
Memory Placement Techniques for Parallel Association Mining / 304
Srinivasan Parthasarathy, Mohammed J. Zaki and Wei Li, University of Rochester
Methods for Linking and Mining Massive Heterogeneous Databases / 309
José C. Pinheiro and Don X. Sun, Bell Laboratories
Mining Databases with Different Schemas: Integrating Incompatible Classifiers / 314
Andreas L. Prodromidis and Salvatore Stolfo, Columbia University
Time Series Forecasting from High-Dimensional Data with Multiple Adaptive Layers / 319
R. Bharat Rao, Scott Rickard and Frans Coetzee, Siemens Corporate Research, Inc.
Ranking -- Methods for Flexible Evaluation and Efficient Comparison of Classification Performance / 324
Saharon Rosset, Tel Aviv University
A Robust System Architecture for Mining Semi-Structured Data / 329
Lisa Singh, Bin Chen, Rebecca Haight, Peter Scheuermann and Kiyoko Aoki, Northwestern University
Defining diff as a Data Mining Primitive / 334
Ramesh Subramonian, Intel Corporation
Simultaneous Reliability Evaluation of Generality and Accuracy for Rule Discovery in Databases / 339
Einoshin Suzuki, Yokohama National University
Mining Generalized Association Rules and Sequential Patterns Using SQL Queries / 344
Shiby Thomas, University of Florida and Sunita Sarawagi, IBM Almaden Research Center
Data Reduction Based on Hyper Relations / 349
Hui Wang, Ivo Düntsch and David Bell, University of Ulster
Discovering Technical Traders in the T-bond Futures Market / 354
Andreas S. Weigend, Fei Chen and Stephen Figlewski, New York University; Steven R. Waterhouse, Ultimode Systems
Learning to Predict Rare Events in Event Sequences / 359
Gary M. Weiss, AT&T Labs and Rutgers University, and Haym Hirsh, Rutgers University
Daily Prediction of Major Stock Indices from Textual WWW Data / 364
B. Wüthrich, D. Permunetilleke, S. Leung, V. Cho, and J. Zhang, The Hong Kong University of Science and Technology, and W. Lam, The Chinese University of Hong Kong
PlanMine: Sequence Mining for Plan Failures / 369
Mohammed J. Zaki, Neal Lesh and Mitsunori Ogihara, University of Rochester
Conference Report
Knowledge Discovery and the Interface of Computing and Statistics / 375
Arnold Goodman, University of California, Irvine and John Elder IV, Elder Research
Index / 379
Tutorial
A Comparison of Leading Data Mining Tools
John F. Elder IV and Dean W. Abbott