Third International Conference on Knowledge Discovery and Data Mining
Sponsored by the American Association for Artificial Intelligence
Edited by David Heckerman, Heikki Mannila, Daryl Pregibon, and Ramasamy Uthurusamy
Published by The AAAI Press, Menlo Park, California. This proceedings is also available in book and CD format.
Contents
KDD-97 Organization / x
Sponsoring Organizations / xi
Preface / xiii
KDD-97 Plenary Papers
An Interactive Visualization Environment for Data Exploration
Mark Derthick, John Kolojejchick, and Steven F. Roth, Carnegie Mellon University / 2
Density-Connected Sets and their Application for Trend Detection in Spatial Databases
Martin Ester, Hans-Peter Kriegel, Jörg Sander, and Xiaowei Xu, University of Munich, Germany / 10
Visualization Techniques to Explore Data Mining Results for Document Collections
Ronen Feldman, Bar-Ilan University, Israel; Willi Klösgen, German National Research Center for Information Technology, Germany; Amir Zilberstein, Bar-Ilan University, Israel / 16
A Probabilistic Approach to Fast Pattern Matching in Time Series Databases
Eamonn Keogh and Padhraic Smyth, University of California, Irvine / 24
Using General Impressions to Analyze Discovered Classification Rules
Bing Liu, Wynne Hsu, and Shu Chen, National University of Singapore, Singapore / 31
Development of Multi-Criteria Metrics for Evaluation of Data Mining Algorithms
Gholamreza Nakhaeizadeh, Daimler-Benz AG, Germany and Alexander Schnabl, Technical University Vienna, Austria / 37
Analysis and Visualization of Classifier Performance: Comparison under Imprecise Class and Cost Distributions
Foster Provost and Tom Fawcett, NYNEX Science and Technology / 43
Discriminative Versus Informative Learning
Y. Dan Rubinstein and Trevor Hastie, Stanford University / 49
Anytime Exploratory Data Analysis for Massive Data Sets
Padhraic Smyth, University of California, Irvine and David Wolpert, IBM Almaden Research Center / 54
Detecting Atmospheric Regimes Using Cross-Validated Clustering
Padhraic Smyth, University of California, Irvine; Michael Ghil and Kayo Ide, University of California, Los Angeles; Joe Roden, Jet Propulsion Laboratory, California Institute of Technology; Andrew Fraser, Portland State University / 61
Mining Association Rules with Item Constraints
Ramakrishnan Srikant, Quoc Vu, and Rakesh Agrawal, IBM Almaden Research Center / 67
JAM: Java Agents for Meta-Learning over Distributed Databases
Salvatore Stolfo, Andreas L. Prodromidis, Shelley Tselepis, Wenke Lee, and Dave W. Fan, Columbia University; Philip K. Chan, Florida Institute of Technology / 74
A Visual Interactive Framework for Attribute Discretization
Ramesh Subramonian, Ramana Venkata, and Joyce Chen, Intel Corporation / 82
Automated Discovery of Active Motifs in Three Dimensional Molecules
Xiong Wang and Jason T.L. Wang, New Jersey Institute of Technology; Dennis Shasha, New York University; Bruce Shapiro, National Cancer Institute; Sitaram Dikshitulu, New Jersey Institute of Technology; Isidore Rigoutsos, IBM T. J. Watson Research Center; Kaizhong Zhang, University of Western Ontario, Canada / 89
Computing Optimized Rectilinear Regions for Association Rules
Kunikazu Yoda, Takeshi Fukuda, Yasuhiko Morimoto, Shinichi Morishita, and Takeshi Tokuyama, IBM Tokyo Research Laboratory, Japan / 96
Knowledge = Concepts: A Harmful Equation
Jan M. Zytkow, Wichita State University and Polish Academy of Sciences, Poland / 104
KDD-97 Poster Papers
Discovery of Actionable Patterns in Databases: The Action Hierarchy Approach
Gediminas Adomavicius, New York University and Alexander Tuzhilin, Stern School of Business, New York University / 111
Partial Classification Using Association Rules
Kamal Ali and Stefanos Manganaris, IBM Global Business Intelligence Solutions; Ramakrishnan Srikant, IBM Almaden Research Center / 115
Increasing the Efficiency of Data Mining Algorithms with Breadth-First Marker Propagation
John M. Aronis, University of Pittsburgh and Foster J. Provost, NYNEX Science and Technology / 119
Brute-Force Mining of High-Confidence Classification Rules
Roberto J. Bayardo, Jr., The University of Texas at Austin / 123
Applying Data Mining and Machine Learning Techniques to Submarine Intelligence Analysis
Ulla Bergsten, Johan Schubert, and Per Svensson, Defence Research Establishment, Sweden / 127
Process-Based Database Support for the Early Indicator Method
Christoph Breitner and Jörg Schlösser, University of Karlsruhe, Germany; Rüdiger Wirth, Daimler-Benz AG, Germany / 131
MineSet: An Integrated System for Data Mining
Cliff Brunk, James Kelly, and Ron Kohavi, Silicon Graphics, Inc. / 135
Proposal and Empirical Comparison of a Parallelizable Distance-Based Discretization Method
Jesús Cerquides and Ramon López de Màntaras, Spanish Council for Scientific Research, Spain / 139
Large Scale Data Mining: Challenges and Responses
Jaturon Chattratichat, John Darlington, Moustafa Ghanem, Yike Guo, Harald Hüning, Martin Köhler, Janjao Sutiwaraphun, Hing Wing To, and Dan Yang, Imperial College of London, United Kingdom / 143
Using Artificial Intelligence Planning to Automate Science Data Analysis for Large Image Databases
Steve Chien, Forest Fisher, and Helen Mortensen, Jet Propulsion Laboratory, California Institute of Technology; Edisanter Lo and Ronald Greeley, Arizona State University / 147
Mining Multivariate Time-Series Sensor Data to Discover Behavior Envelopes
Dennis DeCoste, Jet Propulsion Laboratory, California Institute of Technology / 151
Why Does Bagging Work? A Bayesian Account and its Implications
Pedro Domingos, University of California, Irvine / 155
Fast Committee Machines for Regression and Classification
Harris Drucker, Monmouth University / 159
A Guided Tour through the Data Mining Jungle
Robert Engels, University of Karlsruhe, Germany; Guido Lindner, Daimler-Benz AG, Germany; Rudi Studer, University of Karlsruhe, Germany / 163
Maximal Association Rules: A New Tool for Mining for Keyword Co-Occurrences in Document Collections
Ronen Feldman, Yonatan Aumann, Amihood Amir, and Amir Zilberstein, Bar-Ilan University, Israel; Willi Klösgen, German National Research Center for Information Technology, Germany / 167
Improving Scalability in a Scientific Discovery System by Exploiting Parallelism
Gehad Galal, Diane J. Cook, and Lawrence B. Holder, University of Texas at Arlington / 171
Deep Knowledge Discovery from Natural Language Texts
Udo Hahn and Klemens Schnattinger, Freiburg University, Germany / 175
Integrating and Mining Distributed Customer Databases
Ira J. Haimowitz, Özden Gür-Ali, and Henry Schwarz, General Electric Corporate Research and Development / 179
GA-Based Rule Enhancement in Concept Learning
Jukka Hekanaho, Turku Center for Computer Science and Åbo Akademi University, Finland / 183
Target-Independent Mining for Scientific Data: Capturing Transients and Trends for Phenomena Mining
Thomas H. Hinke, John Rushing, Heggere Ranganath, and Sara J. Graves, University of Alabama in Huntsville / 187
Zeta: A Global Method for Discretization of Continuous Variables
K. M. Ho and P. D. Scott, University of Essex, United Kingdom / 191
Adjusting for Multiple Comparisons in Decision Tree Pruning
David Jensen and Matt Schmill, University of Massachusetts, Amherst / 195
SIPping from the Data Firehose
George H. John, IBM Almaden Research Center and Brian Lent, Stanford University / 199
Mining Generalized Term Associations: Count Propagation Algorithm
Jonghyun Kahng, Wen-Hsiang Kevin Liao, and Dennis McLeod, University of Southern California / 203
Metarule-Guided Mining of Multi-Dimensional Association Rules Using Data Cubes
Micheline Kamber, Jiawei Han, and Jenny Y. Chiang, Simon Fraser University, Canada / 207
Scalable, Distributed Data Mining - An Agent Architecture
Hillol Kargupta, Ilker Hamzaoglu, and Brian Stafford, Los Alamos National Laboratory / 211
Clustering Sequences of Complex Objects
A. Ketterlin, LSIIT, France / 215
A Unified Notion of Outliers: Properties and Computation
Edwin M. Knorr and Raymond T. Ng, University of British Columbia, Canada / 219
Mining for Causes of Cancer: Machine Learning Experiments at Various Levels of Detail
Stefan Kramer, Austrian Research Institute for Artificial Intelligence, Austria; Bernhard Pfahringer, University of Waikato, New Zealand; and Christoph Helma, University of Vienna, Austria / 223
Discovering Trends in Text Databases
Brian Lent, Rakesh Agrawal, and Ramakrishnan Srikant, IBM Almaden Research Center / 227
Fast Robust Visual Data Mining
Ted Mihalisin, Temple University and John Timlin, Mihalisin Associates, Inc. / 231
Beyond Concise and Colorful: Learning Intelligible Rules
Michael J. Pazzani, Subramani Mani, and W. Rodman Shankle, The University of California, Irvine / 235
Scaling Up Inductive Algorithms: An Overview
Foster Provost, NYNEX Science and Technology and Venkateswarlu Kolluri, University of Pittsburgh / 239
Visualizing Bagged Decision Trees
J. Sunil Rao, Cleveland Clinic and William J.E. Potts, SAS Institute Inc. / 243
KESO: Minimizing Database Interaction
Arno Siebes and Martin L. Kersten, CWI, The Netherlands / 247
Learning to Extract Text-Based Information from the World Wide Web
Stephen Soderland, University of Washington / 251
Image Feature Reduction through Spoiling: Its Application to Multiple Matched Filters for Focus of Attention
Timothy M. Stough and Carla E. Brodley, Purdue University / 255
Autonomous Discovery of Reliable Exception Rules
Einoshin Suzuki, Yokohama National University, Japan / 259
An Efficient Algorithm for the Incremental Updation of Association Rules in Large Databases
Shiby Thomas, Sreenath Bodagala, Khaled Alsabti, and Sanjay Ranka, University of Florida / 263
Bayesian Inference for Identifying Solar Active Regions
Michael Turmon and Saleem Mukhtar, Jet Propulsion Laboratory, California Institute of Technology; Judit Pap, University of California, Los Angeles / 267
Schema Discovery for Semistructured Data
Ke Wang and Huiqing Liu, National University of Singapore, Singapore / 271
Selecting Features by Vertical Compactness of Data
Ke Wang and Suman Sundaresh, National University of Singapore, Singapore / 275
Knowledge Discovery in Integrated Call Centers: A Framework for Effective Customer-Driven Marketing
Paul Xia, EIS International Inc. / 279
New Algorithms for Fast Discovery of Association Rules
M. J. Zaki, S. Parthasarathy, M. Ogihara, and W. Li, University of Rochester / 283
Fast and Intuitive Clustering of Web Documents
Oren Zamir, Oren Etzioni, Omid Madani, and Richard M. Karp, University of Washington / 287
KDD Process Planning
Ning Zhong, Yamaguchi University, Japan; Chunnian Liu, Beijing Polytechnic University, China; Yoshitsugu Kakemoto, The University of Tokyo, Japan; Setsuo Ohsuga, Waseda University, Japan / 291
Optimal Multiple Intervals Discretization of Continuous Attributes for Supervised Learning
D. A. Zighed, R. Rakotomalala, and F. Feschet, University of Lyon 2, France / 295
A Dataset Decomposition Approach to Data Mining and Machine Discovery
Blaz Zupan and Marko Bohanec, Institute Jozef Stefan, Slovenia; Ivan Bratko, Institute Jozef Stefan and University of Ljubljana, Slovenia; Bojan Cestnik, Temida and Institute Jozef Stefan, Slovenia / 299
KDD Invited Talk
From Large to Huge: A Statistician’s Reactions to KDD & DM
Peter J. Huber, University of Bayreuth, Germany / 304