Evangelos Simoudis, Jiawei Han, Usama Fayyad
As we enter the true digital information era, one of the greatest challenges facing organizations and individuals is how to turn their rapidly expanding data stores into accessible and actionable knowledge. Digital data sources are ubiquitous, created by a variety of means spanning a spectrum of activities: from a supermarket’s electronic scanner to a bank’s automated teller machine, from a credit card reader to a World Wide Web server, and on to the most intricate of technical instruments. Advances in data storage and retrieval continue at a breakneck pace: several organizations today have databases containing several hundred gigabytes, and in some instances terabytes, of online data with millions of rows and hundreds of columns, and within two years the multi-terabyte database will be commonplace. The same cannot be said of advances in information and knowledge extraction from large data sets. Only a very small percentage of the captured data is ever converted to actionable knowledge. The traditional approach, in which a human analyst intimately familiar with a data set serves as a conduit between raw data and synthesized knowledge by producing useful analyses and reports, is breaking down.