Restructuring Databases for Knowledge Discovery by Consolidation and Link Formation

Henry G. Goldberg and Ted E. Senator, U.S. Department of the Treasury - Financial Crimes Enforcement Network (FinCEN)

Databases often fail to identify entities of interest accurately. Two operations, consolidation and link formation, are proposed as essential components of knowledge discovery in databases (KDD) systems for certain applications; they complement the usual machine learning techniques, which use similarity-based clustering to discover classifications. Consolidation relates identifiers present in a database to a set of real-world entities (RWEs) that are not uniquely identified in the database. It may also be viewed as a transformation of representation from the identifiers present in the original database to the RWEs. Link formation constructs structured relationships between consolidated RWEs through identifiers and events explicitly represented in the database. Both operations are easily implemented as index creation in relational database management systems. An operational knowledge discovery system that identifies potential money laundering in a database of large cash transactions implements consolidation and link formation.

