Edoardo Airoldi, Bradley Malin, and Latanya Sweeney
Online criminals have adapted traditional snail mail and door-to-door fraudulent schemes into electronic form. Increasingly, such schemes target an individual’s personal e-mail, where they mingle among, and are masked by, honest communications. The targeted and conniving nature of these schemes infringes upon an individual’s personal privacy and threatens personal safety. We argue that state-of-the-art spam filtering systems fail to capture fraudulent intent hidden in the text of e-mails, and we demonstrate how more robust systems can be engineered starting from existing AI tools. We illustrate how to design a learning system capable of accurately identifying the fraudulent intent within an e-mail in order to tackle, for example, the advance fee fraud scam. Further, we propose data structures, together with statistical tests for them, that capture evolutionary patterns within e-mails that are unlikely to be due to chance. Finally, our system can serve as a guide for law enforcement agencies in cyber-investigations.