George Berg, Ian Davidson, Ming-Yuan Duan, and Goutam Paul
Steganography is the field of hiding messages in apparently innocuous media (e.g. images), and steganalysis is the field of detecting these covert messages. Almost all steganalysis consists of hand-crafted tests or human visual inspection to detect whether a file contains a message hidden by a specific steganography algorithm. These approaches are fragile: trivial changes to a steganography algorithm will often render a steganalysis approach useless, and human inspection does not scale. We propose a machine learning (ML) approach to steganalysis. First, a media file is represented as a canvas, that is, the space available within the file for hiding a message. Next, the features of the canvas that can distinguish clean from stego-bearing files are selected. Finally, ML algorithms use these features to classify files as clean or stego-bearing. The results reported here show that ML algorithms work in both content- and compression-based image formats, outperforming at least one current hand-crafted steganalysis technique in the latter. Our current work can detect previously seen (trained on) steganography techniques, and we discuss extensions that we believe will be able to detect steganography that uses more sophisticated algorithms, as well as steganography that uses previously unseen algorithms.
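The canvas, feature, and classification steps described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the canvas is taken to be the least significant bit (LSB) of each pixel byte, the two features are hand-picked rather than selected, the "images" are synthetic, the steganography algorithm is a toy LSB replacement, and the classifier is a simple nearest-centroid rule standing in for whatever ML algorithms are actually used. All function names are hypothetical.

```python
import random

def canvas(pixels):
    """Assumed canvas: the least significant bit of every pixel byte."""
    return [p & 1 for p in pixels]

def features(pixels):
    """Two illustrative canvas features (not the paper's feature set)."""
    bits = canvas(pixels)
    ones = sum(bits) / len(bits)  # fraction of 1 bits
    flips = sum(a != b for a, b in zip(bits, bits[1:])) / (len(bits) - 1)
    return (ones, flips)

def embed(pixels, rng):
    """Toy LSB replacement: overwrite each LSB with a random message bit."""
    return [(p & ~1) | rng.getrandbits(1) for p in pixels]

def make_clean(rng, n=2048):
    """Synthetic clean 'image': a slowly varying signal, so LSBs are correlated."""
    level, out = rng.randrange(1, 255), []
    for _ in range(n):
        if rng.random() < 0.1:  # occasional step of +1 or -1
            level = min(254, max(1, level + rng.choice((-1, 1))))
        out.append(level)
    return out

def dataset(rng, count):
    """Return labelled feature vectors for `count` clean and `count` stego files."""
    data = []
    for _ in range(count):
        data.append((features(make_clean(rng)), "clean"))
        data.append((features(embed(make_clean(rng), rng)), "stego"))
    return data

def train(samples):
    """Nearest-centroid classifier: average the feature vectors of each class."""
    centroids = {}
    for label in {lab for _, lab in samples}:
        rows = [f for f, lab in samples if lab == label]
        centroids[label] = tuple(sum(col) / len(rows) for col in zip(*rows))
    return centroids

def predict(centroids, feat):
    """Assign the label whose centroid is closest in squared Euclidean distance."""
    return min(
        centroids,
        key=lambda lab: sum((a - b) ** 2 for a, b in zip(feat, centroids[lab])),
    )

rng = random.Random(0)  # fixed seed so the run is repeatable
model = train(dataset(rng, 50))
held_out = dataset(rng, 50)
accuracy = sum(predict(model, f) == lab for f, lab in held_out) / len(held_out)
print(f"held-out accuracy: {accuracy:.2f}")
```

The sketch only covers the previously seen case: the classifier is trained and evaluated on files produced by the same toy embedding. Real images and real steganography algorithms would leave far subtler traces than this synthetic data does.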