Istvan Jonyer, Prach Apiratikul, and Johnson Thomas, Oklahoma State University
In this work we introduce a novel method for source code fingerprinting based on frequent pattern discovery using a graph grammar induction system, and use it to detect cases of plagiarism. This approach is radically different from others in that we are not looking for similarities between documents, but for similarities between fingerprints, which are made up of recurring patterns within the same source code. The advantage of our approach is that a fingerprint can consist of any part of the text and has no connection to the functionality of the code. Rather, it captures the habits of the coder, which in most cases are very hard for a plagiarist to identify and almost impossible to remove.