Violence Rating Prediction from Movie Scripts
Violent content in movies can influence viewers’ perception of the society. For example, frequent depictions of certain demographics as perpetrators or victims of abuse can shape stereotyped attitudes. In this work, we propose to characterize aspects of violent content in movies solely from the language used in the scripts. This makes our method applicable to a movie in the earlier stages of content creation even before it is produced. This is complementary to previous works which rely on audio or video post production. Our approach is based on a broad range of features designed to capture lexical, semantic, sentiment and abusive language characteristics. We use these features to learn a vector representation for (1) complete movie, and (2) for an act in the movie. The former representation is used to train a movie-level classification model, and the latter, to train deep-learning sequence classifiers that make use of context. We tested our models on a dataset of 732 Hollywood scripts annotated by experts for violent content. Our performance evaluation suggests that linguistic features are a good indicator for violent content. Furthermore, our ablation studies show that semantic and sentiment features are the most important predictors of violence in this data. To date, we are the first to show the language used in movie scripts is a strong indicator of violent content. This offers novel computational tools to assist in creating awareness of storytelling.