To Interpret or Not to Interpret PCA? This Is Our Question
Principal Component Analysis (PCA) is a central tool for analyzing data, and social media data in particular. Typically, the data is projected onto the first two principal components (PCs) to obtain a two-dimensional view, in which trends and patterns are examined. A key to making sense of the projected data is the semantic interpretation of the new axes (the PCs). To label a PC, one usually looks at its top k vector entries in absolute value and assigns meaning according to them. The choice of k is typically made by “eyeballing” the vector. In this work we provide a computational framework to support this process and suggest an interpretability score, which measures how sensitive the interpretation step could be to the choice of k. Furthermore, we give a visual method for choosing the optimal k. We apply our methodology to four social media platforms and discover that in two of them, Twitter and Instagram, interpretation can be done in a carefree manner, but in Steam and LinkedIn there is no natural labeling of the axes. This separation is clearly reflected in the interpretability score that each dataset received.
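To make the labeling step concrete, the following is a minimal sketch of the standard procedure described above: project onto the first two PCs and list the top k entries of a PC by absolute value for several choices of k. It does not implement the interpretability score or the visual method for choosing k proposed in this work. The synthetic data, the feature names, and the helper functions `principal_components` and `top_k_entries` are assumptions made purely for illustration.

```python
import numpy as np


def principal_components(X, n_components=2):
    """Return the first n_components PCs (as rows) via SVD of the centered data."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt[:n_components]


def top_k_entries(pc, feature_names, k):
    """Return the k (name, loading) pairs with the largest absolute loading."""
    order = np.argsort(-np.abs(pc))[:k]
    return [(feature_names[i], float(pc[i])) for i in order]


# Hypothetical data: 200 samples, 6 features; the first two features
# share one strong latent factor, the rest are low-variance noise.
rng = np.random.default_rng(0)
feature_names = [f"feature_{i}" for i in range(6)]
latent = rng.normal(size=200)
X = rng.normal(scale=0.1, size=(200, 6))
X[:, 0] += 3.0 * latent
X[:, 1] += 3.0 * latent

pcs = principal_components(X, n_components=2)

# Two-dimensional view of the data (projection onto the first two PCs).
projected = (X - X.mean(axis=0)) @ pcs.T

# The candidate label set for PC1 grows with k; how much the implied
# meaning shifts as k changes is what makes the choice of k delicate.
for k in (1, 2, 3):
    names = [name for name, _ in top_k_entries(pcs[0], feature_names, k)]
    print(f"k={k}: {names}")
```

In this toy setting the first PC is dominated by two features, so any k of at least 2 points to the same pair; with real data the loadings often decay more gradually, which is the situation where the choice of k matters.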