Learning Uniform Semantic Features for Natural Language and Programming Language Globally, Locally and Sequentially

Yudong Zhang; Wenhao Zheng; Ming Li

doi:10.1609/aaai.v33i01.33015845

Authors

Yudong Zhang, Nanjing University
Wenhao Zheng, Nanjing University
Ming Li, Nanjing University

DOI:

https://doi.org/10.1609/aaai.v33i01.33015845

Abstract

Semantic feature learning for natural language and programming language is a preliminary step in many software mining tasks. Many existing methods leverage lexical and syntactic information to learn features for textual data. However, such information is inadequate to represent the full semantics of either a text sentence or a code snippet. This motivates us to propose a new approach that learns semantic features for both languages by extracting three levels of information, namely global, local, and sequential information, from textual data. For tasks involving both modalities, we project data of both types into a uniform feature space so that the complementary knowledge between them can be utilized in their representations. In this paper, we build a novel, general-purpose feature learning framework called UniEmbed to uniformly learn comprehensive semantic representations for both natural language and programming language. Experimental results on three real-world software mining tasks show that UniEmbed outperforms state-of-the-art models in feature learning and demonstrate the capacity and effectiveness of our model.
