Technique for determining the author of software code based on a multi-view approach


Authors: Kurtukova A. V.

Abstract: This paper presents a new method for identifying the author of software code based on a multi-view approach. The aim of the study is to improve the accuracy and robustness of authorship identification by combining different representations of software code: source code, abstract syntax tree, control flow graph, and disassembled code. Modern machine learning methods were used to build models that integrate and analyze complex features from these different sources. The experiments showed that the developed multi-view architecture provides a significant improvement in identification quality over traditional approaches that use only one representation of the code. In tasks with a closed set of authors, accuracy and macro-averaged F1 values of up to 0.97 were achieved; on open sets, the method showed high resistance to the emergence of new authors and to variability in programming style. In the author verification task, the combined features made it possible to achieve accuracy of up to 0.98 and to reduce the equal error rate (EER) to 0.04.

Keywords: verification, authorship, graph representation, disassembler, source code, software
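
For readers unfamiliar with the verification metric quoted in the abstract, the following is a minimal sketch of how an equal error rate (EER) can be estimated from verification scores. It is not the paper's evaluation code: the function name `equal_error_rate`, the convention that a higher score means "same author", and the sample scores are all assumptions made for illustration.

```python
def equal_error_rate(genuine, impostor):
    """Estimate the EER from two lists of similarity scores.

    genuine  -- scores for pairs written by the same author (hypothetical data)
    impostor -- scores for pairs written by different authors (hypothetical data)
    Assumes a pair is accepted when its score is >= the threshold.
    Returns (eer, threshold) at the point where the false acceptance rate
    and the false rejection rate are closest.
    """
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)  # false acceptances
        frr = sum(s < t for s in genuine) / len(genuine)     # false rejections
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            # Report the midpoint of FAR and FRR as the EER estimate.
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Made-up scores, for illustration only.
same_author = [0.9, 0.8, 0.7, 0.6]
different_author = [0.1, 0.2, 0.3, 0.65]
eer, threshold = equal_error_rate(same_author, different_author)
print(eer, threshold)  # 0.25 0.65
```

An EER of 0.04, as reported in the abstract, would mean that at the balanced threshold about 4% of different-author pairs are accepted and about 4% of same-author pairs are rejected.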

Editorial office address

Executive Secretary of the Editor’s Office

 Editor’s Office: 40 Lenina Prospect, Tomsk, 634050, Russia

  Phone / Fax: + 7 (3822) 701-582

  journal@tusur.ru

 

Viktor N. Maslennikov

Executive Secretary of the Editor’s Office

 Editor’s Office: 40 Lenina Prospect, Tomsk, 634050, Russia

  Phone / Fax: + 7 (3822) 51-21-21 / 51-43-02
