Feature Set Formation and Comparative Analysis of Classification Algorithms for AI Generated Code Detection

DOI: 10.21293/1818-0442-2025-28-4-121-126

Abstract: The paper presents a comprehensive approach to constructing a feature space for detecting artificially generated Python source code. We developed the Algorithmic_Analyzer class to extract 27 features categorized into four groups: basic code metrics, structural patterns, keywords, and libraries. Additionally, lexical patterns are captured using word n-grams. Experiments using classical machine learning algorithms demonstrate that structural characteristics are considerably more important than lexical features. The study identifies the most informative features for artificial code detection and establishes that the XGBClassifier model achieves the best performance, with an average F1_macro score of 0.90.
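
To make the four feature groups concrete, the following is a minimal sketch of how such features might be computed with the Python standard library. It is not the authors' Algorithmic_Analyzer class: the abstract does not list the 27 features, so the function names (extract_features, word_ngrams) and every individual feature shown here are illustrative assumptions, one or two per group.

```python
import ast
import io
import keyword
import tokenize
from collections import Counter


def extract_features(source: str) -> dict:
    """Return a small, illustrative feature dictionary for one Python snippet.

    Hypothetical helper: the concrete features are assumptions, not the
    27 features used in the paper.
    """
    lines = source.splitlines()
    non_empty = [ln for ln in lines if ln.strip()]
    nodes = list(ast.walk(ast.parse(source)))  # raises SyntaxError on invalid code

    # Group 1: basic code metrics.
    features = {
        "n_lines": len(lines),
        "avg_line_len": sum(len(ln) for ln in non_empty) / max(len(non_empty), 1),
        "comment_ratio": sum(ln.lstrip().startswith("#") for ln in non_empty)
        / max(len(non_empty), 1),
    }

    # Group 2: structural patterns, read from the abstract syntax tree.
    func_types = (ast.FunctionDef, ast.AsyncFunctionDef)
    doc_holders = (ast.Module, ast.ClassDef) + func_types
    features["n_functions"] = sum(isinstance(n, func_types) for n in nodes)
    features["n_loops"] = sum(isinstance(n, (ast.For, ast.While)) for n in nodes)
    features["n_docstrings"] = sum(
        1 for n in nodes
        if isinstance(n, doc_holders) and ast.get_docstring(n) is not None
    )

    # Group 3: keywords, counted on NAME tokens so that strings and
    # comments are not mistaken for code.
    names = [
        tok.string
        for tok in tokenize.generate_tokens(io.StringIO(source).readline)
        if tok.type == tokenize.NAME
    ]
    features["n_keywords"] = sum(keyword.iskeyword(name) for name in names)

    # Group 4: libraries, as the set of distinct top-level imported modules.
    imported = set()
    for n in nodes:
        if isinstance(n, ast.Import):
            imported.update(alias.name.split(".")[0] for alias in n.names)
        elif isinstance(n, ast.ImportFrom) and n.module:
            imported.add(n.module.split(".")[0])
    features["n_imports"] = len(imported)
    return features


def word_ngrams(source: str, n: int = 2) -> Counter:
    """Count whitespace-delimited word n-grams (a simple lexical feature)."""
    words = source.split()
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
```

A dictionary like this, computed per code sample, could then be turned into a numeric matrix and passed to a classifier such as XGBClassifier, with macro-averaged F1 used for evaluation as in the abstract. The sketch assumes syntactically valid input; snippets that fail to parse would need separate handling.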

Keywords: code classification, feature analysis, language models, source code, machine learning

For citation:
Bukina S. G., Harchenko S. S. Feature Set Formation and Comparative Analysis of Classification Algorithms for AI Generated Code Detection. Doklady Tomskogo gosudarstvennogo universiteta sistem upravleniya i radioelektroniki, 2025, vol. 28, no. 4, pp. 121–126. DOI: 10.21293/1818-0442-2025-28-4-121-126
