Feature Set Formation and Comparative Analysis of Classification Algorithms for AI Generated Code Detection
DOI: 10.21293/1818-0442-2025-28-4-121-126
Abstract: The paper presents a comprehensive approach to constructing a feature space for detecting artificially generated Python source code. We developed the Algorithmic_Analyzer class to extract 27 features categorized into four groups: basic code metrics, structural patterns, keywords, and libraries. Additionally, lexical patterns are captured using word n-grams. Experiments using classical machine learning algorithms demonstrate that structural characteristics are considerably more important than lexical features. The study identifies the most informative features for artificial code detection and establishes that the XGBClassifier model achieves the best performance, with an average F1_macro score of 0.90.
Keywords: code classification, feature analysis, language models, source code, machine learning
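The four feature groups and the word n-grams named in the abstract can be illustrated with a minimal sketch built on the Python standard library. This is not the authors' Algorithmic_Analyzer class: the function names `extract_features` and `word_ngrams`, the tracked keyword list, and the handful of features shown are all assumptions chosen for illustration, and the paper's actual 27 features are not reproduced here.

```python
import ast
import io
import keyword
import re
import tokenize
from collections import Counter

# Hypothetical subset of keywords to track; the paper's list is not given here.
TRACKED_KEYWORDS = ("def", "for", "if", "return", "try", "lambda")


def extract_features(source: str) -> dict:
    """Return a small, illustrative feature dict for one Python source string."""
    lines = source.splitlines()
    nodes = list(ast.walk(ast.parse(source)))
    tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))

    # Group 1: basic code metrics (assumed examples).
    features = {
        "n_lines": len(lines),
        "n_comments": sum(t.type == tokenize.COMMENT for t in tokens),
        "max_line_len": max((len(ln) for ln in lines), default=0),
    }

    # Group 2: structural patterns, counted on the syntax tree.
    features["n_functions"] = sum(
        isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef)) for n in nodes
    )
    features["n_loops"] = sum(isinstance(n, (ast.For, ast.While)) for n in nodes)
    features["n_try"] = sum(isinstance(n, ast.Try) for n in nodes)

    # Group 3: keyword counts, taken from real tokens so that keywords
    # inside comments and string literals are not counted.
    kw_counts = Counter(
        t.string for t in tokens
        if t.type == tokenize.NAME and keyword.iskeyword(t.string)
    )
    for kw in TRACKED_KEYWORDS:
        features[f"kw_{kw}"] = kw_counts[kw]

    # Group 4: libraries, identified by top-level imported module names.
    imported = set()
    for n in nodes:
        if isinstance(n, ast.Import):
            imported.update(alias.name.split(".")[0] for alias in n.names)
        elif isinstance(n, ast.ImportFrom) and n.module:
            imported.add(n.module.split(".")[0])
    features["n_libraries"] = len(imported)
    return features


def word_ngrams(source: str, n: int = 2) -> Counter:
    """Count word n-grams over a simple \\w+ tokenization of the source."""
    words = re.findall(r"\w+", source)
    return Counter(zip(*(words[i:] for i in range(n))))
```

A feature dictionary of this kind, computed per code sample, could then be assembled into a numeric matrix and passed to a classifier such as the XGBClassifier mentioned in the abstract; that step is omitted here because the paper's training setup is not described on this page.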
Authors and copyright holders:
—
For citation:
Bukina S. G., Harchenko S. S. Feature Set Formation and Comparative Analysis of Classification Algorithms for AI Generated Code Detection. Doklady Tomskogo gosudarstvennogo universiteta sistem upravleniya i radioelektroniki, 2025, vol. 28, no. 4, pp. 121–126. DOI: 10.21293/1818-0442-2025-28-4-121-126
Executive Secretary of the Editor’s Office
Editor’s Office: 40 Lenina Prospect, Tomsk, 634050, Russia
Phone / Fax: + 7 (3822) 701-582
Viktor N. Maslennikov
Executive Secretary of the Editor’s Office
Editor’s Office: 40 Lenina Prospect, Tomsk, 634050, Russia
Phone / Fax: + 7 (3822) 51-21-21 / 51-43-02