Data splitting method for building machine learning models


Authors: Sarin K. S.

Abstract: A method is presented for splitting data into training and validation samples when building predictive machine learning models. The method aims to keep both samples close to the original data, where closeness means minimal deviation in the characteristics of the data features. Compared with random splitting, preserving this proximity reduces information loss during model construction and thereby improves the generalizing predictive ability. Many alternative models of varying structural complexity are built on the training data, and the most accurate one is selected on the validation data. Experiments were conducted on constructing fuzzy classifiers with data splitting. The method improved both classification accuracy and model interpretability compared with random splitting and with using the original data without splitting.

Keywords: fuzzy classifier, binary optimization algorithms, classification, machine learning
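
The splitting idea in the abstract can be illustrated with a minimal sketch. The abstract does not state how proximity is measured or optimized, so everything here is an assumption made for illustration: proximity is taken as the deviation of per-feature mean and standard deviation from the full data, and the search is a simple best-of-many random draw rather than the binary optimization algorithms named in the keywords. The function names `column_stats`, `feature_deviation`, and `proximity_split` are hypothetical.

```python
import random
from statistics import fmean, pstdev


def column_stats(rows):
    """Return (mean, population std) for every feature column."""
    return [(fmean(col), pstdev(col)) for col in zip(*rows)]


def feature_deviation(subset, full_stats):
    """Assumed proximity measure: summed absolute difference of per-feature
    mean and std between a subset and the full data, scaled by the full std."""
    total = 0.0
    for (m, s), (full_m, full_s) in zip(column_stats(subset), full_stats):
        scale = full_s if full_s > 0 else 1.0  # guard against constant features
        total += (abs(m - full_m) + abs(s - full_s)) / scale
    return total


def proximity_split(rows, val_fraction=0.3, n_candidates=200, seed=0):
    """Draw several random splits and keep the one whose training and
    validation parts both deviate least from the full data.
    (Illustrative search only, not the search used in the article.)"""
    rng = random.Random(seed)
    n = len(rows)
    n_val = min(n - 1, max(1, round(val_fraction * n)))
    full_stats = column_stats(rows)
    best_val, best_score = None, float("inf")
    for _ in range(n_candidates):
        val_idx = set(rng.sample(range(n), n_val))
        train = [r for i, r in enumerate(rows) if i not in val_idx]
        val = [r for i, r in enumerate(rows) if i in val_idx]
        score = (feature_deviation(train, full_stats)
                 + feature_deviation(val, full_stats))
        if score < best_score:
            best_val, best_score = val_idx, score
    return sorted(best_val), best_score
```

Calling `proximity_split(rows)` on a list of numeric feature rows returns the validation indices and the deviation score of the chosen split. Candidate models of different complexity would then be fitted on the remaining rows, and the one with the lowest validation error kept.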

Editorial office address

Executive Secretary of the Editor’s Office

 Editor’s Office: 40 Lenina Prospect, Tomsk, 634050, Russia

  Phone / Fax: + 7 (3822) 701-582

  journal@tusur.ru

 

