Peculiarities of fully connected neural network design for estimating the lipophilicity of organic compounds
Authors: Pyakillya B. I., Goncharov V. I.
Abstract: Assessing the lipophilicity of small organic compounds plays a crucial role in the development and optimization of new drugs. Experimental methods, however, require significant time and resources, including laboratory equipment and reagents, and manual verification and adjustment of the data often add to the labor involved. Computational methods such as machine learning offer a faster and less resource-intensive alternative: they can process large volumes of data efficiently and adapt to complex relationships between molecular structure and lipophilicity. Developing neural network models for lipophilicity assessment is nevertheless challenging, because experimental data are insufficient and graph neural network models carry high computational costs. This work analyzes popular methods of describing chemical structures for building fully connected neural network models, which are less demanding in the volume of training data. Based on this analysis, the features that best describe the organic compounds in an open lipophilicity dataset collected from the ChEMBL database are selected, and a search for the optimal neural network architecture for the chosen features is carried out.
Keywords: cheminformatics, lipophilicity, neural network, modeling