Enalos KNIME nodes

Blending cheminformatics, bioinformatics and nanoinformatics tools under KNIME

Categories:

Enalos family nodes

Domain-APD

Applicability Domain (area of reliable predictions) based on Euclidean distances. The applicability domain must be defined in order to flag compounds/samples for which the predictions of a developed model may be unreliable. In this node, similarity measurements are used to define the model's domain of applicability from the Euclidean distances among all training compounds and the test or virtual screening compounds. A threshold value is calculated, and a prediction is considered reliable only if the distances between training and test/screening compounds are lower than this threshold (Zhang et al. 2006).

Input: Training set table and test/virtual screening set table (only the descriptors involved in modelling)

Output: A table containing, for each compound of the test/virtual screening set, the result "reliable"/"unreliable"
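The threshold rule above can be sketched in a few lines of Python. This is a minimal, stdlib-only sketch under stated assumptions, not the node's implementation: it assumes the APD formulation commonly associated with Zhang et al. (2006), APD = <d> + Z·σ, where <d> and σ are the mean and (here, population) standard deviation of the pairwise training distances that fall below the mean of all pairwise training distances, Z is an empirical parameter (0.5 assumed), and each test compound is judged by its distance to its nearest training compound. The names `apd_threshold` and `classify_apd` are hypothetical.

```python
import math
from itertools import combinations
from statistics import mean, pstdev


def euclidean(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def apd_threshold(train, z=0.5):
    """Assumed APD = <d> + Z * sigma over the below-mean pairwise training distances."""
    dists = [euclidean(a, b) for a, b in combinations(train, 2)]
    overall = mean(dists)
    below = [d for d in dists if d < overall] or dists  # guard: all distances equal
    return mean(below) + z * pstdev(below)


def classify_apd(train, test, z=0.5):
    """Label each test compound by its distance to the nearest training compound (assumption)."""
    threshold = apd_threshold(train, z)
    labels = []
    for x in test:
        nearest = min(euclidean(x, t) for t in train)
        labels.append("reliable" if nearest < threshold else "unreliable")
    return labels


# Four training compounds at the corners of a unit square (two descriptors each).
train = [(0, 0), (1, 0), (0, 1), (1, 1)]
print(apd_threshold(train))                       # 1.0
print(classify_apd(train, [(0.5, 0.5), (5, 5)]))  # ['reliable', 'unreliable']
```

Here the below-mean distances are the four sides of length 1, so the threshold is 1.0; the centre of the square lies about 0.71 from its nearest neighbour and is flagged reliable, while (5, 5) is far outside the training cloud.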

Domain-Leverage

Applicability Domain (area of reliable predictions) based on the extent of extrapolation. The extent of extrapolation is a simple approach to defining the applicability domain, based on the diagonal elements of the Hat matrix (leverage values h). These values reflect the similarity of test/screening samples to the training set (their distance from the training set’s centroid) in terms of the descriptors used in model development. The limits of the applicability domain are set by a threshold leverage value h*, and the prediction for a test sample is considered reliable if h is lower than h* (Golbraikh and Tropsha 2020).

Input: Training set table and test/virtual screening set table (only the descriptors involved in modelling)

Output: A table containing, for each compound of the test/virtual screening set, the result "reliable"/"unreliable"
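The leverage calculation can be sketched as follows. This minimal sketch assumes that the Hat matrix is built from the training descriptors plus an intercept column, so that h = xᵀ(XᵀX)⁻¹x for a sample x, and it uses the warning leverage h* = 3(p + 1)/n (p descriptors, n training compounds), a threshold widely used in QSAR work; the node's exact choices are not stated here. `leverages` and `leverage_domain` are hypothetical names, and a small Gaussian-elimination solver keeps the example free of third-party dependencies.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented copy; a is not modified
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def with_intercept(rows):
    """Prepend a column of ones (assumed model with intercept)."""
    return [[1.0] + [float(v) for v in row] for row in rows]


def leverages(train, test):
    """h = x^T (X^T X)^-1 x for each test row, where X is the training design matrix."""
    x_train = with_intercept(train)
    k = len(x_train[0])
    xtx = [[sum(r[i] * r[j] for r in x_train) for j in range(k)] for i in range(k)]
    hs = []
    for x in with_intercept(test):
        v = solve(xtx, x)  # v = (X^T X)^-1 x, computed without forming the inverse
        hs.append(sum(a * b for a, b in zip(x, v)))
    return hs


def leverage_domain(train, test):
    """Compare each leverage with the assumed warning leverage h* = 3(p + 1)/n."""
    n, p = len(train), len(train[0])
    h_star = 3 * (p + 1) / n
    labels = ["reliable" if h < h_star else "unreliable" for h in leverages(train, test)]
    return h_star, labels


# One descriptor, five training compounds.
train = [[0], [1], [2], [3], [4]]
h_star, labels = leverage_domain(train, [[2], [10]])
print(h_star, labels)  # 1.2 ['reliable', 'unreliable']
```

With one descriptor and training values 0 to 4, h* = 1.2; a test value of 2 (the centroid) has h = 0.2, while a value of 10 has h = 6.6 and falls outside the domain. As a sanity check, the training leverages sum to p + 1, the trace of the Hat matrix.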

Model Acceptability Criteria

This node reports the quality of fit and predictive ability of a continuous QSAR model based on the criteria proposed by Golbraikh and Tropsha (2002).

Input: A table containing the predicted (ypred) and actual (yexp) values of the dependent variable for the test set, and a table containing the actual dependent variable values for the training set (ytr).

Output: The quality-of-fit and predictive-ability statistics of the continuous QSAR-type model, along with an indication of whether each relevant criterion is fulfilled.
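The checks can be sketched as below. This is a hedged sketch of the Golbraikh and Tropsha (2002) criteria as they are usually quoted (Q² > 0.5, R² > 0.6, (R² − R0²)/R² < 0.1 or (R² − R0'²)/R² < 0.1, and 0.85 ≤ k ≤ 1.15 or 0.85 ≤ k' ≤ 1.15), not the node's code. Two assumptions are worth flagging: Q² is computed here as an external Q² against the training-set mean, which is one plausible reason the node asks for ytr (the original paper's q² is the cross-validated value for the training set), and the assignment of observed versus predicted values in k, k', R0² and R0'² follows one common convention that other papers swap. The function name `acceptability` is hypothetical.

```python
from statistics import mean


def r_squared(y, yhat):
    """Squared Pearson correlation between observed and predicted values."""
    my, mp = mean(y), mean(yhat)
    sxy = sum((a - my) * (b - mp) for a, b in zip(y, yhat))
    sxx = sum((a - my) ** 2 for a in y)
    spp = sum((b - mp) ** 2 for b in yhat)
    return sxy ** 2 / (sxx * spp)


def acceptability(y_exp, y_pred, y_tr):
    """Test-set statistics and pass/fail flags under the assumed conventions."""
    r2 = r_squared(y_exp, y_pred)
    # External Q^2 relative to the training-set mean (assumed role of y_tr).
    m_tr = mean(y_tr)
    press = sum((a - b) ** 2 for a, b in zip(y_exp, y_pred))
    q2 = 1 - press / sum((a - m_tr) ** 2 for a in y_exp)
    # Slopes of the regressions through the origin.
    cross = sum(a * b for a, b in zip(y_exp, y_pred))
    k = cross / sum(b * b for b in y_pred)        # observed vs predicted
    k_prime = cross / sum(a * a for a in y_exp)   # predicted vs observed
    my, mp = mean(y_exp), mean(y_pred)
    r2_0 = 1 - sum((a - k * b) ** 2 for a, b in zip(y_exp, y_pred)) / sum(
        (a - my) ** 2 for a in y_exp
    )
    r2_0p = 1 - sum((b - k_prime * a) ** 2 for a, b in zip(y_exp, y_pred)) / sum(
        (b - mp) ** 2 for b in y_pred
    )
    stats = {"Q2": q2, "R2": r2, "R0^2": r2_0, "R0'^2": r2_0p, "k": k, "k'": k_prime}
    criteria = {
        "Q2 > 0.5": q2 > 0.5,
        "R2 > 0.6": r2 > 0.6,
        "(R2 - R0^2)/R2 < 0.1 or (R2 - R0'^2)/R2 < 0.1":
            (r2 - r2_0) / r2 < 0.1 or (r2 - r2_0p) / r2 < 0.1,
        "0.85 <= k <= 1.15 or 0.85 <= k' <= 1.15":
            0.85 <= k <= 1.15 or 0.85 <= k_prime <= 1.15,
    }
    return stats, criteria


stats, ok = acceptability([1, 2, 3, 4, 5], [1.1, 1.9, 3.2, 3.8, 5.1], [1, 2, 3, 4, 5])
print(all(ok.values()))  # True
```

The combination of criteria matters: a perfectly anti-correlated prediction such as ypred = [5, 4, 3, 2, 1] for yexp = [1, 2, 3, 4, 5] still gives R² = 1, yet it fails both the Q² criterion (Q² = −3) and the slope criterion (k = k' ≈ 0.64).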

Mold2

Mold2 calculates a large and diverse set of 777 molecular descriptors encoding two-dimensional chemical structure information. The software was developed by the Center for Bioinformatics at the National Center for Toxicological Research (NCTR).

Input: A licensed installation of Mold2 (the executable .exe file) and the compound structures as an SDF chemical file

Output: A table containing, for each compound, the values of the 777 molecular descriptors calculated with Mold2

For more information, visit the KNIME community nodes page.

Contact NovaMechanics Ltd

info[at]novamechanics[dot]com

@EnalosTools