In Silico Prediction of Physicochemical Properties of Environmental Chemicals Using Molecular Fingerprints and Machine Learning

QSAR Model Reporting Formats. Examples of R code: feature selection and regression analysis. Figure S1: Data distribution of logBCF, BP, MP, and logVP. Figures S2–S5: Relationship between model complexity and prediction errors, and plots of estimated values versus experimental data, for logBCF, BP, MP, and logVP, respectively. Figure S6: Plots of leverage versus standardized residuals for the logBCF, BP, MP, and logVP models. Table S1: Chemical product classes for training and test sets. Tables S2–S5: Regression statistics for logBCF, BP, MP, and logVP, respectively. Table S6: Applicability domains for logBCF, BP, MP, and logVP. Tables S7–S12: Chemicals with large prediction residuals for the six properties (PDF).
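Figure S6 plots leverage against standardized residuals, a common way to inspect a model's applicability domain. The sketch below shows how those two quantities can be computed for the simplest case, an ordinary least-squares fit on a single descriptor. It is an illustration only and not the authors' R code. The function name, the toy data, and the warning threshold h* = 3(p + 1)/n (a common convention, with p = 1 descriptor here) are all assumptions.

```python
import math


def leverage_and_std_residuals(x, y):
    """Leverage and standardized residuals for a one-descriptor OLS fit.

    Hypothetical helper for illustration; not taken from the publication.
    """
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))

    # Closed-form least-squares slope and intercept.
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

    # Residual standard error with n - 2 degrees of freedom.
    sigma = math.sqrt(sum(e * e for e in residuals) / (n - 2))

    # Diagonal of the hat matrix for simple linear regression.
    leverage = [1.0 / n + (xi - x_mean) ** 2 / sxx for xi in x]
    std_resid = [e / sigma for e in residuals]
    return leverage, std_resid


# Toy data (made up): the last point lies far from the others in x.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 30.0]
y = [0.6, 0.9, 1.6, 2.1, 2.4, 3.1, 3.4, 4.1, 4.4, 15.2]

leverage, std_resid = leverage_and_std_residuals(x, y)

# Assumed warning leverage: 3 * (p + 1) / n with p = 1 descriptor.
h_star = 3 * 2 / len(x)
high_leverage = [i for i, h in enumerate(leverage) if h > h_star]
print("indices above warning leverage:", high_leverage)
```

Points whose leverage exceeds h* sit far from the bulk of the training descriptors, so predictions for them are extrapolations. Points with large absolute standardized residuals are poorly fitted in the response. A multi-descriptor model would use the diagonal of the full hat matrix instead of the closed form above.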

Chemical names, CAS registry numbers, and SMILES, together with the experimentally measured and estimated property values for the training and test sets (XLSX).

This dataset is associated with the following publication: Zang, Q., K. Mansouri, A. Williams, R. Judson, D. Allen, W.M. Casey, and N.C. Kleinstreuer. In Silico Prediction of Physicochemical Properties of Environmental Chemicals Using Molecular Fingerprints and Machine Learning. Journal of Chemical Information and Modeling. American Chemical Society, Washington, DC, USA, 57(1): 36-49, (2017).

Additional Info

Field Value
Maintainer Alexander Hanf
Last Updated March 6, 2021, 16:51 (EST)
Created March 6, 2021, 16:51 (EST)
Identifier https://doi.org/10.23719/1504459
Modified 2017-12-22
accessLevel public
bureauCode {020:00}
catalog_conformsTo https://project-open-data.cio.gov/v1.1/schema
encoding utf8
harvest_url http://catalog.data.gov/dataset/1bf45632-6966-4f37-9004-a447389cd133
license https://pasteur.epa.gov/license/sciencehub-license.html
programCode {020:095}
publisher U.S. EPA Office of Research and Development (ORD)
publisher_hierarchy U.S. Government > U.S. Environmental Protection Agency > U.S. EPA Office of Research and Development (ORD)
references {https://doi.org/10.1021/acs.jcim.6b00625}
resource-type Dataset
source_datajson_identifier true
source_hash fc54d26bbbb7e094d22df00fb0cff73b305609e8
source_schema_version 1.1