Water solubility (logS) database

In the logS database, aqueous solubility is expressed as logS, where S is the solubility in mol/L at a temperature of 20-25°C. Two databases were used for our modeling. In reference [1], the data provided by Tetko were used; this database includes 1290 organic compounds. The data set was converted from the SMILES flat-file representation to the MACCS/SDF structured data file format. In reference [2], additional molecules collected from the literature were added; this database includes 1708 molecules.
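As a small illustration of the logS definition, the sketch below converts a logS value back to a solubility in mol/L and, given a molecular weight, to mg/mL. It assumes that "log" means the base-10 logarithm, which is the usual convention for logS but is not stated explicitly here. The function names are hypothetical and are not part of the databases or of the drug-logS model.

```python
def logs_to_molar(log_s: float) -> float:
    """Convert logS to solubility S in mol/L.

    Assumption: logS is the base-10 logarithm of S (S in mol/L).
    """
    return 10.0 ** log_s


def logs_to_mg_per_ml(log_s: float, mol_weight: float) -> float:
    """Convert logS to solubility in mg/mL.

    mol_weight is the molecular weight in g/mol.
    mol/L * g/mol = g/L, which is numerically equal to mg/mL.
    """
    return logs_to_molar(log_s) * mol_weight


if __name__ == "__main__":
    # Illustrative values only, not taken from the databases.
    example_log_s = -3.0
    example_mol_weight = 180.16  # g/mol, hypothetical compound
    print(logs_to_molar(example_log_s))
    print(logs_to_mg_per_ml(example_log_s, example_mol_weight))
```

Because logS is a logarithmic quantity, a difference of one logS unit corresponds to a tenfold difference in molar solubility.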

Download the databases:

1. The database with 1290 molecules used in reference [1]
database.sdf (the SDF file includes 1290 molecules)
database.dat (the SMILES file includes 1290 molecules)
test_set1.sdf (the SDF file includes 21 molecules for external test)
database.doc (the predicted solubility using the drug-logS model)

2. The database with 1708 molecules used in reference [2]
solubility_2007.sdf (the SDF file includes 1708 molecules)
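
After downloading, one simple sanity check is to confirm that each SDF file contains the stated number of molecules (1290 or 1708). The following is a minimal sketch that uses only the Python standard library and relies on the SDF convention that each record ends with a line containing `$$$$`. The function names are hypothetical, the file is assumed to be in the working directory, and the names of any data fields stored in the records are not assumed here.

```python
from typing import Iterable, Iterator, List


def iter_sdf_records(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield each SDF record as a list of lines (without line endings).

    Records are delimited by a line containing only '$$$$'.
    """
    record: List[str] = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.strip() == "$$$$":
            yield record
            record = []
        else:
            record.append(line)
    # Tolerate a final record that lacks the '$$$$' terminator.
    if any(item.strip() for item in record):
        yield record


def count_sdf_records(path: str) -> int:
    """Count the records (molecules) in the SDF file at `path`."""
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        return sum(1 for _ in iter_sdf_records(handle))


if __name__ == "__main__":
    import os

    # Assumes the file has been downloaded next to this script.
    sdf_path = "solubility_2007.sdf"
    if os.path.exists(sdf_path):
        print(sdf_path, count_sdf_records(sdf_path))
```

For anything beyond counting, such as parsing structures or reading per-molecule data fields, a cheminformatics toolkit is a better fit than this line-based approach.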

References
[1] Tingjun Hou, Ke Xia, Wei Zhang, Xiaojie Xu, ADME evaluation in drug discovery. 4. Prediction of aqueous solubility based on atom contribution approach, Journal of Chemical Information and Computer Sciences, 2004, 44, 266-275.
[2] Junmei Wang, George Krudy, Tingjun Hou, George Holland, Xiaojie Xu, Development of reliable aqueous solubility models and their application in drug-like analysis, Journal of Chemical Information and Modeling, 2007, 47, 1395-1404.