Human intestinal absorption database

The dataset reported here includes 647 drug and drug-like molecules collected from various literature sources. Ninety-five compounds were eliminated from the data set, and the remaining database of 552 compounds was used to obtain the regression models. The 552 compounds were divided into a training set of 454 compounds and a test set of 98 compounds.

Correction: In the initial data set, gentamicin was listed twice; the duplicate entry was eliminated from the data set.

All databases used in reference [1] are compressed into a single WinZip file, which includes the following four files:
The SDF file includes all 647 molecules (the full data set)
The SDF file includes the 454 molecules of the training set
The SDF file includes the 98 molecules of the test set
The table lists the compounds excluded from the regression models

A second WinZip file includes the following two files, which were used in reference [2]:
The SDF file includes 480 molecules used as the training set for SVM
The SDF file includes 98 molecules used as the test set for SVM
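
After unpacking the archives, the molecule counts listed above can be checked with a short script. The following is a minimal sketch, not part of the original distribution. It relies only on the SDF convention that each molecule record ends with a line containing `$$$$`. The file names in EXPECTED_COUNTS are hypothetical placeholders and must be replaced with the actual names of the unpacked files; the counts are the ones stated on this page.

```python
from pathlib import Path


def count_sdf_records(path):
    """Count molecule records in an SDF file.

    Each record in an SDF file is terminated by a line containing '$$$$'.
    """
    count = 0
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if line.strip() == "$$$$":
                count += 1
    return count


# Hypothetical file names (assumption); counts are taken from the lists above.
EXPECTED_COUNTS = {
    "all_compounds.sdf": 647,
    "training_set.sdf": 454,
    "test_set.sdf": 98,
    "svm_training_set.sdf": 480,
    "svm_test_set.sdf": 98,
}


def check_counts(directory, expected):
    """Compare record counts of SDF files in a directory with expected values."""
    results = {}
    for name, want in expected.items():
        path = Path(directory) / name
        if not path.is_file():
            results[name] = "missing"
            continue
        found = count_sdf_records(path)
        if found == want:
            results[name] = "ok"
        else:
            results[name] = f"expected {want}, found {found}"
    return results


if __name__ == "__main__":
    # Check the files in the current directory and report each status.
    for name, status in check_counts(".", EXPECTED_COUNTS).items():
        print(f"{name}: {status}")
```

A file reported as "missing" usually means the placeholder name was not replaced. A count mismatch may indicate an incomplete download or extraction.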

Database download: and

Note: All of the above files are password protected inside the WinZip archives. To obtain the password, sign the license file and apply to the ADME group.
License file download: license.pdf or license.doc

[1]. Tingjun Hou, Junmei Wang, Wei Zhang, Xiaojie Xu, ADME evaluation in drug discovery. 7. Prediction of oral absorption by correlation and classification, Journal of Chemical Information and Modeling, 2007, 47, 208-218.
[2]. Tingjun Hou, Junmei Wang, ADME evaluation in drug discovery. 8. The prediction of intestinal absorption by support vector machine, 2007, ASAP.