Human intestinal absorption database

The dataset reported here includes 647 drug and drug-like molecules collected from various literature sources. Ninety-five compounds were eliminated from the data set, and the remaining 552 compounds were used to obtain the regression models. These 552 compounds were divided into a training set of 454 compounds and a test set of 98 compounds.

Correction: In the initial data set, gentamicin was listed twice; the duplicate entry was eliminated from the data set.

All databases used in reference [1] are compressed into a WinZip file named HIA.zip, which includes the following four files:
Data_set_647.sdf
The SDF file includes 647 molecules
Training_set_454.sdf
The SDF file includes 454 molecules
Test_set_98.sdf
The SDF file includes 98 molecules
Excluded_compunds.pdf
The table lists the compounds excluded from the regression models

A second WinZip file, named HIA1.zip, includes the following two files used in reference [2]:
Training_set_480_SVM.sdf
The SDF file includes 480 molecules used as the training set for SVM
Test_set_98_SVM.sdf
The SDF file includes 98 molecules used as the test set for SVM
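
The molecule counts stated for the SDF files above can be checked after extraction. The following is a minimal Python sketch, assuming the archives have already been unpacked with the password and that each file is a well-formed SDF in which every record ends with a `$$$$` delimiter line. The helper names `count_sdf_records` and `check_counts` are hypothetical and are not part of the distributed files.

```python
from pathlib import Path

# Counts taken from the file descriptions on this page.
EXPECTED_COUNTS = {
    "Data_set_647.sdf": 647,
    "Training_set_454.sdf": 454,
    "Test_set_98.sdf": 98,
    "Training_set_480_SVM.sdf": 480,
    "Test_set_98_SVM.sdf": 98,
}


def count_sdf_records(path):
    """Count molecule records by counting '$$$$' record-delimiter lines."""
    count = 0
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if line.strip() == "$$$$":
                count += 1
    return count


def check_counts(directory, expected=EXPECTED_COUNTS):
    """Return {filename: (found, expected)} for extracted files whose
    record count differs from the stated count. Files that are not
    present in the directory are skipped."""
    mismatches = {}
    for name, want in expected.items():
        path = Path(directory) / name
        if not path.exists():
            continue
        found = count_sdf_records(path)
        if found != want:
            mismatches[name] = (found, want)
    return mismatches


if __name__ == "__main__":
    # Assumption: the SDF files were extracted into the current directory.
    for name, (found, want) in check_counts(".").items():
        print(f"{name}: found {found} records, expected {want}")
```

An empty result from `check_counts` means every extracted file that was found matches its stated count.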

Database download: HIA.zip and HIA1.zip

Note: All of the above files are password protected with WinZip. To obtain the password, sign the license file and apply to the ADME group.
License file download: license.pdf or license.doc

References
[1]. Tingjun Hou, Junmei Wang, Wei Zhang, Xiaojie Xu, ADME evaluation in drug discovery. 7. Prediction of oral absorption by correlation and classification, Journal of Chemical Information and Modeling, 2007, 47, 208-218. [html] [PDF]
[2]. Tingjun Hou, Junmei Wang, ADME evaluation in drug discovery. 8. The prediction of intestinal absorption by support vector machine, 2007, ASAP. [html] [PDF]