Help

Browser compatibility

OS	Version	Chrome	Firefox	Microsoft Edge	Safari
Linux	CentOS 7	not tested	61.0	n/a	n/a
MacOS	Catalina	86.0.4240.75	81.0.1	86.0.622.51	13.0.5
Windows	10	84.0.4147.105	82.0	44.18362.449.0	n/a

Job Submission

①	Select a receptor file in PDB format.
②	Select a ligand database file in mol2 format.
③	Define the binding site of the protein-ligand complex by providing one of the following: a reference ligand in mol2 format; pocket residue sequence numbers, e.g. HIS_14, THR_83, PHE_150; or the coordinates of the binding site center, e.g. 16, 14, 3.8. If none is provided, the binding pocket is defined by the last ligand in the uploaded ligand database.
④	Choose the docking software: QuickVina (default) or Smina. QuickVina is much faster than Smina.
⑤	If an email address is provided, you will receive an email with a key file once the job is finished.
⑥	Options for protein preparation.
⑦	Options for ligand preparation.
⑧	Settings of docking parameters.
⑨	Example files for testing.
⑩	The blue button shows or hides the settings tables. Click the green button to submit your job.
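
The text formats accepted in ③ can be illustrated with a small parser. This is only a sketch of how such input might be read: the function names `parse_center` and `parse_residues` are hypothetical, and the three-letter residue pattern is an assumption based on the examples above, not the server's actual validation.

```python
import re


def parse_center(text):
    """Parse a binding site center such as '16, 14, 3.8' into three floats."""
    parts = [p.strip() for p in text.split(",")]
    if len(parts) != 3:
        raise ValueError("expected three comma-separated coordinates")
    return tuple(float(p) for p in parts)


def parse_residues(text):
    """Parse 'HIS_14, THR_83' into [('HIS', 14), ('THR', 83)]."""
    residues = []
    for item in text.split(","):
        item = item.strip()
        # Assumed pattern: three-letter residue name, underscore, sequence number.
        match = re.fullmatch(r"([A-Za-z]{3})_(-?\d+)", item)
        if match is None:
            raise ValueError(f"unrecognized residue: {item!r}")
        residues.append((match.group(1).upper(), int(match.group(2))))
    return residues


print(parse_center("16, 14, 3.8"))
print(parse_residues("HIS_14, THR_83, PHE_150"))
```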

Job Results

①	Click the blue button to download the result file of the corresponding job.
②	The left part of the panel presents information on the top-100 ligands ranked by docking score. The right part of the panel shows the conformation of the corresponding receptor-ligand complex. If the checkbox in the 'site' column is checked, the corresponding binding site (residues within 5 Å of the ligand) is shown on the right.
③	The interaction information for the conformation of the protein-ligand complex selected in the Visualization panel. Interactions are detected with *oddt* using the following criteria. Hydrogen bond: the distance between acceptor-donor atoms is <= 3.5 Å and the angle is between 90° and 150°. Halogen bond: the distance between acceptor-donor atoms is <= 4 Å and the angle is between 90° and 150°. Metal: the distance between acceptor-donor atoms is <= 4 Å and the angle is between 90° and 150°. Salt bridge: the distance between acceptor-donor atoms is <= 4 Å. Hydrophobic contact: the distance between hydrophobe pairs is <= 4 Å. Use the interaction button to switch between the 5 kinds of interaction.
④	The interaction frequency statistics are based on the interaction information of the protein-ligand complexes in the Visualization panel and are provided for reference. They help users understand the binding mode between the receptor and the ligand (i.e. which kinds of interaction and which residues are important for binding).
⑤	The ligand cluster centers of the top-100 ligands listed above. The clustering is based on the Euclidean distance between Morgan fingerprints generated from the ligands and is implemented with *RDKit*.
⑥	The result file contains 3 folders: comolex: the top-100 protein-ligand complex files in pdb format; files: conformation files of docked ligands in mol2 format; reports: 1) score.csv: docking scores of protein-ligand complexes; 2) interaction.csv & data.csv: interaction statistics; 3) cluster.png & pocket.pdb (if residue sequence numbers are provided).
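
The geometric criteria listed in ③ can be written down as a small lookup table plus a check. The sketch below is a hand-written illustration of those cutoffs, not the *oddt* API; the names `CUTOFFS`, `angle_deg` and `meets_criteria` are hypothetical, and which three atoms define the angle is left to the caller because the help text does not specify it.

```python
import math

# Cutoffs copied from the help text: (max distance in Å, allowed angle range in degrees).
# None means that no angle criterion is listed for that interaction.
CUTOFFS = {
    "hydrogen_bond": (3.5, (90.0, 150.0)),
    "halogen_bond": (4.0, (90.0, 150.0)),
    "metal": (4.0, (90.0, 150.0)),
    "salt_bridge": (4.0, None),
    "hydrophobic_contact": (4.0, None),
}


def angle_deg(a, vertex, b):
    """Angle a-vertex-b in degrees for three 3D points."""
    v1 = [x - y for x, y in zip(a, vertex)]
    v2 = [x - y for x, y in zip(b, vertex)]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.hypot(*v1) * math.hypot(*v2)
    cosine = max(-1.0, min(1.0, dot / norm))  # guard against rounding error
    return math.degrees(math.acos(cosine))


def meets_criteria(kind, dist, angle=None):
    """Return True if a contact satisfies the listed distance and angle cutoffs."""
    max_dist, angle_range = CUTOFFS[kind]
    if dist > max_dist:
        return False
    if angle_range is None:
        return True
    if angle is None:
        return False  # an angle is required but was not supplied
    low, high = angle_range
    return low <= angle <= high


d = math.dist((0.0, 0.0, 0.0), (3.0, 0.0, 0.0))
print(meets_criteria("hydrogen_bond", d, angle=120.0))
print(meets_criteria("salt_bridge", 4.2))
```

Keeping the cutoffs in one table makes it easy to compare them against the values shown in the interaction panel.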

Job Submission

-	①②⑤⑥⑨⑩ are the same as those described in the Docking help page.
③	Select an inactive ligand database file in mol2 format.
④	Select a wait-to-be-screened ligand database file in mol2 format.
⑦	Display the descriptors that will be generated for the chosen tools.
⑧	Choose at least one descriptor generation tool to capture the interaction mode of the protein-ligand complex and output the descriptors.
-	If decoys are uploaded, the test ligands must also be uploaded, and the active ligands must be uploaded through ②. The descriptors computed from the actives and decoys form the training set, the descriptors generated from the test ligands form the test set, and 2 CSV files are output. Otherwise, 1 CSV file is output.

Job Results

descriptor_file.csv
①	The result file contains 1 file folder named csvs containing 3 kinds of CSV file.
②	The 'descriptor_file.csv' file stores the descriptors generated from the uploaded ligand database file. If decoys are uploaded, they are also included in descriptor generation and the resulting descriptors are used as the training set.
③	If test ligands are uploaded, the corresponding descriptors are calculated and stored in the 'test_descriptor_file.csv' file, which can be used as the test set.
④	Descriptors generated from various tools will be saved in separate CSV files.
⑤	The data structure in the 'descriptor_file.csv' or 'test_descriptor_file.csv' is shown as follows.
name	hydrogen bond	hydrophobic contacts	...(descriptor term_n)	halogen bond	class
CHEMBL355330	-6.449	-0.111	...	-7.537	1
...	...	...	...	...	...
ZINC19892971	-6.622	-0.102	...	-8.386	0
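
A file with this layout can be split into ligand names, a feature matrix and class labels using only the standard library. The sketch below assumes a comma-delimited file whose header matches the table above; the sample is trimmed to the columns that are actually shown (the elided descriptor terms are left out), and `load_descriptors` is a hypothetical helper name.

```python
import csv
import io

# Trimmed sample using only the columns and rows shown in the table above.
SAMPLE = """name,hydrogen bond,hydrophobic contacts,halogen bond,class
CHEMBL355330,-6.449,-0.111,-7.537,1
ZINC19892971,-6.622,-0.102,-8.386,0
"""


def load_descriptors(handle):
    """Split a descriptor CSV into feature names, ligand names, features and labels."""
    reader = csv.DictReader(handle)
    # Every column except 'name' and 'class' is treated as a descriptor term.
    feature_names = [c for c in reader.fieldnames if c not in ("name", "class")]
    names, features, labels = [], [], []
    for row in reader:
        names.append(row["name"])
        features.append([float(row[c]) for c in feature_names])
        labels.append(int(row["class"]))
    return feature_names, names, features, labels


feature_names, names, features, labels = load_descriptors(io.StringIO(SAMPLE))
print(feature_names)
print(names, labels)
```

For a downloaded result, the same function could be called on `open("descriptor_file.csv", newline="")` instead of the in-memory sample.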

Job Submission

-	④⑧⑨ are the same as those described in the Docking help page.
①	Select a descriptor file in csv format, in which the data structure is the same as the one mentioned in Descriptors.
②	Choose one of the following three algorithms: eXtreme Gradient Boosting (Default); Support Vector Machine; Random Forest.
③	Input a float between 0.0 and 1.0 that represents the proportion of the dataset to include in the training split. The remaining data forms the test set, which is used to test the generalization ability of the model.
⑤	Options for data preprocessing.
⑥	Options for feature engineering. If SelectFromModel is chosen, the parameter is the number of the base estimators; else, the parameter means the number of features retained for modelling.
⑦	Settings for hyper-parameter tuning. The value range of each hyper-parameter is shown on a `black background`. Users can click the [suggestion] link to learn how changing a hyper-parameter affects the model. Two numbers separated by '-' in the input box give the hyper-parameter range for tuning, and ten-fold cross-validation is used to assess model performance. Methods for generating the search space: uniform: uniform distribution over a given range [a, b]; uniformint: uniform distribution of integers over a given range [a, b]; randint: random integers from a given range [a, b]; loguniform: uniform distribution in log space, giving values in the transformed range [exp(a), exp(b)]; choice: random choice from a list of options.
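
The search-space methods in ⑦ can be mimicked with the standard library to see what kind of values each one produces. This is a sketch of the semantics as described above, not the server's code and not the *hyperopt* API; `parse_range` and `sample` are hypothetical names, the range parser assumes non-negative bounds, and both integer methods are treated as inclusive of a and b, which may differ from the tuning library's boundary handling.

```python
import math
import random


def parse_range(text):
    """Parse an input-box value such as '0.01-0.3' into (low, high).

    Assumes both bounds are non-negative, so '-' only appears as the separator.
    """
    low, high = text.split("-")
    return float(low), float(high)


def sample(method, spec, rng):
    """Draw one value from the search space described by `method` and `spec`."""
    if method == "choice":
        return rng.choice(spec)  # spec is a list of options
    a, b = spec
    if method == "uniform":
        return rng.uniform(a, b)
    if method in ("uniformint", "randint"):
        return rng.randint(int(a), int(b))  # inclusive on both ends (assumption)
    if method == "loguniform":
        # Uniform in log space, so the value lies in [exp(a), exp(b)].
        return math.exp(rng.uniform(a, b))
    raise ValueError(f"unknown method: {method}")


rng = random.Random(0)  # fixed seed so repeated runs draw the same values
print(sample("uniform", parse_range("0.01-0.3"), rng))
print(sample("uniformint", (3, 10), rng))
print(sample("loguniform", (-5.0, 0.0), rng))
print(sample("choice", ["a", "b", "c"], rng))
```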

Job Results

①	Click the blue button to download the result file of the corresponding job.
②	Model performance under different evaluation metrics.
③	Hyper-parameters of the model.
④	The features used in modelling and their importance score calculated by Random Forest Classifier.
⑤	The validation curve shows the change of the model's accuracy on the training set and validation set under the hyper-parameter tuning process. It gives information about whether the model is over-fitting or not.
⑥	The learning curve displays how the performance of the model changes with changes in hyper-parameters from the perspective of the F1 score.
⑦	The receiver operating characteristic curve plotted based on model performance on the test set. The bigger the area under the curve, the more accurate the model is.
⑧	The figures show the possible range of the optimal value of each hyperparameter found by the optimization algorithm implemented in *hyperopt* as the tuning round increases.
⑨	The figures show the impact of hyper-parameters on model performance.
⑩	The figure shows the top-20 most important features and their influence on model performance, and is generated with *shap*. The color represents the feature value (red high, blue low). This reveals, for example, that a high num_heavy_atoms increases the predicted probability of being an active ligand.
-	The result file contains 2 file folders: config: 1) the config file contains model parameters and can be used in Screening module; 2) id.rsa file used for access to the result page. reports: 1) figure files in PNG format; 2) data of model performance, hyper-parameters and feature importance stored in CSV file.
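
The area under the ROC curve mentioned in ⑦ equals the probability that a randomly chosen binder receives a higher score than a randomly chosen non-binder. The sketch below computes it directly from that definition; it is an illustration with made-up labels and scores, not the routine used by the server, and `roc_auc` is a hypothetical name.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes are needed to compute the AUC")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up example: two binders (class 1) and two non-binders (class 0).
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
print(roc_auc(labels, scores))  # 3 of the 4 binder/non-binder pairs are ranked correctly: 0.75
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect separation of binders from non-binders.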

Job Submission

-	④⑤⑥ are the same as those described in the Docking help page.
① ②	Upload CSV files containing descriptors of training set and test set respectively. The data structure should be the same as the one mentioned in Descriptors.
③	Upload the config file generated in Modelling module so that the best model constructed in Modelling module can be used to screen the unknown ligands in test set.

Job Results

active_ligand.csv
result.zip	The result file contains 1 folder named report containing 1 CSV file. The CSV file records each ligand name and the corresponding predicted probability of being a binder to the receptor.
name	score
CHEMBL559826	0.997828423976898
CHEMBL259247	0.953756630420684
...	...
ZINC64611921	0.649281680583953
CHEMBL556874	0.50817596912384
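
A result file with this layout can be filtered to keep only the most confident predictions. The sketch below assumes a comma-delimited file with the `name` and `score` header shown above and uses the rows listed in the table as sample data; the 0.9 cutoff and the `top_hits` helper are hypothetical choices for illustration.

```python
import csv
import io

# Sample rows copied from the table above (elided rows are left out).
SAMPLE = """name,score
CHEMBL559826,0.997828423976898
CHEMBL259247,0.953756630420684
ZINC64611921,0.649281680583953
CHEMBL556874,0.50817596912384
"""


def top_hits(handle, threshold):
    """Return (name, score) pairs with score >= threshold, highest score first."""
    rows = [(row["name"], float(row["score"])) for row in csv.DictReader(handle)]
    hits = [row for row in rows if row[1] >= threshold]
    return sorted(hits, key=lambda row: row[1], reverse=True)


for name, score in top_hits(io.StringIO(SAMPLE), threshold=0.9):
    print(f"{name}\t{score:.3f}")
```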

Job Submission

-	③⑦⑧ are described in the Docking help page; ④ is described in the Descriptors help page; ⑤⑥ are described in the Modelling help page.
①	The 'Ligand' should be binders to the receptors. The 'Decoys' should be non-binders to the receptors. The 'Test ligand' are the wait-to-be-screened ligands that we do not know if they are binders or non-binders. Descriptors generated from 'Ligand' and 'Decoys' assemble the training set while descriptors calculated from 'Test ligand' form the test set.
②	Unlike in the Docking and Modelling help pages, users can skip docking or modelling by selecting the option 'no' in the corresponding select box.

Job Results

-	①②③ are described in the Modelling help page; ④⑤⑥⑦ are described in the Docking help page.
④	Different from the one mentioned in the Docking help page, the numbers in the Score column are the probabilities of being a binder to the receptor instead of the docking score and the ligands in the table are top-100 binders predicted by the model.
⑥	The interaction frequency analysis implemented in the Pipeline module is based on the active ligands in the training set while the one in the Docking module is based on all the ligands uploaded. Therefore, the statistics in this module are more reliable and can be used as a reference for the binding mode of the receptor-ligand complex.
-	The result file contains 4 folders: comolex: the top-100 protein-ligand complex files in pdb format; files: conformation files of docked ligands in mol2 format; csvs: 1) descriptor_file.csv: training set; 2) test_descriptor_file.csv: test set; 3) xx_score.csv: descriptors computed by single tools; reports: 1) score.csv: docking scores of protein-ligand complexes; 2) interaction.csv & data.csv: interaction statistics; 3) bias_var.csv: the model accuracy on the training set and validation set and the F1 score on the validation set during hyper-parameter tuning; 4) final_parameter.csv: hyper-parameters of the best model; 5) model_metric.csv: model performance under different evaluation metrics; 6) feature_importance.csv: the features used in modelling and their importance scores calculated by Random Forest Classifier; 7) active_ligand.csv: the result of virtual screening (each ligand name and the corresponding predicted probability of being a binder to the receptor); 8) bias_var.png: plotted from the accuracy data in bias_var.csv; 9) learning_curve.png: plotted from the F1 score data in bias_var.csv; 10) hyperParameter_scatter.png: plotted from the model loss, hyper-parameter values and tuning rounds; 11) roc.png: the receiver operating characteristic curve plotted from model performance on the test set; 12) shap.png: plotted by *Shap* and used for model interpretation; 13) cluster.png: generated by *RDKit*; 14) pocket.pdb: the pocket of the receptor in PDB format, produced if residue sequence numbers are provided.