modelling

Modelling Panel

Descriptor

.csv

Algorithm

Training size

(0,1]

Email (optional)

Preprocessing

type	parameter	description
Standardization	Integer	mean removal and variance scaling
StandardScaler: removes the mean and scales each feature to unit variance.
MinMaxScaler: rescales each feature so that all values lie in the range [0, 1].
MaxAbsScaler: divides each feature by its maximum absolute value, so that absolute values lie in the range [0, 1].
RobustScaler: centers and scales using percentile-based statistics, so it is not influenced by a small number of very large outliers.
Normalizer: rescales each sample vector to unit norm, independently of the distribution of the samples.
Feature selection	Integer	feature selection / dimensionality reduction on the sample set
SelectFromModel: a random forest classifier computes impurity-based feature importances, which are then used to discard irrelevant features; the parameter is the number of trees.
Chi-squared: selects the best features using a univariate chi-squared test; the parameter is the number of selected features.
ANOVA: selects the best features using a univariate ANOVA F-test; the parameter is the number of selected features.
Mutual_info: selects the best features by their estimated mutual information with the target; the parameter is the number of selected features.
Retain_all: no feature selection; the model is trained on all features.
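
The scaling options above can be illustrated numerically. The following is a minimal sketch in plain Python, not the tool's implementation. The function names are hypothetical, and the option names appear to correspond to the scikit-learn classes of the same name. The population standard deviation, the 25th to 75th percentile range, and the L2 norm are assumptions chosen to mirror scikit-learn's defaults.

```python
import math
import statistics


def standard_scale(col):
    """StandardScaler idea: remove the mean, scale to unit variance."""
    mean = statistics.fmean(col)
    std = statistics.pstdev(col) or 1.0  # constant column: avoid dividing by zero
    return [(x - mean) / std for x in col]


def min_max_scale(col):
    """MinMaxScaler idea: map the column onto the range [0, 1]."""
    lo, hi = min(col), max(col)
    span = (hi - lo) or 1.0
    return [(x - lo) / span for x in col]


def max_abs_scale(col):
    """MaxAbsScaler idea: divide by the largest absolute value."""
    biggest = max(abs(x) for x in col) or 1.0
    return [x / biggest for x in col]


def robust_scale(col):
    """RobustScaler idea: center on the median, scale by the interquartile range."""
    # Assumption: linearly interpolated quartiles ("inclusive" method).
    q1, median, q3 = statistics.quantiles(col, n=4, method="inclusive")
    iqr = (q3 - q1) or 1.0
    return [(x - median) / iqr for x in col]


def normalize_row(row):
    """Normalizer idea: rescale one sample (a row, not a column) to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in row)) or 1.0
    return [x / norm for x in row]


# One feature column containing a single large outlier.
column = [1, 2, 3, 4, 100]
print("min-max:", min_max_scale(column))
print("robust: ", robust_scale(column))
```

With this column, `min_max_scale` squeezes the four small values into roughly the bottom 3% of the range, whereas `robust_scale` keeps them spread between -1 and 0.5 and leaves only the outlier far from zero.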

Setting Hyper-parameters

parameter	type	value range and description
tuning times	Integer	`[0, 1000]` number of hyper-parameter optimization rounds
Suggestion: the larger the value, the more likely better hyper-parameters are found, and the longer tuning takes.
n_estimators	Integer	`[1, 9999]` number of boosting rounds
Suggestion: increasing this value improves the learning ability of the model but also raises the risk of overfitting.
learning rate	Float	`(0, 1]` step size shrinkage applied at each update to prevent overfitting
Suggestion: the learning rate shrinks the feature weights to make the boosting process more conservative.
subsample	Float	`(0, 1]` subsample ratio of the training instances
Suggestion: decreasing this value can help prevent overfitting.
max depth	Integer	`[1, 9999]` maximum depth of a tree
Suggestion: increasing this value makes the model more complex and more likely to overfit.
gamma	Float	`[0, 9999]` minimum loss reduction required to make a further partition on a leaf node of the tree
Suggestion: the larger gamma is, the more conservative the algorithm will be.
min child weight	Float	`[0, 9999]` minimum sum of instance weight (hessian) needed in a child
Suggestion: the larger min_child_weight is, the more conservative the algorithm will be.
colsample_bytree	Float	`(0, 1]` subsample ratio of columns when constructing each tree
Suggestion: decreasing this value can help prevent overfitting.
colsample_bylevel	Float	`(0, 1]` subsample ratio of columns when constructing each level
Suggestion: decreasing this value can help prevent overfitting.
colsample_bynode	Float	`(0, 1]` subsample ratio of columns when constructing each node (split)
Suggestion: decreasing this value can help prevent overfitting.
alpha	Float	`[0, 9999]` L1 regularization term on weights
Suggestion: increasing this value makes the model more conservative.
lambda	Float	`[0, 9999]` L2 regularization term on weights
Suggestion: increasing this value makes the model more conservative.
tuning times	Integer	`[0, 1000]` number of hyper-parameter optimization rounds
Suggestion: the larger the value, the more likely better hyper-parameters are found, and the longer tuning takes.
C	Float	`(0, 9999]` regularization parameter
Suggestion: the strength of the regularization is inversely proportional to C, so decreasing this value makes the model more conservative.
gamma	Float	`(0, 9999]` kernel coefficient
Suggestion: none.
tuning times	Integer	`[0, 1000]` number of hyper-parameter optimization rounds
Suggestion: the larger the value, the more likely better hyper-parameters are found, and the longer tuning takes.
n_estimators	Integer	`[1, 9999]` number of trees in the forest
Suggestion: increasing this value improves the learning ability of the model but may also raise the risk of overfitting.
max depth	Integer	`[1, 9999]` maximum depth of a tree
Suggestion: increasing this value makes the model more complex and more likely to overfit.
min samples leaf	Integer	`[1, 9999]` minimum number of samples required at a leaf node
Suggestion: increasing this value makes the model less complex and less likely to overfit.
min samples split	Integer	`[2, 9999]` minimum number of samples required to split an internal node
Suggestion: increasing this value makes the model less complex and less likely to overfit.
min impurity decrease	Float	`[0, 9999]` a node is split only if the split decreases the impurity by at least this value
Suggestion: increasing this value makes the model less complex and less likely to overfit.
max features	StrOrInt	`{sqrt, log2} ∪ [1, 19999]` number of features to consider when looking for the best split
Suggestion: decreasing this value makes the model less complex and less likely to overfit.
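
The "tuning times" setting can be read as a trial budget for a hyper-parameter search. The page does not say which search strategy the tool uses, so the sketch below assumes plain random search. `SPACE`, `DEFAULTS`, `sample_params`, `random_search`, and `toy_loss` are all hypothetical names, the bounds are copied from the table above, and the toy loss merely stands in for a real validation error. A budget of 0 is assumed to mean that the user-supplied values are used unchanged.

```python
import math
import random

# Hypothetical search space; bounds copied from the table above.
# "open_low" marks intervals written as (low, high], where low itself is excluded.
SPACE = {
    "learning_rate": ("open_low", 0.0, 1.0),
    "subsample": ("open_low", 0.0, 1.0),
    "max_depth": ("int", 1, 9999),
    "gamma": ("float", 0.0, 9999.0),
}

# Hypothetical user-supplied starting values.
DEFAULTS = {"learning_rate": 0.3, "subsample": 1.0, "max_depth": 6, "gamma": 0.0}


def sample_params(space, rng):
    """Draw one candidate setting uniformly from each parameter's range."""
    params = {}
    for name, (kind, low, high) in space.items():
        if kind == "int":
            params[name] = rng.randint(low, high)  # both ends inclusive
        elif kind == "float":
            params[name] = rng.uniform(low, high)
        else:
            # rng.random() lies in [0, 1), so the result lies in (low, high].
            params[name] = high - rng.random() * (high - low)
    return params


def random_search(loss, space, defaults, tuning_times, seed=0):
    """Keep the best of `tuning_times` random candidates, starting from the defaults."""
    rng = random.Random(seed)
    best_params, best_loss = dict(defaults), loss(defaults)
    for _ in range(tuning_times):
        candidate = sample_params(space, rng)
        candidate_loss = loss(candidate)
        if candidate_loss < best_loss:
            best_params, best_loss = candidate, candidate_loss
    return best_params, best_loss


def toy_loss(params):
    """Stand-in for a cross-validated error; a real run would train and score a model."""
    return (
        (params["learning_rate"] - 0.1) ** 2
        + (params["subsample"] - 0.8) ** 2
        + (math.log10(params["max_depth"]) - 1.0) ** 2
    )


for budget in (0, 10, 100):
    _, loss_value = random_search(toy_loss, SPACE, DEFAULTS, budget, seed=1)
    print(budget, round(loss_value, 4))
```

Because a larger budget with the same seed evaluates a superset of the candidates tried by a smaller one, the best loss can only stay the same or improve as "tuning times" grows, at the cost of proportionally more model fits.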

Example: Descriptor | Result
