In bioinformatics research, the dimension of data after feature extraction may be very high. As the number of features increases, the model will become very complicated, which can easily lead to overfitting. Although the dimensionality reduction method mentioned above can solve the problem of excessively high data dimensions to a certain extent, it requires people to constantly test different feature sizes. In our previous work, we developed the MRMD1.0 and MRMD2.0 softwares for feature ranking and dimension reduction.

Here, we propose a new version called MRMD3.0. It is a method that integrates multiple feature ranking algorithms,Which has the following advantages:

  • • built-in multiple feature ranking algorithms
  • • four feature link Analysis Algorithms.
  • • automatically infer proper dimensions size
  • • generate five charts for data analysis
  • • VProvides built-in feature selection method's interface for users
  • • MRMD3.0 can reduce the dimension of user-specified feature intervals
  • ... ...

Feature Rank Method

note:WebServer currently only has built-in integration strategy, the following single method can use our offline command version
  • Variance Threshold
  • Chisqure
  • Linear model emthod ( 1. lasso, 2. ridge, 3.elasticnet)
  • mutual inforation( 1.MI 2.NMI 3.MIC)
  • minimum Redundancy - Maximum Relevance
  • Max-Relevance-Max-Distance
  • Recursive Feature Elimination(1. LogisticRegression, 2.SVM, 3.DecisionTreeClassifier)
  • tree_feature_importance(1. DecisionTreeClassifier, 2. RandomForestClassifier, 3. GradientBoostingClassifier 4.ExtraTreesClassifier)

Link Analysis Strategy

  • PageRank
  • TrustRank
  • LeaderRank
  • HITS (Authority and Hub)

The supported file formats are :

  • csv
  • arff
  • libsvm
Usage Example
python3 [-h] [-s S] -i I [-e E] [-l L] [-n N] [-t T] [-c {RandomForest,SVM,Bayes}] [-o O] [-p P] [-m M] [-j J] [-f F] [-r {PageRank,Hits_a,Hits_h,LeaderRank,TrustRank}]

python3 -i test.csv -o out.csv

# parameters description
-s, --start start feature index, default=1
-i, --inputfile input file (require:arff ,csv or libsvm format) -e, --end
-e, --end/td> end index, default=-1
-l, --length step length, default=1
-n, --n_dim mrmd3.0 features top n,default=-1
-t, --type_metric evaluation metric, default=f1
-c,--classifier cross vaildaion classifier r {RandomForest,SVM,Bayes}, default=RandomForest
-m, metrics_file output the metrics file's name
-o, --outfile output the dimensionality reduction file's name
-f,--topn select top n features to chart
--rank_method the rank method for features,choices=["PageRank","Hits_a","Hits_h","LeaderRank","TrustRank"],default="PageRank"