MRMD3.0 | Spark Version | Chinese

MRMD3.0 uses multiple feature selection methods and incorporates the PageRank, LeaderRank, HITS and TrustRank algorithms.
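This README does not say how the ranking algorithms are applied to features. The sketch below assumes one plausible design: each feature selection method produces a best-first ranking, the rankings are merged into a directed graph in which every feature points to the feature ranked just above it, and a node-ranking algorithm scores that graph. The graph construction, every function name (`build_graph`, `pagerank`, `trustrank`, `leaderrank`, `hits`), the damping factor 0.85, and the mapping of `Hits_a`/`Hits_h` to HITS authority/hub scores are assumptions for illustration, not MRMD3.0's actual code.

```python
from collections import defaultdict


def build_graph(rankings):
    """Hypothetical construction: each ranking lists features best-first, and
    every feature gets a directed edge to the feature ranked just above it."""
    out_edges = defaultdict(set)
    nodes = set()
    for ranking in rankings:
        nodes.update(ranking)
        for better, worse in zip(ranking, ranking[1:]):
            out_edges[worse].add(better)
    return sorted(nodes), dict(out_edges)


def pagerank(nodes, out_edges, d=0.85, iters=100, teleport=None):
    """Power-iteration PageRank; teleport and dangling mass follow `teleport`."""
    n = len(nodes)
    if teleport is None:
        teleport = {v: 1.0 / n for v in nodes}
    score = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        dangling = sum(score[v] for v in nodes if not out_edges.get(v))
        new = {v: ((1 - d) + d * dangling) * teleport[v] for v in nodes}
        for u in nodes:
            targets = out_edges.get(u, ())
            for v in targets:
                new[v] += d * score[u] / len(targets)
        score = new
    return score


def trustrank(nodes, out_edges, seeds, d=0.85, iters=100):
    """TrustRank: PageRank whose teleport vector is uniform over trusted seeds."""
    teleport = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    return pagerank(nodes, out_edges, d, iters, teleport)


def leaderrank(nodes, out_edges, iters=200):
    """LeaderRank: add a ground node linked both ways to every node, run a
    random walk without teleportation, then share the ground node's score."""
    ground = object()  # sentinel key that cannot clash with a feature name
    edges = {v: set(out_edges.get(v, ())) | {ground} for v in nodes}
    edges[ground] = set(nodes)
    score = {v: 1.0 for v in nodes}
    score[ground] = 0.0
    for _ in range(iters):
        new = dict.fromkeys(edges, 0.0)
        for u, targets in edges.items():
            for v in targets:
                new[v] += score[u] / len(targets)
        score = new
    share = score[ground] / len(nodes)
    return {v: score[v] + share for v in nodes}


def hits(nodes, out_edges, iters=100):
    """HITS: returns (authority, hub) scores, each normalised to sum to 1."""
    hub = dict.fromkeys(nodes, 1.0)
    auth = dict.fromkeys(nodes, 0.0)
    for _ in range(iters):
        auth = dict.fromkeys(nodes, 0.0)
        for u in nodes:
            for v in out_edges.get(u, ()):
                auth[v] += hub[u]
        total = sum(auth.values()) or 1.0
        auth = {v: s / total for v, s in auth.items()}
        hub = {u: sum(auth[v] for v in out_edges.get(u, ())) for u in nodes}
        total = sum(hub.values()) or 1.0
        hub = {u: s / total for u, s in hub.items()}
    return auth, hub


# Three hypothetical method rankings over features f1..f3 (best first).
rankings = [["f1", "f2", "f3"], ["f1", "f3", "f2"], ["f1", "f2", "f3"]]
nodes, out_edges = build_graph(rankings)
pr = pagerank(nodes, out_edges)
auth, hub = hits(nodes, out_edges)  # assumed to match "Hits_a" / "Hits_h"
lr = leaderrank(nodes, out_edges)
tr = trustrank(nodes, out_edges, seeds={"f2"})
print(max(pr, key=pr.get))  # f1
```

A different graph construction (for example, linking every lower-ranked feature to every higher-ranked one) would change the scores, so the numbers here are only illustrative.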

1. Installation:

Code: GitHub

Environment (if Anaconda is installed, you do not need to run the command below):

pip3 install -r requirements.txt --ignore-installed

2. Parameters:

Parameter            Description
-s, --start          start index, default=1
-i, --inputfile      input file (must be in ARFF, CSV or LibSVM format)
-e, --end            end index, default=-1
-l, --length         step length, default=1
-n, --n_dim          top n features from mrmd2.0, default=-1
-t, --type_metric    evaluation metric, default=f1
-m, --metrics_file   name of the output metrics file
-o, --outfile        name of the output file after dimensionality reduction
-p, --picture        generate t-SNE scatter plots before and after dimensionality reduction, default=false
-r, --rank_method    feature ranking method, choices=["PageRank", "Hits_a", "Hits_h", "LeaderRank", "TrustRank"], default="PageRank"
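The option table maps naturally onto Python's `argparse`. The sketch below is a hypothetical re-creation built only from the table above, not the parser in mrmd3.0.py. Two choices are assumptions: `-i` is marked required, and `-p` is modelled as an on/off switch (the real script may instead expect a value such as `true`/`false`).

```python
import argparse

RANK_METHODS = ["PageRank", "Hits_a", "Hits_h", "LeaderRank", "TrustRank"]


def build_parser():
    """Hypothetical re-creation of the documented MRMD3.0 options."""
    p = argparse.ArgumentParser(description="MRMD3.0 feature selection (sketch)")
    p.add_argument("-s", "--start", type=int, default=1, help="start index")
    # Assumption: an input file is always needed, so -i is required here.
    p.add_argument("-i", "--inputfile", required=True,
                   help="input file in ARFF, CSV or LibSVM format")
    p.add_argument("-e", "--end", type=int, default=-1, help="end index")
    p.add_argument("-l", "--length", type=int, default=1, help="step length")
    p.add_argument("-n", "--n_dim", type=int, default=-1,
                   help="top n features from mrmd2.0")
    p.add_argument("-t", "--type_metric", default="f1", help="evaluation metric")
    p.add_argument("-m", "--metrics_file", help="output metrics file name")
    p.add_argument("-o", "--outfile", help="output file after dimensionality reduction")
    # Assumption: -p modelled as a switch; the real script may take a value.
    p.add_argument("-p", "--picture", action="store_true",
                   help="draw t-SNE scatter plots before and after reduction")
    p.add_argument("-r", "--rank_method", choices=RANK_METHODS, default="PageRank",
                   help="feature ranking method")
    return p


# Parse one of the documented command lines instead of sys.argv.
args = build_parser().parse_args(["-i", "test.csv", "-o", "out.csv", "-r", "LeaderRank"])
print(args.rank_method, args.start, args.end, args.length)
```

Using `choices` for `-r` means a misspelled ranking method is rejected at parse time rather than failing later in the pipeline.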

3. Usage examples:

python3 mrmd3.0.py -i test.csv -o out.csv -r PageRank
python3 mrmd3.0.py -i test.csv -o out.csv -r LeaderRank
python3 mrmd3.0.py -i test.csv -o out.csv -r TrustRank
python3 mrmd3.0.py -i test.csv -o out.csv -r Hits_a
python3 mrmd3.0.py -i test.csv -o out.csv -r Hits_h
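The commands above rely on the defaults for `-s`, `-e`, `-l` and `-t`. One reading of those options, assumed here and not confirmed by this README, is an incremental search: score the top-k features of the final ranking for k = start, start + step, ..., end (with end=-1 meaning all features) and keep the best-scoring subset. `incremental_search` and `toy_metric` are hypothetical names; in the real tool the metric would presumably be a classifier's cross-validated score, such as F1 (the `-t` default).

```python
def incremental_search(ranked_features, evaluate, start=1, end=-1, step=1):
    """Hypothetical reading of -s/-e/-l: score the top-k features of the
    ranking for k = start, start + step, ..., end and keep the best k."""
    if end == -1:  # assumed meaning of the default: go up to all features
        end = len(ranked_features)
    if not 1 <= start <= end:
        raise ValueError("need 1 <= start <= end")
    best_k, best_score = start, float("-inf")
    for k in range(start, end + 1, step):
        score = evaluate(ranked_features[:k])
        if score > best_score:  # strict '>' keeps the smaller subset on ties
            best_k, best_score = k, score
    return ranked_features[:best_k], best_score


# Toy stand-in for the real metric: reward "useful" features and lightly
# penalise subset size.
USEFUL = {"f1", "f3"}


def toy_metric(subset):
    return len(USEFUL & set(subset)) - 0.1 * len(subset)


best, score = incremental_search(["f1", "f3", "f2", "f4"], toy_metric)
print(best, round(score, 2))  # ['f1', 'f3'] 1.8
```

Because the step length controls how many subsets are evaluated, a larger `-l` trades search resolution for fewer (expensive) metric evaluations.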

4. Feature selection methods used in MRMD3.0:

Method                          Count   Implementation(s)
anova                           1       f_classif
chisquare                       1       chi2
F value                         1       f_regression
linear model                    3       Lasso, LogisticRegression, Ridge
mutual information              3       MI, NMI, MIC
mrmd                            3       Pearson + Euclidean/Tanimoto/Cosine
mrmr                            2       miq
recursive feature elimination   5       LinearSVC, LogisticRegression, RandomForestClassifier, GradientBoostingClassifier, ComplementNB
tree_feature_importance         3       DecisionTreeClassifier, RandomForestClassifier, GradientBoostingClassifier
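The mrmd row pairs Pearson correlation with one of three distance measures. The sketch below illustrates the Max-Relevance-Max-Distance idea under stated assumptions: relevance is |Pearson r| between a feature and the label, redundancy is measured as the mean distance from that feature to every other feature (larger distance means less redundant), and the two terms are simply added with equal weight and no normalisation. The actual MRMD weighting and preprocessing may differ; `mrmd_rank` and the distance helpers are hypothetical names, with cosine and Tanimoto distances taken as 1 minus the respective similarity.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient; 0.0 if either vector is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def cosine_distance(x, y):
    """1 - cosine similarity."""
    dot = sum(a * b for a, b in zip(x, y))
    norms = math.hypot(*x) * math.hypot(*y)
    return 1.0 - dot / norms if norms else 0.0


def tanimoto_distance(x, y):
    """1 - Tanimoto coefficient x.y / (|x|^2 + |y|^2 - x.y)."""
    dot = sum(a * b for a, b in zip(x, y))
    denom = sum(a * a for a in x) + sum(b * b for b in y) - dot
    return 1.0 - dot / denom if denom else 0.0


DISTANCES = {"euclidean": math.dist, "cosine": cosine_distance,
             "tanimoto": tanimoto_distance}


def mrmd_rank(columns, labels, distance="euclidean"):
    """Score = |Pearson r with the label| + mean distance to other features.
    Assumption: equal weights, no normalisation of either term."""
    dist = DISTANCES[distance]
    names = list(columns)
    scores = {}
    for f in names:
        relevance = abs(pearson(columns[f], labels))
        others = [g for g in names if g != f]
        spread = (sum(dist(columns[f], columns[g]) for g in others) / len(others)
                  if others else 0.0)
        scores[f] = relevance + spread
    return sorted(names, key=scores.get, reverse=True), scores


# Toy data: feature columns keyed by name, binary class labels.
columns = {"a": [0.1, 0.2, 0.8, 0.9], "b": [0.1, 0.2, 0.7, 0.9],
           "c": [0.5, 0.4, 0.6, 0.5]}
labels = [0, 0, 1, 1]
for metric in DISTANCES:
    ranking, _ = mrmd_rank(columns, labels, metric)
    print(metric, ranking)
```

Each distance choice yields one ranking, which would fit the "3" variants listed for mrmd in the table, though that correspondence is an inference.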

Contact: heshida@tju.edu.cn