This paper investigates how dimensionality reduction affects the performance of machine learning classification. First, it constructs an analysis framework for dimension-reduction-based classification, combining two unsupervised dimensionality reduction methods, locally linear embedding (LLE) and principal component analysis (PCA), with five machine learning classifiers: Gradient Boosting Decision Tree (GBDT), Random Forest, Support Vector Machine (SVM), K-Nearest Neighbor (KNN), and Logistic Regression. It then uses a handwritten digit recognition dataset to analyze the classification performance of these five classifiers on datasets reduced to different dimensions by each reduction method. The analysis shows that applying an appropriate dimensionality reduction method before classification can effectively improve classification accuracy; that the nonlinear dimensionality reduction method generally yields better classification results than the linear one; and that the classification algorithms differ significantly in their sensitivity to the number of dimensions.
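To make the experimental pipeline concrete, here is a minimal sketch of this kind of study in Python with scikit-learn. The digits dataset loader, the grid of target dimensions, and all hyperparameters are illustrative assumptions, not the authors' exact configuration; in particular, each reduction is fitted on the full dataset before cross-validation, a common simplification.

```python
# Minimal sketch of a dimensionality-reduction-then-classify experiment.
# Assumptions (not the authors' exact setup): scikit-learn's digits data,
# the target-dimension grid, and near-default hyperparameters.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import LocallyLinearEmbedding
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_digits(return_X_y=True)  # 1797 samples, 64 pixel features

# Linear (PCA) vs. nonlinear (LLE) unsupervised reduction, built per target dimension d.
reducers = {
    "PCA": lambda d: PCA(n_components=d),
    "LLE": lambda d: LocallyLinearEmbedding(n_components=d, n_neighbors=10),
}
classifiers = {
    "GBDT": GradientBoostingClassifier(),
    "RandomForest": RandomForestClassifier(),
    "SVM": SVC(),
    "KNN": KNeighborsClassifier(),
    "LogReg": LogisticRegression(max_iter=1000),
}

for red_name, make_reducer in reducers.items():
    for d in (5, 10, 20, 40):  # illustrative target dimensions
        X_red = make_reducer(d).fit_transform(X)  # reduce once, then score each classifier
        for clf_name, clf in classifiers.items():
            acc = cross_val_score(clf, X_red, y, cv=5).mean()
            print(f"{red_name:3s} d={d:2d} {clf_name}: {acc:.3f}")
```

Sweeping the target dimension d and comparing the resulting accuracy curves per classifier is what lets one read off each algorithm's sensitivity to the number of dimensions.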
Published in: Machine Learning Research, Volume 2, Issue 4
DOI: 10.11648/j.mlr.20170204.13
Pages: 125-132
License: This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.
Copyright: Copyright © The Author(s), 2017. Published by Science Publishing Group
Keywords: Dimensionality Reduction, Machine Learning, Classification Problem, Handwritten Numeral Recognition
APA Style
Hany Yan, Hu Tianyu. (2017). Unsupervised Dimensionality Reduction for High-Dimensional Data Classification. Machine Learning Research, 2(4), 125-132. https://doi.org/10.11648/j.mlr.20170204.13