Peer-Reviewed

Unsupervised Dimensionality Reduction for High-Dimensional Data Classification

Received: 20 July 2017     Accepted: 9 August 2017     Published: 31 August 2017
Abstract

This paper investigates how dimensionality reduction affects the performance of machine learning classifiers. It first constructs an analysis framework for classification on dimension-reduced data, combining two unsupervised dimensionality reduction methods, locally linear embedding (LLE) and principal component analysis (PCA), with five machine learning classifiers: Gradient Boosting Decision Tree (GBDT), Random Forest, Support Vector Machine (SVM), K-Nearest Neighbors (KNN), and Logistic Regression. It then uses a handwritten digit recognition dataset to compare the performance of the five classifiers on datasets reduced to different dimensions by each method. The analysis shows that reducing dimensionality with an appropriate method can effectively improve classification accuracy; that the non-linear method (LLE) generally yields better downstream classification than the linear method (PCA); and that the classification algorithms differ significantly in their sensitivity to the number of retained dimensions.
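
The experimental design described in the abstract, reduce the data with an unsupervised method and then score each classifier at a range of target dimensions, can be sketched as below. This is a minimal illustration assuming scikit-learn and its bundled 8x8 digits dataset as a stand-in for the paper's handwritten-digit data; the target dimensions, neighbor count, and classifier settings are placeholders, not the paper's actual configuration.

# A minimal sketch of the reduce-then-classify pipeline described above,
# assuming scikit-learn and its bundled digits dataset. All hyperparameter
# values below are illustrative placeholders, not the paper's settings.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import LocallyLinearEmbedding
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_digits(return_X_y=True)  # 1797 samples, 64 features, 10 classes

# The two unsupervised reducers: linear (PCA) vs. non-linear (LLE).
reducers = {
    "PCA": lambda d: PCA(n_components=d),
    "LLE": lambda d: LocallyLinearEmbedding(n_components=d, n_neighbors=30),
}

# The five classifiers compared in the paper, with near-default settings.
classifiers = {
    "GBDT": GradientBoostingClassifier(),
    "RandomForest": RandomForestClassifier(n_estimators=100),
    "SVM": SVC(),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "LogisticRegression": LogisticRegression(max_iter=1000),
}

# Reduce to each target dimension, then score every classifier with 5-fold
# cross-validation. (Fitting the unsupervised reducer on all of X is a
# simplification; a stricter protocol would refit it inside each fold.)
for red_name, make_reducer in reducers.items():
    for d in (2, 5, 10, 20):
        X_red = make_reducer(d).fit_transform(X)
        for clf_name, clf in classifiers.items():
            acc = cross_val_score(clf, X_red, y, cv=5).mean()
            print(f"{red_name} d={d:2d} {clf_name:18s} accuracy={acc:.3f}")

Comparing accuracy as a function of d for each reducer-classifier pair is what exposes the dimension sensitivity the paper reports.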

Published in Machine Learning Research (Volume 2, Issue 4)
DOI 10.11648/j.mlr.20170204.13
Page(s) 125-132
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2017. Published by Science Publishing Group

Keywords

Dimensionality Reduction, Machine Learning, Classification Problem, Handwritten Numeral Recognition

Cite This Article
  • APA Style

    Hany Yan, Hu Tianyu. (2017). Unsupervised Dimensionality Reduction for High-Dimensional Data Classification. Machine Learning Research, 2(4), 125-132. https://doi.org/10.11648/j.mlr.20170204.13


  • ACS Style

    Hany Yan; Hu Tianyu. Unsupervised Dimensionality Reduction for High-Dimensional Data Classification. Mach. Learn. Res. 2017, 2(4), 125-132. doi: 10.11648/j.mlr.20170204.13


  • AMA Style

    Hany Yan, Hu Tianyu. Unsupervised Dimensionality Reduction for High-Dimensional Data Classification. Mach Learn Res. 2017;2(4):125-132. doi: 10.11648/j.mlr.20170204.13


  • BibTeX

    @article{10.11648/j.mlr.20170204.13,
      author = {Hany Yan and Hu Tianyu},
      title = {Unsupervised Dimensionality Reduction for High-Dimensional Data Classification},
      journal = {Machine Learning Research},
      volume = {2},
      number = {4},
      pages = {125-132},
      doi = {10.11648/j.mlr.20170204.13},
      url = {https://doi.org/10.11648/j.mlr.20170204.13},
      eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.mlr.20170204.13},
      abstract = {This paper investigates how dimensionality reduction affects the performance of machine learning classifiers. It first constructs an analysis framework for classification on dimension-reduced data, combining two unsupervised dimensionality reduction methods, locally linear embedding (LLE) and principal component analysis (PCA), with five machine learning classifiers: Gradient Boosting Decision Tree (GBDT), Random Forest, Support Vector Machine (SVM), K-Nearest Neighbors (KNN), and Logistic Regression. It then uses a handwritten digit recognition dataset to compare the performance of the five classifiers on datasets reduced to different dimensions by each method. The analysis shows that reducing dimensionality with an appropriate method can effectively improve classification accuracy; that the non-linear method (LLE) generally yields better downstream classification than the linear method (PCA); and that the classification algorithms differ significantly in their sensitivity to the number of retained dimensions.},
      year = {2017}
    }
    


  • RIS

    TY  - JOUR
    T1  - Unsupervised Dimensionality Reduction for High-Dimensional Data Classification
    AU  - Hany Yan
    AU  - Hu Tianyu
    Y1  - 2017/08/31
    PY  - 2017
    N1  - https://doi.org/10.11648/j.mlr.20170204.13
    DO  - 10.11648/j.mlr.20170204.13
    T2  - Machine Learning Research
    JF  - Machine Learning Research
    JO  - Machine Learning Research
    SP  - 125
    EP  - 132
    PB  - Science Publishing Group
    SN  - 2637-5680
    UR  - https://doi.org/10.11648/j.mlr.20170204.13
    AB  - This paper investigates how dimensionality reduction affects the performance of machine learning classifiers. It first constructs an analysis framework for classification on dimension-reduced data, combining two unsupervised dimensionality reduction methods, locally linear embedding (LLE) and principal component analysis (PCA), with five machine learning classifiers: Gradient Boosting Decision Tree (GBDT), Random Forest, Support Vector Machine (SVM), K-Nearest Neighbors (KNN), and Logistic Regression. It then uses a handwritten digit recognition dataset to compare the performance of the five classifiers on datasets reduced to different dimensions by each method. The analysis shows that reducing dimensionality with an appropriate method can effectively improve classification accuracy; that the non-linear method (LLE) generally yields better downstream classification than the linear method (PCA); and that the classification algorithms differ significantly in their sensitivity to the number of retained dimensions.
    VL  - 2
    IS  - 4
    ER  - 


Author Information
  • Hany Yan, School of Mathematics, Jilin University, Changchun, China

  • Hu Tianyu, School of Mathematics, Jilin University, Changchun, China
