This paper investigates how dimensionality reduction affects the performance of machine learning classification. First, it constructs an analysis framework for dimension-reduction-based classification, combining two unsupervised dimensionality reduction methods, locally linear embedding (LLE) and principal component analysis (PCA), with five machine learning classifiers: Gradient Boosting Decision Tree (GBDT), Random Forest, Support Vector Machine (SVM), K-Nearest Neighbor (KNN), and Logistic Regression. It then uses a handwritten digit recognition dataset to analyze the classification performance of these five classifiers on datasets reduced to different dimensions by each reduction method. The analysis shows that applying an appropriate dimensionality reduction method before classification can effectively improve classification accuracy; that the nonlinear dimensionality reduction method generally yields better classification results than the linear one; and that the classification algorithms differ significantly in their sensitivity to the number of dimensions.
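To make the experimental pipeline concrete, here is a minimal sketch of this kind of study in Python with scikit-learn. The digits dataset loader, the grid of target dimensions, and all hyperparameters are illustrative assumptions, not the authors' exact configuration; in particular, each reduction is fitted on the full dataset before cross-validation, a common simplification.

```python
# Minimal sketch of a dimensionality-reduction-then-classify experiment.
# Assumptions (not the authors' exact setup): scikit-learn's digits data,
# the target-dimension grid, and near-default hyperparameters.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import LocallyLinearEmbedding
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_digits(return_X_y=True)  # 1797 samples, 64 pixel features

# Linear (PCA) vs. nonlinear (LLE) unsupervised reduction, built per target dimension d.
reducers = {
    "PCA": lambda d: PCA(n_components=d),
    "LLE": lambda d: LocallyLinearEmbedding(n_components=d, n_neighbors=10),
}
classifiers = {
    "GBDT": GradientBoostingClassifier(),
    "RandomForest": RandomForestClassifier(),
    "SVM": SVC(),
    "KNN": KNeighborsClassifier(),
    "LogReg": LogisticRegression(max_iter=1000),
}

for red_name, make_reducer in reducers.items():
    for d in (5, 10, 20, 40):  # illustrative target dimensions
        X_red = make_reducer(d).fit_transform(X)  # reduce once, then score each classifier
        for clf_name, clf in classifiers.items():
            acc = cross_val_score(clf, X_red, y, cv=5).mean()
            print(f"{red_name:3s} d={d:2d} {clf_name}: {acc:.3f}")
```

Sweeping the target dimension d and comparing the resulting accuracy curves per classifier is what lets one read off each algorithm's sensitivity to the number of dimensions.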
Published in: Machine Learning Research, Volume 2, Issue 4
DOI: 10.11648/j.mlr.20170204.13
Pages: 125-132
License: This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.
Copyright: Copyright © The Author(s), 2017. Published by Science Publishing Group
Keywords: Dimensionality Reduction, Machine Learning, Classification Problem, Handwritten Numeral Recognition
APA Style
Hany Yan, Hu Tianyu. (2017). Unsupervised Dimensionality Reduction for High-Dimensional Data Classification. Machine Learning Research, 2(4), 125-132. https://doi.org/10.11648/j.mlr.20170204.13