Rapid advances in information and web technology have driven a sharp increase in the collection and storage of digital data. As online environmental monitoring and Internet technology develop, more and more environmental data are stored on the Internet and shared on social networks. There is therefore growing interest in mining environmental big data online to automatically identify the environmental factors that contribute to public environmental risks, such as water quality, air pollution, and soil problems. A better understanding of these factors, together with analysis of the associated data, will enable more precise prediction of where and when high-risk events occur, supporting environmental management. The environmental data studied here were collected from Twitter with a web crawler (WebCrawler). Earlier research on environmental data analysis focused mainly on field-specific analysis of traditional data, without considering the relationships and structure that data have on social networks. Although traditional environmental data analysis methods have been studied extensively, no algorithms had been designed for analyzing environmental data on social networks. In this paper, we propose ED-LDA, a novel probabilistic generative model based on Latent Dirichlet Allocation (LDA). ED-LDA builds on traditional environmental data analysis methods and also incorporates the relationships and structure of environmental data, so that it can extract useful information and mine the relationship between users and the environmental data they post on social networks, helping to clarify what these data mean for environmental management. We present a Gibbs sampling implementation for inference in the model and use it to discover environmental topics in tweets. The model can also be applied to many other environmental contexts.
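The abstract does not specify ED-LDA's own generative process (how users and their posts enter the model), so the sketch below shows only the standard LDA collapsed Gibbs sampler that ED-LDA is described as building on and that serves as its baseline. It is a minimal sketch, assuming tweets have already been tokenized into integer word ids; the function name `lda_gibbs` and the defaults for `alpha`, `beta`, and `n_iter` are hypothetical choices for illustration, not the paper's settings. Each token's topic is resampled from the collapsed conditional

$$p(z_i = k \mid \mathbf{z}_{-i}, \mathbf{w}) \propto \left(n_{d,k}^{-i} + \alpha\right)\frac{n_{k,w_i}^{-i} + \beta}{n_k^{-i} + V\beta},$$

where the counts exclude token $i$ and $V$ is the vocabulary size.

```python
import numpy as np


def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for standard LDA (NOT ED-LDA; hypothetical helper).

    docs: list of documents, each a list of integer word ids in [0, vocab_size).
    Returns (theta, phi): document-topic and topic-word distributions.
    """
    rng = np.random.default_rng(seed)
    n_dk = np.zeros((len(docs), n_topics))   # topic counts per document
    n_kw = np.zeros((n_topics, vocab_size))  # word counts per topic
    n_k = np.zeros(n_topics)                 # total tokens per topic

    # Random initial topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        z_d = rng.integers(n_topics, size=len(doc))
        for w, k in zip(doc, z_d):
            n_dk[d, k] += 1
            n_kw[k, w] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the current token from the counts (the "-i" statistics).
                n_dk[d, k] -= 1
                n_kw[k, w] -= 1
                n_k[k] -= 1
                # Full conditional p(z_i = k | z_-i, w), up to a normalizing constant.
                p = (n_dk[d] + alpha) * (n_kw[:, w] + beta) / (n_k + vocab_size * beta)
                k = rng.choice(n_topics, p=p / p.sum())
                # Add the token back under its newly sampled topic.
                z[d][i] = k
                n_dk[d, k] += 1
                n_kw[k, w] += 1
                n_k[k] += 1

    # Smoothed point estimates from the final sample.
    theta = (n_dk + alpha) / (n_dk.sum(axis=1, keepdims=True) + n_topics * alpha)
    phi = (n_kw + beta) / (n_k[:, None] + vocab_size * beta)
    return theta, phi


# Toy corpus: four "tweets" over a 4-word vocabulary (word ids 0..3).
docs = [[0, 0, 1, 1], [2, 2, 3, 3], [0, 1, 0, 1], [2, 3, 2, 3]]
theta, phi = lda_gibbs(docs, n_topics=2, vocab_size=4)
print(theta.round(2))
```

The sampler integrates out the topic and word distributions and tracks only count tables, which is why each update is a cheap vector operation over topics; an extension such as ED-LDA would add further count tables for whatever extra structure (for example, users) it models.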
Experimental results show that, compared with the traditional LDA clustering algorithm, ED-LDA can effectively mine and classify environmental data, suggesting that it can serve as a powerful computational approach for clustering environmental data on the Internet.
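Using a topic model to "cluster" or "classify" tweets requires turning each document's topic mixture into a label, and the abstract does not say how this is done. A common convention, sketched here as an assumption rather than as the paper's evaluation procedure, is to assign each document to the topic with the largest smoothed weight $\theta_{d,k}$. The function name `cluster_by_topic` and the smoothing value `alpha` are hypothetical; the input is any per-document topic-count matrix, for example the counts from the final sample of a collapsed Gibbs sampler.

```python
import numpy as np


def cluster_by_topic(doc_topic_counts, alpha=0.1):
    """Hard-assign each document to its most probable topic (hypothetical helper).

    doc_topic_counts: (n_docs, n_topics) array of per-document topic counts.
    Returns (labels, theta).
    """
    counts = np.asarray(doc_topic_counts, dtype=float)
    n_topics = counts.shape[1]
    # Smoothed document-topic distribution theta_d; each row sums to 1.
    theta = (counts + alpha) / (counts.sum(axis=1, keepdims=True) + n_topics * alpha)
    # The cluster label of each document is the argmax of its theta row.
    labels = theta.argmax(axis=1)
    return labels, theta


counts = [[5, 0, 1], [0, 4, 0], [1, 1, 6]]
labels, theta = cluster_by_topic(counts)
print(labels)  # [0 1 2]
```

Because the same `alpha` is added to every topic, smoothing never changes which topic is largest; it only keeps $\theta$ strictly positive, which matters if the soft weights are later used for evaluation.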
Published in: International Journal of Environmental Monitoring and Analysis (Volume 6, Issue 3)
DOI: 10.11648/j.ijema.20180603.12
Page(s): 77-83
Creative Commons: This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.
Copyright: Copyright © The Author(s), 2018. Published by Science Publishing Group
Keywords: ED-LDA, Probabilistic, Environmental Data, Social Network, Data Mining
APA Style
Lei Feng, Jose López, Li Feng, Sheng Zhang, Bormin Huang, et al. (2018). Topic Modeling of Environmental Data on Social Networks Based on ED-LDA. International Journal of Environmental Monitoring and Analysis, 6(3), 77-83. https://doi.org/10.11648/j.ijema.20180603.12
@article{10.11648/j.ijema.20180603.12,
  author = {Lei Feng and Jose López and Li Feng and Sheng Zhang and Bormin Huang and Fang Fang and Chongming Li},
  title = {Topic Modeling of Environmental Data on Social Networks Based on ED-LDA},
  journal = {International Journal of Environmental Monitoring and Analysis},
  volume = {6},
  number = {3},
  pages = {77-83},
  doi = {10.11648/j.ijema.20180603.12},
  url = {https://doi.org/10.11648/j.ijema.20180603.12},
  eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ijema.20180603.12},
  year = {2018}
}
TY - JOUR
T1 - Topic Modeling of Environmental Data on Social Networks Based on ED-LDA
AU - Lei Feng
AU - Jose López
AU - Li Feng
AU - Sheng Zhang
AU - Bormin Huang
AU - Fang Fang
AU - Chongming Li
Y1 - 2018/07/23
PY - 2018
N1 - https://doi.org/10.11648/j.ijema.20180603.12
DO - 10.11648/j.ijema.20180603.12
T2 - International Journal of Environmental Monitoring and Analysis
JF - International Journal of Environmental Monitoring and Analysis
JO - International Journal of Environmental Monitoring and Analysis
SP - 77
EP - 83
PB - Science Publishing Group
SN - 2328-7667
UR - https://doi.org/10.11648/j.ijema.20180603.12
VL - 6
IS - 3
ER -