This paper systematically describes the definition, model structure, parameter estimation and corpus selection of the conditional random field (CRF) model, and applies the model to Chinese word segmentation. A large number of experiments are carried out using conditional random fields on a corpus drawn from several years of the Changjiang Daily. The experiments analyze how the choice of CRF model parameters and the selection of the Chinese character annotation set influence the results. Furthermore, taking advantage of the CRF model's ability to incorporate arbitrary features, some new features are added to the model. In particular, the paper explores a word position probability feature. Experiments on the corpus show that introducing the word position probability feature improves precision, recall and the F1 value.
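The character annotation set, the word position probability feature, and the evaluation measures mentioned above can be illustrated with a small sketch. The abstract does not give the paper's exact tag set or feature definition, so everything here is an assumption: a four-tag B/M/E/S scheme (begin, middle, end, single), a position probability estimated as the relative frequency of each tag per character, and span-based precision, recall and F1. All function names are hypothetical.

```python
from collections import Counter, defaultdict


def words_to_tags(words):
    """Map a segmented sentence to per-character tags (assumed B/M/E/S scheme)."""
    tags = []
    for w in words:
        if len(w) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(w) - 2) + ["E"])
    return tags


def tags_to_words(chars, tags):
    """Rebuild words from characters and tags; a word closes on E or S."""
    words, buf = [], ""
    for ch, t in zip(chars, tags):
        buf += ch
        if t in ("E", "S"):
            words.append(buf)
            buf = ""
    if buf:  # unterminated trailing word
        words.append(buf)
    return words


def position_probabilities(corpus):
    """Estimate P(tag | character) by relative frequency over segmented sentences."""
    counts = defaultdict(Counter)
    for words in corpus:
        chars = "".join(words)
        for ch, t in zip(chars, words_to_tags(words)):
            counts[ch][t] += 1
    return {
        ch: {t: n / sum(c.values()) for t, n in c.items()}
        for ch, c in counts.items()
    }


def word_spans(words):
    """Return the set of (start, end) character offsets of each word."""
    spans, start = set(), 0
    for w in words:
        spans.add((start, start + len(w)))
        start += len(w)
    return spans


def precision_recall_f1(gold, pred):
    """Score a predicted segmentation against a gold one by matching word spans."""
    g, p = word_spans(gold), word_spans(pred)
    correct = len(g & p)
    precision = correct / len(p) if p else 0.0
    recall = correct / len(g) if g else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

In a CRF tagger, probabilities such as those returned by `position_probabilities` would typically be discretized and supplied as additional observation features for each character; how the paper does this is not stated in the abstract.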
Published in | Machine Learning Research (Volume 2, Issue 3) |
DOI | 10.11648/j.mlr.20170203.14 |
Page(s) | 105-109 |
Creative Commons | This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited. |
Copyright | Copyright © The Author(s), 2017. Published by Science Publishing Group |
Natural Language Processing, Chinese Word Segmentation, Hidden Markov Model, Maximum Entropy Model, Conditional Random Field, Automatic Proofreading
APA Style
Junxia Deng, Hong Zhang, Shanzai Li. (2017). Chinese Word Segmentation Based on Conditional Random Field. Machine Learning Research, 2(3), 105-109. https://doi.org/10.11648/j.mlr.20170203.14
ACS Style
Junxia Deng; Hong Zhang; Shanzai Li. Chinese Word Segmentation Based on Conditional Random Field. Mach. Learn. Res. 2017, 2(3), 105-109. doi: 10.11648/j.mlr.20170203.14
@article{10.11648/j.mlr.20170203.14,
  author = {Junxia Deng and Hong Zhang and Shanzai Li},
  title = {Chinese Word Segmentation Based on Conditional Random Field},
  journal = {Machine Learning Research},
  volume = {2},
  number = {3},
  pages = {105-109},
  doi = {10.11648/j.mlr.20170203.14},
  url = {https://doi.org/10.11648/j.mlr.20170203.14},
  eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.mlr.20170203.14},
  year = {2017}
}
TY - JOUR
T1 - Chinese Word Segmentation Based on Conditional Random Field
AU - Junxia Deng
AU - Hong Zhang
AU - Shanzai Li
Y1 - 2017/04/17
PY - 2017
N1 - https://doi.org/10.11648/j.mlr.20170203.14
DO - 10.11648/j.mlr.20170203.14
T2 - Machine Learning Research
JF - Machine Learning Research
JO - Machine Learning Research
SP - 105
EP - 109
PB - Science Publishing Group
SN - 2637-5680
UR - https://doi.org/10.11648/j.mlr.20170203.14
VL - 2
IS - 3
ER -