Peer-Reviewed

Chinese Word Segmentation Based on Conditional Random Field

Received: 6 February 2017     Accepted: 27 February 2017     Published: 17 April 2017
Abstract

This paper systematically describes the definition, model structure, parameter estimation and corpus selection of the conditional random field (CRF) model, and applies the model to Chinese word segmentation. A large number of experiments were carried out using conditional random fields on a corpus drawn from several years of the Changjiang Daily. The experiments analyze how the choice of CRF model parameters and the choice of Chinese character annotation set influence the results. Furthermore, taking advantage of the CRF model's ability to incorporate arbitrary features, some new features are added to the model. In particular, the paper explores a word position probability feature. Experiments on the corpus show that introducing the word position probability feature improves accuracy, recall and the F1 value.
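The character annotation idea mentioned in the abstract can be illustrated with a small sketch. The abstract does not say which annotation sets were compared, so the four-tag B/M/E/S scheme below is an assumption, and the `position_probabilities` function is only one plausible reading of a "word position probability" feature (the relative frequency with which a character occupies each position in a word). All function names and the toy corpus are hypothetical and not taken from the paper.

```python
from collections import Counter, defaultdict


def words_to_tags(words):
    """Map a segmented sentence to (character, tag) pairs.

    Assumed 4-tag set: B = word begin, M = word middle,
    E = word end, S = single-character word.
    """
    pairs = []
    for word in words:
        if len(word) == 1:
            pairs.append((word, "S"))
        else:
            pairs.append((word[0], "B"))
            pairs.extend((ch, "M") for ch in word[1:-1])
            pairs.append((word[-1], "E"))
    return pairs


def tags_to_words(chars, tags):
    """Rebuild words from characters and their position tags."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):  # a word closes on E or S
            words.append(current)
            current = ""
    if current:  # tolerate a sequence that ends mid-word
        words.append(current)
    return words


def position_probabilities(corpus):
    """Estimate P(tag | character) by relative frequency.

    This is an illustrative guess at a word position probability
    feature, not the definition used in the paper.
    """
    counts = defaultdict(Counter)
    for words in corpus:
        for ch, tag in words_to_tags(words):
            counts[ch][tag] += 1
    return {
        ch: {tag: n / sum(c.values()) for tag, n in c.items()}
        for ch, c in counts.items()
    }


# Hypothetical toy corpus of already segmented sentences.
toy_corpus = [["我", "爱", "北京"], ["北京", "大学"], ["大", "北京"]]
probs = position_probabilities(toy_corpus)
```

In a CRF-based segmenter, tag sequences like these would serve as the training labels, and a value such as `probs["大"]` could be discretized and supplied as an extra feature alongside the usual character context features.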

Published in Machine Learning Research (Volume 2, Issue 3)
DOI 10.11648/j.mlr.20170203.14
Page(s) 105-109
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2017. Published by Science Publishing Group

Keywords

Natural Language Processing, Chinese Word Segmentation, Hidden Markov Model, Maximum Entropy Model, Conditional Random Field, Automatic Proofreading

Cite This Article
  • APA Style

    Junxia Deng, Hong Zhang, Shanzai Li. (2017). Chinese Word Segmentation Based on Conditional Random Field. Machine Learning Research, 2(3), 105-109. https://doi.org/10.11648/j.mlr.20170203.14


    ACS Style

    Junxia Deng; Hong Zhang; Shanzai Li. Chinese Word Segmentation Based on Conditional Random Field. Mach. Learn. Res. 2017, 2(3), 105-109. doi: 10.11648/j.mlr.20170203.14


    AMA Style

    Junxia Deng, Hong Zhang, Shanzai Li. Chinese Word Segmentation Based on Conditional Random Field. Mach Learn Res. 2017;2(3):105-109. doi: 10.11648/j.mlr.20170203.14


  • @article{10.11648/j.mlr.20170203.14,
      author = {Junxia Deng and Hong Zhang and Shanzai Li},
      title = {Chinese Word Segmentation Based on Conditional Random Field},
      journal = {Machine Learning Research},
      volume = {2},
      number = {3},
      pages = {105-109},
      doi = {10.11648/j.mlr.20170203.14},
      url = {https://doi.org/10.11648/j.mlr.20170203.14},
      eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.mlr.20170203.14},
      abstract = {This paper systematically describes the definition, model structure, parameter estimation and corpus selection of the conditional random field (CRF) model, and applies the model to Chinese word segmentation. A large number of experiments were carried out using conditional random fields on a corpus drawn from several years of the Changjiang Daily. The experiments analyze how the choice of CRF model parameters and the choice of Chinese character annotation set influence the results. Furthermore, taking advantage of the CRF model's ability to incorporate arbitrary features, some new features are added to the model. In particular, the paper explores a word position probability feature. Experiments on the corpus show that introducing the word position probability feature improves accuracy, recall and the F1 value.},
      year = {2017}
    }
    


  • TY  - JOUR
    T1  - Chinese Word Segmentation Based on Conditional Random Field
    AU  - Junxia Deng
    AU  - Hong Zhang
    AU  - Shanzai Li
    Y1  - 2017/04/17
    PY  - 2017
    N1  - https://doi.org/10.11648/j.mlr.20170203.14
    DO  - 10.11648/j.mlr.20170203.14
    T2  - Machine Learning Research
    JF  - Machine Learning Research
    JO  - Machine Learning Research
    SP  - 105
    EP  - 109
    PB  - Science Publishing Group
    SN  - 2637-5680
    UR  - https://doi.org/10.11648/j.mlr.20170203.14
    AB  - This paper systematically describes the definition, model structure, parameter estimation and corpus selection of the conditional random field (CRF) model, and applies the model to Chinese word segmentation. A large number of experiments were carried out using conditional random fields on a corpus drawn from several years of the Changjiang Daily. The experiments analyze how the choice of CRF model parameters and the choice of Chinese character annotation set influence the results. Furthermore, taking advantage of the CRF model's ability to incorporate arbitrary features, some new features are added to the model. In particular, the paper explores a word position probability feature. Experiments on the corpus show that introducing the word position probability feature improves accuracy, recall and the F1 value.
    VL  - 2
    IS  - 3
    ER  - 


Author Information
  • International Economics and Trade, Gengdan Institute of Beijing University of Technology, Beijing, China

  • School of Information, Beijing Wuzi University, Beijing, China

  • School of Information, Beijing Wuzi University, Beijing, China
