Classical Chinese carries the splendid culture of China and is an important part of the high school Chinese curriculum. However, scoring rates on the corresponding examination questions are low. Reading and comprehending classical Chinese is among the hardest parts of high school Chinese, especially for students who are not strong at language learning. It is therefore meaningful to analyze the wording characteristics of classical Chinese. In this paper, natural language processing (NLP) technology is applied to analyze word usage in high school classical Chinese. First, the ancient poems and literature in high school teaching materials and the classical Chinese exam questions of the past 15 years are collected. Second, the raw data are preprocessed, the text is cut into words, and word frequencies are calculated. Finally, the top N words are generated and the experimental data are analyzed. The experimental results show that the proposed method can be used to analyze the words of high school classical Chinese, and the resulting top N words provide a useful reference for high school students learning classical Chinese.
This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.
Classical Chinese in High School, Wording Analysis, Word Frequency Calculation
1. Introduction
Classical Chinese has a long history and contains the essence of traditional culture. It is required learning content for high school students and one of the focuses of the college entrance examination, so reading and understanding classical Chinese are important parts of Chinese learning in high school. Classical Chinese and ancient poetry together usually account for around 50 points in the college entrance examination, a proportion of about 33.3%. Classical Chinese alone accounts for about 24-28 points, or approximately 16.7%. Unfortunately, the study of classical Chinese is not easy, for several reasons. First, classical Chinese is very different from everyday language. Second, the semantics of classical Chinese words are rich, and many words have multiple meanings, some of them far removed from the literal sense of the word. Therefore, high school students generally find it hard to learn classical Chinese.
In recent years, there has been some research on the frequency statistics and analysis of commonly used words in classical Chinese for middle school students. The paper [1] studied the teaching of monosyllabic real words in classical Chinese in the unified edition of the high school textbooks. The paper [2] surveyed commonly used real words in classical Chinese for junior high school students. The paper [3] researched the distribution and teaching of function words. Literature [4-7, 9, 10] presented further research on real words in classical Chinese. Literature [3, 11, 12] focused on function words in classical Chinese. Literature [13-15] studied the classical Chinese questions in the college entrance examination.
However, the wording characteristics of high school classical Chinese remain understudied. High school covers a wider range of topics than middle school, and the content is harder. Targeted research on the vocabulary of high school classical Chinese is therefore needed to help high school students improve their learning efficiency. In this paper, natural language processing technology is applied to analyze the content of high school classical Chinese, with a focus on its vocabulary. The rest of this article is organized as follows: Section 2 introduces the research method, Section 3 presents and analyzes the experimental results, and Section 4 provides the conclusion.
2. Materials and Methods
2.1. Overview Framework and Design
We collected the ancient poetry and literature in high school textbooks and the classical Chinese questions from the last 15 years of college entrance examinations, and then analyzed their wording. The overall framework is shown in Figure 1.
(1) Data collection
The data come from two sources, as shown in Table 1.
Table 1. Data sources list.
| Category | Content | Data size (word number) |
| --- | --- | --- |
| Classical Chinese exam content | 2008-2023 Classical Chinese content in the college entrance examination | 12257 |
| Ancient poetry and literature | 72 Essential ancient poems and literature in high school textbooks | 22342 |
(2) Data preprocessing
The collected data come in different formats and contain extraneous elements, such as images and advertisements (content unrelated to classical Chinese and poetry), so they need to be preprocessed. Preprocessing includes two steps: removing invalid content and annotating the raw data. The annotation fields are listed in Table 2.
Table 2. Annotation fields for Classical Chinese content.
| Annotation fields | Description |
| --- | --- |
| Year | The year of the college entrance examination questions |
| Region | The region of the exam paper, such as Beijing |
| Type | Ancient poems or classical Chinese |
| Dynasty | The dynasty of the text |
| Author | The author of the poem or article |
| Title | The title of the poem or article |
| Content | The content of the poem or article |
| Exam question stem | Exam questions and options |
| Answer | The answers to the exam questions |
The annotated data are stored in CSV format. An example is shown in Figure 2.
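As a rough illustration of how such an annotated file might be read back, the sketch below loads records with Python's standard csv module and keeps only the text content. The column names are hypothetical lowercase versions of the Table 2 fields, and the single sample record is an illustrative textbook entry with the exam-specific fields left empty; neither is taken from the actual data set.

```python
import csv
import io

# Hypothetical column names derived from the annotation fields in Table 2.
FIELDS = ["year", "region", "type", "dynasty", "author", "title",
          "content", "question_stem", "answer"]

# One illustrative record (assumed layout); year, region, stem and answer are empty.
HEADER = ",".join(FIELDS)
ROW = ',,classical Chinese,Pre-Qin,荀子,劝学,"君子曰：学不可以已。",,'
SAMPLE_CSV = HEADER + "\n" + ROW + "\n"


def load_contents(csv_file):
    """Return the 'content' field of every annotated record that has one."""
    reader = csv.DictReader(csv_file)
    return [row["content"] for row in reader if row["content"]]


# In practice csv_file would be an opened CSV file; a string buffer is used here.
contents = load_contents(io.StringIO(SAMPLE_CSV))
print(contents)
```

Reading by column name rather than by position keeps the extraction step independent of the column order chosen during annotation.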
After preprocessing, the classical Chinese and ancient poem content can be extracted. However, it still contains some extraneous characters and needs further processing. The steps are as follows:
(1) Remove punctuation and special characters. A regular expression is defined as follows:
punc = '[,。?《》:(),.?():""“”·;\n‘ 【 】、·!!‘’]'
The “sub” function of Python's “re” module is then called to remove the punctuation and special characters. The corresponding code is as follows:
new_txt = re.sub(punc, "", txt)
(2) Cut words. The “lcut” function of the “jieba” module is used to cut the cleaned text into a list of words.
init_words = jieba.lcut(new_txt, cut_all=False)
(3) Remove stop words. The words are filtered against a stop word list. Examples of stop words are “之、【1】、①、1…”.
(4) Calculate the total number of words and the frequency of each word. The key code is as follows:
words = []  # words kept after filtering
for word in init_words:
    # keep the word unless it is a stop word or a tab character
    if word not in stopwords and word != '\t':
        words.append(word)
number = len(words)  # total number of words
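The four steps above can be sketched end to end with the standard library alone. This is a minimal sketch under stated assumptions, not the authors' implementation: the punctuation pattern is an abbreviated stand-in for the one given in step (1), the function names clean and top_n_words are hypothetical, and the token list is hand-made sample data standing in for the output of jieba.lcut so that the example does not depend on a third-party package.

```python
import re
from collections import Counter

# Abbreviated punctuation class for illustration; the pattern in step (1) is longer.
PUNC = r"[,.?!;:()，。？！；：、《》（）【】“”‘’\s]"


def clean(text):
    """Strip punctuation and whitespace, mirroring step (1)."""
    return re.sub(PUNC, "", text)


def top_n_words(tokens, stopwords, n):
    """Return (word, count, share) for the n most frequent non-stop words."""
    # Steps (3) and (4): drop stop words and blank tokens, then count.
    words = [w for w in tokens if w not in stopwords and w.strip()]
    total = len(words)
    counts = Counter(words)
    return [(w, c, c / total) for w, c in counts.most_common(n)]


# Hand-made tokens standing in for segmenter output (hypothetical data).
tokens = ["学", "而", "时", "习", "之", "不", "亦", "说", "乎", "学", "而", "学"]
stopwords = {"之", "乎"}

result = top_n_words(tokens, stopwords, n=2)
print(result)
```

Here n plays the role of the configurable Top N parameter, and the share is each word's count divided by the total number of words left after stop word removal.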
3. Experimental Results
We designed a configurable parameter N that controls how many top words are reported. With N=30, the top 30 most frequent words are shown in Figure 3.
Figure 3. Top 30 Words, word frequency, and percentage.
We compared the top 10 words of the two data sets. Of the top 10 words in the college entrance examination questions, 80% also appear among the top 10 words in the high school textbooks (marked in green in Figure 3). That is, the top 10 high-frequency words in the 72 ancient texts that must be memorized in high school Chinese textbooks overlap heavily with those in the college entrance examination questions. This result shows that high school students need to attach importance to the ancient poetry and literature in their textbooks. In addition, high-frequency words, because of their frequent usage, often carry multiple meanings and take more time to understand and master. A recommended method is to study them in depth in conjunction with example sentences.
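The overlap figure reported above is a simple ratio: the number of exam top words that also occur in the textbook top words, divided by the size of the exam list. A minimal sketch of that calculation follows; the function name top_n_overlap is hypothetical, and the two lists hold placeholder labels rather than the actual words from Figure 3.

```python
def top_n_overlap(exam_top, textbook_top):
    """Share of the exam top words that also appear in the textbook top words."""
    if not exam_top:
        return 0.0
    textbook_set = set(textbook_top)
    shared = [w for w in exam_top if w in textbook_set]
    return len(shared) / len(exam_top)


# Placeholder labels only; these are not the words shown in Figure 3.
exam_top10 = ["w1", "w2", "w3", "w4", "w5", "w6", "w7", "w8", "x1", "x2"]
textbook_top10 = ["w1", "w2", "w3", "w4", "w5", "w6", "w7", "w8", "y1", "y2"]

overlap = top_n_overlap(exam_top10, textbook_top10)
print(f"{overlap:.0%}")
```

With eight shared labels out of ten, the ratio is 0.8, which matches the form of the 80% figure in the text.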
At the same time, we observed some limitations of the “jieba” module for word segmentation in classical Chinese. For example, whether the term '械器' is cut correctly remains open to debate. Because the accuracy of word cutting significantly affects the final word frequency statistics, future work will study how to improve the accuracy of word cutting in classical Chinese. Another direction is to generate example sentences for the top N words, which can help high school students better learn and master classical Chinese.
4. Conclusions
Classical Chinese and ancient poetry carry the long history and splendid culture of our country. They contain brilliant wisdom and humanistic ideas that have shone through the ages. As high school students, we should master classical Chinese semantics and expression norms. This can help us master classical Chinese and absorb the excellent ideas of the ancients.
This article focuses on the classical Chinese and poems studied in high school. First, it collects ancient poetry and classical Chinese content from high school textbooks and the past 15 years of college entrance examination questions. Then it cuts the text into words and calculates word frequency statistics. Finally, the statistical results are analyzed and the high-frequency words are provided. The experimental results show that this method can analyze the words in high school classical Chinese automatically, providing a reference for high school students to improve their learning efficiency in classical Chinese.
Jie Li: Data curation, Investigation, Project administration, Writing – review & editing
Funding
The work is supported by the “Beijing Youth Top-notch Talent Training Program”.
Conflicts of Interest
The authors declare no conflicts of interest.
References
[1] Shuyi Lu. Research on the Teaching of Monosyllabic Real Words in Classical Chinese in the Unified High School Compulsory Textbook [D]. Shanghai Normal University, 2023.
[2] Qian Wu. Research on Teaching of Commonly Used Real Words in Classical Chinese for Junior High School Students in the Department of Compilation and Translation [D]. Shaanxi University of Technology, 2022.
[3] Chen Qi. Research on the Distribution and Teaching of Function Words in Classical Chinese in the Unified High School Compulsory Textbook [D]. Shanghai Normal University, 2022.
[4] Xueyang Liu. Research on Classical Chinese Vocabulary Teaching in Five Year Vocational Chinese Language Based on Word Frequency Statistics: A Case Study of the 2011 Su Education Press Chinese Language Textbook [J]. Education Science Forum, 2020, (36): 53-57.
[5] Xiaoyu Ge. Research on Teaching Common Words in Middle School Classical Chinese Based on Frequency Statistics of Commonly Used Words [D]. Liaoning Normal University, 2018.
[6] Hong Zhang. Frequency Statistics and Common Word Analysis of Classical Chinese in Middle School [D]. Huazhong Normal University, 2018.
[7] Ying Zhang. Comparative Study on Selection and Annotation of Senior Classical Chinese Textbooks in Mainland and Taiwan: Taking Mainland Edition and Taiwan Longteng Edition as Examples [D]. Fujian Normal University, 2023.
Liang Zhong. Suggestions for Teaching High School Classical Chinese Real Words Based on Word Frequency Statistics [J]. Journal of Ningbo Institute of Education, 2013, 15(06): 130-133.
[11] Lan Chen. Study on Classical Chinese Adverb Teaching of Compulsory Textbooks in Senior High Schools [D]. Guizhou Normal University, 2024.
[12] Songmiao Zhou. Research on the Teaching of Classical Chinese Function Words in the People's Education Press High School Chinese Textbook [D]. Liaoning Normal University, 2017.
[13] Xunyang Ren. Research on Evaluation Strategy of Classical Chinese Teaching in Senior High School Under the Background of New College Entrance Examination [D]. Shaanxi University of Technology, 2024.
[14] Jie Wen. Focusing on Ability and Literacy to Understand Ancient Poetry and Prose - Analysis of Ancient Poetry and Prose Reading Test Questions and Preparation Suggestions for the 2024 Nine Province Joint Entrance Examination [J]. Guangxi Education, 2024, (08): 35-38.
[15] Xinrong Li. Focusing on the connection between teaching and examination, consolidating core competencies - the characteristics of classical Chinese essay topics in the 2023 National College Entrance Examination [J]. Yu Wen Tian Di, 2024, 31(02): 13-16.
Zhou, S.; Li, J. The Analysis of the Characters of Wording for High School Classical Chinese. Innovation. 2024, 5(4), 109-114. doi: 10.11648/j.innov.20240504.11
Zhou S, Li J. The Analysis of the Characters of Wording for High School Classical Chinese. Innovation. 2024;5(4):109-114. doi: 10.11648/j.innov.20240504.11
@article{10.11648/j.innov.20240504.11,
author = {Sili Zhou and Jie Li},
title = {The Analysis of the Characters of Wording for High School Classical Chinese
},
journal = {Innovation},
volume = {5},
number = {4},
pages = {109-114},
doi = {10.11648/j.innov.20240504.11},
url = {https://doi.org/10.11648/j.innov.20240504.11},
eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.innov.20240504.11},
abstract = {Classical Chinese carries the splendid culture of China. It is the important learning content for Chinese in high school. At the same time, the scoring rate is not good in the corresponding examination. Reading and comprehending classical Chinese in high school is the hardest part, especially for students who are not good at language learning. It is meaningful to analyze the characters of wording for classical Chinese. In this paper, NLP (Natural Language Processing) technology is applied to analyze the word usage for classical Chinese in high school. Firstly, ancient poems and literature in teaching materials in high school and classical Chinese exam questions of the past 15 years are collected. Secondly, raw data is preprocessed, cut words, and calculated word frequencies. Finally, top N words are generated and the experimental data is analyzed. According to the experimental results, the methods proposed in this paper can be used to analyze words for classical Chinese in high school. The Top N words are calculated and analyzed. It provides a good reference for high school students to learn classical Chinese.
},
year = {2024}
}
TY - JOUR
T1 - The Analysis of the Characters of Wording for High School Classical Chinese
AU - Sili Zhou
AU - Jie Li
Y1 - 2024/10/10
PY - 2024
N1 - https://doi.org/10.11648/j.innov.20240504.11
DO - 10.11648/j.innov.20240504.11
T2 - Innovation
JF - Innovation
JO - Innovation
SP - 109
EP - 114
PB - Science Publishing Group
SN - 2994-7138
UR - https://doi.org/10.11648/j.innov.20240504.11
AB - Classical Chinese carries the splendid culture of China. It is the important learning content for Chinese in high school. At the same time, the scoring rate is not good in the corresponding examination. Reading and comprehending classical Chinese in high school is the hardest part, especially for students who are not good at language learning. It is meaningful to analyze the characters of wording for classical Chinese. In this paper, NLP (Natural Language Processing) technology is applied to analyze the word usage for classical Chinese in high school. Firstly, ancient poems and literature in teaching materials in high school and classical Chinese exam questions of the past 15 years are collected. Secondly, raw data is preprocessed, cut words, and calculated word frequencies. Finally, top N words are generated and the experimental data is analyzed. According to the experimental results, the methods proposed in this paper can be used to analyze words for classical Chinese in high school. The Top N words are calculated and analyzed. It provides a good reference for high school students to learn classical Chinese.
VL - 5
IS - 4
ER -