Building an AR Value Calculator for English Reading Materials with AI

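When children learn to read in English, a Lexile measure or an AR value is important, because it helps you find reading material that matches the child's current English level. The AR value of an English picture book refers to "Accelerated Reader", a reading assessment system developed by Renaissance Learning. The AR system is designed to help students, teachers, and parents track and improve students' reading ability. AR Level: a decimal number that represents a book's reading difficulty. The higher the level, the harder the book. For example, "2.5" corresponds to the reading level of the fifth month of second grade.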
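To make the "grade.month" reading of an AR value concrete, here is a minimal sketch that splits a value such as 2.5 into its grade and month parts. The function name `split_ar_value` is hypothetical, and the 0 to 12.9 range check is an assumption taken from the background notes later in this article.

```python
def split_ar_value(value: float) -> tuple[int, int]:
    """Split an AR value such as 2.5 into (grade, month)."""
    # Assumption: valid AR values run from 0 to 12.9.
    if not 0 <= value <= 12.9:
        raise ValueError("AR values are expected to lie between 0 and 12.9")
    # Work in tenths to avoid floating-point surprises when taking the fraction.
    tenths = round(value * 10)
    grade, month = divmod(tenths, 10)
    return grade, month


grade, month = split_ar_value(2.5)
print(f"grade {grade}, month {month}")  # prints "grade 2, month 5"
```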

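By providing a standardized way to assess reading progress and comprehension, the AR system helps teachers recommend books that suit each student's reading level and track their growth as readers. For English picture books, the AR value helps parents and educators choose books that fit a child's reading level and interests, and the accompanying quizzes verify the child's reading comprehension.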

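For just a few books, you can look up the AR value on the official site, https://www.arbookfind.com/.

But what if you have a large number of e-books, or books the official site does not list?

You can ask ChatGPT to write an AR value calculator. Enter this prompt: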

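Write a Python script that works as an AR value (Accelerated Reader) calculator. The steps are:

Open the folder: "F:\aivideo"

Read the txt documents in it;

Use NLTK to tokenize the text and remove stop words;

Think step by step: based on the principles behind AR values, design a formula to estimate the AR value, then analyze each txt document's vocabulary complexity, average sentence length, word difficulty, total word count, and so on, and estimate the AR value from the formula;

Put the computed AR value at the start of the txt document's file name. For example, if the original file name is a.txt and the computed AR value is 1.2, rename a.txt to AR1.2_a.txt

Note: print information to the screen at every step

NLTK has no syllable_count method; use the third-party library pyphen to count syllables.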

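Background on AR values:

AR values range from 0 to 12.9. The digits before the decimal point give the grade that the reading level corresponds to, and the digit after it gives the month. For example, a book with an AR value of 3.6 is about as hard as the English reading level of the sixth month of third grade in the US, meaning a child who has reached that grade and month should be able to understand it. The lower the AR value, the easier the book, and the higher the value, the harder the book. AR measures difficulty along four dimensions: the full text of the book; average sentence length; word difficulty; and the total word count of the book.

The score uses a decimal scale, with the digit after the decimal point giving a finer-grained level. The minimum is 0 and the maximum is 12.9.

AR values can be converted to and from the other mainstream US leveling systems (such as Lexile and Guided Reading).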

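For example, here is the difficulty rating for the book Biscuit:

ATOS Book Level: 1.4

Interest Level: Lower Grades (LG K-3)

Word Count: 133

"Translated" into plain language: the book has 133 words, its content is of interest to children from kindergarten through third grade, and its language difficulty suits children reading at the average level of a US student in the fourth month of first grade. Taken together, these parameters tell teachers and parents very clearly how difficult a book is.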
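The renaming step in the prompt (a.txt becoming AR1.2_a.txt) can be sketched on its own as follows. This is only an illustration, not the script ChatGPT returned: the helper names `ar_prefixed_name` and `rename_with_ar` are hypothetical, and stripping an existing `AR…_` prefix before adding a new one is an extra assumption, added so that running the tool twice does not stack prefixes.

```python
import re
from pathlib import Path

# Assumption: an earlier run may already have added a prefix such as "AR0.8_".
_AR_PREFIX = re.compile(r"^AR\d+(?:\.\d+)?_")


def ar_prefixed_name(filename: str, ar_value: float) -> str:
    """Return the file name with an AR prefix, e.g. a.txt -> AR1.2_a.txt."""
    base = _AR_PREFIX.sub("", filename)  # drop a prefix left by a previous run
    return f"AR{ar_value:.1f}_{base}"


def rename_with_ar(path: Path, ar_value: float) -> Path:
    """Rename the file on disk and return its new path."""
    new_path = path.with_name(ar_prefixed_name(path.name, ar_value))
    path.rename(new_path)
    return new_path


print(ar_prefixed_name("a.txt", 1.2))  # prints "AR1.2_a.txt"
```

Keeping the name computation separate from the actual rename makes it easy to check the result before touching any files.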

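Running the script that ChatGPT produced prints the following:

Processing text...
Number of sentences: 19
Number of words: 147
Number of words after removing stop words: 57
Average sentence length: 7.74
Vocabulary size: 50
Computed AR value: 0.82
File Just Me and My Puppy.txt renamed to AR0.8_Just Me and My Puppy.txt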

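Processing text...
Number of sentences: 19
Number of words: 147
Number of words after removing stop words: 57
Average sentence length: 7.74
Vocabulary size: 50
Computed AR value: 0.82
File Just Me and My Puppy_已识别.txt renamed to AR0.8_Just Me and My Puppy_已识别.txt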

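The AR value the program computes does not match the one on the official site, so ask ChatGPT to revise the script:

The above is what the program returned. The actual AR value of Just Me and My Puppy is 1.6. Reflect on this: how should the formula and algorithm be adjusted to be more accurate?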
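One way to think about the adjustment this prompt asks for is calibration against books whose official AR values are known. The sketch below is not what ChatGPT did. It is a hedged illustration with hypothetical function names: it measures the gap between estimates and official values, and fits a simple linear correction by least squares when at least two books are available.

```python
def mean_absolute_error(estimates: list[float], official: list[float]) -> float:
    """Average absolute gap between estimated and official AR values."""
    if len(estimates) != len(official) or not estimates:
        raise ValueError("need two non-empty lists of equal length")
    return sum(abs(e - o) for e, o in zip(estimates, official)) / len(estimates)


def fit_linear_correction(estimates: list[float], official: list[float]) -> tuple[float, float]:
    """Least-squares fit of: official = slope * estimate + intercept."""
    n = len(estimates)
    if n != len(official) or n < 2:
        raise ValueError("need at least two (estimate, official) pairs")
    mean_x = sum(estimates) / n
    mean_y = sum(official) / n
    var_x = sum((x - mean_x) ** 2 for x in estimates)
    if var_x == 0:
        raise ValueError("estimates must not all be identical")
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(estimates, official))
    slope = cov_xy / var_x
    return slope, mean_y - slope * mean_x


def apply_correction(estimate: float, slope: float, intercept: float) -> float:
    """Apply the fitted correction and keep the result inside 0 to 12.9."""
    return min(12.9, max(0.0, slope * estimate + intercept))


# The only pair given in this article: estimated 0.82, official 1.6.
print(f"{mean_absolute_error([0.82], [1.6]):.2f}")  # prints "0.78"
```

A single book is not enough to fit anything, so in practice you would first collect estimates and official values for a number of books from the official site.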

In the end, the Python source code ChatGPT produced is:

import os

import nltk
import pyphen
from nltk.corpus import stopwords
from nltk.tokenize import sent_tokenize, word_tokenize

# Download the NLTK data (only needed once).
nltk.download('punkt')
nltk.download('punkt_tab')  # assumption: newer NLTK releases may need this instead of 'punkt'
nltk.download('stopwords')

# Create the Pyphen hyphenation dictionary used for syllable counting.
dic = pyphen.Pyphen(lang='en')


def syllable_count(word):
    # Approximate the syllable count by the number of hyphenation segments.
    hyphenated = dic.inserted(word)
    return len(hyphenated.split('-'))


def flesch_kincaid_grade_level(text):
    sentences = sent_tokenize(text)
    # Keep only alphanumeric tokens so that punctuation is not counted as words.
    words = [word for word in word_tokenize(text) if word.isalnum()]
    syllables = sum(syllable_count(word) for word in words)
    num_sentences = len(sentences)
    num_words = len(words)
    if num_sentences == 0 or num_words == 0:
        return 0
    # Flesch-Kincaid Grade Level formula
    fk_grade = 0.39 * (num_words / num_sentences) + 11.8 * (syllables / num_words) - 15.59
    return max(0, fk_grade)  # make sure the grade level is not negative


def calculate_ar_value(text):
    print("Processing text...")

    # Split into sentences.
    sentences = sent_tokenize(text)
    num_sentences = len(sentences)
    print(f"Number of sentences: {num_sentences}")

    # Tokenize, keeping only alphanumeric tokens so punctuation is not counted as words.
    words = [word for word in word_tokenize(text) if word.isalnum()]
    num_words = len(words)
    print(f"Number of words: {num_words}")

    # Remove stop words.
    stop_words = set(stopwords.words('english'))
    filtered_words = [word for word in words if word.lower() not in stop_words]
    num_filtered_words = len(filtered_words)
    print(f"Number of words after removing stop words: {num_filtered_words}")

    # Average sentence length
    avg_sentence_length = num_words / num_sentences if num_sentences > 0 else 0
    print(f"Average sentence length: {avg_sentence_length:.2f}")

    # Vocabulary complexity: number of distinct words, ignoring case
    vocab = {word.lower() for word in filtered_words}
    vocab_size = len(vocab)
    print(f"Vocabulary size: {vocab_size}")

    # Flesch-Kincaid grade level
    fk_grade = flesch_kincaid_grade_level(text)
    print(f"Flesch-Kincaid grade level: {fk_grade:.2f}")

    # Word density: share of words that are not stop words
    word_density = num_filtered_words / num_words if num_words > 0 else 0
    print(f"Word density: {word_density:.2f}")

    # Word difficulty (ratio of rare words).
    # Note: every word in this small set is also an NLTK stop word and has already
    # been removed, so the ratio is 1.0 for any non-empty vocabulary; a larger
    # word-frequency list would be needed for this term to carry information.
    common_words = {'the', 'be', 'to', 'of', 'and', 'a', 'in', 'that', 'have', 'i'}
    rare_word_count = sum(1 for word in vocab if word not in common_words)
    rare_word_ratio = rare_word_count / vocab_size if vocab_size > 0 else 0
    print(f"Rare word ratio: {rare_word_ratio:.2f}")

    # Weighted formula (weights as adjusted by ChatGPT)
    ar_value = 0.2 * fk_grade + 0.6 * (avg_sentence_length / 10) + 0.8 * (vocab_size / 100) + 0.3 * word_density + 0.6 * rare_word_ratio
    ar_value = min(12.9, ar_value)  # make sure the AR value does not exceed 12.9
    print(f"Computed AR value: {ar_value:.2f}")
    return ar_value


def process_files_in_folder(folder_path):
    for filename in os.listdir(folder_path):
        if filename.endswith(".txt"):
            file_path = os.path.join(folder_path, filename)
            with open(file_path, 'r', encoding='utf-8') as file:
                text = file.read()
            ar_value = calculate_ar_value(text)
            # Put the AR value at the start of the file name, e.g. AR1.2_a.txt.
            new_filename = f"AR{ar_value:.1f}_{filename}"
            new_file_path = os.path.join(folder_path, new_filename)
            os.rename(file_path, new_file_path)
            print(f"File {filename} renamed to {new_filename}")


if __name__ == "__main__":
    folder_path = "F:\\aivideo"
    process_files_in_folder(folder_path)
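Written out as equations, the two formulas the script uses look like this. The coefficients are copied from the code above. The symbols L (average sentence length), V (vocabulary size), D (word density), and R (rare word ratio) are shorthand introduced here for readability. The weighted sum is ChatGPT's heuristic estimate, not the official ATOS formula.

```latex
\[
\mathrm{FK} = \max\!\left(0,\; 0.39 \cdot \frac{\text{words}}{\text{sentences}}
      + 11.8 \cdot \frac{\text{syllables}}{\text{words}} - 15.59\right)
\]
\[
\mathrm{AR} \approx \min\!\left(12.9,\; 0.2\,\mathrm{FK} + 0.6 \cdot \frac{L}{10}
      + 0.8 \cdot \frac{V}{100} + 0.3\,D + 0.6\,R\right)
\]
```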