教你利用Python绘制酷炫的词云图。

1. 效果展示

词云图想必大家都见过，是一种形式新颖的查看文本中出现最多词汇的图。

我使用Python的第三方库stylecloud来分别生成了 2 张词云图，读者可以猜一猜以下词云图的出处来自于哪里。

词云图 1

词云图 2

2. 实现过程

2.1 导入库

代码语言：javascript

复制

import pandas as pd
import stylecloud
import jieba
from collections import Counter

2.2 导入文本

代码语言：javascript

复制

with open('./三体.txt',encoding='utf-8') as f:
    txt = f.read()
txt = txt.split()

2.3 去除停用词

代码语言：javascript

复制

def stopwordslist():
    stopwords = [line.strip() for line in open('./常见中文停用词表.txt', 'r', encoding='gbk').readlines()]
    stopwords.append(' ') # 自定义添加停用词
    return stopwords
def movestopwords(sentence):

stopwords = stopwordslist()  # 加载停用词的路径

santi_words =[x for x in sentence if len(x) >1 and x not in stopwords]

return ' '.join(santi_words)
data_cut = jieba.lcut(str(txt))

word_list = movestopwords(data_cut)
print(word_list.split(' '))

2.4 统计词频

代码语言：javascript

复制

mycount = {}

for word in word_list.split(' '):

mycount[word] = mycount.get(word,0)+1

counts_df = pd.DataFrame(mycount.items(), columns=['label', 'counts'])

counts_df.sort_values(by='counts',inplace=True, ascending = False)

counts_df.to_csv('./词频统计.csv',encoding='utf-8')

print('输出词频统计 成功！！')

print(counts_df.iloc[:10]) # 输出词频前 10 的词汇

2.5 生成词云图

代码语言：javascript

复制

stylecloud.gen_stylecloud(

text=word_list,

palette='tableau.BlueRed_6',

icon_name='fas fa-apple-alt',

font_path='./田英章楷书3500字.ttf',

output_name='《三体》词云图.png',

# custom_stopwords=stopwords

)

3. API详解

3.1 stylecloud.gen_stylecloud() 参数详解

代码语言：javascript

复制

gen_stylecloud(text=None,  # 输入文本（不含词频数）

file_path=None,   # 输入文本/CSV 的文件路径 (可以含词频数)

size=512,  # stylecloud 的大小（长度和宽度）

icon_name='fas fa-flag',  # stylecloud 形状的图标名称（如 fas fa-grin）。[default: fas fa-flag]

palette='cartocolors.qualitative.Bold_5',  # 调色板（通过 palettable 实现）。[default: cartocolors.qualitative.Bold_6]

colors=None, # 自定义十六进制的字体颜色

background_color="white",  # 背景颜色

max_font_size=200,  # stylecloud 中的最大字号

max_words=2000,  # stylecloud 可包含的最大单词数

stopwords=True,  # 布尔值，用于筛除常见禁用词

custom_stopwords=STOPWORDS, # 去除停用词

icon_dir='.temp',

output_name='stylecloud.png',   # stylecloud 的输出文本名

gradient=None,  # 梯度方向

font_path=os.path.join(STATIC_PATH,'Staatliches-Regular.ttf'), # stylecloud 所用字体

random_state=None,  # 控制单词和颜色的随机状态

collocations=True,

invert_mask=False,

pro_icon_path=None,

pro_css_path=None)