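Discrimination of Violent Network Comments by Naive Bayesian Classifier: Graduation Thesis + Assignment Brief + Thesis Proposal + Literature Review + Foreign-Language Translation with Original + Python Code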

Discrimination of Violent Network Comments by Naive Bayesian Classifier

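【Abstract】In recent years, with the rapid development of Internet technology, the Internet has become extremely widespread. Through it, the public can follow breaking news, comment on national affairs and trending stories, and voice their own opinions. However, because Internet users differ widely in their conduct, incidents of online violence occur frequently and harm the people involved. Screening these comments can effectively reduce online violence and help build a clean, safe online environment. Conventional techniques for detecting online violence rely on how often violent words appear; however, with the spread of Internet slang and the explosive growth of comment data, these techniques spend a great deal of time on searching, their attainable accuracy is limited, and misjudgments are common. This thesis therefore proposes a Naive Bayes classifier for online violence and presents every step of building the model: collecting the raw data; preprocessing the text (removing punctuation, removing emoticons, word segmentation, and generating a vocabulary); converting the text into sparse word-vector representations; and building the classifier and making predictions. Finally, the word vectors used for classification are derived from the resulting model. Experimental results show that the Naive Bayes classifier substantially reduces the time needed for classification while achieving convincing accuracy.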

【Keywords】online violence, Naive Bayes, sparse representation, accuracy
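The preprocessing chain listed in the abstract (punctuation and emoticon removal, word segmentation, vocabulary generation, sparse word vectors) can be sketched in Python. This is a minimal sketch under stated assumptions, not the author's implementation: the regular expressions, the Weibo-style `[...]` emoticon pattern, and the function names `clean_comment`, `build_vocabulary` and `to_sparse` are all hypothetical, and the thesis's own removal list (Table 4.1) is not reproduced. Word segmentation is assumed to have been done beforehand by an external Chinese segmenter, so the vocabulary functions take lists of tokens.

```python
import re
from typing import Dict, List

# Hypothetical patterns; the thesis's own removal list (Table 4.1) is not reproduced here.
BRACKET_EMOTICON = re.compile(r"\[[^\[\]]{1,8}\]")  # Weibo-style emoticon tags such as [怒]
NON_WORD_OR_DIGIT = re.compile(r"[^\w\s]|[\d_]")   # punctuation, symbols and emoji, digits, underscores


def clean_comment(text: str) -> str:
    """Remove emoticon tags, punctuation, emoji and digits, then collapse whitespace."""
    text = BRACKET_EMOTICON.sub("", text)   # drop the whole tag, not just its brackets
    text = NON_WORD_OR_DIGIT.sub("", text)  # CJK characters and Latin letters count as \w and survive
    return re.sub(r"\s+", " ", text).strip()


def build_vocabulary(docs: List[List[str]]) -> Dict[str, int]:
    """Map each distinct token to a column index, in order of first appearance."""
    vocab: Dict[str, int] = {}
    for tokens in docs:
        for tok in tokens:
            if tok not in vocab:
                vocab[tok] = len(vocab)
    return vocab


def to_sparse(tokens: List[str], vocab: Dict[str, int]) -> Dict[int, int]:
    """Sparse bag-of-words vector {column index: count}; out-of-vocabulary tokens are skipped."""
    vec: Dict[int, int] = {}
    for tok in tokens:
        idx = vocab.get(tok)
        if idx is not None:
            vec[idx] = vec.get(idx, 0) + 1
    return vec


if __name__ == "__main__":
    print(clean_comment("你真是个垃圾！！[怒]😡 2333 hahaha"))  # → 你真是个垃圾 hahaha
    # Segmentation is assumed to have happened already; these token lists are made up.
    docs = [["你", "真是", "垃圾"], ["今天", "天气", "真", "好"]]
    vocab = build_vocabulary(docs)
    print(to_sparse(["垃圾", "垃圾", "你", "没见过"], vocab))  # → {2: 2, 0: 1}
```

Storing only the non-zero entries matters because a comment uses a handful of words out of a vocabulary that can run to tens of thousands of entries; a dense vector would be almost entirely zeros.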

Discrimination of Violent Network Comments by Naive Bayesian Classifier

【Abstract】In recent years, with the rapid development of Internet technology, the Internet has become ubiquitous. Through it, the public can obtain up-to-the-minute information, comment on national affairs and trending news, and express their own views. However, because Internet users vary widely in conduct, online violence occurs frequently and has a serious negative impact on the people involved. Identifying such comments can effectively reduce online violence and help establish a clean and safe online environment. Conventional screening techniques for online violence are based on the frequency of violent words. However, with the spread of Internet slang and the explosive growth of comment data, these techniques spend a great deal of time searching, their achievable accuracy is limited, and misjudgments are frequent. This paper therefore proposes a Naive Bayes classifier for online violence and gives every step of building the model: collecting the raw data; preprocessing the text (deleting punctuation, deleting emoticons, word segmentation, and vocabulary generation); converting the text into sparse word-vector representations; and constructing the classifier and making predictions. Finally, the word vectors used for classification are obtained from the established model. The experimental results show that the Naive Bayes classifier greatly shortens the time required for classification while achieving convincing accuracy.

【Key Words】online violence, Naive Bayes, sparse representation, accuracy
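For the classifier construction and prediction step, a multinomial Naive Bayes model with add-α (Laplace) smoothing is one common choice. The abstract does not say which variant or smoothing scheme the thesis uses, so both are assumptions here, and the class `SparseMultinomialNB` is a hypothetical hand-written sketch (it is not scikit-learn's `MultinomialNB`). The model scores each class c as log P(c) + Σ n_w · log P(w | c), where n_w is the count of word w in the comment and P(w | c) = (count(w, c) + α) / (total words in c + α·|V|), then predicts the highest-scoring class. It consumes sparse {index: count} vectors of the kind described in the abstract.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, List

SparseVec = Dict[int, int]  # {vocabulary index: count}


class SparseMultinomialNB:
    """Hypothetical multinomial naive Bayes over sparse count vectors with add-alpha smoothing."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha  # alpha = 1.0 is Laplace (add-one) smoothing

    def fit(self, X: List[SparseVec], y: List[Hashable], vocab_size: int) -> "SparseMultinomialNB":
        self.vocab_size = vocab_size
        doc_counts: Dict[Hashable, int] = defaultdict(int)        # comments per class
        self.word_counts = defaultdict(lambda: defaultdict(int))  # class -> index -> count
        self.total_words: Dict[Hashable, int] = defaultdict(int)  # class -> total tokens
        for vec, label in zip(X, y):
            doc_counts[label] += 1
            for idx, cnt in vec.items():
                self.word_counts[label][idx] += cnt
                self.total_words[label] += cnt
        # log P(c) = log(comments in c / all comments)
        self.log_prior = {c: math.log(k / len(y)) for c, k in doc_counts.items()}
        return self

    def log_likelihood(self, label: Hashable, idx: int) -> float:
        # log P(w | c) with add-alpha smoothing over the whole vocabulary
        num = self.word_counts[label].get(idx, 0) + self.alpha
        den = self.total_words[label] + self.alpha * self.vocab_size
        return math.log(num / den)

    def predict_one(self, vec: SparseVec) -> Hashable:
        # score(c) = log P(c) + sum over words of count * log P(w | c); pick the maximum
        scores = {
            c: prior + sum(cnt * self.log_likelihood(c, idx) for idx, cnt in vec.items())
            for c, prior in self.log_prior.items()
        }
        return max(scores, key=scores.get)

    def predict(self, X: List[SparseVec]) -> List[Hashable]:
        return [self.predict_one(vec) for vec in X]


if __name__ == "__main__":
    # Toy data made up for illustration: label 1 = violent, 0 = non-violent
    X_train = [{0: 2, 1: 1}, {0: 1, 1: 1}, {2: 1, 3: 1}, {2: 2}]
    y_train = [1, 1, 0, 0]
    model = SparseMultinomialNB().fit(X_train, y_train, vocab_size=4)
    print(model.predict([{0: 1}, {2: 1, 3: 1}]))  # → [1, 0]
```

Working in log space turns the product of many small per-word probabilities into a sum, which avoids floating-point underflow on long comments; the smoothing term keeps a word never seen in one class from forcing that class's probability to zero. Only the non-zero entries of each sparse vector are visited during prediction.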

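1 Introduction

1.1 Research Background and Significance

1.2 Review of Domestic and International Research

1.2.1 Review of Domestic Literature on Violent Online Language

1.2.2 Review of Domestic and International Literature on Sentiment Analysis

1.2.3 Main Open Problems

1.3 Organization and Contributions

1.3.1 Structure of This Thesis

1.3.2 Main Contributions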

2 The Naive Bayes Classifier

2.1 Classifiers

2.2 Naive Bayes Theory

2.3 Strengths and Weaknesses of Naive Bayes

3 Overall Framework

3.1 Main Steps

3.2 Data Acquisition

3.3 Data Cleaning

3.4 Word Segmentation

3.5 Building the Bayes Classifier

4 Data Processing

4.1 Experimental Platform and Data

4.2 Data Cleaning

4.3 Word Segmentation

4.4 Sparse Representation

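5 Model Construction

5.1 Building the Naive Bayes Classification Model

5.2 Model Evaluation

5.3 Comparative Experiments

Conclusion

References

Appendix

Acknowledgements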

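List of Figures

Figure 3.1 Overall design flow of the Bayes-classifier-based screener for violent online comments

Figure 4.1 Experimental platform

Figure 4.2 Weibo content

Figure 4.3 Collected data

Figure 4.4 Raw comment data containing special characters and emoticons

Figure 4.5 Results after precise-mode word segmentation

Figure 4.6 Word cloud of the cleaned data

Figure 5.1 Data distribution

Figure 5.2 Word cloud of test-set features

Figure 5.3 Results of 10 prediction runs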

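List of Tables

Table 4.1 Chinese symbols, digits, English symbols and emoticons defined for removal

Table 4.2 Text-cleaning examples

Table 4.3 Comparison of the three different modes

Table 4.4 Sparse representation of the vocabulary

Table 5.1 Parameter settings of the comparative experiments

Table 5.2 Results of the comparative experiments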