Rumor detection with a multi-modal multi-level event network

李莎1, 张怀文2, 钱胜胜2, 方全2, 徐常胜2(1.郑州大学, 郑州 450000;2.中国科学院自动化研究所模式识别国家重点实验室, 北京 100190)

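Abstract
Objective Automatic rumor detection is crucial. Many rumor detection methods exist, but they have two limitations: 1) they consider only text content and ignore auxiliary multi-modal information that can help identify rumors; 2) they focus only on temporal sequence models that capture the temporal features of rumor events, without adequately studying the local and global information of events. To overcome these limitations, make effective use of multi-modal post information, and combine multiple encoding strategies to build a representation of each news event, this paper proposes a novel social media rumor detection method based on a multi-modal multi-level event network. Method A multi-modal post embedding layer uses both textual and visual content; the multi-modal post embedding vectors are then fed into a multi-level event encoding network that jointly applies multiple encoding strategies to describe event features in a coarse-to-fine manner. Result Extensive experiments on the Twitter and Pheme datasets show that the proposed multi-modal multi-level event network improves accuracy by more than 4% over existing methods such as SVM-TS (support vector machine-time structure), CNN (convolutional neural network), GRU (gated recurrent unit), CallAtRumors, and MKEMN (multimodal knowledge-aware event memory network). Conclusion The proposed rumor detection model captures the global, temporal, and local information of each event and improves rumor detection performance.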
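The comparison above is stated in terms of accuracy. As a minimal sketch of how accuracy and its companion scores (precision, recall, and F1) are computed for binary rumor labels, the block below uses made-up labels and predictions; the function name `binary_metrics` and the toy data are illustrative assumptions, not the authors' evaluation code or results.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for binary labels.

    `positive` marks the rumor class; treating label 1 as "rumor"
    is an assumption made for this illustration.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)

    accuracy = correct / len(pairs)
    # Guard against empty denominators when a class is never predicted.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels (1 = rumor, 0 = non-rumor); invented for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
scores = binary_metrics(y_true, y_pred)
print(scores)
```

In a cross-validated setting, these scores would typically be computed per fold and then averaged.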
Keywords
Multi-modal multi-level event network for rumor detection

Li Sha1, Zhang Huaiwen2, Qian Shengsheng2, Fang Quan2, Xu Changsheng2(1.Zhengzhou University, Zhengzhou 450000, China;2.National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China)

Abstract
Objective The proliferation of social media has revolutionized the way people acquire information. A growing number of people share information and express and exchange opinions through social media. Unfortunately, because many users do not carefully verify content before posting it, various rumors have spread on social media platforms. The extensive spread of these rumors poses new threats to the political, economic, and cultural fields and affects people's lives. To strengthen rumor detection and prevent rumors from spreading, many detection approaches have been proposed. Early rumor detection platforms (e.g., snopes.com) relied mainly on user reports and then invited experts or institutions in related fields to confirm them. Although these methods can detect rumors, their timeliness is clearly limited. Thus, automatic rumor detection has become a key research direction in recent years. To date, many automatic detection approaches have been proposed to improve the efficiency of rumor detection, including feature construction-based and neural network-based methods. Feature construction-based methods rely on hand-crafted features to train rumor classifiers, whereas neural network-based methods use neural networks to extract deep features automatically. Compared with traditional methods, models based on deep neural networks can automatically learn the underlying deep representation of rumors and extract more effective semantic features. However, these methods may suffer from the following limitations. 1) At the post level, many existing methods consider only the text content. In fact, posts often contain various types of information (e.g., text and images), and visual information is often used as auxiliary information to judge the credibility of posts in practice. 
Therefore, the key to detecting rumors is obtaining the multi-modal information of the posts and systematically integrating the textual and visual information. 2) At the event level, existing approaches typically use only a temporal sequence model to capture the temporal features of events. Local and global information has not yet been well investigated. In practice, local and global features are important because the former help distinguish between posts with subtle differences, and the latter help capture features that are repeatedly present in the event. Therefore, in addition to encoding the temporal information of an event, local and global information should be exploited jointly to obtain a fine-grained event representation. Method To overcome these limitations, this paper presents a novel multi-modal multi-level event network (MMEN) for rumor detection, which can effectively use multi-modal post information and combine multi-level encoding strategies to construct a representation of each news event. MMEN employs an encoding network that jointly exploits multiple encoding strategies, such as mean pooling, recurrent neural networks, and convolutional networks, to model the global, temporal, and local information of each event. These various types of information are then combined into a unified deep model. Specifically, our model consists of the following three components: 1) The multi-modal post embedding layer employs bidirectional encoder representations from transformers (BERT) to generate the text content embedding vector and uses Visual Geometry Group-19 (VGG-19) to obtain the visual content embedding. 2) The multi-level event encoding network uses three levels of encoding to capture global, temporal, and local information. The first level is a global encoder based on mean pooling, which represents the elements that are repeatedly present in the posts. 
The second level is a temporal encoder that exploits a bidirectional recurrent neural network to use the past and future information of a given post sequence. The third level is a local encoder that captures more subtle local representations of events. The encoding results are then combined to describe the events in a coarse-to-fine fashion. 3) The rumor detector layer aims to classify each event as either fake or authentic. The detector uses a fully connected layer with a corresponding activation function to generate a predicted probability that determines whether the event is a rumor. Result In this study, the public datasets Pheme and Twitter are used to evaluate the effectiveness of MMEN. The quantitative evaluation metrics include accuracy, precision, recall, and F1 score. We also perform five-fold cross-validation throughout all experiments. The experiments demonstrate that our proposed MMEN improves accuracy by more than 4% over existing methods. MMEN has an accuracy of 82.2% on the Pheme dataset and 87.0% on the Twitter dataset. We compare our model MMEN with five state-of-the-art baseline models. Compared with all the baselines, MMEN achieves the best performance and outperforms other rumor detection methods in most cases. To examine the usefulness of each component in MMEN and demonstrate its effectiveness, we compare variants of MMEN. The experimental results show that the multi-modal features learned by the multi-modal post embedding layer can improve the accuracy of rumor detection by nearly 0.2% on the two datasets. The experimental results also show that the temporal encoder has a stronger effect on detection accuracy. Conclusion In this study, we design a novel MMEN for rumor detection. Experiments and comparisons demonstrate that our model is more robust and effective than state-of-the-art baselines on two public datasets for rumor detection. We attribute the superiority of MMEN to two properties. 
The MMEN takes advantage of the multiple modalities of posts, and the proposed multi-level encoder jointly exploits multiple encoding strategies to generate powerful and complementary features progressively.
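The three-level encoding described in the Method part can be sketched roughly as follows. This is a dependency-free illustration, not the authors' implementation. Random toy vectors stand in for the BERT and VGG-19 post embeddings, a plain tanh recurrent cell stands in for the bidirectional recurrent network, and the weights are untrained. Fusing the three encodings by concatenation, the sigmoid output, and all names and dimensions are assumptions made for this example.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def global_encoder(posts):
    """Global level: mean pooling over all post embeddings of an event."""
    return [sum(col) / len(posts) for col in zip(*posts)]


def rnn_pass(posts, W_in, W_rec):
    """Run a simple tanh recurrent cell over the posts; return the last state."""
    h = [0.0] * len(W_rec)
    for x in posts:
        h = [math.tanh(a + b) for a, b in zip(matvec(W_in, x), matvec(W_rec, h))]
    return h


def temporal_encoder(posts, fwd, bwd):
    """Temporal level: forward and backward passes, concatenated."""
    return rnn_pass(posts, *fwd) + rnn_pass(posts[::-1], *bwd)


def local_encoder(posts, filters, width):
    """Local level: 1-D convolution over windows of `width` adjacent posts,
    ReLU, then max-over-time pooling. Assumes len(posts) >= width."""
    windows = [sum(posts[i:i + width], []) for i in range(len(posts) - width + 1)]
    feats = [[max(0.0, v) for v in matvec(filters, w)] for w in windows]
    return [max(col) for col in zip(*feats)]


def init_params(dim, hidden, n_filters, width, seed=0):
    """Random, untrained weights (hypothetical shapes for illustration)."""
    rng = random.Random(seed)
    return {
        "fwd": (rand_matrix(hidden, dim, rng), rand_matrix(hidden, hidden, rng)),
        "bwd": (rand_matrix(hidden, dim, rng), rand_matrix(hidden, hidden, rng)),
        "filters": rand_matrix(n_filters, width * dim, rng),
        "width": width,
        "w_out": [rng.uniform(-0.1, 0.1)
                  for _ in range(dim + 2 * hidden + n_filters)],
        "b_out": 0.0,
    }


def detect(posts, params):
    """Concatenate the three encodings (an assumption) and apply a
    fully connected layer with a sigmoid to get a rumor probability."""
    event = (global_encoder(posts)
             + temporal_encoder(posts, params["fwd"], params["bwd"])
             + local_encoder(posts, params["filters"], params["width"]))
    logit = sum(w * v for w, v in zip(params["w_out"], event)) + params["b_out"]
    return 1.0 / (1.0 + math.exp(-logit))


# Five toy posts with 8-dimensional stand-in embeddings for one event.
rng = random.Random(1)
posts = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(5)]
params = init_params(dim=8, hidden=4, n_filters=3, width=2)
prob = detect(posts, params)
print(f"rumor probability (untrained): {prob:.3f}")
```

Because the weights are random, the printed probability carries no meaning; the sketch only shows how the global, temporal, and local encodings can feed a single detector.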
Keywords
