Survey of visual data cleaning

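Wang Mingjun1,2, Pan Qiaoming1, Liu Zhen3, Chen Wei2 (1. College of Engineering and Design, Lishui University, Lishui 323000; 2. College of Computer Science and Technology, Zhejiang University, Hangzhou 310058; 3. Hangzhou Dianzi University, Hangzhou 310018)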

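Abstract
Objective Data cleaning is a long-standing and persistently troublesome problem. As visualization techniques advance, visual data cleaning is bound to become one of the important approaches to data cleaning. This paper describes the main data quality problems and the process of visual data cleaning, reviews the state of research on visual data cleaning (including the sources and classification of data quality problems and visual data cleaning methods), and, drawing on the existing literature, summarizes the main challenges and opportunities facing visual data cleaning. Method Because data cleaning methods and strategies depend on the specific data quality problem, this paper uses the different data quality problems as the thread along which to organize and assess visual data cleaning methods and strategies. Result According to the data quality problem addressed, visual cleaning methods are grouped into direct visual cleaning, visualization of missing data, visualization of uncertain data, visual data transformation, and sharing of data cleaning resources; for each data quality problem, the corresponding challenges and directions for further research are summarized. Conclusion The paper summarizes and looks ahead at visual data cleaning, and points out that, within the data cleaning field, visual data cleaning will be one of the most promising future research directions.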
Keywords
Survey of visualization data cleaning

Wang Mingjun1,2, Pan Qiaoming1, Liu Zhen3, Chen Wei2 (1. College of Engineering and Design, Lishui University, Lishui 323000, China; 2. College of Computer Science and Technology, Zhejiang University, Hangzhou 310058, China; 3. Hangzhou Dianzi University, Hangzhou 310018, China)

Abstract
Objective Data cleaning remains a difficult problem despite extensive study. With visual interfaces and visualization techniques, visual data cleaning has become one of the most important data cleaning methods. This study describes the main data quality problems and the visual data cleaning process, reviews the state of the art in visual data cleaning (including the sources and categories of data quality issues and visual data cleaning methods), and summarizes the challenges and opportunities in visual data cleaning. Method Data cleaning techniques depend on the specific data quality issue. This study therefore organizes its review of previous work on visual data cleaning by data quality problem. Result Based on the data quality issue addressed, visual cleaning methods are grouped into direct visual cleaning, visualization of missing data, visualization of uncertain data, visual data transformation, and data cleaning resource sharing. Challenges and directions for further research are summarized for each data quality issue. Conclusion We provide an overview of visual data cleaning and highlight research directions for it.
Keywords
