Depression recognition based on facial dynamic feature description

An Yi, Qu Zhen, Xu Ning, Nima Zhaxi (School of Information Science and Technology, Tibet University, Lhasa 850000)

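Abstract
Objective Depression is a serious mental disorder that significantly affects patients' daily life and work. Current clinical assessment of depression relies almost entirely on clinical interviews or questionnaires and lacks systematic, effective means of mining the pattern information closely related to depression. To help clinicians diagnose the severity of a patient's depression, a growing number of methods for automatic depression recognition have emerged in the affective computing field. To effectively mine and encode the discriminative affective information carried by the face, this paper proposes an automatic depression recognition framework based on dynamic facial features and sparse coding. Method For facial feature extraction, we propose a new dynamic feature descriptor that captures both the macrostructure and the microstructure of the face, namely median robust local binary patterns from three orthogonal planes (MRELBP-TOP). Because the frame-level MRELBP-TOP features are high dimensional and partly redundant, random projection (RP) is used to reduce their dimensionality, removing redundant information while retaining the key information. In addition, to further characterize the high-level pattern information after dimensionality reduction, sparse coding (SC) is used to abstract a compact feature representation. Finally, a support vector machine is used to estimate depression severity, that is, to predict the Beck Depression Inventory-II (BDI-II) score. Result On the test sets of AVEC2013 (the continuous audiovisual emotion and depression 2013) and AVEC2014, the root mean square error (RMSE) between the estimated and true depression severity is 9.70 and 9.22, respectively; compared with the baseline algorithm, recognition accuracy improves by 29% and 15%, respectively. The experimental results show that the proposed method outperforms most current video-based depression recognition methods. Conclusion This paper builds a depression recognition framework based on facial expressions and achieves effective estimation of depression severity; the proposed frame-level feature descriptor MRELBP-TOP effectively improves the accuracy of depression recognition.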
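The dimensionality-reduction step can be illustrated with a minimal sketch of Gaussian random projection. The abstract does not specify the projection matrix, the target dimension, or a seed, so the Gaussian entries, the 1/sqrt(k) scaling, and the names `random_projection_matrix` and `project` are assumptions made for illustration only, not the authors' implementation.

```python
import math
import random


def random_projection_matrix(d_in, d_out, seed=0):
    """Build a d_out x d_in matrix with i.i.d. Gaussian entries.

    Entries are scaled by 1/sqrt(d_out) so that squared lengths are
    preserved in expectation (an assumed, commonly used scaling).
    """
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(d_out)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d_in)]
            for _ in range(d_out)]


def project(matrix, x):
    """Map a high-dimensional feature vector x to the low-dimensional space."""
    return [sum(w * v for w, v in zip(row, x)) for row in matrix]


# Hypothetical usage: reduce a 768-dimensional frame-level feature to 64 values.
frame_feature = [float(i % 7) for i in range(768)]
R = random_projection_matrix(d_in=768, d_out=64, seed=42)
reduced = project(R, frame_feature)
print(len(reduced))
```

Unlike PCA, the projection matrix here is drawn once without looking at the data, which is why random projection is cheap to compute and simple to implement.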
Keywords
Automatic depression estimation using facial appearance

An Yi, Qu Zhen, Xu Ning, Nima Zhaxi (School of Information Science, Tibet University, Lhasa 850000, China)

Abstract
Objective Depression is a serious mood disorder that causes noticeable problems in day-to-day activities. Current methods for assessing depression depend almost entirely on clinical interviews or questionnaires and lack systematic and efficient ways of using behavioral observations that are strong indicators of psychological disorder. To help clinicians diagnose depression severity effectively and efficiently, the affective computing community has shown a growing interest in developing automated systems that use objective and quantifiable data for depression recognition. Building on these developments, we propose a framework for the automatic diagnosis of depression from facial expressions. Method The method consists of the following steps. 1) To extract facial dynamic features, we propose a novel dynamic feature descriptor, namely, median robust local binary patterns from three orthogonal planes (MRELBP-TOP), which can capture the microstructure and macrostructure of facial appearance and dynamics. To extend the MRELBP descriptors to the temporal domain, we follow the procedure of the LBP-TOP algorithm, in which an image sequence is regarded as a video volume viewed as three different stacks of planes, that is, the XY, XT, and YT planes. The XY plane provides spatial domain information, whereas the XT and YT planes provide temporal information. The robust center intensity based LBP (RELBP_CI) and robust neighborhood intensity based LBP (RELBP_NI) features are extracted independently from the three sets of orthogonal planes, and co-occurrence statistics in these three directions are considered. The features are then stacked in a joint histogram. 2) The proposed MRELBP-TOP descriptors are typically high dimensional. Standard methods, such as principal component analysis (PCA) and linear discriminant analysis (LDA), have been widely used in dimensionality reduction. However, PCA and LDA have some drawbacks. Compared with PCA, random projection (RP) has a lower computational cost and is easier to implement. 3) To obtain a compact feature representation, sparse coding (SC) is used. SC refers to a general class of techniques that automatically select a sparse set of elements from a large pool of possible bases to encode an input signal. Basically, SC assumes that objects in the world and their relationships are simple and succinct and can be represented by only a small number of prominent elements. 4) Finally, support vector regression (SVR) is adopted to predict Beck depression inventory (BDI) scores over an entire video clip for depression recognition and analysis. Result The root mean square error between the predicted values and the Beck depression inventory-II (BDI-II) scores is 9.70 and 9.01 on the test sets of the continuous audiovisual emotion and depression 2013 (AVEC2013) and AVEC2014, respectively. Conclusion 1) We develop an automated framework that effectively captures facial dynamics information for the measurement of depression severity. 2) We propose a robust dynamic feature descriptor that captures the macrostructure, microstructure, and spatiotemporal motion patterns. The proposed feature descriptor can be adopted for facial expression recognition tasks in the future. Furthermore, we adopt sparse coding to learn an overcomplete dictionary and organize MRELBP-TOP feature descriptors into compact behavior patterns.
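The three-orthogonal-planes idea in step 1 can be sketched as follows. This is a simplified illustration that uses the basic 8-neighbor LBP operator as a stand-in; it does not implement the median filtering or the RELBP_CI and RELBP_NI components of MRELBP, and the function names and the `volume[t][y][x]` layout are assumptions.

```python
# 8-neighbor offsets at radius 1, visited clockwise from the top-left pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
           (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(plane, r, c):
    """Basic LBP code: one bit per neighbor that is >= the center pixel."""
    center = plane[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(OFFSETS):
        if plane[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(plane):
    """Normalized 256-bin histogram of LBP codes over interior pixels."""
    hist = [0.0] * 256
    for r in range(1, len(plane) - 1):
        for c in range(1, len(plane[0]) - 1):
            hist[lbp_code(plane, r, c)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def lbp_top(volume):
    """Concatenate mean LBP histograms of the XY, XT, and YT plane stacks.

    volume[t][y][x] is a grayscale video volume; each dimension must be
    at least 3 so that every plane has interior pixels.
    """
    T, H, W = len(volume), len(volume[0]), len(volume[0][0])
    xy = [volume[t] for t in range(T)]                        # spatial
    xt = [[volume[t][y] for t in range(T)] for y in range(H)]  # horizontal motion
    yt = [[[volume[t][y][x] for y in range(H)] for t in range(T)]
          for x in range(W)]                                   # vertical motion
    feature = []
    for planes in (xy, xt, yt):
        acc = [0.0] * 256
        for plane in planes:
            acc = [a + b for a, b in zip(acc, lbp_histogram(plane))]
        feature.extend(a / len(planes) for a in acc)
    return feature


# Hypothetical usage on a tiny synthetic 5-frame, 6x6 clip.
clip = [[[(3 * t + 5 * y + 7 * x) % 13 for x in range(6)]
         for y in range(6)] for t in range(5)]
print(len(lbp_top(clip)))
```

The XY histogram describes appearance within each frame, while the XT and YT histograms describe how a row or column of pixels changes over time; concatenating the three yields one spatiotemporal descriptor per clip.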
Keywords
