Smiling virtual agent in social context

Magalie Ochs1 · Radoslaw Niewiadomski1 · Paul Brunet2 · Catherine Pelachaud1

1 CNRS-LTCI, Télécom ParisTech

{ochs, niewiadomski, pelachaud}@telecom-paristech.fr

2 School of Psychology, Queen's University of Belfast



A smile may convey different communicative intentions depending on subtle characteristics of the facial expression. In this article, we propose an algorithm to determine the morphological and dynamic characteristics of a virtual agent’s smiles of amusement, politeness, and embarrassment. The algorithm was defined from a corpus of virtual agent smiles constructed by users and analyzed with a decision tree classification technique. An evaluation of the resulting smiles in different contexts enabled us to validate the proposed algorithm.

Keywords: Smile · Virtual Agent · Facial Expression · Politeness · Amusement · Embarrassment

1. Introduction

A smile is one of the simplest and most easily recognized facial expressions (Ekman and Friesen, 1982). The zygomatic majors, one on either side of the face, are the only two muscles that need to be activated to create a smile. However, a smile may have several meanings, such as amusement, politeness, or embarrassment, depending on subtle differences in the characteristics of the smile itself and of other elements of the face displayed with it. These different types of smiles are often distinguishable during a social interaction. Recently, researchers (Rehm and André, 2005; Niewiadomski and Pelachaud, 2007) have shown that people are also able to distinguish different types of smiles when they are expressed by a virtual agent. Moreover, a smiling virtual agent improves human-machine interaction: for example, it enhances the perception of the task and of the agent, as well as the motivation and enthusiasm of the user (Krumhuber et al., 2008; Theonas et al., 2008). Conversely, an inappropriate smile (an inappropriate type of smile, or a smile expressed in an inappropriate situation) may have negative effects on the social interaction (Theonas et al., 2008).

In this paper, we present research aimed at identifying the morphological and dynamic characteristics of different smiles in a virtual agent. More precisely, we investigated how a virtual agent may display different smiles in different contexts. For this purpose, we first analyzed different types of smiles in context-free situations. We created a web application to collect a corpus of virtual agent smile descriptions constructed directly by users. Based on this corpus, we used a machine learning algorithm to determine the characteristics of each type of smile that a virtual agent may express. As a result, we obtained an algorithm that generates a variety of facial expressions corresponding to polite, embarrassed, and amused smiles. Second, to validate this algorithm, we conducted an evaluation of the identified smiles in polite, embarrassed, and amused contexts.

The paper is structured as follows. After giving an overview of existing work on humans’ smiles (Section 2.1) and on virtual agents’ smiles (Section 2.2), we introduce the web application developed to collect the smiles corpus (Section 3). In Section 4, we present the algorithm that computes the smile’s characteristics based on the smiles corpus. In Section 5, we present the evaluation, in different contexts, of the smiles resulting from the proposed algorithm. We conclude in Section 6.

2. Related work

2.1 Theoretical background: Types and characteristics of smiles

Unsurprisingly, the most common type of smile is the amused smile, also called the felt, Duchenne, enjoyment, or genuine smile. However, a smile does not necessarily mean that the person feels happy or amused. Indeed, different types of smiles with different meanings can be displayed and distinguished by other people. Another type, often thought of as the opposite of the amused smile, is the polite smile, also called the non-Duchenne, false, social, masking, or controlled smile (Frank et al., 1993). Perceptual studies (Frank et al., 1993) have shown that people distinguish, both unconsciously and consciously, between an amused smile and a polite smile.

Furthermore, someone may smile in a negative situation. For example, a specific smile appears in the facial expression of embarrassment (Keltner, 1995), or anxiety (Harrigan and O’Connell, 1996).

In the current paper, we focus on the following three smiles: amused, polite, and embarrassed. These smiles were selected because they have been explored in the Human and Social Sciences literature both from the encoder’s point of view (i.e., from the point of view of the person who smiles; Keltner, 1995; Ekman and Friesen, 1982) and from the decoder’s point of view (i.e., from the point of view of the person who perceives the smile; Ambadar et al., 2009).

The different smiles are distinguishable given their distinct morphological and dynamic characteristics. Morphological characteristics include facial movements such as the mouth opening or the cheeks rising. Dynamic characteristics correspond to the temporal unfolding of the smile, such as its velocity. In the literature on smiles (Ambadar et al., 2009; Keltner, 1995; Ekman and Friesen, 1982), the following characteristics are generally considered to distinguish the amused, polite, and embarrassed smiles1:

- morphological characteristics: AU6 (cheek raising), AU24 (lip press), AU12 (zygomatic major), symmetry of the lip corners, mouth opening, and amplitude of the smile;

- dynamic characteristics: duration of the smile and velocity of the onset and offset of the smile.
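The characteristics listed above can be gathered into a single record per smile. The following is a minimal sketch only: the class name, field names, units, the 0..1 amplitude scale, and the definition of velocity as amplitude divided by duration are all illustrative assumptions, not definitions taken from the cited literature.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SmileDescription:
    """A smile described by the characteristics listed above.

    All field names and scales are hypothetical and chosen for illustration.
    """
    au6_cheek_raising: bool   # AU6 active
    au24_lip_press: bool      # AU24 active
    au12_amplitude: float     # amplitude of the smile (AU12), assumed in 0..1
    symmetric: bool           # symmetry of the lip corners
    mouth_open: bool          # mouth opening
    duration_s: float         # total duration of the smile, in seconds
    onset_s: float            # duration of the onset, in seconds
    offset_s: float           # duration of the offset, in seconds

    def onset_velocity(self) -> float:
        """Mean onset velocity, assumed to be amplitude / onset duration."""
        return self.au12_amplitude / self.onset_s

    def offset_velocity(self) -> float:
        """Mean offset velocity, assumed to be amplitude / offset duration."""
        return self.au12_amplitude / self.offset_s


# An arbitrary example smile; the values are made up, not corpus data.
example = SmileDescription(
    au6_cheek_raising=True,
    au24_lip_press=False,
    au12_amplitude=0.75,
    symmetric=True,
    mouth_open=True,
    duration_s=3.0,
    onset_s=0.5,
    offset_s=0.75,
)
print(example.onset_velocity(), example.offset_velocity())
```

Keeping the morphological and dynamic fields in one immutable record makes it easy to compare smiles field by field.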

1 Note that other elements of the face, such as the gaze, the head movements, and the eyebrows, influence how a smile is perceived. However, in the presented work, we focus on the influence of the smile and do not consider the other elements of the face.

Concerning cheek raising, Ekman (2003) claims that the orbicularis oculi (which corresponds to Action Unit (AU) 6 in the Facial Action Coding System (Ekman et al., 2002)) is activated in an amused smile. Without it, the expression of happiness seems insincere (Duchenne, 1862). This finding was confirmed in the empirical study by Frank et al. (1993), in which participants distinguished between smiles of "enjoyment" and "non-enjoyment" based on the orbicularis oculi activation. According to Ekman (2003), asymmetry is an indicator of a voluntary, non-spontaneous expression, such as the polite smile. Lip press (AU24) is often related to the smile of embarrassment (Keltner, 1995).

The different types of smile may have different durations. Felt expressions, such as the amused smile, last from half a second to four seconds, even if the corresponding emotional state lasts longer (Ekman, 2003). The duration of a polite or embarrassed smile is shorter than 0.5 seconds or longer than 4 seconds (Ekman and Friesen, 1982; Ekman, 2003). Not only do the overall durations differ, but the course of the expression also varies depending on the type of smile. The dynamics of facial expressions are commonly described by three time intervals. The onset is the interval of time in which the expression reaches its maximal intensity starting from the neutral face. The apex is the time during which the expression maintains its maximal intensity. Finally, the offset is the interval of time in which the expression returns from its maximal intensity to the neutral expression (Ekman and Friesen, 1982). In deliberate expressions, the onset is often abrupt or excessively short, the apex is held too long, and the offset can be either more irregular or abrupt and short (Ekman and Friesen, 1982; Hess and Kleck, 1990). In an empirical study with synthesized videos of smiling humans, smiles characterized by long onset, long offset, and short apex duration were perceived as significantly more spontaneous and genuine than smiles characterized by short onset, short offset, and long apex duration (Krumhuber et al., 2008).
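The onset, apex, and offset intervals can be illustrated with a simple intensity envelope. This is a sketch under stated assumptions: the text only defines the three intervals, so the linear ramps, the function name `smile_intensity`, and its parameters are hypothetical choices made for illustration.

```python
def smile_intensity(t, onset, apex, offset, peak=1.0):
    """Intensity of a smile at time t (in seconds).

    Assumes a piecewise-linear envelope: a linear rise during the onset,
    a constant peak during the apex, and a linear fall during the offset.
    The neutral face corresponds to intensity 0.0.
    """
    if onset <= 0 or apex < 0 or offset <= 0:
        raise ValueError("onset and offset must be > 0, apex must be >= 0")
    total = onset + apex + offset
    if t < 0 or t > total:
        return 0.0                          # neutral face outside the smile
    if t < onset:
        return peak * t / onset             # onset: neutral -> maximal intensity
    if t <= onset + apex:
        return peak                         # apex: maximal intensity is held
    return peak * (total - t) / offset      # offset: maximal intensity -> neutral


# Sample a smile with a 0.5 s onset, 1.0 s apex, and 0.5 s offset.
for t in (0.0, 0.25, 1.0, 1.75, 2.0):
    print(t, smile_intensity(t, onset=0.5, apex=1.0, offset=0.5))
```

With this parameterization, the long-onset, long-offset, short-apex smiles described above correspond to large `onset` and `offset` values with a small `apex` value for the same total duration.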

However, no consensus exists on the morphological and dynamic characteristics of the amused, polite, and embarrassed smiles. In general, AU6 is more often present in amused smiles than in polite or embarrassed smiles. For instance, according to Ekman, the amused smile is characterized by cheek raising (AU6), the activation of the zygomatic major (AU12), and symmetry of the zygomatic major. The dynamic characteristics of the amused smile are the smoothness and regularity of the onset, apex, and offset and of the overall zygomatic actions, and a duration between 0.5 and 4 seconds (Ekman and Friesen, 1982). The activation of AU6, long onset and offset, and short apex duration are also reported in empirical studies aiming to distinguish enjoyment smiles (Frank et al., 1993; Krumhuber et al., 2008). The results of Ambadar et al. (2007) also confirm the role of AU6; they indicate mouth opening as another cue of the smile of enjoyment. Expressions of amusement composed of AUs 6 and 12, accompanied by AUs 58 and 63, were correctly recognized (46%) in a forced-choice test including 14 emotions (Keltner and Buswell, 1996). However, the role of AU6 in the smile of amusement was recently challenged by Krumhuber and Manstead (2009).

According to Ekman, in the expression of a polite smile, the cheek raising (AU6) is absent, the amplitude of the zygomatic major (AU12) is small, the smile is slightly asymmetric, the apex is too long, the onset too short, the offset too abrupt, and the lips may be pressed (Ekman and Friesen, 1982).

Finally, according to Keltner (Keltner, 1995; Keltner and Buswell, 1996), a smile of embarrassment is characterized by pressed lips and the absence of AU6, often accompanied by head and gaze aversion. Expressions of embarrassment composed of AUs 12 and 24, accompanied by AUs 51, 54, and 64, were correctly recognized (51%) in a forced-choice test including 14 emotions (Keltner and Buswell, 1996). In the work of Ambadar et al. (2007), embarrassed/nervous smiles are more often characterized by mouth opening and larger amplitude than polite smiles.

2.2 Smiling virtual agents

To increase the variability of virtual agents’ facial expressions, several researchers have considered different virtual agent smiles. For instance, in Tanguy (2006), a virtual agent uses two different types of smiles, amused and polite. The amused smile reflects an emotional state of happiness, whereas the polite smile, called a fake smile in Tanguy (2006), is used in the case of a sad virtual agent. The amused smile is represented by raised lip corners, raised lower eyelids, and an open mouth. The polite smile is represented by an asymmetric raising of the lip corners and an expression of sadness in the upper part of the face.

In Rehm and André (2005), virtual agents mask a felt negative emotion of disgust, anger, fear, or sadness with a smile. Two types of facial expression were created according to Ekman’s description (Ekman and Friesen, 1975). The first expression corresponds to a felt emotion of happiness (including an amused smile). The second corresponds to the other expression (e.g., disgust) masked by unfelt happiness. In particular, the expression of unfelt happiness lacks AU6 activity and is asymmetric (see Section 2.1). It may correspond to a polite smile. A perceptual test enabled the authors to measure the impact of such fake expressions on the user’s subjective impression of the agent. The participants were able to perceive the difference, but they were unable to explain their judgment. The agent expressing an amused smile was perceived as more reliable, trustworthy, convincing, credible, and more certain about what it said than the agent expressing a negative emotion masked by a polite smile2.

Krumhuber et al. (2008) explored the impact of varying the dynamic characteristics of smiles in virtual faces on users’ impressions and decisions in a job interview. The results show that smiles with long onset and offset durations were associated with ‘authentic smiles’ (i.e., amused smiles). Fake smiles were characterized by short onset and offset durations. The total duration of both types of smiles was equal (4 seconds). During the interaction, the type of smile used by the virtual agents has an impact on the user’s perception: the job is perceived as more positive and suitable in the case of authentic smiles. Overall, regardless of its type (fake or authentic), a smile increases the positive perception of the agent.

Niewiadomski and Pelachaud (2007) proposed an algorithm to generate complex facial expressions such as masked or fake expressions. An expression is a composition of eight facial areas, each of which can display signs of emotion. For complex facial expressions, various emotions can be expressed on different areas of the face. In particular, it is possible to generate different expressions of joy, for example a felt expression and a fake one. The felt expression of joy uses the reliable features (e.g., AU6), while the fake one is asymmetric.

To create facial expressions of emotions, Grammer and Oberzaucher (2006) performed what they called a reverse engineering approach. They used a 3D facial model driven by FACS (Ekman et al., 2002). A set of facial expressions was rendered randomly. An expression corresponds to either a single Action Unit or a combination of Action Units that were 50% or 100% randomly generated. Participants had to evaluate the expressions along the three-dimensional space of Pleasure-Arousal-Dominance (Mehrabian and Russell, 1974). A multiple multivariate regression technique was applied, enabling a mapping between Action Units and the dimensions Pleasure and Arousal. The authors propose to use the obtained mapping to create facial expressions of emotions.
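The idea of regressing rating dimensions on Action Units can be sketched as an ordinary least-squares fit with several outputs. This is not the procedure of Grammer and Oberzaucher (2006): the choice of AU6 and AU12 as predictors, the intensities, the "true" weights, and the noiseless synthetic ratings are all made up for illustration, and the helper names `solve_linear` and `fit_multivariate` are hypothetical.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting.

    a is an n x n matrix and b an n x m matrix, both given as lists of rows.
    """
    n = len(a)
    aug = [[float(v) for v in a[i]] + [float(v) for v in b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_multivariate(x_rows, y_rows):
    """Least-squares weights W such that y is approximately [1, x] @ W.

    Row 0 of W holds the intercepts; each column is one output dimension.
    """
    design = [[1.0] + list(row) for row in x_rows]
    k, m = len(design[0]), len(y_rows[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [[sum(d[i] * y[j] for d, y in zip(design, y_rows)) for j in range(m)]
           for i in range(k)]
    return solve_linear(xtx, xty)  # normal equations: (X^T X) W = X^T Y


# Synthetic, noiseless data: (AU6, AU12) intensities -> (pleasure, arousal).
true_w = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.1]]  # made-up weights
au_rows = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 1.0), (1.0, 0.5)]
ratings = [[true_w[0][j] + true_w[1][j] * a6 + true_w[2][j] * a12 for j in range(2)]
           for a6, a12 in au_rows]

weights = fit_multivariate(au_rows, ratings)
for name, row in zip(("intercept", "AU6", "AU12"), weights):
    print(name, [round(v, 6) for v in row])
```

Because the synthetic ratings are generated without noise, the fit recovers the made-up weights; with real ratings the same mapping would only approximate the data.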

