Symmetry in Emotional and Visual Similarity between Neutral and Negative Faces

Is Mr. Hyde more similar to his alter ego Dr. Jekyll, because of their physical identity, or to Jack the Ripper, because both evoke fear and loathing? The relative weight of emotional and visual dimensions in similarity judgements is still unclear. We expected an asymmetric effect of these dimensions on similarity perception, such that faces that express the same or similar feelings are judged as more similar than different emotional expressions of the same person. We selected 10 male faces with different expressions. Each face posed one neutral expression and one emotional expression (five disgust, five fear). We paired these expressions, resulting in 190 pairs, varying either in emotional expression, physical identity, or both. Twenty healthy participants rated the similarity of paired faces on a 7-point scale. We report a symmetric effect of emotional expression and identity on similarity judgements, suggesting that people may perceive Mr. Hyde to be just as similar to Dr. Jekyll (identity) as to Jack the Ripper (emotion). We also observed that emotional mismatch decreased perceived similarity, suggesting that emotions play a prominent role in similarity judgements. From an evolutionary perspective, poor discrimination between emotional stimuli might endanger the individual.


Introduction
Emotional similarity refers to the tendency to group stimuli together because they evoke the same feelings in us, even when they are visually different. For example, we may judge two different individuals with fearful faces either as similar, because they both express a negative emotion, or as different, because their identities do not look alike. At present, it is not clear whether different stimulus attributes (i.e., emotional expression, identity) have a symmetrical or asymmetrical influence on similarity perception. In other words, is Mr. Hyde more similar to Dr. Jekyll, because they have the same facial features (same identity), or to Jack the Ripper, because of the emotions they trigger in witnesses of their crimes?
The investigation of emotional similarity has a long tradition, with both replicated and controversial results. First, as framed by Russell's circumplex model, participants rate the similarity between emotional stimuli according to their resemblance in valence and arousal. These orthogonal dimensions (valence and arousal) define participants' emotional similarity space, wherein proximities reflect the similarity among stimuli [1]. This was replicated both in adults and children [2][3][4], using simple stimuli, such as words [5][6][7], objects [8,9], and faces [10][11][12][13], and with more complex stimuli, such as real world photographs [14][15][16]. Based on this line of research, an increasing number of studies aim to decode the nature of emotions in the brain [17], particularly where and how valence and arousal are represented, by computing the correlation between behavioural and neural measures of similarity [18][19][20][21].
Symmetry 2021, 13, 2091
One of the most controversial findings in the emotional similarity literature is related to asymmetries in similarity judgements between different levels of valence (i.e., negative vs. positive). Specifically, in a series of experiments, Koch et al. (2016) demonstrated that 'good is more alike than bad', that is, there is higher similarity among positive than negative emotional stimuli [5,13]. By contrast, others report higher semantic relatedness among negative than randomly selected non-emotional pictures [22] and wider generalisation in conditioned than unconditioned stimuli in healthy controls [23]. One of the reasons for these mixed results might be related to differences in semantic similarity among the various levels of valence of the experimental stimuli used. This may confound the relationship between emotional dimensions and perceived similarity [24]. One way to control for this confounding factor is to select simple stimuli, possibly from the same semantic category, such as faces.
Many datasets of prototypical emotional and neutral faces are currently available [25][26][27][28]. These are widely used in emotion cognition research to uncover how facial expressions are processed and perceived. In general, evidence from neural data shows that regions in the occipitotemporal lobe, including the fusiform face area, the inferior temporal cortex, and the superior temporal sulcus, encode facial identity and similarity among facial expressions [29][30][31]. In addition, Said et al. (2010) observed a positive correlation between neural similarity in the posterior superior temporal sulcus and affect-based similarity ratings [32]. In behaviour, faces depicting basic emotions that share the same valence and arousal elicited similar subjective experiences in healthy participants [33,34]. Among basic emotions, happiness is the one recognized with the highest accuracy and lowest ambiguity [35,36]. Anger and disgust [37], as well as fear and surprise [35], are most frequently confused, probably because of a perceptual overlap, with lowered eyebrows in anger and disgust, and raised eyebrows in fear and surprise [38]. This similarity in emotional expression and physical appearance might explain part of the overall similarity observed between faces expressing different emotions in the face similarity space [39]. This is in line with the results from Said et al. (2010), who instructed two groups of participants to rate either the visual or the emotional similarity among faces, and reported a high correlation (r = 0.93) between the visual and the affect-based similarity ratings [32].
However, these studies have so far failed to investigate the relative weight of emotional expression and visual identity in global similarity judgements among faces, since they did not ask participants to focus on one of these features. Only a handful of studies [12,39,40,41] explored the latter effect. Among them, Wegrzyn et al. (2017) asked participants to recognize emotions from faces that depicted two identities (one male and one female), each expressing seven different emotions. Faces were masked by a grid of white tiles; one randomly selected tile was revealed at the start, and one additional tile was revealed every second thereafter. Participants were instructed to click a button below the image when they recognized the facial expression, and to select the labelled button corresponding to it in a forced-choice decision task. The multidimensional scaling (MDS) analysis of the emotion recognition task revealed that faces clustered in similarity space according to the emotion they expressed. Conversely, the MDS with the low-level visual features (grey-scale value in each pixel) of the faces as input showed that faces were dispersed according to the identity they depicted. However, in this study, participants were not asked to process inter-stimulus relationships. By contrast, Halberstadt and Niedenthal (1997) manipulated emotions by instructing participants to watch either emotional (positive or negative) or neutral movies, and then to judge the similarity among faces. Participants in the emotional states, compared to those in the non-emotional state, weighted the emotional dimension of faces more than gender or head orientation. Taken together, these studies suggest that the relevance of emotional expression and identity may be malleable according to task instructions, and that both are salient features that define participants' face similarity space. However, it seems that these dimensions interact during subjective similarity judgement tasks.
One promising technique for disentangling emotional and visual facial features is to compute objective measures of low-level visual similarity among faces, as in the eigenfaces method [39]. According to this approach, the low-level visual similarity among faces is conceptualized as the correlation between pixel values of grey-scale intensity; the eigenfaces are extracted by performing principal component analysis (PCA) on the correlations among faces, and represent unique visual features of a set of human faces as dimensions, which define the face-space [42]. This approach has been widely adopted in the context of face recognition and identification, because of its speed of recognition and higher success rate in comparison to other computational methods [43]. Several studies [43][44][45][46] used eigenfaces to predict the emotions evoked by images. Prediction is more successful with this method than with other low-level visual features (e.g., GIST, colour histograms). For example, Yuan et al. (2013) developed a novel algorithm based on eigenfaces, Sentribute, which reached an accuracy of 82% in predicting image sentiments based on mid-level attributes [44]. A similar approach was adopted in another study by Hsu (2013), wherein emotions were automatically identified and discriminated according to the two-dimensional subspace of valence-arousal [45].
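To make the eigenfaces idea concrete, the following is a minimal sketch in Python with NumPy. It is not the implementation used in the cited studies. It assumes the faces are already grey-scale arrays of equal size, and it performs PCA through a singular value decomposition of the mean-centred pixel vectors, which is a common formulation but a swapped-in variant of the correlation-based PCA described above. The function name `eigenfaces` and the random arrays standing in for images are illustrative only.

```python
import numpy as np

def eigenfaces(images, n_components):
    """Return (mean_face, components, weights) for a stack of grey-scale images.

    images: array of shape (n_faces, height, width).
    """
    n_faces = images.shape[0]
    X = images.reshape(n_faces, -1).astype(float)   # one row of pixels per face
    mean_face = X.mean(axis=0)
    Xc = X - mean_face                               # centre every pixel
    # Rows of Vt are orthonormal principal axes in pixel space (the eigenfaces).
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    weights = Xc @ components.T                      # coordinates in face space
    return mean_face, components, weights

# Placeholder data (assumption): 20 random "faces" of 32 x 24 pixels.
rng = np.random.default_rng(0)
faces = rng.random((20, 32, 24))
mean_face, components, weights = eigenfaces(faces, n_components=5)
```

Each row of `weights` locates one face in the reduced face-space, so distances between rows can serve as a compact measure of low-level visual dissimilarity.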
We computed objective measures of low-level visual similarity, in order to control for visual similarity as a confounding factor of the effect of interest: asymmetry between emotional expression and identity features in similarity judgements. In particular, we expected that paired faces with different identities that express the same or similar emotions (Mr. Hyde and Jack the Ripper) would be perceived as more similar than faces with the same identity but different emotional expressions (Mr. Hyde vs. Dr. Jekyll), as shown in Figure 1. With this aim, we selected negative and neutral faces that differed in either emotional or visual aspects. We also expected higher similarity ratings for faces with the same emotional expression or same identity (similarity within-category) than for faces with different emotional expressions and identities (similarity between categories). The first prediction represents our main hypothesis; the second one serves as a manipulation check, since a good category boundary simultaneously maximizes the within-category similarity and minimizes the between-category similarity.

Participants
A total of twenty healthy participants (13 females, 7 males; mean age 32.10 ± 10.17 years) were recruited from the University of Manchester to take part in the study. This sample size is comparable to other publications on this topic [47,48]. All participants had normal or corrected-to-normal vision and were older than 18 years. Participants provided informed consent prior to the experiment and were reimbursed for their participation. The exclusion criteria were: a history of neurological (e.g., head injury or concussion) or psychiatric (e.g., depression, anxiety) conditions, drug or alcohol abuse, or regular medication that could influence emotional processing. The study was approved by the ethics board of the University of Manchester (reference number 2018-3619-5928).

Stimuli
Twenty images of faces (562 pixels × 762 pixels) were selected from the Karolinska Directed Emotional Faces (KDEF) dataset [26], which comprises 490 colour pictures of human facial expressions from 70 selected individuals (35 women and 35 men), each displaying six basic emotions (angry, fearful, disgusted, happy, sad, and surprised) and a neutral facial expression. Each expression is photographed from the front. In particular, we selected 10 emotional (five disgust, IDs: 02, 06, 10, 17, 27; five images of fear, IDs: 04, 08, 11, 23, 28) male facial expressions, and their neutral equivalents (n = 10), which corresponded to the same IDs. We chose fear and disgust, because the most distinguishing characteristics of the emotion of fear appear in the upper half of the face (eyes), whereas for disgust these appear in the lower half of the face (mouth). We chose this to minimise the visual similarity between emotional faces such that their similarities were more related to emotional aspects. Males were selected in order to exclude gender as an additional dimension to consider in the judgement of similarity, which is beyond the scope of this experiment.
Figure 1. Graphical representation of the conditions of interest and key hypotheses. The similarity ratings were standardized, transformed into dissimilarity measures (correlational distance) and entered in a 20 × 20 representational dissimilarity matrix (RDM). In the RDM, the rows and the columns represent the stimuli (disgust: 1 to 5; fear: 6 to 10; neutral: 11 to 20), and each cell the correlational distance between the faces in each pair. In the RDM, the violet squares represent the dissimilarity within emotional pictures (EE), calculated by averaging the dissimilarity within disgusted (EE_D) and fearful (EE_F) faces; EE_DF is the dissimilarity between disgusted and fearful faces, and NN the dissimilarity within neutral faces; ID, depicted in grey, indicates the dissimilarity between emotional and neutral faces with the same identity, and EN the dissimilarity between emotional and neutral faces with different identities. We expected an asymmetric effect of emotional expression and identity on similarity judgements, resulting in higher similarity (lower dissimilarity) in EE, EE_DF and NN compared to ID.

Experimental Procedure
Participants viewed all possible pairs of the 20 images, resulting in 190 different combinations, presented side by side on a blank screen. Participants were instructed to rate the similarity of each pair on a 7-point scale (1 = low similarity, 7 = high similarity). Each trial started with a central fixation cross for 500 ms. The task cue ('how similar do you think these pictures are?') was presented at the top of the screen, and the judgement scale at the bottom. Participants were told to respond as quickly as possible by pressing the appropriate number key, and were informed that there was no right or wrong answer. The task ended after approximately twenty minutes.
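The count of 190 trials follows from taking every unordered pair of 20 images without pairing an image with itself, that is, 20 × 19 / 2. A short Python check, where the variable names are illustrative and not taken from the original experiment code:

```python
from itertools import combinations

n_images = 20
# Every unordered pair of distinct images, each pair listed once.
pairs = list(combinations(range(n_images), 2))
n_pairs = len(pairs)
```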

Data Analysis
Similarity ratings. We analysed the similarity ratings using Representational Similarity Analysis (RSA) [49], implemented in MATLAB R2018, and SPSS. A graphical representation of the conditions of interest and key hypotheses is shown in Figure 1. Specifically, the similarity ratings were entered into a 20 × 20 similarity matrix for each participant. The rows and the columns represent the experimental stimuli, and each cell reflects the similarity rating for each pair. Then, for each subject, a Representational Dissimilarity Matrix (RDM) was computed. We first normalized the similarity ratings, by subtracting 1 (the lowest similarity rating) from each rating x, and then dividing by 6 (the highest minus the lowest similarity rating). Second, we transformed them into correlational distances, by subtracting the normalized ratings from 1. These values were entered into each cell of the RDM. The RDM is therefore symmetric about a diagonal of zeros. Next, we extracted from the single-subject RDM the mean dissimilarities and standard deviations of our conditions of interest, shown in Figure 1: within emotional faces (EE), calculated by averaging the dissimilarity within disgusted (EE_D) and within fearful (EE_F) faces; within neutral faces (NN); between emotional and neutral faces with the same identity (ID); between emotional and neutral faces with different identities (EN). The latter served as a measure of dissimilarity between categories, and the first three as within-category dissimilarity. We also considered the dissimilarity between fearful and disgusted faces (EE_DF) as part of the within-category dissimilarity, because the faces in this condition shared negative valence and high arousal. We included this measure to further test our main hypothesis with a dimensional approach to emotions. The dissimilarity measures were entered as dependent variables in two one-way repeated-measures ANOVAs, with conditions as grouping factor.
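The normalization and condition averages described above can be sketched in Python with NumPy. This is a minimal illustration, not the authors' MATLAB code. The stimulus order (disgust, fear, neutral) follows the Figure 1 description, but the assumption that the neutral face in position i + 10 shares its identity with the emotional face in position i is ours, as are all function and variable names and the placeholder ratings.

```python
import numpy as np

def ratings_to_rdm(ratings, low=1, high=7):
    """Convert a symmetric matrix of 1-7 similarity ratings into dissimilarities."""
    normalized = (ratings - low) / (high - low)   # rescale ratings to [0, 1]
    rdm = 1.0 - normalized                        # similarity -> dissimilarity
    np.fill_diagonal(rdm, 0.0)                    # each face is identical to itself
    return rdm

# Zero-based stimulus order: disgust 0-4, fear 5-9, neutral 10-19.
emotion = np.array(["D"] * 5 + ["F"] * 5 + ["N"] * 10)
# Assumption: neutral face i + 10 shows the same person as emotional face i.
identity = np.concatenate([np.arange(10), np.arange(10)])

def condition_means(rdm):
    """Mean dissimilarity per condition, using each pair once (upper triangle)."""
    i, j = np.triu_indices(len(rdm), k=1)
    ei, ej, d = emotion[i], emotion[j], rdm[i, j]
    same_id = identity[i] == identity[j]
    emo_neutral = (ei != "N") & (ej == "N")       # emotional rows come first
    ee_d = d[(ei == "D") & (ej == "D")].mean()
    ee_f = d[(ei == "F") & (ej == "F")].mean()
    return {
        "EE": (ee_d + ee_f) / 2,                  # average of EE_D and EE_F
        "EE_DF": d[(ei == "D") & (ej == "F")].mean(),
        "NN": d[(ei == "N") & (ej == "N")].mean(),
        "ID": d[emo_neutral & same_id].mean(),
        "EN": d[emo_neutral & ~same_id].mean(),
    }

# Placeholder ratings (assumption): every pair rated 4 on the 7-point scale.
ratings = np.full((20, 20), 4.0)
np.fill_diagonal(ratings, 7.0)
means = condition_means(ratings_to_rdm(ratings))
```

With this rescaling, a rating of 7 maps to a dissimilarity of 0, a rating of 4 to 0.5, and a rating of 1 to 1.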
The main hypothesis was tested in the first ANOVA, which included the conditions EE, NN, EE_DF, and ID, and used a planned contrast to test lower similarity (higher dissimilarity) in ID compared to the other conditions, as displayed in Figure 1. The second ANOVA used a planned contrast to test lower similarity (higher dissimilarity) in EN than in EE, NN, EE_DF and ID. Bonferroni post hoc corrections for multiple comparisons (p < 0.05) were used to explore the nature of the effect.
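For readers who want to see what such a planned contrast computes, here is a hedged sketch in Python with NumPy. In a one-way repeated-measures design, a contrast with one numerator degree of freedom, tested against its own contrast-specific error term, is equivalent to a one-sample t-test on per-participant contrast scores, with F(1, n − 1) = t². The contrast weights (ID against the mean of the other three conditions), the column order, and the random placeholder data are assumptions for illustration. This is not the SPSS procedure used in the study.

```python
import numpy as np

def planned_contrast_F(data, weights):
    """F test of a within-subject contrast.

    data: array of shape (n_subjects, n_conditions).
    weights: contrast weights that sum to zero.
    Returns (F, df_error), where F has 1 and df_error degrees of freedom.
    """
    weights = np.asarray(weights, dtype=float)
    scores = data @ weights                      # one contrast score per subject
    n = scores.size
    t = scores.mean() / (scores.std(ddof=1) / np.sqrt(n))
    return t ** 2, n - 1

# Placeholder data (assumption): 20 participants; columns ID, EE, NN, EE_DF.
rng = np.random.default_rng(1)
dissimilarity = rng.random((20, 4))
F, df_error = planned_contrast_F(dissimilarity, [3, -1, -1, -1])
```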
Multidimensional scaling (MDS) analysis. To visualize the structure of the similarity space, we performed a multidimensional scaling (MDS) analysis on the similarity ratings, where proximities reflect similarities among stimuli and are measured on an ordinal scale. The rank order of proximities determines the dimensionality of the space and the metric configuration of the points representing the stimuli [50]. In line with previous studies in this research field, we assumed this space to be two-dimensional, with valence and arousal as orthogonal dimensions [2]. The goodness-of-fit of the MDS representation was estimated with the Stress measure. We expected faces to cluster according to their similarity in emotional expression rather than identity in the two-dimensional face space.
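As an illustration of how a dissimilarity matrix is embedded in two dimensions, the sketch below implements classical (Torgerson) MDS in Python with NumPy. This is a swapped-in, simpler technique: the analysis described above is ordinal (non-metric) MDS, which additionally fits a monotone transformation of the proximities. The stress function here compares the embedded distances directly with the input dissimilarities, without monotone regression, so its values are not comparable with the Stress reported in this paper. All names and the random test points are illustrative.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distances between all rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def classical_mds(D, n_dims=2):
    """Embed a symmetric dissimilarity matrix D in n_dims dimensions."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centring matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centred Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)         # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_dims]   # keep the largest ones
    scale = np.sqrt(np.clip(eigvals[order], 0, None))
    return eigvecs[:, order] * scale

def stress1(D, X):
    """Stress-1 of configuration X against the raw dissimilarities D."""
    i, j = np.triu_indices(D.shape[0], k=1)
    d_conf = pairwise_distances(X)[i, j]
    return np.sqrt(((D[i, j] - d_conf) ** 2).sum() / (d_conf ** 2).sum())

# Placeholder data (assumption): 20 random points that truly lie in a plane.
rng = np.random.default_rng(2)
points = rng.random((20, 2))
D = pairwise_distances(points)
embedding = classical_mds(D, n_dims=2)
stress = stress1(D, embedding)
```

Because the placeholder points are genuinely two-dimensional, the embedding reproduces their distances up to rotation and reflection and the stress is essentially zero; real rating data would yield a non-zero value.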
Pixel-based similarity. We measured low-level visual similarity among faces by computing the Pearson correlations between pixel values of light intensity for each pair of faces. This was done to exclude the possibility that differences in similarity judgements among conditions were due to low-level visual similarity only. In particular, we first prepared the dataset of images by transforming them into grey-scale and applying histogram equalization to enhance the contrast of each image and maximize the prominence of discernible features. Second, we computed the correlation coefficients between the pixels of each pair of images. To obtain non-negative values, we converted the correlation coefficients into correlational distances (1 − Pearson correlation). These were entered in a 20 × 20 representational dissimilarity matrix, wherein the rows and the columns represented the faces and each cell the correlational distance between the faces in each specific pair. We extracted from this matrix the mean and the standard deviation of each condition of interest, which mirrored those in the similarity ratings matrix. These were used as dependent variables in a one-way repeated-measures ANOVA, wherein we used a planned contrast to test the same main hypothesis, that is, lower similarity (higher dissimilarity) in ID compared to EE, NN and EE_DF (p < 0.05).
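The pixel-based pipeline above can be sketched as follows in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' code: images are taken to be 8-bit grey-scale arrays already (the colour-to-grey conversion is omitted), histogram equalization is implemented with a simple cumulative-distribution lookup table, and random arrays stand in for the KDEF images. Function names are hypothetical.

```python
import numpy as np

def equalize_histogram(img, n_levels=256):
    """Histogram-equalize an 8-bit grey-scale image (integer values 0-255)."""
    hist = np.bincount(img.ravel(), minlength=n_levels)
    cdf = hist.cumsum() / img.size                    # cumulative distribution
    lut = np.round(cdf * (n_levels - 1)).astype(np.uint8)
    return lut[img]                                   # remap every pixel

def pixel_rdm(images):
    """Correlational distance (1 - Pearson r) between all pairs of images."""
    flat = np.stack([equalize_histogram(im).ravel() for im in images])
    r = np.corrcoef(flat.astype(float))               # rows are images
    return 1.0 - r                                    # values lie in [0, 2]

# Placeholder data (assumption): 20 random 8-bit images of 32 x 24 pixels.
rng = np.random.default_rng(3)
images = rng.integers(0, 256, size=(20, 32, 24), dtype=np.uint8)
rdm = pixel_rdm(images)
```

The resulting matrix has the same layout as the ratings-based RDM, so the same condition masks can be applied to extract the mean distance for EE, NN, EE_DF, ID and EN.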

Results
In contrast to our hypothesis, we did not observe lower similarity ratings in ID compared to EE […]. Given the small sample size, we calculated the inter-rater reliability, which resulted in a very good Cronbach's alpha (α = 0.97). We also measured the visual similarity among faces by computing the correlational distance among them, in order to exclude the possibility that differences in similarity judgements among conditions were due to visual similarity. We found higher visual similarity (lower correlational distance) in ID compared to EE_DF, F(1, 9) = 33.93, p < 0.001, ηp² = 0.79, and NN, F(1, 9) = 18.02, p = 0.002, ηp² = 0.67, but only a trend towards significance was observed between ID and EE, F(1, 9) = 4.01, p = 0.08, ηp² = 0.31. The MDS solution showed that the faces were clustered according to their similarity in valence and arousal (but not visual similarity) in a two-dimensional space. It had a Stress value of 0.04, indicating a good fit for this model. These findings are reported in Figure 2.

Discussion
In this study, we investigated the asymmetric effect of emotional expression and identity on the perception of similarity between faces. We explored whether participants relied more on emotional or identity features while judging the similarity between emotional and neutral faces, without instructing them on which aspect to focus. We report two new findings. First, emotional and visual identity features had the same relevance in similarity judgements: Mr. Hyde is perceived as equally similar to Jack the Ripper and to his alter ego Dr. Jekyll. This result suggests a symmetric rather than an asymmetric effect on similarity perception. Second, similarity ratings were not fully explained by the identity of faces, evident in that NN and EE conditions were less visually similar (higher correlational distance) than ID, yet participants did not perceive these conditions to be different from each other in similarity. We also found that emotional similarity among faces may influence overall, global similarity perception, given the higher dissimilarity in conditions with an emotional mismatch (i.e., EE_DF and EN) compared to those with emotional congruency (i.e., EE and NN). Below we discuss the implications of these findings.
Symmetrical effects of emotional and identity features on similarity judgements provide additional evidence for the relevance of emotion in similarity judgements. Further support comes from the observation that an emotional mismatch (i.e., EN and EE_DF conditions) makes people perceive faces as less similar compared to conditions with emotional congruency (i.e., EE and NN). As previously proposed [12,41], this process is evolutionarily advantageous: poor discrimination among emotional expressions that have the same meaning (expressions of disgust, for example) possibly would not endanger the individual; however, when the stimulus is emotional, small dissimilarities can create large differences in similarity perception and action planning (e.g., fight or flight). Disgusted and fearful faces in the EE_DF condition have similar values in valence and arousal (low scores in valence and high scores in arousal). This is also the case for the neutral faces in the NN condition (medium scores in both valence and arousal). Yet, small variations in valence and arousal were more relevant when the faces were emotional, rather than neutral. Emotions convey specific information about one's internal and external environment that each individual takes into consideration for congruent action planning and decision making. This is made possible by selectively focusing attention on the emotional aspects of the world, and it will probably result in a lower latency in detecting the emotional content of any stimulus and increased discriminability of stimuli exhibiting those features. Furthermore, we have shown that emotional similarity judgements are not determined by low-level visual similarity. We observed less low-level visual similarity between emotionally similar faces, both neutral and negative emotional, than between faces with the same identity but different emotions. This suggests that the symmetric effect observed in the similarity ratings task is not explained by low-level visual similarity.
Our study has several limitations that can be addressed in future work. First, we studied only two negative emotions, neglecting positively valenced emotions. We chose this to increase the statistical power in terms of number of trials per condition, while keeping the experiment short enough to ensure participants' attention. It would be relevant in future studies to examine whether the same effects are replicated with positive emotions. Second, we only selected male facial expressions. This was a deliberate choice, to ensure that participants would focus on the visual and emotional similarity among faces. However, it would be interesting to include gender as an additional dimension in face space and to explore its relative weight in similarity judgements. Finally, our sample size was quite small, even though the inter-rater reliability was very high. However, given the significance and applicability of the findings, it would be appropriate to replicate the experiment by increasing its sample size, and by including an equal number of male and female faces. This would test whether gender moderates the previously reported effects.
Overall, in the present study, we report a symmetrical effect of emotional expression and identity on similarity judgements. Mr. Hyde is equally similar to Dr. Jekyll and to Jack the Ripper, despite the higher visual similarity to the former. Determining the