Humans have a remarkable fidelity for visual long-term memory, and yet the composition of these memories is a longstanding debate in cognitive psychology. While much of the work on long-term memory has focused on processes associated with successful encoding and retrieval, more recent work on visual object recognition has developed a focus on the memorability of specific visual stimuli. Such work is engendering a view of object representation as a hierarchical movement from low-level visual representations to higher level categorical organization of conceptual representations. However, studies on object recognition often fail to account for how these high- and low-level features interact to promote distinct forms of memory. Here, we use both visual and semantic factors to investigate their relative contributions to two different forms of memory of everyday objects. We first collected normative visual and semantic feature information on 1,000 object images. We then conducted a memory study where we presented these same images during encoding (picture target) on Day 1, and then either a Lexical (lexical cue) or Visual (picture cue) memory test on Day 2. Our findings indicate that: (1) higher level visual factors (via DNNs) and semantic factors (via feature-based statistics) make independent contributions to object memory, (2) semantic information contributes to both true and false memory performance, and (3) factors that predict object memory depend on the type of memory being tested. These findings help to provide a more complete picture of what factors influence object memorability. These data are available online upon publication as a public resource.
One of the most important issues in memory research is why we remember some things but forget others. To address this issue, it is critical to answer not only which
The current study relates two different but interconnected literatures. First, we relate our findings to those in the neuroscience literature (Grill-Spector & Malach,
Second, the current study relates to findings in the semantic cognition literature that are relevant to explaining object memorability. Most memory studies examining the influence of semantic factors rarely incorporate visual features as predictors of memory strength and have centered on verbal stimuli and simple lexical factors like word frequency or concreteness (e.g., words that reflect more concrete concepts tend to be remembered better; see Fliessbach et al.,
In sum, the current object memory study investigated how well visual and semantic properties predict the visual and lexical memory of object concepts. Before this experimental study, it was necessary to conduct a normative study of the visual and semantic features of a large set of everyday object images. Available published norms comprising semantic properties for object concepts are only available for words (Devereux et al.,
In the memory study, which consisted of two experiments, visual and semantic variables were used to predict subsequent memory for objects. Complex visual measures were obtained by analyzing the object pictures using the layer-specific activation information from a popular DNN, AlexNet (Krizhevsky et al.,
Five hundred and sixty-six Amazon Mechanical Turk workers, all with a 95% approval rating or above (347 females, 19–75 years of age, mean age = 34.6 years, all self-reported native speakers of American English), participated in this study. Participants had an average of 14.68 years of education, and racial demographics were consistent with national averages. Participants could take part in repeat sessions and completed between one and five sessions. Sessions lasted about an hour, with 40 concepts presented per session. Participants were paid $3.00 for their participation in the property norming study. Informed consent was obtained from all participants under a protocol approved by the Duke Medical School Institutional Review Board (IRB). All procedures and analyses were performed in accordance with IRB guidelines and regulations for experimental testing.
The two primary aims of this study were to (1) collect normative feature data on a large set of object concepts that can be expanded and manipulated by a wide range of research domains, and (2) assess whether feature statistics can explain the memorability of objects. To date, the most extensive and widely used set of property norms are the McRae norms (McRae et al.,
A total of 995 object concepts were used for the online object norming via Amazon Mechanical Turk (AMT). Image concepts were selected from a wide range of standard object categories (e.g., birds, buildings, mammals, tools, vehicles), as well as object categories present in everyday life but not well represented in typical object databases (food, holiday items, street items). Of these objects, 237 were living and 758 were nonliving. The relative size of each of the 29 categories in our database is depicted in Fig.

Figure caption: Distribution of item frequencies across different categories. The bar chart represents the number of items in each category in our dataset, split by living (pink) and nonliving (dark pink) categories.
Suitable images for each concept were selected from the image search engines Google Images, Bing Images, and Flickr. Images were selected based on the following criteria: (1) minimum size of 300 x 300 pixels; (2) either whitespace background, a background easily removable with image-editing software, or a background not otherwise integrated into the foreground or target object; (3) standard framing/positioning of the object, i.e., we avoided image orientations that obscured the identity of the object; (4) all images were in color, with no obvious chromatic or morphological filter; (5) no visible watermarks; and (6) no text printed on the object concept identifying it as such (e.g., “Fire Station No. 9”). After assembling two image exemplars for each concept, backgrounds were removed with photo-editing software and images were cropped to square dimensions and resized to 300 x 300 pixels.
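The cropping and resizing steps above can be sketched in code. This is a minimal numpy illustration of the center-crop-to-square and resize-to-300x300 pipeline; the authors used photo-editing software, so the nearest-neighbor resize and the synthetic image here are stand-ins, not the actual procedure:

```python
import numpy as np

def center_crop_square(img):
    """Crop an H x W x C array to a centered square."""
    h, w = img.shape[:2]
    side = min(h, w)
    top, left = (h - side) // 2, (w - side) // 2
    return img[top:top + side, left:left + side]

def resize_nearest(img, size=300):
    """Nearest-neighbor resize of a square image to size x size."""
    side = img.shape[0]
    idx = np.arange(size) * side // size
    return img[idx][:, idx]

# A synthetic 400 x 600 RGB "photo" standardized to 300 x 300
img = np.zeros((400, 600, 3), dtype=np.uint8)
out = resize_nearest(center_crop_square(img), size=300)
print(out.shape)  # (300, 300, 3)
```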
Image attributes in the current analysis are characterized both as intrinsic properties of object identity and as potential predictors of object memorability. Attribute definitions, characteristics, and distributions within the current dataset are summarized in Fig.

Figure caption: Visual measures used in the current study. Original scores (pink line) have been adjusted with log transformations (dark line) if and only if they did not meet standards for normality; if a measure lacks a dark line, no such transformation was needed.
The first question addressed by the current article is whether simple image features are predictive of memory in the visual and lexical memory task. Visual measures are summarized in Fig.
Next, we assessed complex visual properties of the object images by measuring the similarity of visual features derived from a DNN. These features carry inherently relational information, given that (1) a DNN optimizes based on all images within a training set, and (2) individual layers represent distinct but still interdependent information as image vectors change through the progression across layers (DNNs; Krizhevsky et al.,

Figure caption: Multidimensional scaling plots for object stimuli.
Next, we describe the semantic attributes in the current analysis, as summarized in Fig.

Figure caption: Semantic measures used in the current study. Original scores (pink line) have been adjusted with log transformations (dark line) if and only if they did not meet standards for normality; if a measure lacks a dark line, no such transformation was needed.
Name agreement, which reflects the agreement for a verbal label to an object photograph, was assessed with a standard picture-name agreement task (Snodgrass & Vanderwart,
The principal approach of the current analysis is the application of a new and large set of property norms designed to characterize the semantic features associated with a broad range of object concepts. While many of the concepts overlap with the McRae (McRae et al.,
The feature-statistics used in this study are based on the conceptual structure account (CSA), a neurocognitively motivated theory of conceptual knowledge that captures information of conceptual representations (Taylor et al.,
In addition to the simpler feature statistics described above, the current analysis sought to test the utility of object features in predicting memorability of items in either the lexical or visual memory test. We used three key measures that are capable of differentiating between similar objects. First,
The current feature-norming dataset comprises 995 objects from 29 different categories and includes 5,520 features, each of which was present at least three times in the data. Taxonomic features (e.g.,
Participants were shown an object (e.g., a porcupine) and were given a space to add five unique features, similar to previous feature-norming paradigms (Devereux et al.,

Figure caption: Feature-norming example. Example stimuli and prompt that participants saw during the feature-norming task.
Before the construction of the production frequency matrix, feature responses underwent various stages of processing, following the procedures used by McRae et al. (

Table: Example concept properties

Object   Relation word   Feature             Production frequency
         Has             wings               21
         Is              an insect           13
         Does            make honey          12
         Does            sting               11
         Has             a stinger           11
         Does            fly                 10
         Is              black               8
         Is              yellow              8
         Has             legs                7
         Does            pollinate flowers   6
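A production-frequency matrix of this kind supports the feature statistics used throughout the article. The numpy sketch below, on a small hypothetical matrix, shows two such statistics: distinctiveness (the inverse of the number of concepts that list a feature) and a simplified concept-level correlational strength (mean absolute inter-feature correlation). These are illustrative operationalizations; the source's exact computations may differ:

```python
import numpy as np

# Hypothetical production-frequency matrix: rows = concepts, cols = features
# (the real norms span 995 concepts and 5,520 features)
rng = np.random.default_rng(0)
pf = rng.integers(0, 12, size=(8, 10)).astype(float)

# Distinctiveness of a feature: 1 / number of concepts that list it
n_concepts_with_feature = (pf > 0).sum(axis=0)
distinctiveness = 1.0 / np.maximum(n_concepts_with_feature, 1)

# Mean distinctiveness of a concept: average over the features it has
has = pf > 0
mean_dist = np.array([distinctiveness[row].mean() for row in has])

# Correlational strength of a feature: mean absolute correlation with
# the other features, computed over concepts
corr = np.corrcoef(pf.T)
np.fill_diagonal(corr, np.nan)
feat_cs = np.nanmean(np.abs(corr), axis=1)

# Concept-level correlational strength: mean over the concept's features
concept_cs = np.array([feat_cs[row].mean() for row in has])
print(mean_dist.shape, concept_cs.shape)
```

Shared features (low distinctiveness, high correlational strength) versus distinguishing features fall directly out of these two quantities.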
A group of 200 Amazon Mechanical Turk workers (>95% approval rating in AMT, all self-reported native speakers of English) participated in the lexical memory task, and a different group of 303 workers participated in the visual memory task. Both groups were distinct from the participants who completed the normative study. Forty people were excluded from the visual memory task and seven from the lexical memory task because of a computer error that failed to record their responses.
A total of 456 Amazon Mechanical Turk workers completed either the visual (n = 263) or lexical (n = 193) memory task. In the lexical memory task, participant ages ranged from 19 to 87 years, mean age = 39.7 years, with 108 females and 85 males, and in the visual memory task ages ranged from 18 to 76 years, mean age = 37.1 years, with 137 females and 126 males. The demographics in the two memory tasks are comparable, such that there are no significant differences in sex t(454) = -0.93, p = 0.35, years of education t(399) = -1.10, p = 0.27, race t(445) = 0.18, p = 0.86, and lag, which is the difference between the time when the participant completed encoding and retrieval, t(245) = 1.10, p = 0.27. There is a significant difference in age, t(406) = 2.11, p = 0.04, between visual and lexical memory groups, but this factor was not a significant predictor of either visual (r = 0.07, p = 0.32) or lexical memory (r = 0.07, p = 0.31).
AMT workers were presented with either a visual or a lexical memory test (Fig.

Figure caption: Memory study paradigm. Example stimuli during the visual and lexical memory tasks. Both tasks comprised an Encoding session where visual object stimuli were presented; subsequent Retrieval sessions presented old and new objects (visual memory task) or word stimuli (lexical memory task).
After data collection, mean hit rates and false alarm rates for each item were calculated based on the percentage of correct responses across subjects to old or new trials, respectively. As such, each measure represents an item-wise averaging across the responses of all contributing AMT workers, and is therefore expressed as a continuous measure amenable to linear regression. As analyses were focused on the item level, each item was presented
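The item-wise averaging described above can be illustrated with a short sketch. The trial records and helper below are hypothetical, but the logic (proportion of "old" responses per item, computed separately for old and new trials) follows the text:

```python
from collections import defaultdict

# Hypothetical trial records: (worker, item, condition, response),
# where condition is "old"/"new" ground truth and response is the judgment
trials = [
    ("w1", "apple", "old", "old"), ("w2", "apple", "old", "new"),
    ("w1", "canoe", "new", "old"), ("w2", "canoe", "new", "new"),
    ("w3", "apple", "old", "old"), ("w3", "canoe", "new", "new"),
]

def item_rates(trials):
    """Item-wise hit rate (for old items) and false-alarm rate (for new
    items), averaged across workers."""
    counts = defaultdict(lambda: [0, 0])  # (item, cond) -> ["old" resps, total]
    for _, item, cond, resp in trials:
        counts[(item, cond)][1] += 1
        if resp == "old":
            counts[(item, cond)][0] += 1
    return {k: old / total for k, (old, total) in counts.items()}

rates = item_rates(trials)
print(rates[("apple", "old")])  # hit rate for "apple": 2/3
print(rates[("canoe", "new")])  # false-alarm rate for "canoe": 1/3
```

Because each rate is a proportion averaged over workers, it is a continuous item-level score suitable as a regression outcome.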
In order to address the relationship between visual and semantic image statistics and image memorability, we adopted a linear regression framework. Model diagnostics included overall fit, as well as explicit examination of collinearity across predictor variables (evaluated by the variance inflation factor), and diagnostics on normality (see Figs.
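As a sketch of the collinearity diagnostic mentioned above, the variance inflation factor for each predictor can be computed by regressing it on the remaining predictors (VIF_j = 1 / (1 - R²_j)). This numpy implementation on simulated data is illustrative only, not the authors' pipeline:

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of predictor matrix X:
    VIF_j = 1 / (1 - R^2_j) = SS_tot / SS_res, regressing column j on
    the other columns (with intercept)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        ss_res = ((y - others @ beta) ** 2).sum()
        ss_tot = ((y - y.mean()) ** 2).sum()
        out[j] = ss_tot / ss_res
    return out

rng = np.random.default_rng(1)
a = rng.normal(size=200)
b = rng.normal(size=200)
c = a + 0.1 * rng.normal(size=200)  # nearly collinear with a
X = np.column_stack([a, b, c])
print(np.round(vif(X), 1))  # a and c inflate sharply; b stays near 1
```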
The central goal of the current study was to determine what visual and semantic information predicts object memorability. First, we provide regression diagnostics and then summarize the distribution of memory scores on the Visual and Lexical memory tests. We then examine a series of predictors for both memory types based on the visual and semantic features described above. Lastly, we examine interactions between the two forms of memory.
Accuracy for both the visual and the lexical memory tasks on Day 2, as indexed by both Hit Rate (HR) and False Alarm Rate (FAR), as well as response times, are shown in Table

Table: Behavioral performance on visual and lexical memory tests

                        Visual memory       Lexical memory
                        M       SD          M       SD
Response accuracy
  Hit rate              0.56    0.18        0.55    0.15
  False alarm rate      0.27    0.15        0.44    0.14
Response time (ms)
  Hits                  1202    98.48       1288    129.61
  Misses                1196    117.78      1304    168.79
  Correct rejections    1176    84.09       1318    152.68
  False alarms          1245    146.52      1295    290.84

Figure caption: Memory performance summary across category and items.
In this article, simple visual statistics such as JPEG size, proportion non-white space, image energy, and simple semantic statistics such as name agreement, COCA frequency, and number of features are used to assess the stand-alone properties of the objects in the database. The complex semantic statistics we use in this study include
In applying regression to large samples of data, it is important to be sensitive to assumptions about the shape of the distribution, including the normality, skew, and kurtosis of the constituent distributions, as well as the collinearity between potential predictors. Model diagnostics describing skew, kurtosis, and normality were conducted on all predictors for all of the regression models, and predictors were adjusted and normalized where needed. If a distribution was positively or negatively skewed, the appropriate transformation was applied, and that transformed value was used in the regressions for both the lexical and the visual models. For example, most variables necessitated a log10 transformation, while the Hue, Saturation, and Value measures were corrected with a Box-Cox transformation. Proportion of non-white space, Mean Distinctiveness, and CSxD were considered normal in their original state.

Figure caption: Covariance matrix for all predictors. Covariance across all predictors in the regression model. Variance inflation factors suggest the models are not likely to suffer from multicollinearity.
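A minimal sketch of this normalize-when-skewed workflow, assuming a simple skewness threshold as the trigger (the source's actual criterion for "standards for normality" may differ, and Box-Cox is omitted here):

```python
import numpy as np

def skewness(x):
    """Sample skewness (Fisher-Pearson)."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    return (z ** 3).mean()

def normalize_if_skewed(x, threshold=1.0):
    """Log10-transform a positive-valued measure when its skew exceeds
    a (hypothetical) threshold; otherwise leave it unchanged."""
    x = np.asarray(x, dtype=float)
    if abs(skewness(x)) > threshold:
        return np.log10(x), True
    return x, False

# A right-skewed measure, e.g., JPEG file size, is pulled toward normal
rng = np.random.default_rng(2)
jpeg_size = rng.lognormal(mean=10, sigma=0.8, size=1000)
transformed, applied = normalize_if_skewed(jpeg_size)
print(applied, round(skewness(transformed), 2))
```

Applying the same transformed variable in both the lexical and visual models, as the text describes, keeps the two regressions directly comparable.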
Here, we first describe the results for the visual properties and their contribution to memory before examining the semantic properties. We then report the results of the four separate regression models testing four distinct memory measures: Hit Rates (HRs) and False Alarm Rates (FARs) for both the Visual and Lexical memory tests (see Table

Table: Regression output. Values per predictor, as reported, across the four models (Visual memory HR and FAR; Lexical memory HR and FAR).

Visual features
- Energy: 0.30 (0.005), -0.60 (-0.009), 0.71 (0.01)
- JPEG size: -0.30 (-0.02)
- Proportion non-white space: 0.43 (0.01), -0.43 (-0.01), 0.84 (0.02), -1.19 (-0.04)
- Hue: 0.57 (0.003), 1.33 (0.006), 0.21 (0.0009), -0.52 (-0.002)
- Saturation: 0.28 (0.002), 0.89 (0.005), 1.12 (0.007), 0.69 (0.004)
- Value: -1.61 (-0.01), 0.39 (0.002), 1.03 (0.006), 0.21 (0.001)
- Early DNN layer (3): 0.13 (0.01)
- Middle DNN layer (6):
- Late DNN layer (8): 0.19 (0.02), 0.97 (0.10)

Semantic features
- Frequency: -0.28 (-0.001), -0.11 (-0.0005)
- Name agreement:
- Number of non-taxonomic features: -0.20 (-0.01), -0.29 (-0.02), 0.77 (0.05), -0.26 (-0.02)
- Correlational strength:
- Mean distinctiveness:
- Correlation x Distinctiveness:
First, examining the capacity for visual properties to predict later memory, we found that entropy values based on the activations of all layers of the DNN (Early, Middle, and Late, based on AlexNet Layers 3, 6, and 8, respectively) had a significant influence on nearly all measures of memory discriminability. Early DNN information was a significant predictor of memory in both tasks. This layer, which is organized roughly by shape (see Fig.
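The entropy measure referenced here can be sketched as the Shannon entropy of a histogram over a layer's unit activations. The activation vectors below are synthetic stand-ins for real AlexNet layer responses (layers 3, 6, or 8), so this shows only the form of the computation:

```python
import numpy as np

def activation_entropy(acts, bins=32):
    """Shannon entropy (bits) of a layer's activation vector,
    estimated from a histogram of unit activations."""
    counts, _ = np.histogram(acts, bins=bins)
    p = counts / counts.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

# Stand-ins for a flattened layer response to one image: a diffuse
# activation pattern versus a sparse, peaked one
rng = np.random.default_rng(3)
uniform_acts = rng.uniform(size=4096)
peaked_acts = np.zeros(4096)
peaked_acts[:10] = 1.0

# Diffuse patterns carry higher entropy than peaked ones
print(activation_entropy(uniform_acts) > activation_entropy(peaked_acts))
```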
In contrast to the more complex values derived from the DNN, we found only a few other visual properties to be predictive of memory success. Simple image statistics such as mean hue of an image was not predictive of memory in either the visual (HR:
Turning to our semantic properties, we found that in the visual memory task, the complex feature-based predictors generally proved to be stronger than more simple statistics as predictors of later memory strength. Correlational strength was a strong predictor of both item-wise HRs (
One direct means of disentangling this somewhat perplexing result is an examination of a third CSA statistic. In addition to CS and MD, the mean correlational strength x distinctiveness (CSxD, also known as "slope") measure was a positive predictor of both the visual (HR:
In addition to these semantic predictors, COCA frequency and name agreement also predicted memory. COCA frequency of a concept positively predicted both lexical HRs and FARs, (HRs:
Lastly, we sought to further address the possibility that response bias may have influenced item-wise memory scores, given that the values for many predictors of both HR and FAR trend in the same direction (see Table
Finally, in order to test the independence of visual and lexical memory, we performed two additional analyses on these data. First, we assessed the relationship between visual and lexical memory at the level of items using Pearson's correlations, as shown in Fig.

Figure caption: Relationships between visual and lexical memory.
To answer this question, we explored the relationship between visual and lexical memory hit rates using mediation analyses, which help to explore potential sources of variation mediating an observed relationship between a predictor and an outcome variable. In our analysis, we posited the significant predictors in the regression output above (see Table

Table: Mediation between visual and lexical memory hit rates

Mediator       a               b               c'              ab      R2      CI
Visual predictors
  Early DNN    0.11 (3.03)     0.06 (1.86)     0.40 (11.00)    0.01    0.12    [0, 0.02]
  Middle DNN   -0.08 (-1.52)   -0.04 (-1.62)   0.40 (11.14)    0.00    0.11    [0, 0.01]
  Late DNN     -0.03 (-0.55)   0.10 (4.09)     0.41 (11.38)    0.00    0.13    [-0.02, 0.01]
Semantic predictors
  CS           0.03 (5.83)     1.52 (7.01)     0.36 (10.02)    0.05    0.15    [0.03, 0.07]
  MD           0.01 (0.92)     -0.06 (-0.24)   0.40 (11.18)    0.00    0.11    [0, 0]
  CSxD         0.46 (2.40)     0.03 (4.21)     0.39 (10.96)    0.01    0.13    [0, 0.03]
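The mediation logic behind these columns (path a from predictor to mediator, paths b and c' from the outcome model, and the indirect effect ab with a confidence interval) can be sketched as follows. This numpy implementation with a percentile bootstrap on simulated data is illustrative, not the authors' exact procedure:

```python
import numpy as np

def slope(x, y):
    """OLS slope of y on x (with intercept)."""
    xc = x - x.mean()
    return float((xc * (y - y.mean())).sum() / (xc ** 2).sum())

def mediation(x, m, y, n_boot=2000, seed=0):
    """Simple mediation: a from M ~ X; b and c' from Y ~ X + M;
    indirect effect ab with a percentile-bootstrap 95% CI."""
    def paths(x, m, y):
        a = slope(x, m)
        X = np.column_stack([np.ones_like(x), x, m])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        c_prime, b = beta[1], beta[2]
        return a, b, c_prime
    a, b, c_prime = paths(x, m, y)
    rng = np.random.default_rng(seed)
    n = len(x)
    boots = []
    for _ in range(n_boot):
        i = rng.integers(0, n, n)          # resample cases with replacement
        ba, bb, _ = paths(x[i], m[i], y[i])
        boots.append(ba * bb)
    lo, hi = np.percentile(boots, [2.5, 97.5])
    return a * b, (lo, hi)

# Simulated data where M partially mediates the X -> Y relationship
rng = np.random.default_rng(4)
x = rng.normal(size=300)
m = 0.5 * x + rng.normal(size=300)
y = 0.4 * x + 0.6 * m + rng.normal(size=300)
ab, ci = mediation(x, m, y)
print(round(ab, 2), ci[0] > 0)  # indirect effect near 0.5 * 0.6 = 0.30
```

A CI excluding zero, as for CS in the table, indicates a reliable indirect path through the mediator.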
The current analysis presents a large-scale database of object concepts with a comprehensive assessment of visual and semantic properties, as well as visual and lexical memorability information based on independent tests of object memory. In our analysis, we identified two critical observations that help to guide future analyses of object memory. First, we found that complex visual and semantic features (e.g., DNN-based entropy, correlational strength of features) were strong predictors of item memory for both visual and lexical tests, a finding that highlights the importance of considering these fine-grained relationships between items in stimuli selection. Second, we found that despite the fact that visual and lexical item memory had a relatively modest correlation in hit rates across items, performance was driven by similar object properties, and that correlational strength helped to explain the relationship between these memory scores. This result suggests that object representations have distinct visual and verbal components, but remain grounded in the same core substrate. Furthermore, this suggests that memorability may not be an intrinsic quality of an image but may instead depend on the context under which the memory for such image is tested.
A principal contribution of the current article is a highly dimensional image corpus comprising 995 concrete objects spanning a wide range of different categories, along with visual and semantic features and output from the associated memory study, all of which are available within a centralized database (
In the visual domain, we contribute a rich form of visual statistics based on three separate layers of a deep convolutional neural network. There is evidence that DNNs share some similar properties to the ventral visual pathway (Cichy et al.,
A major finding from this analysis is that semantic feature statistics associated with complex semantic properties, based on the interrelatedness of the constituent features of items (e.g., mean distinctiveness, correlational strength), were significant predictors of hit rates on the visual and lexical memory tests. This novel finding demonstrates that feature statistics that describe the interrelatedness of constituent features across items have a strong mnemonic value for both perceptual and conceptual memory. The direction of this prediction was intuitive when one places a premium on interrelatedness. Correlational strength had a strong positive prediction score (Table
Visual working-memory and episodic memory research are both based on the premise that objects are represented as bound units (Brady et al.,
These meaningful properties have been shown to be predictive of memory in other studies. In Brady et al. (
Our second finding showed that similar item characteristics predict memory on both tests, and that visual and lexical hit rates were correlated across items. This finding supports the idea that the organization of perceptual and conceptual memory traces share similar underlying item characteristics, but nonetheless draw on a number of unique representational forms based on the modality of retrieval (Saffran et al.,
What do these findings mean for the study of visual memory of everyday things? Our data show strong support for the idea that the fidelity of long-term memory representations is driven by complex semantic properties that describe the structure of an object’s constituent features. Such a result provides an item-wise perspective on a range of previous studies showing that, compared to perceptual information, semantic information can have stronger influences on both hit rates and false alarm rates. For example, the fact that false alarm rates were predicted by all three complex feature-based semantic statistics (MD, CS, and CSxD; see Table
Our article attempts to bridge the object perception literature (which focuses on item-wise properties) and the episodic memory literature (which focuses on subject-level performance). Given our interest in measuring memory for items, we averaged the frequency of hits (1) and misses (0) across participants in both the visual and the lexical memory tasks and conducted a linear regression model. However, analyses quantifying subject-level bias towards different object classes or categories might identify not only the degree to which a particular object is memorable across contexts, but also the capacity for biased memory endorsements. For example, more familiar items may be seen as more memorable (Nickerson & Adams,
Lastly, our mediation analyses (Fig.
In addition to furthering our understanding of the influence of different stimulus properties on item-wise memorability of object images, the current results also showcase how broad, comprehensive image databases can be utilized to answer fundamental questions on the nature of memory representations. The current study goes beyond memorability studies that examined a single memory task (Isola et al.,
The authors would like to thank Lorraine Tyler and Barry Devereux for help in the conception of this project, and Alex Raghunandan, Christina Gancayco, Jiyun Yoon, Tobi Akinyelu, and Vincent Zhang for help with data collection.
The data and materials for studies outlined in this paper are available on GitHub (
This research was funded by grant support from the National Institute on Aging grant # AGK01053539 to SWD. AC was supported by a Royal Society and Wellcome Trust Sir Henry Dale Fellowship (211200/Z/18/Z).
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.