How does context influence object recognition?

That is, does scene context influence object processing irrespective of attentional focus, or does it influence object processing only when attention is briefly directed to the background scene because the location of the target object is ambiguous? Providing participants with advance knowledge of the target object's location would allow focused attention to be deployed to that location, drawing attention away from the background scene.

Would reduced attention to the background scene reduce the semantic consistency effect? The aim of Experiment 3a was to investigate the role of spatial attention in the semantic consistency effects observed in Experiments 1 and 2.

Specifically, when the location of the target object is known in advance, would scene context still influence object processing?

Participants were compensated with course credit or monetary reimbursement.

The setup of Experiment 3a was highly similar to that of Experiment 2. Four additional stimuli were added to the stimulus set used in Experiment 2, resulting in a total of 60 stimuli.

One major change to the design was the addition of a location cue prior to the onset of the natural scene. The location pre-cue consisted of a small black cross. Participants were instructed to covertly attend to the cued location while remaining fixated on the center of the screen.

The invalid cues were presented at the same vertical position as the valid cue, but mirrored over the vertical meridian of the scene, resulting in a horizontally shifted cue toward the opposite visual hemifield.
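As a rough illustration of this mirroring (not the authors' actual implementation; the function name, coordinate convention, and scene width below are hypothetical), the invalid cue position can be computed by reflecting the valid cue's horizontal coordinate across the scene's vertical midline:

```python
def invalid_cue_position(valid_x, valid_y, scene_width):
    """Mirror a valid cue position across the scene's vertical meridian.

    The vertical coordinate is kept; the horizontal coordinate is reflected
    into the opposite hemifield (hypothetical helper, not the authors' code).
    """
    mirrored_x = scene_width - valid_x  # reflect across the vertical midline
    return mirrored_x, valid_y

# Example: a valid cue at x = 220 px in an 800-px-wide scene maps to x = 580 px.
print(invalid_cue_position(220, 300, 800))  # -> (580, 300)
```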

Because the target objects were relatively large foreground objects pasted into the scene, invalid cues generally did not create ambiguity about which object was the target. For the scenes in which they did, the invalid cue was shifted horizontally until it no longer fell on a possible alternative target object.

Semantic consistency and cue validity were counterbalanced across participants, such that each target object appeared equally often in a semantically consistent and an inconsistent setting, and was cued validly and invalidly equally often.
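A minimal sketch of one way to implement this counterbalancing, assuming four participant groups rotated through the 2 (consistency) × 2 (cue validity) cells; the names and the Latin-square-style rotation are illustrative, not the authors' actual assignment procedure:

```python
from itertools import product

CONDITIONS = list(product(["consistent", "inconsistent"], ["valid", "invalid"]))  # 4 cells

def assign_conditions(n_stimuli=60, group=0):
    """Rotate each stimulus through the 2 x 2 design across four participant groups
    (Latin-square-style counterbalancing; illustrative only)."""
    assignment = {}
    for stim in range(n_stimuli):
        consistency, validity = CONDITIONS[(stim + group) % len(CONDITIONS)]
        assignment[stim] = {"consistency": consistency, "cue": validity}
    return assignment

# Each participant sees every stimulus once; across the four groups every stimulus
# appears equally often in each cell of the design.
print(assign_conditions(group=2)[0])
```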

The procedure of this experiment was similar to that of the previous experiments. After the initial fixation, the location cue was presented for ms. The duration of the blank screen prior to the onset of the natural scene was extended from to ms in order to give participants additional time to process the cue. Other than these changes, trials were identical to those of Experiment 2. As in Experiments 1 and 2, five independent raters, employing the same method and cut-off as in the previous experiments, rated all answers.

In other words, the effect of scene context on object recognition was not significantly attenuated when spatial attention was focused on the target object, relative to when attention was focused on part of the background scene.

Figure 6. Scene consistency effects as a function of attentional cueing to the target location in (A) Experiment 3a and (B) Experiment 3b. Similar to Experiments 1 and 2, participants were more accurate in recognizing objects presented on a consistent background, compared to an inconsistent background. Importantly, scene consistency effects were equally strong for trials in which the location of the target was cued as for trials in which the location of the target object was not cued (Experiment 3b) or was invalidly cued (Experiment 3a).

To extend and replicate the findings of Experiment 3a, a different cueing procedure was used in Experiment 3b. Instead of using attentional pre-cues, target location information in Experiment 3b was provided, on half the trials, by a salient red rectangle centered on the target object. Because all images were in grayscale, the red rectangle should pop out and capture attention.

Participants were compensated with course credit or a monetary reward. The same stimulus set used in Experiment 3a was used in Experiment 3b. On the remaining trials, no cue was presented. Cued items were semi-randomly chosen such that half of the trials that contained a cue consisted of consistent object-scene pairs, whereas the other half consisted of inconsistent object-scene pairs. As in the previous experiments, five independent raters scored the answers given by the 14 participants.

In line with Experiment 3a, the lack of an interaction shows that knowledge about the correct target location does not attenuate the consistency effect (Figure 6B). Experiments 3a and 3b suggest that target location ambiguity does not influence the size of the consistency effect. Consequently, an ANOVA was performed with two within-subjects variables: consistency (consistent, inconsistent) and cueing (cued, uncued).

The uncued condition contained the invalid-cue condition of Experiment 3a and the no-cue condition of Experiment 3b; in both cases, the target location was not cued. Additionally, cue type (pre-cue, simultaneous cue), as defined by the two experiments, was included as a between-subjects variable. The results of Experiments 3a and 3b converge to show that the scene consistency effect is independent of attentional focus; similar consistency effects were observed when the target object location was known as when it was not known.
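A minimal analysis sketch for the mixed design described above, assuming long-format data with one accuracy score per participant and cell; the file and column names are hypothetical, and a linear mixed-effects model is used here as a stand-in for the reported ANOVA:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical long-format data: participant, consistency, cueing, cue_type, accuracy.
df = pd.read_csv("naming_accuracy.csv")

# consistency (consistent/inconsistent) and cueing (cued/uncued) vary within
# participants; cue_type (pre-cue vs. simultaneous cue) varies between experiments.
# A random intercept per participant approximates the repeated-measures structure.
model = smf.mixedlm("accuracy ~ consistency * cueing * cue_type", data=df, groups="participant")
result = model.fit()
print(result.summary())  # the consistency x cueing interaction is the key test
```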

Thus, the background scene appears to influence object processing even when attention is not directed at the scene. This finding suggests that semantic consistency effects in the current experiments were primarily driven by scene properties whose processing is, to some extent, independent of the focus of attention.

The main effect of validity in Experiment 3a shows that participants actively used the pre-cue to focus attention on the target object, thus verifying the effectiveness of the cueing manipulation. There may be several reasons why the simultaneous cue in Experiment 3b did not have the same facilitative effect. Alternatively, or additionally, attention may have been captured by the red rectangle itself and then redirected to the target object within the rectangle.

In this scenario, attention in the cued condition would first be directed at the red rectangle before being directed at the target object, whereas in the uncued condition attention would initially be directed at the scene before being directed to the target object. Although in this case there would be no net benefit of the cue, the cue would nonetheless have directed attention away from the scene. Crucially, the magnitude of the consistency effect was similar regardless of the presence or the absence of a cue.

Therefore, these results suggest that the consistency effect does not depend on attentive processing of the background scene.

The current study tested which aspects of the scene background drive the semantic consistency effect on object naming, as observed previously (Davenport and Potter) and in the current Experiment 1.

A highly controlled stimulus set was used in order to gain further insight into the role of low-level visual and shape properties in the consistency effect (Experiment 2). In addition, the influence of focused attention was studied (Experiments 3a and 3b). Experiment 1 clearly replicated the effects observed by Davenport and Potter, showing that participants were more accurate in naming a briefly presented object when it was placed on a semantically consistent background, as compared to a semantically inconsistent background.

Compared to Experiment 1, a more controlled stimulus set was used in Experiment 2, minimizing the effects of differential low-level visual feature and shape overlap between target object and background scene. Despite these changes, a strong consistency effect was also observed in Experiment 2. Therefore, it seems most likely that the observed scene-object interaction is based on the extraction of semantic information derived from the scenes.

Experiments 3a and 3b showed that even when participants had prior knowledge concerning the location of the presented object, a semantic consistency effect was still observed. No difference in the magnitude of the consistency effect was found when the location of the sought-after object was known, allowing focused attention on the target, compared to when its location was unknown, which resulted in the background scene being attended. These results indicate that the locus of spatial attention does not influence the effects of scene context on object processing.

Therefore, the semantic consistency effect appears not to require attentive processing of the background scene. This conclusion is in line with the results of Davenport and Potter, who tested the semantic consistency effect both in an experiment in which only the foreground object had to be reported (the background scene could be ignored) and in an experiment in which both the background and the foreground object had to be reported.

These experiments revealed similar semantic consistency effects, suggesting that actively reporting, and thus attending, the background scene was not required for it to affect the recognition of the foreground object. Furthermore, foreground objects were reported equally accurately in the experiment in which the background scene had to be reported as in the experiment in which the background scene was irrelevant, supporting the proposal that objects and scenes are processed interactively rather than in isolation (Davenport and Potter).

Which properties of the background scene might drive the semantic consistency effect?

One way to frame this question is with respect to the distinction between local versus global scene properties. Global scene properties, such as a statistical summary of spatial layout properties, are thought to be processed rapidly and in parallel to local object processing (Oliva and Torralba). Importantly, the processing of global scene statistics requires minimal attentional resources (Ariely; Chong and Treisman), by contrast to local object processing.

The current finding that semantic consistency effects occur independently of whether attention is located on the target object or on the background scene is consistent with the hypothesis that the semantic information is derived from a coarse global representation of the scene, such as its overall structure and coarse spatial layout.

Scene gist might be sufficient to activate a scene schema (Biederman et al.). Future experiments could further test the hypothesis that global scene properties drive the scene consistency effect, for example by manipulating the spatial frequency content of the scenes; global properties should be preserved in low-pass filtered scenes.
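A minimal sketch of such a low-pass manipulation, assuming grayscale scene images on disk; the file names and the Gaussian cutoff (sigma) are placeholders rather than values from the original study:

```python
import numpy as np
from PIL import Image
from scipy.ndimage import gaussian_filter

def low_pass(scene, sigma=8):
    """Blur the scene with a Gaussian kernel, removing high spatial frequencies
    while preserving its coarse layout (global properties). Sigma is a placeholder."""
    return gaussian_filter(scene.astype(float), sigma=sigma)

scene = np.asarray(Image.open("scene_example.png").convert("L"))  # hypothetical file
filtered = np.clip(low_pass(scene), 0, 255).astype(np.uint8)
Image.fromarray(filtered).save("scene_lowpass.png")
```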

To summarize, object recognition is facilitated when the object is presented in a semantically consistent context. This effect cannot be attributed to overlap in low-level visual features or object shape between target object and scene background.

The finding that contextual effects were observed for scenes and objects presented for merely 80 ms suggests that scene context is processed rapidly. Additionally, the processing of scene context was largely independent of top-down spatial attention. Together, these results are consistent with the proposal that rapidly derived scene gist facilitates the recognition of objects that are semantically related to the scene (Bar; Torralba et al.).

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. We thank Dr. Davenport for providing us with the stimuli used in Experiment 1.

References
Ariely, D. Seeing sets: representation by statistical properties.
Bar, M. Visual objects in context.
Bar, M. Top-down facilitation of visual recognition.
Biederman, I. Scene perception: detecting and judging objects undergoing relational violations.
Brockmole, J.

[Figure caption fragment] Note that the arrows do not necessarily reflect direct connections between brain regions.

The finding that EVC (for isolated object recognition) and LOC (for context-based object recognition) causally supported object recognition well beyond the feedforward sweep suggests that feedback processing is required for accurate object recognition. For context-based object recognition, the scene represented in OPA would be the global element, providing a prior for processing the relatively more local shape of the object represented in LOC.

For isolated object recognition, object shape would be the global element, providing a prior for processing the relatively more local inner object features. Feedback based on the more global representations thus serves to disambiguate more local representations. While feedback processing was hypothesized for LOC based on previous neuroimaging findings, we did not hypothesize that feedback to EVC would be required for recognizing isolated objects. Future studies are needed to test under what conditions feedback to EVC causally contributes to object recognition (Camprodon et al.).

In line with the reverse hierarchy theory, we expect that the specific feedback that is useful for a given task (and the brain regions involved) depends on the available information in the image together with the specific task demands (Hochstein and Ahissar). An alternative interpretation of the relatively late causal involvement of EVC in isolated object recognition, and of LOC in context-based object recognition, is that these effects reflect local recurrence rather than feedback.

This interpretation cannot be ruled out based on the current results alone. However, based on previous findings, we think this is unlikely, at least for LOC. In the fMRI study that used a stimulus set similar to the one used here (Brandman and Peelen), representations of degraded objects in LOC were facilitated by the presence of scene context relative to degraded objects alone, indicating input from outside of LOC, considering that LOC did not represent object information when scenes were presented alone.

Furthermore, the corresponding MEG study showed two peaks for degraded objects in scenes, one at — ms and one at — ms. The later peak showed a significant contextual facilitation effect in the MEG study, with better decoding of degraded objects in scenes than degraded objects alone. The present finding that TMS over LOC at — ms selectively impaired context-based object recognition is fully in line with these fMRI and MEG findings, pointing to feedback processing rather than local recurrence.

Taken together with previous findings, the current results are thus best explained by an account in which information from scenes, processed in scene-selective cortex, feeds back to LOC to disambiguate object representations. This mechanism may underlie the behavioral benefits previously observed for object recognition in semantically and syntactically congruent vs. incongruent scenes.

The current TMS results suggest that OPA is crucial for extracting this global scene information at around — ms after scene onset, and that this information is integrated with local object information in LOC around ms later. The current results do not speak to whether OPA-LOC connectivity is direct or indirect, for example involving additional brain regions such as other scene-selective regions or the orbitofrontal cortex (Bar). Our study raises the interesting question of what type of context-based expectations help to disambiguate object representations in LOC.

The scenes in the current study provided multiple cues that may help to recognize the degraded objects. Both of these cues may help to recognize objects (Biederman et al.). Future experiments could test whether feedback to LOC is specifically related to one of these cues. For example, one could test whether similar effects are found when objects are presented in semantically uninformative scenes, with the scene only providing information about the approximate real-world size of the object.

To conclude, the current study provides causal evidence that context-based expectations facilitate object recognition by disambiguating object representations in the visual cortex. More generally, the results reveal that distinct neural mechanisms support object recognition based on local features and on global scene context.

Future experiments may extend our approach to include other contextual features such as co-occurring objects, temporal context, and input from other modalities. Prior to experimentation, we decided to test 24 participants in all three experiments. Participants were excluded if they reported to have one of the following: CNS-acting medication, previous neurosurgical treatments, metal implants in the head or neck area, migraine, epilepsy or previous cerebral seizures also within their family , pacemaker, intracranial metal clips, cochlea implants, or pregnancy.

Additionally, participants were asked to refrain from consuming alcohol and recreational drugs for 72 hr before the experiment and to refrain from consuming coffee for 2 hr before the experiment. Participants were divided over the three experiments, targeting three cortical areas, based on a previous experiment.

Prior to the experimental session, participants were informed about the experimental procedures and gave written informed consent. Given that the latency of visual cortex activation varies across participants, a two-pulse TMS design was chosen, since it allows for a broader time window of disruption while maintaining relatively good temporal resolution (O'Shea et al.).

Each stimulation location was identified through Talairach coordinates set in the Localite neuronavigation system. The coordinates were (45, -74, 0) for LOC (Pitcher et al.). We then established the optimal coil position in such a way that phosphenes were reported centrally in the visual field, where the stimuli were presented.

Stimuli consisted of scene photographs with a single object belonging to one of the following eight categories: airplane, bird, car, fish, human, mammal, ship, and train.

For the isolated object recognition task, the object was cropped out of the scene and presented at its original location on a gray background. For the context-based object recognition task, the object was pixelated to remove local features.
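A minimal sketch of one way to pixelate an object region, assuming a PIL image and a known object bounding box; the block size and coordinates are placeholders, and this is not necessarily how the original stimuli were generated:

```python
from PIL import Image

def pixelate_region(image, box, block=12):
    """Pixelate box = (left, top, right, bottom) by downsampling and re-upsampling
    with nearest-neighbour interpolation (block size is a placeholder)."""
    region = image.crop(box)
    w, h = region.size
    small = region.resize((max(1, w // block), max(1, h // block)), Image.NEAREST)
    image.paste(small.resize((w, h), Image.NEAREST), box)
    return image

scene = Image.open("scene_with_object.png")              # hypothetical stimulus file
degraded = pixelate_region(scene, (140, 90, 340, 260))   # hypothetical bounding box
degraded.save("scene_with_degraded_object.png")
```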

The experiment additionally included a scene-alone condition, in which the object was cropped out and replaced with background using a content-aware fill tool.

To prevent participants from recognizing the degraded objects in scenes based on having seen their intact versions, the stimulus set was divided into two halves: for each participant, half of the stimuli were used in the context-based object condition, and the other half were used both in the isolated object condition and the scene-alone condition. This assignment was counterbalanced across participants.
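As a rough illustration of this split-half assignment (illustrative names only; the exact counterbalancing scheme is not described at this level of detail), assuming an even number of stimuli:

```python
import random

def split_stimuli(stimulus_ids, participant_id, seed=0):
    """Assign half of the stimuli to the context-based condition and the other half
    to the isolated-object and scene-alone conditions, swapping the halves for
    every other participant (illustrative only)."""
    rng = random.Random(seed)            # fixed split shared by all participants
    ids = sorted(stimulus_ids)
    rng.shuffle(ids)
    half_a, half_b = ids[: len(ids) // 2], ids[len(ids) // 2 :]
    if participant_id % 2:               # counterbalance the halves across participants
        half_a, half_b = half_b, half_a
    return {"context_based": half_a, "isolated_and_scene_alone": half_b}

print(split_stimuli(range(1, 9), participant_id=3))
```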

Before the experiment, participants received instructions and were presented with an example stimulus, which was not used in the main experiment. This example displayed how each stimulus variation (context-based, isolated object, and scene alone) was derived from an original photograph. For the main task, each trial started with a fixation cross ( ms), followed by a stimulus presented for 33 ms. Next, a blank screen was shown for ms. After this, participants were asked to respond by pressing one of eight possible keys according to the object category presented (Figure 2).

No response time limit was imposed, although participants were encouraged during the instructions to respond within 3 s. The response screen was presented until the participant responded. The next trial started after a 2 s inter-trial interval.
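A minimal trial-timeline sketch of this procedure, assuming PsychoPy is available; only the 33 ms stimulus duration and the 2 s inter-trial interval are stated in the text, so the fixation and blank durations below are placeholders (and precise 33 ms presentation would in practice require frame-locked timing):

```python
from psychopy import core, event, visual  # sketch only; assumes PsychoPy is installed

FIXATION_S, STIM_S, BLANK_S, ITI_S = 0.5, 0.033, 0.5, 2.0  # fixation/blank are placeholders
RESPONSE_KEYS = ["1", "2", "3", "4", "5", "6", "7", "8"]   # one key per object category

win = visual.Window(color="grey", units="pix")
fixation = visual.TextStim(win, text="+")
prompt = visual.TextStim(win, text="Which object category?")

def run_trial(image_path):
    """Fixation -> 33 ms stimulus -> blank -> untimed eight-alternative response -> ITI."""
    fixation.draw(); win.flip(); core.wait(FIXATION_S)
    visual.ImageStim(win, image=image_path).draw(); win.flip(); core.wait(STIM_S)
    win.flip(); core.wait(BLANK_S)                 # blank screen
    prompt.draw(); win.flip()
    keys = event.waitKeys(keyList=RESPONSE_KEYS)   # response screen stays until a keypress
    win.flip(); core.wait(ITI_S)                   # inter-trial interval
    return keys[0]
```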

This relatively long inter-trial interval of 2 s was chosen to prevent repetitive TMS effects. TMS was applied at one of three different time windows, in randomized order. TMS pulses could be applied at 60 ms and ms after stimulus onset, ms and ms after stimulus onset, or ms and ms after stimulus onset.

In 2 of the 72 participants (1 in the LOC experiment and 1 in the EVC experiment), each pulse was accidentally delivered 16 ms earlier than described above. Each stimulus was repeated three times, once for each TMS timing (60— ms, — ms, and — ms). This resulted in a total of trials, which were presented in random order. To avoid fatigue, the task was divided into 12 blocks of 48 trials, each lasting approximately 4 min, with short breaks of approximately 1 min in between.
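A small sketch of how such a trial list could be assembled and blocked, with each stimulus repeated once per TMS timing; the timing labels and the stimulus count are stand-ins for values that are partly elided in the text (192 stimuli at 3 timings each would give 576 trials, i.e., 12 blocks of 48):

```python
import random

TMS_WINDOWS = ["early", "middle", "late"]   # stand-ins for the three pulse-pair timings
BLOCK_SIZE = 48                             # 12 blocks of 48 trials, as described

def build_blocks(stimulus_ids, seed=1):
    """Repeat every stimulus once per TMS timing, shuffle, and split into blocks."""
    trials = [(stim, timing) for stim in stimulus_ids for timing in TMS_WINDOWS]
    random.Random(seed).shuffle(trials)
    return [trials[i:i + BLOCK_SIZE] for i in range(0, len(trials), BLOCK_SIZE)]

blocks = build_blocks(range(192))           # 192 is an assumed stimulus count
print(len(blocks), len(blocks[0]))          # -> 12 48
```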

Thus, completing the task took about 60 min. The total duration of the experiment, including preparation and PT determination, was approximately 90 min.

All data generated or analysed during this study are included in the manuscript and supporting files. Source data files have been provided for Figure 3.

Our editorial process produces two outputs: (i) public reviews designed to be posted alongside the preprint for the benefit of readers; (ii) feedback on the manuscript for the authors, including requests for revisions, shown below. We also include an acceptance summary that explains what the editors found interesting or important about the work.

This study will be of interest to scientists involved in high-level vision. The data provide a compelling demonstration of the causal role of three key visual areas in context-based object recognition. The key claims of the manuscript are supported by the data, and are strengthened by the pre-registration of each of the three experiments.

Thank you for submitting your article "Causal neural mechanisms of context-based object recognition" for consideration by eLife. The following individual involved in review of your submission has agreed to reveal their identity: Peter Kok (Reviewer 3). The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this to help you prepare a revised submission.

The authors should either provide a strong justification for this deviation or else run and report the statistics as originally planned. Exactly what tasks and stimuli were participants exposed to? How many participants were excluded based on this procedure? A statement should be added to the main text flagging this procedure to the reader, so that they are clear that the basic main effect of TMS to LOC on object-based task performance was pre-ordained.

The study hypotheses could be articulated more clearly. The legend of Figure 3 would seem a good place to lay out the predictions in more detail. The area identified as EVC should be clarified in the Introduction. What visual regions does this area encompass? Just V1? The authors argue that feedback based on more global representations of the scene serves to disambiguate more local representations of the object.

The scene-only condition seems important to this interpretation, and I would suggest discussing this manipulation further in the main text. An alternative interpretation is that scene recognition simply reduces the epistemic priors of what the object could be e. So scene recognition seems to filter the range of possible correct responses in the first place, and the effects that are observed may be separate to the capacity of scene recognition to disambiguate specific features of the object itself.

I take the authors' point that the absent effect of TMS on accuracy in the scene-only condition argues against this as a possibility.
