Johannsen, K., & De Ruiter, J. P. (2013). The role of scene type and priming in the processing and selection of a spatial frame of reference. Frontiers in Psychology, 4, 182.

The selection and processing of a spatial frame of reference (FOR) in interpreting verbal scene descriptions is of great interest to psycholinguistics. In this study, we focus on the choice between the relative and the intrinsic FOR, addressing two questions: (a) does the presence or absence of a background in the scene influence the selection of a FOR, and (b) what is the effect of a previously selected FOR on the subsequent processing of a different FOR? Our results show that if a scene includes a realistic background, the selection of the relative FOR becomes more likely. We attribute this effect to the facilitation of mental simulation, which enhances the relation between the viewer and the objects. With respect to response accuracy, we found both higher accuracy (with the same FOR) and lower accuracy (with a different FOR), while for response latencies we found only a delay effect with a different FOR.

Jansen, S., Wesselmeier, H., De Ruiter, J. P., & Mueller, H. M. (2014). Using the readiness potential of button-press and verbal response within spoken language processing. Journal of Neuroscience Methods, 232, 24-29.

Background: Even though research on turn-taking in spoken dialogues is now abundant, a typical EEG signature associated with the anticipation of turn-ends has not yet been identified.
New method: The purpose of this study was to examine whether readiness potentials (RP) can be used to study the anticipation of turn-ends, using both a motoric (finger movement) and an articulatory movement task. The goal was to determine the onset of early, preconscious turn-end anticipation processes through the simultaneous registration of EEG measures (RP) and behavioural measures (anticipation timing accuracy, ATA). For our behavioural measures, we used both a button-press and a verbal response (“yes”). In the experiment, 30 subjects were asked to listen to auditorily presented utterances and to press a button or utter a brief verbal response when they expected the end of the turn. During the task, a 32-channel EEG signal was recorded.

Results: The results showed that the RPs during verbal and button-press responses developed similarly and had an almost identical time course: the RP signals started to develop 1170 ms vs. 1190 ms before the behavioural responses.

Comparison with existing methods: Until now, turn-end anticipation has usually been studied with behavioural methods, for instance by measuring anticipation timing accuracy, a measurement that reflects conscious behavioural processes and is insensitive to preconscious anticipation processes.

Conclusion: The similar time course of the recorded RP signals for both verbal and button-press responses provides evidence for the validity of using RPs as an online marker for response preparation in turn-taking and spoken dialogue research.

De Ruiter, J. P. (2012). Questions are what they do. In De Ruiter, J. P. (Ed.), Questions: formal, functional, and interactional perspectives (pp. 1-9). Cambridge: Cambridge University Press.

What is a question? Many laymen, including cognitive scientists of varying persuasions, report having a pretty good idea about what a question is and do not see an urgent need to engage in hair-splitting about it: if one needs certain information that is lacking, and someone else has access to this information, one can specify the lacking information in the form of a question, address this question to the someone else, who will then, if he or she so chooses, provide the desired information in the form of an answer. What's the big deal?

De Ruiter, J. P. (2012). How (not) to give a talk. Geo.brief 6.

Have you also sometimes wondered why more than 90% of the scientific presentations you go to are so excruciatingly boring? Would you like to be responsible for compelling a large group of busy professionals to politely but reluctantly freeze their brains and put their entire mental lives on hold for up to two hours, and simply wait until you have finally stopped making sounds? If you don’t, I suggest you follow the following rules strictly.

Healey, P. G., De Ruiter, J. P., & Mills, G. J. (2018). Editors' introduction: Miscommunication. Topics in Cognitive Science, 10(2), 264-278.

Miscommunication is a neglected issue in the cognitive sciences, where it has often been discounted as noise in the system. This special issue argues for the opposite view: Miscommunication is a highly structured and ubiquitous feature of human interaction that systematically underpins people's ability to create and maintain shared languages. Contributions from conversation analysis, computational linguistics, experimental psychology, and formal semantics provide evidence for these claims. They highlight the multi-modal, multi-person character of miscommunication. They demonstrate the incremental, contingent, and locally adaptive nature of the processes people use to detect and deal with miscommunication. They show how these processes can drive language change. In doing so, these contributions introduce an alternative perspective on what successful communication is, new methods for studying it, and application areas where these ideas have a particular impact. We conclude that miscommunication is not noise but essential to the productive flexibility of human communication, especially our ability to respond constructively to new people and new situations.

Gaschler, A., Jentzsch, S., Giuliani, M., Huth, K., De Ruiter, J. P., & Knoll, A. (2012, October 7-11). Social Behavior Recognition using body posture and head pose for Human-Robot Interaction. In Proceedings of the 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (pp. 2128-2133). Vilamoura-Algarve, Portugal.

Robots that interact with humans in everyday situations need to be able to interpret the nonverbal social signals of their human interaction partners. We show that humans use body posture and head pose as social signals to initiate and terminate interaction when ordering drinks at a bar. To this end, we record and analyze 108 interactions between human customers and a human bartender. Based on these findings, we train a Hidden Markov Model (HMM) using automatic body posture and head pose estimation. With this model, the bartender robot of the project JAMES can recognize typical social signals of human customers. Our evaluation shows a recognition rate of 82.9% across all implemented social signals, and in particular a recognition rate of 91.2% for bartender attention requests, which allows the robot to interact with multiple humans in a robust and socially appropriate way.

Gaschler, A., Huth, K., Giuliani, M., Kessler, I., De Ruiter, J. P., & Knoll, A. (2012, March 5-8). Modelling State of Interaction from Head Poses for Social Human-Robot Interaction. In Proceedings of the Gaze in Human-Robot Interaction Workshop held at the 7th ACM/IEEE International Conference on Human-Robot Interaction (HRI 2012), Boston, USA.

In this publication, we analyse how humans use head pose in various states of an interaction, in both human-human and human-robot observations. Our scenario is the short-term, everyday interaction of a customer ordering a drink from a bartender. To empirically study the use of head pose in this scenario, we recorded 108 such interactions in real bars. The analysis of these recordings shows that (i) customers follow a defined script to order their drink (attention request, ordering, closing of interaction) and (ii) customers use head pose to nonverbally request the attention of the bartender, to signal the ongoing process, and to close the interaction.

Based on these findings, we design a hidden Markov model that reflects the typical interaction states in the bar scenario and implement it on the human-robot interaction system of the European JAMES project. We train the model with data from an automatic head pose estimation algorithm and additional body pose information. Our evaluation shows that the model correctly recognises the state of interaction of a customer in 78% of all states. More specifically, the model recognises the interaction state “attention to bartender” with 83% and “attention to another guest” with 73% correctness, providing the robot sufficient knowledge to begin, perform, and end interactions in a socially appropriate way.
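The core of the approach described above is a discrete HMM whose hidden states are interaction states and whose observations come from head-pose estimation. As a loose illustration of this kind of model (not the authors' implementation), here is a minimal sketch that decodes a customer's interaction state from discretized head-pose observations with the Viterbi algorithm. All state names, observation symbols, and probabilities below are invented for illustration and are not the parameters from the paper.

```python
# Minimal HMM sketch: decoding interaction states from discretized
# head-pose observations via Viterbi. All states, observation symbols,
# and probabilities are invented for illustration.

STATES = ["not_interacting", "attention_to_bartender", "ordering", "closing"]

# P(next_state | state), invented values.
TRANS = {
    "not_interacting":        {"not_interacting": 0.7, "attention_to_bartender": 0.3, "ordering": 0.0, "closing": 0.0},
    "attention_to_bartender": {"not_interacting": 0.1, "attention_to_bartender": 0.5, "ordering": 0.4, "closing": 0.0},
    "ordering":               {"not_interacting": 0.0, "attention_to_bartender": 0.1, "ordering": 0.6, "closing": 0.3},
    "closing":                {"not_interacting": 0.5, "attention_to_bartender": 0.0, "ordering": 0.0, "closing": 0.5},
}

# P(observation | state), invented values.
EMIT = {
    "not_interacting":        {"looking_away": 0.8, "looking_at_bartender": 0.1, "looking_at_drink": 0.1},
    "attention_to_bartender": {"looking_away": 0.1, "looking_at_bartender": 0.8, "looking_at_drink": 0.1},
    "ordering":               {"looking_away": 0.1, "looking_at_bartender": 0.6, "looking_at_drink": 0.3},
    "closing":                {"looking_away": 0.3, "looking_at_bartender": 0.1, "looking_at_drink": 0.6},
}

START = {"not_interacting": 0.9, "attention_to_bartender": 0.1, "ordering": 0.0, "closing": 0.0}


def viterbi(observations):
    """Return the most probable state sequence for the observations."""
    # trellis[t][s] = (probability of the best path ending in s at time t,
    #                  backpointer to the best predecessor state)
    trellis = [{s: (START[s] * EMIT[s][observations[0]], None) for s in STATES}]
    for obs in observations[1:]:
        row = {}
        for s in STATES:
            # Best predecessor for state s at this time step.
            best_prev, best_p = max(
                ((p, trellis[-1][p][0] * TRANS[p][s]) for p in STATES),
                key=lambda x: x[1],
            )
            row[s] = (best_p * EMIT[s][obs], best_prev)
        trellis.append(row)
    # Backtrack from the best final state.
    state = max(STATES, key=lambda s: trellis[-1][s][0])
    path = [state]
    for row in reversed(trellis[1:]):
        state = row[state][1]
        path.append(state)
    return list(reversed(path))


if __name__ == "__main__":
    seq = ["looking_away", "looking_at_bartender",
           "looking_at_bartender", "looking_at_drink"]
    print(viterbi(seq))
```

In a real system such as the one evaluated above, the transition and emission probabilities would be trained from annotated recordings, and the observations would be continuous head-pose and body-posture features rather than hand-picked symbols.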

Ziegler, L., Johannsen, K., Swadzba, A., De Ruiter, J. P., & Wachsmuth, S. (2012). Exploiting spatial descriptions in visual scene analysis. Cognitive Processing, 13, 369-374.

The reliable automatic visual recognition of indoor scenes with complex object constellations using only sensor data is a nontrivial problem. In order to improve the construction of an accurate semantic 3D model of an indoor scene, we exploit human-produced verbal descriptions of the relative location of pairs of objects. This requires the ability to deal with different spatial reference frames (RF) that humans use interchangeably. In German, both the intrinsic and relative RF are used frequently, which often leads to ambiguities in referential communication. We assume that there are certain regularities that help in specific contexts.

In a first experiment, we investigated how speakers of German describe spatial relationships between different pieces of furniture. This gave us important information about the distribution of the RFs used for furniture-predicate combinations, and by implication also about the preferred spatial predicate. The results of this experiment are compiled into a computational model that extracts partial orderings of spatial arrangements between furniture items from verbal descriptions.

In the implemented system, the visual scene is initially scanned by a 3D camera system. From the 3D point cloud, we extract point clusters that suggest the presence of certain furniture objects. We then integrate the partial orderings extracted from the verbal utterances incrementally and cumulatively with the estimated probabilities about the identity and location of objects in the scene, and also estimate the probable orientation of the objects. 

This allows the system to significantly improve both the accuracy and richness of its visual scene representation.
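The integration step described above combines two uncertain sources: per-cluster label probabilities from vision and spatial constraints from verbal descriptions. The following toy sketch (not the paper's system) illustrates the principle with a single constraint of the form "the chair is to the left of the table": enumerate joint labellings of the detected clusters, weight each by its visual prior and by how well it satisfies the constraint, and renormalize. All labels, positions, and probabilities are invented for illustration.

```python
# Toy sketch of fusing a verbal spatial constraint with visual detection
# scores. All object labels, positions, and probabilities are invented.
from itertools import product

# Vision module output: for each point cluster, P(label | visual features).
visual_belief = {
    "cluster_1": {"chair": 0.55, "table": 0.45},
    "cluster_2": {"chair": 0.40, "table": 0.60},
}

# Estimated x-coordinates (metres) of the clusters in the room frame.
position_x = {"cluster_1": 1.0, "cluster_2": 2.5}


def fuse(belief, positions, left_label, right_label, strength=0.9):
    """Posterior over joint labellings given one verbal constraint of the
    form '<left_label> is to the left of <right_label>'."""
    clusters = list(belief)
    label_sets = [list(belief[c]) for c in clusters]
    posterior = {}
    for combo in product(*label_sets):
        assignment = dict(zip(clusters, combo))
        # Prior: product of the per-cluster visual label probabilities.
        p = 1.0
        for c, lab in assignment.items():
            p *= belief[c][lab]
        # Constraint likelihood: high if some cluster labelled left_label
        # really lies left of some cluster labelled right_label.
        lefts = [c for c, lab in assignment.items() if lab == left_label]
        rights = [c for c, lab in assignment.items() if lab == right_label]
        consistent = any(
            positions[a] < positions[b] for a in lefts for b in rights
        )
        p *= strength if consistent else 1.0 - strength
        posterior[combo] = p
    total = sum(posterior.values())
    return {combo: p / total for combo, p in posterior.items()}


if __name__ == "__main__":
    post = fuse(visual_belief, position_x, "chair", "table")
    for combo, p in sorted(post.items(), key=lambda kv: -kv[1]):
        print(combo, round(p, 3))
```

Even though vision alone slightly prefers "chair" for both clusters here, the verbal constraint shifts the posterior firmly toward the consistent labelling (chair at cluster_1, table at cluster_2), which is the qualitative effect the system above exploits, applied incrementally over many utterances.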

Enfield, N., Kita, S., & De Ruiter, J. P. (2007). Primary and secondary pragmatic functions of pointing gestures. Journal of Pragmatics, 39, 1722-1741.

This article presents a study of a set of pointing gestures produced together with speech in a corpus of video-recorded "locality description" interviews in rural Laos. In a restricted set of the observed gestures (we did not consider gestures with special hand shapes, gestures with arc/tracing motion, or gestures directed at referents within physical reach), two basic formal types of pointing gesture are observed: B-points (large movement, full arm, eye gaze often aligned) and S-points (small movement, hand only, casual articulation). Taking the approach that speech and gesture are structurally integrated in composite utterances, we observe that these types of pointing gesture have distinct pragmatic functions at the utterance level. One type of gesture (usually "big" in form) carries primary, informationally foregrounded information (for saying "where" or "which one"). Infants perform this type of gesture long before they can talk. The second type of gesture (usually "small" in form) carries secondary, informationally backgrounded information which responds to a possible but uncertain lack of referential common ground. We propose that the packaging of the extra locational information into a casual gesture is a way of adding extra information to an utterance without it being on-record that the added information was necessary. This is motivated by the conflict between two general imperatives of communication in social interaction: a social-affiliational imperative not to provide more information than necessary ("Don't over-tell"), and an informational imperative not to provide less information than necessary ("Don't under-tell").
