Author Archives: Muhammad Umair

De Beer, C., Hogrefe, M., Hielscher-Fastabend, M., & De Ruiter, J.P. (2020). Evaluating models of gesture and speech production for people with aphasia. Cognitive Science.

Evaluating Models of Gesture and Speech Production for People With Aphasia

People with aphasia use gestures not only to communicate relevant content but also to compensate for their verbal limitations. The Sketch Model (De Ruiter, 2000) assumes a flexible relationship between gesture and speech with the possibility of a compensatory use of the two modalities. In the successor of the Sketch Model, the AR-Sketch Model (De Ruiter, 2017), the relationship between iconic gestures and speech is no longer assumed to be flexible and compensatory, but instead iconic gestures are assumed to express information that is redundant to speech. In this study, we evaluated the contradictory predictions of the Sketch Model and the AR-Sketch Model using data collected from people with aphasia as well as a group of people without language impairment. We only found compensatory use of gesture in the people with aphasia, whereas the people without language impairments made very little compensatory use of gestures. Hence, the people with aphasia gestured according to the prediction of the Sketch Model, whereas the people without language impairment did not. We conclude that aphasia fundamentally changes the relationship of gesture and speech.

De Ruiter, J.P. (2019). Turn-Taking. In: Cummins, C. & Katsos, N. (eds.), Oxford Handbook of Experimental Semantics and Pragmatics.

In their informal verbal exchanges people tend to follow the ‘one speaker at a time’ rule (Schegloff, 1968). The use of the term ‘turn-taking’ to describe the process in which this rule operates in human conversation is relatively recent, and attributed to Yngve (1970) and Goffman (1967) by Duncan (1972).

Especially since the famous 1974 paper by Harvey Sacks, Emanuel Schegloff, & Gail Jefferson in the journal Language, which marks the birth of the sociological discipline now called Conversation Analysis (CA), turn-taking in conversation has attracted attention from a variety of disciplines.

In this chapter, I will briefly summarize the main theoretical approaches and controversies regarding turn-taking, followed by some reflections on different ways it can be studied experimentally.

Magyari, L., De Ruiter, J.P., & Levinson, S. C. (2017). Temporal Preparation for Speaking in Question-Answer Sequences. Frontiers in Psychology 8.

In everyday conversations, the gap between turns of conversational partners is most frequently between 0 and 200 ms. We were interested in how speakers achieve such fast transitions. We designed an experiment in which participants listened to pre-recorded questions about images presented on a screen and were asked to answer these questions. We tested whether speakers already prepare their answers while they listen to questions and whether they can prepare for the time of articulation by anticipating when questions end. In the experiment, it was possible to guess the answer at the beginning of the questions in half of the experimental trials. We also manipulated whether it was possible to predict the length of the last word of the questions. The results suggest that when listeners know the answer early, they start speech production already during the questions. Speakers can also time when to speak by predicting the duration of turns. These temporal predictions can be based on the length of anticipated words and on the overall probability of turn durations.

Magyari, L., & De Ruiter, J. P. (2012). Prediction of turn-ends based on anticipation of upcoming words. Frontiers in Psychology 3, 376.

During conversation listeners have to perform several tasks simultaneously. They have to comprehend their interlocutor’s turn, while also having to prepare their own next turn. Moreover, a careful analysis of the timing of natural conversation reveals that next speakers also time their turns very precisely. This is possible only if listeners can predict accurately when the speaker’s turn is going to end. But how are people able to predict when a turn ends? We propose that people know when a turn ends because they know how it ends. We conducted a gating study to examine if better turn-end predictions coincide with more accurate anticipation of the last words of a turn. We used turns from an earlier button-press experiment where people had to press a button exactly when a turn ended. We show that the proportion of correct guesses in our experiment is higher when a turn’s end was estimated better in time in the button-press experiment. When people were too late in their anticipation in the button-press experiment, they also anticipated more words in our gating study. We conclude that people made predictions in advance about the upcoming content of a turn and used this prediction to estimate the duration of the turn. We suggest an economical model of turn-end anticipation that is based on anticipation of words and syntactic frames in comprehension.

Loth, S., Huth, K., & De Ruiter, J. P. (2013). Automatic detection of service initiation signals used in bars. Frontiers in Psychology, 4 (557). http://doi.org/10.3389/fpsyg.2013.00557

Recognizing the intention of others is important in all social interactions, especially in the service domain. Enabling a bartending robot to serve customers is particularly challenging as the system has to recognize the social signals produced by customers and respond appropriately. Detecting whether a customer would like to order is essential for the service encounter to succeed. This detection is particularly challenging in a noisy environment with multiple customers. Thus, a bartending robot has to be able to distinguish between customers intending to order, chatting with friends or just passing by. In order to study which signals customers use to initiate a service interaction in a bar, we recorded real-life customer-staff interactions in several German bars. These recordings were used to generate initial hypotheses about the signals customers produce when bidding for the attention of bar staff. Two experiments using snapshots and short video sequences then tested the validity of these hypothesized candidate signals. The results revealed that bar staff responded to a set of two non-verbal signals: first, customers position themselves directly at the bar counter and, secondly, they look at a member of staff. Both signals were necessary and, when occurring together, sufficient. The participants also showed a strong agreement about when these cues occurred in the videos. Finally, a signal detection analysis revealed that ignoring a potential order is deemed worse than erroneously inviting customers to order. We conclude that (a) these two easily recognizable actions are sufficient for recognizing the intention of customers to initiate a service interaction, but other actions such as gestures and speech were not necessary, and (b) the use of reaction time experiments using natural materials is feasible and provides ecologically valid results.
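
The two-signal finding reported above can be read as a simple conjunction rule. The following is a minimal Python sketch of that reading, not the authors' implementation: the function name and the boolean inputs `at_bar` and `looking_at_staff` are hypothetical and stand in for whatever a real recognizer would output.

```python
def wants_to_order(at_bar: bool, looking_at_staff: bool) -> bool:
    """Illustrative rule only: each signal is necessary, and the two
    together are sufficient, to treat a customer as bidding for attention."""
    return at_bar and looking_at_staff


# A customer standing at the counter but looking at friends is not served yet.
print(wants_to_order(at_bar=True, looking_at_staff=False))
```

Under this reading, gestures and speech do not need to enter the rule at all, which matches conclusion (a) of the abstract.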

Loth, S., Giuliani, M., Jettka, K., Kopp, S. & De Ruiter, J.P. (2018). Confidence in uncertainty: Error cost and commitment in early speech hypotheses. PLoS One.

Interactions with artificial agents often lack immediacy because agents respond slower than their users expect. Automatic speech recognisers introduce this delay by analysing a user’s utterance only after it has been completed. Early, uncertain hypotheses of incremental speech recognisers can enable artificial agents to respond more timely. However, these hypotheses may change significantly with each update. Therefore, an already initiated action may turn into an error and invoke error cost. We investigated whether humans would use uncertain hypotheses for planning ahead and/or initiating their response. We designed a Ghost-in-the-Machine study in a bar scenario. A human participant controlled a bartending robot and perceived the scene only through its recognisers. The results showed that participants used uncertain hypotheses for selecting the best matching action. This is comparable to computing the utility of dialogue moves. Participants evaluated the available evidence and the error cost of their actions prior to initiating them. If the error cost was low, the participants initiated their response with only suggestive evidence. Otherwise, they waited for additional, more confident hypotheses if they still had time to do so. If there was time pressure but only little evidence, participants grounded their understanding with echo questions. These findings contribute to a psychologically plausible policy for human-robot interaction that enables artificial agents to respond more timely and socially appropriately under uncertainty.
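
The response pattern described in this abstract can be sketched as a small decision rule. The Python below is a hypothetical illustration only: the function name, the numeric scales, and all threshold values are assumptions made for the example and do not come from the paper. Acting immediately on a confident hypothesis is also an assumption, since the abstract only describes the uncertain cases.

```python
def choose_action(confidence: float, error_cost: float, time_left: float,
                  suggestive: float = 0.5, confident: float = 0.8,
                  low_cost: float = 0.3) -> str:
    """Hypothetical policy; every threshold is an illustrative assumption."""
    # Assumption: a confident hypothesis is acted on regardless of cost.
    if confidence >= confident:
        return "act"
    # Low error cost: suggestive evidence is enough to initiate a response.
    if error_cost <= low_cost and confidence >= suggestive:
        return "act"
    # Otherwise wait for a more confident hypothesis while time remains.
    if time_left > 0:
        return "wait"
    # Time pressure with little evidence: ground understanding first.
    return "echo_question"
```

For example, `choose_action(0.6, 0.1, 1.0)` acts on suggestive evidence because the error cost is low, while `choose_action(0.6, 0.9, 1.0)` waits for a better hypothesis.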

Loth, S., Jettka, K., Giuliani, M., & De Ruiter, J. P. (2015). Ghost-in-the-Machine reveals human social signals for human–robot interaction. Frontiers in Psychology, 6. http://doi.org/10.3389/fpsyg.2015.01641

We used a new method called “Ghost-in-the-Machine” (GiM) to investigate social interactions with a robotic bartender taking orders for drinks and serving them. Using the GiM paradigm allowed us to identify how human participants recognize the intentions of customers on the basis of the output of the robotic recognizers. Specifically, we measured which recognizer modalities (e.g., speech, the distance to the bar) were relevant at different stages of the interaction. This provided insights into human social behavior necessary for the development of socially competent robots. When initiating the drink-order interaction, the most important recognizers were those based on computer vision. When drink orders were being placed, however, the most important information source was the speech recognition. Interestingly, the participants used only a subset of the available information, focussing only on a few relevant recognizers while ignoring others. This reduced the risk of acting on erroneous sensor data and enabled them to complete service interactions more swiftly than a robot using all available sensor data. We also investigated socially appropriate response strategies. In their responses, the participants preferred to use the same modality as the customer’s requests, e.g., they tended to respond verbally to verbal requests. Also, they added redundancy to their responses, for instance by using echo questions. We argue that incorporating the social strategies discovered with the GiM paradigm in multimodal grammars of human–robot interactions improves the robustness and the ease-of-use of these interactions, and therefore provides a smoother user experience.

Johannsen, K., & De Ruiter, J. P. (2013). Reference frame selection in dialogue: priming or preference? Frontiers in Human Neuroscience, 7, 667.

We investigate effects of priming and preference on frame of reference (FOR) selection in dialog. In a first study, we determine FOR preferences for specific object configurations to establish a baseline. In a second study, we focus on the selection of the relative or the intrinsic FOR in dialog using the same stimuli and addressing the questions whether (a) interlocutors prime each other to use the same FOR consistently or (b) the preference for the intrinsic FOR predominates priming effects. Our results show effects of priming (more use of the relative FOR) and a decreased preference for the intrinsic FOR. However, as FOR selection did not have an effect on target trial accuracy, neither effect alone represents the key to successful communication in this domain. Rather, we found that successful communication depended on the adaptation of strategies between interlocutors: the more the interlocutors adapted to each other’s strategies, the more successful they were.

Johannsen, K., & De Ruiter, J. P. (2013). The role of scene type and priming in the processing and selection of a spatial frame of reference. Frontiers in Psychology, 4, 182.

The selection and processing of a spatial frame of reference (FOR) in interpreting verbal scene descriptions is of great interest to psycholinguistics. In this study, we focus on the choice between the relative and the intrinsic FOR, addressing two questions: (a) does the presence or absence of a background in the scene influence the selection of a FOR, and (b) what is the effect of a previously selected FOR on the subsequent processing of a different FOR. Our results show that if a scene includes a realistic background, this will make the selection of the relative FOR more likely. We attribute this effect to the facilitation of mental simulation, which enhances the relation between the viewer and the objects. With respect to the response accuracy, we found both a higher (with the same FOR) and a lower accuracy (with a different FOR), while for the response latencies, we only found a delay effect with a different FOR.

Jansen, S., Wesselmeier, H., De Ruiter, J. P., & Mueller, H. M. (2014). Using the readiness potential of button-press and verbal response within spoken language processing. Journal of Neuroscience Methods, 232, 24-29.

Background: Even though research on turn-taking in spoken dialogues is now abundant, a typical EEG signature associated with the anticipation of turn-ends has not yet been identified.
New method: The purpose of this study was to examine if readiness potentials (RP) can be used to study the anticipation of turn-ends by using them in a motoric finger movement and articulatory movement task. The goal was to determine the onset of early, preconscious turn-end anticipation processes by the simultaneous registration of EEG measures (RP) and behavioural measures (anticipation timing accuracy, ATA). For our behavioural measures, we used both button-press and verbal response (“yes”). In the experiment, 30 subjects were asked to listen to auditorily presented utterances and press a button or utter a brief verbal response when they expected the end of the turn. During the task, a 32-channel EEG signal was recorded.

Results: The results showed that the RPs during verbal- and button-press-responses developed similarly and had an almost identical time course: the RP signals started to develop 1170 vs. 1190 ms before the behavioural responses.
Comparison with existing methods: Until now, turn-end anticipation has usually been studied using behavioural methods, for instance by measuring the anticipation timing accuracy, which is a measurement that reflects conscious behavioural processes and is insensitive to preconscious anticipation processes.
Conclusion: The similar time course of the recorded RP signals for both verbal and button-press responses provides evidence for the validity of using RPs as an online marker for response preparation in turn-taking and spoken dialogue research.