De Ruiter, J. P., Rossignol, S., Vuurpijl, L., Cunningham, D., & Levelt, W. (2003). SLOT: A research platform for investigating multimodal communication. Behavior Research Methods, Instruments, & Computers, 35(3), 408-419.

In this article, we present the spatial logistics task (SLOT) platform for investigating multimodal communication between two human participants. Presented are the SLOT communication task and the software and hardware that have been developed to run SLOT experiments and record the participants' multimodal behavior. SLOT offers a high level of flexibility in varying the context of the communication and is particularly useful in studies of the relationship between pen gestures and speech. We illustrate the use of the SLOT platform by discussing the results of some early experiments. The first is an experiment on negotiation with a one-way mirror between the participants, and the second is an exploratory study of automatic recognition of spontaneous pen gestures. The results of these studies demonstrate the usefulness of the SLOT platform for conducting multimodal communication research in both human–human and human–computer interactions.

De Ruiter, J. P., Noordzij, M., Newman-Norlund, S. E., Hagoort, P., & Toni, I. (2007). On the origin of intentions. In Haggard, P., Rossetti, Y., & Kawato, M. (Eds.), Attention & Performance XXII: Sensorimotor foundations of higher cognition (pp. 593-610). Oxford: Oxford University Press.

Any model of motor control or sensorimotor transformations starts from an intention to trigger a cascade of neural computations, yet how intentions themselves are generated remains a mystery. Part of the difficulty in dealing with this mystery might be related to the received wisdom of studying sensorimotor processes and intentions in individual agents. Here we explore the use of an alternative approach, focused on understanding how we induce intentions in other people. Under the assumption that generating intentions in a third person relies on similar mechanisms to those involved in generating first-person intentions, this alternative approach might shed light on the origin of our own intentions. Therefore, we focus on the cognitive and cerebral operations supporting the generation of communicative actions, i.e. actions designed (by a Sender) to trigger (in a Receiver) the recognition of a given communicative intention. We present empirical findings indicating that communication requires the Sender to select his behavior on the basis of a prediction of how the Receiver will interpret this behavior; and that there is spatial overlap between the neural structures supporting the generation of communicative actions and the generation of first-person intentions. These results support the hypothesis that the generation of intentions might be a particular instance of our ability to induce and attribute mental states to an agent. We suggest that motor intentions are retrodictive with respect to the neurophysiological mechanisms that generate a given action, while being predictive with respect to the potential intention attribution evoked by a given action in other agents.

De Ruiter, J. P., Mitterer, H., & Enfield, N. J. (2006). Projecting the end of a speaker's turn: A cognitive cornerstone of conversation. Language, 82(3), 515-535.

A key mechanism in the organization of turns at talk in conversation is the ability to anticipate or PROJECT the moment of completion of a current speaker's turn. Some authors suggest that this is achieved via lexicosyntactic cues, while others argue that projection is based on intonational contours. We tested these hypotheses in an on-line experiment, manipulating the presence of symbolic (lexicosyntactic) content and intonational contour of utterances recorded in natural conversations. When hearing the original recordings, subjects can anticipate turn endings with the same degree of accuracy attested in real conversation. With intonational contour entirely removed (leaving intact words and syntax, with a completely flat pitch), there is no change in subjects' accuracy of end-of-turn projection. But in the opposite case (with original intonational contour intact, but with no recognizable words), subjects' performance deteriorates significantly. These results establish that the symbolic (i.e. lexicosyntactic) content of an utterance is necessary (and possibly sufficient) for projecting the moment of its completion, and thus for regulating conversational turn-taking. By contrast, and perhaps surprisingly, intonational contour is neither necessary nor sufficient for end-of-turn projection.

De Ruiter, J. P. (2004, May 25). On the primacy of language in multimodal communication. In Workshop Proceedings on Multimodal Corpora: Models of Human Behaviour for the Specification and Evaluation of Multimodal Input and Output Interfaces (pp. 38-41). Paris: ELRA – European Language Resources Association.

In this paper, I will argue that although the study of multimodal interaction offers exciting new prospects for Human-Computer Interaction and human-human communication research, language is the primary form of communication, even in multimodal systems. I will support this claim with theoretical and empirical arguments, mainly drawn from human-human communication research, and will discuss the implications for multimodal communication research and Human-Computer Interaction.

Christiansen, M. H., & Chater, N. (2008). Language as shaped by the brain. Behavioral and Brain Sciences, 31(5), 489-509.

It is widely assumed that human learning and the structure of human languages are intimately related. This relationship is frequently suggested to derive from a language-specific biological endowment, which encodes universal, but communicatively arbitrary, principles of language structure (a Universal Grammar or UG). How might such a UG have evolved? We argue that UG could not have arisen either by biological adaptation or non-adaptationist genetic processes, resulting in a logical problem of language evolution. Specifically, as the processes of language change are much more rapid than processes of genetic change, language constitutes a “moving target” both over time and across different human populations, and, hence, cannot provide a stable environment to which language genes could have adapted. We conclude that a biologically determined UG is not evolutionarily viable. Instead, the original motivation for UG – the mesh between learners and languages – arises because language has been shaped to fit the human brain, rather than vice versa. Following Darwin, we view language itself as a complex and interdependent “organism,” which evolves under selectional pressures from human learning and processing mechanisms. That is, languages themselves are shaped by severe selectional pressure from each generation of language users and learners. This suggests that apparently arbitrary aspects of linguistic structure may result from general learning and processing biases deriving from the structure of thought processes, perceptuo-motor factors, cognitive limitations, and pragmatics.

De Ruiter, J. P., & Levinson, S. C. (2008). A biological infrastructure for communication underlies the cultural evolution of languages [Commentary]. Behavioral and Brain Sciences, 31(5), 518.

Universal Grammar (UG) is indeed evolutionarily implausible. But if languages are just “adapted” to a large primate brain, it is hard to see why other primates do not have complex languages. The answer is that humans have evolved a specialized and uniquely human cognitive architecture, whose main function is to compute mappings between arbitrary signals and communicative intentions. This underlies the development of language in the human species.

De Ruiter, J. P. (2006). Can gesticulation help aphasic people speak, or rather, communicate? Advances in Speech-Language Pathology, 8(2), 124-127.

As Rose (2006) discusses in the lead article, two camps can be identified in the field of gesture research: those who believe that gesticulation enhances communication by providing extra information to the listener, and those who believe that gesticulation is not communicative, but rather facilitates speaker-internal word-finding processes. I review a number of key studies relevant to this controversy and conclude that the available empirical evidence supports the notion that gesture is a communicative device that can compensate for problems in speech by providing information gesturally. Following that, I discuss the finding by Rose and Douglas (2001) that making gestures does facilitate word production in some patients with aphasia. I argue that the gestures produced in the experiment by Rose and Douglas are not guaranteed to be of the same kind as the gestures that are produced spontaneously under naturalistic, communicative conditions, which makes it difficult to generalize from that particular study to gesture behavior at large. As a final point, I encourage researchers in the area of aphasia to put more emphasis on communication in naturalistic contexts (e.g., conversation) when testing the capabilities of people with aphasia.

De Ruiter, J. P. (2000). The production of gesture and speech. In McNeill, D. (Ed.), Language and Gesture (pp. 284-311). Cambridge: Cambridge University Press.

Research topics in the field of speech-related gesture that have received considerable attention are the function of gesture, its synchronization with speech, and its semiotic properties. While the findings of these studies often have interesting implications for theories about the processing of gesture in the human brain, few studies have addressed this issue in the framework of information processing.

In this chapter, I will present a general processing architecture for gesture production. It can be used as a starting point for investigating the processes and representations involved in gesture and speech. For convenience, I will use the term ‘model’ when referring to ‘processing architecture’ throughout this chapter.

Since the use of information-processing models is not believed by every gesture researcher to be an appropriate way of investigating gesture (see, e.g., McNeill 1992), I will first argue that information-processing models are essential theoretical tools for understanding the processing involved in gesture and speech. I will then proceed to formulate a new model for the production of gesture and speech, called the Sketch Model. It is an extension of Levelt's (1989) model for speech production. The modifications and additions to Levelt's model are discussed in detail. At the end of the section, the working of the Sketch Model is demonstrated, using a number of illustrative gesture/speech fragments as examples.

Subsequently, I will compare the Sketch Model with both McNeill's (1992) growth-point theory and the information-processing model by Krauss, Chen & Gottesman (this volume). While the Sketch Model and the model by Krauss et al. are formulated within the same framework, they are based on fundamentally different assumptions. A comparison between the Sketch Model and growth-point theory is hard to make, since growth-point theory is not an information-processing theory. Nevertheless, the Sketch Model and growth-point theory share a number of fundamental assumptions.

De Ruiter, J. P. (2017). The asymmetric redundancy of gesture and speech. In Church, R. B., Alibali, M. W., & Kelly, S. D. (Eds.), Why gesture? How the hands function in speaking, thinking and communicating. Amsterdam: John Benjamins Publishing Company.

A number of studies from recent decades have demonstrated that iconic gestures are shaped not only by our mental imagery but also, quite strongly, by structural properties of the accompanying speech. These findings are problematic for the central assumption in the Sketch Model (De Ruiter, 2000) about the function of representational gesture. I suggest a seemingly small but fundamental modification to the processing assumptions in the Sketch Model that not only accommodates the discussed empirical findings, but also explains many other well-known gesture phenomena. The new model also generates new and testable predictions regarding the relationship between gesture and speech.