July 2007


Citation:

Cereb Cortex. 2005 Aug;15(8):1261-9. Epub 2005 Jan 5. 

The neural mechanisms of speech comprehension: fMRI studies of semantic ambiguity.

Rodd JM, Davis MH, Johnsrude IS.

Department of Psychology, University College London, UK. j.rodd@ucl.ac.uk

A number of regions of the temporal and frontal lobes are known to be important for spoken language comprehension, yet we do not have a clear understanding of their functional role(s). In particular, there is considerable disagreement about which brain regions are involved in the semantic aspects of comprehension. Two functional magnetic resonance studies use the phenomenon of semantic ambiguity to identify regions within the fronto-temporal language network that subserve the semantic aspects of spoken language comprehension. Volunteers heard sentences containing ambiguous words (e.g. ‘the shell was fired towards the tank’) and well-matched low-ambiguity sentences (e.g. ‘her secrets were written in her diary’). Although these sentences have similar acoustic, phonological, syntactic and prosodic properties (and were rated as being equally natural), the high-ambiguity sentences require additional processing by those brain regions involved in activating and selecting contextually appropriate word meanings. The ambiguity in these sentences goes largely unnoticed, and yet high-ambiguity sentences produced increased signal in left posterior inferior temporal cortex and inferior frontal gyri bilaterally. Given the ubiquity of semantic ambiguity, we conclude that these brain regions form an important part of the network that is involved in computing the meaning of spoken sentences. (My emphasis.)

 

Here we may have a possible biological locus for exactly the sort of phenomenon I was positing in my previous post. Interestingly enough, ambiguity seems to be a core process, and again we have evidence that language users actively engage with ambiguous language and that an important step in comprehension takes place before the input is ever disambiguated. In all likelihood, linguistic comprehension entertains multiple possibilities in parallel, visualizing several readings at once. This is probably responsible for so much of what makes poetry interesting and road signs uninteresting.

The inferior temporal cortex is a higher-level part of the ventral stream of the human visual system; the ventral stream handles the classification and identification of what we see. The adjacent inferior frontal gyrus contains Brodmann areas 44 and 45, non-visual areas heavily engaged in linguistic understanding; together they make up Broca's area, which is connected to Wernicke's area via the arcuate fasciculus.

One way to test (and possibly disprove) my present theory would be to trace the neural precursors of these differentiated brain areas through fetal development. Do human brains develop the visual system first? Do these linguistic areas develop out of visual tissue, or out of a wholly different set of neural tissues? Anyone know a neuroembryologist?


When reading some Steven Pinker a couple of years back, I wondered whether language could be better understood via sound, sentence, and vision rather than by words and rules, as Pinker suggests (see his Words and Rules). Rules seem to be elements of narration that we use, or rather abuse, to divine a neat model of causality. Yet there seems to be very little in biology that is genuinely rule-like. Biology is inherently anti-functional, at least in the strict mathematical sense of the word "function": cells and subcellular systems regularly appear to do different things given the same input. And that is assuming we can even tightly control an input to a biological system in any meaningful (read: in vivo) way. Weak and strong AI proponents would have us think that neurons are analogues of computer circuits, but the complexity of neural matter is hardly reducible to such a model without sacrificing crucial information.

Rules just don't seem inherent to language. Words, however, do seem on some level fundamental to it, certainly from a textual perspective. We can see evidence for this in many ways; in my own experience, the evidence comes from building word-based representations of document collections for various text mining experiments. But from an oral perspective, are words fundamental?
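To make that concrete, here is a minimal sketch (in Python) of the kind of word-level representation I mean: a bag of words weighted by tf-idf, which takes the word as the atomic unit of text. The toy corpus reuses the two example sentences quoted in the study above, and the helper functions are invented for illustration, not taken from any particular experiment.

```python
import math
from collections import Counter

def tokenize(text):
    """Crude tokenizer: lowercase and strip surrounding punctuation."""
    words = (w.strip(".,;:!?'\"()").lower() for w in text.split())
    return [w for w in words if w]

def tfidf_vectors(docs):
    """Return one {term: tf-idf weight} dict per document."""
    tokenized = [tokenize(d) for d in docs]
    df = Counter()                      # number of documents containing each term
    for tokens in tokenized:
        df.update(set(tokens))
    n_docs = len(docs)
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return vectors

# Toy corpus: the two example sentences from the study quoted above.
corpus = [
    "the shell was fired towards the tank",
    "her secrets were written in her diary",
]
for vector in tfidf_vectors(corpus):
    print(sorted(vector.items(), key=lambda kv: -kv[1])[:3])
```

Notice how completely such a representation discards word order, prosody, and everything continuous about the signal; that is precisely what makes the oral question interesting.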

Spoken language seems far more continuous than written language, not only from a processing standpoint but also from a sensory point of view. Spoken language is experienced and performed in a rather continuous way: sounds flow fluidly within sentences, and while words are deduced in the course of learning a language, it remains to be shown whether words are anything more than a narrative convenience for explaining how we understand language, rather than language itself. In the auditory experience of language, the coarsest and most distinct break is the break between sentences. But spoken language is not just continuous in the way it is serially composed and experienced as sound. It is also continuous in that it spreads across the sensory spectrum, from sound to vision. Inflection and gesture are essential to processing meaning, and that experience and interpretation are so thoroughly integrated and automatic that they operate the way intuition does.

While the fundamental descriptive unit of language seems to be the word (a description that generates itself through the appearances of language acquisition), the fundamental working unit of language seems to be the sequence of sounds: the sentence. The word “book”, or for that matter the sound of the word, has some basic meaning but no real rich semantics. What book? What's it doing? Where is it? What's in it? How thick is it? Do you even mean a thing with pages? Frankly, we have no idea which questions even make sense to ask in the first place. The word and the sound alike seem devoid of context, empty of a single complete thought. But once we launch into a sentence, the book comes to life, gaining at least a bare minimum of utility, a representation with some correspondence to reality. The sentence seems to be the first level at which language carries information.

But it seems that the sentence, the meaning-melody of a distinct thought, is composed more essentially of some visual representational content, something rudimentary and pre-experiential (children blind from birth seem to face no profound barriers to becoming healthy, fully literate adult language users). There seems to be something visual involved in language that is degenerative in nature, not generative: comprehension appears to break the continuous auditory signal down into something very roughly visual, and only then does the utterance become informative.

My take on such a process is really not so unusual; it sits close to one of the most important linguistic discoveries of the modern era. Wernicke believed that the input to both the language comprehension and language production systems was the “auditory word image.”

So here’s what I’m thinking. Language’s syntax is not fundamentally linguistic per se nor compositional but rather sensory (audio-visual) and decompositional. So I wonder, is there some sort of syntax for vision, some decompositional apparatus? Or are we just getting back into rule-sets?

I think we can understand something fundamental in this syntax between the sensory and the linguistic. Linguistic decomposition, which is really either auditory or visual decomposition, becomes visual composition in understanding. Likewise, the visual must be decomposed before it can be composed into a sentence.

In other words, if we knew the rules for visual decomposition, we could automatically compose descriptions of scenes. Likewise, we should be able to compose images from the decomposition of linguistic signals.

And how do we do that without rules or functions?


But language is not pure sign, it is also a thing. This exteriority (word as object rather than sense) is an irreducible element within the signifying scene. Language is tied to voice, to typeface, to bitmaps on a screen, to materiality. But graphic traces, visualizations are irreducible to words. Their interpretation is never fully controllable by the writing scientist.

– Timothy Lenoir and Hans Ulrich Gumbrecht,
from the introduction to the Writing Science series

I am hung up on a concern about the application of text mining to scientific discovery, one I seem unable to shake free of. The hang-up is the importance of visual analogy to scientific discovery and the rather trivial or secondary narration that follows it. That narrative content (see the narrative fallacy: explaining an event post hoc so that it seems to have a cause) is the very material that text mining seeks to leverage. Language is supposed to capture in some way the network of causes, many of them supposedly sufficient to help presage novel treatments, procedures, further explanations, and so on. But if the generative seed of discovery is visual analogy itself, then no amount of language-based reasoning, whether contextual, deductive, or inductive, can ever make new discoveries, because the explanation is not equivalent to the image.

And yet. And yet we know that we can make discoveries by deduction across multiple texts, as Don Swanson has repeatedly shown us. But Swanson's discoveries from disjoint literatures are marginal and hypothetical, and they remain in desperate need of empirical review. Mining disjoint literatures does not appear to be radically increasing the speed of scientific discovery, which suggests that the process of leveraging implicit multi-document logics is missing something essential.
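For readers unfamiliar with what "implicit multi-document logics" look like in practice, here is a bare-bones sketch of Swanson's ABC pattern: one literature links A to B, a disjoint literature links B to C, and the A-C connection is proposed as a hypothesis. The term pairs below are hand-written placeholders that merely echo Swanson's famous fish-oil and Raynaud's syndrome case; nothing here is mined from real text.

```python
from collections import defaultdict

def abc_hypotheses(ab_links, bc_links):
    """Swanson's ABC pattern: if one literature links A to B and a disjoint
    literature links B to C, propose an implicit A-C connection.
    Returns {(A, C): set of bridging B terms}."""
    b_to_c = defaultdict(set)
    for b, c in bc_links:
        b_to_c[b].add(c)
    hypotheses = defaultdict(set)
    for a, b in ab_links:
        for c in b_to_c.get(b, ()):
            hypotheses[(a, c)].add(b)
    return hypotheses

# Placeholder links echoing Swanson's fish-oil / Raynaud's syndrome example.
literature_1 = [("fish oil", "blood viscosity"),
                ("fish oil", "platelet aggregation")]
literature_2 = [("blood viscosity", "Raynaud's syndrome"),
                ("platelet aggregation", "Raynaud's syndrome")]

for (a, c), bridges in abc_hypotheses(literature_1, literature_2).items():
    print(f"hypothesis: {a} -> {c}  (via {', '.join(sorted(bridges))})")
```

Everything in that pipeline is verbal: terms, co-occurrences, chains of terms.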

I’ll venture a guess and say that pictures are missing.

If a picture is worth a thousand words, is the relation symmetric? That is to ask: given a thousand words, can we draw a picture? Could we, say, use hypothesis generation to augment the creation of the visual metaphors that seem crucial to scientific discovery? Alternatively, it may be that a picture is not inherently worth any word whatsoever, and that the inequivalence is symmetric.

Most pictures generated these days via automated means are entirely dimensionless, metaphorically speaking. Graphs, trees, constellations of points in a space. But what makes our understanding of constellations rich? Ahh yes, those stars in our southern summer sky appear to look like a scorpion, become known as Scorpius, and that’s how we remember those specks, and that’s how we use them as well. Memory, after all, is inseparable from use. And yet those stars are no more a scorpion than a snake or a lock of hair or whatever else you can make up.

So perhaps it's not enough to plot networks on a 2D screen. Why not compare those assemblages of seemingly random points to visual shapes? Why not retrieve the visual metaphor for an item automatically?
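As a toy illustration of what such an automatic comparison might involve, the sketch below matches a 2D layout of points against a few named template shapes using a crude, correspondence-free signature: a histogram of normalized pairwise distances. The templates, the layout, and the function names are all invented for this example; real shape matching would need far richer machinery (shape contexts, Procrustes analysis, and the like).

```python
import math
from itertools import combinations

def distance_signature(points, bins=8):
    """Correspondence-free shape signature: a histogram of pairwise
    distances, scaled so the largest distance falls in the top bin."""
    dists = [math.dist(p, q) for p, q in combinations(points, 2)]
    biggest = max(dists)
    hist = [0] * bins
    for d in dists:
        hist[min(int(d / biggest * bins), bins - 1)] += 1
    return [h / len(dists) for h in hist]

def nearest_shape(layout, templates):
    """Name of the template whose signature is closest (L1 distance)."""
    sig = distance_signature(layout)
    def l1(name):
        return sum(abs(a - b) for a, b in
                   zip(sig, distance_signature(templates[name])))
    return min(templates, key=l1)

# Invented template shapes and an invented node layout, for illustration only.
templates = {
    "line": [(i, 0) for i in range(8)],
    "ring": [(math.cos(2 * math.pi * k / 8), math.sin(2 * math.pi * k / 8))
             for k in range(8)],
    "blob": [(0.0, 0.0), (0.1, 0.05), (-0.05, 0.1), (0.05, -0.1),
             (0.0, 0.12), (-0.1, -0.05), (0.08, 0.08), (-0.08, 0.02)],
}
theta = math.radians(20)
layout = [(10 + 3 * math.cos(2 * math.pi * k / 8 + theta),
           4 + 3 * math.sin(2 * math.pi * k / 8 + theta))
          for k in range(8)]              # the same eight-point ring, shifted, rotated, scaled
print(nearest_shape(layout, templates))   # prints "ring": the signature ignores position, rotation, scale
```

Of course, naming a constellation a scorpion does something far richer than this, but even so crude a matcher hints at how a layout could be handed a visual metaphor automatically.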

This, however, is utterly unconvincing on its own. There is no way, for example, that special relativity could have been arrived at in such a way. And yet, hold on just a second: elements of the discovery of special relativity were in part the result of a visual search activity, with Einstein imagining many rich ways of illustrating prior mathematical expressions and testing those illustrations to measure their utility, their usability, their ability to survive multiple looks and provide a rich metaphor capturing the scientific phenomenon. He then used those images to tell further stories, and those stories to generate more mathematical expressions. A picture is worth a thousand words, and a thousand words is worth many pictures.