I am hung up on a concern about the application of text mining to scientific discovery, one I seem unable to shake free of. The hang-up comes from the importance of visual analogy to scientific discovery and the rather trivial, secondary narration that follows it. That narrative content (see the narrative fallacy: explaining an event post hoc so that it seems to have a cause) is the very material that text mining seeks to leverage. Language is supposed to capture in some way the network of causes, many of them supposedly sufficient to help presage novel treatments, procedures, further explanations, and so on. But if the generative seed of discovery is visual analogy itself, no amount of language-based reasoning, whether contextual, deductive, or inductive, can ever make new discoveries. Because the explanation is not equivalent to the image.

And yet. And yet we know that we can make discoveries by deductions from multiple texts, as Don Swanson has repeatedly shown us. But Swanson’s discoveries using disjoint literatures are marginal and hypothetical and remain in desperate need of empirical review. Disjoint literatures don’t appear to be radically increasing the speed at which scientific discovery is made, which means that the process of leveraging implicit multi-document logics is missing something essential.

I’ll venture a guess and say that pictures are missing.

If a picture is worth a thousand words, is the relation symmetric? That is to ask: given a thousand words, can we draw a picture? Could we, say, use hypothesis generation to augment the creation of the visual metaphors apparently crucial to scientific discovery? Alternatively, perhaps a picture is not inherently worth any particular words at all, and the inequivalence runs both ways.

Most pictures generated these days via automated means are entirely dimensionless, metaphorically speaking. Graphs, trees, constellations of points in a space. But what makes our understanding of constellations rich? Ahh yes, those stars in our southern summer sky look like a scorpion, so they become known as Scorpius, and that's how we remember those specks, and that's how we use them as well. Memory, after all, is inseparable from use. And yet those stars are no more a scorpion than a snake or a lock of hair or whatever else you can make up.

So it's perhaps not enough to plot networks on a 2D screen. Why not compare those assemblages of seemingly random points to visual shapes? Why not retrieve the visual metaphor for an item automatically?
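As a minimal sketch of what such automatic retrieval might look like, the snippet below matches a 2D point set against a small library of named template shapes using orthogonal Procrustes alignment, which removes translation, scale, and rotation before comparing. Everything here is illustrative: the template names, the sample data, and especially the assumption that points come with a one-to-one correspondence to template points, which a real system would have to establish first.

```python
import numpy as np

def procrustes_distance(a, b):
    """Disparity between two 2D point sets after removing translation,
    scale, and rotation (orthogonal Procrustes). Assumes the i-th point
    of `a` corresponds to the i-th point of `b` -- a simplification."""
    a = np.asarray(a, float) - np.mean(a, axis=0)   # remove translation
    b = np.asarray(b, float) - np.mean(b, axis=0)
    a /= np.linalg.norm(a)                          # remove scale
    b /= np.linalg.norm(b)
    _, s, _ = np.linalg.svd(a.T @ b)                # optimal rotation
    return 1.0 - s.sum() ** 2                       # 0 = identical shape

# Hypothetical "metaphor library": named template shapes.
templates = {
    "line": [(x, 0.0) for x in range(5)],
    "arc":  [(np.cos(t), np.sin(t)) for t in np.linspace(0.0, np.pi, 5)],
}

# A noisy, roughly collinear assemblage of points.
points = [(0, 0.0), (1, 0.05), (2, -0.04), (3, 0.02), (4, 0.0)]
best = min(templates, key=lambda name: procrustes_distance(points, templates[name]))
```

Here `best` names the template whose shape is nearest to the point assemblage; the shape label, not the raw coordinates, would then serve as the memorable handle, the way "Scorpius" serves for its stars.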

This, however, is utterly unconvincing. There's no way, for example, special relativity could be arrived at in such a way. And yet, hold on just a sec: elements of the discovery of special relativity were in part the result of a visual search, Einstein imagining many rich ways of illustrating previous mathematical expressions and testing those illustrations to measure their utility, their usability, their ability to survive multiple looks and provide a rich metaphor capturing the scientific phenomenon. And then using those images to tell further stories, and then using those stories to generate more mathematical expressions. A picture is worth a thousand words, and a thousand words is worth many pictures.