“The best way to predict the future is to invent it”
– Alan Kay

“Don’t listen to the physics majors until you also check with the Vaudevillians.”
– Tom Notti, The Bubble Guy

In my introduction to Hypothia I briefly referred to a paradigm shift in the web that I wish to participate in. That paradigm shift as I imagine it is the change from information retrieval (IR) to information generation as the core technology for utilizing the web.

Web 1.0 was basically the infancy of the web, the Mosaic-Netscape-AltaVista web. Web 2.0 has ushered in a user-centric era, where the winners will be those who effectively repurpose the user data assets generated by agile services.

We can safely say that the web is no longer in its infancy but rather in its adolescence. The somewhat adolescent appeal of paradigmatic Web 2.0 applications (e.g., YouTube, MySpace) is no mere accident but rather a reflection of Silicon Valley’s youth-oriented culture of innovation and capital, which mistakes youth appeal for broad appeal. Mistaking it, or perhaps pushing it and making it so.

The good news about the mistaken horizontal appeal is that there’s still quite a lot of room for the web to grow in terms of utility and users. The world is far from wired. Heck, Sequoia’s grandmom probably isn’t too keen on “the computer nonsense” yet. And why on earth would she be, other than for photos of the grandkids? OK, Google may be usable by most grandmothers, and that usability has been every bit as important to Google’s success as its PageRank algorithm.

The trend towards Web 3.0, the Semantic Web, has already begun to sprout. Instead of social web applications with so-called “horizontal” appeal (“so-called”, because if you are over 40 and on Friendster, you might get some strange looks from other users), we are already beginning to see niche social tools. In other words, Web 2.0 tools are slowly becoming vertically oriented. Services such as LinkedIn, geared towards a semi-broad niche of white-collar professionals looking to “network”, are succeeding even as more specialized and vertically-oriented tools appear.

Tools such as LinkedIn, Gwagle and Trip usher in the Semantic Web simply because narrowing the content scope makes the content to be searched map better to meaning. In other words, the terms users type into these tools become less ambiguous as the content scope shrinks. And as terms map more reliably to their intended senses, such sites make ontology building and use increasingly feasible. We can see this, for example, on Trip, an evidence-based medical search tool that provides faceted search features and leverages the admittedly rudimentary MeSH ontology. (I would add that disambiguation is best performed by the contours of context rather than by any set of rules applied to document collections. This emergent nature of disambiguation, and the concomitant necessity for ambiguity in understanding, is best saved for a later discussion.)
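To make the narrowing concrete, here’s a toy sketch in Python (the sense inventory and the domain labels are invented for illustration, not drawn from Trip, MeSH, or any real tool): the same query string carries fewer candidate meanings once search is scoped to a single domain.

```python
# Hypothetical sense inventory, invented for illustration. The point:
# restricting the content scope shrinks the set of senses a search term
# can plausibly carry, so the term maps better to its intended meaning.
SENSES = {
    "stroke": {
        "general": ["cerebrovascular accident", "swimming stroke",
                    "golf stroke", "keystroke", "brush stroke"],
        "medicine": ["cerebrovascular accident"],
    },
}

def candidate_senses(term, domain="general"):
    """Return the senses a term can take within the given content scope."""
    return SENSES.get(term, {}).get(domain, [])

print(candidate_senses("stroke"))              # five senses on the open web
print(candidate_senses("stroke", "medicine"))  # one sense on a medical site
```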

But it’s not just the narrowing semantic spaces that help usher in the Semantic Web. It’s also the richer sets of user data, things like tags and search terms, applied to specific domains, that help automate user-responsive architectures, expanding the possibilities for advanced analytics and responsive content.

Another indication that the Semantic Web, in the sense of vertically-oriented semantic retrieval, is on its way is the work of George Miller’s research group at Princeton. Miller is renowned for a number of things, among them the creation and development of WordNet. Christiane Fellbaum, a colleague of Miller’s and long-time participant in the WordNet project, has apparently initiated work on a project called Medical WordNet. Unlike WordNet, Medical WordNet will benefit from the fact that it will be applied to a much narrower semantic space. It will add specialized terms not in WordNet while limiting the senses of, and relations between, terms shared with WordNet.
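For a feel of what Medical WordNet would prune away, here’s a small sketch against NLTK’s WordNet interface (it assumes nltk is installed and the WordNet corpus has been downloaded; the choice of the word “culture” is mine, purely for illustration):

```python
# Sketch: list the general-WordNet senses of a clinically useful term.
# Assumes: pip install nltk, then nltk.download("wordnet") once beforehand.
from nltk.corpus import wordnet as wn

for synset in wn.synsets("culture"):
    print(synset.name(), "-", synset.definition())

# General WordNet gives "culture" anthropological, artistic, and
# agricultural senses alongside the laboratory one. A Medical WordNet
# would, in effect, keep only the sense relevant to clinical text and
# add specialist terms that general WordNet lacks entirely.
```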

Yet another indication of the rise of the Semantic Web is the finalization of the XQuery standard along with the development of XML content servers. Simply put, why invest months learning, say, data warehousing and OLAP cubes, when you can just implement advanced linguistic representations in XML and query them in an amazingly simple scripting language? Further, with XML content servers such as Mark Logic or eXist, you can query document collections and synthesize new documents, taking pieces of multiple documents and assembling them together, bound only by the limits of XPath and whatever heuristics you can add.
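As a rough illustration of that document-synthesis idea, here’s a Python sketch using lxml’s XPath support; the file names, element names, and the evidence attribute are all hypothetical. An XML content server does essentially this, at scale and in XQuery, over whole collections.

```python
# Assemble a new XML document from fragments of several source documents,
# selected by XPath. File names and markup here are invented for the sketch.
from lxml import etree

SOURCES = ["paper1.xml", "paper2.xml", "paper3.xml"]  # hypothetical files

digest = etree.Element("digest")
for path in SOURCES:
    doc = etree.parse(path)
    # XPath picks out only the fragments we care about from each document.
    for finding in doc.xpath("//finding[@evidence='strong']"):
        entry = etree.SubElement(digest, "entry", source=path)
        entry.append(finding)  # graft the fragment into the new document

print(etree.tostring(digest, pretty_print=True).decode())
```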

But OK, so with Web 3.0 we will have basic semantics incorporated into our content and the ability to leverage meaning in order to find what we want to find. In the article in which he coined the phrase “Semantic Web,” Tim Berners-Lee speculated extensively about the possibility of using meaning-annotated content to make basic deductions. But while the infusion of meaning into information retrieval is well under way, the infusion of domain rules for drawing conclusions from that which is retrieved is not nearly so imminent.
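To see how small that gap is in principle, and how large in practice, here’s a toy deduction over hand-annotated facts; the triples, predicates, and the single rule are all invented for the sketch. The derived statement appears in no source document, which is precisely the step retrieval alone cannot take.

```python
# Toy forward inference over meaning-annotated facts (subject, predicate,
# object triples). Everything here is invented for illustration.
facts = {
    ("aspirin", "inhibits", "COX-1"),
    ("COX-1", "produces", "thromboxane"),
}

def infer(facts):
    """Apply one domain rule: if X inhibits Y and Y produces Z, X reduces Z."""
    derived = set()
    for (x, p1, y) in facts:
        for (y2, p2, z) in facts:
            if p1 == "inhibits" and p2 == "produces" and y == y2:
                derived.add((x, "reduces", z))
    return derived

print(infer(facts))  # {('aspirin', 'reduces', 'thromboxane')}
```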

So Web 3.0 will be yet another era of information retrieval in the literal sense. Finding information will be refined into domain-specific, context-limited, user-experience-friendly, meaning-aware, document-synthesizing retrieval. But without the maturation of reasoning, retrieval will remain nothing more than that: retrieval. Information processing tools will remain stuck on regurgitation, however elaborate the regurgitation may be.

With all of the power of retrieval extended and leveraged, and with the introduction of vertically-oriented ontology tools such as Medical WordNet, the next rush will be to develop systems that think. Well, thinking? No, not really thinking. Rather, with the application of context, domain, and user-tuned semantics to search, the need for domain-specific heuristics will become readily apparent. Instead of general question-answering systems that field questions of every kind, I imagine we’ll see services dedicated to solving single problems quite well.

What’s crucial about solving problems with information retrieval tools is that the output of rule-using systems is novel content. In other words, such tools are no longer merely finding existing content but rather creating new content. And the creation of useful information places a heavy burden on evaluating statements for their quality.

The long-term goal of Hypothia is to pursue the development of problem-specific information generation services with a particular eye on scientific discovery. Hypothia in short aims to become the first innovation service and help usher in Web 4.0. It’s far-off, far-fetched, far-out, and maybe a bit ridiculous as a vision, but someone’s got to create the future.
