This started out as a reddit reply to Alex Iskold's article about semantic search engine technology. Alex suggests two reasons why we haven't seen the semantic web overtake today's search technology: semantic search engines don't provide better search results today, and the current ones have UI problems. At the same time, he claims (with respect to the search results) that "Semantic search is going to be big and it is going to help us answer questions that we simply cannot answer today - complex, inferencing queries asked over the entire web as if it was a database." I think that the UI point is irrelevant, or perhaps only relevant at this point in time.
I agree with the point about search results, but I'm much more sceptical about what results we can expect in the future.
Here's the simple reason: I'm sure that people will never annotate enough of their webpages. At best, that's a task for specialists in search-engine optimization, if the availability of semantic metadata someday helps to get a better search rank.
I see one chance for semantic annotation, namely in applications, which may be utilized by whoever wins the semantic search engine war (or make that "survives" instead of "wins"): what Freebase does nowadays is ultimately a task for web-application providers. I don't only mean the big players like, say, Amazon, Wikipedia and eBay, but even more the content providers like newspapers, publishers and the like. The major problem is that, although one might suspect otherwise, publishers today still don't use structured data formats as much as we would imagine, or as much as would be necessary to start adding semantic annotation. Sure, there are exceptions to the rule, and perhaps these publishers will last longer than the others. Introducing structured information in content publishing is hard, especially if you're running a successful business without it (fiction publishers, for instance) and there seems to be no easy way to figure out how you'll get a return on the investment.
So we have a chicken-and-egg situation: if we had a semantic web today, it would be easy for content providers to benefit from providing semantic information. Without it, it's hard for semantic search engines to show any benefit and for content providers to even start providing such information. The problem is even greater because, as of today, there is no standard solution to the problem of how to structure your data. Sure, there's RDF and OWL, but there are also a dozen other techniques. And even worse, RDF and OWL are just skeletons, just like SGML or XML -- we may have a common alphabet, but everybody uses it to make up their own vocabulary or language.
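To make the "common alphabet, different vocabulary" point concrete, here's a minimal sketch in plain Python. It is not real RDF tooling: the two publishers, the predicate names (`a:writtenBy`, `b:author`, `shared:author` and so on) and the alignment table are all made up for illustration. Both sources use the same subject-predicate-object shape, yet a query written against one vocabulary misses the other source until somebody maintains a mapping by hand.

```python
# Two hypothetical publishers describe the same facts as
# subject-predicate-object triples, each with its own predicate names.
publisher_a = [
    ("book:moby-dick", "a:writtenBy", "person:herman-melville"),
    ("book:moby-dick", "a:yearPublished", "1851"),
]
publisher_b = [
    ("book:moby-dick", "b:author", "person:herman-melville"),
    ("book:moby-dick", "b:pubDate", "1851"),
]


def objects_of(triples, subject, predicate):
    """Return every object stored for the exact (subject, predicate) pair."""
    return [o for s, p, o in triples if s == subject and p == predicate]


# Hand-written mapping from each publisher's predicates to a shared one
# (an assumption of this sketch: someone has to write and maintain it).
ALIGNMENT = {
    "a:writtenBy": "shared:author",
    "b:author": "shared:author",
    "a:yearPublished": "shared:published",
    "b:pubDate": "shared:published",
}


def normalize(triples, alignment):
    """Rewrite predicates into the shared vocabulary; keep unknown ones as-is."""
    return [(s, alignment.get(p, p), o) for s, p, o in triples]


merged = publisher_a + publisher_b

# A query phrased in publisher A's vocabulary only sees A's data.
only_a = objects_of(merged, "book:moby-dick", "a:writtenBy")

# After alignment, the same question reaches both sources.
aligned = objects_of(normalize(merged, ALIGNMENT), "book:moby-dick", "shared:author")

print(len(only_a), len(aligned))
```

The mapping is trivial for two sources and four predicates, but it grows with every new vocabulary, which is exactly the work nobody has an obvious incentive to do.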
Hence I believe that in order to make semantic search work you need to solve a different task than annotating data, a task mainly for the search engine makers: they need to figure out how to build their structured or semantic data from natural language and other data in their input set, i.e. the raw web pages. This is where the real problem lies, not in query analysis. And while Alex sweeps over NLP as if it were a problem already solved, it surely isn't: it's unbelievably hard, especially if what you ultimately aim for is understanding and not only, say, syntactic processing. Have a look at conference proceedings from, say, the artificial intelligence or computational linguistics camp: progress is very, very slow and always focussed on some specialized aspect of "understanding".
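As a toy illustration of the gap between syntactic processing and understanding, here's a sketch of about the most naive extractor one could write: a single hand-made pattern that turns "X wrote Y" into a triple. The pattern, the `author_of` predicate and the example sentences are my own assumptions for illustration; no real search engine is claimed to work this way.

```python
import re

# A "name" is assumed to be one or more capitalized words separated by spaces.
NAME = r"[A-Z][\w-]*(?: [A-Z][\w-]*)*"

# One hand-written surface pattern: "<Name> wrote <Name>".
WROTE = re.compile(rf"({NAME}) wrote ({NAME})")


def extract(sentence):
    """Return (subject, 'author_of', object) triples found by the pattern."""
    return [(m.group(1), "author_of", m.group(2)) for m in WROTE.finditer(sentence)]


# The pattern works when the sentence has exactly the expected shape.
hit = extract("Herman Melville wrote Moby-Dick in 1851.")

# The same fact in the passive voice is missed entirely.
miss = extract("Moby-Dick was written by Herman Melville.")

# A sentence with the right shape but a different meaning yields a wrong fact.
wrong = extract("Melville wrote Hawthorne a letter.")

print(hit, miss, wrong)
```

The first sentence produces the intended triple, the second produces nothing, and the third claims that Melville is the author of Hawthorne. You can keep adding patterns, but each one only covers another surface form; none of them knows what "wrote" means.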
Also, in contrast to what Alex seems to suggest, I'm not convinced that Google will be losing the semantic search engine war when it begins -- they have a great bunch of NLP people working for them. Sure, from what we hear, it's all statistics, but that doesn't mean you can't do semantic analysis with it (see "latent semantic analysis", for instance). And I'm sure their NLP people know about approaches that combine GOFAI and statistics. So dismissing Google as ignoring the semantic web is probably a misconception.
So, yes, the task that Hakia or Powerset are undertaking is indeed a huge and hard one. But if you solve the problem of semantic analysis of billions of web pages, adding semantic query analysis is a piece of cake.