The problems of the Semantic Web

The basic idea of the Semantic Web, Web 3.0 or Linked Data, is that, while the original Web is formed by links between documents, the Semantic Web is formed by links between data (called RDF links). But let's see why the Semantic Web still isn't going to bring Skynet to life:
The idea has not been sold well: the vision of the Semantic Web that Tim Berners-Lee promotes is purely altruistic, which is not a bad thing in itself, but being part of a global Web of data is not, by itself, a motivation for website owners to adopt semantic technologies. They are more interested in knowing how semantic technologies can bring traffic to their sites and how they can help users complete the actions that produce revenue. It is not only about recovering the development costs needed to implement semantic technologies, but about obtaining extra value and earnings for their business.
Complexity and heterogeneity of the technologies: today there are many different and complex semantic technologies; many of them will disappear, and the ones that survive will be those that are easier to use, that have tools that make adoption simple, and that integrate with the current Web. Some of these technologies, considering languages alone, are: RDFS, RDF, RDFa, N3, Turtle, OWL, SKOS, Microdata, Microformats, Hypernotation, JSON-LD, RDF/JSON and GRDDL. The sketch below shows the same data expressed in several of these syntaxes.
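To make the heterogeneity concrete, here is a minimal sketch using the rdflib Python library (assuming it is installed); the URIs are invented for illustration. The same three statements are parsed from Turtle and re-serialized as N-Triples and RDF/XML without their meaning changing at all.

```python
# Minimal sketch: the same RDF data parsed from Turtle and re-serialized
# in two other syntaxes. Requires the rdflib library (pip install rdflib);
# the example URIs are invented.
from rdflib import Graph

turtle_data = """
@prefix ex:   <http://example.org/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

ex:alice a foaf:Person ;
    foaf:name  "Alice" ;
    foaf:knows ex:bob .
"""

g = Graph()
g.parse(data=turtle_data, format="turtle")

print(g.serialize(format="nt"))   # N-Triples
print(g.serialize(format="xml"))  # RDF/XML
```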
High computational cost for queries: to get information from the global graph formed by the Web of data, we can follow the links one by one or use the query language SPARQL. The problem is that these queries may require jumping from one server to another while performing the most expensive operation in relational databases: the join. For instance, if a document states that concept A is related to concept B, we will normally need to check which other concepts are related to B in order to answer the query. If concept B is defined in another document on another server, we will have to fetch and analyze that document as well. If we extract data from hundreds of nodes in this way, we may get the answer to the query several hours later; this is why current graph database technologies try to keep in memory the largest graph they can build from the most relevant data sources. To try to solve the problem, computer scientists are applying Big Data technologies such as Hadoop, because they are designed to work with large amounts of data and to resolve queries in parallel with the MapReduce architecture. However, that approach still has to evolve into a real solution, so search engines cannot yet use the semantic part of the Web as a global database.
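As a rough sketch of what even a single remote query involves, the following example uses the SPARQLWrapper Python library (assuming it is installed) against the public DBpedia endpoint; the query itself is only illustrative. Each query like this is a round trip to a remote server, and a join that spans several endpoints multiplies these round trips.

```python
# Rough sketch of one query to a remote node of the Web of data, using
# SPARQLWrapper (pip install SPARQLWrapper) against the public DBpedia
# endpoint. Joining data across several endpoints means many round trips
# like this one.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setReturnFormat(JSON)
sparql.setQuery("""
    SELECT ?name WHERE {
        <http://dbpedia.org/resource/Tim_Berners-Lee>
            <http://xmlns.com/foaf/0.1/name> ?name .
    }
""")

results = sparql.query().convert()
for row in results["results"]["bindings"]:
    print(row["name"]["value"])
```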
SPAM: spam is going to be one of the bigger problems of the Semantic Web, because it is easy to introduce RDF links that establish false relations in order, for example, to lead a user into buying something. One proposed solution to this problem is to indicate the provenance of the link, since the technology allows it, so that we can know whether the link comes from a reliable source. The problem is what to do when that provenance information is false or missing.
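One way to record provenance is to keep each source's statements in its own named graph. Here is a minimal sketch with rdflib; the graph names and URIs are invented, and it only illustrates the idea that a consumer can accept or reject statements depending on where they come from.

```python
# Minimal sketch of attaching provenance to RDF statements with named
# graphs in rdflib. Graph names and URIs are invented: the point is that
# a consumer can filter statements by the graph (source) they belong to.
from rdflib import Dataset, URIRef, Literal, Namespace

EX = Namespace("http://example.org/")

ds = Dataset()

# Statements asserted by a source we trust.
trusted = ds.graph(URIRef("http://example.org/graphs/trusted-shop"))
trusted.add((EX.product42, EX.price, Literal(20)))

# A competing claim from an unknown (possibly spammy) source.
unknown = ds.graph(URIRef("http://example.org/graphs/unknown-blog"))
unknown.add((EX.product42, EX.price, Literal(1)))

# The consumer decides per graph (i.e. per source) what to believe.
for g in ds.contexts():
    for price in g.objects(EX.product42, EX.price):
        print(g.identifier, "claims the price is", price)
```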
Reliability and quality of the data: in RDF links we can state that a concept is equal to another one defined on another site. This is fine when the same concept is described from different points of view; the problem is that the link can be incorrect. For instance, the link may point to something that has the same name but a different meaning, or even to something completely unrelated, so an artificial intelligence analyzing it would have to use word sense disambiguation algorithms to discard the incorrect information. Another example: if the concept "Hydrogen" has a link saying that its chemical symbol is "H", and we find another link saying it is "Hy", which value should a machine take? If the vocabulary used by the link is well defined, the machine will know that Hydrogen can only have one symbol, but it will still have to apply some algorithm to decide which value is correct. Probably the best strategy is to look up more sources and analyze which information is correct, although this strategy can also fail.
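Here is a minimal sketch of the Hydrogen example, assuming an invented vocabulary in which chemicalSymbol is meant to have exactly one value per element; the program only detects the conflict, it does not decide which value is correct.

```python
# Minimal sketch of the "H" versus "Hy" conflict, using rdflib. The
# vocabulary is invented: we only assume that chemicalSymbol should have
# exactly one value per element, so two different values are a conflict.
from collections import defaultdict
from rdflib import Graph, Namespace, Literal

EX = Namespace("http://example.org/chem/")

g = Graph()
# Two sources contributed conflicting values for the same element.
g.add((EX.Hydrogen, EX.chemicalSymbol, Literal("H")))
g.add((EX.Hydrogen, EX.chemicalSymbol, Literal("Hy")))

# Group the values of the (supposedly single-valued) property per subject.
values = defaultdict(set)
for subject, _, symbol in g.triples((None, EX.chemicalSymbol, None)):
    values[subject].add(str(symbol))

# Flag conflicts; a real system would then try to resolve them, e.g. by
# checking more sources or preferring the most trusted one.
for subject, symbols in values.items():
    if len(symbols) > 1:
        print(f"Conflict for {subject}: {sorted(symbols)}")
```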
In a branch of artificial intelligence called natural language processing (NLP), knowledge representation models have been used since the discipline appeared. The Semantic Web is just one more knowledge representation model; its only difference is that it contains huge amounts of unreliable information, which nowadays takes a long time to query. So it seems unlikely that the Semantic Web will become the definitive knowledge representation model, or that it will solve all the problems of natural language processing algorithms. I am not saying it is useless, only that we have to take into account the ambiguous nature of a global knowledge representation model where anyone can collaborate and give their own opinions, which may or may not be wrong. So, to obtain new knowledge from the Web of data, it will be preferable to use fuzzy logic algorithms. This knowledge will only be true with a certain probability, and we cannot use first-order logic or other reasoning techniques that require the inferred knowledge to be 100% true. At least not over the whole Web, but only over the parts of it that contain reliable sources, as we can already do with several development frameworks.
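As a toy illustration of this probabilistic approach, the following sketch weights each asserted value by an assumed reliability of its source, instead of treating every statement as 100% true; all the numbers and source names are invented.

```python
# Toy sketch: score each candidate fact by the (assumed) reliability of
# the sources asserting it, so the result is a probability rather than an
# absolute truth value. All reliabilities and source names are invented.
sources = {
    "encyclopedia.example.org": 0.9,
    "random-blog.example.com": 0.3,
}

# Each source asserts a value for the chemical symbol of Hydrogen.
assertions = [
    ("encyclopedia.example.org", "H"),
    ("random-blog.example.com", "Hy"),
]

# Accumulate the reliability backing each candidate value.
scores = {}
for source, value in assertions:
    scores[value] = scores.get(value, 0.0) + sources[source]

total = sum(scores.values())
for value, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{value!r} is correct with estimated probability {score / total:.2f}")
```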