Yesterday, I attended a research seminar at the « Université de Paris 8 » given by Pierre Levy, a philosopher, professor and head of the collective intelligence chair at the University of Ottawa, Canada. He presented the latest developments in his work on IEML, which stands for Information Economy Meta Language. Things are taking shape on that front, and this presentation gave me the opportunity to better understand how IEML compares to the technologies of the Semantic Web (SW).
IEML: not another layer on top of the SW cake
IEML is proposed as an alternative to SW ontologies. In SW, the basic technology is the URI (Uniform Resource Identifier), which uniquely (and hopefully permanently) identifies concepts (« resources »). Triples then combine these URIs into assertions, which in turn form a graph of meaning that is called an ontology. IEML introduces identifiers which are not URIs. The main difference between URIs and IEML identifiers is that IEML identifiers are semantically rich: they carry meaning. From a given IEML identifier, one could derive some (or ideally all?) of the semantics of the concept it identifies. Indeed these identifiers are composed of 6 semantic primitives. These 6 primitives are Emptiness, Virtual, Actual, Sign, Being, Thing (E, V, A, S, B and T) and were chosen to be as universal as possible, i.e. not dependent on any specific culture or natural language. The IEML grammar is a way to combine these primitives and logically build concepts with them (also using the notion of triple-based graphs). These primitives are comparable to the 4 bases of DNA (A, C, T and G) that are combined into a complex polymer (DNA) : with a limited alphabet, IEML can express an astronomically huge number of concepts in the same way the 4-letter alphabet of DNA can express a huge number of phenotypes.
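To get a feel for how fast a 6-symbol alphabet blows up, here is a rough back-of-the-envelope sketch in Python. It is a toy model, not the IEML grammar: it assumes that every concept above the primitives is a triple of three concepts taken from the level just below, and that every combination is valid. The function name count_concepts and the one-letter symbols (with B for Being) are my own choices for illustration.

```python
# Toy model only (an assumption, not the actual IEML grammar):
# level 1 = the 6 primitives; each higher level is a triple of
# three concepts from the level just below, all combinations allowed.
PRIMITIVES = ("E", "V", "A", "S", "B", "T")


def count_concepts(level, alphabet_size=len(PRIMITIVES)):
    """Number of distinct concepts at a given level in the toy model."""
    if level < 1:
        raise ValueError("level must be >= 1")
    count = alphabet_size
    for _ in range(level - 1):
        count = count ** 3  # a triple has 3 independent slots
    return count


if __name__ == "__main__":
    for level in range(1, 5):
        print(level, count_concepts(level))
```

Under these assumptions, level 2 already holds 6**3 = 216 combinations and level 4 holds 6**27. The real grammar surely constrains which combinations are meaningful, so these figures are only upper bounds for the toy model.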
Meaningfulness of identifiers
When I realized that the meaningful IEML identifiers play a role similar to that of URIs, my first reaction was horror. I have struggled for years against « old-school » IT workers who tend to rely on database keys for deriving properties of records. In a former life in the IT department of a big industrial corporation, I was highly paid to design and impose a meaningless unique person identifier in order to uniquely and permanently identify the 200 000 employees and contractors of that multinational company in its corporate directory. The main advantage of meaningless identifiers is probably that they can be permanent: you don’t have to change the identifier of an object (of a person for instance) when some property of this object changes over time (the color of the person’s hair, or Miss Dupont getting married and becoming Mrs Durand while still keeping the same corporate identifier).
The same is true for URIs whenever it is feasible: if a given resource is to change over time, its URI should not be dependent on its variable property (http://someone.com/blond/big/MissDurand having to change into http://someone.com/white/big/MissesDupont is a bad thing).
The same may not be true when concepts (not people) are to be identified. Concepts are supposed to be permanent and abstract things with IEML (as in the SW I guess). If some meaningful semantic component of a given concept changes then… it’s no longer the same concept (even though we may keep using the same word in a natural language in order to identify this derived concept).
In the old days, IT workers used to put meaning into identifiers so that (database) records could more easily be managed by humans, especially during tasks like visually classifying or sorting records in a table or getting an immediate overview of what a given record is about. But this came to be seen as bad practice once the cost of storage (having specific fields for properties that used to be stored as part of a DB key) and the cost of computation (getting a GUI for querying/filtering a DB based on properties) went down. More often than not, the meaningful key was not permanent, and this had perverse effects: a record had to be assigned a new key when some property changed, and human errors had to be managed when the properties « as seen in the key » were no longer in sync with the « real » properties of the record according to some field.
That’s probably part of the rationale behind the best practices in URI design and web architecture: a URI should be as permanent as possible, I guess, so that it does not change when the properties of the resource it identifies change over time. Web architectures are thus made more robust over time.
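The corporate directory story can be sketched in a few lines of Python. This is only an illustration of the general point: the Person class, its fields and the meaningful_key function are hypothetical, not the design I actually deployed.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Person:
    """Hypothetical directory record, for illustration only."""
    family_name: str
    hair_color: str
    # Opaque identifier: assigned once, derived from no property.
    person_id: str = field(default_factory=lambda: uuid.uuid4().hex)


def meaningful_key(person):
    """An 'old-school' key that encodes properties of the record."""
    return f"{person.hair_color}/{person.family_name}"


employee = Person(family_name="Dupont", hair_color="blond")
opaque_before = employee.person_id
meaningful_before = meaningful_key(employee)

# Time passes: the employee marries and her hair turns white.
employee.family_name = "Durand"
employee.hair_color = "white"

print(employee.person_id == opaque_before)          # True: still valid
print(meaningful_key(employee) == meaningful_before)  # False: key broke
```

Every reference stored elsewhere under the meaningful key would now have to be updated, whereas the opaque identifier still points at the same person.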
With IEML, we are back to the ol’ times of meaningful identifiers. Is it such a bad thing ? Probably not, because the power of IEML lies in the meaningfulness of these identifiers, which allows all sorts of computational operations on the concepts. Anyway, that’s probably one of the biggest basic differences between IEML and the SW ontologies.
Matching concepts with IEML
Another aspect of IEML struck me yesterday: IEML gives no magic solution to the problem of mapping (or matching) concepts together. In the SW universe, there is this recurring issue of getting two experts or ontologies to agree on the equivalence of 2 resources/concepts: are they really the same concept expressed with distinct but equivalent URIs ? or are they distinct concepts ? How to resolve semantic ambiguities ? Unless we get a solution to this issue, the grand graph of semantic data can’t be universally unified and people get isolated on semantic islands which are nothing more than badly interconnected domain ontologies. This is called the problem of semantic integration, ontology mapping, ontology matching or ontology alignment.
A couple of years ago, I hoped that IEML would solve this issue. IEML being such a regular and to-be-universal language, one could project any concept onto the IEML semantic space and obtain the coordinates (identifier) of this concept in this space. A second person or expert or ontology could also project their own concepts. Then it would just be a matter of calculating the distance between these points in the IEML space (IEML provides ways of calculating such distances). And if the distance was below some threshold, the 2 concepts could then be considered as equivalent for a given pragmatic purpose.
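What I had in mind looked roughly like the Python sketch below. Everything in it is made up for illustration: the 3-number coordinates, the threshold and the use of plain Euclidean distance are stand-ins, since I don't know how the actual IEML distances are computed.

```python
import math


def same_concept(coords_a, coords_b, threshold):
    """Treat two projected concepts as equivalent when they are close.

    Euclidean distance is only a stand-in for whatever distance
    IEML actually defines; the coordinates are hypothetical too.
    """
    return math.dist(coords_a, coords_b) < threshold


# Made-up coordinates chosen by two different experts.
banana_expert_1 = (0.20, 0.70, 0.10)
banana_expert_2 = (0.25, 0.65, 0.10)
apple_expert_2 = (0.90, 0.10, 0.40)

print(same_concept(banana_expert_1, banana_expert_2, threshold=0.1))  # True
print(same_concept(banana_expert_1, apple_expert_2, threshold=0.1))   # False
```

The whole scheme stands or falls on the two experts projecting their concepts into the same space in a comparable way, and on someone choosing a sensible threshold for the purpose at hand.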
But yesterday, I realized that the art of projecting concepts into the IEML space (i.e. assigning an identifier to a concept) is very subjective. Even though a Pierre Levy could propose a 3000-concept dictionary that assigns IEML coordinates (identifiers) to concepts that are also identified by a short natural language sentence (like in a classic dictionary), this would not prevent a Tim Berners-Lee from coming up with a very different dictionary that assigns different coordinates to the same described concepts. Thus the distance between a Pierre-Levy-based IEML word and a TBL-based IEML word would be … meaningless.
In the SW, there is a basic assumption that anyone may come up with a different URI for the same concept, and the URIs have to be associated via a « same as » property so that they are said to refer to the very same concept. When you deal with whole sets of URIs (2 ontologies for instance), you then have to match the URIs which refer to the same concepts. You have to align these ontologies. This can be a very tedious, manual and tricky process. The SW does not unify concepts. It only provides a syntax to represent and handle them. Humans still have to interpret them and match them together when they want to communicate with each other and agree on the meaning that these ontologies carry.
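Once humans have declared the « same as » links, the mechanical part is easy. Here is a minimal Python sketch that groups URIs into sets of equivalent identifiers with a small union-find. The example.org URIs and the function name same_as_classes are invented for the example; it does not use any real SW library.

```python
def same_as_classes(pairs):
    """Group URIs into equivalence classes from (uri_a, uri_b) pairs.

    Each pair stands for a hand-made 'same as' assertion.
    """
    parent = {}

    def find(uri):
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # path halving
            uri = parent[uri]
        return uri

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two classes

    classes = {}
    for uri in list(parent):
        classes.setdefault(find(uri), set()).add(uri)
    return list(classes.values())


# Hypothetical assertions produced by a manual alignment.
assertions = [
    ("http://example.org/onto1#Banana", "http://example.org/onto2#Banane"),
    ("http://example.org/onto2#Banane", "http://example.org/onto3#MusaFruit"),
    ("http://example.org/onto1#Apple", "http://example.org/onto2#Pomme"),
]

for group in same_as_classes(assertions):
    print(sorted(group))
```

The code only propagates decisions that people have already made. Deciding which pairs belong in the list is the tedious and tricky part.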
The same is more or less true with IEML. With IEML, identifiers are not arbitrarily defined (meaningful identifiers) whereas SW URIs are almost arbitrarily defined (meaningless identifiers). But the meaningful IEML identifiers only carry human meaning if they refer to the same (or similar) human/IEML dictionary.
Hence it seems to me that IEML is only valuable if some consensus exists about how to translate human concepts into the IEML space. It is only valuable to the extent that there is some universally accepted IEML dictionary, at least for basic concepts (primitives and simple combinations of IEML primitives). The same is true in the universe of SW technologies, and there are some attempts at building « top ontologies » that are proposed as shared references for ontology builders to align their own ontologies with. But the alignment process, even if theoretically made easier by the existence of these top ontologies, is still tricky, tedious and costly. And the critical mass has not been reached in sharing the use of such top ontologies. There is no top consensus to refer to.
Pierre Levy proposes a dictionary of about 3000 IEML words (identifiers) that represent almost all possible low-level combinations of IEML primitives. He invites people to enhance or extend his dictionary, or to come up with their own dictionaries. Let’s assume that only minor changes are made to the basic Pierre Levy dictionary. Let’s also assume that several conflicting dictionary extensions are made for more precise concepts (higher-level combinations of IEML primitives). Given that these conflicting extensions still share a basic foundation (the basic Pierre Levy dictionary), would the process of comparing and possibly matching IEML-expressed concepts be made easier ? Even though IEML does not give any automagical solution to the problem of ontology mapping, I wonder whether it makes things easier or not.
In other words, is IEML a superior alternative to SW ontologies ?
Apples and bananas
Yesterday, someone asked: « If someone assigns IEML coordinates to the concept of bananas, how will these coordinates compare to the concept of apples ? » The answer did not satisfy me because it was along the lines of : « IEML may not be the right tool for comparing bananas to apples. ». I don’t see why it would be more suitable for comparing competencies to achievements than for comparing bananas to apples. Or I misunderstood the answer. Anyway…
Pierre Levy has put much effort into describing the properties of his abstract IEML space so that IT programmers could start programming libraries for handling and processing IEML coordinates and operations. There is even a programming language being developed that allows semantic functions and operations to be applied to IEML graphs and allows quantities (economic values, energy potentials, distances) to flow along IEML-based semantic graphs. Hence the name of Information Economy.
So there are (or will soon be) tools and services for surviving in the IEML space. But I strongly feel that there is a lack of tools for moving back and forth between the world of humans and the IEML space. How would you say « bananas » in IEML, assuming this concept is not already in a consensual dictionary ?
As far as I understand, the process of assigning IEML coordinates to the concept of « bananas » is somehow similar to the process of guessing the « right » (or best?) Chinese ideogram for bananas. I don’t speak Chinese at all. But I imagine one would have to combine existing ideograms that would best describe what a banana is. For instance, « bananas » could be written with a combination of the ideograms that mean « fruits of herbaceous plants that are cultivated throughout the tropics and grow in hanging clusters ». It could also be written with a combination of the ideograms that mean « fruits of the plants of the genus Musa that are native to the tropical region of Southeast Asia and Australia ». Distinct definitions of bananas could refer to distinct combinations of existing IEML concepts (fruits + herbaceous plant + hanging clusters + tropics or fruits + plants + genus Musa + Southeast Asia + Australia). Would the resulting IEML coordinates be far away from each other ? Could a machine infer that these concepts are closely related if not practically equivalent to each other ? How dependent would the resulting distance be on conflicts or errors in underlying IEML dictionaries ?
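To make the worry concrete, here is a naive Python sketch that compares the two banana definitions above as plain sets of component concepts. The Jaccard index is my own stand-in for a distance, not something IEML prescribes, and the component labels are just the English words from the two definitions.

```python
def jaccard(a, b):
    """Share of components two definitions have in common (0.0 to 1.0)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty definitions are trivially identical
    return len(a & b) / len(a | b)


# The two candidate decompositions of « bananas » discussed above.
definition_1 = {"fruit", "herbaceous plant", "hanging clusters", "tropics"}
definition_2 = {"fruit", "plant", "genus Musa", "Southeast Asia", "Australia"}

print(jaccard(definition_1, definition_2))  # 0.125
```

Only « fruit » is shared out of 8 distinct components, so a purely syntactic comparison rates these two descriptions of the very same fruit as quite far apart. A measure that knows « herbaceous plant » is a kind of « plant » would do better, but that knowledge has to come from a shared dictionary.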
I ended the day with this question in my mind: How robust is the IEML translation process to human conflicts, disagreements and errors ? Is it more robust than the process of building and aligning SW ontologies ? Its robustness seems to me to be the main determining factor in the feasibility of the new collective-intelligence-based civilization Pierre Levy promises. If only there were a paper comparing this process to what the SW already provides, I guess people would realize the value of IEML.
Let’s play with IEML, bananas and apples. Using the IEML dictionary, I found some interesting IEML words and their English equivalents:
– fruit is written « fd » ; it is a « relation » (i.e. it is a « level four » concept i.e. it is a triple of triples of triples of IEML primitives) ; the source of this triple is « life » and its destination is « truth », the predicate of the triple is empty (or is it emptiness?)
– « apple » is not in the dictionary, but I guess it would be related to both « fruit » and « tree » (possibly also to yellow? or red ? or err… don’t know)
– since bananas don’t grow on trees but on herbaceous plants, I guess the IEML word for bananas would not refer to « tree » (« df » in IEML) but to « plant » (« sf » in IEML) ; as combinations of the relations « fruit » and « plant » or « tree », apples and bananas must be level five (or higher) concepts, namely cycles or ideas… bananas are probably not paradigms (level seven concepts) !
There should be a wiki-based IEML dictionary: speaking IEML is fun !
Edit: Oops, there already is an IEML wiki where consensus can be built around IEML translations and which feeds the ieml.org dictionary.