What I want is to build an ontology for use with common-sense inference. An ontology induces a distance measure amongst words (or concepts), which can be used to distribute the concepts inside a high-dimensional sphere, with the concept of "everything" at the center. This will be useful for my "matrix trick" (which I have explained in some slides, but I'll write about it later in this blog).
So, what I had in my mind was an ontology that looked like this (I just made this up):
But I quickly found out that the reality of ontologies is very different from what I expected!
The first problem is that WordNet has too many words, whereas I want to load an upper ontology into main memory and still be able to run inference searches over it. So my idea is to mine the ontology on demand: for example, if I need an upper ontology covering 1000 basic English words, I can construct it by looking up just the relevant relations in WordNet.
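As a sketch of what I mean (using NLTK's WordNet interface; the word list and the depth cutoff are just illustrative stand-ins for the real 1000-word list):

```python
# Sketch: mine a small "upper ontology" on demand from WordNet via NLTK.
# Requires: pip install nltk, then nltk.download('wordnet') once.
from nltk.corpus import wordnet as wn

def mine_upper_ontology(words, depth=2):
    """Collect (hyponym, hypernym) edges reachable from `words` within `depth` steps."""
    edges = set()
    frontier = {syn for w in words for syn in wn.synsets(w)}
    for _ in range(depth):
        next_frontier = set()
        for syn in frontier:
            for hyper in syn.hypernyms():
                edges.add((syn.name(), hyper.name()))
                next_frontier.add(hyper)
        frontier = next_frontier
    return edges

basic_words = ["cat", "dog", "house", "water", "good"]  # stand-in for the 1000 basic words
for child, parent in sorted(mine_upper_ontology(basic_words, depth=1)):
    print(child, "->", parent)
```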
The idea seemed nice, but I found out that WordNet has too many hypernyms (= super-classes) for nouns and almost no hypernyms for adverbs. For example, look at this visualization of the hypernyms of "cat" and "dog":
We have some surprises such as "a dog is a kind of unpleasant woman".
Also, some senses are not what we expect: "cat" can mean a whip, and "dog" a mechanical device. Because WordNet has no fuzziness or probabilities, it is hard to distinguish common from uncommon usage.
The red lines are links that connect "cat" to "dog". There is the obvious connection via "carnivore" but there are also 2 other connections based on slang usage.
This is just the visualization of 2 iterations of hypernyms. If we continue like this, the set of hypernyms will grow very large before it starts to converge (because everything ultimately ends up under "entity").
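To see where those odd senses come from, it is enough to list every WordNet sense of "cat" and "dog" together with its direct hypernyms; a minimal NLTK sketch (the truncation of definitions is only for readable output):

```python
# Sketch: list every WordNet sense of "cat" and "dog" plus its direct hypernyms.
# This is where senses like the whip and the mechanical device show up.
from nltk.corpus import wordnet as wn

for word in ("cat", "dog"):
    print(f"--- {word} ---")
    for syn in wn.synsets(word):
        hypers = ", ".join(h.name() for h in syn.hypernyms()) or "(no hypernyms)"
        print(f"{syn.name():<22} {syn.definition()[:50]:<52} -> {hypers}")
```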
This is a visualization of 1 step of hypernyms (blue) for 100 basic words (red):
Notice that at the bottom there are some loners left out of the game because they have no hypernyms (mostly adverbs). Nevertheless, they are important words and should be included in a good ontology of concepts.
This is the visualization of 2 steps of hypernyms (blue) for 100 basic words (red):
The iteration terminates after ~10 steps for most branches, leaving us with thousands of nodes and hundreds of thousands of links. Pruning this graph down is a non-trivial algorithmic problem. Also disappointing is the fact that the graph has many more intermediate nodes than the basic English words we started with -- something that I didn't expect.
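A rough way to watch this blow-up (and the eventual convergence at "entity") is to grow the hypernym closure step by step and count what each step adds; a minimal sketch, with an illustrative seed list instead of the full 100 words:

```python
# Sketch: grow the hypernym closure step by step and count the growth.
# Everything eventually reaches entity.n.01, but only after the set has ballooned.
from nltk.corpus import wordnet as wn

seeds = {syn for w in ["cat", "dog", "house", "water", "good"] for syn in wn.synsets(w)}
seen, frontier, step = set(seeds), set(seeds), 0
while frontier:
    step += 1
    frontier = {h for syn in frontier for h in syn.hypernyms()} - seen
    seen |= frontier
    print(f"step {step}: {len(frontier)} new synsets, {len(seen)} total")
```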
At this point I gave up and started to try another method: WordNet provides some similarity measures (such as "path_similarity") between words. Maybe I can use these distances to cluster the words flatly?
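As a first experiment, one can build a flat pairwise-similarity matrix and hand it to any off-the-shelf clustering routine. Here is a minimal sketch that takes the best path_similarity over all synset pairs of two words (the word list is just an illustrative subset of the 100):

```python
# Sketch: a flat word-by-word similarity matrix using WordNet's path_similarity,
# taking the best score over all synset pairs of the two words.
from nltk.corpus import wordnet as wn

def word_similarity(w1, w2):
    """Maximum path_similarity over all synset pairs; 0.0 if no path exists."""
    scores = [s1.path_similarity(s2) or 0.0
              for s1 in wn.synsets(w1)
              for s2 in wn.synsets(w2)]
    return max(scores, default=0.0)

words = ["cat", "dog", "left", "right", "water"]  # illustrative subset of the 100 words
print("        " + "".join(f"{w:>8}" for w in words))
for w1 in words:
    row = "".join(f"{word_similarity(w1, w2):8.3f}" for w2 in words)
    print(f"{w1:>8}{row}")
```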
This is a visualization of the similarities between 100 words:
As you can see, again, some adverbs or special words are left out at the bottom.
Also, there is no strong link between "left" and "right" (the strength is only 0.333, which is relatively low). This is where I think WordNet's way of defining similarity in [0,1] is wrong: similarity should be measured in [-1,1] so that opposite concepts can be represented properly.
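If I read NLTK correctly, path_similarity is just 1 / (1 + d), where d is the shortest-path distance between the two senses in the hypernym graph, so a score of 0.333 simply means the closest "left" and "right" senses are two links apart -- and the score can never go negative, so antonyms are indistinguishable from merely-distant concepts:

```python
# Sketch: path_similarity = 1 / (1 + shortest-path distance), so it is always >= 0;
# "left" vs "right" just look weakly related, not opposite.
from nltk.corpus import wordnet as wn

best = max((s1.path_similarity(s2) or 0.0)
           for s1 in wn.synsets("left")
           for s2 in wn.synsets("right"))
print(best)  # a small positive number; opposition cannot be expressed in [0,1]
```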
My conclusion so far: "In artificial intelligence, avoid hand-crafting low-level data; it's usually better to machine-learn!"
Looking forward: we may have to set up Genifer like a baby and let it learn a small vocabulary. Or we could use a flawed ontology but somehow supplement it to correct the deficiencies... (Perhaps WordNet simply wasn't designed to treat adverbs the way it treats nouns.)