AnalogySpace

AnalogySpace is a way of representing a knowledge base of common sense in a multidimensional vector space. It uses dimensionality reduction to automatically discover large-scale patterns in the data collected by the common-sense knowledge resource ConceptNet. These patterns, called "eigenconcepts" or "axes", help to classify the knowledge and predict new knowledge by filling in the gaps.

AnalogySpace can be used to infer new common-sense knowledge, organize ideas into ad-hoc categories, detect topics in text, correlate knowledge between different languages or data sources, and compare concepts on arbitrary scales that can be generated on the fly.

The current implementation is built on Divisi, a general-purpose tool for reasoning over semantic networks. We now also have a GUI visualizer for AnalogySpace called Luminoso, which allows you to experiment with AnalogySpace and a set of input documents interactively, instead of having to write code using Divisi.

Eigenconcepts

[Figure: eigenconcept 0, desirability. On the left are concepts considered "undesirable," while the concepts on the right are considered "desirable."]

Eigenconcepts are the axes that define AnalogySpace. In ConceptNet, a concept is described by the common-sense features it has, such as "it is a kind of animal" or "people want it". In AnalogySpace, these features are summarized by a smaller number of eigenconcepts. The degree to which a concept correlates with each eigenconcept gives its coordinate along that axis in AnalogySpace.

One of the most prominent eigenconcepts in AnalogySpace distinguishes "desirable" concepts from "undesirable" ones. The graph above plots this eigenconcept, with undesirable concepts such as "slavery" on the left, and desirable concepts such as "make friends" on the right. Each red + represents a concept on this scale.

The blue crosses clustered around the center, meanwhile, are features, which can be plotted in the same space as the concepts. Features to the right of the origin, for example, are those that contribute to desirability.
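
The idea of placing concepts and features on a shared axis can be sketched in a few lines of Python. This is a minimal illustration, not the Divisi implementation: it assumes the dimensionality reduction is a truncated singular value decomposition, it finds only the single strongest axis using power iteration, and the concepts, features, and matrix values are made up for the example rather than taken from ConceptNet.

```python
import math

# Hypothetical toy data: rows are concepts, columns are features.
# +1 = asserted true, -1 = asserted false, 0 = nothing known (a gap).
concepts = ["make friends", "vacation", "slavery", "war"]
features = ["people want it", "causes happiness", "causes suffering"]
matrix = [
    [1, 1, -1],    # make friends
    [1, 0, -1],    # vacation: "causes happiness" is unknown
    [-1, -1, 1],   # slavery
    [-1, -1, 0],   # war: "causes suffering" is unknown
]


def leading_axis(rows, iterations=100):
    """Return (sigma, u, v), the strongest singular triple of `rows`,
    found by power iteration on A^T A from a fixed starting vector."""
    n_cols = len(rows[0])
    v = [1.0] + [0.0] * (n_cols - 1)
    for _ in range(iterations):
        # u = A v, then w = A^T u, then renormalize w to get the next v.
        u = [sum(a * x for a, x in zip(row, v)) for row in rows]
        w = [sum(row[j] * u[i] for i, row in enumerate(rows))
             for j in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    u = [sum(a * x for a, x in zip(row, v)) for row in rows]
    sigma = math.sqrt(sum(x * x for x in u))
    return sigma, [x / sigma for x in u], v


sigma, u, v = leading_axis(matrix)

# Coordinates of concepts and features along the same axis.
# The overall sign of an axis is arbitrary; here the starting vector
# happens to put "people want it" on the positive side.
concept_coords = {c: sigma * x for c, x in zip(concepts, u)}
feature_coords = dict(zip(features, v))


def predict(concept, feature):
    """Rank-1 estimate of a matrix entry, usable for filling in gaps."""
    return concept_coords[concept] * feature_coords[feature]


for name, coord in sorted(concept_coords.items(), key=lambda kv: kv[1]):
    print(f"{name:>14}: {coord:+.2f}")
print("vacation / causes happiness:",
      round(predict("vacation", "causes happiness"), 2))
```

With this toy matrix, "slavery" and "war" land on one side of the origin and "make friends" and "vacation" on the other, and the two unknown entries receive positive estimates. A real knowledge base would keep many axes rather than one and would use a sparse SVD routine instead of hand-written power iteration.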

Ad hoc categories

It is possible to build a computationally efficient classification system using AnalogySpace and a handful of examples. Treating training samples as vectors in AnalogySpace, we can immediately identify the concepts and properties that lie close to all of those vectors.
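
One way to picture this is the sketch below. It makes several assumptions: the three-dimensional coordinates are invented for illustration, the category is represented by the sum of the normalized example vectors, and closeness is measured by the dot product with each normalized candidate. The function names are hypothetical and are not part of Divisi.

```python
import math

# Hypothetical AnalogySpace coordinates, made up for this example.
concept_vectors = {
    "dog":      [0.9, 0.8, 0.1],
    "cat":      [0.8, 0.9, 0.0],
    "hamster":  [0.7, 0.7, 0.1],
    "goldfish": [0.6, 0.5, 0.2],
    "car":      [0.1, -0.2, 0.9],
    "hammer":   [-0.1, -0.3, 0.8],
}


def normalize(vec):
    """Scale a vector to unit length."""
    length = math.sqrt(sum(x * x for x in vec))
    return [x / length for x in vec]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def category_vector(examples, vectors):
    """Sum the normalized vectors of the example concepts."""
    unit = [normalize(vectors[name]) for name in examples]
    return [sum(column) for column in zip(*unit)]


def rank_by_category(examples, vectors):
    """Rank every non-example concept by its similarity to the category."""
    center = category_vector(examples, vectors)
    scores = {
        name: dot(center, normalize(vec))
        for name, vec in vectors.items()
        if name not in examples
    }
    return sorted(scores, key=scores.get, reverse=True)


ranking = rank_by_category(["dog", "cat"], concept_vectors)
print(ranking)
```

Given "dog" and "cat" as the only examples, the other pets in this made-up data rank ahead of "car" and "hammer". Because the category is just a vector, adding or removing an example only changes one sum, which is what keeps the approach cheap.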

Robert Speer, Catherine Havasi, Jason Alonso, Kenneth Arnold, Henry Lieberman

MIT Media Lab, Brandeis Lab for Linguistics and Computation, MIT CSAIL