The corpus module contains classes and methods for working with the Open Mind data as a low-level corpus. It provides the basic functionality that ConceptNet is built on, while being agnostic about the representation of such a semantic network.
The most important database models described here are Language and Sentence. We also define the generally useful abstract class ScoredModel, and the TaggedSentence and DependencyParse classes that are used in some applications that work with raw Open Mind data.
A database object representing a language.
Instances of Language can be used in filter expressions to select only objects that apply to a particular language. For example:
>>> en = Language.get('en')
>>> english_sentences = Sentence.objects.filter(language=en)
Get a language from its ISO language code.
Some relevant language codes:
en = English
pt = Portuguese
ko = Korean
ja = Japanese
nl = Dutch
es = Spanish
fr = French
ar = Arabic
zh = Chinese
This database model represents all the sentences that Open Mind has collected in a variety of languages. Some of them come from the original Open Mind, which took free-text input; some come from activities on modern iterations of the Open Mind web site; and some come from related sites such as GlobalMind.
A statement entered by a contributor, in unparsed natural language.
A ScoredModel is one that users can vote on through a Django-based Web site.
The score is cached in a column of the object’s database table, and updated whenever necessary.
Voting relies on the django-voting library. If you alter votes by calling django-voting directly, however, the cached score will not be updated correctly.
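The caching behavior described above can be illustrated with a minimal sketch in plain Python. It does not use Django or django-voting, and it is not the actual ScoredModel implementation: the class ScoredItem, its attributes, and the method names set_rating and update_score are hypothetical and chosen only for illustration. It shows why a vote written straight to the vote store leaves the cached score stale until it is recomputed.

```python
class ScoredItem:
    """Hypothetical stand-in for a scored object; not the real ScoredModel."""

    def __init__(self):
        # Assumption: maps a user to +1 or -1, standing in for the vote table.
        self.votes = {}
        # Assumption: stands in for the cached score column on the object's row.
        self.score = 0

    def set_rating(self, user, value):
        """Record a vote and refresh the cached score in the same step."""
        self.votes[user] = value
        self.update_score()

    def update_score(self):
        """Recompute the cached score from the stored votes."""
        self.score = sum(self.votes.values())


item = ScoredItem()
item.set_rating("alice", 1)
item.set_rating("bob", 1)
print(item.score)  # 2

# Writing to the vote store directly bypasses the cache refresh.
item.votes["carol"] = -1
print(item.score)  # 2 (stale cached value)

# Recomputing brings the cached score back in line with the votes.
item.update_score()
print(item.score)  # 1
```

The design trade-off is the usual one for a cached aggregate: reads are cheap because the total is stored on the object, but every write has to go through a path that refreshes the stored total.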
The results of running a sentence through a tagger such as MXPOST.
We could use this as a step in parsing ConceptNet, but we currently don’t.