The corpus module

The corpus module contains classes and methods for working with the Open Mind data as a low-level corpus. It provides the basic functionality that ConceptNet is built on, while being agnostic about the representation of such a semantic network.

The most important database models described here are Language and Sentence. We also define the generally useful abstract class ScoredModel, and the TaggedSentence and DependencyParse classes that are used in some applications that work with raw Open Mind data.

Languages

class csc.corpus.models.Language(*args, **kwargs)

A database object representing a language.

Instances of Language can be used in filter expressions to select only objects that apply to a particular language. For example:

>>> from csc.corpus.models import Language, Sentence
>>> en = Language.get('en')
>>> english_sentences = Sentence.objects.filter(language=en)
id
The ISO language code of this language.
name
The name of this language in English.
sentence_count
A cached count of how many sentences Open Mind has collected in this language.
static get(id)

Get a language from its ISO language code.

Some relevant language codes:

en = English
pt = Portuguese
ko = Korean
ja = Japanese
nl = Dutch
es = Spanish
fr = French
ar = Arabic
zh = Chinese
nl

A collection of natural language tools for a language.

See csc.nl for more information on using these tools.
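
The language codes listed under get can be kept in a small lookup table, for example to check a code before passing it to Language.get. The following is a minimal sketch in plain Python: the names LANGUAGE_NAMES and language_name are hypothetical and not part of csc, and the table holds only the codes listed on this page. Which codes the database itself accepts is not specified here.

```python
# Codes and English names copied from the list on this page.
# LANGUAGE_NAMES and language_name are illustrative helpers, not part of csc.
LANGUAGE_NAMES = {
    "en": "English",
    "pt": "Portuguese",
    "ko": "Korean",
    "ja": "Japanese",
    "nl": "Dutch",
    "es": "Spanish",
    "fr": "French",
    "ar": "Arabic",
    "zh": "Chinese",
}


def language_name(code):
    """Return the English name for a listed code, or raise ValueError."""
    if code not in LANGUAGE_NAMES:
        raise ValueError("language code not in the listed set: %r" % (code,))
    return LANGUAGE_NAMES[code]


print(language_name("pt"))
```

A caller could run language_name(code) first and only then call Language.get(code), so that a mistyped code fails before any database access.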

Sentences

This database model represents all the sentences that Open Mind has collected in a variety of languages. Some of them come from the original Open Mind, which took free-text input; some come from activities on modern iterations of the Open Mind web site; and some come from related sites such as GlobalMind.

class csc.corpus.models.Sentence(*args, **kwargs)

A statement entered by a contributor, in unparsed natural language.

text
The natural-language text that a user entered into Open Mind.
language
The Language that this sentence is in.
creator
The User who entered this sentence.
created_on
The timestamp when this sentence was created.
score
The cached score of this sentence: the number of users who have voted for it minus the number who have voted against it.
activity
An object identifying how this sentence came to be.

Scored Models

class csc.corpus.models.ScoredModel

A ScoredModel is one that users can vote on through a Django-based Web site.

The score is cached in a column of the object’s database table, and updated whenever necessary.

This makes use of the django-voting library. However, if you alter votes by using the django-voting library directly, the score will not be updated correctly.

get_rating(user)
Get the Vote object representing the given user’s vote on this object. Returns None if the user has not voted on it.
set_rating(user, val, activity)
Set the given user’s Vote on this object. If the user has previously voted on it, the old vote is removed first.
update_score()
Ensure that the score property of this object agrees with the sum of the votes it has received.
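
The caching behaviour described above can be illustrated with an in-memory stand-in. This is a sketch only, not the csc implementation: InMemoryScoredItem and the Vote tuple are hypothetical, votes live in a dictionary rather than in django-voting tables, and votes are assumed to be +1 or -1. It shows that a repeat vote replaces the earlier one, and that changing votes behind the object's back leaves the cached score stale until update_score() is called.

```python
from collections import namedtuple

# Hypothetical stand-in for a stored vote; the real Vote is a Django model.
Vote = namedtuple("Vote", ["user", "vote", "activity"])


class InMemoryScoredItem:
    """Illustrative stand-in for a ScoredModel, backed by a dict."""

    def __init__(self):
        self._votes = {}  # maps user -> Vote
        self.score = 0    # cached score, kept in step by set_rating

    def get_rating(self, user):
        """Return the user's Vote, or None if the user has not voted."""
        return self._votes.get(user)

    def set_rating(self, user, val, activity=None):
        """Record a vote, replacing any earlier vote by the same user."""
        self._votes[user] = Vote(user, val, activity)
        self.update_score()

    def update_score(self):
        """Recompute the cached score as the sum of all vote values."""
        self.score = sum(v.vote for v in self._votes.values())


item = InMemoryScoredItem()
item.set_rating("alice", 1)
item.set_rating("bob", 1)
item.set_rating("alice", -1)  # replaces alice's earlier +1
score_after_votes = item.score

# Bypassing set_rating leaves the cached score out of date...
item._votes["carol"] = Vote("carol", 1, None)
stale_score = item.score
# ...until update_score() brings it back in line with the votes.
item.update_score()
fresh_score = item.score
```

In this sketch the direct dictionary write plays the role of altering votes through django-voting directly: the cached value only catches up once update_score() runs.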

Other classes

class csc.corpus.models.DependencyParse(*args, **kwargs)

Each instance of DependencyParse is a single link in the Stanford dependency parse of a sentence.

class csc.corpus.models.TaggedSentence(*args, **kwargs)

The results of running a sentence through a tagger such as MXPOST.

We could use this as a step in parsing ConceptNet, but we currently don’t.
