Visualizing SVD results

The export_svdview module allows SVD results to be visualized using the separate program svdview (http://launchpad.net/svdview).

Denormalization

Concepts are often stored in Divisi tensors in a normalized form, which is usually not human-friendly. The denormalize callback provides a way to “undo” the normalization as concepts are returned. A denormalizer for ConceptNet concepts is provided, which returns the “canonical name” of each concept.
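As a rough illustration of what a custom denormalizer might look like, the sketch below assumes the callback receives one normalized concept string and returns the string to display, in line with the denormalize(concept_text) function documented for this module. The lookup table and the name lookup_denormalize are hypothetical and are not part of Divisi.

```python
# Hypothetical table mapping normalized text to a display form.
# A real denormalizer would look the name up in ConceptNet instead.
DISPLAY_NAMES = {
    "take dog walk": "taking a dog for a walk",
}


def lookup_denormalize(concept_text):
    """Return a human-friendly name, falling back to the normalized text."""
    return DISPLAY_NAMES.get(concept_text, concept_text)
```

Under that assumption, such a function could be passed as the denormalize argument of the export functions in this module.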

File formats

Binary format

The binary format is newer and faster. It consists of a header followed by a body, with everything stored in big-endian (network) byte order:

Header:
  • 4 bytes: number of dimensions (integer)
  • 4 bytes: number of items (integer)

The body is a sequence of items with no separator. Each item has a coordinate for each dimension. Each coordinate is an IEEE float (32-bit) in big-endian order.
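The layout described above can be sketched with Python's struct module. This is a minimal sketch and not the module's own implementation: the function names are hypothetical, the header integers are assumed to be signed 32-bit values (the description only says "integer"), and only coordinates are handled because the description does not say where concept names are stored.

```python
import struct


def pack_coordinates(items):
    """Serialize equal-length coordinate rows: header, then the body."""
    num_items = len(items)
    num_dims = len(items[0]) if items else 0
    # Header: number of dimensions, then number of items, both big-endian.
    # Assumption: signed 32-bit integers.
    chunks = [struct.pack(">ii", num_dims, num_items)]
    for row in items:
        if len(row) != num_dims:
            raise ValueError("every item needs one coordinate per dimension")
        # Body: one big-endian 32-bit IEEE float per dimension, no separator.
        chunks.append(struct.pack(">%df" % num_dims, *row))
    return b"".join(chunks)


def unpack_coordinates(data):
    """Read the header, then rebuild the list of coordinate rows."""
    num_dims, num_items = struct.unpack_from(">ii", data, 0)
    offset = 8  # the header is two 4-byte integers
    rows = []
    for _ in range(num_items):
        rows.append(list(struct.unpack_from(">%df" % num_dims, data, offset)))
        offset += 4 * num_dims
    return rows
```

Because coordinates are stored as 32-bit floats, values that are not exactly representable in single precision will come back slightly rounded after a round trip.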

TSV format

The old TSV format is easier to edit by hand or with simple scripts. Each line is a sequence of fields separated by tabs. The first field on each line is the concept name. It is followed by a floating point number for each dimension.
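A line in this format can be produced and parsed with a few lines of plain Python. The helper names below are hypothetical and the sketch assumes that concept names never contain tab characters; it is meant only to illustrate the field layout described above.

```python
def format_tsv_line(name, coords):
    """Build one line: the concept name, then one number per dimension."""
    return "\t".join([name] + [repr(float(c)) for c in coords])


def parse_tsv_line(line):
    """Split one line back into (concept name, list of coordinates)."""
    fields = line.rstrip("\n").split("\t")
    return fields[0], [float(f) for f in fields[1:]]
```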

csc.divisi.export_svdview.denormalize(concept_text)
Returns the canonical denormalized (user-visible) form of a concept, given its normalized text.
csc.divisi.export_svdview.write_annotated(matrix, out_basename, denormalize=None, cutoff=40, filter=None, annotations=None, links=None)
Export in the new binary coordinate file format.
csc.divisi.export_svdview.write_packed(matrix, out_basename, denormalize=None, cutoff=40)
Export in the new binary coordinate file format.
csc.divisi.export_svdview.write_tsv(matrix, outfn, denormalize=None, cutoff=40)
Export a tab-separated value file that can be visualized with svdview. The data is saved to the file named outfn.
