An Incomplete Data Cube - Java Implementation
A Data Cube Tool for Missing Data

Curtis Dyreson

Versions

0.90 beta - Dec. 9, 1997

API documentation - javadoc generated
cube.tar.gz - tarred, gzipped JDK 1.0.2 classes and source (250KB)

How to use the cube

There are three phases to using an incomplete data cube. The code that you download is already configured to build an example cube (the README file in the release package provides more details).

1. Construct the dimensions of the cube. A dimension consists of units and measures. The data cube implementor (DCI) writes one text file to describe each dimension. These specifications are then parsed and stored internally, as graphs and tables of names, within the cube store.
2. Specify the cubettes and populate the cube. The DCI edits a list of cubette specifications, which are then parsed and stored internally within the cube store. During parsing, flex source is produced (flex is a lexical analyzer generator) to sieve data from a text file. The flex source is compiled, and the input data is passed through the resulting lexical analyzer to populate the incomplete data cube. Only flex was up to the task of handling the hundreds of thousands of regular expressions that are potentially produced during construction of the data sieve. This is unfortunate because it introduces a non-Java and non-Perl dependency, so the code for the sieve is not 100% pure Java or Perl.
3. Query the populated cube, either through a GUI or by calling the appropriate methods directly.
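As a rough illustration of the first phase, the sketch below models one dimension in memory as a table of unit names plus containment edges between units. It is not the cube's actual API or specification file format: the class `DimensionSketch` and its methods are hypothetical, it is written in modern Java rather than JDK 1.0.2, and it assumes each unit has at most one containing unit, whereas the real cube store keeps a general graph.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory model of one dimension (not the release's classes). */
class DimensionSketch {
    private final String name;
    // Table of names: unit name -> name of the measure it belongs to.
    private final Map<String, String> measureOfUnit = new HashMap<>();
    // Graph edges: unit name -> containing unit in a coarser measure.
    private final Map<String, String> parentOfUnit = new HashMap<>();

    DimensionSketch(String name) {
        this.name = name;
    }

    String name() {
        return name;
    }

    /** Registers a unit under a measure; parent is null for a top-level unit. */
    void addUnit(String unit, String measure, String parent) {
        if (measureOfUnit.containsKey(unit)) {
            throw new IllegalArgumentException("unit already defined: " + unit);
        }
        if (parent != null && !measureOfUnit.containsKey(parent)) {
            throw new IllegalArgumentException("unknown parent unit: " + parent);
        }
        measureOfUnit.put(unit, measure);
        if (parent != null) {
            parentOfUnit.put(unit, parent);
        }
    }

    /** Returns the measure of a unit, or null if the unit is not defined. */
    String measureOf(String unit) {
        return measureOfUnit.get(unit);
    }

    /** Returns the unit followed by every unit that contains it, finest first. */
    List<String> ancestors(String unit) {
        List<String> path = new ArrayList<>();
        for (String u = unit; u != null; u = parentOfUnit.get(u)) {
            path.add(u);
        }
        return path;
    }
}
```

For example, a Time dimension could register 1997 under the measure years, 'October 1997' under months with parent 1997, and '1 October 1997' under days with parent 'October 1997'. Calling `ancestors` on the day then walks up through the month to the year.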

Currently, the incomplete data cube is designed to obtain data from a text file; the example cube parses an HTTPD access log. Specifications for units and measures in three dimensions (Time, Machines, and Pages) are also provided.
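To make the idea of sieving an access log concrete, here is a minimal sketch that extracts one coordinate per dimension (Time, Machines, Pages) from a single log line. It is not the flex-generated analyzer from the release. It assumes the log uses the Common Log Format, and the class `AccessLogSieve`, its nested `Hit` type, and the single regular expression are hypothetical stand-ins for the many expressions the real sieve generates from the cubette specifications.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sieve for one access-log line (assumes Common Log Format). */
class AccessLogSieve {
    // host ident authuser [timestamp] "method path protocol" status bytes
    private static final Pattern CLF = Pattern.compile(
        "^(\\S+) \\S+ \\S+ \\[([^\\]]+)\\] \"\\S+ (\\S+)[^\"]*\" \\d{3} \\S+$");

    /** One data point: a raw value for each of the three example dimensions. */
    static final class Hit {
        final String time;
        final String machine;
        final String page;

        Hit(String time, String machine, String page) {
            this.time = time;
            this.machine = machine;
            this.page = page;
        }
    }

    /** Returns the extracted hit, or empty if the line does not match the format. */
    static Optional<Hit> sieve(String line) {
        Matcher m = CLF.matcher(line);
        if (!m.matches()) {
            return Optional.empty();
        }
        // Group 1 is the requesting host, group 2 the timestamp, group 3 the requested path.
        return Optional.of(new Hit(m.group(2), m.group(1), m.group(3)));
    }
}
```

Lines that do not match are dropped rather than reported, which mirrors the notion of a sieve. Mapping the raw timestamp, host, and path onto the units defined for each dimension is a separate step that this sketch does not attempt.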

Below we outline the current state of the implementation. While the cube is currently functional, much remains to be done.

Dependencies

JDK 1.0.2 - we use some deprecated methods so that Netscape can run the applet
gdbm - Unix gdbm files are used to store the cube (you can configure it to use any db however)
flex (a lexical analyzer generator) - to sieve data, not for the cube per se
development environment is UNIX (SunOS5) - untested in other environments
JavaLex/JavaCup - only if you would like to change how the specification files are parsed
Jigsaw/jdbm - the Java implementation of gdbm is a (small) part of Jigsaw (but it's Java, so there should be few portability issues!)

Implemented Features

unlimited number of dimensions
unlimited number of measures (e.g., days, years, countries)
unlimited number of units (e.g., '1 October 1997', 1995, Australia)
any user-specified measures and units (none are "built-in")
Unicode for all names of measures, units, etc. (a Java feature)
input data parsed from a text file
data is "filtered" through cubette specifications
data cube data, units, and measures stored in a generic database; for speed, the database is currently configured to use Unix dbm files
query satisfaction algorithm
sum queries
GUI for query engine
sparse storage of data: only points in the cube that have data are materialized
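Sparse storage and sum queries from the list above can be sketched as follows. This is an illustration only, under several assumptions: it keeps cells in an in-memory `HashMap` rather than dbm files, the class `SparseCubeSketch` and its methods are hypothetical, and a query selects explicit units per dimension without rolling up through coarser measures or running the query satisfaction algorithm.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sparse cube: only coordinates that received data are stored. */
class SparseCubeSketch {
    private final int dimensions;
    // Coordinate (one unit name per dimension) -> accumulated value.
    private final Map<List<String>, Long> cells = new HashMap<>();

    SparseCubeSketch(int dimensions) {
        this.dimensions = dimensions;
    }

    /** Adds a value at the given coordinate, materializing the cell on first use. */
    void add(long value, String... coordinate) {
        if (coordinate.length != dimensions) {
            throw new IllegalArgumentException("expected " + dimensions + " units");
        }
        List<String> key = new ArrayList<>(Arrays.asList(coordinate));
        cells.merge(key, value, Long::sum);
    }

    /**
     * Sums every materialized cell whose unit in each dimension belongs to the
     * corresponding set; a null entry means "any unit in that dimension".
     */
    long sum(List<Set<String>> selection) {
        if (selection.size() != dimensions) {
            throw new IllegalArgumentException("expected " + dimensions + " selections");
        }
        long total = 0;
        for (Map.Entry<List<String>, Long> cell : cells.entrySet()) {
            boolean matches = true;
            for (int d = 0; d < dimensions && matches; d++) {
                Set<String> allowed = selection.get(d);
                matches = allowed == null || allowed.contains(cell.getKey().get(d));
            }
            if (matches) {
                total += cell.getValue();
            }
        }
        return total;
    }

    /** Number of cells actually stored, which may be far below the full cube size. */
    int materializedCells() {
        return cells.size();
    }
}
```

Because the map holds only coordinates that received data, storage grows with the number of distinct data points rather than with the product of the dimension sizes. The price in this sketch is that every sum query scans all materialized cells.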

Unimplemented features (wish list)

incomplete data cube store configured to use JDBC
input data to be retrieved from database using JDBC
GUI to modify cubette store
deletion/renaming of units/measures
min and max queries
upper and lower bounds on query results
completeness measures for query results
suggestion of alternate answers

E-mail questions or comments to Curtis.Dyreson at usu.edu