logo
 
Assignment Three
CS 6890 - Web-based Database Management Systems
Utah State University
Home
Lectures
Homework   
Syllabus
Resources
People

Overview

Weight: The assignment will count 11% of your final grade.
Due Date Wednesday, April 23 (by 11:59PM)
This assignment is to implement a persistent document manager for XML data, that can be queried using the xpath package developed in assignment two (I'm expecting you to use your xpath package from assignment 2). Part of the assignment is in the tarred, gzipped, directory three.tar.gz (also available as three.zip) You may do the assignment using Windows or Linux.

Groups

Groups can consist of at most two individuals. Groups will be graded the same as individuals. Forming a group will usually result in less work for each group member, but you may choose to do the assignment individually if you so desire. To register your group, please follow the instructions at groups.htm. Once a group is formed, there will be no changes to the group for any reason. Ideally, each group will be happy and harmonious, but if not, then you are responsible for the entire assignment.

Turnin

All code should be developed in the three directory. Turnin in the assignment using the turnin page. Please upload the file three.tar.gz or three.zip. You may turnin your assignment as many times as you like.

Documentation

All documentation should be done using javadoc. We will automatically run your programs using javadoc and examine the results.

Grading

The assignment will be graded for good programming style (indentation and appropriate comments), as well as clean compilation and execution (the programs should work!).

Persistent Tree

In this assignment you will store XML in a database. Multiple documents will be stored in the database. As a document is stored, every node is given an object ID (OID). The OID can be a long (or a Long). You should decide on a "good" numbering scheme and can use whatever numbering scheme you want. For instance, you may decide that a node is numbered as N and its K children are numbered as N+1 to N+K (and node number N+K+1 is skipped in the numbering scheme). The persistent tree will have a single data model root (you may wish to number this node as 0 or 1). When a document is grafted to the data store it is grafted, by default, as the new rightmost child of the data model root.

Database aspects

To store data, use the BerkeleyDB Java Edition is available from Oracle. BerkeleyDB Java Edition is a pure Java implementation of BerkeleyDB. The database will be implemented as a collection of BerkeleyDB files. A BerkeleyDB file is a persistent, associative hash. Recall that a hash table associates keys with values. A persistent is simply a hash table that is stored on disk (it is also called a StoredMap). There will be one BerkeleyDB file for each database Table. Each BerkeleyDB file contains a set of tuples. Each tuple maps a 'key' to a 'value'. The design of the data store is up to you, but here are a few suggested tables.
  1. CATALOG - The CATALOG keeps track of system data. The most important thing it stores is the OID counter. Each node in the persistent tree is given a unique OID. All other "system" information should be stored in the CATALOG. You may assume that the OID is a long.
  2. NODES - The NODES Tables associates OIDs with node information. The node information consists of the node type (text, element, attribute, or root) node name, and node value. If you desire, you may store nodes of different types in different Tables, or type and value in separate tables. For a text node, the value is the text itself. For example, a text node storing the text "xxx" has the value "xxx". For an element node, the name is the element tag. For example, an element node for "" has the name "x". For an attribute node, the name is the attribute name and the value is the attribute value. For example, an attribute node for "age=23" has the name "age" and the value "23". The root node has the value "root". You may ignore other kinds of nodes like comment and processing instruction nodes.
  3. PARENTS - The PARENTS table maps each child OID to its parent OID. It is essentially just a set of "up" edges.
  4. CHILDREN - The CHILDREN table maps each parent OID to a "list" of children OIDs.
Do not worry about concurrency, locking, or recovery. Also, do not worry about optimizing the queries, rather just use the xpath package from the previous assignment.

xpath package

Modify your xpath package as needed to evaluate queries using the persistent package described below. Ideally, you should not have to modify it at all. The persistent package will be tested using the xpath package, so a Query should evaluate correctly and return XML from the database as it did from a single document.

persistent package

Develop a persistent package. The persistent directory will contain XMLDataStore.java and other related classes that might be needed, such as the following.
  • DocumentImpl - A class that implements (as dummy code) all of the methods in the Document interface. XMLDataStore can extend this class and just implement the few Document methods it needs.
  • StoredNode - A class that implements the methods you need in the Node interface, e.g., getFirstChild.
  • NodeImpl - A class that implements (as dummy code) all of the methods in the Node interface. StoredNode extends this class.
  • NamedNodeMapImpl - A class that implements (as dummy code) all of the methods in the NamedNodeMap interface (which is implemented as StoredMap).

XMLDataStore should implements part of the Document interface; it should implement all of the methods that you need from the Document interface for running XPath queries, as implemented in assignment two, since you'll be using XMLDataStore as your Document.

The XMLDataStore class will provide functionality to store a set of documents in a database. The class should also support the following public methods. You may add other public and private methods as needed.

  • XMLDataStore(String directory) - This is a constructor that creates an XMLDataStore object. It opens an existing database or creates a new one if no database is present in the directory. The XMLDataStore database consists of several BerkeleyDB files so a directory must be specified rather than a file.
  • void graft(Document doc) - Add the information in a the parsed document to the XMLDataStore. Assume that the root of the parsed document will be added as the new rightmost child of the root of the persistent tree.
  • void delete() - Delete all the information in the XMLDataStore. This should delete all the database files or delete all of the entries in each database file.
Finally the package may contain additional classes as you see fit.


                                                                                                                                                                                                                                                                                                                                             

  E-mail questions or comments to Curtis.Dyreson at usu dot edu