|
|
|
Overview
Weight: The assignment will count 11% of your final grade.
Due Date
Wednesday, April 23 (by 11:59PM)
This assignment is to implement a persistent document manager for XML data,
that can be queried using the xpath package developed in
assignment two (I'm expecting you to use your xpath package from assignment 2).
Part of the assignment is in the tarred,
gzipped, directory three.tar.gz
(also available as three.zip)
You may do the assignment using Windows or Linux.
Groups
Groups can consist of at most two individuals.
Groups will be graded the same as individuals.
Forming a group will usually result in less work for each group member, but
you may choose to do the assignment individually if you so desire.
To register your group, please follow the instructions at
groups.htm.
Once a group is formed, there will be no changes to the group for
any reason.
Ideally, each group will be happy and harmonious, but
if not, then you are responsible for the entire
assignment.
Turnin
All code should be developed in the three directory.
Turnin in the assignment using
the turnin page.
Please upload the file three.tar.gz or
three.zip.
You may turnin your assignment as many times as you like.
Documentation
All documentation should be done using javadoc.
We will automatically run your programs using javadoc and examine the
results.
Grading
The assignment will be graded for good programming
style (indentation and appropriate comments), as well as
clean compilation and execution (the programs should work!).
Persistent Tree
In this assignment you will store XML in a database. Multiple documents
will be stored in the database. As a document is stored, every node is
given an object ID (OID). The OID can be a long (or a Long).
You should decide on a "good" numbering scheme and can use whatever
numbering scheme you want. For instance, you may decide that a
node is numbered as N and its K children are numbered as N+1 to N+K
(and node number N+K+1 is skipped in the numbering scheme).
The persistent tree will have a single data model root (you may wish to
number this node as 0 or 1). When a document is grafted to the data store
it is grafted, by default, as the new rightmost child of the data model root.
Database aspects
To store data, use the
BerkeleyDB Java Edition is available from Oracle.
BerkeleyDB Java Edition is a pure Java implementation of BerkeleyDB.
The database will be implemented as a collection of BerkeleyDB files.
A BerkeleyDB file is a persistent, associative hash.
Recall that a hash table associates keys with values.
A persistent is simply a hash table that is stored on disk (it is also
called a StoredMap).
There will be one BerkeleyDB file for each database Table.
Each BerkeleyDB file contains a set of tuples. Each tuple maps
a 'key' to a 'value'.
The design of the data store is up to you, but here are a few suggested
tables.
- CATALOG - The CATALOG keeps track of system data. The most important
thing it stores is the OID counter. Each node in the persistent tree
is given a unique OID. All other "system" information should be stored
in the CATALOG. You may assume that the OID is a long.
-
NODES - The NODES Tables associates OIDs with node information.
The node information consists of
the node type (text, element, attribute, or root)
node name,
and node value.
If you desire, you may store nodes of different types
in different Tables, or type and value in separate tables.
For a text node, the value is the text itself.
For example, a text node storing the text "xxx" has the value
"xxx".
For an element node, the name is the element tag.
For example, an element node for "" has the name
"x".
For an attribute node, the name is the attribute name and the
value is the attribute value.
For example, an attribute node for "age=23" has the name "age" and the
value "23".
The root node has the value "root".
You may ignore other kinds of nodes like
comment and processing instruction nodes.
-
PARENTS - The PARENTS table maps each child OID to its
parent OID. It is essentially just a set of "up" edges.
-
CHILDREN - The CHILDREN table maps each parent OID to a
"list" of children OIDs.
Do not worry about concurrency, locking, or recovery. Also, do not
worry about optimizing the queries, rather just use the xpath package
from the previous assignment.
xpath package
Modify your xpath package as needed to evaluate queries using the
persistent package described below.
Ideally, you should not have to modify it at all.
The persistent package will be tested using the xpath package, so
a Query should evaluate correctly and return XML from the database
as it did from a single document.
persistent package
Develop a persistent package.
The persistent directory will contain
XMLDataStore.java and other related classes that might
be needed, such as the following.
-
DocumentImpl - A class that implements (as dummy code) all of the methods
in the Document interface. XMLDataStore can extend this class and just
implement the few Document methods it needs.
-
StoredNode - A class that implements the methods you need in the Node
interface, e.g., getFirstChild.
-
NodeImpl - A class that implements (as dummy code) all of the methods in
the Node interface. StoredNode extends this class.
-
NamedNodeMapImpl - A class that implements (as dummy code) all of the
methods in the NamedNodeMap interface (which is implemented as StoredMap).
XMLDataStore should implements part of
the Document interface;
it should implement all of the methods that you need from
the Document interface for running XPath queries,
as implemented in assignment two,
since you'll be using XMLDataStore as your
Document.
The XMLDataStore class
will provide functionality to store a set of documents in a database.
The class should also support the following public methods.
You may add other public and private methods as needed.
-
XMLDataStore(String directory) - This is a constructor that creates an
XMLDataStore object.
It opens an existing database or creates a new one if no database
is present in the directory.
The XMLDataStore database consists of several BerkeleyDB files so a
directory must be specified rather than a file.
-
void graft(Document doc) - Add the information in a the parsed
document to the XMLDataStore.
Assume that the root of the parsed document will be added as the new rightmost
child of the root of the persistent tree.
-
void delete() - Delete all the information in the XMLDataStore.
This should delete all the database files or delete all of the entries in
each database file.
Finally the package may contain additional classes as you see fit.
|