Tuple-Marks
NULLs everyone can live with
v0.1 [mlf-970602]
Abstract
There has been a great deal of discussion about NULL values in relational algebra. There is the argument that NULLs help us with real database problems of handling lack of information. There is a counter-argument that NULLs and three-valued logic (3VL) cause major problems with relational algebra and are unnecessary. These arguments have been very informative and there is no reason to repeat them. Instead I will change the playing field.
I propose a simple but unusual meaning for NULLs. A NULL describes a tuple, not an attribute value: A tuple with a NULL attribute belongs to a different relation than a tuple without a NULL attribute. A NULL becomes a Relation Distinguishing Tuple-Mark. This meaning provides the standard benefits of NULLs: It helps us model missing information with a small number of relation variables (i.e. tables) and simplifies understanding and interacting with a database. On the other hand, because NULLs are never attribute values (or value "marks") they have no impact on domain operations or the 2VL of basic predicate logic. These are NULLs I believe everyone can live with.
Background Requirements
This paper rests heavily on the work of E. F. Codd, C. J. Date, and the many others who helped grow relational theory. There are many references to their writings within the text, and it will be hard to follow this document without a good understanding of the relational model. For example, you will need to be familiar with the precise meanings of relation, relation variable, relation value, tuple, domain, attribute, and value. I will be using these terms because they are more precise and correct for the relational model than table, column, and row.
It would also help to have read some of the previous debates on NULLs and missing information in relational modeling, although this paper's approach does not directly participate in those debates.
Overview
This paper progresses as follows. First we briefly discuss how the relational model represents knowledge. We then turn to how it can represent lack of knowledge without any additions to the relational model (e.g. no NULLs or special values). This approach for representing lack of knowledge works well but has the problem that the database model can become very complex. The next part of the paper introduces simple additions to the relational model, which will allow us to simplify the database with as little impact on relational algebra as possible.
First we introduce the concept of a Multi-Relation Variable that can simplify a complex scheme while still being as expressive and correct as the original scheme. Next we make Multi-Relation Variables easy to use through the Relation-Distinguishing Tuple-Mark. The tuple-mark is the main topic for the rest of the paper: We cover tuple-marks in the NULL debate, comparisons of tuple-marks to other approaches, and how to implement tuple-marks with SQL. Finally we close with some quotes from related work and a summary.
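As a rough preview of the idea, the following Python sketch shows one possible reading of a tuple-mark: rows collected in a single container are split into ordinary relations according to which attributes are marked, so that no resulting relation holds a NULL as a value. The names `MARK`, `split_by_marks`, and the `employees` data are hypothetical and chosen only for illustration; they are not notation from this paper.

```python
from collections import defaultdict

# Hypothetical stand-in for a tuple-mark. It is never treated as an
# attribute value; it only says which relation a tuple belongs to.
MARK = object()

def split_by_marks(rows):
    """Partition rows into ordinary relations keyed by their unmarked attributes."""
    relations = defaultdict(set)
    for row in rows:
        # The heading of the target relation: attributes that carry real values.
        present = tuple(sorted(name for name, value in row.items() if value is not MARK))
        # Store the tuple without the marked attributes.
        relations[present].add(tuple(row[name] for name in present))
    return dict(relations)

# Illustrative data: Bob's phone is not known, so his tuple is marked.
employees = [
    {"name": "Ann", "phone": "555-0100"},
    {"name": "Bob", "phone": MARK},
    {"name": "Cy", "phone": "555-0199"},
]

relations = split_by_marks(employees)
```

Under these assumptions, `relations` holds two relations: one with heading `("name", "phone")` containing Ann and Cy, and one with heading `("name",)` containing only Bob. Every stored value is an ordinary domain value, so comparisons on them stay within two-valued logic.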
Table of Contents
Other Versions