Data Representation
This page describes a way to represent data that I think is particularly good. It’s not perfect, but it’s a lot better than many of the other ones I’ve seen. Since it’s easier to talk about something if it has a name, let’s call the scheme “CKS”.
The goal is for CKS to be used as a model for easily defining serializable data types. I think this is what XML’s creators set out to do. Unfortunately, XML is a bad model and there are lots of data types for which XML is a horrible format.
There’s nothing really new here. The CKS type model is built on algebraic data types, which been around for a long time and are the basis for data types in functional programming languages like Haskell and ML.
Primitives
Any data model needs primitive types. The basic primitive type in CKS is called “void” (and written “
()
”). There is only one value that is of type void and that value, also called “void” (and also written “
()
”).
The void type contains/provides no information at all because it must always have the void value. It should become clear later on why this type is useful.
Fundamentally, the void type is the only primitive type you ever need. All other values can be defined by creating compound types. But, for practical reasons, it’s useful to have types like “
String
” and “
Integer
”. Values for these types are written in the usual way:
String
- double-quoted C language string literal
Integer
- base-10 string of digits (any embedded underscores are ignored)
Aliases
For convenience, you can create type aliases by writing: “
def
AliasName =
Type”. For example:
def Nothing = ()
def Text = String
The first line creates a new type “
Nothing
” that is equivalent to the void type.
Compounds
To do anything useful, we need more complex types. A compound type is type that is composed of members. Each member itself has a type and a tag. The tag is just unique a name for the member within the compound type.
Name: String
Age: Integer
X: ()
Listed above are three members. The tag is on the left of the colon and the type is on the right. The first member’s tag is “
Name
” and its type is “
String
”.
In an compound type, the order of the members doesn’t matter. The only requirement is that the tags be unique within the group.
Entries can be combined in different ways. When you want a compound value that consists of
all of the members, you create a
record type (a.k.a. a product type). When you want a compound value that consists of
exactly one of the members (i.e. a “sum”) you create a
variant (a.k.a. a sum type).
Records
A record is a set of members in which all the members are active. Each member of a record is a
field. This is basically a “struct” in C. Most languages support this in some form.
For CKS, we’ll write record types by surrounding members with braces:
def Person = {
Name: String
Age: Integer
}
Record values are also surrounded by braces. Each field value is written in the form: “
Tag = Value
”.
{
Name = "Scooby Doo"
Age = 42
}
Some languages allow data aggregation through “
tuples”. Tuples are records where the fields’ tags are implicit (based on the field’s position in the tuple). For example, the tuple “
(String,Integer,Integer)
” is basically equivalent to the record:
{
_1: String
_2: Integer
_3: Integer
}
Variants
A variant is a set of members in which exactly one of the members is active. Each member of a variant is an
option. This is sometimes called a “disjoint union” or a “tagged union”.
For CKS, we’ll write variant types by surrounding the options with angle brackets:
def PackageTracker = <
BikeMessengerName: String
UpsNumber: Integer
NoPackage: ()
>
Variant values consist of the active option’s tag followed by the active option’s value. Here are three separate variant values:
BikeMessengerName "Bob"
UpsNumber 12345
NoPackage ()
The void type (“
()
”) is useful here because, even though it contains no information, it’s exactly what we want. The
NoPackage
option doesn’t need to carry any additional information.
For convenience, the “
()
” type can be omitted. If no type is specified, void is assumed. The “
()
” value can be omitted as well. If the type of a value is void, then the “
()
” is implied (after all, what else could it be?).
def PackageTracker = <
BikeMessengerName: String
UpsNumber: Integer
NoPackage
>
BikeMessengerName "Bob"
UpsNumber 12345
NoPackage
The C language (and many of its derivatives) have a feature called “enumerations”, which are just a special case of variants; the type of each option is always “
()
”. For example, the following C enum:
enum Direction {
Up,
Down,
Left,
Right,
};
is similar to the following CKS variant:
def Direction = <
Up: ()
Down: ()
Left: ()
Right: ()
>
Example
To give you a feel for what you can do by combining these two compound types, here is an example address book data type.
def AddressBookEntry = {
Category: < Personal, Business >
Name: {
First: String
Last: String
}
DaytimeContact: Contact
EveningContact: Contact
}
def Contact = <
Phone: String
InstantMessenger: <
AOL: String
Yahoo: String
ICQ: Integer
>
>
Example value:
{
Category = Personal
Name = {
First = "Scooby"
Last = "Doo"
}
DaytimeContact = Phone "(123) 456-7890"
EveningContact = InstantMessenger Yahoo "SwtK9PrincessLOLZ"
}
Parametrized Types
A parameterized type is a type definition that has placeholders in it. For example:
def Boxer(N) = {
Name: N
Weight: Integer
}
The type alias “
Boxer
” requires a type be passed in as a parameter before it becomes a concrete type.
def NormalName = {
First: String
Last: String
}
def RobotName = {
Model: String
SerialNumber: Integer
}
def FloosieName = String
The following two type definitions are roughly equivalent:
def NormalBoxer = Boxer(NormalName)
def NormalBoxer = {
Name: {
First: String
Last: String
}
Weight: Integer
}
Deriving The Essentials
That’s all there is to core CKS. The rest can be derived in terms of records, variants, and “()
”.
Booleans
def Boolean = <
True: ()
False: ()
>
Since the “
()
” can be omitted, this can also be written:
def Boolean = <
True
False
>
Optional Values
It is common for field values to be optional. For example, let’s say you’re modelling a web form where entering your birthdate is optional:
def FormData = {
Name: String
Birthdate: <
Birthdate: String
NotGiven
>
}
Example values:
{
Name = "Scooby Doo"
Birthdate = Birthdate "13 Sep 1969"
}
{
Name = "God"
Birthdate = NotGiven
}
The notion of an optional value is common enough that we should probably just create a generic type that can turn any type into an optional type.
def Optional(T) = <
Present: T
Missing
>
Now the “
FormData
” type could be defined:
def FormData = {
Name: String
Birthdate: Optional(String)
}
The example value would now be:
{
Name = "Scooby Doo"
Birthdate = Present "13 Sep 1969"
}
{
Name = "God"
Birthdate = Missing
}
Lists
Lists are very common and should be easy to use. However, it’s important to notice that we don’t absolutely
need special support for lists because they can be built with the existing type system. Most functional programming languages do this by chaining together linked list nodes. For example, the datatype for a list of
String
s would be:
def StringList = <
Link: {
Current: String
Next: StringList
}
End: ()
>
“
StringList
” is the type of the list. An empty list would simply be:
End
A list with one element would be:
{
Current = "Scooby Doo"
Next = End
}
A list with three elements would be:
{
Current = "First"
Next = Link {
Current = "Second"
Next = Link {
Current = "Last"
Next = End
}
}
}
The generic version:
def List(T) = <
Link: {
Current: T
Next: List(T)
}
End
>
def StringList = List(String)
No, it’s not pretty. But it can be done.
Syntatic Shorthand
Though a miniamilistic core is nice to have, some shorthand syntax for common patterns would be helpful.
Optional Values
Our optional value generic type isn’t too inconvient, but some syntactic sugar can help. The question mark would make for an intuitive indicator of optionality.
def FormData = {
Name: String
Birthdate: String?
}
{
Name = "Scooby Doo"
Birthdate = "13 Sep 1969"
}
{
Name = "Scooby Doo"
Birthdate = ? # Can explicitly say "no value" by using "?"
}
{
Name = "Scooby Doo"
# Can implicitly say "no value" by omitting the field
}
Lists
Creating list values is prohibitively inconvienient with our existing definition. A more convenient syntax is absolutely necessary. Haskell and ML use the following syntax for list values:
[] == End ()
[1] == { Current = 1, Next = End () }
[1, 2] == { Current = 1, Next = Link { Current = 2, Next = End () } }
List types aren’t difficult to write (the type for a list of “
String
” values is just “
List(String)
”), but Haskell uses a type syntax that is similar to the list value syntax:
[String] == List(String)
[[String]] == List(List(String))
XML-ish Syntax
XML’s general syntax (taken from HTML) is convenient for document-style data. For example:
<p>What are <em>you</em> doing? <a href="whatiamdoing.html">Answer</a>.</p>
Translated into CKS’s standard syntax, it would look something like:
p {
content = [
text "What are ",
em { content = [text "you"] },
text " doing? ",
a {
href = "whatamidoing.html"
content = [text "Answer"]
},
text "."
]
}
If you’ve done any XML programming, you’ll notice how the CKS version looks like the data structure you’d get back from a DOM library. The convenience of XML’s syntax comes from two simple optimizations:
- “
text
” nodes are implicit strings.
- “
content
” nodes are implicit lists.
With those two optimizations in mind, we can easily create a parser that knows how to parse XML-ish syntax and construct a CKS data value.
Omit Unnecessary Tag Names
def Person = {
Names: [String]
Age: Integer
}
The “.” means that the tag name can be inferred from the context.
<Person>
<Names>
<.>Scooby</>
<.>Dooby</>
<.>Doo</>
</Names>
<Age>45</Age>
</Person>
{
Names = ["Scooby", "Dooby", "Doo"]
Age = 45
}
Attributes And Sub-Elements Are Interchangeable
def Person = {
Name: FullName
Age: Integer
}
def FullName = {
First: String
Last: String
}
The following fragments are all equivalent once parsed into a CKS data type:
<Person>
<Name>
<First>Scooby</First>
<Last>Doo</Last>
</Name>
<Age>45</Age>
</Person>
<Person>
<Name First="Scooby" Last="Doo"/>
<Age>45</Age>
</Person>
<Person Age=45 Name=<FullName First="Scooby" Last="Doo"/>>
<Person Age=45 Name=<. First="Scooby" Last="Doo"/>>
{
Name = {
First = "Scooby"
Last = "Doo"
}
Age = 45
}
Implementation
The core is done. You can get it from
http://cakoose.com/tools/cks/.
Currently, you tell it to parse a file and you get back and AST (kind of like existing XML DOM libraries). It’s missing the ability to automatically bind to objects in the host language (like JAXB, Castor, etc.).