Auchencairn, Scotland, Feb 27, 2006
A variable is a handle in a namespace; it gives a name to a value, so that we can recall it. Storing a value in a variable never causes an exception to be thrown on the grounds that the value cannot be stored. But it may, reasonably, justifiably, throw an exception because the value violates domain expectations. Furthermore, this exception can be either soft or hard. We might throw a soft exception if someone stored, in a variable representing the age of a person in years, the value 122. We don't expect people to reach one hundred and twenty-two years of age. It's reasonable to flag back to whatever tried to set this value that it is out of the expected range. But we should store it, because it's not impossible. If, however, someone tries to store 372 in a variable representing longitude in degrees, we should throw a hard exception and not store it, because that violates not merely a domain expectation but a domain rule.
So a variable is more than just a name. It is a slot: a name with some optional knowledge about what may reasonably be associated with itself. It has some sort of setter method, and possibly a getter method as well.
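Here is a minimal Python sketch of such a slot. It assumes that a soft exception can be modelled as a Python warning, so the value is stored and the caller is told. A hard exception is modelled as a raised error, so the value is refused. The class and exception names are hypothetical, and the longitude range of -180 to 180 degrees is an assumed convention.

```python
import warnings
from typing import Any, Callable, Optional


class HardValidationError(ValueError):
    """Hypothetical 'hard exception': the value breaks a domain rule and is not stored."""


class SoftValidationWarning(UserWarning):
    """Hypothetical 'soft exception': the value breaks a domain expectation but is stored."""


class Slot:
    """A name plus optional knowledge about which values suit it."""

    def __init__(self, name: str,
                 rule: Optional[Callable[[Any], bool]] = None,
                 expectation: Optional[Callable[[Any], bool]] = None) -> None:
        self.name = name
        self.rule = rule                # must hold, or the value is refused
        self.expectation = expectation  # should hold, or the caller is warned
        self._value: Any = None

    def set(self, value: Any) -> None:
        if self.rule is not None and not self.rule(value):
            raise HardValidationError(f"{self.name}: {value!r} violates a domain rule")
        if self.expectation is not None and not self.expectation(value):
            warnings.warn(f"{self.name}: {value!r} is outside the expected range",
                          SoftValidationWarning, stacklevel=2)
        self._value = value  # stored even when a soft warning was issued

    def get(self) -> Any:
        return self._value


age = Slot("age", rule=lambda v: v >= 0, expectation=lambda v: v <= 120)
age.set(122)  # warns, but 122 is stored

longitude = Slot("longitude", rule=lambda v: -180 <= v <= 180)
try:
    longitude.set(372)  # refused: the slot keeps its previous value
except HardValidationError as err:
    print(err)
```

The setter is the only place where the slot's knowledge is applied, so every route to storing a value passes through the same checks.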
I've talked about variables, about names and values. Now I'll talk about the most powerful abstraction I use - possibly the most powerful abstraction in software - the namespace. A namespace is a sort of pool into which we can throw arbitrary things, tagging each with a distinct name. When we return to the pool and invoke a name, the thing in the pool to which we gave that name appears.
Database tables, considered as sets of namespaces, have a special property: they are regular. Every namespace which is a record in the same table has the same names. A class in a conventional object-oriented language is similar: each object in the class has the same set of named instance variables. They match a pattern: they are in fact constrained to match it, simply by being created in that table or class.
Records in a table, and instance variables in a class, also have another property in common. For any given name of a field or instance variable, the value which each record or object will store under that name is of the same type. If 'Age' is an integer in the definition of the table or class, the Age of every member will be an integer. This property is different from regularity, and, lacking a better word for it, I'll call it homogeneity. A set of spaces which are regular (i.e. share the same names) need not be homogeneous (i.e. share the same value types for those names), but a set which is homogeneous must be regular.
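The distinction can be checked mechanically. This sketch assumes that namespaces are plain Python dicts and treats a value's Python type as its "type". It tests homogeneity only after regularity, mirroring the point that a homogeneous set must be regular. The function names and sample records are hypothetical.

```python
from typing import Any, Iterable, List, Mapping


def is_regular(spaces: Iterable[Mapping[str, Any]]) -> bool:
    """True if every namespace in the set uses exactly the same names."""
    name_sets = [frozenset(space) for space in spaces]
    return all(names == name_sets[0] for names in name_sets)


def is_homogeneous(spaces: Iterable[Mapping[str, Any]]) -> bool:
    """True if the set is regular and each name holds one value type throughout."""
    spaces_list: List[Mapping[str, Any]] = list(spaces)
    if not is_regular(spaces_list):
        return False  # a homogeneous set must be regular
    if not spaces_list:
        return True   # an empty set is vacuously homogeneous
    return all(len({type(space[name]) for space in spaces_list}) == 1
               for name in spaces_list[0])


regular_only = [{"Name": "A", "Age": 30}, {"Name": "B", "Age": "unknown"}]
print(is_regular(regular_only), is_homogeneous(regular_only))  # True False
```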
But records in a table, in a view or in a result set are themselves normally values in a namespace whose names are the values of the key field. And the tables and views, too, are values in a namespace whose names are the table names, and so on up. Namespaces, like Russian dolls, can be nested indefinitely. By applying names to the nested spaces at each level, we can form a path of names to every space in the meta-space and to each value in each space, provided that the meta-space forms a directed acyclic graph (this is, after all, the basis of the XPath language). Indeed, we can form paths even if the graph has cycles, provided every cycle in the graph has some link back to the root.
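Resolving such a path is a simple walk down the nested spaces. This sketch assumes that namespaces are Python dicts and that path steps are separated by "/" in the manner of XPath. The sample data and the function name `resolve` are hypothetical.

```python
from collections.abc import Mapping
from typing import Any


def resolve(root: Mapping, path: str) -> Any:
    """Follow a '/'-separated path of names from the root through nested namespaces."""
    current: Any = root
    walked = []
    for name in path.split("/"):
        if not name:
            continue  # tolerate a leading slash or doubled slashes
        if not isinstance(current, Mapping):
            raise KeyError(f"/{'/'.join(walked)} holds a value, not a namespace")
        current = current[name]  # raises KeyError if the name is unknown
        walked.append(name)
    return current


dataspace = {
    "tables": {
        "people": {
            "norman": {"given": "Norman", "surname": "Maxwell"},
        },
    },
}
print(resolve(dataspace, "/tables/people/norman/surname"))  # Maxwell
```

The walk follows only the names it is given, so it terminates even when the underlying graph contains cycles.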
It's pretty useful to gather together all objects in the data space which match the same pattern; it's pretty useful for them all to have distinct names. So the general concept of a regularity which is itself a namespace is a useful one, even if the names have to be gensymed.
To be in a class (or table), must a space be created in that class (or table)? I don't see why. One of my earlier projects was an inference engine called Wildwood, in which objects inferred their own class by exploring the taxonomy of classes until they found the one in which they felt most comfortable. I think this is a good model. You ought to be able to give your dataspace a good shake and then pull out of it as a collection all the objects which match any given pattern, and this collection ought to be a namespace. It ought to be so even if the pattern did not previously exist in the data space as the definition of a table or class or regularity or whatever you care to call it.
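Shaking the dataspace for a pattern can be sketched as a structural query. Any object that has the required names, holding values of the required types, belongs to the resulting collection, whether or not it was created in any class. The sketch makes two assumptions: a pattern is a mapping from names to Python types, and every object already has a name in the dataspace. Those assumptions, the names `matches` and `shake`, and the sample data are all hypothetical.

```python
from typing import Any, Dict, Mapping

Pattern = Mapping[str, type]


def matches(obj: Mapping[str, Any], pattern: Pattern) -> bool:
    """True if obj has every name in the pattern, holding a value of the given type."""
    return all(name in obj and isinstance(obj[name], kind)
               for name, kind in pattern.items())


def shake(dataspace: Mapping[str, Mapping[str, Any]],
          pattern: Pattern) -> Dict[str, Mapping[str, Any]]:
    """Return a new namespace of every object that matches the pattern."""
    return {name: obj for name, obj in dataspace.items() if matches(obj, pattern)}


dataspace = {
    "norman": {"given": "Norman", "surname": "Maxwell"},
    "big-nasty": {"given": "Big Nasty"},
    "waypoint-1": {"latitude": 10.0, "longitude": 20.0},
}
print(sorted(shake(dataspace, {"given": str, "surname": str})))  # ['norman']
```

The result is itself a namespace, keyed by the names the objects already had, so it can be shaken again with a different pattern.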
A consequence of this concept is that objects which acquire new name-value pairs may move out of the regularity in which they were created either to exist as stateless persons in the no-man's land of the dataspace, or into a new regularity; or may form the seed around which a new regularity can grow. An object which acquires a value for one of its names which violates the validation constraints of one homogeneity may similarly move out into no-man's land or into another. In some domains, in some regularities, it may be a hard error to do this (i.e. the system will prevent it). In some domains, in some regularities, it may be a soft error (i.e. the system allows it under protest). In some domains, in some regularities, it may be normal; social mobility of objects will be allowed.
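The three policies can be sketched as a setting on each regularity. This sketch covers only the case of an object acquiring a new name. It models a soft error as a Python warning, and it models no-man's land as simply being outside every regularity's membership. The class names and the use of a warning are assumptions.

```python
import warnings
from enum import Enum
from typing import Any, Dict, Iterable


class Mobility(Enum):
    HARD = "hard"  # leaving is an error: the change is refused
    SOFT = "soft"  # leaving is allowed under protest
    FREE = "free"  # social mobility is normal


class Regularity:
    def __init__(self, names: Iterable[str], mobility: Mobility) -> None:
        self.names = frozenset(names)
        self.mobility = mobility
        self.members: Dict[str, Dict[str, Any]] = {}

    def admit(self, key: str, obj: Dict[str, Any]) -> None:
        if frozenset(obj) != self.names:
            raise ValueError(f"{key} does not match this regularity")
        self.members[key] = obj

    def assign(self, key: str, name: str, value: Any) -> bool:
        """Set name to value on a member; return True if it is still a member."""
        obj = self.members[key]
        if name in self.names:
            obj[name] = value
            return True
        # A new name would carry the object out of this regularity.
        if self.mobility is Mobility.HARD:
            raise ValueError(f"{key} may not acquire the name {name!r}")
        if self.mobility is Mobility.SOFT:
            warnings.warn(f"{key} leaves its regularity by acquiring {name!r}",
                          stacklevel=2)
        obj[name] = value
        del self.members[key]  # now in no-man's land, or the seed of a new regularity
        return False


people = Regularity(["given"], Mobility.FREE)
people.admit("norman", {"given": "Norman"})
people.assign("norman", "nickname", "Big Nasty")  # returns False: he has moved on
```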
There's another feature of namespaces which gets hardwired into lots of software structures without very often being generalised, and that is permeability, semi-translucency. In my toolkit Jacquard, for example, values are first searched for in the namespace of HTTP parameters; if not found there, in the namespace of cookies; next, in the namespace of session variables; then in local configuration parameters; and finally in global configuration parameters. There is in effect a layering of semi-translucent namespaces like the veils of a dancer.
It's not a pattern that's novel or unique to Jacquard, of course. But in Jacquard it's hardwired, and in every other context in which I've seen this pattern it's hardwired too. I'd like to be able to manipulate the veils: to add, remove, or alter the layering. I'd like this to be a normal thing to be able to do.
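Python's standard library happens to offer this pattern directly as `collections.ChainMap`. Its lookups search a list of mappings in order, and that list, `maps`, can be edited at run time. The sketch below uses it to stand in for the layering described above. It is not Jacquard's implementation, and the layer contents are hypothetical.

```python
from collections import ChainMap

# Hypothetical contents for each veil, outermost first.
http_params = {"lang": "fr"}
cookies = {"lang": "en", "theme": "dark"}
session = {"user": "simon"}
local_config = {"theme": "light", "timeout": 30}
global_config = {"timeout": 60, "lang": "en"}

veils = ChainMap(http_params, cookies, session, local_config, global_config)
print(veils["lang"], veils["theme"], veils["timeout"])  # fr dark 30

# The layering itself is ordinary data and can be manipulated at run time.
veils.maps.remove(cookies)             # lift one veil
veils.maps.insert(0, {"timeout": 5})   # add a new outermost veil
print(veils["theme"], veils["timeout"])  # light 5
```

`ChainMap` sends assignments and deletions only to the first mapping. Writing through `veils` therefore lands in whichever veil is currently outermost.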
I have a friend called Big Nasty. Not everyone, of course, calls him Big Nasty. His sons call him 'Dad'. His wife calls him 'Norman'. People who don't know him very well call him 'Mr Maxwell'. He does not have one true name.
The concept of a true name is a seductive one. In many of the traditions of magic - and I have always seen software as a technological descendant or even a technological implementation of magic - a being invoked by its true name must obey. In most modern programming languages, things tend to have true names. There is a protocol for naming Java packages which is intended to guarantee that every package written anywhere in the world has a globally unique true name. Globally unique true names do have utility: it's often important when invoking something to be certain you know exactly what it is you're invoking.
But it does not seem to me that this hegemonistic view of the dataspace is required by my messy conception. Certainly it cannot be true that an object has only one true name, since it may be the value of several names within several spaces (and of course this is true of Java; a class may well have One True Name, but I can still create an instance variable within an object whose name is anythingILike, and have its value be that class).
The dataspace I conceive is a soup. The relationships between regularities are not fixed, and so paths will inevitably shift. And in the dataspace, one sword can be in many pools - or even many times in the same pool, under different names - at the same time. We can shake the dataspace in different ways to see different views on the data. There should be no One True hegemonistic view.
This does raise the question, 'what is a name'. In many modern relational databases, all primary keys are abstract and are numbers, even if natural primary keys exist in the data - simply because it is so easy to create a table with an auto-incrementer on the key field. Easy, quick, convenient, lazy, not always a good thing. In terms of implementation details, namespaces are implemented on top of hashtables, and any data object can be hashed. So can anything be a name?
In principle yes. However, my preference would be to purely arbitrarily say no. My preference would be to say that a name must be a 'thing people say', a pronounceable sequence of characters; and also, with no specific upper bound, reasonably short.
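One way to make that arbitrary rule concrete is a check like the following. It uses an assumed definition of "pronounceable": words that start with a letter, made of letters, digits, hyphens or underscores, and separated by single spaces. The 64-character point at which a name draws a soft warning instead of a rejection is also an assumption, since the text deliberately sets no specific upper bound.

```python
import re
import warnings

# Assumed approximation of "a thing people say": words that start with a letter,
# made of letters, digits, hyphens or underscores, separated by single spaces.
_SAYABLE = re.compile(r"[^\W\d_][\w-]*(?: [\w-]+)*")

SOFT_LENGTH_LIMIT = 64  # hypothetical: beyond this a name is merely frowned upon


def is_name(candidate: object) -> bool:
    """True if candidate is acceptable as a name under the assumed rule."""
    if not isinstance(candidate, str) or _SAYABLE.fullmatch(candidate) is None:
        return False
    if len(candidate) > SOFT_LENGTH_LIMIT:
        warnings.warn(f"a name of {len(candidate)} characters is rather long",
                      stacklevel=2)
    return True


for candidate in ["Salary", "Phone number", "Big Nasty", 42, "", " padded"]:
    print(repr(candidate), is_name(candidate))
```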
I started my professional life writing LISP on Xerox 1108s and 1186s - Dandelions and Daybreaks, if you prefer names to numbers. When I wanted to multiply two numbers, I multiplied two numbers. I didn't first make sure that the result wouldn't overflow some arbitrary store size. When a function I wrote broke, I edited its structure in place, where it sat on the stack, and continued the computation. I didn't abort the computation, find a source file (source file? How crude and primitive), load it into a text editor, edit the text, save it, check for syntax errors, compile it, load the new binary, and restart the computation. That was more than twenty years ago. It is truly remarkable how software development environments have failed to advance - have actually gone backwards - in that time.
LISP's problem is that it dared to try to behave as though it were a post-scarcity language too soon. The big LISP machines - not just the Xerox machines, but the LMI, Symbolics and TI Explorer machines - were vastly too expensive. My Daybreak had 8MB of core and 80MB of disk when PCs usually didn't even have the full 640KB. They were out-competed by UNIX boxes from Sun and Apollo, which delivered poorer software development environments but at a much lower cost. They paid the price for coming too early: they died. And programmers have been paying the price for their failure ever since.
But you only have to look at a fern moss, a frond of bracken, an elm sapling, the water curling over the lip of a waterfall, to know that if God does not write LISP He writes some language so similar to LISP as to make no difference. DNA encodes recursive functions; turbulent fluids move in patterns formed by recursion, whorls within whorls within whorls.
The internal structure, then, of the post-scarcity language is rather lisp-like. Don't get hung up on that! Remember that syntax isn't language, that the syntax you see need not be the syntax I see. What I mean by saying the language is lisp-like is that its fundamental operation is recursion; that things can easily be arranged into arbitrary structures; that new types of structure can be created on the fly; that new code (code is just data, after all) can be created and executed on the fly; that there is no primacy of the structures and the code created by the programmer over the structures and code created by the running system; and that new code can be loaded and linked seamlessly into a running system at any time. I mean, too, that instead of little discrete programs doing little discrete specialised things in separate data spaces, each with its own special internal format and internal structures, the whole data space of all the data available to the machine (including, of course, all the code owned by the machine) exists in a single, complex, messy, powerful pool; and that a process doesn't have to make a special arrangement, use a special protocol, to talk to another process or to exchange data with it.
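To make "code is just data" concrete, here is a deliberately tiny evaluator in the tradition of McCarthy's LISP `eval`, written in Python. Programs are nested Python lists, so a running program can build new code as ordinary data and then evaluate it. The special forms, the `evaluate` function and the global environment are hypothetical choices for this sketch. Strings serve only as symbols, so string literals are not supported.

```python
import operator
from typing import Any, Dict

Env = Dict[str, Any]


def standard_env() -> Env:
    """A fresh global environment with a few primitive functions."""
    return {"+": operator.add, "-": operator.sub,
            "*": operator.mul, "<": operator.lt, "list": lambda *xs: list(xs)}


def evaluate(expr: Any, env: Env) -> Any:
    """Evaluate an s-expression represented as nested Python lists."""
    if isinstance(expr, str):              # a symbol: look up its value
        return env[expr]
    if not isinstance(expr, list):         # a literal number evaluates to itself
        return expr
    head, *args = expr
    if head == "quote":                    # (quote x): return x as data, unevaluated
        return args[0]
    if head == "if":                       # (if test then else)
        test, then, other = args
        return evaluate(then if evaluate(test, env) else other, env)
    if head == "lambda":                   # (lambda (params...) body)
        params, body = args
        return lambda *vals: evaluate(body, {**env, **dict(zip(params, vals))})
    if head == "define":                   # (define name expr)
        name, value_expr = args
        env[name] = evaluate(value_expr, env)
        return env[name]
    if head == "eval":                     # (eval expr): run data as code
        return evaluate(evaluate(args[0], env), env)
    fn = evaluate(head, env)               # ordinary function application
    return fn(*[evaluate(arg, env) for arg in args])


env = standard_env()
# Recursion is the fundamental operation.
evaluate(["define", "fact",
          ["lambda", ["n"],
           ["if", ["<", "n", 2], 1, ["*", "n", ["fact", ["-", "n", 1]]]]]], env)
print(evaluate(["fact", 5], env))  # 120

# The running program builds new code as data, then executes it.
generated = ["list"] + [["fact", n] for n in range(1, 5)]
print(evaluate(["eval", ["quote", generated]], env))  # [1, 2, 6, 24]
```

The code built by the running program (`generated`) is evaluated by the same machinery as the code written by the programmer. Neither is privileged over the other.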
In that pool, the internal storage representation of data objects is DKDC: don't know, don't care. We neither have nor need to have access to it. It may well change over time without application layer programs even being aware or needing to be aware of the change, and certainly without their needing to be recompiled.
The things we can store in the dataspace include:
Things which we no longer store - which we no longer store because they no longer have any utility - include
Files are the most stupid, arbitrary way to store data. Again, with a persistent data pool, they cease to have any purpose. Post scarcity, there are no files and there is no filesystem. There's no distinction between in core and out of core. Or rather, if there are files and a filesystem, if there is a distinction between in core and out of core, that distinction falls under the doctrine of DKDC: we don't know about it, and we don't care about it. When something in the pool wants to use or refer to another something, then that other something is available in the pool. Whether it was there all along, or whether it was suddenly brought in from somewhere outside by the runtime system, we neither know nor care. If things in the pool which haven't been looked at for a long time are sent to sulk elsewhere by the runtime system that is equally uninteresting. Things which are not referenced at all, of course, may be quietly dropped by the runtime system in the course of normal garbage collection.
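The doctrine of DKDC can be illustrated with a pool whose callers simply ask for things by name. In this sketch a "cold" dictionary stands in for slower storage outside core. Least-recently-used eviction stands in for whatever policy the runtime system might use to send things to sulk elsewhere. Both of these, and the class name `Pool`, are assumptions. Callers of `put` and `get` cannot tell where an item currently lives.

```python
from collections import OrderedDict
from typing import Any, Dict, Hashable


class Pool:
    """A name-to-value store that hides whether an item is in core or not."""

    def __init__(self, in_core_capacity: int = 2) -> None:
        if in_core_capacity < 1:
            raise ValueError("in_core_capacity must be at least 1")
        self._hot: "OrderedDict[Hashable, Any]" = OrderedDict()  # in core, least recent first
        self._cold: Dict[Hashable, Any] = {}  # stand-in for storage out of core
        self._capacity = in_core_capacity

    def put(self, name: Hashable, value: Any) -> None:
        self._cold.pop(name, None)
        self._hot[name] = value
        self._hot.move_to_end(name)
        self._evict()

    def get(self, name: Hashable) -> Any:
        if name not in self._hot:
            # Quietly brought back into core; the caller neither knows nor cares.
            self._hot[name] = self._cold.pop(name)  # KeyError if it is nowhere
        self._hot.move_to_end(name)
        self._evict()
        return self._hot[name]

    def _evict(self) -> None:
        # Send the least recently used items to sulk elsewhere.
        while len(self._hot) > self._capacity:
            old_name, old_value = self._hot.popitem(last=False)
            self._cold[old_name] = old_value


pool = Pool(in_core_capacity=2)
for name, value in [("a", 1), ("b", 2), ("c", 3)]:
    pool.put(name, value)
print([pool.get(n) for n in ("a", "b", "c")])  # [1, 2, 3]
```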
One of the things we've overloaded onto the filesystem is security. In core, in modern systems, each process guards its own pool of store jealously, allowing other processes to share data with it only through special channels and protocols, even if the two processes are run by the same user identity with the same privilege. That's ridiculous. Out of core, data is stored in files, often in an inscrutable internal format, each with its own permissions and access control list.
It doesn't need to be that way. Each primitive data item in core - each integer, each list node, each slot, each namespace - can have its own access control mechanism. Processes, as such, will never 'own' data items, and will certainly never 'own' chunks of store - at the application layer, even the concept of a chunk of store will be invisible. A process can share a data item it has just created simply by setting an appropriate access policy on it, and programmers will normally be encouraged to share as liberally as security allows. So the slot Salary of the namespace Simon might be visible only to the user Simon and the role Payroll, but that wouldn't stop anyone else looking at the slot Phone number of the same namespace.
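Per-item access control might be sketched as follows. Each slot carries its own set of permitted readers, written here as "user:" and "role:" identities. A reader is admitted if any of its identities appears in that set, and a slot with no reader set is public. This identity scheme, the class names and the placeholder values are hypothetical, and the sketch covers reading only.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, FrozenSet, Optional


@dataclass(frozen=True)
class Principal:
    user: str
    roles: FrozenSet[str] = frozenset()

    def identities(self) -> FrozenSet[str]:
        return frozenset({f"user:{self.user}"} | {f"role:{r}" for r in self.roles})


@dataclass
class GuardedSlot:
    value: Any
    readers: Optional[FrozenSet[str]] = None  # None means readable by anyone

    def readable_by(self, who: Principal) -> bool:
        return self.readers is None or bool(self.readers & who.identities())


@dataclass
class GuardedNamespace:
    slots: Dict[str, GuardedSlot] = field(default_factory=dict)

    def read(self, name: str, who: Principal) -> Any:
        slot = self.slots[name]
        if not slot.readable_by(who):
            raise PermissionError(f"{who.user} may not read {name}")
        return slot.value


simon = GuardedNamespace({
    "Salary": GuardedSlot("(salary placeholder)",
                          frozenset({"user:Simon", "role:Payroll"})),
    "Phone number": GuardedSlot("(phone placeholder)"),
})
norman = Principal("Norman")
print(simon.read("Phone number", norman))  # (phone placeholder)
```

No process owns the namespace here. Whether a read succeeds depends only on the policy attached to the individual slot.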
Ends.