Auchencairn, Scotland, Feb 27, 2006
Let me start by saying that I really don't understand the problem with syntax. Programming language designers spend a lot of time worrying about it, but I believe they're simply missing the point. People say 'I can't learn LISP because I couldn't cope with all the brackets'. People - the Dylan team, for one - have developed systems which put a skin of 'normal' (i.e., ALGOL-like) syntax on top of LISP. I personally won't learn Python because I don't trust a language where whitespace is significant. But in admitting that prejudice I'm admitting to a mistake which most software people make.
We treat code as if it weren't data. We treat code as if it were different, special. This is the mistake made by the LISP2 brigade, when they gave their LISPs (ultimately including Common LISP) separate namespaces, one for 'code' and one for 'data'. It's a fundamental mistake, a mistake which fundamentally limits our ability to even think about software.
What do I mean by this?
Suppose I ask my computer to store pi, 3.14159265358979. Do I imagine that somewhere deep within the machine there is a bitmap representation of the characters? No, of course I don't. Do I imagine there's a vector starting with the bytes 51 46 49 52 49 53 57 ...? Well, of course, there might be, but I hope there isn't because it would be horribly inefficient. No, I hope and expect there's an IEEE 754 binary encoding of the form 0100000000001001001000011111...11000. But actually, frankly, I don't know, and I don't care, provided that it is stored and that it can be computed with.
However, as to what happens if I then ask my computer to show me the value it has stored, I do know and I do care. I expect it to show me the character string '3.14159265358979' (although I will accept a small amount of rounding error, and I might want it to be truncated to a certain number of significant figures). The point is, I expect the computer to reflect the value I have stored back to me in a form which it is convenient for me to read, and, of course, it can.
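The gap between the two forms can be seen directly in Java. What follows is a minimal sketch, not anything from the original system: the class name PiReflection and both method names are made up for illustration, and it assumes the value is held as a 64-bit double. It uses only the standard calls Double.doubleToLongBits, Long.toBinaryString and String.format.

```java
import java.util.Locale;

final class PiReflection {
    /** The 64 bits held for a double, written out as a string of 0s and 1s. */
    static String storedBits(double value) {
        String bits = Long.toBinaryString(Double.doubleToLongBits(value));
        // toBinaryString drops leading zeros, so pad back out to 64 digits.
        StringBuilder padded = new StringBuilder();
        for (int i = bits.length(); i < 64; i++) {
            padded.append('0');
        }
        return padded.append(bits).toString();
    }

    /** The same value reflected back in a form convenient for a person to read. */
    static String displayed(double value, int decimalPlaces) {
        return String.format(Locale.ROOT, "%." + decimalPlaces + "f", value);
    }

    public static void main(String[] args) {
        System.out.println(storedBits(Math.PI));    // internal form: don't know, don't care
        System.out.println(displayed(Math.PI, 14)); // 3.14159265358979
    }
}
```

The stored form and the displayed form have nothing visibly in common, and nobody minds, because the machine can always produce the second from the first.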
We don't, however, expect the computer to be able to reflect back an executable for us in a convenient form, and that is in itself a curious thing. If we load, for example, the UNIX command 'ls' into a text editor, we don't see the source code. We see, instead, the raw internal format. And the amazing thing is that we tolerate this.
It isn't even that hard to write a 'decompiler' which can take a binary and reflect back source code in a usable form. Here, for example, is a method I wrote:
    /**
     * Return my action: a method, to allow for specialisation. Note: this
     * method was formerly 'getAction()'; it has been renamed to disambiguate
     * it from 'action' in the sense of ActionWidgets, etc.
     */
    public String getNextActionURL( Context context ) throws Exception
    {
        String nextaction = null;
        HttpServletRequest request =
            (HttpServletRequest) context.get( REQUESTMAGICTOKEN );
        if ( request != null )
        {
            StringBuffer myURL = request.getRequestURL( );
            if ( action == null )
            {
                nextaction = myURL.toString( );
                // If I have no action, default my action
                // to recall myself
            }
            else
            {
                nextaction =
                    new URL( new URL( myURL.toString( ) ), action ).toString( );
                // convert my action into a fully
                // qualified URL in the context of my
                // own
            }
        }
        else
        { // should not happen!
            throw new ServletException( "No request?" );
        }
        return nextaction;
    }
and here is the result of 'decompiling' that method with an open-source Java decompiler, jreversepro:
    public String getNextActionURL(Context context)
        throws Exception
    {
        Object object = null;
        HttpServletRequest httpservletrequest =
            (HttpServletRequest)context.get( "servlet_request");
        String string;
        if (httpservletrequest != null) {
            StringBuffer stringbuffer = httpservletrequest.getRequestURL();
            if (action == null)
                string = stringbuffer.toString();
            else
                string = new URL(new URL(stringbuffer.toString()) ,
                    action).toString();
        }
        else
            throw new ServletException("No request?");
        return (string);
    }
As you can see, the comments have been lost and some variable names have changed, but the code is essentially the same and is perfectly readable. And this is with an internal form which has not been designed with decompilation in mind. If decompilation had been designed for in the first place, the binary could have contained pointers to the variable names and comments. Historically we haven't done this, both for 'intellectual property' reasons and because of store poverty. In future, we can ansplayed.
Again, like so much in software, this isn't actually new. The microcomputer BASICs of the seventies and eighties 'tokenised' the source input by the user. This tokenisation was not of course compilation, but it was analogous to it. The internal form of the program that was stored was much terser than the representation the user typed. But when the user asked to list the program, it was expanded into its original form.
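To make the idea concrete, here is a toy sketch of tokenise-and-list in Java. It is not the scheme of any particular BASIC: the class name TinyTokeniser, the three keywords chosen and the one-character codes assigned to them are all invented for illustration. It also cuts a corner a real interpreter would not, in that it replaces keywords wherever they occur, including inside string literals, and it assumes the typed line never contains the token characters themselves.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class TinyTokeniser {
    // Hypothetical token table: each keyword is stored as a single character.
    private static final Map<String, Character> TOKENS = new LinkedHashMap<>();
    static {
        TOKENS.put("PRINT", '\u0080');
        TOKENS.put("GOTO", '\u0081');
        TOKENS.put("INPUT", '\u0082');
    }

    /** Turn the line the user typed into the terser stored form. */
    static String tokenise(String typed) {
        String out = typed;
        for (Map.Entry<String, Character> e : TOKENS.entrySet()) {
            out = out.replace(e.getKey(), String.valueOf(e.getValue()));
        }
        return out;
    }

    /** Expand the stored form back into what the user typed. */
    static String list(String stored) {
        String out = stored;
        for (Map.Entry<String, Character> e : TOKENS.entrySet()) {
            out = out.replace(String.valueOf(e.getValue()), e.getKey());
        }
        return out;
    }

    public static void main(String[] args) {
        String typed = "10 PRINT \"HELLO\": GOTO 10";
        String stored = tokenise(typed);
        System.out.println(stored.length() + " characters stored, " + typed.length() + " typed");
        System.out.println(list(stored)); // the line exactly as typed
    }
}
```

The stored line is shorter than the typed one, yet listing it gives back exactly what was typed, which is all the user ever sees.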
Compilation - even compilation into the language of a virtual machine - is much more sophisticated than tokenising, of course. Optimisation means that many source constructs may map onto one object construct, and even that one source construct may in different circumstances map onto many object constructs. Nevertheless it is not impossible - nor even hugely difficult - to decompile object code back into readable, understandable and editable source.
But Java syntax is merely a format. When I type a date into a computer, say '05-02-2006', and ask it to reflect that date back to me, I expect it to be able to reflect back to me '05-02-2006'. But I expect it to be able to reflect back to an American '02-05-2006', and to either of us 'Sunday 5th February 2006' as well. I don't expect the input format to dictate the output format. I expect the output format to reflect the needs and expectations of the person to whom it is displayed.
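The date case can be sketched with the standard java.time classes. The class name DateReflection and the three formatter constants are made up for this illustration; LocalDate and DateTimeFormatter are the real library types. One caveat: the pattern letters give 'Sunday 5 February 2006' rather than '5th', since the formatter has no ordinal suffix.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

final class DateReflection {
    // Three output formats for one stored date; none of them is "the" date.
    static final DateTimeFormatter DAY_FIRST =
        DateTimeFormatter.ofPattern("dd-MM-uuuu", Locale.ROOT);
    static final DateTimeFormatter MONTH_FIRST =
        DateTimeFormatter.ofPattern("MM-dd-uuuu", Locale.ROOT);
    static final DateTimeFormatter LONG_FORM =
        DateTimeFormatter.ofPattern("EEEE d MMMM uuuu", Locale.UK);

    public static void main(String[] args) {
        // The input format is used once, to get the value in.
        LocalDate stored = LocalDate.parse("05-02-2006", DAY_FIRST);

        // After that, the output format is chosen to suit the reader.
        System.out.println(stored.format(DAY_FIRST));   // 05-02-2006
        System.out.println(stored.format(MONTH_FIRST)); // 02-05-2006
        System.out.println(stored.format(LONG_FORM));   // Sunday 5 February 2006
    }
}
```

Once parsed, the LocalDate carries no memory of the format it was typed in, which is exactly the property being asked of code.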
To summarise, again.
Code is data. The internal representation of data is Don't Know, Don't Care. The output format of data is not constrained by the input format; it should suit the use to which it is to be put, the person to whom it is to be displayed.
Thus if the person to whom my Java code is reflected back is a LISP programmer, it should be reflected back in LISP syntax; if a Python programmer, in Python syntax. Let us not, for goodness' sake, get hung up about syntax; syntax is frosting on the top. What's important is that the programmer editing the code should edit something which is clearly understandable to him or her.
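As a rough illustration of one stored structure wearing two frostings, here is a small Java sketch. Everything in it is hypothetical: the Frosting and Node classes, the two renderers and the sample expression are invented for the purpose, and a real system would need far more than a tree of names. For simplicity a node with no arguments is always printed as a bare name.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class Frosting {
    /** A node is a name, optionally applied to argument nodes. */
    static final class Node {
        final String head;
        final List<Node> args;

        Node(String head, Node... args) {
            this.head = head;
            this.args = Arrays.asList(args);
        }
    }

    /** LISP-style frosting: (head arg arg ...). */
    static String asLisp(Node n) {
        if (n.args.isEmpty()) {
            return n.head;
        }
        return "(" + n.head + " "
            + n.args.stream().map(Frosting::asLisp).collect(Collectors.joining(" "))
            + ")";
    }

    /** ALGOL-style frosting: head(arg, arg, ...). */
    static String asAlgol(Node n) {
        if (n.args.isEmpty()) {
            return n.head;
        }
        return n.head + "("
            + n.args.stream().map(Frosting::asAlgol).collect(Collectors.joining(", "))
            + ")";
    }

    public static void main(String[] args) {
        Node call = new Node("max", new Node("x"), new Node("abs", new Node("y")));
        System.out.println(asLisp(call));  // (max x (abs y))
        System.out.println(asAlgol(call)); // max(x, abs(y))
    }
}
```

Neither printed form is stored anywhere; both are generated on demand from the same tree, so neither has any claim to be the original.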
This has, of course, a corollary. In InterLISP, one didn't edit files 'out of core' with a text editor. One edited the source code of functions as S-expressions, in core, with a structure editor. The canonical form of the function was therefore the S-expression structure, and not the printed representation of it. If a piece of code - a piece of executable binary, or rather, of executable DKDC - can be reflected back to users with a variety of different syntactic frostings, none of these can be canonical. The canonical form of the code, which must be stored in version control systems or their equivalent, is the DKDC itself; and to that extent we do care and do need to know, at least to the extent that we need to know that the surface frosting can again be applied systematically to the recovered content of the archive.
Welcome, then, to post-scarcity computing. It may not look much like what you're used to, but if it doesn't it's because you've grown up with scarcity, and even since we left scarcity behind you've been living with software designed by people who grew up with scarcity, who still hoard when there's no need, who don't understand how to use wealth. It's a richer world, a world without arbitrary restrictions. If it looks a lot like Alan Kay (and friends)'s Croquet, that's because Alan Kay has been going down the right path for a long time.