Etymon: July 2004

Saturday, July 31, 2004

Normal people in bad situations.

Clinton wonders aloud about the behaviour of the Khmer Rouge interrogators. Unfortunately this sort of behaviour is both normal, and predicted; anyone who doesn't understand this needs to take the time to read about such experiments as "The Stanford Prison Experiment"

Welcome to the Stanford Prison Experiment web site, which features an extensive slide show and information about this classic psychology experiment, including parallels with the recent abuse of Iraqi prisoners. What happens when you put good people in an evil place? Does humanity win over evil, or does evil triumph? These are some of the questions we posed in this dramatic simulation of prison life conducted in the summer of 1971 at Stanford University.

Friday, July 30, 2004

Why types? -- lambda, the main course

First the surreal, then possibly one of the most interesting threads I think I've seen on lambda. Probably why I'm now clocking 10 hours catching up on lambda, and not only am I still at it, but I'm still motivated.

In Why type systems are interesting, Anton van Straaten lifted a comment by Luke Gorrie to a toplevel discussion:

If someone could next show what makes type systems so interesting in terms that I'll understand then I'll be well on my way into the modern functional programming world.. :-)

and in doing so launched a fascinating thread that has seen 44 posts so far and launched at least one independent thread. I seriously recommend this thread to anyone who considers themselves a serious programmer. A few highlights:

Anton managed to summarise the thread in the introduction:

When you write code, even in the most dynamic language, you always know something about the types of the values you're operating on. You couldn't write useful code otherwise - if you doubt that, try coming up with some examples of operations you can perform on a value without knowing something about the type of that value. Type systems provide a way to encode, manipulate, communicate and check this type knowledge. In an earlier thread, you mentioned that you write the types of functions in the comments. That's an informal type system! Why do you do use a type system? Because type systems are useful, important, and as a result, interesting.

Note that the first paragraph is deceptively simple. Consider the most common example of a generic data-structure, the list. If you have a truely hetrogeneous list, one whose contents are truely untyped, you your list is actually indistinguishable from a natural number! The only operations you can perform on a completely typeless list form the monoid ([], Lists, concat) which of course form a monoid isomorphic with (0, Naturals, +), and supports the Isomorphic Arrow length() mapping lists and naturals. For a fuller discussion of this I recommend Basic Category Theory for Computer Scientists, Benjamin Pierce (and before I get asked, I borrowed it from my local university library).

Anyone who has been reading lambda much immediately recognises Frank Atanassow, his writing style is distinctive, and his opinions both strongly held, and consistent. One of the really exciting things about this thread was that Frank's comment Rebuttal finally managed to clarify for me, what he has been talking about these past months. Although if you haven't been reading lambda much and intend to read that post I would strongly advise you read No contraints[sic], and keep in mind Anton's comment in Constraints

if b then 42 else "ha!" becomes something like: if b then Num 42 else Str "ha!" ...where Num and Str are type constructors for the Univ type. This is kind of ironic, considering that it's the latter language that's supposed to be able to inference types. You can think of dynamically-typed languages as a shortcut syntax for creating programs which rely on a universal type. This demonstrates that statically-typed languages can be less expressive than dynamically-typed ones. emphasis mine

It was also very amusing to watch a curveball from Julian Squires who wanted to know how static type systems interact with his desire to actively develop an executing application (think smalltalk, or possibly php) in Re: Software development as a static activity. Personally I'm not sure any existing languages combine a modern static type system with the ability to do runtime modifications. Still even if you only consider static type checking a safety net, surely one time you would really really like to have it is when doing something as delicate as runtime development.

Just surreal.

The past couple of evengings have been 'catch up on lambda' evenings. So I only just stumbled across one of the most surreal threads I think I've ever read. Old computer science and technical books worth searching for certainly looked promising. It started will, and I will definately have to keep an eye out for "ACM Turing Award Lectures: The First Twenty Years". Then someone mentioned generating functions and type derivatives, and before too long I end up ambushed with

The |x| < 1 condition is a red herring in this case. It is quite standard to work with generating functions in the context of the ring of formal power series k[[x]] with coefficients in a field k. (You can generalize the heck out of this definition.) A formal power series is a polynomial except for the fact that we don't require the elements of its coefficient sequence to become zero after a certain point. We define addition in this ring in the obvious way and multiplication by convolution of the coefficient sequences, just like you would in a polynomial ring.

Ouch! My poor head. I still need warning and a good runup when approaching this sort of thing.

Trudeau: Ch 1. Maths, Games, and Education.

I'm currently reading chapter 4 of Trudeau, but lets do this properly and I'll start with chapter 1.

Chapter 1 is the introduction, and like all introductions everywhere, is largely content free --- or more precisely, is the realm of opinion and motivation. But, before I discuss Trudeau's opinion of why I should read this book, Introduction to Graph Theory has what is probably the most excuberant blurb I think I have ever read on a textbook.

This delightfully written introduction to graph theory offers a stimulating excursion into pure mathematics. It is aimed at those whom the author calls "the mathematically traumatized," but it is a treasury of challenging fun ... "The topics are so well motivated, the exposition so lucid and delightful, that the book's appeal should be virtually universal....[sic] Every library should have several copies"

Wow, somebody was certainly in a good mood. You could almost be forgiven for forgetting that you're actually reading about a textbook. I think this gives new meaning to 'rave review'.

Anyway, back on topic, Trudeau would like us to approach pure mathematics as a game.

Basically pure mathmatics is a box of games. At last count it contained more than eighty of them. One of the is called "Euclidean geometry".

He then goes on to approach the closed-world nature of pure maths; something I as an engineer struggle to deal with.

Games have one more feature in common with pure mathematics. It is subtle but important. It is that the objects with which a game is played have no meaning outside the context of the game. ... You may balk at this, since for historical reasons, chessmen have names and shapes that imply an essential correspondence with the external world. The key word is "essential"; there is indeed a correspondence, but it is inessential. ...this interpretation of the gaem, like all others, has nothing to do with chess per se.

So in a similar way, while the words plane, point, and line suggest real-world analogues, it is only an implied interpretation. In Euclidean geometry they are axiomatic, with specific properties, but without definition.

...no one knows what planes, points or lines are, except to say that they are objects which are related to one another in accordance with the axioms. The three words are merely convenient names for the three types of object geometers play with. Any other names would do as well.

Still the most interesting paragraph in the introduction is right at the beginning. Along with a number of my peers, I cruised through High School mathematics. In fact I largely finished read Lord of the Rings, and War and Peace in my maths classes. It seemed that my teachers would take about a week to teach a days work, and yet it was obvious that dispite the slow pace many of my classmates were still being left behind. Naturally as a geek, I have spent a fair amount of time analysing and self analysing to try and understand the difference. While I was still at school, I just assumed I was just more intelligent. However experience has since taught me I wasn't. Which reopens the question, "What was I doing that made me so much more effective at learning High School maths?". I certainly didn't work harder. I believe Trudeau's introduction gives us a clue:

To understand what mathematics is, you need to understand what pure mathematics is. Unfortunately, most people have either seen no pure mathematics at all, or so little that they have no real feeling for it. Consequently most people don't really udnerstand mathematics; I think this is why many people are afraid of mathematics and quick to proclaim themselves mathematically stupid. Of course, since pure mathematics is the foundation of applied mathematics, you can see the pure mathematics beneath the applications if you look hard enough. But what people see, and remember, is a matter of emphasis. People are told about bridges and missiles and computers. Usually they don't hear about the fascinating intellectual game that lies beneath it all.

My current hypothesis is that my gift was the ability to see past the smokescreen of 'applications', and rote-work, to the fundamental patterns and structure of the pure mathematics underneath. This would also explain why I find such a return on time spent studying Computer Science --- and consequently my interest in language semantics and hence why I'm reading a book on graph theory largely for fun. So assuming I am right, can this insight into how a 'naturally gifted' high-school maths student attains their aptitude be used to improve the pedalogy of our existing maths curriculums?

Shiny Papers.

I'm a reading magpie. I find it almost impossible to browse citeseer, or a amazon without walking away with at least a half dozen new books, and twenty or thirty new papers to read. Citeseer is particularly bad... 'Ooo shiny paper', , , and I have yet another pdf to add to my already impossibly huge collection. So I've decided to be a little more systematic about my reading.

I'm going to start posting summaries of the articles, papers, and books I'm reading. This will hopefully result in a permanant record for my own reference, and if I'm lucky, suggestions for additional reading.

So for starters I thought I would post my current (heavilly abbreviated) reading list.

Essentials of Programming Languages, Daniel Freidman: I'll be working through this one concurrently with the other papers as I want to work the examples, and that will slow down the reading. I would be interested in advice; this is a textbook widely used in CS courses, would it be inappropriate for me to post my solutions?
Types and Programming Languages, Benjamin Pierce: I will be starting this as soon as I finish EOPL. Again I expect to work the examples, and a similar question arises, except that this is a graduate text so hopefully less chance of encouraging cheating

And concurrently with this:

Introduction to Graph Theory, Richard J. Trudeau
Making reliable distributed systems in the presesnce of software errors, Joe Armstrong
The Polyadic PI Calculus : A Tutorial, Robin Milner: I'll be rereading this one in order to summarise it properly. I'm finding it too useful as a framework for thinking about OOP to not want a more concrete reference to my own understanding of this topic.
Genuinely Functional User Interfaces, Antony Courtney, Conal Elliott: This is my primary area of interest, so I am particularly interested in any interesting references on FRP, Arrows, and FRP/Arrows applied to GUI's
Arrows, Robots, and Functional Reactive Programming, Paul Hudak, Henrik Nilsson, Antony Courtney, John Peterson

There are more, but that will do for now. I'll keep this list updated as I find other 'shiny papers', receive recommendations, or actually finish reading some of them ;).

Thursday, July 29, 2004

It's not as scary as it sounds.

Greg Black comments that he took a look at Joe Armstrong's thesis I linked to below. Just in case his discription makes it sound intimidating, the error handling philosophy discussion --- let it crash --- is one section of chapter 4 (ie. about 3 pages). Of course, handling errors is only one step towards a reliable system.

In fact, the chapters of the thesis are largely approachable independently of each other. Chapters 2 and 4 (Architecture and Programming Principles) are particularly good in this regard.

In the meantime for those who are feeling too lazy to read the actual pdf, an executive summary:

We don't know how to write bug-free programs
So every substantial program will have bugs
Even if we are lucky enough to miss our bugs, unexpected interactions with the outside world (including the hardware we are running on) will cause periodic failures in any long-running process
So make sure any faults that do occur can't interfere with the execution of your program
Faults are nasty, subtle, vicious creatures with thousands of non-deterministic side-effects to compensate for
So the only safe way to handle a bug is to terminate the buggy process
So don't program defensively: Just let it crash, and make sure
1. Your runtime provides adequate logging/tracing/hot-upgrade support to detect/debug/repair the running system
2. You run multiple levels of supervisor/watchdog all the way from supervisor trees to automatic, hot-failover hardware clusters

Simple really ;)

Wednesday, July 28, 2004

Let it crash

Martin Pool links to a usenix paper from last year on Crash only software. It is an interesting paper, the first I have seen applying this principle to J2EE. Still I was rather disappointed that the only prior work on supervisorary fault recovery cited seem to be their own. For instance the erlang community has made this a fundamental part of their culture, and the approach is pervasive from the language primitves thru standard libraries to system architecture recommendations and community norms.

Anyone who might be interested in more detail on how this philosophy works in practice is encouraged to check out Joe Armstrong's PhD thesis Making reliable systems in the presence of software errors.

Thursday, July 22, 2004

The importance of checking references.

Jeff Pollock has responded to Paul Gearon's post on Jeff's recent comments to the RDF DAWG list. In doing so he makes three points, I will address his other two points another time (if Paul dosn't beat me to it), but wanted to make a quick note here regarding the first.

Jeff's quote from the dawg charter didn't ring true to me in light of various comments on the dawg list, so I went and checked. Jeff wrote:

the DAWG charter says "It is a requirement that the query language be compatible with an XQuery context"

The charter reads:

There is a requirement for RDF data to be accessable within an XML Query context.

And in fact even this snipet looses signifigant, and important context. A true contextual quote reads:

At this stage, it is not clear to what extent XQuery technology is applicable to the task of querying RDF datasets. The RDF DAWG should aim to maximize W3C technology re-use, while also taking account of differences between the RDF graph data model and the XQuery data model. There is a requirement for RDF data to be accessable within an XML Query context. The working group should specify at least one mechanism for exposing RDF query facilities in an XQuery environment; that is, a way to take a piece of RDF Query abstract syntax and map it into a piece of XML Query using some form of extension to XQuery. ... While the data model of the query language of this protocol is dissimilar to that of XQuery, a non-XML concrete syntax might reuse syntactic elements from XQuery to aid learning time, even if XQuery is not chosen as the strawman.

Note that the sentence Jeff quoted is referring to an extension to XQuery to allow an RDF query to be embedded in, and accessed by, an XQuery. This is very different from a requirement for compatibility, and falls far short of requiring the use of an XQuery syntax for the rdf query language.

It is also worth noting here that the charter explictly recognises the difficulty in mapping XML-infoset semantics to rdf.

At this stage, it is not clear to what extent XQuery technology is applicable to the task of querying RDF datasets. ... While the data model of the query language of this protocol is dissimilar to that of XQuery, ... While RDF query addresses a different data domain (the RDF graph),

On the other hand, the charter also naturally recognises the value in cohesive standards that leverage off each other:

The RDF DAWG should aim to maximize W3C technology re-use, while also taking account of differences between the RDF graph data model and the XQuery data model. ... a non-XML concrete syntax might reuse syntactic elements from XQuery to aid learning time ... any query mechanisms defined by this group must leverage off, and, where possible, interoperate with, the related XML recommendations

In conclusion I will repeat what I said in my first post on this subject.

I would have absolutely not problem with discovering that XQuery can be married with the semantics required by an RDF query language, in fact that would be great.

Update Andy Seabourne has posted a link to a paper by the jena team which nicely illustrates how BRDF can be made accessable to XQuery/XSLT.

Wednesday, July 21, 2004

Tuesday, July 20, 2004

Watch this space.

I've been rather busy this week trying to hammer the new Kowari Resolver SPI into shape, so I don't have time to comment properly on the DAWG discussions regarding NI's recent XQuery proposals. I'll do that as soon as I can. In the meantime Kendall Clark was kind enough to correct some inaccuracies in my earlier post, and I feel it is appropiate to give his corrections equal billing to my errors.

Saturday, July 17, 2004

Objects and Relational Databases

Ben Carlyle has been rusing recently about Object-Orientation and Relational (or at least SQL) databases. I suggested that some background reading on some of the theory supporting the two worlds might be useful. As he didn't have pen and paper handy I'm blogging it.

For OO, Actor theory and the Pi-Calculus: Robin Milner, Communicating and Mobile Systems: the Pi-Calculus and Robin Milner, The Polyadic pi-Calculus: a Tutorial (1991)

For Relational Theory: Chris Date, Introduction to Database Systems, Eighth Edition and Chris Date, The Third Manifesto (1995).

Update on XQuery - RDF

As expected the DAWG has rejected Network Inferencing's proposal. Jeff Pollock has responded on the public-rdf-dawg list. While Jeff is obviously disappointed, I suspect that the dawg hasn't rejected XQuery. NI's proposal was a syntactic one, and the DAWG decided months ago to defer syntax until after the semantics are decided.

Two weeks ago I wrote a new rdf-query language - it took me about half a day. The parser was trivial, and the semantics a subset of iTql (Tucana's rdf query language implemented in Kowari). At the end of the day, syntax isn't as important as semantics.

Friday, July 16, 2004

MIML

Machine Independent Machine Language. AKA C.

Somehow I've managed to come across this acronymn twice today from independent sources. I still find it a beautifully accurate description of both C's greatest strength, and it's greatest weakness.

Still, somehow regardless of how many new languages I learn, I still seem to have a soft spot for C --- it is a remarkably elegant MIML.

Jumping the gun.

Well the Data Access Working Group (DAWG) of the W3C is working towards standardising a query language for RDF. Jeff Pollock from Network Inferencing recently posted to the DAWG mailing list Post/Attached Proposal

Therefore, NI proposes that a new requirement be considered by this group: "The query language shall have an XQuery compatible concrete language syntax."

I don't think you could make a more premature proposal. The DAWG is trying to avoid syntax issues, preferring to get the semantics right first --- a good idea. To be proposing concrete syntax requirements at this stage is surely jumping the gun. I am even more surprised to see a proposal to mandate as poor a prima-facie case as XQuery.

XML has a document centric, single-rooted, hierachial view of its data; and XQuery reflects that. RDF on the other hand is a open-world, non-rooted, cyclic directional graph-based view. It is not immediately obvious how to reconcile these two approaches; neither is it apparent that a syntax designed to query the former, will be appropriate for the latter. This becomes particularly apparent when you look at the block-and-layer diagram in Jeff's proposal pdf. The only link between the XQ Data Model, and the RDF Triples is that both are sometimes serialised as XML --- ok XML tends to be serialised as XML, but lets not quibble, RDF often isn't.

On the other hand, XQuery is itself quite a nice solution to its problem. Particularly nice is its definition of two isomorphic concrete grammars sharing a common semantic model. I would have absolutely not problem with discovering that XQuery can be married with the semantics required by an RDF query language, in fact that would be great. Still I would like to see the DAWG finalise the required semantics first.

I would be concerned if XQuery was even considered desirable at this stage, let alone a requirement.

Wednesday, July 14, 2004

When is an infinite loop not an infinite loop?

When it's body takes enough time that the transaction times out before the stack overflows.

This of course was aggrivated by the fact that everytime I identified precisely when the transaction was marked for rollback, it was iterating over a Tuples. This isn't surprising given we spend alot of our time iterating over Tuples. Not that that occurred to me until after I found the bug. Hence I spent all of yesterday looking inside the TuplesOperations and half a dozen Tuples subclasses.

Yesterday was definately the most fustrating days I've had in some time.

Tuesday, July 13, 2004

UML Editors

Sometimes all you want to do is draw a diagram. I have periodically wanted a simple UML editor to allow me to produce diagrams to help explain the occasionally complicated class and message structures periodically required by the OO paradigm. UML does a good job of providing a vocabulary to support such informal, explainatory diagrams. Unfortunately UML comes from a high-ceremony community that inspired and supports RUP, Rational Rose, and is currently pursuing executable UML and MDA with a vengence. Hence every UML product I have been able to find is either a high-ceremony UML based IDE (ie. ArgoUML, Rational Rose), or an extension to a traditional charting program (ie. Visio, Dia). All of them suck if all you want is to quickly draw a single sequence diagram to enhance some documentation; Visio even manages the worst of both worlds, by trying to play the pseudo-ide game while retaining the lack of structured assistance of your charting programs.

Why mention this? Because I have finally found a UML program that is probably close-enough that I can live with it's short-comings; and shows sufficient promise that I might even be convinced it's worth contributing to.

So I would like to recommend that the next time you want to throw together a quick UML diagram, you check out Violet.

Monday, July 12, 2004

Continuations.

I wrote these a while back, and I make reference to them occasionally. Now I have a blog, I should probably blog them properly.

Continuation Passing Style in Java

Continuation Passing Style in C

One of these days I'll actually get around to setting up my own wiki :).

Worthwhile languages.

There are literally thousands of computer languages, and several paradigms. Here's a few languages I consider worth looking at. Languages I know, use, and admire:

C - Some people describe this as portable assembly. They are right. C feels remarkably close to the ideal minimal abstraction from the concerns of computer architecture. I can't think of a good programmer who dosn't know C well, there is a good reason for this.

Python - Possibly the most pleasant language I know. Also by far the easiest language I know of to teach to absolute beginners. Python has the advantage of supporting both imperative, functional, and object-oriented paradigms. However it's best feature is its truely remarkable community. I am unaware of any community that has such a healthy focus on usability, readability, and statistics-driven optimisation. I don't do much python programming anymore, but I still enjoy reading the community.

Scheme - Take one very powerful language; add very simple syntax; mixin safe syntactic closure and you get scheme. Hence it is widely used in the language research community, where people are understandably more interested in the power and facilities of their language than popularity.

Languages I know, want to use, and admire:

Erlang - One of the nicest languages I've come across. Scalable, safe concurrency, with pattern matching, and a whole gamult of features supporting fault-tolerant computing all the way down to low-level wire-protocol work.

Haskell - This is a language for thinking outside the box. Lazy, pure-functional, static-strong typing, advanced HM type inferencing, monads, arrows, the list goes on. If you have only ever done imperative procedual or object-oriented code (read java, C, C++, smalltalk, python, perl, etc) learn the exceptions to most of the rules you have osmosed.

SML/OCaml; Clean, elegant, safe, and fast! What's not to like about ML? Yes you can pick a single metric on which your language of choice wins against ML, but you'll be hard pressed to find two.

Yes, I recognise the distinct lack of logic and constraint languages in this list so I'll just point at two I have been recommended but haven't found time to investigate sufficiently to add to the list above: Mercury, Mozart, and one that looks very promising Paul Reppy's Moby.

An impressive rant.

I mentioned this amusing rant to Steve last friday night, and promised to provide a link. I'm just going to assume that blogging it counts as fulfilling that promise :). I came across this on the c2 wiki, it is probably the most impressive and entertaining anti-C++ rant I've ever read.

Erik Naggum on C++

C++ is philosophically and cognitively unsound as it forces a violation of all known epistemological processes on the programmer. as a language, it requires you to specify in great detail what you do not know in order to obtain the experience necessary to learn it. C++ has taken premature optimization to the level of divine edict since it _cannot_ be vague in the way the state of the system necessarily is.

C++ is a language strongly optimized for liars and people who go by guesswork and ignorance.

C++ turns otherwise good people into paranoid, insecure prostitutes, and it comes from creating such horrible living environments for themselves and compounded by trying to explain to themselves that "it's OK, really".

Friday, July 09, 2004

Database Abstraction Layers.

Interesting post on lambda regarding database abstraction layers. For quite a while I have been wondering about clean ways to provide persistency in modern languages. Specifically you want to provide:

Persistence
Integrity
Closure

I dicuss these issues later, but for now the link to the existing discussion

Database Abstraction Layers and Programming Languages

Mnemosyne, the best persistence/query/language integration attempt I am personally aware of.

Wednesday, July 07, 2004

A very cute use for egrep.

Local UUG is currently in the midst of a cascade on funky shell tricks; I found this one from Russell Stuart, particularly cool.

tr "[:print:]" "[x*]" < FILE | egrep -vno '^(..+)\1+$'

[H-CHAT] What's your top three shell tricks?

The thread explains what it does, and I would probably replace the .'s in the egrep pattern with x's; still very cute.

It can also provide a good example of when cs theory can occasionally be useful: Given the provided solution uses a gnu-extensions to regex(7), can this be done with a pure POSIX regular expression?