Wednesday, May 27, 2015

Orderly Conduct


In the preceding post, we looked at all the various problems that make initializing all the "global state" of a program particularly tricky.  In essence, packages have to be initialized ("elaborated") before they can be accessed, but doing so may access code or data in other packages, which obviously have to be elaborated first.  The compiler can't do it all for you, so the two of you have to work together to select an order that works.

Luckily, figuring all this out is not altogether difficult (or, at least, doesn't have to be).  Earlier we saw all the reasons why the compiler couldn't do this itself; it basically boiled down to lots of esoteric edge cases that could happen, but which don't often happen.  But since the LRM can't just decide to ignore the inconvenient edge cases, we are all stuck having to deal with the fallout, even for problems we never encounter.

But what if we had some mechanism to indicate to the compiler that our program didn't go in for any of these mutually recursive elaboration time dependency shenanigans?  By promising the compiler that all those complex edge cases can't occur in our code (and, of course, letting the compiler reject our code if we try), all the problems we spent so much time discussing suddenly vanish.

And so the easiest way through the elaboration swamp is around it.  That is, we subject our units to more rigorous and restrictive checks (which are acceptable most of the time), nipping elaboration problems in the bud.  Of course, there still has to be a way to get something correct if you do need to utilize the complicated edge cases, so Ada arms you (get it?) with an arsenal of pragmas to accomplish both.

At the most basic level, every package can be classified into one of two fundamental groups:
  1. Units that don't contain any elaboration time code.
  2. Units that do.
We saw from before that only select things actually cause dynamic, arbitrary code to be executed during elaboration, notably initialization of 'module level' objects via function calls from other units (there are of course others, but we restrict ourselves to objects for simplicity).  If a unit, for example, doesn't contain any module level objects at all, then there can't be any elaboration problems, since it's not actually elaborating any code.

Now if we have two of these units that each have no elaboration order dependencies, it's clear that we can elaborate them in any order we please.  If neither A nor B have global variables that reference each other for their initialization, then I can just flip a coin when it comes time to pick which package goes first, because there's nothing in either that can fail.

Furthermore, if I take a third unit without any elaboration dependencies and add it into the mix, I again can pick any order I want.  A-B-C is just as good as C-B-A, as is any of the other permutations, since none of these units have any elaboration code that could fail.  Following this line of thought, given any 'N' number of units, none of which have any constructs that might cause access before elaboration, I can elaborate them in any order I please.

These types of units are classified in Ada under the surprisingly difficult to pronounce name preelaborable (pre-e-lab-or-a-bull).  Any unit that abides by a few select restrictions (e.g. no global variables initialized from function calls) can be marked as preelaborable, which places it squarely into the first group of units and essentially informs the compiler that the unit has zero elaboration time dependencies.  Consequently, the compiler forces all these units to the front of the elaboration line and effectively 'flips a coin' as to what order they get done within that group; since they can never fail, it can even avoid having to check that they didn't.

As a general rule of thumb, there's rarely a good reason why your unit shouldn't be preelaborable.  The list of restrictions is actually fairly short:
  1. No package body initialization portion
  2. No module-level objects initialized from functions (or by default)
  3. Preelaborable units can only depend on other Preelaborable units.
And that's about it.  There are of course exceptions and caveats to these rules, and you can read about them in the LRM if you feel so inclined, but there's a very good chance that so long as you are following basic programming etiquette (i.e. packages provide types and operations that act on those types), most of your units abide by these rules already.  That being the case, you can mark all your units with the Preelaborate pragma and call it a day, confident that your program can never fail to start.
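As a quick sketch (the names here are invented, but the shape is typical), a preelaborable package ends up looking something like this:

package Geometry is
  pragma Preelaborate;

  type Radians is new Float;

  Pi : constant Radians := 3.14159_26535;  -- static value; no code runs at elaboration

  function To_Degrees (Angle : Radians) return Float;
end Geometry;

package body Geometry is
  function To_Degrees (Angle : Radians) return Float is
  begin
    return Float (Angle) * 180.0 / Float (Pi);
  end To_Degrees;
end Geometry;  -- no initialization section, no module-level objects

So long as everything Geometry depends on is likewise preelaborable, the compiler can slot it anywhere it pleases at the front of the line.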

Before moving onto the next group of units, there are two other pragmas associated with preelaborable units that are worth discussing: Preelaborable_Initialization and Pure.

The first concerns private types.  We said before that one of the rules is that an object can't be "dynamically" initialized (no function calls, no other objects, no default initialization, etc); for example, if some module-level object was a controlled type, then its default initialization (or that of any of its components) would call its Initialize procedure, which puts us right back into the realm of arbitrary elaboration time code.  On the other hand, if a record is just a bunch of integers or other discrete types, then just leaving it uninitialized doesn't cause any harm.

But if our record type is private, we are in trouble; we can't see if that record is a controlled type (or perhaps contains other controlled types), or if it's just full of integers, or for that matter even if it's a null record with nothing.  We can't assume anything about a private type, so we must assume the worst.  We would have to assume that any private type is potentially trouble, and thus prohibit any preelaborable unit from having an uninitialized object of a private type.

But this is aggravating, since much of the time records are just discrete types with no elaboration concerns.  So to help mitigate this, the Preelaborable_Initialization pragma is available to allow a package to specify that the default initialization of a private type does not have elaboration order concerns (this is in contrast to most other elaboration pragmas, which apply to packages as a whole).  With this applied, other units can have uninitialized objects of the private type and still stay preelaborable.
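Roughly, it looks like this (the package, type, and client names are all made up):

package Handles is
  pragma Preelaborate;

  type Handle is private;
  pragma Preelaborable_Initialization (Handle);  -- promise: default-initializing a Handle runs no code

private
  type Handle is record
    Id : Integer := 0;  -- just a discrete component with a static default
  end record;
end Handles;

with Handles;
package Registry is
  pragma Preelaborate;
  Default_Handle : Handles.Handle;  -- legal in a preelaborable unit only because of the pragma above
end Registry;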

The second pragma, Pure, imposes all the same restrictions as Preelaborate, plus several more; call it Preelaborate on steroids.  A pure unit has absolutely no saved state, which is important because it means that the functions it contains are "true" functions (in the mathematical sense); i.e. the same inputs always produce the same outputs.  From an elaboration standpoint there's no difference, but it does allow the compiler to make certain important optimizations it couldn't otherwise.  For instance, if a 'sine' function is declared in a Pure package, the compiler knows that sin(0) will always return the same result, that no side effects can occur, and it can thus cache the result for reuse.  Otherwise, it has to assume that the function might do something nefarious (print to the screen, log to a file, etc), and make the same call every time.
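A pure package is therefore a preelaborable package with even less in it; this sketch (a stand-in, not the real Ada.Numerics unit) is about as much as one can hold:

package Trig is
  pragma Pure;

  function Sin (X : Float) return Float;
  function Cos (X : Float) return Float;
end Trig;

No variables, no state of any kind; given that, the compiler is free to assume Sin (0.0) yields the same value on every call and reuse the result.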

In any case, taken together, these three pragmas create a closed group of preelaborable units, none of which have any elaboration order concerns, and all of which depend only on other units in the group.  The compiler can go through and elaborate them all first in whatever way it wants, and even forgo the check to ensure it's correct.

But what about units that don't meet the criteria of being preelaborable?

What if, like before, we want to initialize an object in the body of 'A' to a value returned by a function in 'B'?  We are not preelaborable, but then again we aren't doing anything particularly egregious.  In this case, our previously hypothesized "tweak" of the LRM, that is, to elaborate the spec directly before its body instead of just at some point before it, would be completely acceptable.  B's spec would have to come before A's body (because of the with clause), but we would also have to do B's body directly after B's spec (for an order of <B> [B] <A> [A]), and all is well.

Ada actually has a pragma that essentially achieves this, but on a package-by-package basis: Elaborate_Body.  When applied to a package, the compiler ensures that the body of that unit is elaborated directly after its spec, without anything else in-between.  Applying this to packages gives you a nice, neat, orderly elaboration of a body right after its spec, so you can be confident that if you 'with' in a package marked Elaborate_Body, its body will be there when your package is elaborated.
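As a sketch (names invented), a package whose body fills in a table at elaboration time would be marked like so, and every client can then count on the table being ready:

package Lookup is
  pragma Elaborate_Body;  -- [Lookup] is elaborated immediately after <Lookup>
  function Square (N : Integer) return Integer;
end Lookup;

package body Lookup is
  Table : array (1 .. 100) of Integer;

  function Square (N : Integer) return Integer is
  begin
    return Table (N);
  end Square;

begin
  for I in Table'Range loop
    Table (I) := I * I;  -- body initialization code; done before any client gets elaborated
  end loop;
end Lookup;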

All this leads to the general "rule of thumb" specified in the Ada 95 Rationale (et al): All packages should be marked as either Pure, Preelaborate, or Elaborate_Body, in that order of preference.

These impose decreasing levels of restrictions on units, but also an increasing chance of elaboration problems.  However, for the vast majority of the time, supposing you don't do anything fancy, these three pragmas will give you what you need.

Which begs one last question: what if we do want to do something "fancy"?

Consider the following code:

package A is
  one : integer := 1;  -- no dependencies
  function X return integer;
end A;

with B;
package body A is

  three : integer := B.two + 1; -- dependency on <B>

  function X return integer is
  begin
    return three;
  end X;

end A;

with A;
package B is
  two: integer := A.one + 1;  -- dependency on <A>
  ...(other stuff to make a body legal)
end B;

package body B is
  four : integer := A.X + 1;
  ... 
end B;

Now we have big fun.  Note the following dependencies:
  • The spec of A depends on nothing
  • The spec of B depends on the spec of A (via 'one')
  • The body of A depends on the spec of B (via 'two')
  • The body of B depends on the spec and body of A (via A.X)
Perhaps most surprising is that this obviously convoluted code is, in fact, legal.  But we have a problem, because there is only a single elaboration order that will work:

<A> - <B> - [A] - [B]

That is, we need 'one' to exist so we can create 'two', which has to exist so we can make 'three', which has to exist so we can return it from A.X to create 'four'.

But none of our rules of thumb work for this.  We are clearly not preelaborable, but we also can't elaborate the bodies directly after their specs!  Now we have the dreaded edge case: multiple legal elaboration orders, not all of which are correct, and we must pick out the right one by hand.  To do so, we have two more pragmas:

Elaborate
Elaborate_All

Unlike the previous pragmas, which the programmer applied to the package he was creating, these pragmas are put amongst the with statements to apply to units he's referencing.  They ensure that the unit called out in the pragma is elaborated before the current unit, such as:

with A;
pragma Elaborate(A);
package body B is....

This instructs the compiler that it must select an order in which the body of A is elaborated before the current unit (B).  Given that small addition, the compiler now has the additional requirement that [A] must come before [B], which along with the original rules gives us a legal (albeit strange) program. (Note that in the above example, you would have to add another seemingly redundant with clause to [B]).
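Spelled out against the example above, the body of B would end up looking something like this:

with A;                  -- the "redundant" with clause; the pragma needs A named here
pragma Elaborate (A);    -- [A] must be elaborated before this unit

package body B is
  four : integer := A.X + 1;  -- safe: 'three' is guaranteed to exist by now
  ...
end B;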

But from a practical standpoint, this is tougher than it looks.  Sure, we can go through and add Elaborate pragmas everywhere, but most real code is far more complex and contains many more units.  What if, in the above, A.X called out to other units doing other things, which themselves called other things, and so on?  This has massive scalability problems.

We aim to put software together from reusable components, so often we can't change package A (i.e. it's a COTS library).  Plus, this violates our inherent sense of encapsulation, because we shouldn't need to peek into the body of A and start mucking with things based on what we see.  But there's not much we can do, since the person writing the body for A can't possibly know that sometime in the future, some errant unit B was going to call its function at elaboration time, instead of at run time.  And what if A calls a procedure in C that calls something from D?  Must we open up every single unit in the entire call tree?

For these reasons, Ada95 added the "Elaborate_All" pragma, which is essentially a recursive form of Elaborate; instead of just elaborating the body of the unit you specify, it elaborates that unit and all the bodies of units on which it depends, all the way down.  Now the package you depend on is a true "black box", and you can be assured that your single pragma in the client will make the entire subsystem available (in most cases, Elaborate_All is a better choice than Elaborate, which is for the most part obsolete).
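Usage is identical to Elaborate; only the guarantee is stronger.  In this sketch, Messaging and its Open_Channel function are stand-ins for whatever (possibly COTS) subsystem is being called at elaboration time:

with Messaging;
pragma Elaborate_All (Messaging);  -- elaborate Messaging's body, and every body it depends on, first

package body Autopilot is
  Channel : Integer := Messaging.Open_Channel;  -- elaboration-time call into the subsystem
  ...
end Autopilot;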

Though just because Ada has this ability doesn't necessarily mean you should utilize it.  Of course there are situations where this is desired, if not necessary, but for the most part, adding Elaborate_All pragmas is a sign of bad design.  Avoid the problem altogether and redesign the code such that your packages are Preelaborate (or at least Elaborate_Body).

But at the same time, don't just rely on GNAT's static checking crutch.  Take elaboration into account as you write the code, not simply because it's the right thing to do or because your code will get better, but because most of the time you won't find the problems until it's far too late.  Do you really want to go back and add 500 pragmas to 500 packages since it didn't occur to Joe the C Programmer that this would ever be a problem?  And before you let Joe the C Programmer slander Ada for requiring such pedantic verbosity in the first place, go and Google "static initialization order fiasco"; C++ has the same problem as Ada, except it's so problematic and unsolvable that it's even got its own cute name containing the word fiasco!

So go forth, enlightened one, and banish elaboration circularities and access-before-elaboration exceptions back from whence they came!  Every time your program starts, a slight smile should creep to your lips, since now you know it's no accident how it happened.

Sunday, May 24, 2015

Elaboration Station

One of the great niceties of programming is that, fundamentally, programs execute one statement at a time, plodding along in a nice, neat, well-defined order.  It begins at the first instruction and ends at the last, and nothing out of the ordinary (threads and interrupts notwithstanding) messes that up.

Well, almost nothing.  Programs have but a single entry point, but most languages, Ada included, have the ability to declare objects at a scope just above the main subprogram.  We normally call these global variables, or library-level objects, or module level objects, or by whatever other name your particular flavor of language calls them.  But by any name, these objects get created before the main program starts, and live (just) past where the program ends.

(Note: I will talk exclusively about initializing 'global' variables in regular units, regardless of whether they are in the spec or a body.  As always, this is an over-simplification, and ignores things like child/parent packages, and other elaboration time code like library level tasks or packages containing initialization code.  However, the rules are essentially the same).

This opens up an interesting loophole in the "start at the beginning" philosophy, because this means there are actually TWO beginnings to any program.  The program has to go through and initialize all these global variables to whatever their initial values are supposed to be, before the main function begins in earnest, so that when it comes time for the 'real' beginning, everything is as it ought to be.

But in any modestly complex language, this is a much more subtle and difficult problem than it may at first appear.  Usually we just initialize constants to values, such as "pi : constant Float := 3.14", and there are no particular concerns, or for that matter anything the program actually has to do at runtime (the compiler can normally handle this sort of static object itself, as a hardcoded value in the executable).  But, as always, general purpose languages have general purpose problems.

We have to consider the unpleasant fact that these global objects can also be initialized from other expressions apart from numeric literals; particularly, we might initialize them from other objects in other packages, or from the results of function calls (among others). For example:

package X is
  R : constant Float := P1.Calculate_Radius;
  C : constant Float := 2*P2.pi*R;
end X;

Here, we initialize R from the result of a function in another package (P1), and then initialize C via a calculation using both the previous object R and our declaration of pi in a third package (P2).  We can quickly see this start to spiral out of control, because what can be said about the Calculate_Radius subprogram?  It might execute any arbitrary code at all, perhaps prompting the user for a value, or communicating with a satellite to retrieve a value from the Mars rover.  In general, we might execute any code at all.

The issue at hand is that we've got all these packages with all these objects that need initialization, and no good way to figure out the order in which to do it.  For instance, in the above example, we need to ensure that P2 (with the 'pi' constant) is initialized before we try and use it to calculate C (or else the program is nonsense).  Of course, if the calculation of pi depends on some other unit, we have to do that unit before P2, before X, and so on and so on.

To help untangle this mess of "elaboration order", Ada has basically two rules governing the order in which the compiler should go through and initialize all the units:

  1. A package spec must be initialized before its corresponding package body.
  2. The package spec of any 'withed' unit must be initialized before the unit that 'withs' them.

But the perhaps unexpected thing about these rules is not just that you end up with several different potential elaboration orders, but that not all of them are necessarily correct.  To account for this, Ada also mandates that if the order the compiler selects is wrong, an exception should be raised when you start the program and stop things dead in their tracks.

This raises two interesting questions.  First, why don't we have rules that make sure the order is correct in the first place, and second, why do we get a run-time exception instead of a compile-time failure?  After all, at first blush, these rules seem pretty good.  A global variable can't be initialized from anywhere except either something in its spec, which is guaranteed to be there by rule #1, or another unit that is 'withed in', which is handled by rule #2.  So what gives?

If you go back and re-read rule #1 like a language lawyer, you might note a conspicuous omission.  It requires that a unit's spec must be initialized before its body, but not immediately before, only at some point before.  The compiler is allowed to try and elaborate any number of other units between a spec and its body, and this creates interesting loopholes.

For instance, suppose a global variable in the body of A is initialized to the result of a function declared in B.  Ada's built in rules require the following:
  1. <A> is elaborated before [A] (rule 1)
  2. <B> is elaborated before [B] (rule 1)
  3. <B> is elaborated before [A] (rule 2)
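For concreteness, the situation is roughly this (the function and its value are arbitrary):

package B is
  function Two return Integer;
end B;

package body B is
  function Two return Integer is
  begin
    return 2;
  end Two;
end B;

package A is
  procedure Touch;  -- only here so that A is required to have a body
end A;

with B;
package body A is
  three : Integer := B.Two;  -- elaborating [A] executes code that lives in [B]

  procedure Touch is
  begin
    null;
  end Touch;
end A;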
 
Note that both the following orders are legal in accordance with our rules:
  • <B> - <A> - [B] - [A]
  • <A> - <B> - [A] - [B]
but the latter one would result in the dreaded "access before elaboration".  We initialize the body of A, including the global variable, and that executes code from [B], which has NOT been initialized yet, leaving us with a flawed program and undefined values.  In the first order, however, [A] is not initialized until after [B], and everything is acceptable.

Why then can't we just tweak the LRM to specify that a body should be elaborated DIRECTLY after its spec?  It would appear that such a modification would be a good thing, since it would narrow down our choices to just a single, correct order:
  • <B> - [B] - <A> - [A]
The problem is that, like we said before, general-purpose languages have general-purpose problems.  Such a modification would be far too restrictive, since it would impose a strict hierarchical structure for a program.  We could not, for instance, have two mutually recursive functions in different packages, or have finite state machines, or many other interesting and fun software solutions that involve code in two different packages calling each other at elaboration time.

If we were to add such a 'direct' elaboration rule, these case examples would become illegal, despite not having any elaboration problems.  Consequently, we have to leave some bit of ambiguity to the order so that legal albeit niche programs are still possible.

But while this answers the question of why the LRM cannot specify rules that work for every program in every situation, it doesn't really answer why they can't put the onus on the compiler to find an order that works on a program-by-program basis.  The compiler has to analyze the source code, right?  Why can't we program it to be smart enough to check for elaboration dependencies as it goes, to narrow down the potential list to one that's correct?

For example, let's take the same example from above.  The compiler identifies the global variable in the body of [A], identifies it as being initialized from a call to <B>, and can in theory prune down the list of potential elaboration orders so that only those with [B] ahead of [A] are acceptable.  If we have some sort of untenable situation, such as a global variable in both [A] and [B] that are initialized from functions in <B> and <A> respectively, then we can spit out an error indicating that no correct order is ever possible, and thus prevent bad programs from ever being run at all.

Again, at first glance this seems like an acceptable rule and in fact early drafts of Ada actually did require these sorts of checks.  But at the risk of repeating myself, general purpose languages have general purpose problems.  Don't forget, *we can run any arbitrary code* during elaboration, and nothing says that code has to work the same way every time.  It can have any manner of if/else blocks, which means the behavior (and success or failure) of the initialization can be drastically different.

Suppose we write the following function, which perhaps questionably puts the code for gladly paying people Tuesday for hamburgers today into the calculation of how many hamburger liabilities we currently have:

function Hamburger_Debt return Integer is
begin
  if Day_Of_Week = Tuesday then
     Hamburger_Database.Gladly_Pay_All_Lendors;
     return 0;
  else
     return Current_Liabilities.Hamburgers;
  end if;
end Hamburger_Debt;
 
For better or worse, this code is legal.  But if somewhere in another unit we have a global constant (perhaps "Starting_Hamburger_Debt") initialized to the return value of this function, this function gets executed during elaboration time, and the real fun begins.

Now we are in real trouble, because the elaboration now depends on what day of the week it is when we start the program.  The compiler needs to elaborate the Hamburger_Database prior to whatever unit our function is in, but only on Tuesdays.  On any other day of the week, we don't touch the hamburger database and have no care as to when it's elaborated, but in that case we need to ensure the Current_Liabilities package is elaborated prior.  The required elaboration order changes based on the date, which we obviously can't know until we run the program!

Alright, perhaps you argue the compiler should take a "worst case" approach, and say that any unit referenced at all, regardless of any control transfers, should be considered as a potential elaboration issue, and we should sort things out like that.  But then what do we do with something like this:

function F return Integer is
begin
  if false then
    return P2.X;
  else
    return 0;
  end if;
end F;
 
Obviously this is dead code and no elaboration order problem can possibly exist (in fact, this entire subprogram can be optimized away).  But the worst-case approach we just proposed doesn't look at control flow at all, so the compiler would see the reference to P2 and flag it anyway.  Should this code, statically known to be free of any elaboration order problems, potentially fail to compile because of a non-existent elaboration problem?

So we can see that trying to statically prove something that's inherently dynamic is a slippery slope.  The 'correctness' of a particular elaboration order can only be determined by somehow knowing the outcome of every if statement, which even in the best case means exhaustively testing all the possible combinations of states, and even then dynamically choosing how to elaborate packages as decisions are made.  Clearly this is hard, not just in the algorithmic sense but certainly in the programming sense, and it's unlikely any compiler could ever hope to implement it (and still have programs compile before the sun burns out).  We must either be overly strict and reject legal code (as is the case with accessibility), or be dynamic and account for the possibility of runtime failures (as with elaboration).

And so, at the end of the day, the compiler has no other recourse than to simply dump the responsibility of elaboration order on the programmer, but then double check at runtime that the order was sufficient (e.g. tracking the initialization state of each package, and verifying it has been set prior to access).  Various pragmas exist to let the programmer try to solve their own unsolvable problems, which perhaps we'll see in a future post.
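Conceptually (and only conceptually; the real bookkeeping is generated by the binder and runtime), the check amounts to a boolean flag per package, set when its body finishes elaborating and tested before any elaboration-time call into it:

package Elab_Flags is
  B_Elaborated : Boolean := False;  -- one such flag per package body
end Elab_Flags;

with Elab_Flags;
function Checked_Two return Integer is
begin
  if not Elab_Flags.B_Elaborated then
    raise Program_Error;  -- reported as "access before elaboration"
  end if;
  return 2;  -- stand-in for the real work of B.Two
end Checked_Two;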

But wait, there's more: GNAT stirs up the already murky water by deviating from the LRM-specified behavior and actually attempting to statically prove correctness, at the expense of being overly pessimistic.  The default behavior of the binder is to take the 'worst case' approach detailed above, and analyze the code for constructs that cause elaboration time dependencies (e.g. global variables).  Should any be found, it takes up the slack for the programmer and ensures the "right" order is used.  But like we said, this means GNAT is "overly strict", and many valid Ada programs fail to compile in GNAT due to elaboration circularities.  I suppose they feel the trade-off is one worth making (and, of course, switches exist to enforce the less-strict Ada rules).

Like most non-standard GNAT behavior, this is both a blessing and a curse.  On the plus side, it means that the program can be proven, statically, to not fail the elaboration check, which means the check is obviously unnecessary.  No chance of failure and faster startup times are a good thing.

But at the same time, it also means your program is not necessarily portable between compilers, or for that matter even between versions of GNAT (their static checker gets better each time around, so dubious situations marked 'safe' in one version might suddenly no longer compile).  More importantly, though, it makes us all dumber.  Many programmers go their entire careers without ever realizing elaboration is a thing or considering how it might fail, let alone inserting the appropriate pragmas, and in some (rare) cases they actually make a concerted effort to take them out (true story).
      
Moreover, apart from being good at your job just for the sake of it, worrying about elaboration dependencies makes you code better.  It's not a coincidence that most of my outlier problem-causing examples seem convoluted; code is cleaner and better when these sorts of dependencies don't exist in the first place (Text_IO, I'm looking in your direction!).  Having to deal with them is the best incentive to restructure your code so that you don't have to deal with them, and GNAT is making us all lazy by cleaning up our mess.

So that's the problem; but what's the solution? To be continued...

Saturday, December 14, 2013

Ada 2005 access types, part I

If you've read the previous series of articles, then you are up to date with how Ada95 deals with so-called "general" access types.  You've got your accessibility checks that fence a local access value inside its type's scope, and you've got your anonymous access values that let the access value out on a leash.  Inside the fence, you can run free and do whatever you want, except you can't get out.  If you want to go out, you can only do so on a leash, in which case you are extremely restricted in your movement.  Put together, you've got a nice neat "either/or" system that ensures your pointers never get lost or hit by a car.

Fast forward a decade to Ada 2005.  One of the major updates in the language, warranting an entire chapter in the rationale, was the expansion of the anonymous access type.  Unfortunately, history has not exactly been kind to this 'feature', and many (most?) programmers treat them with disdain.

But the Ada05 AAT's suffer much of the same fate as Ada95 AAT's: poor documentation and worse understanding.  Just like Ada95, Ada05 anonymous access types are special-purpose features for special-purpose jobs, and yet they are not described this way.  Ada05 actually suffers worse, since there are no fewer than three new special kinds of access values, all serving different purposes, all of which are different than the Ada95 versions, and all of which are lumped together as a single topic.  Mix in the fact that they all have identical syntax, and you've got a lot of confused programmers.

And that really is the downfall of Ada's AAT's: they all have the same syntax, but all serve completely orthogonal purposes.  And while this might be elegant and simple from a language design standpoint, having the same thing mean different things in different contexts is a recipe for disaster.  But I digress. 

Not counting the "null exclusions", which are straightforward and won't be discussed, Ada05 introduced three new kinds of anonymous access types.  Note that these are the names that I've given them, and not necessarily what the ARG says about them:
  1. The anonymous access component.
  2. The anonymous function return.
  3. The anonymous access to subprogram parameter.
Again, to the layperson, all three seem identical to both each other and the Ada95 access parameter and access discriminant.  Each involves using an "in line" access type declaration instead of a named access type.  But don't be fooled: each one is there for a specific reason, each of which will be discussed in upcoming posts.

First up: the anonymous access component.  The brochure version of this feature is that now you can use anonymous access types in record components, array elements, and stand-alone objects.  But why?  To understand why this feature was added (and when, exactly, you should consider using it), like the cast-able access parameters before them, we have to once again talk about object oriented programming.

For those who deal in the domain of OOP, a fundamental tenet is that child types can be freely substituted for parent types.  This is implemented in Ada via the 'classwide' type, which represents not only the type in question, but any of the types that derive from it.  For example, if C extends P, then any code that requires a P'Class can freely use a C or a P, instead of just a P.  This is the quintessential "is-a" relationship.
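In Ada terms, the substitution looks like this (P and C are throwaway names):

package Creatures is
  type P is tagged null record;
  type C is new P with null record;

  procedure Process (Item : P'Class);  -- anything in P's class is welcome
end Creatures;

package body Creatures is
  procedure Process (Item : P'Class) is
  begin
    null;  -- whatever processing; primitives called on Item would dispatch
  end Process;
end Creatures;

with Creatures;
procedure Demo is
  Child : Creatures.C;
begin
  Creatures.Process (Child);  -- fine: a C "is a" P
end Demo;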

This setup works well, except for one issue: the same cannot be said of pointers.  For example, let's suppose a serial killer with a penchant for software engineering is interested in modeling his victims.  This is neatly decomposed (ha!) into a class hierarchy:

type Victim is interface;
type Victim_Ptr is access all Victim'Class;
procedure Dismember (This : Victim) is abstract;  -- primitives of an interface must be abstract (or null)

type Hooker is new Victim with (...)
type Hooker_Ptr is access all Hooker;

type Vagrant is new Victim with (...)
type Vagrant_Ptr is access all Vagrant;

And, like any good classwide programmer knows, we can freely use Vagrants or Hookers as a parameter or object of Victim'Class.  Bully abstraction!

But the type conversion rules for pointers are not so easily duped; there are no "classwide pointers" (though perhaps life would be different if there were!).  Consider the following code:

Dead_Body : aliased Hooker := ...
p1 : Hooker_Ptr := Dead_Body'Access;
p2 : Victim_Ptr := Dead_Body'Access;
p3 : Victim_Ptr := p1;
p4 : Victim_Ptr := p2;
p5 : Victim_Ptr := Victim_Ptr(p1);

Quiz time: which of these statements are legal, and which aren't?  Anyone who is well-versed in the Ada95 access type conversion rules (<cricket, cricket>) will be able to identify that, oddly, it is p3 that will fail the conversion.  This is anathema to an OOP programmer, since quite obviously they all point to the same damn object.

Note that both p1 and p2 are initialized via the access attribute, which is legal since the compiler knows that our Dead_Body is both a Hooker and a Victim'Class.  p4 is legal, since both types are Victim_Ptrs, but p3 fails since, technically speaking, the types don't match, which the rules require.  We can always force the issue, as in p5, but people generally agree that casting is bad.

This is unfortunate, because using a "pointer to a hooker" as a "pointer to a Victim'Class" is no more unsafe or illegal than using a "hooker" as a "Victim'Class", which is the underpinning of all OOP.  Sure we can cast it, but casting is a red flag for a conversion that might fail, and in cases like this, it is perfectly safe.

The pessimists in the crowd will say that proper typing in the first place will solve this conundrum.  That is, figure out whether we need a 'victim' or a 'hooker', and use the appropriate type.  But where this might work in simple situations, this isn't possible in general.  Let's suppose our serial killer expands his system like so:

type Burial_Record is
   record
      The_Body : Victim_Ptr;   -- "Body" by itself is a reserved word in Ada
      Location : Lat_Long;
      Date_Interred : Date;
   end record;

procedure Bury_Body (The_Body : Burial_Record);

type Hooker_Class_Ptr is access all Hooker'Class;
type Hooker_Array is array (Positive range <>) of Hooker_Class_Ptr;

procedure Violate_Bodies (x : Hooker_Array);

We want to track all our victims, presumably in some sort of set or container, so that we might disinter them later as needed.  Similarly, we might also want to do strange, awful things with the dead bodies of the hookers. 

But now we are in real trouble.  Because we want them nested inside records or arrays but the items are indefinite, we need to use pointers.  And in Ada95, that means named pointers.  But now we've painted ourselves into a corner, since either way we choose, we will have an unnecessary cast forced upon us.  If we allocate a dead hooker as a Hooker_Class_Ptr, we will be able to violate it, but not bury it.  Similarly, if we allocate it as a Victim_Ptr, we can bury it but not violate it (and rightly so; a classwide victim might be a vagrant, and only a twisted psychopath would violate a dead vagrant).

So then, why not just change the rules to say implicit conversion is okay?  Alas, such changes, after the fact, are not quite so simple.  The hiccup here is that incompatible types are what make overloading work.  For instance, suppose we have something like this in Ada95:

function F return Victim_Ptr;
function F return Hooker_Ptr;
p : Victim_Ptr := F;

Since these types are incompatible, the compiler knows that we really want to call the first version of F.  But if we suddenly make Hookers implicitly convertible to classwide Victims, this code is now ambiguous because either call is valid, and backwards compatibility is broken.  So another way must be found.

If you remember the discussion of access parameters and access discriminants, the trick behind them was that we could substitute any access type of varying accessibility level for the actual value, and the lack of a name meant that we couldn't copy it.  The expectation was that all these different types would be necessary in order to abide by the accessibility rules; otherwise, you could just use named subtypes to achieve the same effect.

But looking at things in a more general sense, nothing says the accessibility levels have to be different.  We could have two (incompatible) integer pointer types, both declared at the library level (with identical accessibility), and an access parameter would work equally well with either.  Again, there wouldn't be much point, since we could just as easily (and perhaps more readably) define them as subtypes and use named values, since accessibility is not coming into play.

And this is exactly the ability we are looking for.  For instance, some hypothetical function that, instead of taking a named Victim_Ptr, accepts an access parameter to a Victim'Class does almost exactly what we want.  We can pass in Hooker_Ptrs, Vagrant_Ptrs, Victim_Ptrs, and any other type we can come up with.  The downside is that we can't copy it, which is of course what the whole system was set up to do.

And so in Ada05, the ability to use anonymous access values was added to record components, array elements, and stand-alone objects, but with one crucial difference: nothing is stopping you from copying them.  If you recall the Ada95 anonymous access types, the rules were carefully set up so that once you converted a named type to an anonymous type, you were barred from ever copying them (which was, after all, the point).  Ada05 anonymous access types, on the other hand, are designed to let you get the implicit conversion, but continue to copy them.

Let's revisit the example from above and see how AAT's help the problem.  Instead of using named pointers in our record and array, let's use anonymous types:

type Burial_Record is
   record
      The_Body : access Victim'Class;  -- AAT
      Location : Lat_Long;
      Date_Interred : Date;
   end record;

procedure Bury_Body (The_Body : Burial_Record);

type Hooker_Array is array (Positive range <>) of access Hooker'Class; -- AAT

procedure Violate_Bodies (x : Hooker_Array);

It's mostly the same, except we've removed the named types and replaced them with 'inline' declarations, similar to access parameters or discriminants.  But this changes everything!  If we allocate a Hooker, we can put the resulting pointer into either the array or the structure, without having to cast or convert anything.  However, all engineering is a trade-off, and we still lose the ability to convert it back to its original type after making it anonymous (which means that within the two procedures, the values are only compatible with other AAT's).

But there is a key distinction between Ada 95 and Ada 2005.  In Ada95, the AAT rules were designed to allow you to work with shorter-lived objects, i.e. those where the accessibility level was deeper.  But the rules for Ada05 AAT's are not only different, it's practically the reverse: the type level of anonymous access values is defined to be the same level as where it's declared.

This means, just like named types, the object has to live at least as long as the type, or else you will fail the accessibility check.  Presuming our types and functions from above are at the library level, you will never be able to pass a local object to either, only other library level objects (or allocations).  But within those functions, you can copy them, put them into sets, what have you.  This is in stark contrast to access discriminants and parameters, which allow local objects, but prevent copying.
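A quick sketch of the consequence, reusing the declarations above (Dig_Grave and the local names are invented):

procedure Dig_Grave is
  Fresh_Corpse : aliased Hooker;
  Paperwork    : Burial_Record;
begin
  Paperwork.The_Body := Fresh_Corpse'Access;  -- rejected at compile time: local object, deeper level
  Paperwork.The_Body := new Hooker;           -- fine: the allocated object lives at the library level
  Bury_Body (Paperwork);
end Dig_Grave;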

The great irony in this is that whereas access parameters and discriminants are there to help you work with shorter-lived objects, access components and elements have rules to prevent you from using them with local types.  To a layperson, an access discriminant and access component look exactly alike, yet are fundamentally different, and there for different purposes.  All this adds to the 'expert-friendliness' of the language, to both the expert's and layperson's disdain.

So before you blindly start using anonymous access types everywhere to save yourself the trouble of using a named type, ask yourself: do I really need one?  If you are not trying to resolve the conundrum of having to needlessly convert access values between their concrete type and their classwide parent's type, then the answer is probably no, and you probably need to be using a proper named type.

But at the same time, don't be one of those programmers who summarily decides that anonymous access types are evil.  Like everything in Ada, they are there to solve a problem; just because they don't solve your problem doesn't make them bad.  Properly used, there are situations where only an anonymous access type will do.  But like any other tool, using them for jobs they are not designed for is likely to break the tool and the project.  Except for stabbing hookers with screwdrivers: that works every time.

Sunday, December 1, 2013

License to Program

First, the disclaimer: I am not a lawyer, nor am I giving legal advice.  Consult an actual lawyer before distributing your program.  Secondly, all this is about life in these United States, so international visitors should refer to their own laws and customs.

Software is a funny thing, in that you almost never actually buy it.  Sure, you might go to Best Buy and plunk down some of your hard-earned cash for a disc full of code (or, if you are a hip cool kid, download it from Steam), but ironically, exchanging money for goods doesn't necessarily mean you are buying anything.

Most of the software in the world today is licensed.  Some of it comes with expensive, restrictive licenses, and some of it comes with inexpensive, permissive licenses.  More to the point, almost all of these licenses are different, and create fun and exciting conundrums when you try and mix and match them.  The bottom line is that almost nobody has any idea what they can or can't do with a piece of software, and programmers have it the worst.

So to clear all this up, we have to fire up the wayback machine to 1787, when the U.S. constitution gave congress "the power to promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries", which we know today as copyright.  Informally, if you create something new and original, the government says that you get exclusive rights to decide who gets a copy of it and when.

But one of the things people don't often realize is that copyright gives just as much protection to the person receiving the copy as to the author himself.  It's not just that the author gets to decide who does and doesn't receive a copy; after someone does get a copy, what the receiver does with that copy is his business.

Let's say I write a book.  The law says nobody can copy that book except me, so assuming I want to make some money, I make five copies of it.  This is legal, since I own the copyright.  Now, I sell (in a legal sense) one of those copies to you.  You are of course prohibited from making copies of it, which is the whole point, but there is a whole world of possibilities of non-copy things you can do, and I as the author can't stop you.  You can cross out every word containing a 'y', you can use it to prop up a wobbly table leg, you can rip out every page and sew a dress from them, you can give it to your mother for a Christmas gift, you can whack your mouthy wife over the back of the head with it, or you can burn it in the center of town to express your outrage over my egregious use of four letter words.  Just so long as you don't make a copy, well, that book is yours to do with as you please.  I, as the author, can't step in and stop you.

And this is how copyright worked for a few hundred years.  Admittedly though, a lot of the copyright enforcement took place simply because, well, things were just inherently difficult to copy.  Copying a book in the 1700's involved copying each word with a quill pen, a task so arduous that it was easier to just buy the book yourself.  You could always go to the trouble of buying a printing press, but the investment and typesetting costs were so high that you would have to start selling your bootleg copies on a large scale to recoup your expenses.  And once you start your high-profile book bootlegging operation, it's a lot easier for The Man to hunt you down and throw your ass in jail.  And so copyright naturally just enforced itself, whether it was books, or records, or paintings.

And then one word changed everything: tape.  In the late 1970's and early 80's, the cassette tape market began to take shape, and suddenly the copyright landscape became very murky.  Remember how once you buy a (legal) copy of something, that copy was yours to do with what you please?  That includes me buying a book, and then lending it to you to read, and then you returning it to me when you're done.  After all, no copy has been made, and so no crime has been committed.  By extension, I am also allowed to lend you my copy for a fee, a process we all know as renting.  Renting was perfectly legal under the copyright law, since when you have my copy, I don't, and once you give it back, you don't have it anymore.  Nice and neat.

But this simple idea became a much scarier one for the music industry once tape was introduced.  Now, I could legally purchase an album, and then legally lend it to you for a small fee; you could illegally make a copy of it simply by pressing a button and return it, leaving me free to legally rent it out to the next customer.  The cops can't hassle me, because I haven't broken any laws, and the cops can't hassle you, because they have no reason to suspect you so long as you stay in the privacy of your home, don't gloat about your crimes all around town, and don't start selling large quantities of bootlegs.  I make a bunch of money, you save a bunch of money, and the music industry takes it on the chin.  There were, in fact, actual businesses that would "rent" you a cassette album, a dual-deck tape player, and a blank tape all at the same time (with, of course, a wink and a nudge), and under the copyright law, everything was above board.

Obviously, music industry fat-cats were not going to stand for this, and so in 1984 congress passed the Record Rental Amendment of 1984.  In essence, this simply changed the copyright law to prohibit outright the renting of sound recordings.  You could still sell them, you could still give them away, and you could still lend them to friends for free, but you couldn't charge money for it.  And this worked well enough, since the aforementioned "music rental stores" became illegal, and the fat-cat lawyers could sue them into oblivion.

Of course, 1984 was right around the same time a brand new, copyrightable medium was about to explode on the world.  And this type of work was not just easy to copy; it was required to be copied to be used.  After all, I have to copy those five floppy disks of data to my hard drive in order to play DOOM, which is technically copyright infringement.  But in classic congressional lack-of-any-forward-thinking, this newfangled "software" was completely ignored, and all the same problems that cassettes had were going to be repeated for software.

However, software lawyers were certainly well aware of the trials and tribulations of the music industry lawyers, and knew that without some sort of protective measure, nothing was going to stop people from renting out software (or even just lending it for free) without ensuring it was actually uninstalled.  After all, once DOOM is installed on my hard drive, the disks just sit in the box, and it's much more fun to lend them to a friend so we can play over the network.  So to avoid the same problem of "software rental stores" lending out the same legally purchased copy of a program over and over, software lawyers resorted to the one thing that trumps copyright law: contract law.

Suppose, like before, that I write a book.  But now suppose I don't just want to sell five copies, I want to sell five million copies.  This means a printing press, operators to run the printing press, entire forests worth of paper, 55-gallon drums of ink, expensive typesetting, book binding, covers, boxes to put them in, trucks to ship the boxes, drivers to run the trucks, and even after all that, I still have to find people and stores to buy the damn things.  This is an expensive proposition, especially when all I really want to do is write books and cash the checks.

So most industries have a sort of middle-man known as a publisher or distributor.  I do the creative work of writing the book, and they do the heavy lifting of printing them, moving them, and selling them.  But right off the bat, we see that this setup is technically illegal.  As the author, I have exclusive right to make copies of that book, which means that the publisher can't, even if I want to let him.

Obviously, this is not the intent.  I want to be able to give the publisher conditional permission to make copies, so long as certain conditions are met.  For instance, I let them make as many copies as they want with the condition that they give me half the money they make from selling them, and they agree to that so long as I promise to write them three more books in the next five years, or what-have-you.  Here, the publisher and I enter into a contract that we both agree to, and so long as the contract is above-board and negotiated in good faith, I can "give up" my copyright protection to allow someone else to make copies.  The key is that I haven't sold them a copy of my book, I have conditionally allowed them to use the original work.  Other people still can't make copies of it, and the publisher can make copies only so long as they honor the original contract.

In any case, some software lawyer who I'm now sure lives in a much larger house than I do, had a wonderful, awful idea: let's use the same loophole between the publisher and the end user!  Now, instead of the consumer buying a copy of the software, they are simply conditionally allowed to use the software.  Now, all bets are off: any "condition" to which the user agrees is nice and legal, no matter what the copyright laws say.  After all, if they don't like the conditions, well, they don't have to buy the software.  And so was born the nefarious end-user license agreement.

And for a while, this was a mostly benign way to prevent legalized software piracy.  Most EULA's basically said nothing more than "you can use this software however you want, except rent it out or give copies away".  If you decided to start a software rental shop, you were breaking the "contract" that you implicitly signed by ripping open the package, and the lawyers had legal standing to sue your ass.  But as should have been expected, things quickly spiraled way out of control.  Lawyers realized that nobody was actually even reading the damn things, let alone weighing whether or not the license was too restrictive, and bit by bit, these licenses began to strip away more and more of the users' rights.  Most users, of course, didn't care.

But at the same time that copyright was failing programmers who wished to make money, it was also failing programmers who had no desire to profit at all.  Suppose for a moment that I write a neat little simulation library for some academic paper I am writing, and I want to share it with all my other scientist buddies to verify my results and spur future work.  Copyright gets right in the way, because while I as the author can give copies away (for free) to whomever I want, those on the receiving end can't distribute it further, despite me being alright with it.

My only option in this case is to release my code into the public domain, which means I give up all claims to ownership in any way.  This serves my purpose, except not everyone is as honorable as my scientist buddies; once in the public domain, nothing is stopping evil software corporation X from getting hold of a copy, and then turning around and selling it as part of their product (under a restrictive license, no less).

So like fighting fire with fire, people began to use the same EULA trick as the evil software corporations, but with a twist: instead of restricting your rights, it actually broadened them.  You could make as many copies as you wanted, with the condition that you distribute them with the same license.  The recursive nature of this style of license, in essence, ensured that your free software stayed free software.

And this is pretty much where we are today.  Software you "buy" comes with a longwinded EULA that is normally chock-full of fun surprises, like only being able to legally use your OEM copy of Windows on the computer it came with, not being able to do performance tests and publish the results, promising the company your first born son, and so on.  On the other hand, "free" software licenses let you use the software however you want, with the condition that your redistributions remain free, even if part (especially if part) of a larger software program.

What's confusing about both types of licenses is that your rights are a quality of the license, and not inherent to the software itself.  Suppose I write some software, and I license it to both Joe and Bob.  In the license Joe and I agree to, I charge him $100 and make him promise not to give it to anyone else, but in the license for Bob I charge him nothing and let him distribute as many copies as he pleases.  Now imagine Steve comes along; it's illegal for Joe to give him a copy, and I can sue him into next week if he tries, yet Bob can give him a copy for free, and it's the same damn software.  Just because person 'A' got it for free doesn't mean he can give it to you for free.  Moreover, just because person 'A' paid a high price for the software doesn't necessarily mean he is restricted from distributing it to a friend.  You can do anything you want as long as the license says it's okay.

All this is fine for a stand-alone program, but things start to get incredibly hazy when you start talking about software components.  Remember, software is normally built out of other software, which means an inherent part of purchasing it is to make copies and sell it.  This is, of course, exactly what copyright is trying to prevent.

Let's say I write a library of functions that can search strings in constant time, and I want to sell it and make a shitzillion dollars.  On the one hand, I obviously want copyright to apply, since I don't want customers making bootleg copies and selling them to friends.  But on the other hand, the idea is that the customer will use that code within the context of a larger program, and sell it to friends.  Copying and selling my work is both prohibited and required at the same time!

Now the license needs some sort of complex language like "It's okay to copy this code and distribute it so long as it's in binary form and within some other program in binary form, but not otherwise".  Unless, of course, we are using dynamic linking, which only makes matters worse.  The programmer I sell my DLL to has to be free to distribute it as part of his program (or else it won't work), but once it's present on the customer's system, it's there for other programs to use.  So now, someone can purchase the software that comes with my DLL, and write software that uses it without paying me!  He can legally sell his part of the code that uses my DLL, but not the DLL itself, and just hope the other user has my DLL installed from something else.

So just use a free license, right?  Well, not so fast.  If using freely-licensed software means you can't sell the result, all the billion dollar companies who are actually trying to make money will probably not use your library.  And programmers, being creatures of habit, will stick to the same tools they are familiar with, and your free software will just live forever in the niche underworld.  So now you create two licenses: you can freely use the library so long as you don't sell the end product, or you can pay (usually through the nose) for a license that does allow you to sell your work.  Same library, two different sets of rules!

Perhaps the most interesting part about EULA's, free or restrictive, is that nobody is really quite sure if they are even legal.  People have the tendency to assume that contracts are enforced like an episode of Law & Order, where things are very black and white, the special guest star is always guilty, and the dad from Dirty Dancing always has a smart quip.  In reality, there are no cops, no juries, and for that matter not even really any laws.  The judge decides what's right and wrong in a totally subjective way, and most of that is based on which lawyer has the nicer tie.  Even worse, for every court case where a judge has upheld the verdict you want, there is another court case somewhere else where another judge has struck it down.  Moreover, just because the company doesn't feel like paying the lawyer with the nice tie $1000 an hour to sue the 30-year-old pirating software in his parents' basement doesn't make it any more or less legal.

Somewhat ironically, the whole licensing point is now moot, because way back in 1990, Congress passed the Software Rental Amendments Act of 1990, which did for software what the 1984 amendment did for cassettes: you can no longer rent software (except for cartridge-style video games).  So the whole idea of finding a loophole to prevent people from renting software is now antiquated, and frankly so is the idea of a loophole to combat the loophole.

So the point of all this is that you better read your license, and more importantly make sure the vendor you buy from reads the licenses too.  You will be surprised by what it says, and there's a pretty good chance you're breaking the law.  This, in turn, leads to all sorts of interesting mind-blowing legal conundrums, where free code under a proprietary license (e.g. example code distributed with an SDK) has been erroneously included in a public domain library, that was legally included in a GPL library, that has been erroneously included in a proprietary library, that was sold to someone to use in a proprietary program!  That's a true story, with an unhappy ending.

Or, on the other hand, if you are not going to read your license, make damn sure your lawyer has the nicest tie in the courtroom.

Saturday, July 13, 2013

Abstraction Extraction


Thirty years ago Fred Brooks wrote "The best way to attack the essence of building software is not to build it at all", but for decades, reuse has stubbornly refused to graft to the software lifecycle.  Programmers continually reinvent the wheel, or even worse, apply an open-source style of "copy and paste reuse" by tweaking what others have done (or sometimes just copying it verbatim), calling it 'new', and duplicating the workload for everyone.  Software cost is proportional to lines of code, yet we all wrong-headedly reward programmers for the amount of code they add!

For a long time, the concept of "object orientation" was heralded as the solution to this problem.  Programmers naively bought OOP languages, and some were shocked and appalled when their original problems remained (and often were exacerbated).  A backlash inevitably occurred, and now programmers are either brainwashed cult members of a failed technology, or aged moldy-figs who refuse to embrace change out of fear and laziness.

Here's the rub: OOP represents about a shitzillion different programming concepts, all mashed together under one catch-all buzzword.  Ten different programmers will claim they use OOP, all have completely different styles, and all be technically correct.  But "technically" using OOP, especially in this piecemeal fashion, doesn't net you the massive, oversold gains that people have been promising for decades.  In fact, most programmers sprinkle a little OOP here, a few objects there, come up with a program that is no better than the original procedural program (and often much, much worse), and deride OOP for failing.

But used properly, OOP is great; however, almost nobody uses it right.  In fact, many programmers aren't even sure what it is, apart from putting your arguments before your subprograms.  Most definitions say that object oriented programming is programming oriented around objects, or some other form of self-referential nonsense.  Even "good" definitions can't boil it down any further than a list of no less than five different concepts, each a confusing buzzword in its own right.

And so for the most part, programmers are just confused.  They use OOP technologies to create obfuscated procedural programs that cost more but don't do more, and procedural programmers laugh at them behind their back.  A good example of this confusion is the Perl CGI.pm module.  It's often advertised as having "both" styles of programming: you can call it as a normal procedure:

print header;

or as a fancy object:

print $q->header;

As if these were any different at all!  If anything, the second one is worse, since it accomplishes the same exact thing in a less readable, more confusing way.  Yet time after time, people read that "OOP is better", and so programmers use the second method, and assume that their programs are now 'better' because they used OOP.  If all the great men of history had beards, does growing a beard make you a great man?  Then why would simply putting your variable before the function call instead of an argument make your program great?

So I will try to simplify things:

OOP is programming by abstraction.

It has nothing to do with syntax, inheritance, dynamic dispatch, private data types, or anything else random books and tutorials have told you.  You can meet all the criteria listed on the Wikipedia page, but have a crappy, non-OOP program.  Or, on the other hand, you can have a strictly procedural program without any of the OOP features at all, and still have it be highly abstract and 'object oriented'.

And so the first conclusion we can draw from this, however counterintuitive, is that using OOP doesn't require an OOP language.  Abstraction is a style of programming, not a syntax, and you can achieve that in anything from Java to C to Ada83 to Ada2012 to assembly language.  The only difference is that some languages have different degrees of syntax support to help automate some of the more common techniques.  But somewhere along the line, this concept of abstraction got lost in the shuffle.

Historically, the problem has been pundits trying to carefully cut narrow lines around what features are on the imaginary checklist a language needs to meet to be classified as OOP.  Such a list doesn't exist, yet people still argue that 'their' language doesn't need feature 'x' to be object-oriented, while insisting some 'other' language requires feature 'y'.  For instance, many Ada proponents say Ada83 was object oriented, dynamic dispatch be damned, since it had the private data types that C lacked.  Others derided Ada95's lack of 'postfix' notation as being non-OOP.

But these arguments miss the point: all OOP features, from private data types to postfix notation to dynamic dispatch, are just syntax.  It's the abstraction that matters, and there are no points for style.  A god-awful mess in the purest, most elegant OOP language around is still a god-awful mess, whereas a robust, abstract program written in assembly language is still a robust, abstract program.  The key is that using a so-called "OOP" language usually makes writing good, abstract programs easier (but certainly in no way prevents you from writing garbage).

So then, how do we achieve this wonderful abstraction?  Perhaps we should start by seeing what "abstract" really means:

abstract
adj [ˈæbstrækt]
     1. having no reference to material objects or specific examples; not concrete

Or, in other words, something that is 'abstract' is a general idea or concept, and something that is 'concrete' is a specific mechanism or item that achieves an abstract concept.  For instance, 'having fun' is an abstract idea.  'Banging a girl' is a concrete action that achieves that abstract idea (well, sometimes).

But right off the bat, we can see that one person's concrete action is another person's abstract idea, and vice-versa.  Before we said 'having fun' was an abstract idea, but that itself could be considered a concrete action of the abstract idea of "how to spend a Friday night" (along with "drinking alone", "watching TV" or "reading a book").  Or, on the other hand, "banging a girl" could be considered an abstract idea that encompasses the concrete actions of "rough sex", "kinky sex", "date rape", and so on.

So then what does this have to do with computer programming?   Why should we make our programs "abstract" at all?  What does it buy us?  Why even bother?

The key here is change.  Writing some code is just the start of the (professional) programming lifecycle, that normally involves fixing, updating, tweaking, enhancing, and (hopefully) reusing that code for the next twenty or even thirty years.  There is no such thing as "throwaway" code; there is only shitty code that gets written with the hope it will be thrown away, but ends up getting hacked up and reused over and over, much to the chagrin of the original programmer.

And changing code is the ultimate programming sin.  Every time you change code, even if it's just one character in one file, you open yourself up to all sorts of programming woe:
  • Is it still going to work?
  • Will all the other units still work?
  • Will we introduce new bugs that will have to get fixed later?
  • How long will it take us to even be able to understand the code to change it?
  • Will there be other 'problem areas' we will have to fix once we open it up?
  • Is the new change backwards compatible, or will we need separate versioning? 
  • Will we have to retest the entire program?
  • How much overhead (reviews, TPS reports, paperwork) will it take?
Of course, there are plenty of programming situations in which you are not beholden to any of this.  That cute iPhone game that will be obsolete in a week probably won't see much reuse, or justify too many bug fixes.  But if you are writing the avionics code for a Boeing 777, even the smallest change will set off a ripple effect of weeks' worth of effort.

As a more hands-on example, let's look at a trivial program that prints random numbers:

void F (int N)
{
  for (int i=0; i<N; i++) printf("%d\n", rand());
}

Easy enough, no?  But if we examine it closely, we can see we've played fast and loose with our requirements.  "Print random numbers" is an abstract concept for sure, but that's not what this program is doing.  In actuality, we are printing random integers uniformly distributed between 0 and RAND_MAX, because that's what rand() does.  That's a concrete implementation of our abstract concept.

This means if our requirement changes, we have to change this code, and changing code is a no-no.  What if we want to print numbers between 1 and 100?  Or exponentially distributed?  Or normally distributed between 0 and 500?  We don't want to have to keep changing this unit over and over, so we need to make it abstract, by 'removing all references to specifics'.  We can do this quite easily using a function pointer:

void F (int N, float (*rng)(void))
{
   for (int i=0; i<N; i++) printf("%f\n", (*rng)());
}

So now, instead of 'hardcoding' the function that generates our random number, we simply pass it to our subprogram as an argument.  Now the program is abstract, because it "prints random numbers".  If we want uniformly distributed numbers, we can pass in a thin wrapper around rand(), or we can pass in other (custom) functions that generate different ranges with different distributions.  Note that no matter how many times the requirements change, this code never has to.  And code that doesn't change is code that never breaks.
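For example, a minimal sketch of a caller might look like this (the wrapper names uniform01 and uniform_1_to_100 are my own illustration, not part of the original example; since C's rand() returns an int, it needs a thin adapter to match the float signature):

#include <stdlib.h>

extern void F (int N, float (*rng)(void));   /* the version above */

/* Adapters matching the zero-argument, float-returning signature. */
float uniform01 (void)        { return (float)rand() / (float)RAND_MAX; }
float uniform_1_to_100 (void) { return 1.0f + 99.0f * uniform01(); }

int main (void)
{
   F(5, uniform01);          /* five numbers uniform on [0,1]   */
   F(5, uniform_1_to_100);   /* five numbers uniform on [1,100] */
   return 0;
}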

Of course, using parameters to provide general behavior instead of hardcoding fixed functionality is nothing shocking, and certainly doesn't qualify as OOP.  In fact, the above code is unlikely to be of any practical use because of one tricky problem: functions normally need data (i.e. arguments), and our abstract subprogram doesn't have them to give (by design).

Note that above, our function pointer is to a subprogram taking zero arguments.  It follows from this that any random generator we want to use must also take zero arguments (or, more generally, they all must take the same arguments), which is unlikely to be the case.  An exponential random generator will need at least some 'lambda' value, more exotic things (power tail, normal, etc.) will need several more, and if we want to generate them between a set range we will certainly have to supply that as well.  But we can't pass them in as arguments, since the point is to specifically not know which generator we are using!

The solution to this problem is, of course, to abstract the arguments.  We want to create a record structure that is a 'black box' of arguments that come pre-filled in, and pass that as a whole to the function, which will know how to interpret them.  Now our subprogram looks like this:

void F (int N, float (*rng)(rngParms*), rngParms* p)
{
   for (int i=0; i<N; i++) printf("%f\n", (*rng)(p));
}

We pass in not only a pointer to the function we want to call, but the parameters we want to send to it.  Note that our program is still completely abstract, since we are not looking inside that black box of arguments.

The declaration of the rngParms type becomes tricky.  It needs to be variant, so that it can hold all the necessary parameters for any of the random generators.  We can do this by declaring different structures for each generator, and making a composite union.

typedef struct
{
   float lambda;
} Exponential_Parms;

typedef struct
{
   float lambda;
   float variance;
   float start;
   float end;
} Normal_Parms;

typedef union
{
   Exponential_Parms e;
   Normal_Parms n;
} rngParms;

Each generator will be written to use the arguments it needs, while ignoring the others.  The calling function picks the arguments to fill in and the function pointer to pass them to (which obviously must match), and our subprogram remains indifferent to it all.
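To see how the pieces fit together, here is a rough sketch of one generator and its caller (the name exponential_rng and its inverse-transform body are my own illustration, not part of the original):

#include <math.h>
#include <stdlib.h>

/* Assumes the rngParms union and the three-argument F from above. */
extern void F (int N, float (*rng)(rngParms*), rngParms *p);

/* A generator matching the expected signature; it only ever reads the
   Exponential_Parms member of the black box it is handed. */
float exponential_rng (rngParms *p)
{
   float u = ((float)rand() + 1.0f) / ((float)RAND_MAX + 2.0f);  /* strictly inside (0,1) */
   return -logf(1.0f - u) / p->e.lambda;                         /* inverse transform     */
}

int main (void)
{
   rngParms parms;
   parms.e.lambda = 2.0f;            /* fill in the black box...       */
   F(10, exponential_rng, &parms);   /* ...and F never looks inside it */
   return 0;
}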

You've probably noticed that our trivial little program is starting to get much less trivial.  Pointers, complex unions, and indirect function calls, and all we are doing is "printing random numbers"!  But this all goes back to reusability.  It's a good deal more work now, but think about how much more powerful this function is.  It's essentially "future-proofed".  If we need another set of random numbers in the future, this function never changes.  From now till the end of days, this file stays checked into the source tree and can be reused across any project, verbatim.  In fact, we never even have to recompile the damn thing!

So the real question becomes, is this OOP?  Most would say no, it's just good engineering design using a procedural language.  But that question misses the point.  OOP is just a syntax, a way to help automate good engineering design.  Lots of languages have special features so that passing around these pointers and variant records is much easier on you, but that's just icing on the cake.  Arguing about whether code is "object oriented" or "procedural" is like arguing if code is "Java" or "PHP".  There is no implication of quality; it simply is or is not.  But like we just saw, even derelict C code can be more abstract than the crummiest .NET code.  Code is either good or bad, and the way to tell if it's bad is by having to change it.

Because at the end of the day, the language you use should just be personal preference.  After all, good code never has to be changed, so the only person that suffers by using a crappy language is you.  With abstract code, nobody ever has to go in and look at it (let alone change it), so nobody cares whether your brackets are in line or on their own line.  If you use a shitty untyped language, that's just more time you had to spend debugging and getting it right the first time, instead of drinking on your porch and judging the relative puffiness of clouds.

Of course, if you are writing bad code, then people will have to open it up endlessly to keep changing it.  Then it becomes an issue of where your brackets are, or how you indent, or whether the new guy out of college knows the language you used, or how you decipher the sparse comments that dyslexic coworker wrote, and things get much more complex.  The natural response is that "everything should be rewritten in that language I like", which makes about as much sense as saying you should buy new Snap-On wrenches to help rebuild your car's engine faster, since it seizes so often because you don't put oil in it!  You don't want to be able to fix it faster, you want it not to break in the first place!

So whichever side of the OOP fence you lie on or what language you use, take a long hard look at your code.  Is it abstract?  If I decide to change some arbitrary requirement, will I have to change the code at all?  If the answer is yes, then it doesn't matter what language you used:  It's abstraction that matters, not language.

Wednesday, May 29, 2013

The Vindication of Joel Spolsky

A long time ago, my friends and I would often frequent bars and clubs with one express purpose: getting completely black-out drunk and going home with some easily-duped girl.  As it is when you are a group of twenty-something engineers going to a club, usually one of two things happens: you don't get admitted at all, or, if you do, you are escorted out before the end of the night due to an unacceptable B.A.C.  Ah, it was the blurst of times...

Around the same time, in one of the more notable posts by one of the more notable software bloggers, Joel Spolsky explained his policy to "never throw an exception of [his] own".  Those words were almost immediately pounced upon by the programming world at large, some going so far as to muse that perhaps his site had been hacked or he was under the influence of some sort of narcotic while writing.  He compared exceptions unfavorably to a goto statement, and they compared him to an out-of-date stick-in-the-mud.  And the war between exceptions and status codes raged on.

This is sad, because it shows just how completely mystified software developers are when it comes to error handling.  No, not just Mr. Spolsky, and no, not just the blogosphere, but pretty much everybody on both sides of the war.  After reading that post and the reactions to it, it's no wonder PC software is mostly garbage.  Sadly, both groups are dead wrong about everything.  The correct answer, when someone asks you whether you prefer exceptions or status codes, is neither.

Ironically, had all programmers involved spent more time indulging in vice instead of waxing pedantic about error handling mechanisms, the world would be a much better place.  Fewer bugs, more features, happier customers, happier programmers, happier managers, and more regretful young women trying to sneak out the next morning.  Everybody wins, except of course the regretful young women.

If you go all the way back to 1976 and read the seminal paper on exception handling [Exception Handling: Issues and a Proposed Notation], it groups the ways in which a subprogram can fail into two basic categories: Domain failures and Range failures.  A domain failure occurs when "an operation's inputs fail to pass certain tests of acceptability", whereas a range failure occurs when "an operation either finds it is unable to satisfy its output assertion, or decides it may not ever be able to satisfy its output assertion".  In the modern lexicon, we normally call them preconditions (conditions that must be met for the subprogram to start) and postconditions (things that must be true for the subprogram to have completed successfully).

Those are sort of abstract definitions for our purposes, so let's put things into a more relatable sense.  For me and my reprobate friends, the operation is, of course, Get_Laid.  The precondition is Admitted_To_Club, and the postcondition is P_In_V. The function body would look something like this pseudo-code:

begin
   while Is_Sober loop
      Consume_Martini;
   end loop;

   while Rejected loop
      Talk_To_Girl;
      Consume_Martini;
   end loop;

   Take_Girl_Home;
end;

Easy enough, right?  But anyone who's tried this particular approach knows it's an idealized version of a very complex task.  As mentioned before, we often either get stopped by the bouncer at the door, or tossed out mid-subprogram due to loud, uncouth, or generally lecherous and lascivious behavior.  If we never get into the club at all, a precondition has failed and the operation never starts at all.  If we get escorted out halfway through, a postcondition cannot be met and the operation ends early.

But what brought death into our programming World, and all our woe, is that the difference between domain failures and range failures is in the eye of the beholder.  If we don't get admitted to the club, then clearly we are not getting laid tonight, the output assertion cannot be met, and as such it could be considered a range failure.  Similarly, we might say that staying reasonably sober is a precondition to going out, and that someone getting ejected means a domain failure has occurred.

Take, for instance, the venerable strtol() function, which converts a given string to its integer representation.  This has the obvious requirement that the supplied string represent a valid in-range integer for the subprogram to convert.  So, supposing that the client supplies a string that doesn't represent an integer, do we consider this a domain failure or a range failure?
  
On the one hand, you can take the overly-narrow view that an interface contract is something defined exclusively by the language.  The function declaration requires a const char *, and that's all there is to it (or, as a coworker once argued, "anything else is a comment, not a contract").  So long as the client supplies a valid pointer to an array of characters, the interface has been pedantically met, and the input assertions have passed.  If the string cannot be converted, you've got yourself a range failure.

Though, on the other hand, you could take the overly-broad view that an interface contract is something conceptual defined by the developer.  The domain of the function is not simply an array of characters, but characters representing a valid integer.  A language that is not expressive enough to establish this condition (apart from a comment in the header) is a failure of the language, not the code.  If the string cannot be converted, it's because a precondition wasn't met, and you've got yourself a domain failure.

But don't think for a moment that this is just a semantic debate.  Depending on your point of view, the same exact failure has completely opposite ramifications.
  
The consequences of a domain failure are quite simple: there is a programming error in the calling code.  It was the client programmer's responsibility to ensure the condition was met before the call, but that wasn't done, and so the calling subprogram is at fault and must be fixed.  Simply put, there is a bug in the calling subprogram.  Unless your code has become self-aware, the only thing to do about it is crack open the source, fix the flawed subprogram, recompile it, and redistribute it.  This means reporting the issue back to the caller is, generally speaking, useless.  After all, we know the code cannot be trusted to properly verify its inputs, so we certainly can't expect it to properly verify the outputs (though you would be shocked how often programmers expect this...).
  
The consequence of a range failure is a horse of a much different color: the subprogram failed because of some condition that could not be detected until the middle of the call itself.  In this case, the caller doesn't know the condition exists yet, and so it must be reported back so they can evaluate the failure and resolve it.  This is not necessarily a bug in the code, since presumably the calling subprogram can (potentially) fix the problem and try the operation again.  We are essentially returning a status back to the caller, which may be good, bad, or indifferent: the onus is on them to decide what to do.  The form of this status can be either a return value, an exception, an ON ERROR handler, a 'long jump', or any number of other forms.  The point is that we must alert the caller somehow so he may take corrective action.

And this is where things get very, very complex.  Suppose, as before, that someone passes a string to strtol that doesn't represent an integer.  Is that a bug in the calling code?  Or is that a condition that has to be reported back to them so they can fix it?  Is the program executing as expected, or do we have undefined behavior?  Are things ok or not?  How is it that the same condition can mean two totally opposite and opposing things?!
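To put the two readings side by side, here is a rough sketch in C (the helper names are my own, and overflow handling in the precondition check is omitted for brevity):

#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Range-failure reading: any character array is a legal input, so a
   string that won't convert is a status the caller must check. */
bool to_long_status (const char *s, long *out)
{
   char *end;
   errno = 0;
   *out = strtol(s, &end, 10);
   return end != s && *end == '\0' && errno != ERANGE;
}

/* Domain-failure reading: the caller promises a valid integer image,
   so a string that won't convert here is a bug, not a status. */
bool is_integer_image (const char *s)
{
   if (*s == '+' || *s == '-') s++;
   if (!isdigit((unsigned char)*s)) return false;
   while (isdigit((unsigned char)*s)) s++;
   return *s == '\0';
}

long to_long_checked (const char *s)
{
   assert(is_integer_image(s));   /* precondition, not error handling */
   return strtol(s, NULL, 10);
}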

So we start to see two competing methodologies emerge.  We've got the pessimistic "assert style" domain failures, that should never occur in a properly written program and, if they do, indicate a static programming error that can never be handled at runtime ('handled' in the sense of 'fixed').  Then we've got the optimistic "status style" range errors, which involve returning information to the client (in some unspecified form) so that he can take corrective action and attempt the operation again.  And back in the primitive days of C, this was how programs were written: assert macros checked for domain errors and status codes indicated range errors.

But assert macros had the unpleasant side effect of dumping the entire program, all at once.  No warning, no chance to clean up, no ability to continue other tasks, log errors, continue in a reduced-functionality mode, or even present an apologetic message to the user.  It just ends.  And while it's true that a domain error can never be fixed at runtime, we still might want to carefully avoid the error and keep other isolated portions of the program running, log some sort of error, continue with partial functionality, or take some other casualty action.

So if you are writing some general-purpose and reusable library or framework, you can't very well assert on anything since, after all, you don't know what action the application programmer wants to do.  If they are using your math library for keeping an aircraft in the air, an assert is a very bad idea.  So in the interest of generality and usability, most libraries and frameworks avoided asserts in favor of status codes.  Or, in other words, they turn all domain errors into range errors.  The intention wasn't that the client should always be able to "fix" the problem, the intention is to let the client decide the appropriate action which, in most cases, is just to assert.

But if the road to hell is paved with good intentions, it's also lined with status codes.  For whatever reason, most programmers just followed the precedent set in the libraries, and kept returning status codes within their own programs.  Maybe it was from hasty, unplanned attempts at component-based development.  Maybe it was just cargo-cult mimicking of what other software already did.  Maybe it really was an honest attempt to write reliable code.  But in any case, the idea of a "domain error" became all but extinct.

So when C++ added exceptions, most just figured it was a better way to return a status code.  After all, one of the annoying parts of using status codes is that you use up your one return value.  And lo and behold, an exception lets you return both a status and a result, with a minimum of fuss.

But they were wrong.  The proper use of an exception is not as a way to return range errors, but as a structured mechanism for reporting domain errors.  It's like an assert with some semblance of control.  It's one (small) step up from a crash.  It's a better way to end a malfunctioning program, emphasis on malfunctioning.  An exception is just a way to shut down a broken program.

The key to creating safe, bug-free programs that work is to reverse this trend: turn all range errors back into domain errors.  Looking for a better way to return status information to the client misses the forest for the trees: eliminate the need to return the information. The tricky part about this, however, is that it's not just some syntax icing to smear over your already-written subprogram: it normally involves a complete redesign of the program in question, if not the entire architecture of the system.

Let's revisit the code for our night of debauchery from above, and refactor it a little bit.  Currently, we are considering "getting thrown out of the club for being drunk and disorderly" as a range error.  The question is not "how do we report this?", but instead "how do we stop this from happening?"  The obvious way is to have a designated sober person keep the alcohol intake of the rest of the group in check, so we can simply add a precondition to the start of the subprogram:

if not Group_Has_Designated_Chaperone then
    raise NO_CHAPERONE;
end if;

(Note, of course, these days you ought to use the 'Pre' aspect, but I write it like this to accommodate those still unfortunate enough to deal with C++.)

Note that we are still raising an exception, but it means something entirely different.  One of the preconditions for getting laid is to make sure the group has a designated sober person before we go out.  So long as this is true, there is no chance of us getting kicked out for being drunk, and everyone gets laid.

And no matter how many reasons you can come up with for not getting laid, there is always a way to avoid the situation.  Maybe the girls don't like our clothes?  Let's add a precondition of Dressed_To_The_Nines.  Maybe we can't find girls to talk to?  We have a precondition of Club_Has_Fat_Chicks.  Just keep adding preconditions until there is no possible way that the subprogram can fail to produce a result.  Our postcondition (P_In_V) is always met!

Or, in other words, write a subprogram that can't ever possibly fail, so long as its preconditions are met.  Error handling is a proactive process, not a reactive one.  Engineering, software or otherwise, is about not having exceptional conditions.  It's about the system doing exactly what you expect, and nothing else.  A properly coded program should raise an exception at every possible point, but never ever have it happen.

This makes the error-handling debate a moot point, since there won't be any more errors to handle.  Once the client calls Get_Laid (with the appropriate conditions met), there is never any chance of it failing.  We can just call the subprogram, confident we will always get laid, and forget all about checking to make sure it did.  Moreover, if it does fail, recall from above that this indicates a static programming error, either in the client or the server, and so the only appropriate thing to do is clean up and end the program (or task, etc), which an unhandled exception will do for us automatically.

For a more practical example, consider the following code abortion:

begin
   loop
      read_next_byte;
      process_byte;
   end loop;
exception
   when End_Of_File => Put_Line ("Success!");
end;

I suppose this is a valid way to code, but any project that tries is likely doomed.  If we want to read a file till its end, then reaching the end is clearly not an exceptional condition!  It's what's supposed to happen!  It's a required condition!  Contrast it with the following:

while not Is_EOF loop
   read_next_byte;
   process_byte;
end loop;
Put_Line("Success!");

Remember the difference between domain errors and range errors?  In the first case, reading past the end of the file is an ambiguous range error; it could mean the successful completion of the program, or it could mean we are stuck in an infinite loop because of some coding error.  In the second case, however, reading past EOF is always a bug.  The task needs to shut down immediately, because the program is broken, and exceptions not only shut things down but tell you the line number where it happened.  It's as if the compiler is doing the debugging for you!

It's at this point that someone always tries to point out a situation where this isn't possible, and where you just have to use an exception.  These people are wrong.  That's not to say there aren't conditions where it's more convenient to use an exception, or where you can rationalize using an exception, or where you are forced to use an exception to interface with some other component, but that's a far cry from it being impossible.

A common one is a heap allocation. You use new to allocate a chunk of memory, but if there isn't enough room, you get a Storage_Error.  There's no way to see if there is enough room on the heap before you make the call, so apparently we have to return this to the caller to let him deal with it after the fact.

But that's not really true, is it?  The default heap has no mechanism to ascertain the remaining amount of space, but who says we have to use the default heap?  Is it impossible to write our own user-defined storage pool and add a Remaining_Bytes subprogram so that clients can ensure there is enough room before trying the allocation?  Not exactly easy, but then again who said writing safe code was easy?

Another supposed example is multithreading.  If we want to have multiple threads access our fancy new custom heap, adding the requirement to have enough room creates a nasty race condition: two threads both pass the precondition, and then one of them gets the last remaining few bytes, while the other fails.  (Note, however, that this is exactly what is supposed to happen: the exception alerts us to a static programming error as soon as it happens, instead of trying to diagnose the race condition by hand).  But this simply means our fancy heap must include a Lock_Heap and Unlock_Heap subprogram.

So yes, all this means that instead of writing this:

x := new Integer'(42);

it balloons into this:

Lock_Heap;
if Remaining_Bytes > Integer'Size/8 then
   x := new Integer'(42);
else
   Do_Something_Else;
end if;
Unlock_Heap;

Like before, an exception in the first version is ambiguous.  Does a Storage_Error indicate a misbehaving memory leak that should never happen, or is it a "successful" detection of a low-memory condition?  There's no way to tell; sometimes it's success, sometimes it's failure.  But it's a moot point in the second version, because it never happens.  We don't react to low memory conditions, we prevent them before they happen.

So was Joel Spolsky right?  Far from it.  He's just a pragmatic programmer, fed up with decades of C++ programmers crying wolf, trying to fake success by misusing a mechanism designed to indicate failure.  But like he said, you do need to be able to read the code.  Exceptions are invisible, they do obfuscate the control flow, and they are a precarious way to return information to the caller.  They are just like a goto, but that's okay, because the only place they're supposed to go to is directly to the end.

So yes, use exceptions.  Throw everywhere you can, but don't ever catch. You'll be surprised by not only how reliable your programs start to become, but how much more successful you can be at picking up chicks.  And really, isn't that what it's all about?