Wednesday, May 27, 2015

Orderly Conduct


In the preceding post, we looked at all the various problems that make initializing all the "global state" of a program particularly tricky.  In essence, packages have to be initialized ("elaborated") before they can be accessed, but doing so may access code or data in other packages, which obviously have to be done prior.  The compiler can't do it all for you, so you have to work together to ensure you select an order that works.

Luckily, figuring all this out is not altogether difficult (or, at least, doesn't have to be).  Earlier we saw all the reasons why the compiler couldn't do this itself; it basically boiled down to lots of esoteric edge cases that could happen, but which don't often happen.  But since the LRM can't just decide to ignore the inconvenient edge cases, we are all stuck having to deal with the fallout, even for problems we never encounter.

But what if we had some mechanism to indicate to the compiler that our program didn't go in for any of these mutually recursive elaboration time dependency shenanigans?  By promising the compiler that all those complex edge cases can't occur in our code (and, of course, letting the compiler reject our code if we try), all the problems we spent so much time discussing suddenly vanish.

And so the easiest way through the elaboration swamp is around it.  That is, we subject our units to more rigorous and restrictive checks (that are acceptable most of the time), nipping elaboration problems in the bud.  Of course, there still has to be a way to get something correct if you do need to utilize the complicated edge cases, so Ada arms you (get it?) with an arsenal of pragmas to accomplish both.

At the most basic level, every package can be classified into one of two fundamental groups:
  1. Units that don't contain any elaboration time code.
  2. Units that do.
We saw from before that only select things actually cause dynamic, arbitrary code to be executed during elaboration, notably initialization of 'module level' objects via function calls from other units (there are of course others, but we restrict ourselves to objects for simplicity).  If a unit, for example, doesn't contain any module level objects at all, then there can't be any elaboration problems since it's not actually elaborating any code.

Now if we have two of these units that each have no elaboration order dependencies, it's clear that we can elaborate them in any order we please.  If neither A nor B has global variables that reference the other for their initialization, then I can just flip a coin when it comes time to pick which package goes first, because there's nothing in either that can fail.

Furthermore, if I take a third unit without any elaboration dependencies and add it into the mix, I again can pick any order I want.  A-B-C is just as good as C-B-A, as is any of the other permutations, since none of these units have any elaboration code that could fail.  Following this line of thought, given any 'N' number of units, none of which have any constructs that might cause access before elaboration, I can elaborate them in any order I please.

These types of units are classified in Ada under the surprisingly difficult to pronounce name preelaborable (pre-e-lab-or-a-bull).  Any unit that abides by a few select restrictions (e.g. no global variables initialized from function calls) can be marked as being preelaborable, which places it squarely into the first group of units and essentially informs the compiler that the unit has zero elaboration time dependencies.  Consequently, the compiler forces all these units to the front of the elaboration line, and effectively 'flips a coin' as to what order they get done within that group, since they can never fail, and can even avoid having to check that they didn't.

As a general rule of thumb, there's rarely a good reason why your unit shouldn't be preelaborable.  The list of restrictions is actually fairly short:
  1. No package body initialization portion
  2. No module-level objects initialized from functions (or by default)
  3. Preelaborable units can only depend on other Preelaborable units.
And that's about it.  There are of course exceptions and caveats to these rules, and you can read about them in the LRM if you feel so inclined, but there's a very good chance that so long as you are following basic programming etiquette (i.e. packages provide types and operations that act on those types), most of your units should abide by these rules already.  That being the case, you can mark all your units with the Preelaborate pragma and call it a day, confident that elaboration order will never be the reason your program fails to start.
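As a minimal sketch of what such a unit can look like (the package Vectors_2D and everything in it are hypothetical names invented for illustration), here is a package that provides a type and an operation on it, and whose only module-level object is initialized from a static aggregate rather than a function call:

```ada
--  Hypothetical example: a unit that satisfies the restrictions above.
package Vectors_2D is
   pragma Preelaborate;

   type Vector is record
      X, Y : Integer := 0;
   end record;

   --  Initialized from static values only; no function call is involved.
   Origin : constant Vector := (X => 0, Y => 0);

   function "+" (Left, Right : Vector) return Vector;
end Vectors_2D;

package body Vectors_2D is
   --  No initialization portion (no "begin ... end" at package level).
   function "+" (Left, Right : Vector) return Vector is
   begin
      return (X => Left.X + Right.X, Y => Left.Y + Right.Y);
   end "+";
end Vectors_2D;
```

If a later change added, say, an object initialized by a call into another unit, the compiler would be expected to reject the unit rather than silently accept a new elaboration dependency.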

Before moving onto the next group of units, there are two other pragmas associated with preelaborable units that are worth discussing: Preelaborable_Initialization and Pure.

The first concerns private types.  We said before that one of the rules is that an object can't be "dynamically" initialized (no function calls, no other objects, no default initialization, etc); for example, if some module-level object was a controlled type, then its default initialization (or that of any of its components) would call its Initialize procedure, which puts us right back into the realm of arbitrary elaboration time code.  On the other hand, if a record is just a bunch of integers or other discrete types, then just leaving it uninitialized doesn't cause any harm.

But if our record type is private, we are in trouble; we can't see if that record is a controlled type (or perhaps contains other controlled types), or if it's just full of integers, or for that matter even if it's a null record with nothing.  We can't assume anything about a private type, so we must assume the worst.  We would have to assume that any private type is potentially trouble, and thus prohibit any preelaborable unit from having an uninitialized object of a private type.

But this is aggravating, since much of the time records are just discrete types with no elaboration concerns.  So to help mitigate this, the Preelaborable_Initialization pragma is available to allow a package to specify that the default initialization of private type does not have elaboration order concerns (this is in contrast to most other elaboration pragmas, which apply to packages as a whole).  With this applied, other units can have uninitialized objects of the private type and still stay preelaborable.
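A rough sketch of how this might look, assuming two hypothetical packages, Counters and Tallies, that are not from the original discussion:

```ada
--  Hypothetical example: a private type that promises its default
--  initialization has no elaboration concerns.
package Counters is
   pragma Preelaborate;

   type Counter is private;
   pragma Preelaborable_Initialization (Counter);

   procedure Increment (C : in out Counter);
   function Value (C : Counter) return Natural;
private
   type Counter is record
      Count : Natural := 0;   --  static default, nothing to execute
   end record;
end Counters;

package body Counters is
   procedure Increment (C : in out Counter) is
   begin
      C.Count := C.Count + 1;
   end Increment;

   function Value (C : Counter) return Natural is
   begin
      return C.Count;
   end Value;
end Counters;

with Counters;
package Tallies is
   pragma Preelaborate;

   --  Default-initialized object of a private type; allowed here only
   --  because Counter is marked Preelaborable_Initialization.
   Total : Counters.Counter;
end Tallies;
```

Without the pragma on Counter, the declaration of Total would be expected to make Tallies illegal as a preelaborated unit, since the client cannot see what the full type contains.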

The second pragma, Pure, imposes all the same restrictions as Preelaborate, plus several more; call it preelaborate on steroids.  A pure unit has absolutely no saved state, which is important because it means that the subprograms it contains are "true" functions (in the mathematical sense); i.e. the same inputs always produce the same exact outputs.  From an elaboration standpoint there's no difference, but it does allow the compiler to make certain important optimizations it couldn't otherwise.  For instance, if a 'sine' function is declared in a Pure package, the compiler knows that sin(0) will always be zero and that no side effects can occur, and thus it can cache the result for reuse.  Otherwise, it has to assume that the function might do something nefarious (print to the screen, log to a file, etc), and make the same call every time.
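For illustration only, here is a small sketch of a Pure unit; the package name Arithmetic and the function Square are made up for this example:

```ada
--  Hypothetical example: a unit with no state at all.
package Arithmetic is
   pragma Pure;

   --  The result depends only on X, so repeated calls with the same
   --  argument may be combined by the compiler.
   function Square (X : Integer) return Integer;
end Arithmetic;

package body Arithmetic is
   function Square (X : Integer) return Integer is
   begin
      return X * X;
   end Square;
end Arithmetic;
```

Declaring a variable at the package level of Arithmetic would break the promise of "no saved state", and the compiler would be expected to reject it.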

In any case, taken together, these three pragmas create a closed group of preelaborable units, none of which have any elaboration order concerns, and all of which depend only on other units in the group.  The compiler can go through and elaborate them all first in whatever way it wants, and even forgo the check to ensure it's correct.

But what about units that don't meet the criteria of being preelaborable?

What if, like before, we want to initialize an object in the body of 'A' to a value returned by a function in 'B'?  We are not preelaborable, but then again we aren't doing anything particularly egregious.  In this case, our previously hypothesized "tweak" of the LRM, that is to elaborate the spec directly before its body, instead of just some point before its body, would be completely acceptable.  B's spec would have to come before A's body (because of the with clause), but we would also have to do B's body directly after B's spec (for an order of <B> [B] [A]), and all is well.

Ada actually has a pragma that essentially achieves this, but on a package-by-package basis: Elaborate_Body.  When applied to a package, the compiler ensures that the body of a unit is elaborated directly after the spec, without anything else in-between.  Applying this to packages gives you a nice, neat, orderly elaboration of a body right after its spec, so you can be confident that if you 'with' in a package marked as elaborate body, its body will be there when your package is elaborated.
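A small sketch of the idea, using the hypothetical packages Settings and Server (neither appears in the original discussion; the port value is likewise invented):

```ada
--  Hypothetical example: the library marks itself Elaborate_Body.
package Settings is
   pragma Elaborate_Body;
   function Default_Port return Positive;
end Settings;

package body Settings is
   Port : constant Positive := 8080;

   function Default_Port return Positive is
   begin
      return Port;
   end Default_Port;
end Settings;

--  The client calls into Settings at elaboration time.  Because
--  [Settings] directly follows <Settings>, and <Settings> must precede
--  any unit that 'with's it, the call below finds the body elaborated.
with Settings;
package Server is
   Port : Positive := Settings.Default_Port;
end Server;
```

Note that the pragma lives in Settings, not in Server: the author of the library makes the promise once, and every client benefits from it.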

All this leads to the general "rule of thumb" specified in the Ada 95 Rationale (et al): All packages should be marked as either Pure, Preelaborate, or Elaborate_Body, in that order of preference.

These impose decreasing levels of restrictions on units, but also an increasing chance of elaboration problems.  However, for the vast majority of the time, supposing you don't do anything fancy, these three pragmas will give you what you need.

Which raises one last question: what if we do want to do something "fancy"?

Consider the following code:

package A is
  one : integer := 1;  -- no dependencies
  function X return integer;
end A;

with B;
package body A is

  three : integer := B.two + 1; -- dependency on <B>

  function X return integer is
  begin
    return three;
  end X;

end A;

with A;
package B is
  two: integer := A.one + 1;  -- dependency on <A>
  ...(other stuff to make a body legal)
end B;

package body B is
  four : integer := A.X + 1;
  ... 
end B;

Now we have big fun.  Note the following dependencies:
  • The spec of A depends on nothing
  • The spec of B depends on the spec of A (via 'one')
  • The body of A depends on the spec of B (via 'two')
  • The body of B depends on the spec and body of A (via A.X)
Perhaps most surprising is that this obviously convoluted code is, in fact, legal.  But we have a problem, because there is only a single elaboration order that will work:

<A> - <B> - [A] - [B]

That is, we need 'one' to exist so we can create 'two', which has to exist so we can make 'three', which has to exist so we can return it from A.X to create 'four'.

But none of our rules of thumb work for this.  We are clearly not preelaborate, but we also can't elaborate the bodies directly after the spec!  Now we have the dreaded edge case: multiple legal elaboration orders, not all of which are correct, that we must specify by hand.  To do so, we have two more pragmas:

Elaborate
Elaborate_All

Unlike the previous pragmas, which the programmer applied to the package he was creating, these pragmas are put amongst the with clauses to apply to units he's referencing.  They ensure that the body of the unit called out in the pragma is elaborated before the current unit, such as:

with A;
pragma Elaborate(A);
package body B is....

This instructs the compiler that it must select an order in which the body of A is elaborated before the current unit (B).  Given that small addition, the compiler now has the additional requirement that [A] must come before [B], which along with the original rules gives us a legal (albeit strange) program. (Note that in the above example, you would have to add another seemingly redundant with clause to [B]).

But from a practical standpoint, this is tougher than it looks.  Sure, we can go through and add Elaborate pragmas everywhere, but most real code is far more complex and contains many more units.  What if, in the above, A.X called out to other units doing other things, which themselves called other things, and so on?  This has massive scalability problems.

We aim to put software together from reusable components, so often we can't change package A (i.e. it's a COTS library).  Plus, this violates our inherent sense of encapsulation, because we shouldn't need to peek into the body of A and start mucking with things based on what we see.  But there's not much we can do, since the person writing the body for A can't possibly know that sometime in the future, some errant unit B was going to call its function at elaboration time, instead of at run time.  And what if A calls a procedure in C that calls something from D?  Must we open up every single unit in the entire call tree?

For these reasons, Ada95 added the "Elaborate_All" pragma, which is essentially a recursive form of Elaborate; instead of just elaborating the body of the unit you specify, it elaborates that unit and all the bodies of units on which it depends, all the way down.  Now the package you depend on is a true "black box", and you can be assured that your single pragma in the client will make the entire subsystem available  (in most cases, Elaborate_All is the better choice than Elaborate, which is for the most part obsolete).
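As a hedged sketch of the client side, assume a hypothetical library package Logging whose function Open is called while the client Startup is being elaborated (both names, and the trivial body, are invented for this example):

```ada
--  Hypothetical library unit; treat it as a black box.
package Logging is
   function Open (Name : String) return Boolean;
end Logging;

package body Logging is
   function Open (Name : String) return Boolean is
   begin
      --  Stand-in logic; a real library might call into many other units.
      return Name'Length > 0;
   end Open;
end Logging;

--  The client asks for Logging's body, and the bodies of everything
--  Logging depends on, to be elaborated before this unit.
with Logging;
pragma Elaborate_All (Logging);
package Startup is
   Ready : constant Boolean := Logging.Open ("startup.log");
end Startup;
```

The point of the design is that Startup never has to know what Logging calls internally; one pragma in the client covers the whole subsystem.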

Though just because Ada has this ability doesn't necessarily mean you should utilize it.  Of course there are situations where this is desired, if not necessary, but for the most part, adding Elaborate_All pragmas is a sign of bad design.  Avoid the problem altogether and redesign the code such that your packages are Preelaborate (or at least Elaborate_Body).

But at the same time, don't just rely on GNAT's static checking crutch.  Take elaboration into account as you write the code, not simply because it's the right thing to do or because your code will get better, but because most of the time you won't find the problems until it's far too late.  Do you really want to go back and add 500 pragmas to 500 packages since it didn't occur to Joe the C Programmer that this would ever be a problem?  And before you let Joe the C Programmer slander Ada for requiring such pedantic verbosity in the first place, go and Google "static initialization order fiasco"; C++ has the same problem as Ada, except it's so problematic and unsolvable that it's even got its own cute name containing the word fiasco!

So go forth, enlightened one, and banish elaboration circularities and access elaboration exceptions back from whence they came!  Every time your program starts, a slight smile should creep to your lips, since now you know it's no accident how it happened.
