Thursday, August 23, 2012

Everything you learned about programming is wrong

Pop quiz, hotshot: How big is an integer?

If you came of age anytime in the last thirty years and did even the least bit of programming, this is likely ingrained in your skull alongside Star Wars quotes, your high school locker combination, and the invulnerability code for DOOM (iddqd).  The sky is blue, water is wet, women have secrets, and an integer runs from roughly -2.1 billion to +2.1 billion (give or take).

Okay, now for the harder question: why?

For as long as I can remember, this was a nonsense question.  The first thing you did when you learned a programming language was find the page of the manual that described the primitive data types: the building blocks of the language.  You then represented your data within these types, created larger, composite types out of them, fashioned algorithms around them, and you didn't ask silly questions about why the ranges were what they were.  There is no why!

Pretty much every single language I ever used was like this.  C/C++ is like this.  Java is like this.  Assembly language is like this.  SQL is like this.  Scheme is like this.  Hell, even GW-freaking-BASIC is like this.  This is just how programming is done.  You get three basic integral types (byte, word, dword) and two floats (single and double precision), and you force your program to fit into what the compiler gives you.

But consider that while variables are normally compiled into blocks of memory, your program is almost certainly not about blocks of memory.  You are dealing with things like engine speeds, the contents of a text document, pictures, the number of bullets a video game character has left, whether a bridge is raised or lowered, a video stream, the current color of a traffic light, and an endless number of other "real life" things.  Our whole business is using numbers to represent things that are not numbers: why does the compiler get to dictate what those things are?

Suppose you are writing a program that monitors the speed of an engine.  If you grew up learning the languages listed above, then you would likely do something like this (assuming we use whole numbers):

int Engine_Speed;

Of course, if you are a C programmer, you are required to name the variable 'q' so that no future programmer can ever decipher or maintain your original work; but I digress.  The point is that we have integral data, so we use whatever data type is most compatible.

But this is wrong.  Not just in syntax but in concept.  At a high level, we are representing engine speed.  Engine speed does not have a range of +/- 2.1 billion, like a C-style int does.  The speed of our engine is not an integer, and it never will be.  An integer is a meaningless, unitless number that represents nothing.  It's just that the compiler happens to be running on a machine with a 32-bit data bus, which makes it convenient for the compiler writer since each source-level operation maps neatly onto a machine opcode.

But this "one-size-fits-all" data typing is fundamentally flawed.  You are the programmer, you should get to pick the data types!  Our engine can't spin at 2.2 billion RPMs, so in what universe does it sense to pick a data type with that range?  Because it's easier on the compiler?  Seriously?

Okay, now let's suppose we also need to represent the state of the engine.  Like any good C programmer, we also pick an int because, um, well, that's just what we've always done:

int Engine_Speed;
int Engine_Status;

(Also, again, in a real C program the status would be named 'x7' to spite those with the gall to want to read your code later.)  Our status now has the same range of +/- 2.1 billion, just like speed, which clearly doesn't correctly represent our engine status of "on" or "off".  So we just sort of come to a gentleman's agreement that zero will be off, because the 'o' in off looks sort of like a 0, and that anything else will be on (which also starts with an 'o', but we can't be concerned with that!  There is code to write!)

But come on!  Our engine is not 'zero' or 'non-zero', or even 'true' or 'false': it's on or off.  These are the different values our variable can have, on and off, and nothing else.

The fact that the variables we are using to represent 'things' don't correctly represent those 'things' is bad enough, but it just gets worse from there.  Because we are using the same type, the compiler will tacitly assume they represent the same data.  We could, for instance, call Set_Engine_Status(Engine_Speed), which makes no sense at all, or Increase_Engine_Speed(Engine_Status), which will either stop the engine or overdrive it, depending on what arbitrary value we decided to use for the 'on' state (remember, it's any non-zero value, so it could be 2.1 billion).  Both such errors (which are easily copied and pasted) would show up at best as subtle logic errors, and at worst as a wrecked engine.

And this is the epiphany I had when I started using Ada: everything I had learned about data types, from my days hacking GW-BASIC, to my college courses, to the early part of my professional career, was just wrong.  Bet-on-the-wrong-horse wrong.  Back-to-the-drawing-board wrong.  Fundamentally flawed.  The idea that you should have to force the hundreds (if not thousands) of different types of data your program has into the five built-in ones that the compiler shoves down your throat is a miserable way to write a program.  It's like the scene in 'My Cousin Vinny': Breakfast?  You think?

And this is the real Achilles' heel of C (and C++, and Java, and Basic, and .NET).  We can argue about brackets, pointers, and the preprocessor all day long, but at the end of the day C will never be able to escape its fubar type system.  Historically, C was designed to do systems programming, i.e. Unix, along with the requisite debuggers, compilers, and what-have-you.  But that's the key: when you are doing systems programming, you are working in the problem domain of registers, blocks of memory, and CPU data buses.  If you are writing a debugger, then you do want your data types to correspond to the physical hardware.  And that's why C is still perhaps the best systems programming language.

But it's just not suitable for everyday programming, and neither is any other "general-purpose" language that sets you up with a list of built-in types to use, because my programs have more than five types of data and I'm sure yours do too.  A data type is not something the compiler gives you, it's something you give the compiler.

Which brings us to Ada:

Ada doesn't have any built-in types.

Not a single one.  Every type is a user-defined type.  When you want a variable, you first declare a type that specifies what the data is.  For instance, in Ada, we would implement the previous example like so:

type Speed is range 0 .. 10_000;
Engine_Speed : Speed;

type Status is (On, Off);
Engine_Status : Status;


Syntax aside, the difference is that we didn't just pick an arbitrary data type that was 'close enough', or that we could rationalize by squinting our eyes hard enough.  We defined our own types that are not just easier to read, but also properly range checked and type checked (unlike the ultimately useless typedef and enum that C provides).

The primary benefit is that it saves us the trouble of range checking everything ourselves.  We can safely assume that the engine speed will never be 2 billion, because it can't be.  If we ever try to push the speed above 10,000 or drop it below 0, an exception is raised and the program ends (or, less often, handles it).  If we try to do something silly that the compiler can see, such as assigning it 999,999, we will get flagged at compile time because the value can never fit.  And if we mix up status and speed somewhere, we will get an error, because speed is not compatible with status, and vice versa.
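
To make that concrete, here is a minimal sketch.  Set_Engine_Status is a made-up subprogram just for illustration, and the comments describe roughly what a compiler like GNAT will tell you:

procedure Engine_Demo is

   type Speed  is range 0 .. 10_000;
   type Status is (On, Off);

   Engine_Speed  : Speed  := 10_000;
   Engine_Status : Status := Off;

   procedure Set_Engine_Status (New_Status : Status) is
   begin
      Engine_Status := New_Status;
   end Set_Engine_Status;

begin
   --  Rejected at compile time: expected type Status, found type Speed.
   --  Set_Engine_Status (Engine_Speed);

   Set_Engine_Status (On);

   --  Compiles, but raises Constraint_Error at run time: 10_001 is
   --  outside the declared range of Speed.
   Engine_Speed := Engine_Speed + 1;
end Engine_Demo;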

So really, why are you still fooling yourself?  Why do you still believe that the age of an employee is an 'integer', or that the cost of a widget is a 'float', or that an image is nothing but an array of chars?  These rationalizations are wrong, wrong, wrong!  It's an outdated, antiquated way of thinking left over from the days of assembly language.  Just because two types have values that are implemented the same way in assembly language doesn't mean they are the same type at the level programmers need to think at.  Apples are not oranges, and shame on you for letting your compiler browbeat you into thinking they are.
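
In Ada, each of those gets a type of its own.  Here is a sketch, with names and ranges I made up purely for illustration:

package Domain_Types is
   type Employee_Age is range 0 .. 150;
   type Widget_Cost  is delta 0.01 digits 12;  --  decimal fixed point: exact cents, no floating-point fuzz
   type Pixel        is mod 2 ** 8;
   type Image        is array (Positive range <>, Positive range <>) of Pixel;
end Domain_Types;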

This is why Ada works, and everything else doesn't.  That's why on a cross-country flight, it never occurs to you that the only thing keeping your ass in the air is software.  That's why you never have to reboot your car's engine.  And that's why if your language of choice presents you with a list of data types and says "choose", you need to turn around and find a new language.






Wednesday, August 22, 2012

Ada: It's not just a different C

Rarely does a week pass at the large, bureaucratic nightmare of a defense contractor where I work without a coworker opining about the uselessness of Ada.  "What a waste!" they claim.  "Everything would be so much easier if we just used C!"

And they are dead right.

(Emphasis on dead)

See, our little group of programmers writes safety-critical, avionics-type software, so of course we use Ada (which you probably already know is a programming language typically associated with safety-critical software and embedded systems).  But most of the new programmers (and even a lot of the older ones) grew up learning to program in a world full of brackets and asterisks, and so they view Ada as some sort of programming masochism: strict rules and extensive verbosity simply for the sake of being strict and verbose.

The thing is that while we use Ada, we use Ada wrong: we use it like C.  Back when I was still running up my parents' phone bill dialing up BBSes, a bunch of C programmers got the Ada mandate dumped in their laps and, like any good C programmers, were far too prideful to admit that maybe they didn't know the best way to do something.  So they stuffed their square peg of C programming into the round hole of Ada, and a whole generation of new programmers learned Ada by their bad example.
 
Recently, after days of tie-loosened, sleeves-rolled-up debugging, we found the bug that several of us had been searching for: a simple copy/paste error.  The API for VxWorks is of course done in C, which means that when you create a queue you have to tell it the size of each element that queue will hold.  But we use Ada, not C, so a programmer had of course gone to the time and trouble of creating Ada wrappers around the VxWorks calls.  And, of course, they were exact duplicates of the C calls, only done in Ada.  And, of course, nobody had noticed when some errant programmer was too busy trying to bang the new college intern to remember to change the size of the queue element, and so each message was too big, and everything just silently failed.

Not unexpectedly, this revived the same stupid debate we always have: we tripled our development time, because we had to write all the wrapper code and train all the programmers on how to use Ada, and we still spent man-weeks of debugging time solving problems that would have been exactly the same had we just used C in the first place.

Clearly the problem here is not that Ada is junk; it's that we use it the wrong way.  The "Ada" way to do this is to make the wrapper a generic on the element type and build the size calculation into the creation call (well, the right way is to just use a protected object, but I digress).  That way, as soon as the errant programmer tried to compile the code, the compiler would have rejected it with a type mismatch.  And if anything did go wrong at run time, it would raise an exception, so that instead of spending weeks debugging it, you get "Hey stupid: line 123" printed right at you.
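
For the curious, here is roughly the shape such a wrapper might take -- a sketch, not our actual code.  The msgQCreate/msgQSend bindings are my recollection of the VxWorks msgQLib interface, so treat those details as assumptions:

with Interfaces.C;

generic
   type Element is private;
package Typed_Message_Queues is

   type Queue is limited private;

   procedure Create (Q : out Queue; Depth : in Positive);

   --  Send takes an Element, not an address and a byte count, so handing
   --  it the wrong kind of message is a compile-time type mismatch.
   procedure Send (Q : in Queue; Item : in Element);

private

   type Msg_Q_Id is new Interfaces.C.long;  --  opaque handle from the C side

   type Queue is limited record
      Id : Msg_Q_Id := 0;
   end record;

end Typed_Message_Queues;

with System;
package body Typed_Message_Queues is

   use Interfaces.C;

   --  The element size is computed once, here, from the generic formal
   --  (rounding bits up to bytes).  Nobody ever types, or copy/pastes,
   --  a byte count again.
   Element_Bytes : constant int := int ((Element'Size + 7) / 8);

   --  Assumed C bindings (VxWorks msgQLib, from memory):
   function msgQCreate
     (Max_Msgs, Max_Msg_Length, Options : int) return Msg_Q_Id;
   pragma Import (C, msgQCreate, "msgQCreate");

   function msgQSend
     (Id       : Msg_Q_Id;
      Buffer   : System.Address;
      N_Bytes  : unsigned;
      Timeout  : int;
      Priority : int) return int;
   pragma Import (C, msgQSend, "msgQSend");

   procedure Create (Q : out Queue; Depth : in Positive) is
   begin
      Q.Id := msgQCreate (int (Depth), Element_Bytes, 0);
      if Q.Id = 0 then
         raise Program_Error with "msgQCreate failed";
      end if;
   end Create;

   procedure Send (Q : in Queue; Item : in Element) is
      Local : aliased Element := Item;
   begin
      if msgQSend (Q.Id, Local'Address, unsigned (Element_Bytes), 0, 0) /= 0 then
         raise Program_Error with "msgQSend failed";
      end if;
   end Send;

end Typed_Message_Queues;

Instantiate it once per message type and the element size simply cannot be wrong, because nobody ever writes it down.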

And this sort of crap is all over the place (not just at my company, but at others).  Everything passed around as either a float or an integer.  Exceptions "handled" by suppressing them.  Arrays passed around as a pointer and a length.  Case-style blocks performed on tag values.  Programmers going out of their way to try to defeat every single feature and safeguard Ada provides.  And the worst part?  Without anyone to say different, it's tacitly assumed that this is the right way to program.
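
If those don't ring a bell, a couple of them look something like this (names invented for illustration):

with System;

procedure C_In_Ada_Exhibits is

   procedure Do_Something_Important is null;  --  stand-in for real work

   --  Exhibit A: an array handed around as a raw address and a length,
   --  C-style, when an unconstrained array parameter would carry its own
   --  bounds for free.
   procedure Process (Buffer : System.Address; Length : Integer) is null;

begin
   Do_Something_Important;
exception
   --  Exhibit B: the exception "handled" by making it disappear, so that
   --  weeks later nobody can explain why nothing works.
   when others =>
      null;
end C_In_Ada_Exhibits;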

Whenever we have this debate, I take the opportunity to lambaste the low quality and general scarcity of information about using Ada.  Google for 'Ada', and you will get the Dental, Disabled, and Diabetic associations of America before you get anywhere near the language.  What little you do find is almost always out of date and completely sterile, but most importantly it suffers from the same flaw as my coworkers: it teaches you how to use Ada like C.  One of the Ada books on my shelf mentions that new-fangled "type system" near the back, as an "advanced feature" not really germane to the subject, but is more than happy to demonstrate the Integer type in chapter one.  Really?  Really?

So should we really be surprised that nobody uses Ada?  Books and tutorials teach just enough for a reluctant C programmer to get by for a few years until he is eventually promoted to manager and then shitcans Ada for C altogether.  Because the fact is they're right: if you use Ada like C, you might as well save some budget and just use C in the first place.

But if you use Ada like Ada, well, then it's a whole new ballgame.  Your programs will magically just start to, you know, work.  Those hours you spend staring at hex values through a debugger can be better spent staring at the college intern through her white mesh shirt.  You will have less code to write, less to test, and less to maintain.  You will focus on actually writing new features and new programs, and not just marching through a seemingly endless list of bugs.

The key is that Ada is not just a better C.  Strong typing is a totally different paradigm from weak typing, like switching from procedural to OOP, or from polling to event-driven.  You can't just dip your toes in the shallow end and expect to pick it up as you go.  You need to jump in the deep end without any reservations.  You need to completely rethink the way you write software.  Everything you learned about programming is wrong.

And when I casually said that to my C fanboy coworkers, they called my bluff.  If the lack of appropriate, relatable literature was all that was stopping Ada from ruling the programming world, then why not educate everyone myself?  So I came home, had a martini or two, and started a blog.  A blog full of nothing but my opinionated, truculent, perhaps misguided views on programming in Ada.  I didn't write the LRM, I don't work for AdaCore, and I have no claim to expertise other than getting drunk and being dared by coworkers.

But I have always found that the best programming books, the ones that stay on my shelf long after the content is obsolete, are the ones that put things in context.  The ones that tell you why, and not just what.  Books like "Peter Norton's Assembly Language Book for the IBM PC" or "Michael Abrash's Graphics Programming Black Book" still teach me more twenty years later than anything you will find at your local bookstore today.  With any luck, agree or disagree, you will at least find something worth considering in the forthcoming posts.

Just remember though: opinions are like assholes. Everyone's got one, and yours stinks.