This paper was written while the author was with YOURDON Inc.
This document has been made available by courtesy of the author.

It has been re-typed by Phil Budne, who is to blame for any transcription errors.

A Little Implementation Language

by
P.J. Plauger

ABSTRACT

A language is described that was implemented on a PDP-11 computer for writing system-level code for the PDP-11 family of minicomputers. The Little Implementation Language (LIL) offers a number of features that facilitate writing structured, high-level code with no sacrifice in efficiency over assembly language. The discussion ends with a harsh evaluation of its future usefulness.

INTRODUCTION

       There is a broad gap between the fanciest macro assemblers and the simplest system implementation languages (SILs), a gap that needs filling. No matter how elaborate an assembler is, it is still just an assembler, with all the limitations on productivity attendant on writing code at too detailed (and unstructured) a level. SILs, on the other hand, seem to impose a layer of insulation between programmer and machine that causes more trouble than benefit in an environment where program size and execution time are very important, as in mini- and microcomputers. What is desired, then, is a language that lets one talk about the matters that concern systems level programmers, but at a level that permits economic expression and encourages the well structured coding style we now know to be important.

       It is no surprise that such languages are not common. People who think in terms of high-level languages have traditionally been problem-oriented; people who write systems have been reluctant to yield control of code generation to a language processor. It is only recently that, thanks to the success of some of the earlier SILs, the use of compilers has earned some grudging respect from systems writers. The cobbler's children are always the last to get shoes.

       This paper describes a compiler, for the PDP-11 family of computers, that lies in this region. It can be considered a narrative assembler, in the sense that the user knows rather precisely what code is generated, but can specify it with an economy of notation. It can also be considered a little implementation language, because it deals with multi-line, nested control structures rather than line-at-a-time code, and because it uses data type attributes to help direct the generation of instructions.

       The LIL language was defined in Backus-Naur Form, which was used by a compiler-compiler to produce a parser. The parser was combined with semantic routines written in the SIL called C (developed by Dennis Ritchie) running under the UNIX operating system. Significant portions of the LIL compiler were subsequently rewritten in LIL to prove its effectiveness, and the resulting language was used to help write a small PDP-11/10 operating system at Bell Labs. The compiler is one-pass and produces object code that may be freely intermixed with that of C or the UNIX assembler.

       In its final form LIL probably most closely resembles Wirth's PL/360, in that it is a structured, machine dependent language with data types. Its control flow syntax, however, is modeled closely after that of C, for the simple reason that C was successful and there was no good excuse for introducing a new notation. A number of other ideas were carried over from an earlier (unstructured) language written by A. G. Fraser and P. D. Jensen for the GTE Tempo I.

SALIENT FEATURES

       What makes LIL unique, and uniquely suited for minicomputer programming, is the way its various design features work together to do what is wanted without also doing many things unwanted. But to understand the language as a whole, it is necessary first to understand its parts. Here are its principal concepts, taken separately:

       The data types of the language correspond rather directly to the addressing modes of the PDP-11. (For another machine, a different set of modes is defined.) Thus there are the basic types register (R), memory reference (M), and immediate (I), and the derived types indexed (I[R] or M[R]), autoincrement ([R]++) and autodecrement (--[R]). The latter two modes support pushdown stacks and array walking in the PDP-11. Any of the above can further be made indirect ([I[R]], [M], [[R]], etc.), or of type byte.

       Arithmetic expressions describe the data manipulation code to be generated. Thus

R3+X;

generates an instruction that adds X to R3. This need not be an add instruction; if X is known at compile time to have the value +1, an inc is generated; if X is -1, a dec is used. Similarly,

R3=X;

generates a clr if X is zero, rather than a mov. Such decisions ensure that the best instruction is used whenever there is a choice, thereby encouraging the use of parameters in place of flags with ``magic'' values like zero and one. It means that the user needs to know only a handful of operators instead of a page full of opcodes, a service that would be even more valuable on more irregular machines such as the IBM 360.

       The compiler will not, however, accept an arbitrary combination of operands with an operator. While it is not a rigorous rule, it is generally true that at most one instruction will ever be generated for each infix operator. That way the user is assured that his code will not puff up unmanageably in translation, so he can remain aware of the limitations of the machine and still have a uniform notation for describing what is possible. (The ``at most'' part of the rule leaves the door open for selective optimization.)

       An interesting rule is applied to multiple-operator expressions. One cannot say

R3:=X+Y+Z;

since that implies that the right hand side is computed someplace else, then moved into R3. At the lowest level there is no ``someplace else.'' So all operators, including the move operators = and ->, are given equal weight; an expression is evaluated strictly left to right, using the leftmost operand repeatedly. Thus

R3=X+Y+Z;

is the same as

R3=X;R3+Y;R3+Z;

and long sequences of code involving a single accumulator can be written naturally on one line:

R0=X+Y[R1=i**1] + (R4=Z) -> W;

Note the use of subexpressions inside brackets and parentheses to alter the left to right flow long enough to set up other operands. This line is equivalent to

R0=X;R1=i;R1**1;	# shift R1 left one
R0+Y[R1];R4=Z;
R0+R4;R0->W		# same as W=R0

Conditional expressions determine the conditional branches to be generated in control flow statements. These can be relational

if (R0<0)R2=-R2;	# negate R2 if<0

or present conditions

if (carry){msp+1;carries+1;}

or expressions

if (i<MAXI && control>0) control+1;

The conditional operators && and || bind in the usual manner but guarantee strict left to right testing with no unnecessary (or unwanted) evaluation. Thus

while(0<=R0&&R0<=MAXI&&A[R0]~=target)
        R0+1;

will never reference A[R0] if R0 is out of range.

The control flow statements each control a single statement, but as can be seen above that ``statement'' may be an arbitrary sequence in braces. if...else is provided, as well as loops of all flavors (tests at top, bottom, or in the middle). There is a goto, but labels are introduced only in conjunction with groups:

linecount{0;}
doread{R0=READCODE;SYSREAD;RTS;}

This encourages one to think through the scope of code headed by a label, whether it be a data area, a subroutine, or even a block of code jumped to. Note, by the way, how constants are merely specified in line, so that special instructions need no special treatment.

       One final feature worth particular attention is the compile time expression. Enclosing an expression in double quotes instructs the compiler to do the arithmetic right away instead of generating run-time code. Thus

R0="A-2"[R1];

generates

mov A-2(R1),R0

and not a subtract instruction in addition to the move. Conditional expressions can even be used at compile time to conditionally generate code:

if ("MACHINE<35")
        R0**1**1**1;
else
        R0**3;

Only one of the two lines is actually compiled, and no branches are generated.

OPTIMIZATION

       LIL makes a limited attempt to optimize the code produced, principally in conjunction with tests and branches, since that is the only code that it tries seriously to control. It will pick the best of a choice of instructions or addressing modes, but it would never change the three single shifts in the last example into one triple shift. It assumes in general that the user knows what he is doing and does not want to be protected from himself.

       Some optimization is performed just to encourage more uniform notation and to avoid the need for extra operators. So

if (a>0)

must generate a tst but

if (a+b>0)

need not, since the condition code is known to be set by the immediately preceding add. And, by the way,

if (>)

is legal and assumes the condition code is properly set.

       Conditional branches in the PDP-11 can only be specified to a 512-word region around the current location; beyond that a conditional jump (of the opposite sense) over a two word jump must be used. LIL maintains a 256-word ``delay line'' for the emitted code to detect as many short branches as possible. (It does not catch them all.) Beyond that, it strives to produce the most economical set of branches for a given test and eliminates most of the branches to branches that occur in structured programming. Often the code produced is shorter than a hand coder would dare produce, because of those savings.

       Optimization, then, is basically of the ``peephole'' type only. This seems to be all that is necessary or desirable at the systems level.

EVALUATION

       The use of LIL makes for very readable code that is much easier to write, maintain and modify than assembly language. Despite its considerable power in some dimensions it is indeed a little language -- the conventions can be summarized in a few coherent rules and the compiler itself comprises less than a thousand lines of code. More important, it requires no runtime support and permits the user to write code at a high level with no sacrifice in space or speed. For many minicomputer and microcomputer software development projects it appears to be well suited.

       LIL is, however, a failure.

       Its stiffest competition at Bell Labs is the language C, which is higher level, and machine independent. Every time it looked like C was too expensive to use for a particular project, LIL was considered. But almost every time, it proved easier (and more rewarding) to improve C, or its runtime support, or the hardware, than to invest time in yet another language.

       LIL is superior to assembly code, to be sure, but there are few remaining crannies where C has not penetrated. Since C compiles to assembly language, why not just use the assembler to handle the few hundred lines of intractable code? It has to be around anyway.

       Why not indeed? A machine independent language is always superior -- even for writing machine dependent code (it's easier to find trained programmers) -- so long as the overhead can be endured. It is clear now that writing straightforward code and then measuring it is the formula for the best end product. At worst there will be 5-15 per cent overhead, which is seldom critical. Once system writers become mature enough to recognize this basic truth, they gravitate naturally toward machine independent SILs.

       If there is a role at all for languages like LIL, it will be in microcomputers and very small minicomputer configurations. Not so much because the need is more legitimate, but because the desirability of a SIL such as C will be more slowly recognized there. Otherwise it looks like the little implementation language is an idea whose time has come -- and gone.

REFERENCES

  1. D.M. Ritchie and K.L. Thompson, The UNIX Time-Sharing System. Communications of the ACM, July 1974.
  2. N. Wirth, PL/360, A Programming Language for the 360 Computers. Journal of the ACM, January 1968.