          ClariON-TIME Conference - Johannesburg, July 28/29 1994

                     David Bayliss - Memory Management

(Transcribed by Rob Mousley 100075,772 who accepts no responsibility for errors!  The ClariON-TIME
conference was the third South African Developers Conference.  David Bayliss and Barry Lynch (MD,
Clarion Europe) were guests of honour.)


CDD has been much heralded as the first business or database language that produced applications as
efficient as those generated by a C compiler.  If you run the Clarion compiler and disassemble the object,
you will find highly efficient native machine code.  Which really leaves only one question...  How come my
program runs so damn slowly?!

We have people who claim that version 3 programs run slower than 2.1, and certainly nowhere near the
speed we claim.  So, why not?  There are two answers to that one:  memory issues and coding issues.  Of
these, the memory issues are by far the most dramatic.  The difference between a program that fits in
memory and one that doesn't can easily be two orders of magnitude.  That's 100 times - the difference
between a program where the fields are drawn onto the screen one by one and one where the screen
snaps up as you press the button.

Now it's these issues that I'm going to be tackling in this session.  Once you've got those right, there's
another two or three times speed improvement that you can get, and that's what I cover in the 'Squeezing
the Last Drop' talk.

Intention

My intention is to use the next 50 minutes to provide you with as much information as possible to understand
the memory issues involved in writing a Clarion program.  I'm not going down to the 'Hold down the Alt key
and press F5' level because that's a design issue.   There are no magic wands.  What I'm here to do is to
try to give you information so that you can then solve these problems in whatever product you are using. 


I am not a trained speaker.  I am a programmer, the same as you are, and I'm a long way from home.  And
past experience has shown that I won't be speaking the same language as you are...  So if you don't
understand what I'm saying, if I'm going too fast for you, please raise your hand!

Efficiency

What is efficiency?  Well, there are a number of things.  One is the speed of the program - that is, how fast
the Clarion statements in your program actually execute.  But there's something else, and that's the speed
of your application: how fast your program achieves its desired goal.  That's not necessarily the same thing. 
You can have a really sluggish program, but provided you hide the work behind the user's typing, he'll
never know.

Size of Executable?

Some people need to ship on floppies or something, so they might judge 'efficiency' by the size of the
.exe.

Productivity?

How quickly can you get your program ready to ship?  But also, how easy is it to upgrade your program? 
That can be key to productivity in a changing or real-time market.

Utilisation of Resources?

It's quite possible to write a program that executes extremely quickly, that lets the user do what he wants
extremely quickly, but is very, very unpopular because it completely slugs the rest of the system.  You know,
when your program runs, everyone else takes a tea-break because you've taken over the network.

Direction

Ok, so how do we tackle this?  Well you can tackle it at the application level - that assumes that you already
have your application.  You can't touch it, but there are still things that you can do to alter how efficiently it
executes.  Then, down at the module level.  This is really how you position the procedures within your
application.  And that can have a big effect on memory issues.  I'm tackling the last two, the data level and
coding level in 'Squeezing the Last Drop'.  You can actually halve the size of your application by choosing
the right data types rather than the wrong data types.  

Compile Options

This is the way you set your project file up, and it can have a radical effect on the speed at which your
program goes.  The first one to look at is the Debug options.  Debug literally means 'I want to debug my
program'.  Don't have debug on unless you do want to debug your program.  It makes it go a lot, lot slower. 
If you have an application you want to debug, but when you put the debug information on it becomes so
slow or so big that it can't load into memory, you can use Min debug.  That enables you to trace through
your code at a statement level and it enables you to get at global variables.  It just doesn't enable you to
see local variables.  So OK, it's not ideal, but it's better than a poke in the eye with a sharp stick!  The other
thing well worth noticing is that you can turn these options on and off at a module level.  So if you have
some big modules and you know they work, hard-wire debug off in those modules, so that when you set
debug on you won't have the whole lot re-compiling - you'll only have those modules that haven't been
hard-wired, and there'll be much less debug information generated.

One of the great banes of my life is that most people look at the code generated by the compiler in the
debugger.  When you turn debug on, the compiler stops optimising.  If you actually want to see what the
code quality is like, compile without debug on; in the TopSpeed TechKit there's a thing called TSDA, which
is just a disassembly utility, and that actually shows the kind of code the compiler generates.  (That's just
a 'Be nice to your compiler writer' thing really!)

The next really big issue is run-time checks.  Now these are really there as a kind of automated debugging
system.  When you're running your program, the run-time checks keep looking over your shoulder to see
if you're doing anything silly.  This is a good thing for development, and personally, I think it's a very good
thing for the Beta phase as well.  We put these on for Beta 2.  In Beta 3 I think we didn't, for various
commercial reasons.  But it's a good thing to have on during the Beta phase.  They do, however, make your
application much, much bigger and much, much slower.  Stack overflow checking.  What does that do? 
Basically, if you have a procedure, it takes up a certain amount of stack space.  If you call another
procedure, that takes up more stack space.  You only have a fixed amount of stack.  I said this before, but
I'll say it again:  some people have said 'Hey, 3009 has really weird behaviour'.  That's just because it
doesn't have quite enough stack set in the project system.  If you're getting abnormal behaviour, just try
bumping that up a little bit.  In CW we now have a stack threshold pragma.  What that allows you to do is
say that above a certain size, local data is no longer allocated from the stack - it's allocated from the heap. 
So that's a good way of achieving small stacks without getting stack overflow.
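
To make that concrete, here is a hypothetical sketch (the procedure and variable names are mine):  local
data like this is carved out of the stack on every call, and held for as long as the procedure, and anything
it calls, is still running.

PrintBatch PROCEDURE
LineBuffer   STRING(10000)   ! 10K of local data - taken from the stack on every call and
  CODE                       !  held while anything called from here is still running
  LineBuffer = ALL('-')      ! with the CW stack threshold pragma set, a local this big
                             !  can come from the heap instead of the stack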

Null parameter checking.  This makes your code run about three times slower than it otherwise would.  But
it means that if you're using omitted parameters, it will check that you really are not using a parameter that
has been omitted.  It will also check for you that you haven't accessed a memo before you've opened the
file.  Not everyone knows this, but before you OPEN a file, a memo has not been allocated.  So if you write 
to an item of a memo before you've opened the file, you're probably writing into your interrupt table in fact. 
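
As an illustration of the memo trap, a hypothetical sketch (the file, driver and field names are mine):

Names    FILE,DRIVER('Clarion'),PRE(Nam)
Notes      MEMO(2000)
Record     RECORD
Name         STRING(20)
           END
         END

  CODE
  Nam:Notes = 'some text'    ! writing to the memo BEFORE the file is opened - the memo
                             !  buffer isn't allocated yet, so this tramples on memory
  OPEN(Names)                ! the memo is only allocated when the file is OPENed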

Array Index checking.  Believe it or not, that checks your array indexing:  if you have a dimensioned
structure or a dimensioned variable and you access element i, it checks that i is within the bounds of the
array.  That also works for array parameters, but again it's slow.
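
For example, a hypothetical sketch (the names are mine):

Totals   REAL,DIM(12)        ! one element per month
Month    SHORT
  CODE
  Month = 13
  Totals[Month] = 0          ! out of bounds - with index checking on this is trapped at
                             !  run time; with it off you quietly corrupt memory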

Zero divide and rounding.  Now these are not quite run-time checks, and they're things that you may
actually want in your final application.  CPD 2.1 allowed you to divide by zero.  That is not a standard thing
to do in a compiled language, and CDD up to 3008 didn't allow it.  In 3009 you now have a zero_divide
pragma, which defaults to on, and that allows you to divide by zero.  The result you get is zero.  But it does
generate significantly worse division code.  So if you're not the sort of person who likes to divide by zero,
turn that off.  There's also something called logical rounding.  It's a basic truth that in Clarion, if you assign
a REAL number to an integer it truncates.  So if you have 24.8 and assign it, the result will be 24.  Now there
is a particular case where what you have is 24.999999997 or something, and you probably got that thinking
that what you were getting was 25.  So if you set the logical rounding pragma on, if the thing is very close
it will actually round up.  Otherwise it will truncate.  Again, this is basically to give business programmers
the maths they expect, but it does eat up CPU cycles.  So if you don't need that, don't use it.  Again, I think
that one defaults to on.  (Editor's note - In fact you have to add it to the PR file yourself.  Add the line 

#pragma define(logical_round=>on)

into your PR file.)
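
By way of illustration, a quick sketch (the zero_divide line below assumes the same #pragma define form
as the editor's note above, and the variable name is mine).  In the PR file:

#pragma define(zero_divide=>off)

turns off the divide-by-zero support if you don't need it, and in source code the rounding behaviour looks
like this:

Whole    LONG
  CODE
  Whole = 24.8            ! assignment to an integer truncates: Whole is 24
  Whole = 24.999999997    ! with logical_round on this rounds up to 25; otherwise it
                          !  truncates to 24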

These settings are also an issue if you've ever hit the compiler limitations - the 'isl.ARGS has overflowed'
message.  Basically, anything that makes your code worse will also make it less likely to compile.  So things
like turning off these compile options can help you there.

Ok, it has been a traditional truth of all optimising compilers that optimisers don't work.  It's also generally
considered to be a truth of optimising compilers that the compiler has to sweat to optimise.  This is not true
of the Clarion system.  It is an inherently optimising compiler.  The safest code and the easiest code to
produce is optimised code.  If you turn optimisations off, in fact we do the optimisations anyway and then
back-patch the result to make it look as though we didn't.  The only reason we put the switch in was that
we shipped our first optimising compiler when Microsoft C 5 came out.  And it was just a known fact that the
Microsoft optimiser just screwed your code.  So the Americans said to us 'Oh yeah sure, you've got an
optimising compiler, how do we turn the optimisations off?'  'You don't!'  'But we need our code to work, how
do we switch the optimisations off?'  'You don't need to, it works anyway!'  'Yeah, we need to turn the
optimisations off'.  So we've given them a switch to turn their optimisations off, but you really shouldn't use
it.  So, it's reliable and fast with the optimisations on - leave it alone!

Ok, I'm now going to move on to memory models.

One of the key decisions that you can make is the actual memory model that you're going to have your
application in.  I'm just going to go through what these memory models are and suggest what you should
use them for.  The first one is the static memory model.  What is that?  Well, it means that your code comes
in from your .exe and stays fixed in memory.  It means your data comes in from your executable and stays
fixed in memory.  You do, however, have a virtual heap.  You may say 'Hey, but Clarion doesn't have a
heap'.  Well, yes it does.  Your queues are effectively heap accesses.  So even in the static memory model,
your queues can be paged on and off disk.  Also, if you're doing any graphical stuff, which means using
GUI2 or doing PCX or GIF presentations, those will go via the virtual memory manager.  So you do have
the capability of using more than 640K even in the static model.  But your fixed data - that means your
global data, really - and your code all has to be able to fit in memory in one go.  This is the fastest model
bar none.  Everything runs in real memory.  You only need an XT to run this.  But you get something called
'graceless degradation'.  Which means that as your application gets bigger, it's the fastest system, it's the
fastest system, it's the fastest system, Ugh!  It's stopped working!  And once you've hit that, there is
absolutely nothing you can do about it.  So the kind of thing you would use the static memory model for,
and for this it's brilliant, is if you have one of these command-line batch things, where I've got a dirty great
database and I'm sucking a few records in overnight.  You should try to use the static memory model for
that.  Static model is so good that even if you have a program that is up all the time, but you run something
every now and then, it can be worth programming that 'every now and then' bit into a static-model program
and RUNning it.  Static is really the best, but it doesn't really work for highly visual things, just because it
won't all fit in memory.
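
For example (the program name here is hypothetical), the main interactive application can just hand the
batch work off to a separate static-model .EXE:

  RUN('REINDEX.EXE')    ! the overnight batch job lives in its own static-model program,
                        !  so the big interactive application never has to carry it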

Which brings us on to the overlay model.  In the overlay model, the real change is that the code doesn't
have to reside in memory all the time.  The system will page in for you those items of code that it is about
to execute.  When memory fills up, it will find the bit of code that was accessed least recently and throw it
away.  It executes in real memory, so it executes fast.  Data is not mobile, but it is not loaded in when you
start up your program.  Now this can lead to an interesting effect:  when you start up your program at the
beginning of the day, it runs really fast because there's no data in memory.  But every time you touch an
item of global data - particular instances of that are reports and files - it comes into memory, and once it's
in memory it stays fixed.  So the amount of real memory you have available can get smaller and smaller,
and as the day goes on your program will get slower and slower.  That may sound like a really dumb thing
to have done, but the argument is that you'll have a multi-module application for which most of the time your
user spends his or her whole day just going into one or two modules, so you only load in the data for those,
and the other 5 modules only happen at Christmas and then they don't mind it going slow anyway because
they're drunk!

This system is heavily adaptable.  You can use it on an 8086 right through to an 80486.  You can get it to
run in about 400K of real memory if you really need to.  It is, however, the slowest of all the options.  Before
you do any procedure call, the system has to check whether that code is on disk or in memory, and it will
then haul it off disk if it needs to.  As I say, it will run on an 8086.  The other quite neat thing about it is that
it's free - it comes in the box.  Also, I assume you've all used it, but there's no twitting around - you just click
the button that says overlay model and it just works.

Protected Model:  (It's actually called extended model or something...)  One obvious point is that you have
to pay for this.  It will only run on a 286 or above, with a suitable BIOS.  What that tends to mean is that it
runs on 80% of all 286's; I've only heard of two 386's it doesn't run on, and those were slightly cranky old
SXs.  It basically works.  There are memory manager issues, in particular with DOS 6.2.  When you run its
memory optimiser, by default it sets everything up so that you can't run DOS Extender programs.  (The
MEMMAKER program just configures EMM386 with the NOEMS option.  You just say to it, don't put that
on.  It asks you some cryptic question like 'Do you want to run...' and it gives you some horrendous name
like '...extended mode executable protected programs' or something, so most people say 'No, I wouldn't
know what one of those is' and say No.  You just have to remember to say Yes.  It's just that if the user has
configured his system, there's a high probability that he'll have configured it so that your program won't
work.  It's just a 'User beware'.  The XSTATS program provided with CDD can be run on the client's
machine to check his setup.  In most cases it will tell you what's wrong.  There are one or two things they
can do which even fox that thing and it just won't run.  But that tends to be if they've got a network, some
CD ROM driver, an old version of QEMM, some HIMEM.SYS they found on the back of a lorry or something. 
Generally it works.  I should point out that I'm from R&D, so I'm by nature sceptical.  These things a
marketing guy will say always work; a tech support guy will say 'Well, it works nearly all the time'; I say
'Well, it works, but be careful!')  So you have to be careful, because what these optimisers do is optimise
the amount of real memory by turning off the ability to get at extended memory.  So what does the
protected mode system do for you?

Well, code moves, but data moves as well.  

Stability:  As I say, there are some memory managers under which it won't work initially, but you do have
this thing where if it's worked before, it'll carry on working.  The difficult bit is in firing up.  Once it's fired up,
the DOS Extender is absolutely, extremely stable.  You've probably all come across these XTRACE.TXT
files and you think they're a horrible thing.  In fact they're a really good thing, because they enable you to
find all your bugs during the Beta phase.  It's blatantly obvious that something's gone wrong, because the
whole thing crashes and you get an XTRACE file.  In an overlay system, if you corrupt memory it will
probably keep going for a while, and then somewhere down the line it slowly graunches to a halt and you
don't know why.  With a DOS Extender system it tells you as soon as something goes wrong.  So once
you've got a program which compiles and runs and has been in Beta under the DOS Extender, it is then
very solid.

CW is basically a DOS Extender system.  Windows is in fact a DOS Extender, amongst other things.  You
should still be thoughtful about your memory management on a DOS Extender machine.  This is particularly
true under Windows.  Your customer may have an 8 Mbyte machine, but he may have 8 different
applications running at the same time, which means really he has a 1 Mbyte machine.

Catastrophic Degradation.  Your system could be on the edge of a catastrophe.  What I said earlier about
the way an overlay system works - it's the same way the DOS Extender works.  When it runs out of
memory, it looks for the piece of code that was least recently used, throws it away and loads in the one
you're next going to use.  The idea is that this gives you an optimal solution.  Because if you're in some kind
of loop, calling, say, 4 different procedures:  as you call one procedure, if it's not in memory it hauls it in;
as you call the next one it hauls that in; and so on.  So after one iteration of the loop, all your procedures
are in memory and then you can just keep going.  So it's an optimal solution - you do the minimum amount
of swapping.  But...

Now consider.  I have enough space in memory for four procedures.  But I have five procedures in my loop. 
So I go through the first four and load them into memory.  I then go into the fifth.  Right, I need to page
something out, what was the one used the longest time ago?  The one at the top of the loop, so I page it
out, page the fifth one in, fine.  Go back up to the top of the loop, now which one do I need?  The one that's
first which I've just paged out, so I page it back in which means I have to page something else out.  The
least recently used one is the second one, ...   You can prove that the one you have just paged out is the one
that you're going to need next.  So you can go from having a system that is optimally brilliant to a situation
that is optimally catastrophic.  We had one person who reported that his Clarion application ran ten times
slower if he installed one mouse driver rather than another.  It turned out that the one mouse driver took
3K more space than the other one and this was just enough to put him over that critical edge.  So what I'm
saying is that you should try to keep your application as lean as possible.  Even if you don't need it now, I
can guarantee that the one time you'll hit system thrashing is just when you've added the final bug fix just
before you ship the product!  That's just the way it always happens.  (It actually happened to Lotus when
they were developing, I think it was, 2.0 of their spreadsheet.  They had developed all these different things
individually, they all worked fine.  They put them into one package, it all worked fine, they'd done the press
announcement, they'd done a ship.  Someone found there was a problem with the intro screen.  They'd
spelt one of the names wrong and left a character out.  They put the character in and the thing started
thrashing when they fired up the application.)

So keep lean and mean all the time!

How?

I've just explained it, but a critical thing to understand is that your application has a 'working set'.  A working
set is the set of procedures you are actively calling at any one time.  This will typically be in a loop. 
(Clarion code is sufficiently fast that if you're executing it straight, it will execute faster than you can type,
so if you've got something that's going slowly, you must be in a loop.)  And what you have to do is minimise
that working set.  You just minimise the number of procedures that you need in any one go.
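
For instance, a hypothetical sketch (the file and procedure names are mine, and the procedures are
assumed to be declared in the MAP) - the working set here is everything called inside the LOOP, plus
whatever those procedures call in turn:

  SET(Orders)                  ! process the whole file
  LOOP UNTIL EOF(Orders)
    NEXT(Orders)
    PriceOrder                 ! these three procedures, and the modules they live in,
    PrintLabel                 !  all have to fit in memory together if the loop is to
    UpdateStock                !  run without paging
  END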

A key issue is that in the DOS Extender system, in the overlay system, and (kind of) in CW, things are
paged on a per-module basis.  Now that means that if you have a module with ten procedures in it, and you
call one of those ten procedures, you are bringing all of them into memory at the same time.  So what you
want to do is minimise the number of procedures in the modules in your critical loop.  Is it better, then, to
have only one procedure per module instead of the default four?  Well, it works out that if you have the right
four in the module, you get better performance.

Call chains.  What I'm basically saying here is, try to know what procedures your procedures call.  The
reason for this is, you may be making one procedure call, but if that's making three other procedure calls,
you could be hauling in an awful lot more than you expect.  So this really comes down to the concept of a
tree.  You know when you bring up your Designer you have your nice little tree showing the way things are
called.  What you want to do is basically optimise it so that one part of a tree ends up in one module.  The
chances are that if you're calling the top item of the tree, you're going to be calling one of the branches. 
And this is how the default number four came about.  We kind of guestimated that if you've just written a
procedure and you're calling three other ones, then probably the next thing you're going to write is those
three other ones.  So those will probably end up in the right module.  Now that sounds like a dreadful
heuristic, but that's all we could do.  Now, we have in fact come up with something better than that which
I'll come on to in a minute.

You have the module view - so you have no excuse.  You have the ability to move around where your
procedures live in your application.  So what you want to do is get a particular tree into one module if you
possibly can.  Now we have actually done this for you.  You have one of the re-populate modes, which I
think is called 'optimised re-populate'.  That analyses your tree structure and pulls treelets into modules,
given the number of procedures that you allow it.  That is the best we can do statically to help you.  But you
have more information than that.  We only know what procedures could be called.  You know what, in a
typical situation, will be called.  So you can use our optimised re-populate, but what you should do is think
about which loops you need fast and which procedures they call, and just pull those ones in.  The kind of
thing that tends to trip people up:  if you've got a report, say, that you use every day in one mode, but once
a month you do 29 special things with it, then those 29 special things you should really have in another
module.  You don't want them in your main module, because most of the time they're just taking up space.

Module-based programming.  You can actually go one step further, but this slightly alters the design of
your application.  At the moment, an application is seen very much at a global level, and the fact that you
split it up into source files is just 'You know, I scattered them'.  But Clarion does have full support for
module-based programming in the same way that Modula-2 does.  And if you're designing a large
application, it's well worth considering making use of this facility.  You have the ability to put static data
actually in a module, which can then be accessed by the procedures in that module.  Now the good thing
about that is that you know that in any procedure in that module, you'll always be accessing your own local
data.  So there'll be no paging happening to bring that data in.  And you know that you'll all be using data
in the same segment.  So what typically happens here - I've pulled in a part of my procedure tree.  I'm
executing my procedures in that procedure tree, which I've very carefully moved into the same module, and
because they're all talking about the same thing I've actually put my data in that same module.  So I don't
have to have my main data in memory as well.  So what I've actually done is split my application into
completely logical sections in the source files.  And I effectively have only one source file in memory at a
time.  So what that means is that you obviously want as many of your procedure calls as possible to be in
the same module.  If you're going to call into other modules, always try to call into the same module.

(Tape Break)

We have a few different optimise algorithms but there is still no substitute for you knowing what is called.
We cannot predict how your program runs.  

As well as having local data, you can, in fact, have local MAPs as well.  So this means that you can hide
certain support procedures down in your module and guarantee that they are only called down in that
module.  
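
As a sketch of what that looks like - the module, data and procedure names here are hypothetical:

           MEMBER('ORDERS')          ! this source file is one logical section of the application
           MAP                       ! a local MAP - these procedures are private to this module
PriceOrder   PROCEDURE
AddToTotal   PROCEDURE
           END

MonthTotal LONG,DIM(12)              ! module static data, used only by procedures in this file

PriceOrder PROCEDURE
  CODE
  AddToTotal                         ! a call that stays inside the module - no other module
                                     !  has to be paged in to service it

AddToTotal PROCEDURE
  CODE
  MonthTotal[1] = MonthTotal[1] + 1  ! the data lives in the same module as the code using it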

In CW, one of the optimisations is that it will actually analyse which procedures are called from where, and
it will say 'Hey, you don't need this in your global MAP' and put it into a local MAP if it can get away with it. 
Now that's also a really good thing, because it means you can change the prototype and of course your
whole application doesn't have to be re-compiled, only that particular module.  So you really are splitting
the whole thing up.  Which is where I come on to programming in the large.  This is more of an issue when
Team Developer comes along, because you can have 100 people working on one application, with 20 or
30 thousand procedures in there.  I now change one of my prototypes and everybody's stuff has to
re-compile!  That is clearly catastrophic.  So you do need to consider moving over to a system where only
a certain set of source files knows about a certain set of data, and only a certain set of source files knows
about a certain set of procedures.  Of course you still have global data for things which are genuinely
global, but I'm trying to get away from this idea that it's either totally global or totally local.  You do have the
ability to share data, but only within a small module of your application.

A logical extension of that is that once you've got it all into one local MAP with local data, you can turn it
into a DLL.  So then you can take that thing and ship it separately from the rest of your application.

One of the common misconceptions with the TopSpeed technology is 'Smart Linking'.  What smart linking
does is link into your application all the procedures that could be called given any set of data.  So if you
drive what parts of your application run by command line or by user input, the system has to link in all
possible procedures and all possible data.  A particular and rather unfortunate instance of this is SCREENs
and also REPORTs, because they are data.  As soon as you touch any SCREEN it has to haul in all the
code for all possible SCREENs.  (That is actually no longer true, because it was so horrendous that I fixed
it, and I'll come on to how I've done that.)  But generally, that is the truth.  So be careful:  if you have lots of
options which you think can't be called, but for some set of data they could be called, then I have to link
them in.  If you're going for small .EXE file size, or you don't want to haul in great gobs of library that you
don't need, you have to consider what parts of the Clarion system you call.  For these purposes, it falls into
3 or 4 different categories.  There is what I call Clarion-proper, which is if you're using the Clarion language
pretty much like a C compiler.  So you're doing arithmetic, you're calling procedures, you're doing string
work.  That's all the stuff up to and including Chapter 6 in your LRM.  The overhead for using that is
negligible.  You will easily get a 30K program if you're using that kind of stuff.  So that's where Clarion said
you've got tiny, tiny programs.  You can get genuinely tiny programs if you stay behind Chapter 6.
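
To picture that point about user input driving what runs - a hypothetical sketch (the procedure names are
mine):

  EXECUTE CHOICE()        ! which of these runs depends on what the user picked, so the
    BrowseCustomers       !  linker has to pull in all three procedures, and everything
    PrintStatements       !  they call, even if a particular user never touches MonthEnd
    MonthEnd
  END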

Then there's what I call the Clarion library.  That's the different kinds of bits & bobs they've given you.  So
there's DATE(), there's COMMAND(), there's RUN(), some of the string functions like SUB(), INRANGE(),
that kind of thing.  There are certain library calls, and basically for each one of those you call, you will get
a slight increase in your .EXE size.  But you only get it once.  Once you've called it, it's then free - it doesn't
get linked in again.  Now those things, they're the stuff of Chapter 13 and Chapter 7.  Another special case
is queues:  once you've done any queue operation, you get all of the queue stuff in.

Now here's the heavy stuff.  You then have the 4GL features.  In these I'm including file drivers, REPORTs
and SCREENs.  File drivers are an instance where, as soon as you link any file driver into your application,
you get the whole file driver coming in.  There is no smart linking across file drivers whatsoever.  And for
each different file driver you bring in, you will get a very heavy overhead.  For example, with the Clarion
driver you get about 50K; with the dBaseIII driver you get about 200-300K.  So although you can do
multiple-driver applications, in an overlay system, if you're accessing Clarion files at one time and then go
on and access dBase files - if you're, say, reading a Clarion record and writing a dBase record in a loop -
you're going to be paging these dirty great file drivers in and out of memory, and that can be very slow.

The Clipper driver is about 120K.  The way to find out how big any specific driver is, is just to go and look
at the .DLL, and that is basically it.

REPORTs also come in one lump.  A program that just OPENs a REPORT structure, that's

Report   REPORT
         END
  CODE
  OPEN(Report)

will be 110K.  But 75K of that is actually shared with the screen library.  So the cost of your first REPORT 
if you already have a SCREEN is about 45K.  Now, there was a stage where if you touched any SCREEN
structure, your program instantly became about 350K.  We have now reduced that so it depends on what
fields you use in the SCREEN.  One warning:  there are some procedures that you might think are library
procedures, but they're not.  A TYPE() will be very small, but if you do a SHOW() you can easily get 150K
of .EXE straight away, because SHOW() brings in the SCREEN library.

SCREEN structure costs.  Your first SCREEN structure, if you have just a string in it, will cost you 150K. 
(Although, to reiterate, 75K of that is shared with REPORTs.)  

Your first field that does input will cost you another 70K.  So a SCREEN with an ENTRY field will take you
up to about 220K.

From then on, RADIOs cost you 7K (remember that's for your first RADIO only - you don't take another
7K for each subsequent RADIO), your first TEXT box costs you 13K, your first GRAPHIC (by that I really
mean if you throw your SCREEN into graphics mode with a GRAPHIC attribute on the SCREEN) is 15K,
your first IMAGE is 18K, your first LIST box is 22K, your first MENU or PULLDOWN is 24K, and your first
HELP attribute is another 22K, plus it will haul in the LIST box.  So if you've used the LIST box that's not
significant, but if you haven't used a LIST, with your HELP you get the LIST box overhead as well.

