Issue #010
May, 1996


Contents:

Introduction to Templates Part 2 - Class Templates
Introduction to Stream I/O Part 5 - Streambuf
Using C++ as a Better C Part 10 - General Initializers
Performance - Per-class New/Delete


INTRODUCTION TO TEMPLATES PART 2 - CLASS TEMPLATES

To continue our introduction of C++ templates, we'll be saying a few
things about class templates in this issue.  Templates are a part of
the language still undergoing major changes, and it's tricky to figure
out just what to say.  But we'll cover some basics that are well
accepted and in current usage.

A skeleton for a class template, and definitions of a member function
and a static data item, looks like this:

        template <class T> class A {
                void f();
                static T x;
        };

        template <class T> void A<T>::f()
        {
                // stuff
        }

        template <class T> T A<T>::x = 0;

T is a placeholder for a template type argument, and is bound to that
argument when the template is instantiated.  For example, if I say:

        A<double> a;

then the type value of T is "double".  The binding of template
arguments and the generation of an actual class from a template is a
process known as "instantiation".  You can view a template as a
skeleton or macro or framework.  When specific types, such as double,
are added to this skeleton, the result is an actual C++ class.

Template arguments may also be constant expressions:

        template <int N> struct A {
                // stuff
        };

        A<-37> a;

This feature is useful in the case where you want to pass a size into
the template.  For example, a Vector template might accept a type
argument that tells what type of elements will be operated on, and a
size argument giving the vector length:

        template <class T, int N> class Vector {
                // stuff
        };

        Vector<float, 100> v;

A template argument may have a default specified (this feature is not
widely available as yet):

        template <class T = int, int N = 100> class Vector {
                // stuff
        };

        Vector<float, 50> v1;           // Vector<float, 50>
        Vector<char> v2;                // Vector<char, 100>
        Vector<> v3;                    // Vector<int, 100>

To see how some of these basic ideas fit together, let's actually
build a simple Vector template, with set() and get() functions:

        template <class T, int N = 100> class Vector {
                T vec[N];
        public:
                void set(int pos, T val);
                T get(int pos);
        };

        template <class T, int N> void Vector<T, N>::set(int pos, T val)
        {
                if (pos < 0 || pos >= N)
                        ; // give error of some kind

                vec[pos] = val;
        }

        template <class T, int N> T Vector<T, N>::get(int pos)
        {
                if (pos < 0 || pos >= N)
                        ; // give error of some kind

                return vec[pos];
        }

        // driver program

        int main()
        {
                Vector<double, 10> v;
                int i = 0;
                double d = 0.0;

                // set locations in vector

                for (i = 0; i < 10; i++)
                        v.set(i, double(i * i));

                // get location values from vector

                for (i = 0; i < 10; i++)
                        d = v.get(i);

                return 0;
        }

Actual values are stored in a private vector of type T and length N.
In a real Vector class we might overload operator[] to provide a
natural sort of interface such as an actual vector has.

What would happen if we said something like:

        Vector<char, -1000> v;

This is an example of code that is legal until the template is
actually instantiated into a class.  Because a member like:

        char vec[-1000];

is not valid (you can't have arrays of negative or zero size), this
usage will be flagged as an error when instantiation is done.

The process of instantiation itself is a bit tricky.  If I have 10
translation units (source files), and each uses an instantiated class:

        Vector<unsigned long, 250>

where does the code for the instantiated class's member functions go?
The template definition itself resides most commonly in a header file,
so that it can be accessed everywhere and because template code has
some different properties than other source code.

This is an extremely hard problem for a compiler to solve.  One
solution is to make all template functions inline and duplicate the
code for them per translation unit.  This results in very fast but
potentially bulky code.

Another approach, which works if you have control over the object file
format and the linker, is to generate duplicate instantiations per
object file and then use the linker to merge them.

Yet another approach is to create auxiliary files or directories
("repositories") that have a memory of what has been instantiated in
which object file, and use that state file in conjunction with the
compiler and linker to control the instantiation process.

There are also schemes for explicitly forcing instantiation to take
place.  We'll discuss these in a future issue.  The instantiation
issue is usually hidden from a programmer, but sometimes becomes
visible, for example if the programmer notices that object file sizes
seem bloated.


INTRODUCTION TO STREAM I/O PART 5 - STREAMBUF

In previous issues we talked about various ways of copying files using
stream I/O, some of the ways of affecting I/O operations by specifying
unit buffering or not and tying one stream to another, and so on.

Another way of copying input to output using stream I/O is to say:

        #include <iostream.h>

        int main()
        {
                int c;

                while ((c = cin.rdbuf()->sbumpc()) != EOF)
                        cout.rdbuf()->sputc(c);

                return 0;
        }

This scheme uses what are known as streambufs, underlying buffers used
in the stream I/O package.  An expression:

        cin.rdbuf()->sbumpc()

says "obtain the streambuf pointer for the standard input stream
(cin) and grab the next character from it and then advance the
internal pointer within the buffer".  Similarly,

        cout.rdbuf()->sputc(c)

adds a character to the output buffer.

Doing I/O in this way is lower-level than some other approaches, but
correspondingly faster.  If we summarize the four file-copying methods
we've studied (see issues #008 and #009 for code examples of them),
from slowest to fastest, they might be as follows.

Copy a character at a time with >> and <<:

        cin.tie(0);
        cin.unsetf(ios::skipws);

        while (cin >> c)
                cout << c;

Copy using get() and put():

        ifstream ifs(argv[1], ios::in | ios::binary);
        ofstream ofs(argv[2], ios::out | ios::binary);

        while (ifs.get(c))
                ofs.put(c);

Copy with streambufs (above):

        while ((c = cin.rdbuf()->sbumpc()) != EOF)
                cout.rdbuf()->sputc(c);

Copy with streambufs but explicit copying buried:

        ifstream ifs(argv[1], ios::in | ios::binary);
        ofstream ofs(argv[2], ios::out | ios::binary);

        ofs << ifs.rdbuf();

A table of relative times, for one popular C++ compiler, comes out
like so:

        >>, <<                  100

        get/put                 72

        streambuf               62

        streambuf hidden        43

Actual times will vary for a given library.  Perhaps the most critical
factor is whether functions that are used in a given case are inlined
or not.  Note also that if you are copying binary files you need to be
careful with the way copying is done.

Why the time differences?  All of these methods use streambufs in some
form.  But the slowest method, using >> and <<, also does additional
processing.  For example, it calls internal functions like ipfx() and
opfx() to handle unit buffering, elision of whitespace on input, and
so on.  get/put also call these functions.

The fastest two approaches do not worry about such processing, but
simply allow one to manipulate the underlying buffer directly.  They
offer fewer services but are correspondingly faster.


USING C++ AS A BETTER C PART 10 - GENERAL INITIALIZERS

In C, usage like:

        int f() {return 37;}

        int i = 47;

        int j;

for global variables is legal.  Typically, in an object file and an
executable program these types of declarations might be lumped into
sections with names like "text", "data", and "bss", meaning "program
code", "data with an initializer", and "data with no initializer".

When a program is loaded by the operating system for execution, a
common scheme will have the text and data stored within the binary
file on disk that represents the program, and the bss section simply
stored as an entry in a symbol table and created and zeroed
dynamically when the program is loaded.

There are variations on this scheme, such as shared libraries, that
are not our concern here.  Rather, we want to discuss the workings of
an extension that C++ makes to this scheme, namely general initializers
for globals.  For example, I can say:

        int f() {return 37;}

        int i = 47;

        int j = f() + i;

In some simple cases a clever compiler can compute the value that
should go into j, but in general such values are not computable at
compile time.  Note also that sequences like:

        class A {
        public:
                A();
                ~A();
        };

        A a;

are legal, with the global "a" object constructed before the program
"really" starts, and destructed "after" the program terminates.

Since values cannot be computed at compile time, they must be computed
at run time.  How is this done?  One way is to generate a dummy
function per object file:

        int f() {return 37;}

        int i = 47;

        int j; // = f() + i;

        static void __startup()
        {
                j = f() + i;
        }

and a similar function for shutdown as would be needed for calling
destructors.  Using a small tool that will modify binaries, and an
auxiliary data structure generated by the compiler, it's possible to
link all these __startup() function instances together in a linked
list, that can be traversed when the program starts.

Typically this is done by immediately generating a call from within
main() to a C++ library function _main() that iterates over all the
__startup() functions.  On program exit, similar magic takes place,
typically tied to exit() function processing.  This approach is used
in some compilers but is not required; the standard mandates "what"
rather than "how".

Some aspects of this processing have precedent in C.  For example,
when a program starts, standard I/O streams stdin, stdout, and stderr
are established for doing I/O.

Within a given translation unit (source file), objects are initialized
in the order of occurrence, and destructed in reverse order (last in
first out).  No ordering is imposed between files.

Some ambitious standards proposals have been made with regard to
initialization ordering, but none have caught on.  The draft standard
says simply that all static objects in a translation unit (objects that
persist for the life of the program) are zeroed, then constant
initializers are applied (as in C), then dynamic general initializers
are applied "before the first use of a function or object defined in
that translation unit".

Calling the function abort() defined in the standard library will
terminate the program without destructors for global static objects
being called.  Note that some libraries, for example stream I/O, rely
on destruction of global class objects as a hook for flushing I/O
buffers.  You should not rely on any particular order of
initialization of global objects, and using a startup() function
called from main(), just as in C, still can make sense as a program
structuring mechanism for initializing global objects.


PERFORMANCE - PER-CLASS NEW/DELETE

Some types of applications tend to use many small blocks of space for
allocating nodes for particular types of data structures, small
strings, and so on.  In issue #002 we talked about a technique for
efficiently allocating many small strings.

Another way of tackling this problem is to overload the new/delete
operators on a per-class basis.  That is, take over responsibility for
allocating and deallocating the storage required by class objects.
Here is an example of what this would look like for a class A:

        #include <stddef.h>
        #include <stdlib.h>

        class A {
                int data;
                A* next;
        #ifdef USE_ND
                static A* freelist;             // pointer to free list
        #endif
        public:
                A();
                ~A();
        #ifdef USE_ND
                void* operator new(size_t);     // overloaded new()
                void operator delete(void*);    // overloaded delete()
        #endif
        };

        #ifdef USE_ND
        A* A::freelist = 0;

        inline void* A::operator new(size_t sz)
        {
                // get free node from freelist if any

                if (freelist) {
                        A* p = freelist;
                        freelist = freelist->next;
                        return p;
                }

                // call malloc() otherwise

                return malloc(sz);
        }

        inline void A::operator delete(void* vp)
        {
                A* p = (A*)vp;

                // link freed node onto freelist

                p->next = freelist;
                freelist = p;
        }
        #endif

        A::A() {}

        A::~A() {}

        #ifdef DRIVER
        const int N = 1000;
        A* aptr[N];

        int main()
        {
                int i;
                int j;

                // repeatedly allocate / deallocate A objects

                for (i = 1; i <= N; i++) {
                        for (j = 0; j < N; j++)
                                aptr[j] = new A();
                        for (j = 0; j < N; j++)
                                delete aptr[j];
                }

                return 0;
        }
        #endif

We've also included a driver program.  For this example, that recycles
the memory for object instances, the new approach is about 4-5X faster
than the standard approach.

When new() is called for an A type, the overloaded function checks the
free list to see if any old recycled instances are around, and if so
one of them is used instead of calling malloc().  The free list is
shared across all object instances (the freelist variable is static).
delete() simply returns a no-longer-needed instance to the free list.

This technique is useful only for dynamically-created objects.  For
static or local objects, the storage has already been allocated (on
the stack, for example).

We have again sidestepped the issue of whether a failure in new()
should throw an exception instead of returning an error value.  This
is an area in transition in the language.

There are other issues with writing your own storage allocator.  For
example, you have to make sure that the memory for an object is
aligned correctly.  A double of 8-byte length may need to be aligned,
say, on a 4-byte boundary for performance reasons or to avoid
addressing exceptions ("bus error - core dumped" on a UNIX system).
Other issues include fragmentation and support for program threads.


ACKNOWLEDGEMENTS

Thanks to Nathan Myers, Eric Nagler, David Nelson, Terry Rudd,
Jonathan Schilling, and Clay Wilson for help with proofreading.


SUBSCRIPTION INFORMATION / BACK ISSUES

To subscribe to the newsletter, send mail to majordomo@world.std.com
with this line as its message body:

subscribe c_plus_plus

Back issues are available via FTP from:

        rmii.com /pub2/glenm/newslett

or on the Web at:

        http://www.rmii.com/~glenm

There is also a Java newsletter.  To subscribe to it, say:

subscribe java_letter

using the same majordomo@world.std.com address.

-------------------------

Copyright (c) 1996 Glen McCluskey.  All Rights Reserved.

This newsletter may be further distributed provided that it is copied
in its entirety, including the newsletter number at the top and the
copyright and contact information at the bottom.

Glen McCluskey & Associates
Professional C++ Consulting
Internet: glenm@glenmccl.com
Phone: (800) 722-1613 or (970) 490-2462
Fax: (970) 490-2463
FTP: rmii.com /pub2/glenm/newslett (for back issues)
Web: http://www.rmii.com/~glenm
