VACETS Regular Technical Column
The VACETS Technical Column is contributed by various members, especially those of the VACETS Technical Affairs Committee. Articles are posted regularly on the [email protected] forum. Please send questions, comments and suggestions to [email protected]

September 3, 1996

The UNIX run-time environment

Familiarity with the UNIX runtime environment is fundamental for UNIX programmers and useful for non-programmers as well. Knowing how a program is started, how it terminates, and how its memory is laid out at runtime will (i) allow programmers to program more efficiently and (ii) give non-programmers a better understanding of the UNIX runtime environment as compared to other operating systems. In this article we first present these topics, follow with a brief discussion of different object file formats, and conclude with a discussion of program loading.

Note that since the UNIX environment is closely tied to the C programming language, the material presented in this article is particular to C and C++ programs. Of course, this does not mean that the same ideas cannot be applied to other languages such as Lisp, FORTRAN, Ada, etc.

Runtime Process

Program start up

In order for the UNIX kernel to execute a program, there must be an entry point in the program to tell the kernel where to begin execution. From the programmer's point of view this entry point is the 'main' routine, as most C programmers know. What most C programmers do not know, however, is that a special start-up routine gets called before 'main'. This routine is responsible for obtaining the command-line arguments and environment variables from the kernel and providing them to the main routine, i.e., via 'argc', 'argv', and an optional list of environment variables as arguments to 'main'. (For a C++ program, the start-up routine also handles the global constructor and destructor lists.) With the GNU compiler/linker, for example, this start-up function is '_start' in the file 'crt1.o'. At the last stage of the compilation process, the link editor links this file (along with other start-up object files) with the program, in addition to any libraries needed to resolve unresolved references.

Program termination

There are various ways in which a program can be terminated. The most common way is to 'return' from the main routine. Alternatively, 'exit', '_exit', or 'abort' can be called from within any function to terminate the program. Of these, only calling 'exit' or returning from 'main' performs the C library's cleanup on exit, i.e., running handlers registered with 'atexit' and flushing and closing open standard I/O streams. '_exit' and 'abort' do not run the 'atexit' handlers, although the kernel still closes any open file descriptors when the process terminates.

Runtime Data Structure

UNIX object and executable files come in various flavors. There are currently a handful of different formats in use by various operating systems. Some of the common formats include: COFF (Common Object File Format), which is used on SunOS 4.0.x, with an extended version (XCOFF) used on AIX; a.out, which is used on various flavors of UNIX including SunOS 4.1.x and Linux versions prior to 1.2.13; and ELF (Executable and Linkable Format), which has become the de facto format and is currently used on Solaris and many other operating systems.

Although the formats differ internally, they share a common notion known as a 'segment'. A segment, often called a section in ELF, is an area in an object file that holds a particular type of data (e.g., symbol table entries, global variables, etc.). The a.out format, for example, contains three major segments: BSS (Block Started by Symbol), text, and data. The BSS segment does not take up space in an object file; it holds variables that have not been explicitly initialized, and only its size is recorded. The text segment holds the actual code as written and translated into machine instructions. The data segment is where all the initialized global and static variables reside.

Two segments are fundamental to each of the formats mentioned above: the text segment and the data segment. The text segment, as mentioned, is where machine instructions are located. The data segment is where global and static variables are defined. At runtime, the program's data is broken down into three constituent parts known as static, stack, and heap data. Static data refers to global or 'static' variables whose storage is determined at compile time, e.g.,

    #include <stdlib.h>

    int int_array[100];                     /* static data (global) */

    int main()
    {
        static float float_array[100];      /* static data */
        double double_array[100];           /* stack data */
        char *pchar;

        pchar = (char *)malloc(100);        /* heap data */

        /* .... */

        return (0);
    }

where both int_array and float_array are static data. Stack data, on the other hand, refers to variables that exist within the scope of a function; that is, stack data is memory allocated at runtime for local (automatic) variables, e.g., double_array in the above example. Heap data is memory that is dynamically allocated at runtime (e.g., the 100 bytes that pchar points to above). This data remains in memory until it is explicitly freed or the program terminates.

Runtime Loading

To execute a program, the kernel maps the segments of an executable file directly into virtual memory. The kernel also assigns different permissions to memory regions depending on the segment each one holds. That is, for a segment that remains unchanged (e.g., the text segment) the kernel assigns read-only permission to that block of memory. Similarly, for a segment that is bound to change (e.g., the data segment) the kernel assigns read and write permission to that block of memory.

For a program that is dynamically linked, the kernel has to perform the additional work of maintaining a single copy of the library in memory to be shared by the different processes that use it. That is, multiple copies of a program can run simultaneously and yet they all share one common library.

Conclusion

In this article, we have briefly described the runtime environment of the UNIX operating system. We first examined the runtime process by which a program is started and terminated. Next, we surveyed various object file formats that are commonly used in today's operating systems. Knowing the format, in turn, gives us an easy way to see how an executable file is mapped into memory.

Suggested Reading

Peter van der Linden, ``Expert C Programming: Deep C Secrets,'' SunSoft Press, 1994. An excellent (not to mention entertaining) book on C. This book should not be used as a reference, but rather in conjunction with, say, K&R.

W. Richard Stevens, ``Advanced Programming in the UNIX Environment,'' Addison Wesley, 1992. Everything you will ever want to know about programming in the UNIX environment. A must-have book for UNIX programmers.

Executable and Linkable Format Specification. This is only useful if you want to know the format of an ELF file.

Gintaras Gircys, ``Understanding and Using COFF,'' O'Reilly & Associates, 1988. Only useful if you need to know about COFF.

Various man-pages. In particular, a.out(5), coff(5), elf(3).

Nguyen Trung

[email protected]

For discussion on this column, join [email protected]

Copyright © 1996 by VACETS and Nguyen Trung
