Introduction
Linux has always provided a rich programming environment, and it has only grown richer. Two new compilers, egcs and pgcc, joined the GNU project’s gcc, the original Linux compiler. In fact, as this book went to press, the Free Software Foundation, custodian of the GNU project, announced that gcc would be maintained by the creators and maintainers of egcs. A huge variety of editors stand alongside the spartan and much-maligned vi and the marvelous complexity of emacs. Driven largely by the Linux kernel, GNU’s C library has evolved so dramatically that a new version, glibc (also known as libc6), has emerged as the standard C library. Linux hackers have honed the GNU project’s always serviceable development suite into powerful tools. New widget sets have taken their place beside the old UNIX standbys. Lesstif is a free, source-compatible implementation of Motif 1.2; KDE, the K Desktop Environment based on the Qt class libraries from TrollTech, answers the desktop challenge posed by the X Consortium’s CDE (Common Desktop Environment).
What This Book Will Do for You
In this book, we propose to show you how to program in, on, and for Linux. We’ll focus almost exclusively on the C language because C is still Linux’s lingua franca. After introducing you to some essential development tools, we dive right into system programming, followed by a section on interprocess communication and network programming. After a section devoted to programming Linux’s user interface with both text-based and graphical tools (the X Window system), a section on specialized topics, including shell programming, security considerations, and using the GNU project’s gdb debugger, rounds out the technical discussion. We close the book with three chapters on a topic normally disregarded in programming books: delivering your application to users. These final chapters show you how to use package management tools such as RPM and how to create useful documentation, and they discuss licensing issues and options. If we’ve done our job correctly, you should be well prepared to participate in the great sociological and technological phenomenon called “Linux.”
Intended Audience
Programmers familiar with other operating systems but new to Linux get a solid introduction to programming under Linux. We cover both the tools you will use and the environment in which you will be working.
Linux Programming UNLEASHED
Experienced UNIX programmers will find Linux’s programming idioms very familiar. What we hope to accomplish for this group is to highlight the differences you will encounter. Maximum portability will be an important topic because Linux runs on an ever-growing variety of platforms: Intel i386, Sun SPARCs, Digital Alphas, MIPS processors, PowerPCs, and Motorola 68000-based Macintosh computers. Intermediate C programmers will also gain a lot from this book. In general, programming Linux is similar to programming any other UNIX-like system, so we start you on the path toward becoming an effective UNIX programmer and introduce you to the peculiarities of Linux/UNIX hacking.
Linux Programming Unleashed, Chapter by Chapter
This is not a C tutorial, but you will get a very quick refresher. You will need to be able to read and understand C code and understand common C idioms. Our selection of tools rarely strays from the toolbox available from the GNU project. The reason for this is simple: GNU software is standard equipment in every Linux distribution. The first seven chapters cover setting up a development system and using the standard Linux development tools:
• gcc
• make
• autoconf
• diff
• patch
• RCS
• emacs
The next section introduces system programming topics. If you are a little rusty on the standard C library, Chapter 9 will clear the cobwebs. Chapter 10 covers Linux’s file manipulation routines. Chapter 11 answers the question, “What is a process?” and shows you the system calls associated with processes and job control. We teach you how to get system information in Chapter 12, and then get on our editorial soapbox in Chapter 13 and lecture you about why error-checking is A Good Thing. Of course, we’ll show you how to do it, too. Chapter 14 is devoted to the vagaries of memory management under Linux.
We spend four chapters on various approaches to interprocess communication using pipes, message queues, shared memory, and semaphores. Four more chapters show you how to write programs based on the TCP/IP network protocol. After a general introduction to creating and using programming libraries in Chapter 24 (including the transition from libc5 to libc6), we cover writing device drivers and kernel modules in Chapter 25, because considerable programming energy is spent providing kernel support for the latest whiz-bang hardware device or system service.
User interface programming takes up the next eight chapters. Two chapters cover character-mode programming: first the hard way, with termcap and termios, and then the easier way, using ncurses. After a quick introduction to X in Chapter 28, Chapter 29 focuses on using the Motif and Athena widget sets. Programming X using the GTK library is Chapter 30’s subject, followed by Qt (the foundation of KDE) in Chapter 31, and Java programming in Chapter 32. For good measure, we also cover 3D graphics programming using OpenGL.
The next section of the book covers three special-purpose topics. Chapter 34 examines bash shell programming. We deal with security-related programming issues in Chapter 35, and devote Chapter 36 to debugging with gdb.
The book ends by showing you the final steps for turning your programming project over to the world. Chapter 37 introduces you to tar and the RPM package management tool. Documentation is essential, so we teach you how to write man pages and how to use some SGML-based documentation tools in Chapter 38. Chapter 39, finally, looks at the vital issue of software licensing.
The Linux Programming Toolkit
PART I
IN THIS PART
• Overview
• Setting Up a Development System
• Using GNU cc
• Project Management Using GNU make
• Creating Self-Configuring Software with autoconf
• Comparing and Merging Source Files
• Version Control with RCS
• Creating Programs in Emacs
Overview
by Kurt Wall
CHAPTER 1
IN THIS CHAPTER
• The Little OS That Did
• The Little OS That Will
• A Brief History of Linux
• Linux and UNIX
• Programming Linux
• Why Linux Programming?
Linux has arrived, an astonishing feat accomplished in just over eight years! 1998 was the year Linux finally appeared on corporate America’s radar screens.
The Little OS That Did
It began in March 1998, when Netscape announced that they would release the source code to their Communicator Internet suite under a modified version of the GNU project’s General Public License (GPL). In July, two of the world’s largest relational database vendors, Informix and Oracle, announced native Linux ports of their database products. In August, Intel and Netscape took minority stakes in Red Hat, makers of the market-leading Linux distribution. IBM, meanwhile, began beta testing a Linux port of DB/2. Corel Corporation finally ported their entire office suite to Linux and introduced a line of desktop computers based on Intel’s StrongARM processor and a custom port of Linux. These developments only scratch the surface of the major commercial interest in Linux.
NOTE
As this book went to press, Red Hat filed for an initial public offering (IPO) of their stock. It is a delicious irony that a company that makes money on a free operating system is going to become a member of corporate America.
I would be remiss if I failed to mention Microsoft’s famous (or infamous) Halloween documents. These were leaked internal memos that detailed Microsoft’s analysis of the threat Linux posed to their market hegemony, particularly their server operating system, Windows NT, and discussed options for meeting the challenge Linux poses.
The Little OS That Will
As a server operating system, Linux has matured. It can be found running Web servers all over the world and provides file and print services in an increasing number of businesses. An independent think tank, IDG, reported that Linux installations grew at a rate of 212 percent during 1998, the highest growth rate of all server operating systems, including Windows NT. Enterprise-level features, such as support for multiprocessing and large file systems, continue to mature, too. The 2.2 kernel now supports up to sixteen processors (up from four in the 2.0 series kernels). Clustering technology, known as Beowulf, enables Linux users to create systems of dozens or hundreds of inexpensive, commodity personal computers that, combined, crank out supercomputer-level processing speed at a fraction of the cost of, say, a Cray, an SGI, or a Sun.
On the desktop, too, Linux continues to mature. The KDE desktop provides a GUI that rivals Microsoft Windows for ease of use and configurability. Unlike Windows, however, KDE is a thin layer of eye candy on top of the operating system. The powerful command-line interface is never more than one click away. Indeed, as this book went to press, Caldera Systems released version 2.2 of OpenLinux, which contained a graphical, Windows-based installation procedure! No fewer than four office productivity suites exist or will soon be released: Applixware, StarOffice, and KOffice (part of the KDE project) are in active use, and Corel is finishing up work on its office suite, although WordPerfect 8 for Linux is already available. On top of the huge array of applications and utilities available for Linux, the emergence of office applications every bit as complete as Microsoft Office establishes Linux as a viable competitor to Windows on the desktop.
A Brief History of Linux
Linux began with this post to the Usenet newsgroup comp.os.minix, in August, 1991, written by a Finnish college student:
Hello everybody out there using minix - I’m doing a (free) operating system (just a hobby, won’t be big and professional like gnu) for 386(486) AT clones.
That student, of course, was Linus Torvalds, and the “hobby” of which he wrote grew into what is known today as Linux. Version 1.0 of the kernel was released on March 14, 1994. Version 2.2, the current stable kernel release, was officially released on January 25, 1999. Torvalds wrote Linux because he wanted a UNIX-like operating system that would run on his 386. He began his work on MINIX, and Linux was born.
Linux and UNIX
Officially and strictly speaking, Linux is not UNIX. UNIX is a registered trademark, and using the term involves meeting a long list of requirements and paying a sizable amount of money to be certified. Linux is a UNIX clone, a work-alike. All of the kernel code was written from scratch by Linus Torvalds and other kernel hackers. Many programs that run under Linux were also written from scratch, but many, many more are simply ports of software from other operating systems, especially UNIX and UNIX-like operating systems. More than anything else, Linux is a POSIX operating system. POSIX is a family of standards developed by the Institute of Electrical and Electronics Engineers (IEEE) that define a portable operating system interface. Indeed, what makes Linux such a high-quality UNIX clone is Linux’s adherence to POSIX standards.
Programming Linux
As Linux continues to mature, the need for people who can program for it will grow. Whether you are just learning to program or are an experienced programmer new to Linux, the array of tools and techniques can be overwhelming. Just deciding where to begin can be difficult. This book is designed for you. It introduces you to the tools and techniques commonly used in Linux programming. We sincerely hope that what this book contains gives you a solid foundation in the practical matters of programming. By the time you finish this book, you should be thoroughly prepared to hack Linux.
Why Linux Programming?
Why do people program on and for Linux? The number of answers to that question is probably as high as the number of people programming on and for Linux. I think, though, that these answers fall into several general categories. First, it is fun—this is why I do it. Second, it is free (think beer and speech). Third, it is open. There are no hidden interfaces, no undocumented functions or APIs (application programming interfaces), and if you do not like the way something works, you have access to the source code to fix it. Finally, and I consider this the most important reason, Linux programmers are part of a special community. At one level, everyone needs to belong to something, to identify with something. This is as true of Windows programmers as it is of Linux programmers, or people who join churches, clubs, and athletic teams. At another, more fundamental level, the barriers to entry in this community are based on ability, skill, and talent, not money, looks, or who you know. Linus Torvalds, for example, is rarely persuaded to change the kernel based on rational arguments. Rather, working code persuades him (he often says “Show me the code.”). I am not supposing or proposing that Linux is a meritocracy. Rather, one’s standing in the community is based on meeting a communal need, whether it is hacking code, writing documentation, or helping newcomers. It just so happens, though, that doing any of these things requires skill and ability, as well as the desire to do them. As you participate in and become a member of Linux’s programming community, we hope, too, that you will discover that it is fun and meaningful as well. I think it is. In the final analysis, Linux is about community and sharing as much as it is about making computers do what you want.
Summary
This chapter briefly recounted Linux’s history, took a whirlwind tour of the state of Linux and Linux programming today, and made some reasonable predictions about the future of Linux. In addition, it examined Linux’s relationship to UNIX and took a brief, philosophical look at why you might find Linux programming appealing.
Setting Up a Development System
CHAPTER 2
by Mark Whitis
IN THIS CHAPTER
• Hardware Selection
• Processor/Motherboard
• User Interaction Hardware: Video, Sound, Keyboard, and Mouse
• Keyboard and Mouse
• Communication Devices, Ports, and Buses
• Storage Devices
• External Peripherals
• Complete Systems
• Laptops
• Installation
Hardware Selection
This section is, of necessity, somewhat subjective. Choice of a system depends largely on the developer’s individual needs and preferences. This section should be used in conjunction with the Hardware Compatibility HOWTO, as well as the more specialized HOWTO documents. The latest version is online at http://metalab.unc.edu/LDP/HOWTO/Hardware-HOWTO.html; if you do not have Net access, you will find a copy on the accompanying CD-ROM or in /usr/doc/HOWTO on most Linux systems (if you have one available). The Hardware HOWTO often lists specific devices that are, or are not, supported, or refers you to documents that do list them. This section will not try to list specific supported devices (the list would be way too long and would go out of date very rapidly) except where I want to share specific observations about a certain device based on my own research or experience. Internet access is strongly recommended as a prerequisite to buying and installing a Linux system. The latest versions of the HOWTO documents can be found on the Net at Linux Online (http://www.linux.org/) in the Support section. The Projects section has many useful links to major development projects, including projects to support various classes of hardware devices. If you do not have Net access, the HOWTO documents are on the accompanying Red Hat Linux CDs (Disc 1 of 2) in the /doc/HOWTO directory.
Considerations for Selecting Hardware
I will try to give you an idea of what is really needed and how to get a good bang for your buck rather than how to get the most supercharged system available. You may have economic constraints or you may prefer to have two or more inexpensive systems instead of one expensive unit. There are many reasons for having two systems, some of which include the following:
• To have a separate router/firewall
• To have a separate “crash and burn” system
• To have a system that boots one or more other operating systems
• To have a separate, clean system to test installation programs or packages (RPM or Debian) if you are preparing a package for distribution
• To have a separate untrusted system for guests if you are doing sensitive work
• To have at least one box that runs Linux as a server 24 hours a day
Most of the millions of lines of Linux code were probably developed on systems that are slower than the economy systems being sold today. Excessive CPU power can be detrimental on a development station because it may exacerbate the tendency of some developers to write inefficient code. If, however, you have the need and economic resources to purchase a system more powerful than I suggest here, more power to you (please pardon the pun). A good developer’s time is very valuable, and the extra power can pay for itself if it saves even a small percentage of your time. The suggestions in this chapter will be oriented toward a low- to mid-range development workstation. You can adjust them upward or downward as appropriate. I do not want economic considerations to discourage you from supporting the Linux platform in addition to any others you may currently support.
Basic development activities, using the tools described in this book, are not likely to demand really fast CPUs; however, other applications the developer may be using, or even developing, may put additional demands on the CPU and memory. Editing and compiling C programs does not require much computing horsepower, particularly since make normally limits the amount of code that has to be recompiled at any given time. Compiling C++ programs, particularly huge ones, can consume large amounts of computing horsepower. Multimedia applications demand more computing power than edit and compile cycles. The commercial office suites also tend to require large amounts of memory. Using tracepoints to monitor variables by continuous single-stepping can also consume a great many CPU cycles.
Some people will recommend that you choose a system that will meet your needs for the next two or three years. This may not be a wise idea. The cost of the computing power and features you will need a year from now will probably drop to the point where it may be more cost-effective to buy what you need today and wait until next year to buy what you need then. If you do not replace your system outright, you may want to upgrade it piecemeal as time passes; if that is the case, you don’t want to buy a system with proprietary components.
Processor/Motherboard
One of the most important features of a motherboard is its physical form factor: its size and shape and the locations of key features. Many manufacturers, particularly major brands, use proprietary form factors, which should be avoided. If you buy a machine that has a proprietary motherboard and you need to replace it due to a repair or upgrade, you will find your selection limited (or non-existent) and overpriced. Some manufacturers undoubtedly use these proprietary designs to lower their manufacturing cost by eliminating cables for serial, parallel, and other I/O ports; others may have more sinister motives. The older AT (or baby AT) form factor motherboards are interchangeable, but have very little printed circuit board real estate along the back edge of the machine on which to mount connectors. The case only has holes to accommodate the keyboard and maybe a mouse connector.
The newer ATX standard has many advantages. Although an ATX motherboard is approximately the same size and shape as a baby AT motherboard (both are about the same size as a sheet of 8-1/2”×11” writing paper), the ATX design rotates the dimensions so the long edge is against the back of the machine. An ATX case has a standard rectangular cutout that accommodates metal inserts, which have cutouts that match the connectors on a particular motherboard. The cutout is large enough to easily accommodate the following using stacked connectors:
• 2 serial ports
• 1 parallel port
• keyboard port
• mouse port
• 2 USB ports
• VGA connector
• audio connectors
Also, ATX moves the CPU and memory to where they will not interfere with full-length I/O cards, although some manufacturers still mount some internal connectors where they will interfere. Many case manufacturers have retooled. More information about the ATX form factor can be found at http://www.teleport.com/~atx/. Figure 2.1 illustrates the physical difference between AT and ATX form factors.
FIGURE 2.1
AT versus ATX motherboard form factors. (Panels: AT and ATX, each showing the placement of the I/O, CPU, and memory areas.)
Onboard I/O
A typical Pentium or higher motherboard will have two serial, one parallel, one keyboard, one mouse, IDE, and floppy ports onboard, all of which are likely to work fine with Linux. It may have additional onboard ports that will have to be evaluated for compatibility, including USB, SCSI, Ethernet, audio, or video.
Processor
For the purposes of this section, I will assume you are using an Intel or compatible processor. The use of such commodity hardware is likely to result in a lower-cost system with a wider range of software available. There are a number of other options available, including Alpha and SPARC architectures. Visit http://www.linux.org/ if you are interested in support for other processor architectures. Cyrix and AMD make Pentium-compatible processors. There have been some compatibility problems with Cyrix and AMD processors, but these have been resolved. I favor Socket 7 motherboards, which allow you to use Intel, Cyrix, and AMD processors interchangeably. Some other companies also make Pentium-compatible processors that will probably work with Linux but have been less thoroughly tested. IDT markets the Centaur C6, a Pentium-compatible processor, under the unfortunate name “Winchip,” which apparently will run Linux, but I don’t see the Linux community lining up to buy these chips. IBM used to make and sell the Cyrix chips under its own name in exchange for the use of IBM’s fabrication plant; these may be regarded simply as Cyrix chips for compatibility purposes. Future IBM x86 processors will apparently be based on a different core. The Pentium II, Pentium III, Xeon, and Celeron chips will simply be regarded as Pentium-compatible CPUs. There have been some very inexpensive systems made recently that use the Cyrix MediaGX processor. These systems integrate the CPU, cache, video, audio, motherboard chipset, and I/O onto two chips. The downside is that you cannot replace the MediaGX with another brand of processor and that the video system uses system memory for video. This practice slightly reduces the available system memory and uses processor/memory bandwidth for screen refresh, which results in a system that is about a third slower than you would expect based on the processor speed.
The advantages are the lower cost and the fact that all MediaGX systems are basically the same from a software point of view. Therefore, if you can get one MediaGX system to work, all others should work. Video support for the MediaGX is provided by SuSE (go to http://www.suse.de/XSuSE/XSuSE_E.html for more info), and there is a MediaGX video driver in the KGI. Audio support had not been developed at the time of this writing, although it may be available by the time this book is published. My primary development machines have been running Linux for a couple of years on Cyrix P150+ processors (equivalent to a 150MHz Pentium), and upgrading the processor is still among the least of my priorities. Given current processor prices, you will probably want to shoot for about twice that speed, adjusting up or down based on your budget and availability.
The Linux community seems to be waiting with interest to see the processor being developed by Transmeta, the company that hired Linus Torvalds and some other Linux gurus (including my friend, Jeff Uphoff). The speculation, which is at least partially corroborated by the text of a patent issued to the company, is that this processor will have an architecture that is optimized for emulating other processors by using software translators and a hardware translation cache. It is suspected that this chip may be a very good platform for running Linux. Linux might even be the native OS supported on this chip under which other operating systems and processor architectures are emulated.
BIOS
For a basic workstation, any of the major BIOS brands (AWARD, AMIBIOS, or Phoenix) may suffice. The AMI BIOS has some problems that complicate the use of I/O cards that have a PCI-to-PCI bridge, such as the Adaptec Quartet four-port Ethernet cards. The AWARD BIOS gives the user more control than does AMIBIOS or Phoenix. A flash BIOS, which allows the user to download BIOS upgrades, is desirable and is standard on most modern systems. Older 386 and 486 systems tend not to have a flash BIOS and may also have the following problems:
• The BIOS may not be Y2K compliant
• It may not support larger disk drives
• It may not support booting from removable media
Memory
64MB is reasonable for a typical development system. If you are not trying to run the X Window System, you may be able to get by with only 8MB for a special-purpose machine (such as a crash and burn system for debugging device drivers). Kernel compile times are about the same (less than 1.5 minutes) with 32MB or 64MB, although they can be much longer on a system with 8MB. If you want to run multimedia applications (such as a Web browser), particularly at the same time you are compiling, expect performance to suffer a bit if you only have 32MB. Likewise, if you are developing applications that consume lots of memory, you may need more RAM. This page was written on a system with 32MB of RAM, but the primary development system of one of the other authors has ten times that amount of memory to support artificial intelligence work.
Enclosure and Power Supply
Select an enclosure that matches your motherboard form factor and has sufficient drive bays and wattage to accommodate your needs. Many case manufacturers have retooled their AT form factor cases to accommodate the ATX motherboard; if you order an AT case, you may receive a newer ATX design with an I/O shield that has cutouts for AT keyboard and mouse ports. For most applications, Mini-Tower, Mid-Tower, or Full-Tower cases are likely to be the preferred choices. For some applications you may want server or rack-mount designs.
NOTE
The power supply connectors are different for AT and ATX power supplies.
If you are building a mission-critical system, be aware that some power supplies will not restore power to the system after a power outage. You may also be interested in a mini-redundant power supply; these are slightly larger than a normal ATX or PS/2 power supply, but some high-end cases, particularly rack-mount and server cases, are designed to accommodate either a mini-redundant or a regular ATX or PS/2 supply.
User Interaction Hardware: Video, Sound, Keyboard, and Mouse
The devices described in this section are the primary means of interacting with the user. Support for video cards and monitors is largely a function of adequate information being available from the manufacturer or other sources. Monitors usually require only a handful of specifications to be entered in response to the Xconfigurator program, but support for a video card often requires detailed programming information and for someone to write a new driver or modify an existing one. Sound cards require documentation and programming support, like video cards, but speakers need only be suitable for use with the sound card itself.
Video Card
If you only need a text-mode console, most VGA video adapters will work fine. If you need graphics support, you will need a VGA adapter that is supported by XFree86, SVGAlib, vesafb, and/or KGI. XFree86 is a free, open source implementation of the X Window System, an open-standard windowing system that provides display access to graphical applications running on the same machine or over a network. XFree86 support is generally necessary and sufficient for a development workstation. For more information, visit http://www.xfree86.org/. For drivers for certain new devices, check out XFcom (formerly XSuSE) at http://www.suse.de/XSuSE/.
SVGAlib is a library for displaying full-screen graphics on the console. It is primarily used for a few games and image viewing applications, most of which have X Window System versions or equivalents. Unfortunately, SVGAlib applications need root privileges to access the video hardware, so they are normally installed suid, which creates security problems.
GGI, which stands for General Graphics Interface, tries to eliminate the need for root access, resolve conflicts between concurrent SVGAlib and X servers, and provide a common API for writing applications that run under both X and SVGAlib. A part of GGI, called KGI, provides low-level access to the framebuffer. GGI has also been ported to a variety of other platforms, so it provides a way of writing portable graphics applications, although these applications are apparently limited to a single-window paradigm. Documentation is very sparse. This package shows future promise as the common low-level interface for X servers and SVGAlib and as a programming interface for real-time action games.
OpenGL (and its predecessor GL) has long been the de facto standard for 3D modeling. OpenGL provides an open API but not an open reference implementation. Mesa provides an open source (GPL) implementation of an API very similar to OpenGL that runs under Linux and many other platforms. Hardware acceleration is available for 3Dfx Voodoo–based cards. For more information on Mesa, visit http://www.mesa3d.org/. Metrolink provides a licensed OpenGL implementation as a commercial product; visit http://www.metrolink.com/opengl/ for more information.
Frame buffer devices provide an abstraction for access to the video buffer across different processor architectures.
The Framebuffer HOWTO, at http://www.tahallah.demon.co.uk/programming/HOWTO-framebuffer-1.0pre3.html, provides more information. The vesafb driver provides frame buffer device support for VESA 2.0 video cards on Intel platforms. Unfortunately, the VESA specification appears to be broken: it only works when the CPU is in real mode instead of protected mode, so switching video modes requires switching the CPU out of protected mode to run the real-mode VESA VGA BIOS code. Such shenanigans may be common in the MS Windows world and may contribute to the instability for which that operating system is famous. KGIcon allows the use of KGI-supported devices as framebuffer devices.
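As a rough illustration of what the frame buffer abstraction offers a program, the following sketch asks a frame buffer device for its current video mode. It assumes a Linux kernel with frame buffer support, a device node at /dev/fb0 that you are allowed to open, and the `FBIOGET_VSCREENINFO` ioctl and 160-byte `struct fb_var_screeninfo` layout from `<linux/fb.h>`; only the leading geometry fields are decoded.

```python
import fcntl
import os
import struct

# From <linux/fb.h>: the ioctl number and the size of struct fb_var_screeninfo.
# The structure starts with a run of 32-bit unsigned integers.
FBIOGET_VSCREENINFO = 0x4600
VAR_SCREENINFO_SIZE = 160


def parse_var_screeninfo(buf: bytes) -> dict:
    """Decode the leading geometry fields of struct fb_var_screeninfo."""
    names = ("xres", "yres", "xres_virtual", "yres_virtual",
             "xoffset", "yoffset", "bits_per_pixel")
    values = struct.unpack_from("=7I", buf, 0)  # native byte order, no padding
    return dict(zip(names, values))


def query_framebuffer(path: str = "/dev/fb0") -> dict:
    """Ask the kernel frame buffer driver for the current video mode."""
    fd = os.open(path, os.O_RDONLY)
    try:
        buf = bytearray(VAR_SCREENINFO_SIZE)
        fcntl.ioctl(fd, FBIOGET_VSCREENINFO, buf)  # the kernel fills buf in place
        return parse_var_screeninfo(bytes(buf))
    finally:
        os.close(fd)
```

On a machine that has a frame buffer console, `query_framebuffer()` returns the visible and virtual resolution and the pixel depth. The parsing is kept separate from the ioctl so it can be checked without any video hardware.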
Slide 21: Setting Up a Development System CHAPTER 2
21
*Tip
Some companies offer commercial X servers for Linux and other UNIX-compatible operating systems. Among them are Accelerated-X (http://www.xigraphics.com/) and Metro-X (http://www.metrolink.com/).
AGP (Accelerated Graphics Port) provides the processor with a connection to video memory that is about four times the speed of the PCI bus and provides the video accelerator with faster access to texture maps stored in system memory. Some AGP graphics cards are supported under Linux.
2
SETTING UP A DEVELOPMENT SYSTEM
*Tip
To determine which video cards and monitors are supported under Red Hat, run /usr/X11/bin/Xconfigurator --help as root on an existing system.
You will probably want at least 4MB of video memory to support 1280×1024 at 16bpp (the screen itself needs 2.6MB), and you will need 8MB to support 1600×1200 at 32bpp. Some 3D games might benefit from extra memory for texture maps or other features if they are able to use it. The X server will also use some extra memory for a font cache and to expand bitmaps. If you want to configure a virtual screen that is larger than the physical screen (the physical screen scrolls around the virtual screen when you move the cursor to the edge), be sure to get enough memory to support the desired virtual screen size. The FVWM window manager creates a virtual desktop that is, by default, four times the virtual screen size, and switches screens if you move the cursor to the edge of the screen and leave it there momentarily; instead of using extra video memory, this feature is implemented by redrawing the whole screen. The X server may use system memory (not video memory) for “backing store,” which lets it redraw partially hidden windows faster when they are revealed. If you use high-resolution screens or deep pixels (16bpp or 32bpp), be aware that backing store will place additional demands on system memory.
There are some distinct advantages to installing a video card that supports high resolutions and pixel depths in your Linux system; if you intend to make good use of the X server, they can be invaluable. Since Linux can easily handle many different processes at once, you will want enough screen real estate to view multiple windows, and a video card that supports 1280×1024 will satisfy this nicely. The other advantage of a good video card is pixel depth. Not only do the newer window managers run more smoothly at greater pixel depths, but depth is also very useful if you want to use your system for graphics work. Your monitor must also support the resolution of your video card; otherwise, you cannot take full advantage of the capabilities your system offers. (The following section discusses monitor selection in more detail.) Check the specifications of your hardware when choosing a video card and monitor so that the two will work well together, and always check the hardware compatibility lists for Linux.
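The memory figures above come from a simple product: width × height × bytes per pixel. The sketch below reproduces them and checks whether a mode, optionally with a larger virtual screen, fits in a given amount of video memory. It deliberately ignores the font cache, expanded bitmaps, and anything else the X server may reserve, so a narrow margin should be read as "probably not enough"; the function names are illustrative only.

```python
def framebuffer_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Bytes needed to hold one full screen of pixels."""
    return width * height * bits_per_pixel // 8


def fits_in_video_memory(width: int, height: int, bpp: int,
                         video_mb: int, virtual_scale: int = 1) -> bool:
    """True if a screen `virtual_scale` times the visible area fits in video_mb.

    Video memory is sized in binary megabytes (1MB = 1024 * 1024 bytes).
    Extra X server usage (font cache, bitmaps) is not counted.
    """
    needed = framebuffer_bytes(width, height, bpp) * virtual_scale
    return needed <= video_mb * 1024 * 1024


for w, h, bpp in [(1280, 1024, 16), (1600, 1200, 32)]:
    mb = framebuffer_bytes(w, h, bpp) / 1e6
    print(f"{w}x{h} @ {bpp}bpp needs {mb:.2f} MB")
# prints 2.62 MB for the first mode and 7.68 MB for the second
```

Passing `virtual_scale=2` to `fits_in_video_memory(1280, 1024, 16, 4)` returns `False`, which is the situation the text warns about when configuring a virtual screen larger than the physical one.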
Monitor
Almost any monitor that is compatible with your video card will work under Linux if you can obtain its specifications, particularly the vertical and horizontal refresh rates or ranges supported and the video bandwidth. Note that bigger is not always better; what matters is how many pixels you can put on the screen without sacrificing quality. I prefer the 17” monitor on my development machine at one office to the very expensive 20” workstation monitor that sits next to it, and I prefer many 15” monitors to their 17” counterparts. If you have trouble focusing up close or want to sit very far from your monitor, you may need a large one; otherwise, a quality smaller monitor closer to your head may give you equal or better quality at a lower price. As discussed in the preceding section, the monitor and video card selections are very closely related. It is a good idea to test your monitor selection for clarity. One of the main factors in a monitor’s clarity is the dot pitch: the smaller the spacing between pixels, the better. However, a finer dot pitch can raise the price of a monitor. The other issue, again, is the video card. One monitor tested with different video cards can give quite different results. A video card aimed at business use (such as a Matrox Millennium G200) will often produce a crisper image than a card intended for games (such as a Diamond V550), because some cards are optimized for good 2D, 3D, or crisp text, but not for all three. I recommend running your monitor as close to 60Hz as you can, even if it can run at 70Hz or higher. In some cases a monitor may look better at 70Hz, particularly if you are hyped up on massive doses of caffeine and your monitor has short-persistence phosphors, but I find that it usually looks better at 60Hz. The reason is the ubiquitous 60Hz interference from power lines, transformers, and other sources.
Not only can this interference be picked up in the cables and video circuitry, but it also affects the electron beams in the monitor’s cathode ray tube (CRT) directly. Shielding is possible but expensive and is not likely to be found in computer video monitors. If your image is visibly waving back and forth, this is likely to be your problem. If the beat frequency (the difference between the two frequencies) between the 60Hz interference and the refresh rate is close to zero, the effect slows down and becomes imperceptible. If the beat frequency is larger, you will see instabilities that are either very perceptible or subtler but irritating. So a beat frequency of 0.1Hz (60Hz versus 60.1Hz) is likely to be fine, but a beat frequency of 10Hz (60Hz versus 70Hz) is likely to be very annoying. Some countries use a frequency other than 60Hz for their power grid; there, you would match the refresh rate to the local power line frequency to avoid beat frequency problems. Incidentally, some monitors deliberately make the image wander around the screen at a very slow rate to prevent burn-in; as long as this is very slow, it is imperceptible (your own head movements are likely to be far greater). The video configuration in Linux gives you much latitude in how you set up your hardware. Remember to keep your settings within the specified ranges for your hardware: pushing the limits can result in poor performance or even destroy the hardware.
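The beat frequency rule reduces to a subtraction, and its reciprocal tells you how long one full wobble takes: a 0.1Hz beat drifts through a complete cycle only every ten seconds, while a 10Hz beat shimmers ten times a second. This small sketch assumes 60Hz mains by default; pass `mains_hz=50` for a 50Hz power grid.

```python
def beat_frequency(refresh_hz: float, mains_hz: float = 60.0) -> float:
    """Frequency (Hz) at which mains interference drifts through the picture."""
    return abs(refresh_hz - mains_hz)


def drift_period_s(refresh_hz: float, mains_hz: float = 60.0) -> float:
    """Seconds per full drift cycle; infinity means the pattern stands still."""
    beat = beat_frequency(refresh_hz, mains_hz)
    return float("inf") if beat == 0 else 1.0 / beat
```

For example, `drift_period_s(70)` is 0.1 seconds (an annoying flutter), while `drift_period_s(60.1)` is about 10 seconds (slow enough to go unnoticed).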
Sound Cards
Linux supports a variety of sound cards, particularly Sound Blaster–compatible cards (but not all sound cards that claim to be compatible really are; some use software-assisted emulation), older ESS chip-based cards (688 and 1688), Microsoft Sound System–based cards, and many Crystal (Cirrus Logic)–based cards. Consult the Hardware Compatibility HOWTO document, the Four Front Technologies Web site (at http://www.4front-tech.com/), or the Linux kernel sources (browsable on the Net at http://metalab.unc.edu/linux-source/) for more information. Four Front Technologies sells a package that includes sound drivers for many cards that are not supported by the drivers shipped with the kernel. Most newer sound cards seem to be PnP devices. Support for PnP cards is available using the ISAPnP utilities (see “ISA Plug and Play,” later in this chapter) or the Four Front drivers.
Keyboard and Mouse
USB keyboards and mice are not recommended at this time; see “USB and Firewire (IEEE 1394),” later in this chapter, for more details. Normal keyboards that connect to a standard AT or PS/2 style keyboard port should work fine, although the unusual extra features on some keyboards may not work. Trackball, Glidepoint, and Trackpad pointing devices that are built in to the keyboard normally have a separate connection to a serial or PS/2 mouse port and may be regarded as separate mouse devices when considering software support issues. Normal PS/2 and serial mice are supported, including those that speak the Microsoft, Mouse Systems, or Logitech protocols. Mouse support is provided by the gpm program and/or the X server. Many other pointing devices, including trackballs, Glidepoints, and Trackpads, will work if they emulate a normal mouse by speaking the same communications protocol; some special features of newer trackpads, such as pen input and special handling of border areas, may not work. Many X applications require a three-button mouse, but gpm and the X server can be configured to emulate the middle button when you press both buttons of a two-button mouse at once (chording).
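To give a feel for what "speaking the Microsoft protocol" means, here is a decoder for one packet, based on the layout as it is commonly documented: three 7-bit bytes sent at 1200 baud, where the first byte carries a sync bit (0x40), the left and right button bits, and the top two bits of each movement delta, and the next two bytes carry the low six bits of dx and dy. This is a sketch, not code taken from gpm or the X server; a real driver would also resynchronize by scanning for a byte with bit 6 set.

```python
def to_signed8(value: int) -> int:
    """Interpret an 8-bit value as a two's complement number."""
    return value - 256 if value & 0x80 else value


def decode_ms_packet(packet: bytes):
    """Decode one 3-byte Microsoft serial mouse packet into (left, right, dx, dy)."""
    if len(packet) != 3:
        raise ValueError("a Microsoft mouse packet is 3 bytes long")
    b1, b2, b3 = packet
    # Only the first byte of a packet has the sync bit (bit 6) set.
    if not (b1 & 0x40) or (b2 & 0x40) or (b3 & 0x40):
        raise ValueError("out of sync: only the first byte may have bit 6 set")
    left = bool(b1 & 0x20)
    right = bool(b1 & 0x10)
    # Reassemble each 8-bit delta from 2 high bits (byte 1) and 6 low bits.
    dx = to_signed8(((b1 & 0x03) << 6) | (b2 & 0x3F))
    dy = to_signed8(((b1 & 0x0C) << 4) | (b3 & 0x3F))
    return left, right, dx, dy
```

Middle-button emulation by chording could be layered on top of a decoder like this: when the left and right bits both turn on within a short interval, report a middle press instead of the two separate presses.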
Communication Devices, Ports, and Buses
This section contains information on various devices that provide communications channels. These channels can be used to communicate with other computers and with internal or external peripherals. The high-speed buses that connect expansion cards to the processor are included here. Neither the ISA bus nor the PCI bus will be covered in detail, although ISA Plug and Play devices and PCMCIA cards will have their own subsection since there are some special considerations. Plain ISA and PCI cards should work fine as long as there is a driver that supports that specific card. Most IDE controllers will work; for other IDE devices, see “Storage Devices,” later in this chapter. Devices that connect to a parallel (printer) port are discussed in their separate categories.
Modems
Most modems will work fine with Linux, with the exception of brain-dead winmodem types, modems that use the proprietary Rockwell Protocol Interface (RPI), and other modems that depend on a software component for their functionality. Be aware, however, that there is a real difference between the more expensive professional models and the cheaper consumer-grade models. Almost any modem will perform well on good quality phone lines, but on poor quality lines the distinction becomes significant. That is why you will see people on the Net who are both pleased and extremely dissatisfied with the same inexpensive modems: it takes much more sophisticated firmware and several times as much processing power to resurrect data from a poor quality connection as it does to recover data from a good connection. Serious developers are likely to want a dedicated Internet connection to their small office or home. Some more expensive modems can operate in leased line mode, which allows you to create a dedicated (permanent) 33.6Kbps leased line Internet connection over an unconditioned 2-wire (1 pair) dry loop. This can be handy if ISDN and xDSL are not available in your area. A dry loop is a leased telephone line with no line voltage, ringing signal, or dial tone that permanently connects two locations; it is sometimes referred to as a “burglar alarm pair.” These lines are very inexpensive for short distances, but the average person working in a telco business office has no clue what these terms mean. Expect to pay $200 or more for a modem that supports this feature. Your chances of finding a pair of leased line modems that will work at 56K are not very good, since only modems with a digital phone line interface are likely to have the software to handle 56K answer mode. I used a pair of leased line capable modems for a couple of years over a wire distance of two or three miles, at a cost of about $15 per month; more information on how to set this up is available on my Web site (http://www.freelabs.com/~whitis/unleashed/). It is also possible to run xDSL over a relatively short dry loop (I now use MVL, a variant of DSL that works better on longer lines and provides 768Kbps, on the same dry loop), even though xDSL is intended to be used with one of the modems located in the central office. However, the equipment costs about $13,000 for 16 lines and is not, as far as I know, readily available in configurations that are economically viable for a small number of lines. If you can spread the capital cost over many lines, xDSL can be very economical compared to ISDN or T1 lines. In my example, a dry loop costs $15 per month and provides a 768K connection, versus $75 per month for an ISDN line or $400 per month for a T1 line (these charges are for the local loop only and do not include IP access). If you want to support incoming (dial-in or answer mode) 56K connections, you will need a modem with a digital phone line interface.
Normally, ISPs use expensive modem racks that have a T1 line interface for this purpose, which is only economically viable if you are supporting dozens of lines. You might be able to find a modem that functions both as an ordinary modem and as an ISDN terminal adapter and can produce 56K answer mode modulation over an ISDN line. If you want to set up a voice mail or interactive voice response (IVR) system, you will probably want a modem that is capable of voice operation and is compatible with the vgetty software. Check the Mgetty+Sendfax with Vgetty Extensions (FAQ) document for voice modem recommendations. For fax operation, your software choices include HylaFAX, mgetty+sendfax, and efax. A modem that supports Class 2.0 fax operation is preferred over one that can only do Class 1 fax. Class 1 modems require the host computer to handle part of the real-time fax protocol processing and will malfunction if your host is too busy to respond quickly. Class 2.0 modems do their own dirty work. Class 2 modems conform to an earlier draft of the Class 2.0 specification; that draft was never actually released as a standard.
The mgetty+sendfax and efax packages come with Red Hat 5.2. HylaFAX comes on the Red Hat Powertools CD. All three packages can be downloaded off the Internet. HylaFAX is more complicated to set up but is better for an enterprise fax server since it is server based and there are clients available for Linux, Microsoft Windows, MacOS, and other platforms. Table 2.1 summarizes fax capabilities.
TABLE 2.1 FAX SUPPORT
             HylaFAX   Sendfax   Efax
Class 1      Yes       No        Yes
Class 2      Yes       Yes       Yes
Class 2.0    Yes       Yes       Support untested
Network Interface Cards
The Tulip chips are considered by many to be the best choice for a PCI-based ethernet card on a Linux system. They are fairly inexpensive, fast, reliable, and documented. There have been some problems lately, however: there have been frequent, often slightly incompatible, revisions to newer chips; the older chips, which were a safer choice, were discontinued (this is being reversed); the line was sold to competitor Intel; and there was a shortage of cards. Many of these problems may be corrected by the time this book is released; check the Tulip mailing list archives for more details. If you need multiple ethernet interfaces in a single machine, Adaptec Quartet cards provide four Tulip-based ethernet ports on a single card. One of my Web pages gives more information on using the Quartets under Linux. For an inexpensive ISA 10Mb/s card, the cheap NE2000 clones usually work well. These cards tie up the CPU a bit more than more sophisticated designs when transferring data, but are capable of operating at full wire speed. (Don’t expect full wire speed on a single TCP connection such as an FTP transfer, however; you will need several simultaneous connections to get that bandwidth.) 3Com supports their ethernet boards under Linux, and Crystal (Cirrus Logic) offers Linux drivers for their ethernet controller chips. Most WAN card manufacturers also seem to provide Linux drivers; SDL, Emerging Technologies, and Sangoma are among them.
SCSI
Linux supports most SCSI controllers, including many RAID controllers, host adapters, almost all SCSI disks, most SCSI tape drives, and many SCSI scanners. Some parallel port–based host adapters are notable exceptions. Advansys supports their SCSI adapters under Linux; the drivers that ship with the kernel were provided by Advansys. The Iomega Jaz Jet PCI SCSI controller, which may be available at local retailers, is actually an Advansys controller and is a good value. It is a good idea not to mix disk drives and slow devices such as tape drives or scanners on the same SCSI bus unless the controller (and its driver) and all of the slow devices on the bus support a feature known as “disconnect-reconnect”; it is rather annoying to have your entire system hang up for 30 seconds or more while the tape rewinds or the scanner carriage returns. The SCSI HOWTO has more information on disconnect-reconnect.
*Warning
Beware of cheap SCSI controllers, particularly those that do not use interrupts. In my limited experience with boards of this type, they often did not work at all or would cause the system to hang for several seconds at a time. This may be due to bugs in the generic NCR5380/NCR53c400 driver, although in at least one case the card was defective. The SCSI controllers I had trouble with came bundled with scanners or were built in to certain sound boards.
USB and Firewire (IEEE 1394)
USB and Firewire support are being developed. USB support is provided by a package called UUSBD. It is apparently possible to use a USB mouse if you have a supported USB controller (although you will need to download and install the code before you can run X) but keyboards don’t work at the time of this writing. It is probably too early to plan on using either of these on a development system except for tinkering. Links to these projects are on linux.org under projects.
Serial Cards (Including Multiport)
Standard PC serial ports are supported, on or off the motherboard. Very old designs that do not have a 16550A or compatible UART are not recommended but those are likely to be pretty scarce these days.
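Once a serial port is recognized, programs talk to it through the POSIX termios interface. The sketch below puts a port into raw 8N1 mode at a chosen speed; it is a minimal illustration, not a complete serial library. The function name is hypothetical, and to keep it testable without hardware it works equally well on a pseudo-terminal.

```python
import os
import termios
import tty


def configure_raw(fd: int, baud: int = termios.B9600) -> None:
    """Put a serial port (or pty) into raw 8N1 mode at the given speed."""
    tty.setraw(fd)  # no echo, no line editing, no signal characters, 8-bit chars
    attrs = termios.tcgetattr(fd)  # [iflag, oflag, cflag, lflag, ispeed, ospeed, cc]
    attrs[2] &= ~(termios.PARENB | termios.CSTOPB | termios.CSIZE)  # no parity, 1 stop bit
    attrs[2] |= termios.CS8 | termios.CLOCAL | termios.CREAD        # 8 data bits, ignore modem lines
    attrs[4] = attrs[5] = baud  # input and output speed
    termios.tcsetattr(fd, termios.TCSANOW, attrs)


if __name__ == "__main__":
    # Demonstrate on a pseudo-terminal so no serial hardware is needed.
    master, slave = os.openpty()
    configure_raw(slave, termios.B115200)
    os.close(master)
    os.close(slave)
```

On real hardware you would open the port with `os.open("/dev/ttyS0", os.O_RDWR | os.O_NOCTTY)` and pass that descriptor instead. Setting `CLOCAL` is a choice suited to a directly wired device; for a modem you may want to clear it so the driver honors carrier detect.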
Most intelligent multiport serial cards are supported, often with direct support from the manufacturer. Cyclades, Equinox, Digi, and GTEK are some of the companies that support their multiport boards under Linux. Equinox also makes an interesting variation on a serial port multiplexor that supports 16 ISA modems (or cards that look exactly like modems to the computer) in an external chassis. Most dumb multiport serial cards also work, but beware of trying to put too many dumb ports in a system unless the system and/or the ports are lightly loaded. Byterunner (http://www.byterunner.com) supports their inexpensive 2/4/8 port cards under Linux; unlike many dumb multiport boards, these are highly configurable, can optionally share interrupts, and support all the usual handshaking signals.
IRDA
Linux support for IRDA (Infrared Data Association) devices is fairly new, so be prepared for some rough edges. The Linux 2.2 kernel is supposed to include IRDA support, but you will still need the irda-utils package even after you upgrade to 2.2. The IRDA project’s home page is at http://www.cs.uit.no/linux-irda/. I suspect that most laptops that support 115Kbps SIR IRDA emulate a serial port and won’t be too hard to get working.
PCMCIA Cards
Linux PCMCIA support has been around for a while and is pretty stable. A driver will need to exist for the particular device being used. If a device you need to use is not listed in the /etc/pcmcia/config file supplied on the install disks for your Linux distribution, installation could be difficult.
ISA Plug and Play
Although some kernel patches exist for Plug and Play, support for PnP under Linux is usually provided using the ISAPnP utilities. These utilities do not operate automatically, as you might expect for plug and play support. The good news is that this eliminates the unpredictable, varying behavior of what is often referred to more accurately as “Plug and Pray.” You run one utility, pnpdump, to create a sample configuration file with the various configurations possible for each piece of PnP hardware, and then you manually edit that file to select a particular configuration. Red Hat also ships a utility called sndconfig, which is used to interactively configure some PnP sound cards. Avoid PnP for devices that are needed to boot the system, such as disk controllers and network cards (for machines that boot off the network).
Storage Devices
Linux supports various storage devices commonly used throughout the consumer computer market. These include most hard disk drives and removable media such as Zip, CD-ROM/DVD, and tape drives.
Hard Disk
Virtually all IDE and SCSI disk drives are supported under Linux. Linux even supports some older ST506 and ESDI controllers. PCMCIA drives are supported. Many software and hardware RAID (Redundant Array of Independent Disks) configurations are supported to provide speed, fault tolerance, and/or very large amounts of disk storage. A full Red Hat 5.2 installation with Powertools and the Gnome sampler, with all source RPMs installed but not unpacked, takes about 2.5GB of disk space.
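The fault tolerance part of RAID is easiest to see with the XOR parity used by parity-based levels such as RAID 5 (mirroring levels work differently, by keeping full copies). The sketch below assumes one parity block per stripe and shows why losing any single disk is survivable: XORing the surviving blocks with the parity block yields the missing one. It illustrates the arithmetic only, not how the Linux md driver lays out stripes.

```python
from functools import reduce
from operator import xor


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks (the parity calculation)."""
    blocks = list(blocks)
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("all blocks in a stripe must be the same size")
    return bytes(reduce(xor, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks, parity):
    """Recover one lost data block by XORing the survivors with the parity."""
    return xor_blocks(list(surviving_blocks) + [parity])
```

For a stripe of three data blocks, `parity = xor_blocks(data)` is written to a fourth disk; if the second disk fails, `rebuild_missing([data[0], data[2]], parity)` returns its contents.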
Removable Disks
More recent versions of the Linux kernel support removable media including Jaz, LS120, Zip, and other drives. Using these drives as boot devices can be somewhat problematic. My attempts to use a Jaz disk as a boot device were thwarted by the fact that the drive apparently destroyed the boot disk about once a month; this may have just been a defective drive. Problems with the LS120 included being unable to use an LS120 disk as a swap device because of incompatible sector sizes. Also be warned that there are software problems in writing a boot disk on removable media on one computer and using it to boot another computer on which the drive lives at a different device address (for example, an LS120 might be the third IDE device on your development system but the first on the system to be booted).
CD-ROM/DVD
Almost all CD-ROM drives will work for data, including IDE, SCSI, and even many older proprietary interface drives. Some parallel port drives also work, particularly the Microsolutions Backpack drives (which can be used to install more recent versions of Red Hat). Some drives will have trouble being used as an audio CD player due to a lack of standardization of those functions; even fewer will be able to retrieve “red book” audio (reading the digital audio data directly off an audio CD into the computer for duplication, processing, or transmission). Linux has support for many CD changers. The eject command has an option to select individual disks from a changer; I found that this worked fine on an NEC 4x4 changer. Recording of CD-R and CD-RW disks is done using the cdrecord program. The UNIX CD-Writer compatibility list at http://www.guug.de:8080/cgi-bin/winni/lsc.pl gives more information on which devices are compatible. Be warned that due to limitations of CD-R drives, writing CDs is best done on very lightly loaded or dedicated machines: even a brief interruption in the data stream will destroy data, and deleting a very large file will cause even fast machines to hiccup momentarily. GUI front ends for burning CDs are available, including BurnIT and X-CD-Roast.
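Why a brief stall is so dangerous becomes clear from the numbers. A 1x CD carries 75 sectors per second of 2048-byte (Mode 1) data, or 153,600 bytes per second, and the drive writes from a small on-board buffer that only covers a few seconds at higher speeds. The sketch below computes how long that buffer lasts; the 2MB buffer size in the example is a hypothetical figure, since drive buffers vary.

```python
CD_1X_BYTES_PER_S = 75 * 2048  # 1x = 75 sectors/s of 2048-byte (Mode 1) data


def required_rate(speed_x: int) -> int:
    """Bytes per second the host must sustain to keep the laser fed."""
    return CD_1X_BYTES_PER_S * speed_x


def buffer_seconds(buffer_bytes: int, speed_x: int) -> float:
    """Seconds the drive can keep writing from its buffer if the host stalls."""
    return buffer_bytes / required_rate(speed_x)


# A hypothetical 2MB drive buffer at 4x:
print(f"{buffer_seconds(2 * 1024 * 1024, 4):.2f} s")  # prints "3.41 s"
```

Doubling the write speed halves that margin, which is why a loaded machine that pauses for a few seconds (for example, while deleting a very large file) can ruin a disc.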
Tape Backup
A wide variety of tape backup devices are supported under Linux, as well as various other types of removable media. Linux has drivers for SCSI, ATAPI (IDE), QIC, floppy, and some parallel port interfaces. I prefer to use SCSI DAT (Digital Audio Tape) drives exclusively even though they can cost as much as a cheap PC. I have used Conner Autochanger DAT drives, and although I could not randomly select a tape in the changer under Linux, each time I ejected a tape the next tape would automatically be loaded. Other autochangers might perform differently.
*Warning
I caution against the use of compression on any tape device; read errors are common and a single error will cause the entire remaining portion of the tape to be unreadable.
External Peripherals
The devices in this section are optional peripherals that are normally installed outside the system unit. From a software perspective, the drivers for these devices usually run in user space instead of kernel space.
Printer
Printer support under Linux is primarily provided by the Ghostscript package (http://www.ghostscript.com/). Support for Canon printers is poor, probably due to Canon’s failure to make technical documentation available. Canon has refused to make documentation available for the BJC-5000 and BJC-7000 lines (which are their only inkjet printers that support resolutions suitable for good quality photographic printing). Most HP printers (and printers that emulate HP printers) are supported, due to HP making documentation available, except for their PPA-based inkjet printers, for which they will not release the documentation. The Canon BJC-5000, Canon BJC-7000, and HP PPA-based printers are all partially brain-dead printers that apparently do not have any onboard fonts and rely on the host computer to do all rasterization. This would not be a problem for Linux systems (except for the unusual case of a real-time system log printer), since Ghostscript is normally used as a rasterizer and the onboard fonts and other features are not used. Some printers may be truly brain dead and not have any onboard CPU; these might use the parallel port in a very nonstandard manner to implement low-level control over the printer hardware. The HP720, HP820Cse, and HP1000 are PPA-based printers. Partial support, in the form of a pbmtoppa conversion utility based on reverse engineering, is available for some PPA printers. Some Lexmark inkjet printers might be supported, but many others are Windows-only printers. I have used a Lexmark Optra R+ laser printer with an Ethernet interface with Linux; it supports the LPD protocol, so it is simply set up as a remote LPD device. A Linux box can act as a print server for Windows clients, or as a client for a Windows printer, by using the Samba package. A Linux box can act as a print server for MacOS clients by using the Netatalk package. A Linux box running the ncpfs package can apparently serve as a print server for NetWare 2.x, 3.x, or 4.x clients with bindery access enabled, or print to a remote NetWare printer. HP printers with JetDirect ethernet interfaces support LPD and will work as remote printers under Linux. Ghostscript can run on almost every operating system that runs on hardware with enough resources to function as a rasterizer.
A single Ghostscript driver (or PBM translator) is sufficient to support a printer on virtually every computer, including those running every UNIX-compatible operating system, MacOS, OS/2, Windows 3.1, Windows 95, Windows 98, Windows NT, and many others. On these operating systems, Ghostscript can coexist with or replace the native printing rasterizer (if there is one), or may already be that rasterizer, and it can integrate with the queuing system on almost all of them. Ghostscript can produce PBM (Portable BitMap) files. Using a PBM translator can avoid various copyright issues, since the translator does not have to be linked into a GPLed program. Therefore, the failure of printer manufacturers to provide Ghostscript drivers or PBM translators is reprehensible.
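To show how little a PBM translator has to know about Ghostscript, here is the front half of one: a reader for raw (P4) PBM files, which Ghostscript can produce through its `pbmraw` output device. It parses the header, skips comments, and unpacks each row of bits (rows are padded to whole bytes, and 1 means black). The printer-specific half, turning these rows into the printer's command stream, is deliberately left out, since it depends on exactly the documentation the manufacturer would have to supply.

```python
def read_pbm_p4(data: bytes):
    """Parse a raw (P4) PBM image into (width, height, rows of 0/1 ints)."""
    tokens, pos = [], 0
    while len(tokens) < 3:  # magic number, width, height
        while data[pos:pos + 1].isspace():
            pos += 1
        if pos >= len(data):
            raise ValueError("truncated PBM header")
        if data[pos:pos + 1] == b"#":  # a comment runs to the end of the line
            pos = data.index(b"\n", pos) + 1
            continue
        start = pos
        while pos < len(data) and not data[pos:pos + 1].isspace():
            pos += 1
        tokens.append(data[start:pos])
    if tokens[0] != b"P4":
        raise ValueError("not a raw PBM file")
    width, height = int(tokens[1]), int(tokens[2])
    pos += 1  # exactly one whitespace byte separates the header from the raster
    row_bytes = (width + 7) // 8  # each row is padded to a whole byte
    if len(data) < pos + height * row_bytes:
        raise ValueError("truncated PBM raster")
    rows = []
    for y in range(height):
        chunk = data[pos + y * row_bytes: pos + (y + 1) * row_bytes]
        # The most significant bit of each byte is the leftmost pixel.
        rows.append([(chunk[x // 8] >> (7 - x % 8)) & 1 for x in range(width)])
    return width, height, rows
```

Because the translator only reads a file (or a pipe) that Ghostscript writes, it runs as a separate program rather than being linked into Ghostscript.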
*Tip
More detailed information on printing under Linux can be found in the Linux Printing HOWTO.
Scanners
Support for scanners is a bit sparse, although close to 100 different models from a couple dozen manufacturers are supported by the SANE package; manufacturers who not only fail to provide drivers themselves but also withhold documentation are culpable for this state of affairs. There have been various projects to support individual or multiple scanners under Linux. These have been eclipsed by the SANE package (http://www.mostang.com/sane/), which no doubt benefited from its predecessors. The name is a play on, and a potshot at, TWAIN, which passes for a standard in the Microsoft world. In TWAIN, the driver itself paints the dialog box that appears when you request a scan. This is not a “sane” way of doing things: it interferes with non-interactive scanning (such as from a command line, a Web CGI script, or production scanning applications), with network sharing of a device, and with making drivers that are portable across many platforms. SANE is able to do all of these things. In SANE, the driver has a list of attributes that can be controlled, and the application sets those attributes (painting a dialog box or parsing arguments as necessary). SANE has been ported to a variety of platforms, including about 18 different flavors of UNIX and OS/2. SANE provides a level of abstraction for the low-level SCSI interfaces, and abstractions are being worked on for a few other OS-specific features (such as fork()) that interfere with portability to some platforms. SANE has not been ported to the Windows and Mac platforms, although there is no reason this can’t be done. Some have questioned the need to do this, because manufacturers ship drivers for these operating systems with most scanners.
However, once SANE has been ported to these operating systems and a TWAIN to SANE shim has been written, there will be no legitimate reason for anyone to ever write another TWAIN driver again as long as the port and shim are distributed under license agreements that allow scanner manufacturers to distribute the software with their products.
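To make the contrast with TWAIN concrete, here is a toy model of the SANE idea: the driver only describes its controllable attributes, and any front end, graphical or not, sets them. All class, option, and function names here are hypothetical, and this is not the real SANE C API (which exposes the same idea through calls such as `sane_get_option_descriptor()` and `sane_control_option()`); it only shows why a describe-then-set driver can be driven from a command line as easily as from a dialog box.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Option:
    """One controllable scanner attribute, as a driver might describe it."""
    name: str
    default: object
    choices: tuple = ()  # empty tuple means any value of the default's type


@dataclass
class ToyScannerDriver:
    """Hypothetical driver: it describes its options and never draws a UI."""
    options: tuple = (
        Option("resolution", 300, (75, 150, 300, 600)),
        Option("mode", "gray", ("lineart", "gray", "color")),
    )
    values: dict = field(default_factory=dict)

    def find_option(self, name):
        for opt in self.options:
            if opt.name == name:
                return opt
        raise KeyError(name)

    def set_option(self, name, value):
        opt = self.find_option(name)
        if opt.choices and value not in opt.choices:
            raise ValueError(f"{name}: {value!r} not in {opt.choices}")
        self.values[name] = value

    def get_option(self, name):
        return self.values.get(name, self.find_option(name).default)


def apply_command_line(driver, args):
    """A non-interactive front end: turn --name=value arguments into settings."""
    for arg in args:
        name, _, raw = arg.removeprefix("--").partition("=")
        default = driver.find_option(name).default
        driver.set_option(name, type(default)(raw))  # convert text to the option's type
```

A graphical front end would instead walk `driver.options` to build its widgets; a network daemon could relay the same option list to a remote client. None of them needs the driver's cooperation to paint anything.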
Digital Cameras
There are Linux programs to handle many hand-held digital cameras. Cameras that store standard JPEG images on compact flash or floppy disks should also work, using those media to transfer the image data. A new application called gPhoto (http://gphoto.fix.no/gphoto/) supports about ten different brands of digital cameras. Some digital cameras may also be supported by the SANE library. Software drivers for a variety of frame grabbers, TV tuners, and the popular QuickCam cameras are available on the Net. Consult the relevant section of the Hardware Compatibility HOWTO for links to these resources.
Home Automation
I will give brief mention to a few gadgets that can be used to control the real world. There are a couple of programs to control the X10 CM11A computer interface module (usually sold as part of the CK11A kit). The X10 system sends carrier current signals over your household or office power lines to control plug-in, wall switch, or outlet modules that switch individual devices on or off. The X10 carrier current protocol is patented but well documented, and the documentation for the computer interface is available on the Net. The CM11A may be superseded by the CM14A by the time this gets into print. Nirvis Systems makes a product called the Slink-e, an RS-232 device used to control stereo and video gear using infrared, Control-S, S-link/Control-A1, and ControlA protocols. It can also receive signals from infrared remotes, which would allow you to write applications that record and replay remote control signals or respond to remote controls (handy for presentations). There is no Linux driver available yet, as far as I know, but the documentation is available from their Web site at http://www.nirvis.com/. Among other things, this unit can control a Sony 200-disk CD changer and not just queue up CDs but actually poll the track position and the disk serial number (other brands of CD players apparently cannot do this); the company supplies a Windows-based CD player application that works with the Internet CD Database. The folks at Nirvis have already done the reverse engineering on some of the protocols.
2
SETTING UP A DEVELOPMENT SYSTEM
Complete Systems
A number of companies specialize in preinstalled Linux systems. VA Research and Linux Hardware Solutions are two popular examples; consult the hardware section at Linux.org for a much more complete list of these vendors. Corel Computer Corp. offers versions of its NetWinder systems (which use StrongARM CPUs) with Linux preinstalled. These are fairly inexpensive systems aimed at the thin client and Web server market. Cobalt Networks offers the Qube, a compact Linux-based server appliance that uses a MIPS processor. It appears that SGI will support Linux on some of its MIPS-based workstations. A few of the major PC brands, including Dell and Hewlett-Packard, have recently announced that they will ship some of their servers or workstations preconfigured with Linux. Compaq is now marketing a number of its systems to the Linux community, although apparently they are not available with Linux preinstalled. IBM has announced that it will support Linux, but apparently it will be up to the authorized reseller to preinstall it. Rumor has it that many other PC brands will announce preinstalled Linux systems by the time this book is printed.
The Linux Programming Toolkit PART I
Laptops
Support for laptops is a bit tricky because laptops have short development cycles, often use very new semiconductors, and the manufacturers rarely provide technical documentation. In spite of this, the Net has information about running Linux on approximately 300 laptop models. Consult the Hardware Compatibility HOWTO document for links to pages with the latest information on support for specific laptop models and features.
NOTE
Linux supports a number of electronic pocket organizers. 3Com’s PalmPilot is the most popular and best supported.
Linux supports Advanced Power Management (APM), but you may run into problems:
• Suspend/resume features may not work correctly.
• The graphics modes may not be restored properly for X. You may have better luck if you switch to a text console before suspending.
• You may need a DOS partition for the suspend-to-disk feature to work.
Some laptops do not allow you to have the floppy drive and the CD-ROM drive present at the same time. This can make installation tricky, although most newer models probably support booting from CD-ROM.
Installation
Installation of Red Hat Linux, which is included on two CDs in the back of this book, is covered in The Official Red Hat Linux Installation Guide. The guide is available in HTML on the Net at ftp://ftp.redhat.com/redhat/redhat-5.2/i386/doc/rhmanual/. It is also on the enclosed Red Hat Linux CD-ROM, in the directory /doc/rhmanual/. If you wish to use a different distribution, consult the documentation that came with that distribution.
I recommend keeping a complete log of the machine configuration, the choices made during the installation, and all commands needed to install any packages you add later. This is a nuisance at first. It becomes very valuable when you want to install a second system, upgrade, or reinstall a system after a crash or a security compromise. Copy this log file offline and/or offsite, or make printouts periodically. I normally log this information as executable shell commands in a file called /root/captains-log, as shown in Listing 2.1. If I edit a file, I record the diffs as a “here document” (see the bash man page) piped into patch. One very important thing to log is where you downloaded code from. I record this as an ncftp or lynx -source command.
LISTING 2.1 SAMPLE CAPTAINS LOG
#################################################################
# First, let's introduce some of the commands we will be using.
# The commands marked with "***" will be covered in detail in
# later chapters. Refer to the bash man page or the man page for
# the individual command.
#
# cat                 - copies its input to its output
# diff                - compares two files ***
# patch               - applies the changes in a diff ***
# ncftp               - ftp client program
# lynx                - text mode web browser
# tar                 - pack and unpack tar archives
# cd                  - change current directory
# make                - drives the compilation process ***
# echo                - display its arguments
# echo hello, world   - says "hello, world"
#
# These are some examples of shell magic; see the bash man page
# for more details:
# #                   - marks a comment line
# foo=bar             - set variable foo equal to bar
# export FOO=bar      - similar, but subprocesses will inherit value
# echo ${foo}         - substitute the value of foo into the command
# xxx | yyy           - pipe output of command xxx into command yyy
# xxx >yyy            - redirect the output of command xxx to file yyy
# xxx >>yyy           - same, but append to file yyy
# xxx <yyy            - redirect input of command xxx from file yyy
# xxx\                - line continuation character "\"
# yyy                 - .. continuation of above line, i.e. xxxyyy
# xxx <<\...EOF...    - "here document" - runs the program xxx
# line1               - .. taking input from the following
# line2               - .. lines in the script up to the line
# ...EOF...           - .. which begins with "...EOF..."

###
### Gnozzle
###
# This is a sample captains-log entry to install
# a fictitious package called gnozzle
# datestamp produced using "date" command:
# Mon Feb 22 21:39:26 EST 1999

# download it
cd /dist
ncftp -r -D ftp://ftp.gnozzle.com/pub/gnozzle-0.63.tar.gz
# or...
#lynx -source http://www.gnozzle.com/gnozzle-0.63.tar.gz \
#    >gnozzle-0.63.tar.gz

# Here we unpack the tarball, after first checking
# the directory structure
cd /usr/local/src
tar ztvf /dist/gnozzle-0.63.tar.gz
tar zxvf /dist/gnozzle-0.63.tar.gz
cd gnozzle-0.63/

# Here we create a permanent record of changes we
# made using a text editor as a patch command. In this
# case, we changed the values of CC and PREFIX in the
# file Makefile. The patch has one hunk which spans
# lines 1 through 7.
#
# the following patch was made like this:
# cp Makefile Makefile.orig
# emacs Makefile
# diff -u Makefile.orig Makefile
# beware of mangled whitespace (especially tabs) when
# cutting and pasting.
patch Makefile <<\...END.OF.PATCH...
--- Makefile.orig Mon Feb 22 21:12:41 1999
+++ Makefile Mon Feb 22 21:13:14 1999
@@ -1,7 +1,7 @@
 VERSION=0.63
-CC=pcc
+CC=gcc
 CFLAGS=-g
-PREFIX=/usr
+PREFIX=/usr/local
 BIN=$(PREFIX)/bin
 LIB=$(PREFIX)/bin
 MAN=$(PREFIX)/man/man1
...END.OF.PATCH...

# Here we build the program and install it
make clean
make
make -n install    # see what it would do first
make install

# Here we create a new file with a couple lines of text
cat >/etc/gnozzle.conf <<\...EOF...
gnozzlelib=/usr/local/lib/gnozzle
allow
...EOF...

# Here, we append a couple lines to the magic file,
# which is used by the file command to
# guess the type of a file, to add the characteristic
# signature of a gnozzle data file.
cat >>/usr/share/magic <<\...EOF...
# gnozzle
0 long 0xFEDCBA98 Gnozzle data file
...EOF...

###
### Here are some more commands which are useful to create
### a logfile of everything which was done, from which you
### can extract pertinent details for the captains log.
### Their effect may not be apparent unless you have a
### Linux box up and running and try them.
###
# # the script command runs another shell with all output
# # redirected to a file. Not to be confused with a
# # "shell script" which is a sequence of commands
# # to be executed by the shell. Do not include "script"
# # commands in a "shell script".
# script install_101.log
# PS4=+++
# set -v -x
# ...
# # do diffs so they can easily be extracted from log:
# diff -u Makefile.orig Makefile | sed -e "s/^/+++ /"
# ...
# ^D (control-D - end script, and shell)
# fgrep +++ install_101.log | sed -e s/^+++//
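The gnozzle entry that Listing 2.1 appends to /usr/share/magic (offset 0, type long, value FEDCBA98) tells the file command how to recognize a gnozzle data file. The following C sketch performs the same test. It assumes the magic(5) meaning of the long type: a four-byte value read in this machine’s native byte order. The gnozzle format, the GNOZZLE_MAGIC constant, and the is_gnozzle_data() function are all hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical signature of the fictitious gnozzle data format,
 * mirroring the value in the magic entry. */
#define GNOZZLE_MAGIC 0xFEDCBA98u

/* Returns 1 if the first four bytes of buf, read in this machine's
 * native byte order (as file's "long" type does), equal GNOZZLE_MAGIC;
 * returns 0 otherwise. */
int is_gnozzle_data(const unsigned char *buf, size_t len)
{
    uint32_t value;

    if (len < sizeof value)
        return 0;                      /* too short to hold the signature */
    memcpy(&value, buf, sizeof value); /* copying avoids unaligned access */
    return value == GNOZZLE_MAGIC;
}
```

A format meant to move between machines would more likely use magic’s belong or lelong types, which fix the byte order.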
If you purchased a machine that has Windows 98 preinstalled, boot Windows and examine the resource settings (I/O, IRQ, and DMA) for all installed hardware. This information can be very valuable during the Linux installation. Doing so may, however, prevent you from refusing the terms of the Windows 98 License Agreement and returning it for a refund.
After you have completed the installation program, you may need to do some other things that are outlined in the Post Installation section of the Red Hat Manual. There are two steps that I normally do first, however. First, I reorganize the disk layout to undo the concessions I made to accommodate the limitations of the Red Hat install program. You may or may not wish to do this; the install program also has problems during upgrades. Second, I use a script to disable all unwanted daemons. (I tell the install program to enable all daemons to preserve the information about the starting sequence.) Disabling unnecessary services is one of the most important and simplest things you can do to secure your computer from attack. The scripts to accomplish both of these tasks are available on my Linux Web pages.
After installation, you may wish to upgrade or install software packages that Red Hat does not include for export control or licensing reasons. You may wish to upgrade Netscape to a version that supports 128-bit encryption. You may want to install Adobe Acrobat Reader to handle PDF files. You may wish to install SSH to permit secure, encrypted remote logins, file transfers, and remote program execution; use of SSH may require a license fee for some commercial uses. You may wish to upgrade the Web server to support SSL. And you will probably want to download any upgrades, particularly security-related ones, from the Red Hat FTP site.
Next, you may wish to install any additional applications you know you will need. To locate RPM versions of these applications, consult the RPM database at http://rufus.w3.org/. If you are concerned about security, you may not want to install any binary packages except from a few well-trusted sources. Instead, inspect and then install from source RPMs or the original source tarballs (archives created with the tar program).
Summary
Careful selection of hardware will simplify installation. This will become less of an issue as the marketplace pushes more manufacturers to act responsibly. They can do so by releasing documentation for their products or, better yet, by providing direct support for Linux. Also, as the distributions become more robust, they support more types and makes of hardware. As a general rule of thumb, confirm support for your hardware using the sources available on the Internet, the HOWTOs, and SuSE’s hardware database. This can save you many headaches and frustrations during your install. When installing your development system, document it as you go, lest you find yourself reinventing the wheel. Once you have one system up and running, you may wish to experiment with hardware that has less stable support.
Using GNU cc
by Kurt Wall
CHAPTER 3
IN THIS CHAPTER
• Features of GNU cc
• A Short Tutorial
• Common Command-line Options
• Optimization Options
• Debugging Options
• GNU C Extensions
GNU cc (gcc) is the GNU project’s compiler suite. It compiles programs written in C, C++, or Objective-C. gcc also compiles Fortran (under the auspices of g77). Front-ends for Pascal, Modula-3, Ada 9X, and other languages are in various stages of development. Because gcc is the cornerstone of almost all Linux development, I will discuss it in some depth. The examples in this chapter (indeed, throughout the book unless noted otherwise) are based on gcc version 2.7.2.3.
Features of GNU cc
gcc gives the programmer extensive control over the compilation process. The compilation process includes up to four stages:
• Preprocessing
• Compilation proper
• Assembly
• Linking
You can stop the process after any of these stages to examine the compiler’s output at that stage. gcc can also handle the various C dialects, such as ANSI C or traditional (Kernighan and Ritchie) C. As noted above, gcc happily compiles C++ and Objective-C. You can control the amount and type of debugging information, if any, to embed in the resulting binary. Like most compilers, gcc can also perform code optimization. gcc allows you to mix debugging information and optimization. I strongly discourage doing so, however, because optimized code is hard to debug. Static variables may vanish or loops may be unrolled, so that the optimized program does not correspond line-for-line with the original source code.
gcc includes over 30 individual warnings and three “catch-all” warning levels. gcc can also be configured as a cross-compiler, so you can develop code on one processor architecture that will be run on another. Finally, gcc sports a long list of extensions to C and C++. Most of these extensions enhance performance, assist the compiler’s efforts at code optimization, or make your job as a programmer easier. The price, however, is portability. I will mention some of the most common extensions because you will encounter them in the kernel header files, but I suggest you avoid them in your own code.
A Short Tutorial
Before beginning an in-depth look at gcc, a short example will help you start using gcc productively right away. For the purposes of this example, we will use the program in Listing 3.1.
Slide 41: Using GNU cc CHAPTER 3
LISTING 3.1 CANONICAL PROGRAM TO DEMONSTRATE gcc USAGE
/*
 * Listing 3.1
 * hello.c - Canonical "Hello, world!" program
 */
#include <stdio.h>

int main(void)
{
    fprintf(stdout, "Hello, Linux programming world!\n");
    return 0;
}
To compile and run this program, type
$ gcc hello.c -o hello
$ ./hello
Hello, Linux programming world!
The first command tells gcc to compile and link the source file hello.c. It creates an executable named hello; the name is specified using the -o argument. The second command executes the program, resulting in the output on the third line. A lot took place under the hood that you did not see. gcc first ran hello.c through the preprocessor, cpp, to expand any macros and insert the contents of #included files. Next, it compiled the preprocessed source code to object code. Finally, the linker, ld, created the hello binary.
3
USING GNU CC
You can re-create these steps manually, stepping through the compilation process. To tell gcc to stop compilation after preprocessing, use gcc’s -E option:
$ gcc -E hello.c -o hello.cpp
Examine hello.cpp and you can see the contents of stdio.h have indeed been inserted into the file, along with other preprocessing tokens. The next step is to compile hello.cpp to object code. Use gcc’s -c option to accomplish this:
$ gcc -x cpp-output -c hello.cpp -o hello.o
Strictly speaking, the -o hello.o argument is optional here. When compiling with -c, gcc names the object file after the input file, replacing its suffix with .o. The -x option tells gcc the language of the input file. The value cpp-output means C source that has already been preprocessed, so gcc skips the preprocessor and begins compilation at that step. The option is needed here because gcc would otherwise treat a file ending in .cpp as C++ source. How does gcc know how to deal with a particular kind of file? It relies on file extensions to determine how to process a file correctly. The most common extensions and their interpretation are listed in Table 3.1.
TABLE 3.1 HOW gcc INTERPRETS FILENAME EXTENSIONS
Extension   Type
.c          C language source code
.C, .cc     C++ language source code
.i          Preprocessed C source code
.ii         Preprocessed C++ source code
.S, .s      Assembly language source code
.o          Compiled object code
.a, .so     Compiled library code
Linking the object file, finally, creates a binary:
$ gcc hello.o -o hello
As you can see, it is far simpler to use the “abbreviated” syntax we used above, gcc hello.c -o hello. I walked through the example step by step to show that you can stop and start compilation at any step, should the need arise. One situation in which you would want to step through compilation is when you are creating libraries. In this case, you only want to create object files, so the final link step is unnecessary. Another circumstance in which you would want to walk through the compilation process is when an #included file introduces conflicts with your own code or perhaps with another #included file. Being able to step through the process will make it clearer which file is introducing the conflict.
Most C programs consist of multiple source files, and each source file must be compiled to object code before the final link step. This requirement is easily met. Suppose, for example, you are working on killerapp.c, which uses code from helper.c. To compile both files and link them into a single program, use the following command:
$ gcc killerapp.c helper.c -o killerapp
gcc goes through the same preprocess-compile-link steps as before. This time it creates object files for each source file before creating the binary, killerapp. Typing long commands like this does become tedious. In Chapter 4, “Project Management Using GNU make,” we will see how to solve this problem. The next section introduces gcc’s many command-line options.
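For illustration only, here is what a hypothetical helper.c might contain. It has no main() of its own. killerapp.c (not shown) would call count_words() through a prototype, typically placed in a header such as helper.h. The single gcc command shown earlier would compile both files and link the resulting object code into killerapp. The count_words() function is invented for this sketch.

```c
/* helper.c - hypothetical helper module used by killerapp.c.
 * killerapp.c would see this prototype through helper.h:
 *     int count_words(const char *s);
 */
#include <ctype.h>

/* Counts whitespace-separated words in the string s. */
int count_words(const char *s)
{
    int count = 0;
    int in_word = 0;

    for (; *s != '\0'; s++) {
        if (isspace((unsigned char) *s)) {
            in_word = 0;          /* a separator ends the current word */
        } else if (!in_word) {
            in_word = 1;          /* first character of a new word */
            count++;
        }
    }
    return count;
}
```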
Common Command-line Options
The list of command-line options gcc accepts runs to several pages, so we will only look at the most common ones, listed in Table 3.2.
TABLE 3.2 gcc COMMAND-LINE OPTIONS
Option             Description
-o FILE            Specify the output filename; not necessary when compiling to object code. If -o is not specified, an executable is named a.out.
-c                 Compile without linking.
-DFOO=BAR          Define a preprocessor macro named FOO with a value of BAR on the command line.
-IDIRNAME          Prepend DIRNAME to the list of directories searched for include files.
-LDIRNAME          Prepend DIRNAME to the list of directories searched for library files.
-static            Link against static libraries. By default, gcc links against shared libraries.
-lFOO              Link against libFOO.
-g                 Include standard debugging information in the binary.
-ggdb              Include lots of debugging information in the binary that only the GNU debugger, gdb, can understand.
-O                 Optimize the compiled code.
-ON                Specify an optimization level N, 0 <= N <= 3.
-ansi              Support the ANSI/ISO C standard, turning off GNU extensions that conflict with the standard (this option does not guarantee ANSI-compliant code).
-pedantic          Emit all warnings required by the ANSI/ISO C standard.
-pedantic-errors   Emit all errors required by the ANSI/ISO C standard.
-traditional       Support the Kernighan and Ritchie C language syntax (such as the old-style function definition syntax). If you don’t understand what this means, don’t worry about it.
-w                 Suppress all warning messages. In my opinion, using this switch is a very bad idea!
-Wall              Emit all generally useful warnings that gcc can provide. Specific warnings can also be flagged using -W{warning}.
-Werror            Convert all warnings into errors, which will stop the compilation.
-MM                Output a make-compatible dependency list.
-v                 Show the commands used in each step of compilation.
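This sketch shows how a macro defined with -D reaches your code. It uses a hypothetical file, verbosity.c. The VERBOSITY macro and the should_log() function are invented for illustration. The #ifndef guard supplies a default value when no -D option is given on the command line.

```c
/* verbosity.c - hypothetical example of a command-line macro.
 * Compiling with
 *     gcc -DVERBOSITY=2 -c verbosity.c
 * defines VERBOSITY as 2; a bare -DVERBOSITY defines it as 1.
 * Without any -D option, the fallback below applies. */
#ifndef VERBOSITY
#define VERBOSITY 0
#endif

/* Returns nonzero if a message at the given level should be emitted. */
int should_log(int level)
{
    return level <= VERBOSITY;
}
```

Because the value is fixed at compile time, you must recompile the file to change it.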
We have already seen how -c works, but -o needs a bit more discussion. -o FILE tells gcc to place output in the file FILE regardless of the output being produced. If you do not specify -o, the defaults for an input file named FILE.SUFFIX are to put an executable in a.out, object code in FILE.o, and assembler code in FILE.s. Preprocessor output goes to standard output.
Library and Include Files
If you have library or include files in non-standard locations, the -L{DIRNAME} and -I{DIRNAME} options allow you to specify these locations. They also ensure that these locations are searched before the standard locations. For example, suppose you store custom include files in /usr/local/include/killerapp. For gcc to find them, your gcc invocation would be something like
$ gcc someapp.c -I/usr/local/include/killerapp
Similarly, suppose you are testing a new programming library, libnew.so, currently stored in /home/fred/lib, before installing it as a standard system library. (.so is the normal extension for shared libraries; there is more on this subject in Chapter 24, “Using Libraries.”) Suppose also that the header files are stored in /home/fred/include. Accordingly, to link against libnew.so and to help gcc find the header files, your gcc command line should resemble the following:
$ gcc myapp.c -L/home/fred/lib -I/home/fred/include -lnew
The -l option tells the linker to pull in object code from the specified library. In this example, I wanted to link against libnew.so. A long-standing UNIX convention is that libraries are named lib{something}, and gcc, like most compilers, relies on this convention. Thus -lnew tells the linker to search for a library named libnew. If you fail to use the -l option when linking against libraries, the link step will fail and gcc will complain about undefined references to “function_name.”
By default, gcc uses shared libraries, so if you must link against static libraries, you have to use the -static option. This means that only static libraries will be used. The following example creates an executable linked against the static version of the ncurses library. (Chapter 27, “Screen Manipulation with ncurses,” discusses user interface programming with ncurses.)
$ gcc cursesapp.c -lncurses -static
When you link against static libraries, the resulting binary is much larger than one that uses shared libraries. Why use a static library, then? One common reason is to guarantee that users can run your program. With shared libraries, the code your program needs is linked dynamically at runtime rather than statically at compile time. If a shared library your program requires is not installed on the user’s system, she will get errors and will not be able to run your program.
The Netscape browser is a perfect example of this. Netscape relies heavily on Motif, an expensive X programming toolkit. Most Linux users cannot afford to install Motif on their systems, so Netscape actually installs two versions of its browser on your system. One, netscape-dynMotif, is linked against shared libraries; the other, netscape-statMotif, is statically linked. The netscape “executable” itself is actually a shell script. It checks whether you have the Motif shared library installed and launches one or the other of the binaries as necessary.
Error Checking and Warnings
gcc boasts a whole class of error-checking, warning-generating command-line options. These include -ansi, -pedantic, -pedantic-errors, and -Wall. To begin with, -pedantic tells gcc to issue all warnings demanded by strict ANSI/ISO standard C. As a result, gcc flags programs that use forbidden extensions, such as those supported by gcc. -pedantic-errors behaves similarly, except that it emits errors rather than warnings, so such programs are rejected. -ansi, finally, turns off GNU extensions that do not comply with the standard. None of these options, however, guarantees that your code is 100 percent ANSI/ISO-compliant, even if it compiles without error using any or all of them.
Consider Listing 3.2, an example of very bad programming form. It declares main() as returning void, when in fact main() returns int, and it uses the GNU extension long long to declare a 64-bit integer.
LISTING 3.2 NON-ANSI/ISO SOURCE CODE
/*
 * Listing 3.2
 * pedant.c - use -ansi, -pedantic or -pedantic-errors
 */
#include <stdio.h>

void main(void)
{
    long long int i = 0l;
    fprintf(stdout, "This is a non-conforming C program\n");
}
Using gcc pedant.c -o pedant, this code compiles without complaint. First, try to compile it using -ansi:
$ gcc -ansi pedant.c -o pedant
Again, no complaint. The lesson here is that -ansi only turns off GNU extensions that conflict with the standard. It does not make gcc emit every diagnostic the standard requires, and it does not ensure that your code is ANSI C-compliant. The program compiled despite the deliberately incorrect declaration of main(). Now, try -pedantic:
$ gcc -pedantic pedant.c -o pedant
pedant.c: In function `main':
pedant.c:9: warning: ANSI C does not support `long long'
The code compiles, despite the emitted warning. With -pedantic-errors, however, it does not compile. gcc stops after emitting the error diagnostic:
$ gcc -pedantic-errors pedant.c -o pedant
pedant.c: In function `main':
pedant.c:9: ANSI C does not support `long long'
$ ls
a.out*       hello.c      helper.h     killerapp.c
hello*       helper.c     killerapp*   pedant.c
To reiterate, the -ansi, -pedantic, and -pedantic-errors compiler options do not insure ANSI/ISO-compliant code. They merely help you along the road. It is instructive to point out the remark in the info file for gcc on the use of -pedantic: “This option is not intended to be useful; it exists only to satisfy pedants who would otherwise claim that GNU CC fails to support the ANSI standard. Some users try to use `-pedantic’ to check programs for strict ANSI C conformance. They soon find that it does not do quite what they want: it finds some non-ANSI practices, but not all—only those for which ANSI C requires a diagnostic.”
Optimization Options
Code optimization is an attempt to improve performance. The trade-off is lengthened compile times and increased memory usage during compilation. The bare -O option tells gcc to reduce both code size and execution time. It is equivalent to -O1. The types of optimization performed at this level depend on the target processor, but always include at least thread jumps and deferred stack pops. Thread jump optimizations attempt to reduce the number of jump operations; deferred stack pops occur when the compiler lets arguments accumulate on the stack as functions return and then pops them simultaneously, rather than popping the arguments piecemeal as each called function returns.
-O2 level optimizations include all first-level optimizations plus additional tweaks that involve processor instruction scheduling. At this level, the compiler tries to keep the processor supplied with instructions to execute. It does this while the processor waits for the results of other instructions or for data to arrive from cache or main memory. The implementation is highly processor-specific. -O3 includes all -O2 optimizations and adds function inlining (-finline-functions). Loop unrolling is not turned on by any -O level; you must request it with -funroll-loops.
If you have enough low-level knowledge about a given CPU family, you can use the -f{flag} option to request specific optimizations. Three of these flags bear consideration: -ffast-math, -finline-functions, and -funroll-loops.
• -ffast-math generates floating-point math optimizations that increase speed but violate IEEE and/or ANSI standards.
• -finline-functions expands all “simple” functions in place, much like preprocessor macro replacements. Of course, the compiler decides what constitutes a simple function.
• -funroll-loops instructs gcc to unroll all loops whose number of iterations is fixed and can be determined at compile time.
Inlining and loop unrolling can greatly improve a program’s execution speed. They avoid the overhead of function calls and of loop-control tests and branches. The cost is usually a large increase in the size of the binary or object files. You will have to experiment to see if the increased speed is worth the increased file size. See the gcc info pages for more details on processor flags.
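The following hand-written sketch illustrates what loop unrolling means. It is not gcc’s actual output. The function names sum8 and sum8_unrolled are invented for this example. With -O2 -funroll-loops, gcc may perform a similar transformation on sum8 by itself, but the exact result depends on the compiler version and target.

```c
#include <stddef.h>

/* A loop whose trip count (8) is known at compile time: the kind of
 * loop -funroll-loops is allowed to unroll. */
int sum8(const int a[8])
{
    int total = 0;

    for (size_t i = 0; i < 8; i++)
        total += a[i];
    return total;
}

/* The same computation unrolled by hand by a factor of 4: the loop
 * body runs twice instead of eight times, so there are fewer loop
 * tests and branches, at the cost of more code. */
int sum8_unrolled(const int a[8])
{
    int total = 0;

    for (size_t i = 0; i < 8; i += 4) {
        total += a[i];
        total += a[i + 1];
        total += a[i + 2];
        total += a[i + 3];
    }
    return total;
}
```

Both functions compute the same result. The unrolled version trades a larger body for less loop overhead, which is the size-versus-speed trade-off described above.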
NOTE
For general usage, -O2 optimization is sufficient. Even on small programs, like the hello.c program introduced at the beginning of this chapter, you will see small reductions in code size and small improvements in execution time.
Debugging Options
Bugs are as inevitable as death and taxes. To accommodate this sad reality, use gcc’s -g and -ggdb options to insert debugging information into your compiled programs to facilitate debugging sessions. The -g option can be qualified with a 1, 2, or 3 to specify how much debugging information to include. The default level is 2 (-g2), which includes extensive symbol tables, line numbers, and information about local and external variables. Level 3 debugging information includes all of the level 2 information and all of the macro definitions present. Level 1 generates just enough information to create backtraces and stack dumps. It does not generate debugging information for local variables or line numbers.
You may intend to use the GNU Debugger, gdb (covered in Chapter 36, “Debugging: GNU gdb”). If so, the -ggdb option creates extra information that eases the debugging chore under gdb. However, this will also likely make the program impossible to debug using other debuggers, such as the dbx debugger common on the Solaris operating system. -ggdb accepts the same level specifications as -g, and they have the same effects on the debugging output. Using either of the two debug-enabling options will, however, dramatically increase the size of your binary. Simply compiling and linking the simple hello.c program I used earlier in this chapter resulted in a binary of 4089 bytes on my system. The resulting sizes when I compiled it with the -g and -ggdb options may surprise you:
$ gcc -g hello.c -o hello_g
$ ls -l hello_g
-rwxr-xr-x   1 kwall    users        6809 Jan 12 15:09 hello_g*
$ gcc -ggdb hello.c -o hello_ggdb
$ ls -l hello_ggdb
-rwxr-xr-x   1 kwall    users      354867 Jan 12 15:09 hello_ggdb*
As you can see, the -g option increased the binary’s size by about two-thirds (from 4089 to 6809 bytes). The -ggdb option bloated it to nearly 87 times its original size! Despite the size increase, I recommend shipping binaries with standard debugging symbols (created using -g) in them. That way, if someone encounters a problem, they can try to debug your code for you.
Additional debugging options include the -p and -pg options, which embed profiling information into the binary. This information is useful for tracking down performance bottlenecks in your code. -p adds profiling symbols that the prof program can read. -pg adds symbols that the GNU project’s prof incarnation, gprof, can interpret. The -a option generates counts of how many times blocks of code (such as functions) are entered. -save-temps saves the intermediate files generated during compilation, such as the preprocessed, assembler, and object files.
Finally, as I mentioned at the beginning of this chapter, gcc allows you simultaneously to optimize your code and insert debugging information. Optimized code presents a debugging challenge, however, because variables you declare and use may not be used in the optimized program, flow control may branch to unexpected places, statements that compute constant values may not execute, and statements inside loops will execute elsewhere because the loop was unrolled. My personal preference, though, is to debug a program thoroughly before worrying about optimization. Your mileage may vary.
NOTE
Do not, however, take “optimize later” to mean “ignore efficiency during the design process.” Optimization, in the context of this chapter, refers to the compiler magic I have discussed in this section. Good design and efficient algorithms have a far greater impact on overall performance than any compiler optimization ever will. Indeed, if you take the time up front to create a clean design and use fast algorithms, you may not need to optimize, although it never hurts to try.
GNU C Extensions
GNU C extends the ANSI standard in a variety of ways. If you don’t mind writing blatantly non-standard code, some of these extensions can be very useful. For all of the gory details, I will direct the curious reader to gcc’s info pages. The extensions covered in this section are the ones frequently seen in Linux’s system headers and source code. To provide 64-bit storage units, for example, gcc offers the “long long” type:
long long long_int_var;
NOTE
The “long long” type exists in the new draft ISO C standard.
On the x86 platform, this definition results in a 64-bit memory location named long_int_var. Another gcc-ism you will encounter in Linux header files is the use of inline functions. Provided it is short enough, an inline function expands in your code much as a macro does, thus eliminating the cost of a function call. Inline functions are better than macros, however, because the compiler type-checks them at compile time. For gcc to actually expand inline functions in place, you have to compile with at least -O optimization.
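The following sketch shows why inline functions are preferable to macros. It places a macro and an inline function that compute the same value side by side. The names SQUARE_MACRO and square are invented for this example. It assumes gcc’s default GNU C mode, in which inline is a keyword. The gcc documentation recommends spelling it __inline__ in headers meant for strict ISO C90 (-ansi) programs.

```c
/* Macro version: plain text substitution with no type checking.
 * The argument appears twice in the expansion, so it is evaluated
 * twice. */
#define SQUARE_MACRO(x) ((x) * (x))

/* Inline function version: the argument is type-checked against int
 * and evaluated exactly once. gcc substitutes the body at the call
 * site only when optimizing (-O or higher); otherwise it compiles an
 * ordinary function. */
static inline int square(int x)
{
    return x * x;
}
```

square(i++) increments i exactly once. SQUARE_MACRO(i++) would modify i twice in one expression, which is undefined behavior in C. Passing a pointer to square() draws a compiler diagnostic, while the macro would accept anything that supports the * operator.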
The Linux Programming Toolkit PART I
The __attribute__ keyword tells gcc more about your code and aids the code optimizer. Standard library functions such as exit() and abort() never return, so the compiler can generate slightly more efficient code if it knows that a function does not return. Of course, userland programs may also define functions that do not return. gcc allows you to specify the noreturn attribute for such functions, which acts as a hint that lets the compiler optimize the code that calls them. Suppose, for example, you have a function named die_on_error() that never returns. To use a function attribute, append __attribute__ ((attribute_name)) after the closing parenthesis of the function declaration. Thus, the declaration of die_on_error() would look like:
void die_on_error(void) __attribute__ ((noreturn));
The function would be defined normally:
void die_on_error(void)
{
    /* your code here */
    exit(1);
}
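The following sketch, assuming gcc, pairs the declaration above with a hypothetical caller, checked_divide(). Because the compiler knows die_on_error() never returns, it does not warn (under -Wall) that checked_divide() can reach its closing brace without returning a value, and it need not generate a return path after the call.

```c
#include <stdio.h>
#include <stdlib.h>

/* Declaration with the noreturn attribute, as in the text. */
void die_on_error(void) __attribute__ ((noreturn));

/* Hypothetical definition: report the problem and exit; never returns. */
void die_on_error(void)
{
    fputs("fatal error\n", stderr);
    exit(1);
}

/* Hypothetical caller. No "control reaches end of non-void function"
 * warning is issued, because die_on_error() is marked noreturn. */
int checked_divide(int numerator, int denominator)
{
    if (denominator != 0)
        return numerator / denominator;
    die_on_error();
}
```

checked_divide(10, 2) returns 5; a zero denominator prints a message and exits with status 1.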
You can also apply attributes to variables. The aligned attribute instructs the compiler to align the variable’s memory location on a specified byte boundary.
int int_var __attribute__ ((aligned (16))) = 0;
will cause gcc to align int_var on a 16-byte boundary. The packed attribute tells gcc to use the minimum amount of space required for variables or structs. Used with structs, packed will remove any padding that gcc would ordinarily insert for alignment purposes. A terrifically useful extension is case ranges. The syntax looks like:
case LOWVAL ... HIVAL:
Note that the spaces preceding and following the ellipsis are required. Case ranges are used in switch() statements to specify values that fall between LOWVAL and HIVAL, inclusive:
switch(int_var) {
case 0 ... 2:
    /* your code here */
    break;
case 3 ... 5:
    /* more code here */
    break;
default:
    /* default code here */
    break;
}
Using GNU cc CHAPTER 3
The preceding fragment is equivalent to:
switch(int_var) {
case 0:
case 1:
case 2:
    /* your code here */
    break;
case 3:
case 4:
case 5:
    /* more code here */
    break;
default:
    /* default code here */
    break;
}
Case ranges are just a shorthand notation for the traditional switch() statement syntax.
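This sketch, assuming gcc on x86 or x86-64, exercises the aligned and packed attributes and case ranges described in this section. The struct names, int_var_is_aligned(), and classify() are hypothetical. On such systems, packing removes the three bytes of padding gcc would otherwise insert between a char and an int.

```c
#include <stddef.h>   /* offsetof */
#include <stdint.h>   /* uintptr_t */

/* Variable aligned on a 16-byte boundary (note the parentheses). */
int int_var __attribute__ ((aligned (16))) = 0;

/* Hypothetical structs: same members, with and without packing. */
struct padded     { char tag; int value; };
struct packed_rec { char tag; int value; } __attribute__ ((packed));

/* Returns nonzero if int_var's address is a multiple of 16. */
int int_var_is_aligned(void)
{
    return ((uintptr_t)&int_var % 16) == 0;
}

/* Hypothetical classifier using case ranges (bounds are inclusive).
 * Returns 1 for 0..2, 2 for 3..5, and 0 for anything else. */
int classify(int n)
{
    switch (n) {
    case 0 ... 2:
        return 1;
    case 3 ... 5:
        return 2;
    default:
        return 0;
    }
}
```

classify(4) falls in the 3 ... 5 range and returns 2; classify(6) matches neither range and returns 0.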
Summary
In this chapter, I have introduced you to gcc, the GNU compiler suite. In reality, I have only scratched the surface, though; gcc’s own documentation runs to several hundred pages. What I have done is show you enough of its features and capabilities to enable you to start using it in your own development projects.
Project Management Using GNU make
CHAPTER 4
by Kurt Wall
IN THIS CHAPTER
• Why make?
• Writing Makefiles
• More About Rules
• Additional make Command-line Options
• Debugging make
• Common make Error Messages
• Useful Makefile Targets
In this chapter, we take a long look at make, a tool to control the process of building (or rebuilding) software. make automates what software gets built, how it gets built, and when it gets built, freeing the programmer to concentrate on writing code.
Why make?
For all but the simplest software projects, make is essential. In the first place, projects composed of multiple source files typically require long, complex compiler invocations. make simplifies this by storing these difficult command lines in the makefile, which the next section discusses.
make also minimizes rebuild times because it is smart enough to determine which files have changed, and thus only rebuilds files whose components have changed. Finally, make maintains a database of dependency information for your projects and so can verify that all of the files necessary for building a program are available each time you start a build.
Writing Makefiles
So, how does make accomplish these magical feats? By using a makefile. A makefile is a text file database containing rules that tell make what to build and how to build it. A rule consists of the following:
• A target, the “thing” make ultimately tries to create
• A list of one or more dependencies, usually files, required to build the target
• A list of commands to execute in order to create the target from the specified dependencies
When invoked, GNU make looks for a file named GNUmakefile, makefile, or Makefile, in that order. For some reason, most Linux programmers use the last form, Makefile.
Makefile rules have the general form
target : dependency dependency [...]
	command
	command
	[...]
WARNING
The first character in a command must be the tab character; eight spaces will not suffice. This often catches people unaware, and can be a problem if your preferred editor “helpfully” translates tabs to eight spaces. If you try to use spaces instead of a tab, make displays the message “Missing separator” and stops.
target is generally the file, such as a binary or object file, that you want created. dependency is a list of one or more files required as input in order to create target. The commands are the steps, such as compiler invocations, necessary to create target. Unless specified otherwise, make does all of its work in the current working directory.
If this is all too abstract for you, I will use Listing 4.1 as an example. It is the makefile for building a text editor imaginatively named editor.
LISTING 4.1 SIMPLE MAKEFILE ILLUSTRATING TARGETS, DEPENDENCIES, AND COMMANDS
1   editor : editor.o screen.o keyboard.o
2   	gcc -o editor editor.o screen.o keyboard.o
3
4   editor.o : editor.c editor.h keyboard.h screen.h
5   	gcc -c editor.c
6
7   screen.o : screen.c screen.h
8   	gcc -c screen.c
9
10  keyboard.o : keyboard.c keyboard.h
11  	gcc -c keyboard.c
12
13  clean :
14  	rm editor *.o
To compile editor, you would simply type make in the directory where the makefile exists. It’s that simple. This makefile has five rules. The first target, editor, is called the default target—this is the file that make tries to create. editor has three dependencies, editor.o, screen.o, and keyboard.o; these three files must exist in order to build editor. Line 2 (the line numbers do not appear in the actual makefile; they are merely pedagogic tools) is the command that make will execute to create editor. As you recall from Chapter 3, “Using GNU cc,” this command builds an executable named editor from the three object files. The next three rules (lines 4–11) tell make how to build the individual object files.
Here is where make’s value becomes evident: ordinarily, if you tried to build editor using the command from line 2, gcc would complain loudly and ceremoniously quit if the dependencies did not exist. make, on the other hand, after seeing that editor requires these other files, verifies that they exist and, if they don’t, executes the commands on lines 5, 8, and 11 first, then returns to line 2 to create the editor executable. Of course, if the dependencies for the components, such as keyboard.c or screen.h, don’t exist, make will also give up, because it lacks targets named, in this case, keyboard.c and screen.h.
“All well and good,” you’re probably thinking, “but how does make know when to rebuild a file?” The answer is stunningly simple: If a specified target does not exist in a place where make can find it, make (re)builds it. If the target does exist, make compares the timestamp on the target to the timestamp of the dependencies. If one or more of the dependencies is newer than the target, make rebuilds the target, assuming that the newer dependency implies some code change that must be incorporated into the target.
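The rule just described can be modeled in a few lines of C. This is a simplified sketch, not make’s actual implementation: needs_rebuild() is a hypothetical helper that uses POSIX stat() to compare modification times in whole seconds, and, unlike make, it does not try to build a missing dependency first; it simply reports one by returning -1.

```c
#include <sys/stat.h>

/* Hypothetical helper: decide whether make would rebuild `target`.
 * Returns 1 if the target is missing or any dependency is newer,
 * 0 if the target is up to date, and -1 if a dependency is missing. */
int needs_rebuild(const char *target, const char *deps[], int ndeps)
{
    struct stat t, d;
    int rebuild = 0;

    if (stat(target, &t) != 0)
        rebuild = 1;              /* target does not exist: build it */

    for (int i = 0; i < ndeps; i++) {
        if (stat(deps[i], &d) != 0)
            return -1;            /* missing dependency and no rule for it */
        if (!rebuild && d.st_mtime > t.st_mtime)
            rebuild = 1;          /* dependency is newer than the target */
    }
    return rebuild;
}
```

Calling needs_rebuild("editor.o", deps, 4), with deps holding editor.c, editor.h, keyboard.h, and screen.h, mirrors the decision make takes for line 4 of Listing 4.1.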
More About Rules
In this section, I will go into more detail about writing makefile rules. In particular, I cover creating and using phony targets, makefile variables, using environment variables and make’s predefined variables, implicit rules, and pattern rules.
Phony Targets
In addition to the normal file targets, make allows you to specify phony targets. Phony targets are so named because they do not correspond to actual files. The final target in Listing 4.1, clean, is a phony target. Phony targets exist to specify commands that make should execute. However, because clean is not the default target and no other target depends on it, its commands are not executed automatically; make builds only the default target and whatever that target depends on. In order to build this target, you have to type make clean. Because no file named clean exists, make considers the target out of date and runs its command. In our case, clean removes the editor executable and its constituent object files. You might create such a target if you wanted to create and distribute a source-code tarball to your users or to start a build with a clean build tree. If, however, a file named clean happened to exist, make would see it. Because the target has no dependencies, make would assume that it is up to date and not execute the command listed on line 14. To deal with this situation, use the special make target .PHONY.
Any dependencies of the .PHONY target will be evaluated as usual, but make will disregard the presence of a file whose name matches one of .PHONY’s dependencies and execute the corresponding commands anyway. Using .PHONY, our sample makefile would look like:
editor : editor.o screen.o keyboard.o
	gcc -o editor editor.o screen.o keyboard.o

editor.o : editor.c editor.h keyboard.h screen.h
	gcc -c editor.c

screen.o : screen.c screen.h
	gcc -c screen.c

keyboard.o : keyboard.c keyboard.h
	gcc -c keyboard.c

.PHONY : clean

clean :
	rm editor *.o
Variables
To simplify editing and maintaining makefiles, make allows you to create and use variables. A variable is simply a name defined in a makefile that represents a string of text; this text is called the variable’s value. Define variables using the general form:
VARNAME = some_text [...]
To obtain VARNAME’s value, enclose it in parentheses and prefix it with a $:
$(VARNAME)
VARNAME expands to the text on the right-hand side of the equals sign. Variables are usually defined at the top of a makefile. By convention, makefile variables are all uppercase, although this is not required. If the value changes, you only need to make one change instead of many, simplifying makefile maintenance. So, after modifying Listing 4.1 to use two variables, it looks like the following:
LISTING 4.2 USING VARIABLES IN MAKEFILES
OBJS = editor.o screen.o keyboard.o
HDRS = editor.h screen.h keyboard.h
editor : $(OBJS)
	gcc -o editor $(OBJS)

editor.o : editor.c $(HDRS)
	gcc -c editor.c

screen.o : screen.c screen.h
	gcc -c screen.c

keyboard.o : keyboard.c keyboard.h
	gcc -c keyboard.c

.PHONY : clean

clean :
	rm editor $(OBJS)
OBJS and HDRS will expand to their value each time they are referenced. make actually uses two kinds of variables: recursively-expanded and simply expanded. Recursively-expanded variables are expanded verbatim as they are referenced; if the expansion contains another variable reference, it is also expanded. The expansion continues until no further variables exist to expand, hence the name, “recursively-expanded.” An example will make this clear.
Consider the variables TOPDIR and SRCDIR defined as follows:
TOPDIR = /home/kwall/myproject
SRCDIR = $(TOPDIR)/src
Thus, SRCDIR will have the value /home/kwall/myproject/src. This works as expected and desired. However, consider the next variable definition:
CC = gcc
CC = $(CC) -o
Clearly, what you want, ultimately, is “CC = gcc -o.” That is not what you will get, however. $(CC) is recursively-expanded when it is referenced, so you wind up with an infinite loop: $(CC) will keep expanding to $(CC), and you never pick up the -o option. Fortunately, make detects this and reports an error:
*** Recursive variable `CC' references itself (eventually). Stop.
To avoid this difficulty, make also offers simply expanded variables. Rather than being expanded when they are referenced, simply expanded variables are scanned once and for all when they are defined; all embedded variable references are resolved at that point. The definition syntax is slightly different:
CC := gcc -o
CC += -O2
The first definition uses := to set CC equal to gcc -o and the second definition uses += to append -O2 to the first definition, so that CC’s final value is gcc -o -O2. If you run into trouble when using make variables or get the “VARNAME references itself” error message, it’s time to use the simply expanded variables. Some programmers use only simply expanded variables to avoid unanticipated problems. Since this is Linux, you are free to choose for yourself!
Environment, Automatic, and Predefined Variables
In addition to user-defined variables, make allows the use of environment variables and also provides “automatic” variables and predefined variables. Using environment variables is ridiculously simple. When it starts, make reads every variable defined in its environment and creates variables with the same name and value. However, similarly named variables in the makefile override the environment variables, so beware. make provides a long list of predefined and automatic variables, too. They are pretty cryptic looking, though. See Table 4.1 for a partial list of automatic variables.
TABLE 4.1 AUTOMATIC VARIABLES
Variable   Description
$@         The filename of a rule’s target
$<         The name of the first dependency in a rule
$^         Space-delimited list of all the dependencies in a rule
$?         Space-delimited list of all the dependencies in a rule that are newer than the target
$(@D)      The directory part of a target filename, if the target is in a subdirectory
$(@F)      The filename part of a target filename, if the target is in a subdirectory
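To make the last two rows of Table 4.1 concrete, here is a hypothetical C helper, split_target(), that mimics how $(@D) and $(@F) divide a target name. It is an illustration that assumes an ordinary relative path; make itself performs this split internally. Like GNU make, it reports the directory part as . when the name contains no slash.

```c
#include <stdio.h>    /* snprintf */
#include <string.h>   /* strrchr */

/* Hypothetical helper mimicking $(@D) and $(@F) for a target name.
 * dir receives the directory part without the trailing slash, or "."
 * if the name contains no slash; file receives the file part. */
void split_target(const char *target,
                  char *dir, size_t dirlen,
                  char *file, size_t filelen)
{
    const char *slash = strrchr(target, '/');   /* last '/' in the name */

    if (slash == NULL) {
        snprintf(dir, dirlen, ".");
        snprintf(file, filelen, "%s", target);
    } else {
        /* Everything before the last slash is the directory part. */
        snprintf(dir, dirlen, "%.*s", (int)(slash - target), target);
        snprintf(file, filelen, "%s", slash + 1);
    }
}
```

For a target named obj/editor.o, the helper yields obj and editor.o, just as $(@D) and $(@F) would.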
In addition to the automatic variables listed in Table 4.1, make predefines a number of other variables that are used either as names of programs or to pass flags and arguments to these programs. See Table 4.2.
TABLE 4.2 PREDEFINED VARIABLES FOR PROGRAM NAMES AND FLAGS
Variable   Description
AR         Archive-maintenance program; default value = ar
AS         Program to do assembly; default value = as
CC         Program for compiling C programs; default value = cc
CPP        C preprocessor program; default value = $(CC) -E
RM         Program to remove files; default value = rm -f
ARFLAGS    Flags for the archive-maintenance program; default = rv
ASFLAGS    Flags for the assembler program; no default
CFLAGS     Flags for the C compiler; no default
CPPFLAGS   Flags for the C preprocessor; no default
LDFLAGS    Flags for the linker (ld); no default
If you want, you can redefine these variables in the makefile. In most cases, their default values are reasonable.
Implicit Rules
In addition to the rules that you explicitly specify in a makefile, which are called explicit rules, make comes with a comprehensive set of implicit, or predefined, rules. Many of these are special-purpose and of limited usage, so we will only cover a few of the most commonly used implicit rules. Implicit rules simplify makefile maintenance. Suppose you have a makefile that looks like the following:
OBJS = editor.o screen.o keyboard.o

editor : $(OBJS)
	cc -o editor $(OBJS)

.PHONY : clean
clean :
	rm editor $(OBJS)
The command for the default target, editor, mentions editor.o, screen.o, and keyboard.o, but the makefile lacks rules for building those targets. As a result, make will use an implicit rule that says, in essence, for each object file somefile.o, look for a corresponding source file somefile.c and build the object file with the command gcc -c somefile.c -o somefile.o. So, make will look for C source files named editor.c, screen.c, and keyboard.c, compile them to object files (editor.o, screen.o, and keyboard.o), and finally, build the default editor target.
The mechanism is actually more general than what I described. Object (.o) files can be created from C source, Pascal source, Fortran source, and so forth. make looks for the dependency that can actually be satisfied. So, if you have files editor.p, screen.p, and keyboard.p, the Pascal compiler will be invoked rather than the C compiler (.p is the assumed extension of Pascal source files). The lesson here is that if, for some perverse reason, your project uses multiple languages, don’t rely on the implicit rules because the results may not be what you expected.
Pattern Rules
Pattern rules provide a way around the limitations of make’s implicit rules by allowing you to define your own implicit rules. Pattern rules look like normal rules, except that the target contains exactly one character (%) that matches any nonempty string. The dependencies of such a rule also use % in order to match the target. So, for example, the rule
%.o : %.c
tells make to build any object file somename.o from a source file somename.c. make itself comes with several predefined pattern rules, such as the following:
%.o : %.c
	$(CC) -c $(CFLAGS) $(CPPFLAGS) $< -o $@
This is the same rule as the preceding example, with its command filled in. It defines a rule that makes any file x.o from x.c. This rule uses the automatic variables $< and $@ to substitute the names of the first dependency and the target each time the rule is applied. The variables $(CC), $(CFLAGS), and $(CPPFLAGS) have the default values listed in Table 4.2.
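The way a pattern rule matches can be sketched in C. The hypothetical helpers below, match_pattern() and substitute_stem(), are not part of make; they show the idea under simplifying assumptions: the pattern has exactly one %, the stem must be nonempty, and the directory handling GNU make applies to patterns without a slash is ignored.

```c
#include <stdio.h>    /* snprintf */
#include <string.h>   /* strchr, strlen, strncmp, strcmp */

/* Match `name` against a pattern containing one '%' (e.g. "%.o").
 * On success, store the nonempty stem matched by '%' and return 1. */
int match_pattern(const char *pattern, const char *name,
                  char *stem, size_t stemlen)
{
    const char *pct = strchr(pattern, '%');
    size_t prefix, suffix, namelen = strlen(name);

    if (pct == NULL)
        return 0;
    prefix = (size_t)(pct - pattern);   /* text before '%' */
    suffix = strlen(pct + 1);           /* text after '%' */

    /* The stem must be nonempty, so the name must be strictly longer. */
    if (namelen <= prefix + suffix)
        return 0;
    if (strncmp(name, pattern, prefix) != 0)
        return 0;
    if (strcmp(name + namelen - suffix, pct + 1) != 0)
        return 0;

    snprintf(stem, stemlen, "%.*s",
             (int)(namelen - prefix - suffix), name + prefix);
    return 1;
}

/* Substitute `stem` for the '%' in a dependency pattern such as "%.c". */
void substitute_stem(const char *pattern, const char *stem,
                     char *out, size_t outlen)
{
    const char *pct = strchr(pattern, '%');

    if (pct == NULL) {
        snprintf(out, outlen, "%s", pattern);
        return;
    }
    snprintf(out, outlen, "%.*s%s%s",
             (int)(pct - pattern), pattern, stem, pct + 1);
}
```

Matching editor.o against %.o yields the stem editor, and substituting that stem into %.c produces editor.c, the dependency the rule %.o : %.c asks for.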
Comments
You can insert comments in a makefile by preceding the comment with the hash sign (#). When make encounters a comment, it ignores the hash symbol and the rest of the line following it. Comments can be placed anywhere in a makefile. Special consideration must be given to comments that appear in commands, because most shells treat # as a metacharacter (usually as a comment delimiter). As far as make is concerned, a line that contains only a comment is, for all practical purposes, blank.
Additional make Command-line Options
Like most GNU programs, make accepts a cornucopia of command-line options. The most common ones are listed in Table 4.3.
TABLE 4.3 COMMON make COMMAND-LINE OPTIONS
Option      Description
-f file     Specify an alternatively named makefile, file.
-n          Print the commands that would be executed, but don’t actually execute them.
-Idirname   Specify dirname as a directory in which make should search for included makefiles.
-s          Don’t print the commands as they are executed.
-w          If make changes directories while executing, print the current directory names.
-Wfile      Act as if file has been modified; use with -n to see how make would behave if file had been changed.
-r          Disable all of make’s built-in rules.
-d          Print lots of debugging information.
-i          Ignore non-zero error codes returned by commands in a makefile rule. make will continue executing even if a command returns a non-zero exit status.
-k          If one target fails to build, continue to build other targets. Normally, make terminates if a target fails to build successfully.
-jN         Run N commands at once, where N is a non-zero integer.
Debugging make
If you have trouble using make, the -d option tells make to print lots of extra debugging information in addition to the commands it is executing. The output can be overwhelming because the debugging dump will display what make does internally and why. This includes the following:
• Which files make evaluates for rebuilding
• Which files are being compared and what the comparison results are
• Which files actually need to be remade
• Which implicit rules make thinks it will use
• Which implicit rules make decides to use and the commands it actually executes
Common make Error Messages
This section lists the most common error messages you will encounter while using make. For complete documentation, refer to the make manual or info pages.
• No rule to make target `target'. Stop
  The makefile does not contain a rule telling make how to construct the named target and no default rules apply.
• `target' is up to date
  The dependencies for the named target have not changed.
• Target `target' not remade because of errors
  An error occurred while building the named target. This message only appears when using make’s -k option.
• command: Command not found
  make could not find command. This usually occurs because command has been misspelled or is not in $PATH.
• Illegal option -- option
  The invocation of make included an option that it does not recognize.
Useful Makefile Targets
In addition to the clean target I mentioned previously, several other targets typically inhabit makefiles. A target named install moves the final binary, any supporting libraries or shell scripts, and documentation to their final homes in the filesystem and sets file permissions and ownership appropriately. An install target typically also compiles the program and may also run a simple test to verify that the program compiled correctly. An uninstall target would delete the files installed by an install target. A dist target is a convenient way to prepare a distribution package. At the very least, the dist target will remove old binary and object files from the build directory and create an archive file, such as a gzipped tarball, ready for uploading to World Wide Web pages and FTP sites. For the convenience of other developers, you might want to create a tags target that creates or updates a program’s tags table. If the procedure for verifying a program is complex, you will definitely want to create a separate target, named test or check, that executes this procedure and emits the appropriate diagnostic messages. A similar target, named installtest or installcheck, would be used to validate an installation. Of course, the install target must have successfully built and installed the program first.
Summary
This chapter covered the make command, explaining why it is useful and showing you how to write simple but useful makefiles. It also discussed some of the subtleties of make rules and listed some of make’s helpful command-line options. With this foundation, you should know enough to use make to manage the process of building and maintaining your software projects.
Creating Self-Configuring Software with autoconf
CHAPTER 5
by Kurt Wall
IN THIS CHAPTER
• Understanding autoconf
• Built-In Macros
• Generic Macros
• An Annotated autoconf Script
Linux’s mixed origins and the variety of Linux distributions available demand a flexible and adaptable configuration and build environment. This chapter looks at GNU autoconf, a tool that enables you to configure your software to adapt to the wide assortment of system configurations in which it may be built, including many non-Linux systems.
Understanding autoconf
Developing software that runs on a number of different UNIX and UNIX-like systems requires considerable effort. First, the code itself must be portable. Portable code makes few assumptions about the hardware on which it may be run or the software libraries available to it. In addition, if it’s C code, to ensure maximum portability, the code has to stick to strict ISO/ANSI C, or isolate non-standard C to as few modules as possible.
Second, you need to know a lot about the compile and runtime environments of many different systems and, possibly, hardware architectures. GNU software, while ubiquitous on Linux systems and available for a mind-boggling array of other operating systems and hardware platforms, may not always be available on those systems. In addition, the following conditions may exist:
• The C compiler may be pre-ISO
• Libraries may be missing key features
• System services may function differently
• Filesystem conventions will certainly be different
On the hardware side, you may have to deal with big-endian, little-endian, or hybrid data representation mechanisms. When you get away from Intel’s x86 processors, you have to deal with, for example, PA-RISC, several varieties of Sparcs, the Motorola chips (in several generations) that drive Macintosh and Apple computers, MIPS, Amiga, and, coming soon to a computer near you, Intel’s Merced or IA64 chip. Finally, you have to write a generic makefile and provide instructions to your users on how to edit the makefile to fit local circumstances.
autoconf addresses many of these problems. It generates shell scripts that automatically configure source code packages to adapt to many different brands of UNIX and UNIX-like systems. These scripts, usually named configure, test for the presence or absence of certain features a program needs or can use, and build makefiles based on the results of these tests. The scripts autoconf generates are self-contained, so users do not need to have autoconf installed on their own systems in order to build software. All they have to do is type ./configure in the source distribution directory.
To build a configure script, you create a file named configure.in in the root directory of your source code tree. configure.in contains a series of calls to autoconf macros that test for the presence or behavior of features your program can utilize or that it requires. autoconf contains many predefined macros that test for commonly required features. A second set of macros allows you to build your own custom tests if none of autoconf’s built-in macros meet your needs. If need be, configure.in can also contain shell scripts that evaluate unusual or specialized characteristics.
Besides the autoconf package itself (we cover version 2.12), you will need at least version 1.1 of GNU’s m4, a macro processor that copies its input to output, expanding macros as it goes (autoconf’s author, David MacKenzie, recommends version 1.3 or better for speed reasons). The latest versions of both packages can be obtained from the GNU Web site, www.gnu.org, their FTP site, ftp.gnu.org, or from many other locations around the Web. Most Linux distributions contain them, too.
Building configure.in
Each configure.in file must invoke AC_INIT before any test and AC_OUTPUT after all the tests. These are the only two required macros. The following is the syntax for AC_INIT:
AC_INIT(unique_file_in_source_dir)
unique_file_in_source_dir is a file present in the source code directory. The call to AC_INIT creates shell code in the generated configure script that looks for unique_file_in_source_dir to make sure that it is in the correct directory.
AC_OUTPUT creates the output files, such as Makefiles and other (optional) output files. Its syntax is as follows:
AC_OUTPUT([file...[,extra_cmds[,init_cmds]]])
file is a space-separated list of output files. Each file is created by copying file.in to file, substituting the values of output variables along the way. extra_cmds is a list of commands appended to config.status, the script configure creates to regenerate the output files. init_cmds will be inserted into config.status immediately before extra_cmds.
Structuring the File
With few exceptions, the order in which you call autoconf macros does not matter (we note the exceptions as they occur). That said, the following is the recommended order:
AC_INIT
Tests for programs
Tests for libraries
Tests for header files
Tests for typedefs
Tests for structures
Tests for compiler behavior
Tests for library functions
Tests for system services
AC_OUTPUT
The suggested ordering reflects the fact that, for example, the presence or absence of libraries has consequences for the inclusion of header files, so header files should be checked after libraries. Similarly, some system services depend on the existence of particular library functions, which may only be called if they are prototyped in header files. You cannot call a function prototyped in a header file if the required library does not exist. The moral is: stick with the recommended order unless you know exactly what you are doing and have a compelling reason to deviate.
A few words on the layout of configure.in may prove helpful. Use only one macro call per line, because most of autoconf’s macros rely on a newline to terminate commands. In situations where macros read or set environment variables, the variables may be set on the same line as a macro call. When a single macro call takes too many arguments to fit comfortably on one line, use \ to continue the argument list to the next line and enclose the argument list in the m4 quote characters, [ and ]. The following two macro calls are equivalent:
AC_CHECK_HEADERS([unistd.h termios.h termio.h sgtty.h alloca.h \
sys/timer.h])
AC_CHECK_HEADERS(unistd.h termios.h termio.h sgtty.h alloca.h sys/timer.h)
The first example wraps the arguments in [ and ] and uses \ (which is interpreted by the shell, not by m4 or autoconf) to indicate line continuation. The second example is simply a single long line. Finally, to insert comments into configure.in, use m4’s dnl macro, which discards everything up to and including the next newline. For example,
dnl
dnl This is an utterly gratuitous comment
dnl
AC_INIT(some_darn_file)
Helpful autoconf Utilities
In addition to autoconf’s built-in macros, covered in some detail in the next section, the autoconf package contains several helpful scripts to assist in creating and maintaining configure.in. To kick-start the process, the Perl script autoscan extracts information from your source files about function calls and included header files, outputting configure.scan. Before renaming or copying this to configure.in, however, manually examine it to identify features it overlooked. ifnames functions similarly, looking for the preprocessor directives #if, #elif, #ifdef, and #ifndef in your source files. Use it to augment autoscan’s output.
Built-In Macros
In many cases, autoconf’s built-in macros will be all that you require. Each set of built-in tests may be further subdivided into macros that test specific features and more general tests. This section lists and briefly describes most of the built-in tests. For a complete list and description of autoconf’s predefined tests, see the autoconf info page.
Tests for Alternative Programs
Table 5.1 describes a group of tests that check for the presence or behavior of particular programs in situations where you want or need to be able to choose between several alternative programs. The compilation process is complex, so these macros give you flexibility by confirming the existence of necessary programs or making sure, if they do exist, that they are properly invoked.
TABLE 5.1 ALTERNATIVE PROGRAM TESTS
Test              Description
AC_PROG_AWK       Checks, in order, for mawk, gawk, nawk, and awk; sets output variable AWK to the first one it finds
AC_PROG_CC        Decides which C compiler to use; sets output variable CC
AC_PROG_CC_C_O    Determines whether or not the compiler accepts the -c and -o switches together; if not, defines NO_MINUS_C_MINUS_O
AC_PROG_CPP       Sets output variable CPP to the command that executes the C preprocessor
AC_PROG_INSTALL   Sets output variable INSTALL to a BSD-compatible install program or to install-sh
AC_PROG_LEX       Looks for flex or lex, setting output variable LEX to the result
AC_PROG_LN_S      Sets variable LN_S to ln -s if the system supports symbolic links or to ln otherwise
AC_PROG_RANLIB    Sets output variable RANLIB to ranlib if ranlib exists, to : otherwise
AC_PROG_YACC      Checks, in order, for bison, byacc, and yacc, setting output variable YACC to bison -y, byacc, or yacc, respectively, depending on which it finds
Generally, the macros in Table 5.1 establish the paths to or confirm the calling conventions of the programs with which they are concerned. In the case of AC_PROG_CC, for example, you would not want to hard code gcc if it is not available on the target system. AC_PROG_CC_C_O exists because older compilers (or, at least, non-GNU compilers) do not necessarily accept -c and -o or use them the same way gcc does. A similar situation obtains with AC_PROG_LN_S because many filesystem implementations do not support creating symbolic links.
Tests for Library Functions
Table 5.2 describes tests that look for particular libraries, first to see if they exist and second to determine any differences in the arguments passed to functions in those libraries. Despite the best-laid plans, programming libraries eventually change in such a way that later versions become incompatible, sometimes dramatically so, with earlier versions. The macros in Table 5.2 enable you to adjust the build process to accommodate this unfortunate reality. In extreme cases, you can simply throw up your hands in despair and refuse to build until the target system is upgraded.
Slide 71: Creating Self-Configuring Software with autoconf CHAPTER 5
71
TABLE 5.2 LIBRARY FUNCTION TESTS
Test | Description
AC_CHECK_LIB(lib, function [, action_if_found [, action_if_not_found [, other_libs]]]) | Determines if function exists in library lib by attempting to link a C program with lib. Executes shell commands action_if_found if the test succeeds, or adds -llib to the output variable LIBS if action_if_found is empty; executes action_if_not_found if the test fails. The optional other_libs argument adds further libraries to the link command
AC_FUNC_GETLOADAVG | If the system has the getloadavg() function, adds the libraries necessary to get the function to LIBS
AC_FUNC_GETPGRP | Tests whether or not getpgrp() takes no argument, in which case it defines GETPGRP_VOID. Otherwise, getpgrp() requires a process ID argument
AC_FUNC_MEMCMP | If memcmp() isn't available, or does not work on 8-bit data, adds memcmp.o to LIBOBJS
AC_FUNC_MMAP | Defines HAVE_MMAP if mmap() is present and works correctly
AC_FUNC_SETPGRP | Tests whether or not setpgrp() takes no argument, in which case it defines SETPGRP_VOID. Otherwise, setpgrp() requires two process ID arguments
AC_FUNC_UTIME_NULL | If utime(file, NULL) sets file's timestamp to the present, defines HAVE_UTIME_NULL
AC_FUNC_VFORK | Defines HAVE_VFORK_H if vfork.h is present; if a working vfork() isn't found, defines vfork to be fork
AC_FUNC_VPRINTF | Defines HAVE_VPRINTF if vprintf() exists
AC_CHECK_LIB is arguably the most useful macro in this group, because it gives you the option to say, “This program won’t work unless you have the required library.” The other macros exist to accommodate the divergence between BSD and AT&T UNIX, where one branch had functions or function arguments that differed sharply from the other. Because Linux has a mixed BSD and AT&T heritage, these macros help you configure your software properly.
Tests for Header Files
Header tests check for the presence and location of C-style header files. As with the macros in Table 5.2, these macros exist to allow you to take into account differences between UNIX and C implementations across systems. Believe it or not, many odd or old UNIX and UNIX-like systems lack an ANSI-compliant C compiler. Other systems may lack POSIX-compliant system calls. Table 5.3 describes these tests.
TABLE 5.3 HEADER FILE TESTS
Test | Description
AC_DECL_SYS_SIGLIST | If signal.h or unistd.h declares sys_siglist, defines SYS_SIGLIST_DECLARED
AC_HEADER_DIRENT | Checks for the following header files, in order: dirent.h, sys/ndir.h, sys/dir.h, and ndir.h. Defines HAVE_DIRENT_H, HAVE_SYS_NDIR_H, HAVE_SYS_DIR_H, or HAVE_NDIR_H, respectively, depending on which header defines DIR
AC_HEADER_STDC | Defines STDC_HEADERS if the system has ANSI/ISO C header files
AC_HEADER_SYS_WAIT | If the system has a POSIX-compatible sys/wait.h, defines HAVE_SYS_WAIT_H
AC_HEADER_DIRENT attempts to account for the wide variety of filesystems in use on UNIX and UNIX-like systems. Because most programs rely heavily on filesystem services, it is useful to know where their header files live and what functions they make available. AC_HEADER_STDC determines whether ANSI/ISO-compatible header files are available, not necessarily whether a compliant compiler is present.
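The C sketch below follows the include pattern commonly paired with AC_HEADER_DIRENT: use dirent.h when HAVE_DIRENT_H is defined, and otherwise fall back to one of the older headers that provide struct direct. The leading #define stands in for config.h on a Linux host, and count_entries() is a hypothetical helper used only to exercise the result.

```c
/* Assumption: stands in for config.h; on Linux, AC_HEADER_DIRENT
   would define HAVE_DIRENT_H. */
#define HAVE_DIRENT_H 1

#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#if HAVE_DIRENT_H
# include <dirent.h>
# define NAMLEN(dp) strlen((dp)->d_name)
#else
# define dirent direct                  /* older systems name it struct direct */
# define NAMLEN(dp) (dp)->d_namlen
# if HAVE_SYS_NDIR_H
#  include <sys/ndir.h>
# endif
# if HAVE_SYS_DIR_H
#  include <sys/dir.h>
# endif
# if HAVE_NDIR_H
#  include <ndir.h>
# endif
#endif

/* Count the entries in directory path, or return -1 if it cannot be opened. */
long count_entries(const char *path)
{
    DIR *dir = opendir(path);
    struct dirent *dp;
    long n = 0;

    if (dir == NULL)
        return -1;
    while ((dp = readdir(dir)) != NULL)
        if (NAMLEN(dp) > 0)
            n++;
    closedir(dir);
    return n;
}
```

Only the preprocessor block changes from system to system; the code that walks the directory stays the same.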
Tests for Structures
The structure tests look for certain structure definitions or for the existence and type of structure members in header files. Reflecting, again, the UNIX family split, different implementations provide different data structures. The macros that Table 5.4 describes give you an opportunity to adjust your code accordingly.
TABLE 5.4 STRUCTURE TESTS
Test | Description
AC_HEADER_TIME | Defines TIME_WITH_SYS_TIME if both time.h and sys/time.h can be included in a program
AC_STRUCT_ST_BLKSIZE | Defines HAVE_ST_BLKSIZE if struct stat has an st_blksize member
AC_STRUCT_ST_BLOCKS | Defines HAVE_ST_BLOCKS if struct stat has an st_blocks member
AC_STRUCT_TIMEZONE | Figures out how to get the timezone. Defines HAVE_TM_ZONE if struct tm has a tm_zone member, or HAVE_TZNAME if an array tzname is found
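As a sketch of how code might act on HAVE_ST_BLKSIZE, the C function below picks an I/O buffer size from st_blksize when the member exists and falls back to a fixed size otherwise. The #define stands in for config.h (glibc's struct stat has st_blksize), and both the function name preferred_io_size() and the 8192-byte fallback are illustrative assumptions.

```c
/* Assumption: stands in for config.h; AC_STRUCT_ST_BLKSIZE would define
   this on Linux, where struct stat has an st_blksize member. */
#define HAVE_ST_BLKSIZE 1

#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical fallback used when struct stat lacks st_blksize. */
#define FALLBACK_IO_SIZE 8192L

/* Return a suitable I/O buffer size for path, or -1 if stat() fails. */
long preferred_io_size(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
#ifdef HAVE_ST_BLKSIZE
    return (long) st.st_blksize;    /* filesystem's preferred block size */
#else
    return FALLBACK_IO_SIZE;
#endif
}
```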
Tests for typedefs
Table 5.5 describes macros that look for typedefs in the header files sys/types.h and stdlib.h. These macros enable you to adjust your code for the presence or absence of certain typedefs that might be present on one system but absent on another.
TABLE 5.5 TYPEDEF TESTS
Test | Description
AC_TYPE_GETGROUPS | Sets GETGROUPS_T to gid_t or int, whichever is the base type of the array passed to getgroups()
AC_TYPE_MODE_T | Defines mode_t as int if mode_t is undefined
AC_TYPE_PID_T | Defines pid_t as int if pid_t is undefined
AC_TYPE_SIGNAL | Defines RETSIGTYPE as void if signal.h declares signal() as returning a pointer to a function returning void, and as int otherwise
AC_TYPE_SIZE_T | Defines size_t as unsigned if size_t is undefined
AC_TYPE_UID_T | Defines uid_t and gid_t as int if uid_t is undefined
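The C sketch below shows the usual way code consumes RETSIGTYPE: the signal handler's return type comes from config.h instead of being hard-coded. The #define stands in for config.h on a modern system, where AC_TYPE_SIGNAL yields void, and the names on_signal() and install_and_raise() are hypothetical.

```c
/* Assumption: stands in for config.h; AC_TYPE_SIGNAL defines RETSIGTYPE
   as void on systems whose signal handlers return void. */
#define RETSIGTYPE void

#include <signal.h>

static volatile sig_atomic_t last_signal = 0;

/* Handler declared with the return type configure detected. If RETSIGTYPE
   were int, the handler would also need to return a value. */
static RETSIGTYPE on_signal(int sig)
{
    last_signal = sig;
}

/* Install the handler for SIGINT, raise the signal, and report which
   signal number the handler recorded (-1 if installation failed). */
int install_and_raise(void)
{
    if (signal(SIGINT, on_signal) == SIG_ERR)
        return -1;
    raise(SIGINT);
    return (int) last_signal;
}
```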
Tests of Compiler Behavior
Table 5.6 describes macros that evaluate compiler behavior or peculiarities of particular host architectures. Given the array of available compilers and the CPUs on which they run, these macros allow you to adjust your program to reflect these differences and take advantage of them.
TABLE 5.6 COMPILER BEHAVIOR TESTS
Test | Description
AC_C_BIGENDIAN | If words are stored with the most significant byte first, defines WORDS_BIGENDIAN
AC_C_CONST | If the compiler does not fully support the const keyword, defines const to be empty
AC_C_INLINE | If the compiler does not support the keyword inline, defines inline to be __inline__ or __inline if it accepts one of those, or to be empty otherwise
AC_C_CHAR_UNSIGNED | Defines __CHAR_UNSIGNED__ if char is unsigned (unless the compiler already predefines it)
AC_C_LONG_DOUBLE | Defines HAVE_LONG_DOUBLE if the host compiler supports the long double type
AC_CHECK_SIZEOF(type [, cross-size]) | Defines SIZEOF_UCTYPE to be the size in bytes of the C or C++ built-in type type, where UCTYPE is type in uppercase (for example, SIZEOF_INT); cross-size is the value to assume when cross-compiling
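To make the WORDS_BIGENDIAN entry concrete, the C sketch below detects byte order at run time (similar in spirit to the test program AC_C_BIGENDIAN compiles and runs) and uses WORDS_BIGENDIAN to convert a 32-bit value to big-endian order. Purely as a stand-in for config.h, the block derives WORDS_BIGENDIAN from the __BYTE_ORDER__ macros that GCC and Clang predefine; host_is_big_endian() and to_big_endian32() are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: stands in for config.h, using GCC/Clang predefined macros
   instead of the result of AC_C_BIGENDIAN. */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
# define WORDS_BIGENDIAN 1
#endif

/* Run-time check: is the most significant byte stored first? */
int host_is_big_endian(void)
{
    uint32_t word = 0x01020304u;
    unsigned char bytes[sizeof word];

    memcpy(bytes, &word, sizeof word);
    return bytes[0] == 0x01;
}

/* Convert x from host byte order to big-endian order. */
uint32_t to_big_endian32(uint32_t x)
{
#ifdef WORDS_BIGENDIAN
    return x;                               /* already big-endian */
#else
    return ((x & 0x000000FFu) << 24) |
           ((x & 0x0000FF00u) << 8)  |
           ((x & 0x00FF0000u) >> 8)  |
           ((x & 0xFF000000u) >> 24);
#endif
}
```

The compile-time symbol lets the big-endian build skip the byte swap entirely, while the run-time check is only there to show what the configure test is measuring.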
Tests for System Services
Table 5.7 describes macros that determine the presence and behavior of operating system services and abilities. The services and capabilities that host operating systems provide vary widely, so your code needs to be able to accommodate the variety gracefully, if possible.
TABLE 5.7 SYSTEM SERVICES TESTS
Test | Description
AC_SYS_INTERPRETER | Sets shell variable ac_cv_sys_interpreter to yes or no, depending on whether scripts can start with #! /bin/sh
AC_PATH_X | Tries to find the paths to the X Window include and library files, setting the shell variables x_includes and x_libraries to the correct paths, or setting no_x if the paths could not be found
AC_SYS_LONG_FILE_NAMES | Defines HAVE_LONG_FILE_NAMES if the system supports filenames longer than 14 characters
AC_SYS_RESTARTABLE_SYSCALLS | On systems that automatically restart system calls interrupted by signals, defines HAVE_RESTARTABLE_SYSCALLS
Strange as it may seem, there are still filesystems, even UNIX filesystems, that limit filenames to 14 characters, so AC_SYS_LONG_FILE_NAMES allows you to detect such a barbarian filesystem. AC_PATH_X acknowledges that some operating systems do not support the X Window system.
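As one illustration of acting on HAVE_RESTARTABLE_SYSCALLS, the C sketch below wraps read() so that, when configure did not define the symbol, a call interrupted by a signal (errno set to EINTR) is retried by hand. The function name read_retry() is hypothetical, and the block deliberately leaves HAVE_RESTARTABLE_SYSCALLS undefined so that the retry path is the one compiled.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Assumption: HAVE_RESTARTABLE_SYSCALLS would come from config.h; it is
   left undefined here so the manual-retry branch is compiled. */

/* Read up to len bytes from fd, retrying if a signal interrupts read(). */
ssize_t read_retry(int fd, void *buf, size_t len)
{
    ssize_t n;

#ifdef HAVE_RESTARTABLE_SYSCALLS
    n = read(fd, buf, len);             /* the system restarts it for us */
#else
    do {
        n = read(fd, buf, len);
    } while (n == -1 && errno == EINTR);
#endif
    return n;
}
```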
Tests for UNIX Variants
Tests in this class address vagaries and idiosyncrasies of specific UNIX and UNIX-like operating systems. As the autoconf author states, “These macros are warts; they will be replaced by a more systematic approach, based on the functions they make available or the environments they provide” (34). Table 5.8 describes these tests.
TABLE 5.8 UNIX VARIANT TESTS
Test | Description
AC_AIX | Defines _ALL_SOURCE if the host system is AIX
AC_DYNIX_SEQ | Obsolete; use AC_FUNC_GETMNTENT instead
AC_IRIX_SUN | Obsolete; use AC_FUNC_GETMNTENT instead
AC_ISC_POSIX | Defines _POSIX_SOURCE to allow use of POSIX features
AC_MINIX | Defines _MINIX and _POSIX_SOURCE on MINIX systems to allow use of POSIX features
AC_SCO_INTL | Obsolete; use AC_FUNC_STRFTIME instead
AC_XENIX_DIR | Obsolete; use AC_HEADER_DIRENT instead
“Why,” you might be asking yourself, “should I concern myself with obsolete macros?” There are two related reasons. First, you may run across configure.in files that contain these macros; if you do, you can replace them with the proper ones. Second, they are obsolete because better, more general macros have been created. Their existence nonetheless reflects the fact that a large body of extant code still relies upon the peculiarities of operating system implementations and the differences between UNIX implementations.
TIP
The easiest way to stay up-to-date on macros that have become obsolete is to monitor the ChangeLog file in the autoconf distribution, available at the GNU FTP site and many other locations all over the Internet.
Generic Macros
The autoconf manual describes the following macros as the building blocks for new tests. In most cases, they test compiler behavior and so require a test program that can be preprocessed, compiled, and linked (and optionally executed), so that compiler output and error messages can be examined to determine the success or failure of the test.
AC_TRY_CPP(includes [,action_if_true [,action_if_false]])
This macro passes includes through the preprocessor, running shell commands action_if_true if the preprocessor returns no errors, or shell commands action_if_false otherwise.
AC_EGREP_HEADER(pattern, header, action_if_found [,action_if_not_found])
Use this macro to search for the egrep expression pattern in the file header. Execute shell commands action_if_found if pattern is found, or action_if_not_found otherwise.
AC_EGREP_CPP(pattern, program, [action_if_found [,action_if_not_found]])
Run the C program text program through the preprocessor, looking for the egrep expression pattern. Execute shell commands action_if_found if pattern is found, or action_if_not_found otherwise.
AC_TRY_COMPILE(includes, function_body [, action_if_found [, action_if_not_found]])
This macro looks for a syntax feature of the C or C++ compiler. Compile a test program that begins with the text in includes (typically #include directives) and contains a function whose body is function_body. Execute shell commands action_if_found if compilation succeeds, or action_if_not_found if compilation fails. This macro does not link; use AC_TRY_LINK to test linking.
AC_TRY_LINK(includes, function_body [, action_if_found [, action_if_not_found]])
This macro adds a link test to AC_TRY_COMPILE. Compile and link a test program that begins with the text in includes and contains a function whose body is function_body. Execute shell commands action_if_found if linking succeeds, or action_if_not_found if linking fails.
AC_TRY_RUN(program [, action_if_true [, action_if_false [, action_if_cross_compiling]]])
This macro tests the runtime behavior of the host system. Compile, link, and execute the text of the C program program. If program exits with status 0, run shell commands action_if_true; otherwise, run action_if_false. action_if_cross_compiling is executed instead of action_if_true if the program is being built to run on another system type, because such a program cannot be executed on the build host.
AC_CHECK_PROG
Checks whether a program exists in the current path.
AC_CHECK_FUNC
Checks whether a function with C linkage exists.
AC_CHECK_HEADER
Tests the existence of a header file.
AC_CHECK_TYPE
If a typedef does not exist, set a default value.
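To illustrate the kind of program AC_TRY_RUN expects, the C sketch below checks whether memcmp() compares bytes as unsigned values, the property AC_FUNC_MEMCMP cares about. In a real configure.in this logic would live in main(), and its return value would become the exit status that AC_TRY_RUN inspects; here it is a plain function with the hypothetical name conftest_memcmp_clean() so that it can be called directly. It follows AC_TRY_RUN's convention of 0 for success and nonzero for failure.

```c
#include <string.h>

/* Return 0 (success) if memcmp() compares bytes as unsigned values,
   1 otherwise. A memcmp() that compared signed chars would see 0x80 as
   negative and order it before 0x40, failing the first check. */
int conftest_memcmp_clean(void)
{
    unsigned char low = 0x40, high = 0x80, higher = 0x81;

    if (memcmp(&low, &high, 1) >= 0)     /* 0x40 must sort before 0x80 */
        return 1;
    if (memcmp(&high, &higher, 1) >= 0)  /* 0x80 must sort before 0x81 */
        return 1;
    return 0;
}
```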
An Annotated autoconf Script
In this section, we create a sample configure.in file. It does not configure an actually useful piece of software, but merely illustrates many of the macros we discussed in the preceding sections, some we did not, and some of autoconf’s other features. The listing is shown in pieces throughout this section, and a discussion of what is happening follows each piece.
1 dnl Autoconfigure script for bogusapp
2 dnl Kurt Wall <kwall@xmission.com>
3 dnl
4 dnl Process this file with `autoconf' to produce a `configure' script
Lines 1–4 are a standard header that indicates the package to which configure.in corresponds, contact information, and instructions for regenerating the configure script.
5 AC_INIT(bogusapp.c)
6 AC_CONFIG_HEADER(config.h)
Line 6 creates a header file named config.h in the root directory of your source tree that contains nothing but preprocessor symbols reflecting the results of configure’s tests. By including this file in your source code and using the symbols it contains, your program should compile smoothly and seamlessly on every system on which it might land. configure creates config.h from an input file named config.h.in that contains all the #defines you’ll need. Fortunately, autoconf ships with an ever-so-handy shell script named autoheader that generates config.h.in. autoheader generates config.h.in by reading configure.in, a file named acconfig.h that is part of the autoconf distribution, and a ./acconfig.h in your source tree for preprocessor symbols. The good news, before you start complaining about having to create another file, is that ./acconfig.h only needs to contain preprocessor symbols that aren’t defined anywhere else. Better still, they can have dummy values. The file simply needs to contain legitimately defined C-style preprocessor symbols that autoheader and autoconf can read and utilize. See the file acconfig.h on the CD-ROM for an illustration.
7 test -z "$LDFLAGS" && LDFLAGS="-I/usr/include"
8 AC_SUBST(CFLAGS)
9
10 dnl Tests for UNIX variants
11 dnl
12 AC_CANONICAL_HOST
AC_CANONICAL_HOST reports GNU’s idea of the host system. It spits out a name of the form cpu-company-system. On one of my systems, for example, AC_CANONICAL_HOST reports the box as i586-unknown-linux.
13
14 dnl Tests for programs
15 dnl
16 AC_PROG_CC
17 AC_PROG_LEX
18 AC_PROG_AWK
19 AC_PROG_YACC
20 AC_CHECK_PROG(SHELL, bash, /bin/bash, /bin/sh)
21
22 dnl Tests for libraries
23 dnl
24 AC_CHECK_LIB(socket, socket)
25 AC_CHECK_LIB(resolv, res_init, [echo "res_init() found in libresolv"], [echo "res_init() not in libresolv"])
26
Line 25 demonstrates how to write custom commands for the autoconf macros. The third and fourth arguments are the shell commands corresponding to action_if_found and action_if_not_found. Because of m4’s quoting and delimiting peculiarities, it is generally advisable to delimit commands that contain quotation marks, commas, or other special characters with m4’s quote characters ([ and ]), so that m4 passes them through unchanged instead of splitting or expanding them.
27
28 dnl Tests for header files
29 dnl
30 AC_CHECK_HEADER(killer.h)
31 AC_CHECK_HEADERS([resolv.h termio.h curses.h sys/time.h fcntl.h \
32 sys/fcntl.h memory.h])
Lines 31 and 32 illustrate the correct way to continue an argument across multiple lines. Use the \ character to inform m4 and the shell of a line continuation, and surround the entire argument list with m4’s quote delimiters.
33 AC_DECL_SYS_SIGLIST
34 AC_HEADER_STDC
35
36 dnl Tests for typedefs
37 dnl
38 AC_TYPE_GETGROUPS
39 AC_TYPE_SIZE_T
40 AC_TYPE_PID_T
41
42 dnl Tests for structures
43 AC_HEADER_TIME
44 AC_STRUCT_TIMEZONE
45
46 dnl Tests of compiler behavior
47 dnl
48 AC_C_BIGENDIAN
49 AC_C_INLINE
50 AC_CHECK_SIZEOF(int, 4)
Line 48 will generate a warning that AC_TRY_RUN was called without a default value to allow cross-compiling. You may ignore this warning.
51
52 dnl Tests for library functions
53 dnl
54 AC_FUNC_GETLOADAVG
55 AC_FUNC_MMAP
56 AC_FUNC_UTIME_NULL
57 AC_FUNC_VFORK
58
59 dnl Tests of system services
60 dnl
61 AC_SYS_INTERPRETER
62 AC_PATH_X
63 AC_SYS_RESTARTABLE_SYSCALLS
Line 63 will generate a warning that AC_TRY_RUN was called without a default value to allow cross-compiling. You may ignore this warning.
64
65 dnl Tests in this section exercise a few of `autoconf's generic macros
66 dnl
67 dnl First, let's see if we have a usable void pointer type
68 dnl
69 AC_MSG_CHECKING(for a usable void pointer type)
AC_MSG_CHECKING prints “checking” to the screen, followed by a space and the argument passed, in this case, “for a usable void pointer type.” This macro allows you to mimic the way autoconf reports its activity and to let the user know what configure is doing, which is preferable to an apparent screen lockup.
70 AC_TRY_COMPILE([],
71 [char *ptr;
72 void *xmalloc();
73 ptr = (char *) xmalloc(1);
74 ],
75 [AC_DEFINE(HAVE_VOID_POINTER) AC_MSG_RESULT(usable void pointer)],
76 AC_MSG_RESULT(no usable void pointer type))
Lines 70–76 deserve considerable explanation. autoconf embeds the actual C code (lines 71–73) inside a skeletal C program and writes the resulting program into the generated configure script, which compiles it when configure runs. configure captures the compiler output and checks for errors (you can track this down yourself by looking for xmalloc in the configure script). Line 75 creates a preprocessor symbol HAVE_VOID_POINTER (which you would have to put into ./acconfig.h, since it doesn’t exist anywhere else except your code). If the compilation succeeds, configure writes #define HAVE_VOID_POINTER 1 to config.h and prints the message “usable void pointer” to the screen; if compilation fails, configure writes /* #undef HAVE_VOID_POINTER */ to config.h and displays “no usable void pointer type” on the screen. In your source files, then, you simply test this preprocessor symbol like so:
#ifdef HAVE_VOID_POINTER
/* do something */
#else
/* do something else */
#endif
77 dnl
78 dnl Now, let's exercise the preprocessor
79 dnl
80 AC_TRY_CPP([#include <math.h>], [echo 'found math.h'], [echo 'no math.h? - deep doo doo!'])
On line 80, if configure finds the header file math.h, it will write “found math.h” to the screen; otherwise, it informs you that you have a problem.
81
82 dnl
83 dnl Next, we test the linker
84 dnl
85 AC_TRY_LINK([#ifndef HAVE_UNISTD_H
86 #include <signal.h>
87 #endif],
88 [char *ret = *(sys_siglist + 1);],
89 [AC_DEFINE(HAVE_SYS_SIGLIST) AC_MSG_RESULT(got sys_siglist)],
90 [AC_MSG_RESULT(no sys_siglist)])
We perform the same sort of test in lines 85–90 that we performed on lines 70–76, except that AC_TRY_LINK also links the test program. Again, because HAVE_SYS_SIGLIST is not a standard preprocessor symbol, you have to declare it in ./acconfig.h.
91 dnl
92 dnl Finally, set a default value for a ridiculous type
93 dnl
94 AC_CHECK_TYPE(short_short_t, unsigned short)
Line 94 simply checks for a (hopefully) non-existent C data type. If it does not exist, we define short_short_t to be unsigned short. You can confirm this by looking in config.h for a #define of short_short_t.
95
96 dnl Okay, we're done. Create the output files and get out of here
97 dnl
98 AC_OUTPUT(Makefile)
Having completed all of our tests, we are ready to create our Makefile. AC_OUTPUT’s job is to convert all of the tests we perform into information the compiler can understand so that when your happy end user types make, it builds your program, taking into account the peculiarities of the host system. To do its job, AC_OUTPUT needs a source file named, in this case, Makefile.in. Hopefully, you will recall that in the descriptions of autoconf’s macros, I frequently used the phrase “sets output variable FOO”. autoconf uses those output variables to set values in the Makefile and in config.h. For example, AC_STRUCT_TIMEZONE defines HAVE_TZNAME if an array tzname is found. In the config.h that configure creates, you will find #define HAVE_TZNAME 1. In your source code, then, you could wrap code that uses the tzname array in a preprocessor conditional such as:
#ifdef HAVE_TZNAME
/* do something */
#else
/* do something else */
#endif
Similarly, Makefile.in contains a number of expressions such as “CFLAGS = @CFLAGS@”. When it generates Makefile, configure replaces each token of the form @output_variable@ with the correct value, as determined by the tests performed. In this case, @CFLAGS@ holds the debugging and optimization options, which, when the compiler is gcc, default to -g -O2.
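The substitution step can be sketched in C as a simple token replacement. This is only an illustration of the idea, not how configure works internally (configure generates sed commands to do the job), and the function name subst_token() and its buffer-size contract are assumptions made for this example.

```c
#include <stddef.h>
#include <string.h>

/* Copy tmpl into out (of size outsz), replacing every "@name@" token with
   value. Returns 0 on success, or -1 if name is too long or out is too
   small to hold the result. */
int subst_token(const char *tmpl, const char *name, const char *value,
                char *out, size_t outsz)
{
    char token[64];
    size_t toklen, vallen = strlen(value), used = 0;
    const char *p = tmpl, *hit;

    if (strlen(name) + 3 > sizeof token)   /* '@' + name + '@' + NUL */
        return -1;
    token[0] = '@';
    strcpy(token + 1, name);
    strcat(token, "@");
    toklen = strlen(token);

    while ((hit = strstr(p, token)) != NULL) {
        size_t prefix = (size_t) (hit - p);

        if (used + prefix + vallen >= outsz)
            return -1;
        memcpy(out + used, p, prefix);     /* text before the token */
        used += prefix;
        memcpy(out + used, value, vallen); /* the substituted value */
        used += vallen;
        p = hit + toklen;                  /* resume after the token */
    }
    if (used + strlen(p) >= outsz)
        return -1;
    strcpy(out + used, p);                 /* remaining text and NUL */
    return 0;
}
```

For example, subst_token("CFLAGS = @CFLAGS@", "CFLAGS", "-g -O2", buf, sizeof buf) leaves "CFLAGS = -g -O2" in buf.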
With the template created, type autoconf in the directory where you created configure.in, which should be the root of your source tree. You will see two warnings (on lines 48 and 63) and wind up with a shell script named configure in your current working directory. To test it, type ./configure. Figure 5.1 shows configure while it is executing.
FIGURE 5.1 configure while running.
If all went as designed, configure creates Makefile and config.h and logs all of its activity to config.log. You can test the generated Makefile by typing make. The log file is especially useful if configure does not behave as expected, because you can see exactly what configure was trying to do at a given point. For example, the log file snippet below shows the steps configure took while looking for the socket() function (see line 24 of configure.in).
configure:979: checking for socket in -lsocket
configure:998: gcc -o conftest -g -O2 -I/usr/include conftest.c -lsocket 1>&5
/usr/bin/ld: cannot open -lsocket: No such file or directory
collect2: ld returned 1 exit status
configure: failed program was:
#line 987 "configure"
#include "confdefs.h"
/* Override any gcc2 internal prototype to avoid an error. */
/* We use char because int might match the return type of a gcc2
    builtin and then its argument prototype would still apply. */
char socket();
int main() {
socket()
; return 0; }
You can see that the linker, ld, failed because it could not find the socket library, libsocket. The line numbers in the snippet refer to the line numbers of the configure script being executed.
Although it is a bit involved and tedious to set up, using autoconf provides many advantages for software developers, particularly in terms of code portability among different operating systems and hardware platforms and in allowing users to customize software to the idiosyncrasies of their local systems. You only have to perform autoconf’s setup steps once; thereafter, minor tweaks are all you need to create and maintain self-configuring software.
Summary
This chapter took a detailed look at autoconf. After a high-level overview of autoconf’s use, you learned about many built-in macros that autoconf uses to configure software for a target platform. In passing, you also learned a bit about the wide variety of systems that, while all basically the same, vary just enough to make programming for them a potential nightmare. Finally, you walked step by step through creating a template file, generating a configure script, and using it to generate a makefile, the ultimate goal of autoconf.
Comparing and Merging Source Files
CHAPTER 6
by Kurt Wall
IN THIS CHAPTER
• Comparing Files
• Preparing Source Code Patches
Programmers often need to quickly identify differences between two files, or to merge two files together. The GNU project’s diff and patch programs provide these facilities. The first part of this chapter shows you how to create diffs, files that express the differences between two source code files. The second part illustrates using diffs to create source code patches in an automatic fashion.
Comparing Files
The diff command is one of a suite of commands that compare files. It is the one on which we will focus, but first we briefly introduce the cmp command. We cover the other two commands, diff3 and sdiff, in later sections.
Understanding the cmp Command
The cmp command compares two files, showing the offset and line numbers where they differ. Optionally, cmp displays differing characters side-by-side. Invoke cmp as follows:
$ cmp [options] file1 [file2]
A hyphen (-) may be substituted for file1 or file2, so cmp may be used in a pipeline. If one filename is omitted, cmp assumes standard input. The options include the following:
• -c|--print-chars Print the first characters encountered that differ
• -i N|--ignore-initial=N Ignore any difference encountered in the first N bytes
• -l|--verbose Print the offsets of differing characters in decimal format and their values in octal format
• -s|--silent|--quiet Suppress all output, returning only an exit code: 0 means no difference, 1 means one or more differences, 2 means an error occurred
• -v|--version Print cmp’s version information
From a programmer’s perspective, cmp is not terribly useful. Listings 6.1 and 6.2 show two versions of Proverbs 3, verses 5 and 6. The acronyms JPS and NIV stand for Jewish Publication Society and New International Version, respectively.
LISTING 6.1 JPS VERSION OF PROVERBS 3:5-6
Trust in the Lord with all your heart,
And do not rely on your own understanding.
In all your ways acknowledge Him,
And He will make your paths smooth.
LISTING 6.2 NIV VERSION OF PROVERBS 3:5-6
Trust in the Lord with all your heart
and lean not on your own understanding;
in all your ways acknowledge him,
and he will make your paths straight.
A bare cmp produces the following:
$ cmp jps niv
jps niv differ: char 38, line 1
Helpful, yes? We see that the first difference occurs at byte 38 on line 1. Adding the -c option, cmp reports:
$ cmp -c jps niv
jps niv differ: char 38, line 1 is  54 ,  12 ^J
Now we know that the differing characters are octal 54, a comma, in jps and octal 12 (^J), a newline control character, in niv. Replacing -c with -l produces the following:
$ cmp -l jps niv
 38  54  12
 39  12 141
 40 101 156
 41 156 144
 42 144  40
 43  40 154
...
148 157 164
149 164  56
150 150  12
151  56  12
cmp: EOF on niv
The first column of the preceding listing shows the character number where cmp finds a difference, the second column lists the character from the first file, and the third column the character from the second file, both in octal. Note that the second-to-last line of the output (151 56 12) may not appear on some Red Hat systems. Character 38, for example, is octal 54, a comma (,), in the file jps, while it is octal 12, a newline, in the file niv. Only part of the output is shown to save space. Finally, combining -c and -l yields the following:
$ cmp -cl jps niv
 38  54 ,    12 ^J
 39  12 ^J  141 a
 40 101 A   156 n
 41 156 n   144 d
 42 144 d    40
 43  40     154 l
...
148 157 o   164 t
149 164 t    56 .
150 150 h    12 ^J
151  56 .    12 ^J
cmp: EOF on niv
Using -cl results in more immediately readable output, in that you can see both the encoded characters and their human-readable translations for each character that differs.
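The core of what a bare cmp reports, the 1-based byte offset and line number of the first difference, can be sketched in a few lines of C. The function first_difference() and its return codes (0 for identical, 1 for a differing byte, 2 when one input is a prefix of the other) are hypothetical and simplified; they are not cmp's actual implementation or exit-status convention.

```c
#include <stddef.h>

/* Find the first difference between buffers a and b. Sets *byte to the
   1-based offset and *line to the 1-based line number at that point.
   Returns 0 if identical, 1 if a byte differs, 2 if one buffer ends
   first (the "EOF on ..." case). */
int first_difference(const char *a, size_t alen, const char *b, size_t blen,
                     size_t *byte, size_t *line)
{
    size_t i, n = alen < blen ? alen : blen;

    *line = 1;
    for (i = 0; i < n; i++) {
        if (a[i] != b[i]) {
            *byte = i + 1;
            return 1;
        }
        if (a[i] == '\n')
            (*line)++;                  /* count lines as cmp does */
    }
    *byte = n + 1;
    return alen == blen ? 0 : 2;
}
```

Applied to the first lines of Listings 6.1 and 6.2, this reports byte 38 on line 1, matching the bare cmp output shown earlier.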
Understanding the diff Command
The diff command shows the differences between two files, or between two identically named files in separate directories. You can direct diff, using command line options, to format its output in any of several formats. The patch program, discussed in the section “Preparing Source Code Patches” later in this chapter, reads this output and uses it to recreate one of the files used to create the diff. As the authors of the diff manual say, “If you think of diff as subtracting one file from another to produce their difference, you can think of patch as adding the difference to one file to reproduce the other.” Because this book attempts to be practical, I will focus on diff’s usage from a programmer’s perspective, ignoring many of its options and capabilities. While comparing files may seem an uninteresting subject, the technical literature devoted to the subject is extensive. For a complete listing of diff’s options and some of the theory behind file comparisons, see the diff info page (info diff). The general syntax of the diff command is
diff [options] file1 file2
diff operates by attempting to find large sequences of lines common to file1 and file2, interrupted by groups of differing lines, called hunks. Two identical files, therefore, will have no hunks, and two completely different files result in one hunk consisting of all the lines from both files. Also bear in mind that diff performs a line-by-line comparison of two files, as opposed to cmp, which performs a character-by-character comparison. diff produces several different output formats. I will discuss each of them in the following sections.
The Normal Output Format
If we diff Listings 6.1 and 6.2 (jps and niv, respectively, on the CD-ROM), the output is as follows:
$ diff jps niv
1,4c1,4
< Trust in the Lord with all your heart,
< And do not rely on your own understanding.
< In all your ways acknowledge Him,
< And He will make your paths smooth.
---
> Trust in the Lord with all your heart
> and lean not on your own understanding;
> in all your ways acknowledge him,
> and he will make your paths straight.
The output is in normal format, showing only the lines that differ, uncluttered by context. This output is the default in order to comply with the POSIX standard. Normal format is rarely used for distributing software patches; nevertheless, here is a brief description of the output, or hunk, format. The general normal hunk format is as follows:
change_command
< file1 line
< file1 line...
---
> file2 line
> file2 line...
change_command takes the form of a line number or a comma-separated range of lines from file1, a one-character command, and a line number or comma-separated range of lines from file2. The character will be one of the following:
• a: add
• d: delete
• c: change
The change command is actually the ed command to execute to transform file1 into file2. Looking at the hunk above, to convert jps to niv, we would have to change lines 1–4 of jps to lines 1–4 of niv.
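To make the structure of a normal-format change command concrete, the C sketch below parses a command such as 1,4c1,4 into its file1 range, command letter, and file2 range. The struct change_cmd and the function parse_change_cmd() are hypothetical names, and the parser handles only the bare command line, not the < and > lines that follow it.

```c
#include <stdlib.h>

/* Hypothetical representation of a normal-format change command. */
struct change_cmd {
    long from1, to1;   /* line range in file1 */
    char op;           /* 'a' (add), 'd' (delete), or 'c' (change) */
    long from2, to2;   /* line range in file2 */
};

/* Parse "N" or "N,M" at *s, advancing *s past it. A single line number
   is stored as a one-line range. Returns 0 on success, -1 on error. */
static int parse_range(const char **s, long *from, long *to)
{
    char *end;

    *from = strtol(*s, &end, 10);
    if (end == *s)
        return -1;                      /* no digits */
    *to = *from;
    if (*end == ',') {
        const char *second = end + 1;

        *to = strtol(second, &end, 10);
        if (end == second)
            return -1;
    }
    *s = end;
    return 0;
}

/* Parse a whole command such as "1,4c1,4". Returns 0 on success, -1 on error. */
int parse_change_cmd(const char *s, struct change_cmd *cmd)
{
    if (parse_range(&s, &cmd->from1, &cmd->to1) != 0)
        return -1;
    if (*s != 'a' && *s != 'd' && *s != 'c')
        return -1;
    cmd->op = *s++;
    if (parse_range(&s, &cmd->from2, &cmd->to2) != 0)
        return -1;
    return (*s == '\0' || *s == '\n') ? 0 : -1;
}
```

For the jps/niv hunk, parse_change_cmd("1,4c1,4", &cmd) yields lines 1 through 4 of each file with op set to 'c'.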
The Context Output Format
As noted in the preceding section, normal hunk format is rarely used to distribute software patches. Rather, the “context” and “unified” hunk formats that diff produces are the preferred formats for patches. To generate context diffs, use the -c, --context[=NUM], or -C NUM options to diff. So-called “context diffs” show differing lines surrounded by NUM lines of context (three by default), so you can more clearly understand the changes between files. Listings 6.3 and 6.4 illustrate the context diff format using a simple bash shell script that changes the signature files appended to the bottom of email and Usenet posts. (No line numbers were inserted into these listings in order to prevent confusion with the line numbers that diff produces.)
LISTING 6.3 sigrot.1
#!/usr/local/bin/bash
# sigrot.sh
# Version 1.0
# Rotate signatures
# Suitable to be run via cron
#############################

sigfile=signature
old=$(cat num)
let new=$(expr $old+1)

if [ -f $sigfile.$new ]; then
    cp $sigfile.$new .$sigfile
    echo $new > num
else
    cp $sigfile.1 .$sigfile
    echo 1 > num
fi
LISTING 6.4 sigrot.2
#!/usr/local/bin/bash
# sigrot.sh
# Version 2.0
# Rotate signatures
# Suitable to be run via cron
#############################

sigfile=signature
srcdir=$HOME/doc/signatures
srcfile=$srcdir/$sigfile
old=$(cat $srcdir/num)
let new=$(expr $old+1)

if [ -f $srcfile.$new ]; then
    cp $srcfile.$new $HOME/.$sigfile
    echo $new > $srcdir/num
else
    cp $srcfile.1 $HOME/.$sigfile
    echo 1 > $srcdir/num
fi
Context hunk format takes the following form:
*** file1 file1_timestamp
--- file2 file2_timestamp
***************
*** file1_line_range ****
file1 line
file1 line...
--- file2_line_range ----
file2 line
file2 line...
6
COMPARING AND MERGING SOURCE FILES
The first three lines identify the files compared and separate this information from the rest of the output, which is one or more hunks of differences. Each hunk shows one area where the files differ, surrounded (by default) by three lines of context (where the files are the same). Context lines begin with two spaces and differing lines begin with a !, +, or -, followed by one space, illustrating the difference between the files. A + indicates a line in file2 that does not exist in file1, so in a sense, a + line was added to file1 to create file2. A - marks a line in file1 that does not appear in file2, suggesting a subtraction operation. A ! indicates a line that was changed between file1 and file2; for each line or group of lines from file1 marked with !, a corresponding line or group of lines from file2 is also marked with a !. To generate a context diff, execute a command similar to the following:
$ diff -C 1 sigrot.1 sigrot.2
The hunks look like the following:
*** sigrot.1 Sun Mar 14 22:41:34 1999
--- sigrot.2 Mon Mar 15 00:17:40 1999
***************
*** 2,4 ****
  # sigrot.sh
! # Version 1.0
  # Rotate signatures
--- 2,4 ----
  # sigrot.sh
! # Version 2.0
  # Rotate signatures
***************
*** 8,19 ****
  sigfile=signature

! old=$(cat num)
  let new=$(expr $old+1)

! if [ -f $sigfile.$new ]; then
!     cp $sigfile.$new .$sigfile
!     echo $new > num
  else
!     cp $sigfile.1 .$sigfile
!     echo 1 > num
  fi
--- 8,21 ----
  sigfile=signature
+ srcdir=$HOME/doc/signatures
+ srcfile=$srcdir/$sigfile

! old=$(cat $srcdir/num)
  let new=$(expr $old+1)

! if [ -f $srcfile.$new ]; then
!     cp $srcfile.$new $HOME/.$sigfile
!     echo $new > $srcdir/num
  else
!     cp $srcfile.1 $HOME/.$sigfile
!     echo 1 > $srcdir/num
  fi
NOTE
To shorten the display, -C 1 was used to indicate that only a single line of context should be displayed. The patch command requires at least two lines of context to function properly. So when you generate context diffs to distribute as software patches, request at least two lines of context.
The output shows two hunks, one covering lines 2–4 in both files, the other covering lines 8–19 in sigrot.1 and lines 8–21 in sigrot.2. In the first hunk, the differing lines are marked with a ! in the first column. The change is minimal, as you can see, merely an incremented version number. In the second hunk, there are many more changes, and two lines were added to sigrot.2, indicated by the +. Each change and addition in both hunks is surrounded by a single line of context.
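The prefix rules for context-format lines are regular enough to express in a few lines of code. In this sketch, the hypothetical helper classify_ctx_line() classifies one line of context-diff output by its first two characters, and count_ctx_marks() tallies the marks in a hunk. The sketch assumes that blank context lines keep their two-space prefix. That may not hold if an editor or mail client has stripped trailing whitespace.

```c
#include <stddef.h>

/* Kinds of line found in a context-format hunk. */
enum ctx_kind {
    CTX_COMMON,   /* "  " prefix: same in both files             */
    CTX_CHANGED,  /* "! " prefix: differs between the files      */
    CTX_ADDED,    /* "+ " prefix: present only in file2          */
    CTX_DELETED,  /* "- " prefix: present only in file1          */
    CTX_OTHER     /* headers, "***"/"---" range lines, and so on */
};

/* Classify one line of context-diff output by its two-character prefix. */
enum ctx_kind classify_ctx_line(const char *line)
{
    if (line[0] == '\0' || line[1] != ' ')
        return CTX_OTHER;
    switch (line[0]) {
    case ' ': return CTX_COMMON;
    case '!': return CTX_CHANGED;
    case '+': return CTX_ADDED;
    case '-': return CTX_DELETED;
    default:  return CTX_OTHER;
    }
}

/* Tally the marked lines in an array of N hunk lines. A '!' line is
 * counted once for each half of the hunk in which it appears. */
void count_ctx_marks(const char *const lines[], size_t n,
                     size_t *changed, size_t *added, size_t *deleted)
{
    *changed = *added = *deleted = 0;
    for (size_t i = 0; i < n; i++) {
        switch (classify_ctx_line(lines[i])) {
        case CTX_CHANGED: (*changed)++; break;
        case CTX_ADDED:   (*added)++;   break;
        case CTX_DELETED: (*deleted)++; break;
        default:          break;
        }
    }
}
```

Range lines such as --- 8,21 ---- and the row of asterisks are classified as CTX_OTHER, because their second character is not a space.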
The Unified Output Format
Unified format is a modified version of context format that suppresses the display of repeated context lines and compacts the output in other ways as well. Unified format begins with a header identifying the files compared
--- file1 file1_timestamp
+++ file2 file2_timestamp
followed by one or more hunks in the form
@@ -file1_range +file2_range @@
line_from_either_file
line_from_either_file...
Context lines begin with a single space and differing lines begin with a + or a -, indicating that a line was added or removed at this location with respect to file1. The following listing was generated with the command diff -U 1 sigrot.1 sigrot.2.
--- sigrot.1 Sun Mar 14 22:41:34 1999
+++ sigrot.2 Mon Mar 15 00:17:40 1999
@@ -2,3 +2,3 @@
 # sigrot.sh
-# Version 1.0
+# Version 2.0
 # Rotate signatures
@@ -8,12 +8,14 @@
 sigfile=signature
+srcdir=$HOME/doc/signatures
+srcfile=$srcdir/$sigfile

-old=$(cat num)
+old=$(cat $srcdir/num)
 let new=$(expr $old+1)

-if [ -f $sigfile.$new ]; then
-    cp $sigfile.$new .$sigfile
-    echo $new > num
+if [ -f $srcfile.$new ]; then
+    cp $srcfile.$new $HOME/.$sigfile
+    echo $new > $srcdir/num
 else
-    cp $sigfile.1 .$sigfile
-    echo 1 > num
+    cp $srcfile.1 $HOME/.$sigfile
+    echo 1 > $srcdir/num
 fi
As you can see, the unified format’s output is much more compact, but just as easy to understand without repeated context lines cluttering the display. Again, we have two hunks. Note that each unified range gives a starting line and a line count, not a first and last line. The first hunk, -2,3 +2,3, covers lines 2–4 in both files. The second, -8,12 +8,14, covers lines 8–19 in sigrot.1 and lines 8–21 in sigrot.2, which are the same ranges the context diff reported. The first hunk says “delete ‘# Version 1.0’ from file1 and add ‘# Version 2.0’ to file1 to create file2.” The second hunk has three similar sets of additions and deletions, plus a simple addition of two lines at the top of the hunk. As useful and compact as the unified format is, however, there is a catch: at the time of writing, only GNU diff generates unified diffs and only GNU patch understands the unified format. So, if you are distributing software patches to systems that do not or may not use GNU diff and GNU patch, don’t use unified format. Use the standard context format.
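A unified range is a starting line plus a line count, not a first and last line, so it is easy to misread. The minimal C sketch below uses hypothetical names and is written only for illustration. It parses a hunk header and converts each range to its last line. A missing count is treated as a single line, which matches how GNU diff abbreviates one-line ranges.

```c
#include <stdlib.h>
#include <string.h>

/* A unified-format range: first line and number of lines. */
struct uni_range {
    long start;
    long count;
};

/* Parse SIGN START[,COUNT] at P, where SIGN is '-' or '+'. A missing
 * COUNT means one line. Returns a pointer just past the range, or NULL. */
static const char *parse_uni_range(const char *p, char sign, struct uni_range *r)
{
    char *end;

    if (*p != sign)
        return NULL;
    p++;
    r->start = strtol(p, &end, 10);
    if (end == p)
        return NULL;
    r->count = 1;
    if (*end == ',') {
        p = end + 1;
        r->count = strtol(p, &end, 10);
        if (end == p)
            return NULL;
    }
    return end;
}

/* Parse a hunk header such as "@@ -8,12 +8,14 @@".
 * Returns 0 on success, -1 if the header is malformed. */
int parse_unified_header(const char *s, struct uni_range *from, struct uni_range *to)
{
    if (strncmp(s, "@@ ", 3) != 0)
        return -1;
    s = parse_uni_range(s + 3, '-', from);
    if (s == NULL || *s != ' ')
        return -1;
    s = parse_uni_range(s + 1, '+', to);
    if (s == NULL || strncmp(s, " @@", 3) != 0)
        return -1;
    return 0;
}

/* Last line of a non-empty range: START + COUNT - 1. */
long uni_last_line(const struct uni_range *r)
{
    return r->start + r->count - 1;
}
```

Applied to the second header above, @@ -8,12 +8,14 @@, the sketch reports lines 8 through 19 of sigrot.1 and lines 8 through 21 of sigrot.2.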
Additional diff Features
In addition to the normal, context, and unified formats we have discussed, diff can also produce side-by-side comparisons, ed scripts for modifying or converting files, and an RCS-compatible output format, and it contains a sophisticated ability to merge files using an if-then-else format. To generate side-by-side output, use diff’s -y or --side-by-side options. Note, however, that the output will be wider than usual and long lines will be truncated. To generate ed scripts, use the -e or --ed options. For information about diff’s RCS and if-then-else capabilities, see the documentation; they are not discussed in this book because they are esoteric and not widely used.
diff Command-Line Options
Like most GNU programs, diff sports a bewildering array of options to fine-tune its behavior. Table 6.1 summarizes some of these options. For a complete list of all options, use the command diff --help.
TABLE 6.1 SELECTED diff OPTIONS
Option                                    Meaning
--binary                                  Read and write data in binary mode
-c|-C NUM|--context[=NUM]                 Produce context format output, displaying NUM lines of context (three with -c)
-t|--expand-tabs                          Expand tabs to spaces in the output
-i|--ignore-case                          Ignore case changes, treating upper- and lowercase letters the same
-H|--speed-large-files                    Modify diff’s handling of large files
-w|--ignore-all-space                     Ignore whitespace when comparing lines
-I REGEXP|--ignore-matching-lines=REGEXP  Ignore changes that insert or delete lines that match the regular expression REGEXP
-B|--ignore-blank-lines                   Ignore changes that insert or delete blank lines
-b|--ignore-space-change                  Ignore changes in the amount of whitespace
-l|--paginate                             Paginate the output by passing it through pr
-p|--show-c-function                      Show the C function in which a change occurs
-q|--brief                                Only report if files differ, do not output the differences
-a|--text                                 Treat all files as text, even if they appear to be binary, and perform a line-by-line comparison
-u|-U NUM|--unified[=NUM]                 Produce unified format output, displaying NUM lines of context
-v|--version                              Print diff’s version number
-y|--side-by-side                         Produce side-by-side format output
Understanding the diff3 Command
diff3 shows its usefulness when two people change a common file. It compares the two sets of changes against their common ancestor, can merge them into a single output, and indicates conflicts between the changes. diff3’s syntax is:
diff3 [options] myfile oldfile yourfile
oldfile is the common ancestor from which myfile and yourfile were derived. Listing 6.5 introduces sigrot.3. It is the same as sigrot.1, except that its version number is 3.0 and we added a return statement at the end of the script.
LISTING 6.5 sigrot.3
#!/usr/local/bin/bash
# sigrot.sh
# Version 3.0
# Rotate signatures
# Suitable to be run via cron
#############################

sigfile=signature

old=$(cat num)
let new=$(expr $old+1)

if [ -f $sigfile.$new ]; then
    cp $sigfile.$new .$sigfile
    echo $new > num
else
    cp $sigfile.1 .$sigfile
    echo 1 > num
fi
return 0
Predictably, diff3’s output is more complex because it must juggle three input files. diff3 only displays lines that vary between the files. Hunks in which all three input files are different are called three-way hunks; two-way hunks occur when only two of the three files differ. Three-way hunks are indicated with ====, while two-way hunks add a 1, 2, or 3 at the end to indicate which of the files is different. After this header, diff3 displays one or more commands (again, in ed style) that indicate how to produce the hunk, followed by the hunk itself. The command will be one of the following:
• file:la means the hunk appears after line l, but does not exist in file, so it must be appended after line l to produce the other files.
• file:rc means the hunk consists of range r of lines from file, and one of the indicated changes must be made in order to produce the other files.
To distinguish hunks from commands, diff3 hunks begin with two spaces. For example,
$ diff3 sigrot.2 sigrot.1 sigrot.3
yields (output truncated to conserve space):
====
1:3c
  # Version 2.0
2:3c
  # Version 1.0
3:3c
  # Version 3.0
====1
1:9,10c
  srcdir=$HOME/doc/signatures
  srcfile=$srcdir/$sigfile
2:8a
3:8a
====1
1:12c
  old=$(cat $srcdir/num)
2:10c
3:10c
  old=$(cat num)
...
The first hunk is a three-way hunk. The other hunks are two-way hunks. To obtain sigrot.2 from sigrot.1 or sigrot.3, the lines
srcdir=$HOME/doc/signatures
srcfile=$srcdir/$sigfile
from sigrot.2 must be appended after line 8 of sigrot.1 and sigrot.3. Similarly, to obtain sigrot.2 from sigrot.1 or sigrot.3, line 10 of those files must be changed to line 12 of sigrot.2.
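The header and command lines in diff3’s output follow a small, regular grammar. This minimal C sketch recognizes the ==== and ====N hunk headers and parses commands such as 1:9,10c and 2:8a. The names are hypothetical, and the code only illustrates the format described above. It is not part of diff3.

```c
#include <stdlib.h>
#include <string.h>

/* Classify a diff3 hunk header: 0 for "====" (three-way hunk),
 * 1-3 for "====N" (only file N differs), -1 for anything else. */
int diff3_hunk_kind(const char *s)
{
    if (strncmp(s, "====", 4) != 0)
        return -1;
    if (s[4] == '\0' || s[4] == '\n')
        return 0;
    if (s[4] >= '1' && s[4] <= '3' && (s[5] == '\0' || s[5] == '\n'))
        return s[4] - '0';
    return -1;
}

/* One diff3 command, such as "1:9,10c" or "2:8a". */
struct diff3_cmd {
    int file;          /* 1, 2, or 3: position on the diff3 command line */
    long first, last;  /* line range; equal for a single line */
    char op;           /* 'a' = append after line, 'c' = change range */
};

/* Parse one command line. Returns 0 on success, -1 on error. */
int parse_diff3_cmd(const char *s, struct diff3_cmd *cmd)
{
    char *end;

    if (s[0] < '1' || s[0] > '3' || s[1] != ':')
        return -1;
    cmd->file = s[0] - '0';
    s += 2;
    cmd->first = strtol(s, &end, 10);
    if (end == s)
        return -1;
    cmd->last = cmd->first;
    if (*end == ',') {
        s = end + 1;
        cmd->last = strtol(s, &end, 10);
        if (end == s)
            return -1;
    }
    if (*end != 'a' && *end != 'c')
        return -1;
    cmd->op = *end;
    return (end[1] == '\0' || end[1] == '\n') ? 0 : -1;
}
```

With the output above, 1:9,10c parses as file 1 (sigrot.2), lines 9 through 10, and a change. 2:8a parses as file 2 (sigrot.1), an append after line 8.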
As previously mentioned, the output is complex. Rather than deal with this, you can use the -m or --merge option to instruct diff3 to merge the files together, and then sort out the changes manually.
$ diff3 -m sigrot.2 sigrot.1 sigrot.3 > sigrot.merged
This command merges the files, marks conflicting text, and saves the output to sigrot.merged. The merged file is much simpler to deal with because you only have to pay attention to conflicting output, which, as shown in Listing 6.6, is clearly marked with <<<<<<<, |||||||, =======, and >>>>>>>.
LISTING 6.6 OUTPUT OF diff3’S MERGE OPTION
#!/usr/local/bin/bash
# sigrot.sh
<<<<<<< sigrot.2
# Version 2.0
||||||| sigrot.1
# Version 1.0
=======
# Version 3.0
>>>>>>> sigrot.3
# Rotate signatures
# Suitable to be run via cron
#############################

sigfile=signature
srcdir=$HOME/doc/signatures
srcfile=$srcdir/$sigfile

old=$(cat $srcdir/num)
let new=$(expr $old+1)

if [ -f $srcfile.$new ]; then
    cp $srcfile.$new $HOME/.$sigfile
    echo $new > $srcdir/num
else
    cp $srcfile.1 $HOME/.$sigfile
    echo 1 > $srcdir/num
fi
return 0
<<<<<<< marks conflicts from myfile, >>>>>>> marks conflicts from yourfile, and ||||||| marks conflicts with oldfile; ======= separates the oldfile text from the yourfile text. In this case, we probably want the most recent version number, so we would delete the marker lines and the lines indicating the 1.0 and 2.0 versions.
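Because the merged file is plain text, a program can check whether any conflicts remain before the file is used. The C sketch below is hypothetical and not part of diff3. It counts conflict blocks by looking for lines that begin with the seven-character <<<<<<< marker, and it reports unbalanced markers.

```c
#include <string.h>

/* Count the conflict blocks in NUL-terminated diff3 -m output.
 * Each line starting with "<<<<<<<" opens a conflict and each line
 * starting with ">>>>>>>" closes it. Returns the number of conflicts,
 * or -1 if the markers are unbalanced. A source line that happens to
 * begin with seven '<' or '>' characters would confuse this check. */
int count_conflicts(const char *text)
{
    const char *line = text;
    int open = 0, conflicts = 0;

    while (*line != '\0') {
        if (strncmp(line, "<<<<<<<", 7) == 0) {
            if (open)
                return -1;          /* second start marker inside a conflict */
            open = 1;
            conflicts++;
        } else if (strncmp(line, ">>>>>>>", 7) == 0) {
            if (!open)
                return -1;          /* end marker without a start */
            open = 0;
        }
        const char *nl = strchr(line, '\n');
        if (nl == NULL)
            break;                  /* last line had no newline */
        line = nl + 1;
    }
    return open ? -1 : conflicts;
}
```

Run on the contents of sigrot.merged, the sketch would report one conflict: the version-number block.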
Understanding the sdiff Command
sdiff enables you to interactively merge two files together. It displays the files in side-by-side format. To use the interactive feature, specify the -o file or --output=file option to indicate the filename to which output should be saved. sdiff will display each hunk, followed by a % prompt, at which you type one of these commands, followed by Enter:
• l: Copy the left-hand column to the output file
• r: Copy the right-hand column to the output file
• el: Edit the left-hand column, then copy the edited text to the output file
• er: Edit the right-hand column, then copy the edited text to the output file
• e: Discard both versions, enter new text, then copy the new text to the output file
• eb: Concatenate the two versions, edit the concatenated text, then copy it to the output file
• q: Quit
Experimenting with sdiff is left as an exercise for you.
Preparing Source Code Patches
Within the Linux community, most software is distributed either in binary (ready to run) format, or in source format. Source distributions, in turn, are available either as complete source packages, or as diff-generated patches. patch is the GNU project’s tool for merging diff files into existing source code trees. The following sections discuss patch’s command-line options, how to create a patch using diff, and how to apply a patch using patch. Like most of the GNU project’s tools, patch is a robust, versatile, and powerful tool. It can read the standard normal and context format diffs, as well as the more compact unified format. patch also strips header and trailer lines from patches, enabling you to apply a patch straight from an email message or Usenet posting without performing any preparatory editing.
patch Command-Line Options
Table 6.2 lists commonly used patch options. For complete details, try patch --help or the patch info pages.
TABLE 6.2 patch OPTIONS
Option                  Meaning
-c|--context            Interpret the patch file as a context diff
-e|--ed                 Interpret the patch file as an ed script
-n|--normal             Interpret the patch file as a normal diff
-u|--unified            Interpret the patch file as a unified diff
-d DIR|--directory=DIR  Make DIR the current directory for interpreting filenames in the patch file
-F NUM|--fuzz=NUM       Set the fuzz factor to NUM lines when resolving inexact matches
-l|--ignore-whitespace  Consider any sequence of whitespace equivalent to any other sequence of whitespace
-pNUM|--strip=NUM       Strip NUM leading components from filenames in the patch file
-s|--quiet              Work silently unless errors occur
-R|--reverse            Assume the patch file was created with the old and new files swapped
-t|--batch              Do not ask any questions
--version               Display patch’s version information and exit
In most cases, patch can determine the format of a patch file. If it gets confused, however, use the -c, -e, -n, or -u options to tell patch how to treat the input patch file. As previously noted, only GNU diff and GNU patch can create and read, respectively, the unified format, so unless you are certain that only users with access to these GNU utilities will receive your patch, use the context diff format for creating patches. Also recall that patch requires at least two lines of context to apply patches correctly. The fuzz factor (-F NUM or --fuzz=NUM) sets the maximum number of lines patch will ignore when trying to locate the correct place to apply a patch. It defaults to 2, and cannot be more than the number of context lines provided with the diff. Similarly, if you are applying a patch pulled from an email message or a Usenet post, the mail or news client may change spaces into tabs or tabs into spaces. If so, and you are having trouble applying the patch, use patch’s -l or --ignore-whitespace option.
Sometimes, programmers reverse the order of the filenames when creating a diff. The correct order is old-file new-file. If patch encounters a diff that appears to have been created in new-file old-file order, it will consider the patch file a “reverse patch.” To apply a reverse patch in normal order, specify -R or --reverse to patch. You can also use -R to back out a previously applied patch. As it works, patch makes a backup copy of each source file it is going to change, appending .orig to the end of the file. If patch fails to apply a hunk, it saves the hunk using the filename stored in the patch file and adding .rej (for reject) to it.
Creating a Patch
To create a patch, use diff to create a context or unified diff, place the name of the older file before the newer file on the diff command line, and name your patch file by appending .diff or .patch to the filename. For example, to create a patch based on sigrot.1 and sigrot.2, the appropriate command line would be
$ diff -c sigrot.1 sigrot.2 > sigrot.patch
to create a context diff, or
$ diff -u sigrot.1 sigrot.2 > sigrot.patch
to create a unified diff. If you have a complicated source tree, one with several subdirectories, use diff’s -r (--recursive) option to tell diff to recurse into each subdirectory when creating the patch file.
Applying a Patch
To apply the patch, the command would be
$ patch -p0 < sigrot.patch
The -pNUM option tells patch how many “/”s and intervening filename components to strip off the filename in the patch file before applying the patch. Suppose, for instance, the filename in the patch is /home/kwall/src/sigrot/sigrot.1. -p0 leaves the name unchanged; -p1 would result in home/kwall/src/sigrot/sigrot.1; -p4 would result in sigrot/sigrot.1; omitting -p entirely strips off every part but the final filename, or sigrot.1. If, after applying a patch, you decide it was a mistake, simply add -R to the command line you used to install the patch, and you will get your original, unpatched file back:
$ patch -p0 -R < sigrot.patch
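The -pNUM stripping rule is easy to model. This C sketch follows the rule as GNU patch documents it: remove the smallest prefix that contains NUM leading slashes, and treat a run of adjacent slashes as one. The function strip_components() is a hypothetical helper, not patch’s own code. Returning NULL when the name has too few slashes is a choice made for this sketch.

```c
#include <stddef.h>

/* Return the part of NAME that remains after removing the smallest
 * prefix containing N leading slashes; a run of adjacent slashes
 * counts as one. Returns NULL if NAME has fewer than N slashes
 * (this sketch's choice, not necessarily patch's behavior). */
const char *strip_components(const char *name, int n)
{
    const char *p = name;

    while (n-- > 0) {
        while (*p != '\0' && *p != '/')   /* advance to the next slash */
            p++;
        if (*p == '\0')
            return NULL;
        while (*p == '/')                 /* skip the whole run of slashes */
            p++;
    }
    return p;
}
```

With the filename from the example, strip_components(name, 4) returns a pointer to sigrot/sigrot.1, and strip_components(name, 0) returns the name unchanged, just as -p0 does.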
See, using diff and patch is not hard! Admittedly, there is a lot to know about the various file formats and how the commands work, but actually applying them is very simple and straightforward. As with most Linux commands, there is much more you can learn, but it isn’t necessary to know everything in order to be able to use these utilities effectively.
Summary
In this chapter, you learned about the cmp, diff, diff3, sdiff, and patch commands. Of these, diff and patch are the most commonly used for creating and applying source code patches. You have also learned about diff’s various output formats. The standard format is the context format, because most patch programs can understand it. What you have learned in this chapter will prove to be an essential part of your Linux software development toolkit.
Version Control with RCS
by Kurt Wall
CHAPTER 7
IN THIS CHAPTER
• Terminology
• Basic RCS Usage
• rcsdiff
• Other RCS Commands
Version control is an automated process for keeping track of and managing changes made to source code files. Why bother? Because one day you will make that one fatal edit to a source file, delete its predecessor, and forget exactly which line or lines of code you “fixed”; because simultaneously keeping track of the current release, the next release, and eight bug fixes manually will become too tedious and confusing; because frantically searching for the backup tape because one of your colleagues overwrote a source file for the fifth time will drive you over the edge; because, one day, over your morning cappuccino, you will say to yourself, “Version control, it’s the Right Thing to Do.”
In this chapter, we will examine RCS, the Revision Control System, a common solution to the version control problem. RCS is a common solution because it is available on almost all UNIX systems, not just on Linux. Indeed, RCS was first developed on real, that is, proprietary, UNIX systems, although it is not, itself, proprietary. Two alternatives to RCS, which is maintained by the GNU project, are SCCS, the Source Code Control System, a proprietary product, and CVS, the Concurrent Versions System, which is also maintained by the GNU project.
CVS is built on top of RCS and adds two features to it. First, it is better suited to managing multi-directory projects than RCS because it handles hierarchical directory structures more simply and its notion of a project is more complete. Whereas RCS is file-oriented, as you will see in this chapter, CVS is project-oriented. CVS’ second advantage is that it supports distributed projects, those where multiple developers in separate locations, both geographically and in terms of the Internet, access and manipulate a single source repository. The KDE project and the Debian Linux distribution are two examples of large projects using CVS’ distributed capabilities.
Note, however, that because CVS is built on top of RCS, you will not be able to master CVS without some knowledge of RCS. This chapter introduces you to RCS because it is a simpler system to learn. I will not discuss CVS.
Terminology
Before proceeding, however, Table 7.1 lists a few terms that will be used throughout the chapter. Because they are so frequently used, I want to make sure you understand their meaning as far as RCS and version control in general are concerned.
TABLE 7.1 VERSION CONTROL TERMS
Term           Description
RCS File       Any file located in an RCS directory, controlled by RCS and accessed using RCS commands. An RCS file contains all versions of a particular file. Normally, an RCS file has a “,v” extension.
Working File   One or more files retrieved from the RCS source code repository (the RCS directory) into the current working directory and available for editing.
Lock           A working file retrieved for editing such that no one else can edit it simultaneously. A working file is “locked” by the first user against edits by other users.
Revision       A specific, numbered version of a source file. Revisions begin with 1.1 and increase incrementally, unless forced to use a specific revision number.
7
VERSION CONTROL WITH RCS
The Revision Control System manages multiple versions of files, usually but not necessarily source code files (I used RCS to maintain the various revisions of this book). RCS automates file version storage and retrieval, change logging, access control, release management, and revision identification and merging. As an added bonus, RCS minimizes disk space requirements because it tracks only file changes.
NOTE
The examples used in this chapter assume you are using RCS version 5.7. To determine the version of RCS you are using, type rcs -V.
Basic RCS Usage
One of RCS’s attractions is its simplicity. With only a few commands, you can accomplish a great deal. This section discusses the ci, co, and ident commands as well as RCS keywords.
ci and co
You can accomplish a lot with RCS using only two commands, ci and co, and a directory named RCS. ci stands for “check in,” which means storing a working file in the RCS directory; co means “check out,” and refers to retrieving an RCS file from the RCS repository. To get started, create an RCS directory:
$ mkdir RCS
All RCS commands will use this directory, if it is present in your current working directory. The RCS directory is also called the repository. Next, create the source file shown in Listing 7.1, howdy.c, in the same directory in which you created the RCS directory.
LISTING 7.1 howdy.c: BASIC RCS USAGE
/* $Id$
 * howdy.c
 * Sample code to demonstrate RCS Usage
 * Kurt Wall
 */

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    fprintf(stdout, "Howdy, Linux programmer!");
    return EXIT_SUCCESS;
}
Execute the command ci howdy.c. RCS asks for a description of the file, copies it to the RCS directory, and deletes the original. “Deletes the original?” Ack! Don’t worry, you can retrieve it with the command co howdy.c. Voilà! You have a working file. Note that the working file is read-only; if you want to edit it, you have to lock it. To do this, use the -l option with co (co -l howdy.c). -l means lock, as explained in Table 7.1.
$ ci howdy.c
RCS/howdy.c,v  <--  howdy.c
enter description, terminated with single '.' or end of file:
NOTE: This is NOT the log message!
>> Simple program to illustrate RCS usage
>> .
initial revision: 1.1
done
$ co -l howdy.c
RCS/howdy.c,v  -->  howdy.c
revision 1.1 (locked)
done
To see version control in action, make a change to the working file. If you haven’t already done so, check out and lock the file (co -l howdy.c). Change anything you want, but I recommend adding “\n” to the end of fprintf()’s string argument because Linux (and UNIX in general), unlike DOS and Windows, does not automatically add a newline to the end of console output; without it, your shell prompt appears on the same line as the greeting.
fprintf(stdout, "Howdy, Linux programmer!\n");
Next, check the file back in and RCS will increment the revision number to 1.2, ask for a description of the change you made, incorporate the changes you made into the RCS file, and (annoyingly) delete the original. To prevent deletion of your working files during check-in operations, use the -l or -u option with ci.
$ ci -l howdy.c
RCS/howdy.c,v  <--  howdy.c
new revision: 1.2; previous revision: 1.1
enter log message, terminated with single '.' or end of file:
>> Added newline
>> .
done
When used with ci, both the -l and -u options cause an implied check out of the file after the check in procedure completes. -l locks the file so you can continue to edit it, while -u checks out an unlocked or read-only working file. In addition to -l and -u, ci and co accept two other very useful options: -r (for “revision”) and -f (“force”). Use -r to tell RCS which file revision you want to manipulate. RCS assumes you want to work with the most recent revision; -r overrides this default. ci -r2 howdy.c (this is equivalent to ci -r2.1 howdy.c), for example, creates revision 2.1 of howdy.c; co -r1.7 howdy.c checks out revision 1.7 of howdy.c, disregarding the presence of higher-numbered revisions in the RCS file.
The -f option forces RCS to overwrite the current working file. By default, RCS aborts a check-out operation if a working file of the same name already exists in your working directory. So, if you really botch up your working file, co -l -f howdy.c is a handy way to discard all of the changes you’ve made and start with a known good source file. When used with ci, -f forces RCS to check in a file even if it has not changed. RCS’s command-line options are cumulative, as you might expect, and it does a good job of disallowing incompatible options. To check out and lock a specific revision of howdy.c, you would use a command like co -l -r2.1 howdy.c. Similarly, ci -u -r3 howdy.c checks in howdy.c, assigns it revision number 3.1, and deposits a read-only revision 3.1 working file back into your current working directory.
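The default numbering rules for ci can be summarized in a short C sketch. next_trunk_revision() is a hypothetical helper, not part of RCS. It handles only two-part trunk numbers such as 1.7 in a simplified model. Branches and symbolic names, which real ci also accepts with -r, are out of scope.

```c
#include <stdio.h>
#include <string.h>

/* Compute the trunk revision number ci would assign, in a simplified model.
 * LATEST is the newest revision, e.g. "1.7". REQ_RELEASE is 0 when no
 * -r option is given, or N for "ci -rN". Writes the result into OUT and
 * returns 0, or returns -1 for input this sketch does not handle. */
int next_trunk_revision(const char *latest, unsigned long req_release,
                        char *out, size_t outlen)
{
    unsigned long release, level;
    int n;

    /* Accept exactly "RELEASE.LEVEL": digits and a single dot. */
    if (strspn(latest, "0123456789.") != strlen(latest) ||
        strchr(latest, '.') != strrchr(latest, '.') ||
        sscanf(latest, "%lu.%lu", &release, &level) != 2)
        return -1;

    if (req_release == 0 || req_release == release)
        n = snprintf(out, outlen, "%lu.%lu", release, level + 1);  /* 1.7 -> 1.8 */
    else if (req_release > release)
        n = snprintf(out, outlen, "%lu.1", req_release);           /* -r2 -> 2.1 */
    else
        return -1;  /* a lower release number is not accepted here */

    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

Called with "1.7" and no requested release, the sketch produces 1.8. Called with "1.7" and release 2 (the ci -r2 case), it produces 2.1.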
RCS Keywords
RCS keywords are special, macro-like tokens used to insert and maintain identifying information in source, object, and binary files. These tokens take the form $KEYWORD$. When a file containing RCS keywords is checked out, RCS expands $KEYWORD$ to $KEYWORD: VALUE $.
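A simplified model of keyword expansion is a string substitution. The C sketch below uses the hypothetical function expand_keyword() to rewrite the first unexpanded $KEYWORD$ in a line. Real RCS also re-expands strings that are already in the $KEYWORD: VALUE $ form on later check-outs. This sketch does not attempt that.

```c
#include <stdio.h>
#include <string.h>

/* Expand the first unexpanded "$KEYWORD$" in LINE to "$KEYWORD: VALUE $",
 * writing the result to OUT. Returns 1 if a keyword was expanded, 0 if
 * none was found (LINE is copied unchanged), and -1 if a buffer is too
 * small. */
int expand_keyword(const char *line, const char *keyword, const char *value,
                   char *out, size_t outlen)
{
    char token[64];
    const char *hit;
    int n;

    n = snprintf(token, sizeof token, "$%s$", keyword);
    if (n < 0 || (size_t)n >= sizeof token)
        return -1;

    hit = strstr(line, token);
    if (hit == NULL) {
        n = snprintf(out, outlen, "%s", line);
        return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
    }

    /* text before the token, the expanded token, then the rest of the line */
    n = snprintf(out, outlen, "%.*s$%s: %s $%s",
                 (int)(hit - line), line, keyword, value,
                 hit + strlen(token));
    return (n < 0 || (size_t)n >= outlen) ? -1 : 1;
}
```

For example, expanding Id with the value howdy.c,v 1.1 1998/12/07 22:39:01 kwall Exp in the line /* $Id$ yields the first line shown below for howdy.c.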
$Id$
For example, that peculiar string at the top of Listing 7.1, $Id$, is an RCS keyword. The first time you checked out howdy.c, RCS expanded it to something like
$Id: howdy.c,v 1.1 1998/12/07 22:39:01 kwall Exp $
The format of the $Id$ string is
$KEYWORD: FILENAME REV_NUM DATE TIME AUTHOR STATE LOCKER $
On your system, most of these fields will have different values. If you checked out the file with a lock, you will also see your login name after the Exp entry.
$Log$
RCS replaces the $Log$ keyword with the log message you supplied during check in. Rather than replacing the previous log entry, though, RCS inserts the new log message above the last log entry. Listing 7.2 gives an example of how the $Log$ keyword is expanded after several check ins:
LISTING 7.2 THE $Log$ KEYWORD AFTER A FEW CHECK INS
/* $Id: howdy.c,v 1.5 1999/01/04 23:07:35 kwall Exp kwall $
 * howdy.c
 * Sample code to demonstrate RCS usage
 * Kurt Wall
 * Listing 7.1
 *
 * ********************* Revision History *********************
 * $Log: howdy.c,v $
 * Revision 1.5  1999/01/04 23:07:35  kwall
 * Added pretty box for the revision history
 *
 * Revision 1.4  1999/01/04 14:41:55  kwall
 * Add args to main for processing command line
 *
 * Revision 1.3  1999/01/04 14:40:15  kwall
 * Added the Log keyword.
 * ************************************************************
 */

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    fprintf(stdout, "Howdy, Linux programmer!\n");
    return EXIT_SUCCESS;
}
The $Log$ keyword makes it convenient to see the changes made to a given file while working within that file. Read from top to bottom, the change history lists the most recent changes first.
Other RCS Keywords
Table 7.2 lists other RCS keywords and how RCS expands each of them.
TABLE 7.2 RCS KEYWORDS
Keyword      Description
$Author$     Login name of the user who checked in the revision
$Date$       Date and time the revision was checked in, in UTC format
$Header$     Full pathname of the RCS file, the revision number, date, time, author, state, and locker (if locked)
$Locker$     Login name of the user who locked the revision (if not locked, the field is empty)
$Name$       Symbolic name, if any, used to check out the revision
$RCSfile$    Name of the RCS file without a path
$Revision$   Revision number assigned to the revision
$Source$     Full pathname to the RCS file
$State$      The state of the revision: Exp (experimental), the default; Stab (stable); Rel (released)
The ident Command
The ident command locates RCS keywords in files of all types. This feature lets you find out which revisions of which modules are used in a given program release. To illustrate, create the source file shown in Listing 7.3.
LISTING 7.3 THE ident COMMAND
/* $Id$
 * prn_env.c
 * Display values of environment variables.
 * Kurt Wall
 * Listing 7.3
 */

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
static char rcsid[] = "$Id$\n";

int main(void)
{
    extern char **environ;
    char **my_env = environ;

    while(*my_env) {
        fprintf(stdout, "%s\n", *my_env);
        my_env++;
    }
    return EXIT_SUCCESS;
}
The program, prn_env.c, loops through the environ array declared in the header file unistd.h to print out the values of all your environment variables (see man environ for more details). The statement static char rcsid[] = "$Id$\n"; takes advantage of RCS’s keyword expansion to create a static text buffer holding the value of the $Id$ keyword in the compiled program that ident can extract. Check prn_env.c in using the -u option (ci -u prn_env.c), and then compile and link the program (gcc prn_env.c -o prn_env). Ignore the warning you may get that rcsid is defined but not used. Run the program if you want, but also execute the command ident prn_env. If everything worked correctly, you should get output resembling the following:
$ ident prn_env
prn_env:
     $Id: prn_env.c,v 1.1 1999/01/06 03:04:40 kwall Exp $
The $Id$ keyword expanded as previously described, and gcc compiled this into the binary. To confirm this, page through the source code file and compare the Id string in the source code to ident’s output. The two strings will match exactly. ident works by extracting strings of the form $KEYWORD: VALUE $ from source, object, and binary files. It even works on raw binary data files and core dumps. In fact, because ident looks for all instances of the $KEYWORD: VALUE $ pattern, you can also use words that are not RCS keywords. This enables you to embed additional information into programs, for example, a company name. Embedded information can be a valuable tool for isolating problems to a specific code module. The slick part of this feature is that RCS updates the identification strings automatically, a real bonus for programmers and project managers.
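ident’s pattern matching can be sketched in a few lines of C. This is not ident’s source. find_ident_strings() is a hypothetical helper that assumes the simplified rule stated in its comment. The sketch shows why the technique works on executables and core dumps: it scans raw bytes, so NUL bytes between the strings do not matter.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Print every "$WORD: TEXT $" pattern found in BUF (LEN bytes, which may
 * include NUL bytes, as in an executable). Simplified rule assumed here:
 * WORD is one or more letters, TEXT contains no '$', newline, or NUL,
 * and the closing '$' is preceded by a space. Returns the match count. */
int find_ident_strings(const unsigned char *buf, size_t len, FILE *out)
{
    size_t i = 0;
    int found = 0;

    while (i < len) {
        if (buf[i] != '$') {
            i++;
            continue;
        }
        size_t j = i + 1;
        while (j < len && isalpha(buf[j]))      /* the keyword name */
            j++;
        if (j == i + 1 || j + 1 >= len || buf[j] != ':' || buf[j + 1] != ' ') {
            i++;                                /* not "$WORD: " */
            continue;
        }
        size_t k = j + 2;
        while (k < len && buf[k] != '$' && buf[k] != '\n' && buf[k] != '\0')
            k++;                                /* look for the closing '$' */
        if (k < len && buf[k] == '$' && buf[k - 1] == ' ') {
            fprintf(out, "     %.*s\n", (int)(k - i + 1), (const char *)(buf + i));
            found++;
            i = k + 1;
        } else {
            i++;
        }
    }
    return found;
}
```

Reading a compiled program into memory and passing it to this function would print its expanded $Id: ... $ string, along with any non-RCS $WORD: TEXT $ strings that you embedded yourself.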
rcsdiff
If you need to see the differences between one of your working files and its corresponding RCS file, use the rcsdiff command. rcsdiff uses the diff(1) command (discussed in Chapter 6, “Comparing and Merging Source Files”) to compare file revisions. In its simplest form, rcsdiff filename compares the latest revision of filename in the repository with the working copy of filename. You can also compare specific revisions using the -r option. Consider the sample program prn_env.c. Check out a locked version of it and remove the static char buffer. The result should look like the following:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    extern char **environ;
    char **my_env = environ;

    while(*my_env) {
        fprintf(stdout, "%s\n", *my_env);
        my_env++;
    }
    return EXIT_SUCCESS;
}
Now, execute the command rcsdiff prn_env.c. RCS complies and displays the following:
$ rcsdiff prn_env.c
===================================================================
RCS file: RCS/prn_env.c,v
retrieving revision 1.1
diff -r1.1 prn_env.c
11d10
< static char rcsid[] = "$Id: prn_env.c,v 1.1 1999/01/06 03:04:40 kwall Exp kwall $\n";
As you learned in Chapter 6, this diff output (11d10) means that line 11 of revision 1.1 has been deleted; had it not been deleted, it would have appeared after line 10 of prn_env.c. To try comparing specific revisions with the -r option, check prn_env.c into the repository, check it right back out with a lock, add a sleep(5) statement immediately above the return statement, and, finally, check this third revision back in with the -u option. You should now have three revisions of prn_env.c in the repository. The general format for comparing specific file revisions using rcsdiff is
rcsdiff [ -rREV1 [ -rREV2 ] ] FILENAME
First, compare revision 1.1 to the working file:
$ rcsdiff -r1.1 prn_env.c
===================================================================
RCS file: RCS/prn_env.c,v
retrieving revision 1.1
diff -r1.1 prn_env.c
1c1
< /* $Id: prn_env.c,v 1.1 1999/01/06 03:10:17 kwall Exp $
---
> /* $Id: prn_env.c,v 1.3 1999/01/06 03:12:22 kwall Exp $
11d10
< static char rcsid[] = "$Id: prn_env.c,v 1.1 1999/01/06 03:04:40 Exp kwall $\n";
21a21
> sleep(5);
Next, compare 1.2 to 1.3:
$ rcsdiff -r1.2 -r1.3 prn_env.c
===================================================================
RCS file: RCS/prn_env.c,v
retrieving revision 1.2
retrieving revision 1.3
diff -r1.2 -r1.3
1c1
< /* $Id: prn_env.c,v 1.1 1999/01/06 03:10:17 kwall Exp $
---
> /* $Id: prn_env.c,v 1.3 1999/01/06 03:12:22 kwall Exp $
20a21
> sleep(5);
rcsdiff is a useful utility for viewing changes to RCS files or preparing to merge multiple revisions into a single revision.
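The change commands in the rcsdiff output above (11d10, 1c1, 21a21, 20a21) use diff’s normal format: a line range in the old file, a letter (a for add, d for delete, c for change), and a line range in the new file, with multi-line ranges written as first,last. The following sketch parses one such command; parse_diff_cmd and struct diff_cmd are illustrative names, not part of RCS or diff.

```c
#include <ctype.h>

/* One normal-format diff command such as "11d10", "21a21", or "1c1". */
struct diff_cmd {
    int from_lo, from_hi;   /* line range in the old file */
    char op;                /* 'a' = add, 'd' = delete, 'c' = change */
    int to_lo, to_hi;       /* line range in the new file */
};

/* Read "N" or "N,M" at *s and advance *s past it. Returns 1 on success. */
static int read_range(const char **s, int *lo, int *hi)
{
    if (!isdigit((unsigned char)**s))
        return 0;
    *lo = 0;
    while (isdigit((unsigned char)**s))
        *lo = *lo * 10 + (*(*s)++ - '0');
    *hi = *lo;                          /* single line unless ",M" follows */
    if (**s == ',') {
        (*s)++;
        if (!isdigit((unsigned char)**s))
            return 0;
        *hi = 0;
        while (isdigit((unsigned char)**s))
            *hi = *hi * 10 + (*(*s)++ - '0');
    }
    return 1;
}

/* Parse a whole command; returns 1 if s is well formed, 0 otherwise. */
int parse_diff_cmd(const char *s, struct diff_cmd *cmd)
{
    if (!read_range(&s, &cmd->from_lo, &cmd->from_hi))
        return 0;
    if (*s != 'a' && *s != 'd' && *s != 'c')
        return 0;
    cmd->op = *s++;
    if (!read_range(&s, &cmd->to_lo, &cmd->to_hi))
        return 0;
    return *s == '\0';
}
```

Read this way, 11d10 says old line 11 was deleted and the two files line up again after new line 10, while 20a21 says a new line 21 was added after old line 20.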
For you GNU Emacs aficionados, Emacs boasts an advanced version control mode, VC, that supports RCS, CVS, and SCCS. For example, to check the current file in or out of an RCS repository, type C-x v v or C-x C-q and follow the prompts. If you want to place the file you are currently editing into the repository for the first time (called “registering” a file with RCS), you would type C-x v i. All of Emacs’ version control commands are prefixed with C-x v. Figure 7.1 illustrates registering a file in an Emacs session with RCS. Emacs’ RCS mode greatly enhances RCS’ basic capabilities. If you are a fan of Emacs, I encourage you to explore Emacs’ VC mode.
FIGURE 7.1
Registering a file with RCS in Emacs.
Other RCS Commands
Besides ci, co, ident, and rcsdiff, the RCS suite includes rlog, rcsclean, rcsmerge, and, of course, rcs. These additional commands extend your control of your source code, allowing you to merge or delete RCS files, review log entries, and perform other administrative functions.
rcsclean
rcsclean does what its name suggests: it cleans up RCS working files. The basic syntax is rcsclean [options] [file ... ]. A bare rcsclean command deletes every working file that has not been changed since it was checked out and is not locked. The -u option also unlocks locked files that have not been changed, and then removes them. The -rM.N option compares the working file against revision M.N instead of the latest revision. For example,
$ rcsclean -r2.3 foobar.c
removes the working file foobar.c if it is unchanged from revision 2.3.
rlog
rlog prints the log messages and other information about files stored in the RCS repository. For example, rlog prn_env.c will display all of the log information for all revisions of prn_env.c. The -R option tells rlog to display only filenames. To see a list of all the files in the repository, for example, rlog -R RCS/* is the proper command (of course, you could always type ls -l RCS, too). If you only want to see a list of all locked files, use the -L option, as in rlog -R -L RCS/*. To see the log information on all files locked by the user named gomer, use the -l option:
$ rlog -lgomer RCS/*
rcs
The rcs command is primarily an administrative command. In normal usage, though, it is useful in two ways. If you checked out a file read-only, then made changes you can’t bear to lose, rcs -l filename locks the latest revision of filename without checking it out again, so your working file is not overwritten. If you need to break a lock on a file checked out by someone else, rcs -u filename is the command to use. The file will be unlocked, and a message will be sent to the original locker with your explanation of why you broke the lock. As you will recall, each time you check a file in, you can type a check-in message explaining what has changed or what you did. If you make a typographical error or some other mistake in the check-in message, or would simply like to add information to it, you can use the following rcs command:
$ rcs -mrev:msg filename
rev is the revision whose message you want to correct or modify, and msg is the new message text, which replaces the original one (quote msg if it contains spaces).
rcsmerge
rcsmerge attempts to merge multiple revisions into a single working file. The general syntax is
rcsmerge -rAncestor -rDescendant Working_file -p > Merged_file
Both Descendant and Working_file must be descended from Ancestor. The -p option tells rcsmerge to send its output to stdout rather than overwriting Working_file. By redirecting the output to Merged_file, you can examine the results of the merge. While rcsmerge does the best it can merging files, the results can be unpredictable. The -p option protects you from this unpredictability.
For more information on RCS, see these man pages: rcs(1), ci(1), co(1), rcsintro(1), rcsdiff(1), rcsclean(1), rcsmerge(1), rlog(1), rcsfile(5), and ident(1).
Summary
In this chapter, you learned about RCS, the Revision Control System. ci and co, with their various options and arguments, are RCS’s fundamental commands. RCS keywords enable you to embed identifying strings in your code and in compiled programs that can later be extracted with the ident command. You also learned other helpful but less frequently used RCS commands, including rcsdiff, rcsclean, rcsmerge, and rlog.
Creating Programs in Emacs
by Kurt Wall and Mark Watson
CHAPTER 8
IN THIS CHAPTER
• Introduction to Emacs
• Features Supporting Programming
• Automating Development with Emacs Lisp
Emacs provides a rich, highly configurable programming environment. In fact, you can start Emacs in the morning, and, while you are compiling your code, you can catch up on last night’s posts to alt.vampire.flonk.flonk.flonk, email a software patch, get caring professional counseling, and write your documentation, all without leaving Emacs. This chapter gets you started with Emacs, focusing on Emacs’ features for programmers.
Introduction to Emacs
Emacs has a long history, as one might expect of software currently shipping version 20.3 (the version used for this chapter), but we won’t recite it. The name Emacs derives from the “editing macros” that Richard Stallman originally wrote for the TECO editor. Stallman has written his own account of Emacs’ history, which can be viewed online at http://www.gnu.org/philosophy/stallman-kth.html (you will also get a good look at GNU’s philosophical underpinnings).
NOTE
The world is divided into three types of people: those who use Emacs, those who prefer vi, and everyone else. Many flame wars have erupted over the Emacs versus vi issue. Commenting on Emacs’ enormous feature set, one wag said: “Emacs is a great operating system, but UNIX has more programs.” I’m always interested in Emacs humor. Send your Emacs-related wit to kwall@xmission.com with “Emacs Humor” somewhere in the subject line.
What is true of any programmer’s editor is especially true of Emacs: Time invested in learning Emacs repays itself many times over during the development process. This chapter presents enough information about Emacs to get you started using it and also introduces many features that enhance its usage as a C development environment. However, Emacs is too huge a topic to cover in one chapter. A complete tutorial is Sams Teach Yourself Emacs in 24 Hours. For more detailed information, see the GNU Emacs Manual and the GNU Emacs Lisp Reference Manual, published by the Free Software Foundation, Inc., and Learning GNU Emacs and Writing GNU Emacs Extensions, published by O’Reilly.
Starting and Stopping Emacs
To start Emacs, type emacs or emacs filename. If you have X configured and running on your system, try xemacs to start XEmacs, a graphical version of Emacs, formerly known as Lucid Emacs. If Emacs was built with Athena widget set support, Emacs will have mouse support and a pull-down menu. Depending on which command you type, you should get a screen that looks like Figure 8.1, Figure 8.2, or Figure 8.3.
FIGURE 8.1
Emacs on a text mode console. (Callouts: menu bar, editing window, status bar, minibuffer.)
FIGURE 8.2
Emacs, with Athena (X) support. (Callouts: menu bar, editing window, status bar, minibuffer.)
FIGURE 8.3
XEmacs has an attractive graphical interface. (Callouts: menu bar, toolbar, editing window, status bar, minibuffer.)
If you take a notion to, type C-h t to go through the interactive tutorial. It is instructive and only takes about thirty minutes to complete. We will not cover it here because we do not want to spoil the fun. The following list explains the notation used in this chapter:
• C-x means press and hold the Ctrl key and press letter x
• C x means press and release the Ctrl key, and then press letter x
• M-x means press and hold the Alt key and press letter x (if M-x does not work as expected, try Esc x)
• M x means press and release the Alt key, and then press letter x
Due to peculiarities in terminal configuration, the Alt key may not work with all terminal types or keyboards. If a command preceded with the Alt key fails to work as expected, try using the Esc key instead. On the so-called “Windows keyboards,” try pressing the Window key between Alt and Ctrl.
TIP
To exit any version of Emacs, type C-x C-c.
Moving Around
Although Emacs usually responds appropriately if you use the arrow keys, we recommend you learn the “Emacs way.” At first, it will seem awkward, but as you become more comfortable with Emacs, you will find that you work faster because you don’t have to move your fingers off the keyboard. The following list describes how to move around in Emacs:
• M-b: Moves the cursor to the beginning of the word to the left of the cursor
• M-f: Moves the cursor to the end of the word to the right of the cursor
• M-a: Moves to the beginning of the current sentence
• M-e: Moves to the end of the current sentence
• C-n: Moves the cursor to the next line
• C-p: Moves the cursor to the previous line
• C-a: Moves the cursor to the beginning of the line
• C-e: Moves the cursor to the end of the line
• C-v: Moves the display down one screenful
• M-v: Moves the display up one screenful
• M->: Moves the cursor to the end of the file
• M-<: Moves the cursor to the beginning of the file
If you open a file ending in .c, Emacs automatically starts in C mode, which has features that the default mode, Lisp Interaction, lacks. M-C-a, for example, moves the cursor to the beginning of the current function, and M-C-e moves the cursor to the end of the current function. In addition to new commands, C mode modifies the behavior of other Emacs commands. In C mode, for instance, M-a moves the cursor to the beginning of the innermost C statement, and M-e moves the cursor to the end of the innermost C statement. You can also apply a “multiplier” to almost any Emacs command by typing C-u [N], where N is any integer. C-u by itself has a default multiplier value of 4. So, C-u 10 C-n will move the cursor down ten lines, and C-u C-n moves the cursor down the default four lines. If your Alt key works like the Meta (M-) key, M-n, where n is some digit, works as a multiplier, too.
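The multiplier arithmetic fits in a few lines of C. This is a hypothetical model, not Emacs source: prefix_count takes the number of bare C-u presses and any digits typed after them. Each bare C-u multiplies the count by 4, so C-u C-u C-n moves down 16 lines, while typed digits replace the default entirely.

```c
#include <stdlib.h>

/* Hypothetical model of the C-u prefix argument (not Emacs code).
   If digits were typed, they set the repeat count directly;
   otherwise each bare C-u multiplies the count by 4. */
long prefix_count(int cu_presses, const char *digits)
{
    if (digits != NULL && *digits != '\0')
        return strtol(digits, NULL, 10);
    long count = 1;                 /* no prefix at all: run once */
    for (int i = 0; i < cu_presses; i++)
        count *= 4;
    return count;
}
```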
Inserting Text
Emacs editing is simple: just start typing. Each character you type is inserted at the “point,” which, in most cases, is the cursor. In classic GNU style, however, Emacs’ documentation muddles what should be a clear, simple concept, making an almost pointless distinction between the point and the cursor: “While the cursor appears to point *at* a particular character, you should think of point as *between* two characters; it points before the character that appears under the cursor” (GNU Emacs Manual, 15). Why the distinction? The word “point” referred to “.” in the TECO language in which Emacs was originally developed. “.” was the command for obtaining the value at what is now called the point. In practice, you can generally use the word “cursor” anywhere the GNU documentation uses “point.” To insert a blank line after the cursor, type C-o. Because the cursor stays where it is, typing C-o at the beginning of a line opens a blank line above the current line and leaves the cursor at the beginning of that new line. C-x C-o deletes all but one of multiple consecutive blank lines.
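One way to see what the quoted distinction means is to model a buffer the way the manual describes it: point is an index between characters (0 before the first character, the text length after the last), and the cursor is drawn on the character just after point. The sketch below is a hypothetical toy model in C with made-up names, not how Emacs actually stores text; it also shows C-o inserting a newline while leaving point where it was.

```c
#include <stddef.h>
#include <string.h>

/* Toy buffer: text plus a point that sits BETWEEN characters. */
struct buffer {
    char text[256];
    size_t point;   /* 0 .. strlen(text) */
};

/* The character the cursor appears on: the one just after point
   ('\0' when point is at the end of the buffer). */
char char_at_cursor(const struct buffer *b)
{
    return b->text[b->point];
}

/* Typing: insert s at point and leave point after the new text.
   Returns 0 if the text would not fit. */
int insert_at_point(struct buffer *b, const char *s)
{
    size_t len = strlen(b->text), n = strlen(s);
    if (len + n + 1 > sizeof b->text)
        return 0;
    /* shift the tail (including the terminating NUL) right by n */
    memmove(b->text + b->point + n, b->text + b->point, len - b->point + 1);
    memcpy(b->text + b->point, s, n);
    b->point += n;
    return 1;
}

/* C-o (open-line): insert a newline but leave point before it. */
int open_line(struct buffer *b)
{
    size_t old = b->point;
    if (!insert_at_point(b, "\n"))
        return 0;
    b->point = old;
    return 1;
}
```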
Deleting Text
Del and, on most PC systems, Backspace erase the character to the left of the cursor. C-d deletes the character under the cursor. C-k deletes from the current cursor location to the end of the line but, annoyingly, doesn’t delete the terminating newline (it does delete the newline if you use the multiplier; that is, C-u 1 C-k deletes the line, newline and all). To delete all the text between the cursor and the beginning of the line, use M-0 C-k (C-k with an argument of zero). To delete a whole region of text, follow these steps:
1. Move the cursor to the first character of the region.
2. Type C-@ (C-SPACE) to “set the mark.”
3. Move the cursor to the first character past the end of the region.
4. Type C-w to delete, or “wipe,” the region.
If you want to make a copy of a region, type M-w instead of C-w. If you lose track of where the region starts, C-x C-x swaps the location of the cursor and the mark. In C mode, M-C-h combines moving and marking: it moves the cursor to the beginning of the current function and sets a mark at the end of the function.
If you delete too much text, use C-x u to “undo” the last batch of changes, which is usually just your last edit. The default undo buffer holds 20,000 bytes, so you can keep repeating the undo to go further back. To undo an undo, type any command that is not an undo (C-f, for example) and then undo again; Emacs treats the earlier undos as changes that can themselves be undone. To cut and paste, use C-w to kill a region of text (or M-w to copy it), move to the location in the buffer where you want to insert the text, and perform a “yank” by typing C-y.
To facilitate yanking and undoing, Emacs maintains a kill ring of your last 30 deletions. To see this in action, first delete some text, move elsewhere, and then type C-y to yank the most recently deleted text. Follow that with M-y, which replaces the text yanked with the next most recently deleted text. To cycle further back in the kill ring, continue typing M-y.
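The kill ring is essentially a fixed-size circular buffer. The following sketch, with hypothetical names and a 64-character limit per entry chosen only to keep it short, models C-y (ring_yank) and repeated M-y (ring_yank_pop), including the wrap from the oldest kill back to the newest. The ring size of 30 comes from the text above; treat it as that version’s default, not a constant of every Emacs.

```c
#include <stddef.h>
#include <string.h>

#define KILL_RING_MAX 30   /* ring size quoted in the text */
#define KILL_TEXT_MAX 64   /* hypothetical per-entry limit for this sketch */

struct kill_ring {
    char entries[KILL_RING_MAX][KILL_TEXT_MAX];
    int count;    /* how many slots are in use */
    int newest;   /* slot holding the most recent kill */
    int yank;     /* slot returned by the last yank or yank-pop */
};

/* Record killed text; once the ring is full, the oldest kill is overwritten. */
void ring_kill(struct kill_ring *r, const char *text)
{
    if (r->count > 0)
        r->newest = (r->newest + 1) % KILL_RING_MAX;
    strncpy(r->entries[r->newest], text, KILL_TEXT_MAX - 1);
    r->entries[r->newest][KILL_TEXT_MAX - 1] = '\0';
    if (r->count < KILL_RING_MAX)
        r->count++;
    r->yank = r->newest;
}

/* C-y: return the most recent kill. */
const char *ring_yank(struct kill_ring *r)
{
    if (r->count == 0)
        return NULL;
    r->yank = r->newest;
    return r->entries[r->yank];
}

/* M-y: step to the next older kill, wrapping from the oldest to the newest. */
const char *ring_yank_pop(struct kill_ring *r)
{
    if (r->count == 0)
        return NULL;
    r->yank = (r->yank - 1 + KILL_RING_MAX) % KILL_RING_MAX;
    /* distance back from the newest slot; past the oldest means wrap */
    if ((r->newest - r->yank + KILL_RING_MAX) % KILL_RING_MAX >= r->count)
        r->yank = r->newest;
    return r->entries[r->yank];
}
```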
Search and Replace
Emacs’ default search routine is a non–case-sensitive incremental search, invoked with C-s. When you type C-s, the minibuffer prompts for a search string, as shown in Figure 8.4.
FIGURE 8.4
Minibuffer prompt for an incremental search. (Callout: prompt.)
In most cases, a non–case-sensitive search will be sufficient, but, when writing C code, which is case sensitive, it may not have the desired result. To make case-sensitive searches the default, add the following line to the Emacs initialization file, ~/.emacs:
(setq case-fold-search nil)
As you type the string, Emacs moves the cursor to the next occurrence of that string. To advance to the next occurrence, type C-s again. Esc cancels the search, leaving the cursor at its current location. C-g cancels the search and returns the cursor to its original location. While in a search, Del erases the last character in the search string and backs the cursor up to its previous location. A failed search beeps at you annoyingly and writes “Failed I-search” in the minibuffer. Incremental searches can wrap to the top of the buffer. After an incremental search fails, another C-s forces the search to wrap to the top of the buffer. If you want to search backwards through a buffer, use C-r.
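The difference between a case-folding and a case-sensitive search, plus the wrap to the top of the buffer, can be sketched in C. search_forward is a hypothetical helper: its fold flag plays the role of case-fold-search, and for brevity it wraps immediately instead of failing first and waiting for a second C-s the way Emacs does.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Does needle occur in hay starting at index i? fold != 0 ignores case. */
static int match_at(const char *hay, size_t i, const char *needle, int fold)
{
    for (size_t k = 0; needle[k] != '\0'; k++) {
        char a = hay[i + k], b = needle[k];
        if (a == '\0')
            return 0;               /* ran off the end of the buffer */
        if (fold ? tolower((unsigned char)a) != tolower((unsigned char)b)
                 : a != b)
            return 0;
    }
    return 1;
}

/* Find the first match at or after from; if there is none, wrap to the
   top of the buffer. Returns the match index, or -1 if there is no match. */
long search_forward(const char *hay, size_t from, const char *needle, int fold)
{
    size_t len = strlen(hay);
    for (size_t i = from; i < len; i++)
        if (match_at(hay, i, needle, fold))
            return (long)i;
    for (size_t i = 0; i < from && i < len; i++)   /* wrapped search */
        if (match_at(hay, i, needle, fold))
            return (long)i;
    return -1;
}
```

With case folding on, searching C code for count also stops at Count and COUNT, which is exactly why the setq line above can be worth adding to ~/.emacs.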
Emacs also has regular expression searches, simple (non-incremental) searches, searches that match entire phrases, and, of course, two search-and-replace functions. The safest search-and-replace operation is M-%, which performs an interactive search and replace. Complete the following steps to use M-%:
1. Type M-%.
2. Type the search string and press Enter.
3. Type the replacement string and press Enter.
4. At the next prompt, use one of the following:
SPACE or y    Make the substitution and move to the next occurrence of the search string
Del or n      Skip to the next occurrence of the search string
!             Perform global replacement without further prompts
.             Make the substitution at the current location, and then exit the search-and-replace operation
Enter or q    Exit the search and replace, and place the cursor at its original location
^             Backtrack to the previous match
C-r           Start a recursive edit
Figure 8.5 shows the results of these steps. Recursive edits allow you to make more extensive edits at the current cursor location. Type M-C-c to exit the recursive editing session and return to your regularly scheduled search and replace. Other search-and-replace variants include M-x query-replace-regexp, which executes an interactive search and replace using regular expressions. For the very stout of heart or those confident of their regular expression knowledge, consider M-x replace-regexp, which performs a global, unconditional (sans prompts) search and replace using regular expressions.
FIGURE 8.5
Search and replace minibuffer prompt. (Panels: minibuffer after Step 1, after Step 2, and after Step 4.)
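The per-match choices in step 4 amount to a small state machine. This hypothetical sketch replays a string of answers against every occurrence of a literal search string. It handles only y or space, n, !, ., and q, ignores ^, C-r, and Del, and treats running out of answers as q.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of M-% decisions (not Emacs code). answers holds the
   key pressed at each match, in order. The result is written to out.
   Returns the number of replacements made, or -1 if out is too small. */
int query_replace(const char *in, const char *from, const char *to,
                  const char *answers, char *out, size_t outsz)
{
    size_t flen = strlen(from), tlen = strlen(to), used = 0;
    int replaced = 0, replace_all = 0, done = 0;

    if (flen == 0 || outsz == 0)
        return -1;
    while (*in != '\0') {
        const char *piece = in;     /* text to copy this step */
        size_t plen = 1, adv = 1;   /* default: copy one char, move one */
        if (!done && strncmp(in, from, flen) == 0) {
            char key = replace_all ? '!' : (*answers ? *answers++ : 'q');
            if (key == 'y' || key == ' ' || key == '!' || key == '.') {
                piece = to; plen = tlen; adv = flen; replaced++;
                if (key == '!') replace_all = 1;   /* stop asking */
                if (key == '.') done = 1;          /* replace this one, exit */
            } else if (key == 'n') {
                plen = flen; adv = flen;           /* keep the match as is */
            } else {
                done = 1;                          /* 'q': exit */
            }
        }
        if (used + plen >= outsz)
            return -1;
        memcpy(out + used, piece, plen);
        used += plen;
        in += adv;
    }
    out[used] = '\0';
    return replaced;
}
```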
Saving and Opening Files
To save a file, use C-x C-s. Use C-x C-w to save the file under a new name. To open a file, type C-x C-f to “visit” the file, type the filename in the minibuffer, and press Enter; Emacs opens it in the current window. If you only want to browse a file without editing it, you can open it in read-only mode using C-x C-r, typing the filename in the minibuffer, and pressing Enter. Even after you open a file in read-only mode, you can still edit the buffer. Like most editors, Emacs opens a new buffer for each file visited and keeps the buffer contents separate from the disk file until explicitly told to write the buffer to disk using C-x C-s or C-x C-w. So, you can make a read-only buffer editable by typing C-x C-q, but if you do not have write permission for the file, you won’t be able to save it under its original name; use C-x C-w to save it under a new one. Emacs makes two kinds of backups of files you edit. The first time you save a file, Emacs creates a backup of the original file in the same directory by appending a ~ to the filename. The other kind of backup is made for crash recovery. Every 300 keystrokes (a default you can change), Emacs creates an auto-save file. If your system crashes and you later revisit the file you were editing, Emacs will prompt you to do a file recovery, as shown in Figure 8.6.
FIGURE 8.6
Recovering a file after a crash. (Callout: prompt to perform file recovery.)
Multiple Windows
Emacs uses the word “frame” to refer to separate Emacs windows because it uses “window” to refer to a screen that has been divided into multiple sections with independently controlled displays. This distinction dates back to Emacs’ origins, which predate the existence of GUIs capable of displaying multiple screens. To display two windows, Emacs divides the screen into two sections, as illustrated in Figure 8.7.
FIGURE 8.7
Emacs windows. (Callouts: Window 1, Window 2.)
To create a new window, type C-x 2, which splits the current window into two windows. The cursor remains in the “active” or current window. The following is a list of commands for moving among and manipulating various windows:
• C-x o: Move to the other window
• C-M-v: Scroll the other window
• C-x 0: Delete the current window
• C-x 1: Delete all windows except the current one
• C-x 2: Split the screen into two windows
• C-x 3: Split the screen horizontally (side by side), rather than vertically
• C-x 4 C-f: “Visit” a file into the other window
Note that deleting a window only hides the buffer it was displaying; the buffer is not closed, or “killed” in Emacs’ parlance. To close a buffer, switch to that buffer, type C-x k, and press Enter. If the buffer has not been saved, Emacs prompts you to save it. Under the X Window System, you can also create new frames, windows that are separate from the current window, using the following commands:
• C-x 5 2: Create a new frame of the same buffer
• C-x 5 f: Create a new frame and open a new file into it
• C-x 5 0: Close the current frame
When using framed windows, be careful not to use C-x C-c to close a frame, because it will close all frames, not just the current one, thus terminating your Emacs session.
Features Supporting Programming
Emacs has modes for a wide variety of programming languages. These modes customize Emacs’ behavior to fit the syntax and indentation requirements of the language. Supported languages include several varieties of Lisp, C, C++, Fortran, Awk, Icon, Java, Objective-C, Pascal, Perl, and Tcl. To switch to one of the language modes, type M-x [language]-mode, replacing [language] with the mode you want. So, to switch to Java mode, type M-x java-mode.
Indenting Conveniences
Emacs automatically indents your code while you type it. In fact, it can enforce quite a few indentation styles. The default style is gnu, a style conforming to GNU’s coding standards. Other supported indentation styles include k&r, bsd, stroustrup, linux, python, java, whitesmith, ellemtel, and cc. To use one of the supported indentation styles, type the command M-x c-set-style followed by Enter, enter the indentation style you want to use, and press Enter again. Note that this changes the style of the current buffer only; other buffers keep their existing style. When you are using one of Emacs’ programming modes, pressing Tab anywhere in a line reindents that line correctly for the style in use, and each new line you start is indented to match.
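As a rough illustration, here is the same function laid out in two of the styles named above: gnu (two-column steps, with a block’s braces on their own indented lines) and linux (eight-column steps, with a block’s opening brace at the end of the line). The layouts follow the conventions these style names usually stand for; c-set-style mainly adjusts indentation, so where you put braces and line breaks is largely up to you. The function names are made up for the example.

```c
/* gnu style: 2-column steps; a block's braces sit on their own lines,
   indented 2 columns, and the block body is indented 2 more. */
int
sum_gnu (const int *v, int n)
{
  int total = 0;
  int i;

  for (i = 0; i < n; i++)
    {
      total += v[i];
    }
  return total;
}

/* linux style: 8-column steps; a block's opening brace ends the line
   that introduces the block. */
int sum_linux(const int *v, int n)
{
        int total = 0;
        int i;

        for (i = 0; i < n; i++) {
                total += v[i];
        }
        return total;
}
```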
Syntax Highlighting
Emacs’ font-lock mode turns on a basic form of syntax highlighting. It uses different colors to mark syntax elements. To turn on font-lock mode, type M-x font-lock-mode and press Enter. Figure 8.8 illustrates font-lock mode in the C major mode.
FIGURE 8.8
The effect of font-lock mode on C code.
Using Comments
M-; inserts comment delimiters (/* */) on the current line and helpfully positions the cursor between them. The comment will be placed (by default) at column 32. If the current line is longer than 32 characters, Emacs places the comment just past the end of the line, as illustrated in Figure 8.9.
FIGURE 8.9
Inserting comments.
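The placement rule just described can be written out in C. add_comment is a hypothetical helper, not Emacs code: it assumes comment column 32 (the default given above), counts every character, tabs included, as one column, and for a line with no comment either pads to column 32 or, when the code already reaches that column, leaves a single space before the delimiters.

```c
#include <stdio.h>
#include <string.h>

#define COMMENT_COLUMN 32   /* default comment column from the text */

/* Hypothetical sketch of where M-; puts an empty comment on a line of code.
   Writes the resulting line to out; returns 1 on success, 0 if out is too
   small. Columns are counted from 0, one column per character. */
int add_comment(const char *code, char *out, size_t outsz)
{
    size_t len = strlen(code);
    /* short line: pad to the comment column; long line: one space past it */
    size_t col = len < COMMENT_COLUMN ? COMMENT_COLUMN : len + 1;
    int n = snprintf(out, outsz, "%-*s/*  */", (int)col, code);
    return n >= 0 && (size_t)n < outsz;
}
```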
If you are creating a multi-line comment, an Emacs minor mode, auto-fill, will indent and line-wrap comment lines intelligently. To turn on this minor mode, use the M-x auto-fill-mode command. In the middle of an existing comment, M-; aligns the comment appropriately. If you have a whole region that you want to convert to comments, select the region and type M-x comment-region. Although not strictly related to comments, Emacs helps you make or maintain a change log for the file you’re editing. To create or add an entry to a change log in the current directory, type C-x 4 a. The default filename is ChangeLog.
Compilation Using Emacs
The command M-x compile compiles code using, by default, make -k. Ordinarily, this would require the presence of a Makefile in the current directory. If you are using GNU make, however, you can take advantage of a shortcut. For example, if you are working on a file named rdline.c and want to compile it, type M-x compile. Then, when the minibuffer prompts for the compile command, add the target rdline.o to the default make -k, as illustrated in Figure 8.10. GNU make has an internal suffix rule that says, in effect, for a given file FILE.o, create it from FILE.c with the command “cc -c FILE.c -o FILE.o”.
FIGURE 8.10
Compiling a program within Emacs.
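If you want a file to try this on, the following rdline.c is a hypothetical stand-in, because the chapter does not show the real one. It compiles to an object file with no Makefile, which is exactly what the make -k rdline.o shortcut relies on. Since it has no main, only the .o target makes sense.

```c
/* rdline.c: hypothetical sample for trying "make -k rdline.o" from
   M-x compile (GNU make's built-in rule compiles it with cc -c). */
#include <stdio.h>
#include <string.h>

/* Read one line from fp into buf (at most size-1 characters) and strip
   the trailing newline. Returns 1 on success, 0 at end of file or error. */
int rdline(FILE *fp, char *buf, size_t size)
{
    if (size == 0 || fgets(buf, (int)size, fp) == NULL)
        return 0;
    buf[strcspn(buf, "\n")] = '\0';
    return 1;
}
```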
The first time you issue the compile command, Emacs sets the default compile command for the rest of the session to the make command you enter. If you have not yet saved the buffer, Emacs asks if you want to. When you compile from within Emacs, it creates a scratch buffer called the compilation buffer, which lists the make commands executed, any errors that occur, and the compilation results. If any error occurs, Emacs includes an error-browsing feature that takes you to the location of each error. In Figure 8.11, an error occurred while compiling rdline.c.
FIGURE 8.11
The compilation buffer lists compilation messages, including errors. (Callout: compilation buffer.)
To go to the line where the error occurred, type C-x ` (back quote); Emacs positions the cursor at the beginning of the line containing the error, as illustrated in Figure 8.12.
FIGURE 8.12
C-x ` positions the cursor on the line containing the error.
If there are other errors, C-x ` will take you to each error in the source file. Unfortunately, the progression is only one way; you cannot backtrack the error list. This shortcoming aside, Emacs’ error browsing feature is very handy. To close the compilation window, use the command C-x 1, which deletes all windows except the current one. Tag support is another handy Emacs programming feature. Tags are a type of database that enables easy source code navigation by cross-referencing function names and, optionally, typedefs to the files in which they appear and are defined. Tags are especially useful for locating the definitions of function names or typedefs. The etags program creates tag files that Emacs understands. To create an Emacs tag file, execute the following command:
$ etags -t <list of files>
This command creates the tags database, TAGS by default, in the current directory. <list of files> is the list of files for which you want tags created. The -t option includes typedefs in the tag file. So, to create a tag file of all the C source and header files in the current directory, the command is:
$ etags -t *.[ch]
Once you’ve created the tag file, use the following commands to take advantage of it:
• M-. tagname: Finds the file containing the definition of tagname and opens it in a new buffer, replacing the previous buffer
• C-x 4 . tagname: Functions like M-., but visits the file into another window
• C-x 5 . tagname: Functions like C-x 4 ., but visits the file into another frame
Emacs’ tags facility makes it very easy to view a function’s definition while editing another file. You can also perform search-and-replace operations using tag files. To perform an interactive search and replace:
1. Type M-x tags-query-replace and press Enter.
2. Type the search string and press Enter.
3. Type the replacement string and press Enter.
4. Press Enter to accept the default tags table, TAGS, or type another name and press Enter.
5. Use the commands described for the query-replace operation.
Another help feature allows you to run a region of text through the C preprocessor, so you can see how it expands. The command to accomplish this feat is C-c C-e. Figure 8.13 illustrates how it works.
FIGURE 8.13
Running a text region through the C preprocessor.
In the top window, we define a preprocessor macro named square(x). After marking the region, type C-c C-e. The bottom window, named *Macroexpansion*, shows how the preprocessor expanded the macro. Pretty neat, huh?
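Checking an expansion like this pays off because macros substitute text, not values. The exact definition in the figure is not reproduced in the text, so the two macros below are illustrative: SQUARE_NAIVE shows what goes wrong without parentheses, and SQUARE is the usual fix. Running C-c C-e over a region that uses SQUARE_NAIVE(a + b) would reveal the expansion a + b * a + b.

```c
/* Illustrative square macros (names are hypothetical). */
#define SQUARE_NAIVE(x) x * x          /* textual: a + b * a + b */
#define SQUARE(x) ((x) * (x))          /* parenthesized: ((a + b) * (a + b)) */

int naive_square_of_sum(int a, int b)
{
    return SQUARE_NAIVE(a + b);        /* precedence makes this a + (b * a) + b */
}

int square_of_sum(int a, int b)
{
    return SQUARE(a + b);
}
```

For a = 1 and b = 2, the naive version computes 1 + 2 * 1 + 2, which is 5, while the parenthesized one computes 9.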
Customization in Brief
In this section, we list a few commands you can use to customize Emacs’ behavior. We can only scratch the surface, however, so we will forgo long explanations of why the customizations we offer work and ask, instead, that you simply trust us that they do work.
Using the ~/.emacs File
Table 8.1 lists settings that you will find useful for customizing Emacs. They control various elements of Emacs’ default behavior.
Table 8.1 Emacs Commands and Variables
Name                   Description                              Type
inhibit-default-init   Disables any site-wide customizations    Variable
case-fold-search       Sets case sensitivity of searches        Variable
user-mail-address      Contains user’s mail address             Variable
The file ~/.emacs ($HOME/.emacs) contains Lisp code that is loaded and executed each time Emacs starts. To set one of these variables, use the syntax
(setq variable-name value)
For example, (setq inhibit-default-init t) sets the variable inhibit-default-init to t (for true). value may be a Boolean (t = true, nil = false), a positive or negative integer, or a double-quote-delimited string. You can also use the set-variable function, which takes a quoted variable name:
(set-variable 'varname value)
This initializes varname to value. So, (set-variable 'user-mail-address "some_guy@call_me_now.com") sets the variable user-mail-address to some_guy@call_me_now.com.
You can also set variables and execute commands on-the-fly within Emacs. First, type C-x b to switch to another buffer. Press Tab to view a list of the available buffers in the echo area. Figure 8.14 shows what the buffer list might look like.
FIGURE 8.14
Sample buffer list. (Callouts: echo area, type buffer name here, available buffers.)
Now, type *scratch* and press Enter. Finally, to evaluate a Lisp expression, type, for example, (setq case-fold-search t), and press C-j. Lisp evaluates the expression between parentheses and displays the result on the line below it, as illustrated in Figure 8.15.
FIGURE 8.15
Screen after execution of Lisp command. (Callouts: command entered, result.)
Follow the same procedure to set a variable value on the fly. The syntax takes the general form
(setq varname value)
For example, to set user-mail-address on the fly, type (setq user-mail-address "someone@somewhere.com") in the scratch buffer, followed by C-j.
Creating and Using Keyboard Macros
This section will briefly describe how to create and execute keyboard macros within Emacs. Keyboard macros are user-defined commands that represent a whole sequence of keystrokes. They are a quick, easy way to speed up your work. For example, the section on deleting text pointed out that the command C-k at the beginning of a line would delete all the text on the line, but not the newline. In order to delete the newline, too, you have to either type C-k twice or use the multiplier with an argument of 1, that is, C-u 1 C-k. In order to make this more convenient, you can define a keyboard macro to do this for you. To start, type C-x (, followed by the commands you want in the macro. To end the definition, type C-x ). Now, to execute the macro you’ve just defined, type C-x e, which stands for the command call-last-kbd-macro. Actually, the macro was executed the first time while you defined it, allowing you to see what it was doing as you were defining it. If you would like to see the actual commands, type C-x C-k, which will start a special mode for editing macros, followed by C-x e to execute the macro. This will format the command in a special buffer. The command C-h m will show you instructions for editing the macro. C-c C-c ends the macro editing session. The material in this section should give you a good start to creating a highly personal and convenient Emacs customization. For all of the gory details, see Emacs’ extensive info (help) file. The next section introduces you to enough Emacs Lisp to enable you to further customize your Emacs development environment.
Automating Emacs with Emacs Lisp
Emacs can be customized by writing functions in Elisp (or Emacs Lisp). You have already seen how to customize Emacs by using the file ~/.emacs. It is assumed that you have some knowledge of Lisp programming. The full reference to Emacs Lisp, the GNU Emacs Lisp Reference Manual (written by Bil Lewis, Dan LaLiberte, and Richard Stallman), can be found on the Web at http://www.gnu.org/manual/elisp-manual-202.5/elisp.html. In this section, you will see how to write a simple Emacs Lisp function that modifies text in the current text buffer. Emacs Lisp is a complete programming environment, capable of file I/O, building user interfaces (using Emacs), network programming for retrieving email and Usenet news, and so on. However, most Emacs Lisp programming involves manipulating the text in Emacs edit buffers.
Creating Programs in Emacs CHAPTER 8
Listing 8.1 shows a very simple example that replaces the digits "0", "1", and so on with the strings "ZERO", "ONE", and so on.
LISTING 8.1 sample.el
(defun sample ()
  (let* ((txt (buffer-string))
         (len (length txt))
         (x nil))
    (goto-char (point-min))
    ;; each pass handles exactly one character of the original text
    (dotimes (n len)
      ;; see if the next character is a number 0, 1, .. 9
      (setq x (char-after))
      (when x
        (setq x (char-to-string x))
        (cond ((equal x "0") (replace-char "ZERO"))
              ((equal x "1") (replace-char "ONE"))
              ((equal x "2") (replace-char "TWO"))
              ((equal x "3") (replace-char "THREE"))
              ((equal x "4") (replace-char "FOUR"))
              ((equal x "5") (replace-char "FIVE"))
              ((equal x "6") (replace-char "SIX"))
              ((equal x "7") (replace-char "SEVEN"))
              ((equal x "8") (replace-char "EIGHT"))
              ((equal x "9") (replace-char "NINE"))
              ;; not a digit: move the text pointer forward
              (t (forward-char)))))))

;; delete the character after point and insert s in its place;
;; insert leaves point just after s, so no forward-char is needed
(defun replace-char (s)
  (delete-char 1)
  (insert s))
The example in Listing 8.1 defines two functions: sample and replace-char. replace-char is a helper function that only serves to make the sample function shorter. This example uses several text-handling utility functions that are built in to Emacs Lisp:
• buffer-string: Returns as a string the contents of the current Emacs text buffer
• length: Returns the number of characters in a string
• char-after: Returns the character after the Emacs edit buffer insert point
• char-to-string: Converts a character to a string
• forward-char: Moves the Emacs edit buffer insert point forward by one character position
• delete-char: Deletes the character immediately following the Emacs edit buffer insert point
• insert: Inserts a string at the current Emacs edit buffer insert point
You can try running this example either by copying the contents of sample.el into your ~/.emacs file or by using M-x load-file to load sample.el. To run the program, type M-:, which gives you an Eval: prompt, then type (sample) and press Enter.
Much of the functionality of Emacs comes from Emacs Lisp files that are auto-loaded into the Emacs environment. When you install Emacs in your Linux distribution, one of the options is to install the Emacs Lisp source files (the compiled Emacs Lisp files are installed by default). Installing the Emacs Lisp source files provides many sample programs for doing network programming, adding menus to Emacs, and so on.
Summary
Emacs is a rich, deep programming environment. This chapter introduced you to the basics of editing and writing programs with GNU Emacs. It covered starting and stopping Emacs, cursor movement, basic editing functions, and search-and-replace operations. In addition, you learned how to use Emacs features that support programming, such as using tags tables, special formatting, syntax highlighting, and running sections of code through the C preprocessor. The chapter also showed you how to perform basic Emacs customization using the ~/.emacs initialization file, keyboard macros, and Emacs Lisp.
System Programming
PART II
IN THIS PART
• I/O Routines
• File Manipulation
• Process Control
• Accessing System Information
• Handling Errors
• Memory Management
I/O Routines
by Mark Whitis
CHAPTER 9
IN THIS CHAPTER
• File Descriptors
• Calls That Use File Descriptors
• Types of Files
This chapter covers file descriptor–based I/O. This type of file I/O is UNIX-specific, although C development environments on many other platforms may include some support. File pointer (stdio)-based I/O is more portable and will be covered in the next chapter, "File Manipulation." In some cases, such as tape I/O, you will need to use file descriptor–based I/O. The BSD socket programming interface for TCP/IP (see Chapter 19, "TCP/IP and Socket Programming") also uses file descriptor–based I/O once a TCP session has been established. One of the nice things about Linux, and other UNIX-compatible operating systems, is that the file interface also works for many other types of devices. Tape drives, the console, serial ports, pseudoterminals, printer ports, sound cards, and mice are handled as character special devices that look, more or less, like ordinary files to application programs. TCP/IP and UNIX domain sockets, once the connection has been established, are handled using file descriptors as if they were standard files. Pipes also look similar to standard files.
File Descriptors
A file descriptor is simply an integer that is used as an index into a table of open files associated with each process. The values 0, 1, and 2 are special and refer to the stdin, stdout, and stderr streams. These three streams normally connect to the user's terminal but can be redirected. There are many security implications to using file descriptor I/O and file pointer I/O (which is built on top of file descriptor I/O). These are covered in Chapter 35, "Secure Programming." The workarounds rely heavily on careful use of file descriptor I/O, whether the rest of the program uses file descriptors or file pointers.
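The following minimal sketch makes the 0/1/2 convention concrete. It assumes a POSIX system such as Linux. <unistd.h> names the three descriptors STDIN_FILENO, STDOUT_FILENO, and STDERR_FILENO, and fileno() reports which descriptor a stdio stream is built on. The helpers say() and std_streams_on_std_fds() are hypothetical names chosen for this example.

```c
#define _POSIX_C_SOURCE 200809L  /* expose POSIX declarations such as fileno() */
#include <stdio.h>   /* stdin, stdout, stderr, fileno() */
#include <string.h>  /* strlen() */
#include <unistd.h>  /* STDIN_FILENO, STDOUT_FILENO, STDERR_FILENO, write() */

/* Hypothetical helper: write a NUL-terminated message directly to a
   descriptor, bypassing stdio buffering. Returns write()'s result. */
ssize_t say(int fd, const char *msg)
{
    return write(fd, msg, strlen(msg));
}

/* Hypothetical helper: returns 1 if the stdio streams are layered on the
   conventional descriptors 0, 1 and 2, otherwise 0. */
int std_streams_on_std_fds(void)
{
    return fileno(stdin) == STDIN_FILENO &&
           fileno(stdout) == STDOUT_FILENO &&
           fileno(stderr) == STDERR_FILENO;
}
```

For example, say(STDERR_FILENO, "oops\n") writes straight to the standard error descriptor, with no stdio buffer in between.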
Calls That Use File Descriptors
A number of system calls use file descriptors. This section includes brief descriptions of each of those calls, including the function prototypes from the man pages and/or header files. Most of these calls return a value of -1 in the event of error and set the variable errno to the error code. Error codes are documented in the man pages for the individual system calls and in the man page for errno. The perror() function can be used to print an error message based on the error code. Virtually every call in this chapter is mentioned in Chapter 35. Some calls are vulnerable, others are used to fix vulnerabilities, and many wear both hats. The calls that take file descriptors are much safer than those that take filenames.
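The -1/errno convention can be sketched with a small wrapper. This assumes Linux or another POSIX system. open_checked() is a hypothetical helper, not a library function. It saves errno around perror() because the C library is allowed to change errno even in calls that succeed.

```c
#define _POSIX_C_SOURCE 200809L  /* expose POSIX declarations */
#include <errno.h>   /* errno */
#include <fcntl.h>   /* open(), O_* flags */
#include <stdio.h>   /* perror() */

/* Hypothetical wrapper: open() that reports failures with perror().
   On error it returns -1 and leaves errno as open() set it. */
int open_checked(const char *path, int flags)
{
    int fd = open(path, flags);
    if (fd == -1) {
        int saved = errno;  /* save errno: perror() may modify it */
        perror(path);       /* prints "path: " followed by the error text */
        errno = saved;
    }
    return fd;
}
```

Given a missing file, it typically prints a line such as "path: No such file or directory" and returns -1 with errno still equal to ENOENT. The exact message text depends on the C library and locale.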
Each section contains a code fragment that shows the necessary include files and the prototype for the function(s) described in that section, copied from the man pages for that function.
The open() Call
The open() call is used to open a file. The prototype for this function and descriptions for its variables and flags follow.
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
int open(const char *pathname, int flags);
int open(const char *pathname, int flags, mode_t mode);
The pathname argument is simply a string with the full or relative pathname of the file to be opened. The second parameter, flags, is one of O_RDONLY, O_WRONLY, or O_RDWR, optionally OR-ed with additional flags. Table 9.1 lists the flag values. The third parameter specifies the UNIX file mode (permission bits) to be used when creating a file and should be present if a file may be created.
TABLE 9.1 FLAGS FOR THE open() CALL
Flag         Description
O_RDONLY     Open file for read-only access.
O_WRONLY     Open file for write-only access.
O_RDWR       Open file for read and write access.
O_CREAT      Create the file if it does not exist.
O_EXCL       Fail if the file already exists (used with O_CREAT).
O_NOCTTY     Don't become controlling tty if opening tty and the process had no controlling tty.
O_TRUNC      Truncate the file to length 0 if it exists.
O_APPEND     Append mode: the file pointer is positioned at end of file before each write.
O_NONBLOCK   If an operation cannot complete without delay, return before completing the operation. (See Chapter 22, "Non-blocking Socket I/O.")
O_NDELAY     Same as O_NONBLOCK.
O_SYNC       Operations will not return until the data has been physically written to the disk or other device.
open() returns a file descriptor unless an error occurred. In the event of an error, it will return -1 and set the variable errno.
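The following short sketch shows how the flags in Table 9.1 combine. It assumes a POSIX system. create_exclusive() is a hypothetical helper. It ORs O_CREAT with O_EXCL so that creation fails with EEXIST instead of silently reusing an existing file, and it passes a mode as the third argument because a file may be created.

```c
#define _POSIX_C_SOURCE 200809L  /* expose POSIX declarations */
#include <sys/types.h>  /* mode_t */
#include <sys/stat.h>   /* permission bits */
#include <fcntl.h>      /* open(), O_* flags */

/* Hypothetical helper: create path for writing, failing with errno == EEXIST
   if it already exists. With O_CREAT | O_EXCL the existence check and the
   creation happen as one atomic step, so two processes racing to create the
   same file cannot both succeed. The mode is still filtered by the umask. */
int create_exclusive(const char *path, mode_t mode)
{
    return open(path, O_WRONLY | O_CREAT | O_EXCL, mode);
}
```

A second call on the same path returns -1 with errno set to EEXIST. Lock files are a common use of this behavior.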
NOTE
The creat() call is the same as open() with O_CREAT|O_WRONLY|O_TRUNC.
The close() Call
You should close a file descriptor when you are done with it. The single argument is the file descriptor number returned by open(). The prototype for close() is as follows.
#include <unistd.h>
int close(int fd);
Any locks held by the process on the file are released, even if they were placed using a different file descriptor. If the file has already been unlinked (its link count is zero) and this was the last descriptor referring to it, the file is deleted. If this is the last (or only) file descriptor associated with an open file, the entry in the open file table is freed. If the file is not an ordinary file, other side effects are possible. The last close on one end of a pipe may affect the other end, the handshake lines on a serial port might be affected, and a tape might rewind.
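close() can fail too. It returns EBADF for a descriptor that is not open, and on some filesystems an error from an earlier write() is reported only at close time. Its return value is therefore worth checking. The following minimal sketch assumes POSIX. close_checked() is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L  /* expose POSIX declarations */
#include <errno.h>   /* errno */
#include <stdio.h>   /* perror() */
#include <unistd.h>  /* close() */

/* Hypothetical helper: close a descriptor and report any error.
   Returns 0 on success, -1 on failure (errno as set by close()). */
int close_checked(int fd)
{
    if (close(fd) == -1) {
        int saved = errno;  /* perror() may modify errno */
        perror("close");
        errno = saved;
        return -1;
    }
    return 0;
}
```

Closing the same descriptor twice is a classic bug. The second call fails with EBADF, or worse, closes an unrelated file that has since reused the number.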
The read() Call
The read() system call is used to read data from the file corresponding to a file descriptor.
#include <unistd.h>
ssize_t read(int fd, void *buf, size_t count);
The first argument is the file descriptor that was returned from a previous open() call. The second argument is a pointer to a buffer to copy the data into (which must be large enough to hold count bytes), and the third argument gives the number of bytes to read. read() returns the number of bytes read, which may be less than requested and is 0 at end of file, or -1 if an error occurs (check errno).
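A single read() may transfer fewer bytes than requested, for example from a pipe, terminal, or socket. It can also fail with EINTR if a signal arrives before any data. The following sketch assumes a POSIX system and loops until the buffer is full or end of file is reached. read_full() is a hypothetical helper, not a standard function.

```c
#define _POSIX_C_SOURCE 200809L  /* expose POSIX declarations */
#include <errno.h>   /* errno, EINTR */
#include <unistd.h>  /* read(), ssize_t, size_t */

/* Hypothetical helper: call read() until count bytes have arrived, end of
   file is reached, or a real error occurs. Returns the number of bytes
   stored in buf (less than count only at end of file), or -1 on error. */
ssize_t read_full(int fd, void *buf, size_t count)
{
    char *p = buf;
    size_t done = 0;

    while (done < count) {
        ssize_t n = read(fd, p + done, count - done);
        if (n == -1) {
            if (errno == EINTR)
                continue;       /* interrupted before any data: retry */
            return -1;
        }
        if (n == 0)
            break;              /* end of file */
        done += (size_t)n;
    }
    return (ssize_t)done;
}
```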
The write() Call
The write() system call is used to write data to the file corresponding to a file descriptor.
#include <unistd.h>
ssize_t write(int fd, const void *buf, size_t count);
The first argument is the file descriptor that was returned from a previous open() call. The second argument is a pointer to a buffer to copy the data from, and the third argument gives the number of bytes to write. write() returns the number of bytes written, which may be less than requested, or -1 if an error occurs (check errno).
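The same caveat applies to write(). It may accept only part of the buffer, for example with pipes, sockets, and non-blocking descriptors. The following minimal sketch shows the usual retry loop and assumes POSIX. write_all() is a hypothetical helper name.

```c
#define _POSIX_C_SOURCE 200809L  /* expose POSIX declarations */
#include <errno.h>   /* errno, EINTR */
#include <unistd.h>  /* write(), ssize_t, size_t */

/* Hypothetical helper: keep calling write() until all count bytes have been
   accepted. Returns count on success or -1 on error (errno set). */
ssize_t write_all(int fd, const void *buf, size_t count)
{
    const char *p = buf;
    size_t done = 0;

    while (done < count) {
        ssize_t n = write(fd, p + done, count - done);
        if (n == -1) {
            if (errno == EINTR)
                continue;       /* interrupted: try again */
            return -1;
        }
        done += (size_t)n;      /* partial write: advance and continue */
    }
    return (ssize_t)done;
}
```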
The ioctl() Call
The ioctl() system call is a catchall for setting or retrieving various parameters associated with a file, or for performing other operations on it. The ioctls available, and the arguments to ioctl(), vary depending on the underlying device.
#include <sys/ioctl.h>
int ioctl(int d, int request, ...);
The argument d must be an open file descriptor.
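The FIONREAD request is one illustration of a device-dependent ioctl. It asks how many bytes are waiting to be read. This sketch assumes Linux, because FIONREAD is widely supported but not part of POSIX. bytes_available() is a hypothetical helper.

```c
#define _POSIX_C_SOURCE 200809L  /* expose POSIX declarations */
#include <sys/ioctl.h>  /* ioctl(), FIONREAD (Linux) */

/* Hypothetical helper: ask the driver how many bytes can be read from fd
   right now without blocking. FIONREAD works for pipes, sockets and
   terminals on Linux, but not for every kind of descriptor.
   Returns the byte count, or -1 on error (errno set by ioctl()). */
int bytes_available(int fd)
{
    int n;
    if (ioctl(fd, FIONREAD, &n) == -1)
        return -1;
    return n;
}
```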
The fcntl() Call
The fcntl() call is similar to ioctl() but it sets or retrieves a different set of parameters.
#include <unistd.h>
#include <fcntl.h>
int fcntl(int fd, int cmd);
int fcntl(int fd, int cmd, long arg);
Unlike ioctl(), these parameters are generally not controlled by the low-level device driver. The first argument is the file descriptor, the second is the command, and the third is usually an argument specific to the particular command. Table 9.2 lists the various command values that can be used for the second argument of the fcntl() call.
TABLE 9.2 COMMANDS FOR fcntl()
Command    Description
F_DUPFD    Duplicates file descriptors. Use dup2() instead.
F_GETFD    Gets close-on-exec flag. The file will remain open across exec() family calls if the low-order bit is 0.
F_SETFD    Sets close-on-exec flag.
F_GETFL    Gets the flags set by open().
F_SETFL    Changes the flags set by open() (only some, such as O_APPEND and O_NONBLOCK, can be changed).
F_GETLK    Gets advisory file lock information.
F_SETLK    Sets advisory lock, no wait.
F_SETLKW   Sets advisory lock, wait if necessary.
F_GETOWN   Retrieves the process ID or process group number that will receive the SIGIO and SIGURG signals.
F_SETOWN   Sets the process ID or process group number that will receive those signals.
Since there are other ways to do most of these operations, you may have little need to use fcntl().
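fcntl() is still the usual tool for changing flags on a descriptor that is already open. The sketch below assumes a POSIX system. It sets O_NONBLOCK with F_GETFL/F_SETFL and FD_CLOEXEC with F_GETFD/F_SETFD. set_nonblocking() and set_cloexec() are hypothetical helper names. Reading the old flags first matters because F_SETFL and F_SETFD replace the whole flag word.

```c
#define _POSIX_C_SOURCE 200809L  /* expose POSIX declarations */
#include <fcntl.h>  /* fcntl(), F_*, O_NONBLOCK, FD_CLOEXEC */

/* Hypothetical helper: turn on O_NONBLOCK for an open descriptor while
   preserving the other file status flags. Returns 0 or -1 (errno set). */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Hypothetical helper: mark fd close-on-exec so that programs started with
   an exec() family call do not inherit it. Returns 0 or -1 (errno set). */
int set_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC);
}
```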
The fsync() Call
The fsync() system call flushes all of the data written to file descriptor fd to disk or other underlying device.
#include <unistd.h>
int fsync(int fd);
#ifdef _POSIX_SYNCHRONIZED_IO
int fdatasync(int fd);
#endif
The Linux filesystem may keep the data in memory for several seconds before writing it to disk in order to more efficiently handle disk I/O. A zero is returned if successful; otherwise -1 will be returned and errno will be set. The fdatasync() call is similar to fsync() but does not write the metadata (inode information, particularly modification time).
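A typical use of fsync() is making sure a file's new contents have reached the device before the program reports success. This is a minimal sketch under POSIX assumptions. save_durably() is a hypothetical helper, and the 0644 mode is an arbitrary choice for illustration. For a newly created file, fully durable code would also fsync() the containing directory. That step is omitted here.

```c
#define _POSIX_C_SOURCE 200809L  /* expose POSIX declarations */
#include <errno.h>   /* errno, EINTR */
#include <fcntl.h>   /* open(), O_* flags */
#include <unistd.h>  /* write(), fsync(), close() */

/* Hypothetical helper: replace the contents of path with len bytes from
   data and report success only after fsync() has flushed them.
   Returns 0 on success, -1 on error (errno describes the first failure). */
int save_durably(const char *path, const void *data, size_t len)
{
    const char *p = data;
    int fd, saved;

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1)
        return -1;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n == -1 && errno == EINTR)
            continue;                /* interrupted: try again */
        if (n == -1)
            break;                   /* real error: go to cleanup */
        p += n;
        len -= (size_t)n;
    }

    if (len == 0 && fsync(fd) == 0)  /* all written: force it to the device */
        return close(fd);

    saved = errno;                   /* keep the first error for the caller */
    close(fd);
    errno = saved;
    return -1;
}
```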
The ftruncate() Call
The ftruncate() system call truncates the file referenced by file descriptor fd to the length specified by length.
#include <unistd.h>
int ftruncate(int fd, off_t length);
Return values are zero for success and -1 for an error (check errno).
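The following small sketch assumes POSIX. It resizes an open file with ftruncate() and then checks the result with fstat(). resize_file() is a hypothetical name. Shrinking discards the tail of the file, and growing it adds bytes that read back as zero.

```c
#define _POSIX_C_SOURCE 200809L  /* expose POSIX declarations */
#include <sys/stat.h>   /* fstat(), struct stat */
#include <sys/types.h>  /* off_t */
#include <unistd.h>     /* ftruncate() */

/* Hypothetical helper: set the size of the open file fd to length bytes,
   then confirm the new size with fstat(). Returns the size, or -1 on error. */
off_t resize_file(int fd, off_t length)
{
    struct stat st;
    if (ftruncate(fd, length) == -1 || fstat(fd, &st) == -1)
        return -1;
    return st.st_size;
}
```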
The lseek() Call
The lseek() function sets the current position of reads and writes in the file referenced by file descriptor fildes to position offset.
#include <sys/types.h>
#include <unistd.h>
off_t lseek(int fildes, off_t offset, int whence);
Depending on the value of whence, the offset is relative to the beginning (SEEK_SET), current position (SEEK_CUR), or end of file (SEEK_END). The return value is the resulting offset (relative to the beginning of the file) or a value of (off_t) -1 in the case of error (errno will be set).
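Seeking to the end with an offset of 0 returns the file's size, a common lseek() idiom. The sketch below assumes POSIX, and file_size() is a hypothetical helper. It saves and restores the current position so that later reads and writes are not disturbed. Descriptors that cannot seek, such as pipes, make lseek() fail with ESPIPE.

```c
#define _POSIX_C_SOURCE 200809L  /* expose POSIX declarations */
#include <sys/types.h>  /* off_t */
#include <unistd.h>     /* lseek(), SEEK_SET, SEEK_CUR, SEEK_END */

/* Hypothetical helper: report the size of the file open on fd by seeking
   to the end, then restore the original position.
   Returns the size, or (off_t)-1 on error (errno set by lseek()). */
off_t file_size(int fd)
{
    off_t here, end;

    here = lseek(fd, 0, SEEK_CUR);           /* remember where we were */
    if (here == (off_t)-1)
        return (off_t)-1;
    end = lseek(fd, 0, SEEK_END);            /* offset of the end = size */
    if (end == (off_t)-1)
        return (off_t)-1;
    if (lseek(fd, here, SEEK_SET) == (off_t)-1)
        return (off_t)-1;
    return end;
}
```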
The dup() and dup2() Calls
The system calls dup() and dup2() duplicate file descriptors. dup() returns a new descriptor (the lowest numbered unused descriptor). dup2() lets you specify the value of the descriptor that will be returned, closing newfd first, if necessary; this is commonly used to reopen or redirect a file descriptor.
#include <unistd.h>
int dup(int oldfd);
int dup2(int oldfd, int newfd);
Listing 9.1 illustrates using dup2() to redirect standard output (file descriptor 1) to a file. The function print_line() formats a message using snprintf(), a safer version of sprintf(). We don’t use printf() because that uses file pointer I/O, although the next chapter and Chapter 35 will show how to open a file pointer stream over a file descriptor stream. The results of running the program are shown in Listing 9.2.
dup() and dup2() return the new descriptor or return -1 and set errno. The new and old descriptors share file offsets (positions), flags, and locks but not the close-on-exec flag.
LISTING 9.1 dup.c: REDIRECTING STANDARD OUTPUT WITH dup2()
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <assert.h>
#include <stdio.h>    /* snprintf() */
#include <string.h>   /* strlen() */

void print_line(int n)
{
  char buf[32];

  snprintf(buf, sizeof(buf), "Line #%d\n", n);
  write(1, buf, strlen(buf));
}

int main(void)
{
  int fd;

  print_line(1);
  print_line(2);
  print_line(3);

  /* redirect stdout to file junk.out */
  fd = open("junk.out", O_WRONLY|O_CREAT|O_TRUNC, 0666);
  assert(fd >= 0);
  dup2(fd, 1);

  print_line(4);
  print_line(5);
  print_line(6);

  close(fd);
  close(1);
  return 0;
}
LISTING 9.2 SAMPLE RUN OF dup.c
$ ./dup
Line #1
Line #2
Line #3
$ cat junk.out
Line #4
Line #5
Line #6
$
The select() Call
The select() function call allows a process to wait on multiple file descriptors simultaneously with an optional timeout. The select() call will return as soon as it is possible to perform operations on any of the indicated file descriptors. This allows a process to perform some basic multitasking without forking another process or starting another thread. The prototypes for this function and its macros are listed below.
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>
int select(int n, fd_set *readfds, fd_set *writefds,
           fd_set *exceptfds, struct timeval *timeout);
FD_CLR(int fd, fd_set *set);
FD_ISSET(int fd, fd_set *set);
FD_SET(int fd, fd_set *set);
FD_ZERO(fd_set *set);
select() is one of the more complicated system calls available. You probably won't need to use it very often, but when you do, you really need it. You could issue a bunch of non-blocking reads or writes on the various file descriptors, but that kind of programming is one of the reasons why DOS and Windows applications multitask so poorly: the task keeps running and chewing up CPU cycles even though it has no useful work to do.
The first parameter, n, is the highest-numbered file descriptor in any of the sets plus one (so the kernel doesn't have to waste time checking a bunch of unused bits). The second, third, and fourth parameters are pointers to file descriptor sets (one bit per possible file descriptor) that indicate which file descriptors you would like to be able to read, write, or receive exception notifications on, respectively. The last parameter is a timeout value. All but the first parameter may be NULL; a NULL timeout means wait indefinitely. On return, the file descriptor sets are modified to indicate which descriptors are ready for immediate I/O operations. Linux also modifies the timeout on return, but most other systems do not, so portable code should reset it before each call. The return value is the total number of ready descriptors in the three sets. If it is zero, the timeout expired. If the return value is -1, errno is set to indicate the error (which may include EINTR if a signal was caught).
The macros FD_ZERO(), FD_SET(), FD_CLR(), and FD_ISSET() help manipulate file descriptor sets by erasing the whole set, setting the bit corresponding to a file descriptor, clearing the bit, or querying the bit. All but FD_ZERO() take a file descriptor as the first parameter. The remaining parameter for each is a pointer to a file descriptor set.
Listing 9.3 has a crude terminal program that illustrates the use of select(). The program doesn't disable local echo or line buffering on the keyboard, set the baud rate on the serial port, lock the serial line, or do much of anything but move characters between the two devices. If compiled with BADCODE defined, it will spin on the input and output operations, tying up the CPU. Otherwise, the program will use select() to sleep until it is possible to do some I/O. It will wake up every ten seconds, for no good reason. It is limited to single-character buffers, so it will make a system call for every character in or out instead of doing multiple characters at a time when possible. My manyterm program also illustrates the use of select().
LISTING 9.3 select-BASED TERMINAL PROGRAM
#include <sys/time.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <assert.h>
#include <stdio.h>   /* for fprintf(stderr,... */
#include <stdlib.h>  /* for exit() */
#include <termios.h>

/* crude terminal program */
/* - does not lock modem */
/* - does not disable echo on users terminal */
/* - does not put terminal in raw mode */
/* - control-c will abort */

int debug = 0;

void dump_fds(char *name, fd_set *set, int max_fd)
{
  int i;

  if(!debug) return;
  fprintf(stderr, "%s:", name);
  for(i=0; i<max_fd; i++) {
    if(FD_ISSET(i, set)) {
      fprintf(stderr, "%d,", i);
    }
  }
  fprintf(stderr, "\n");
}

int main(void)
{
  int keyboard;
  int screen;
  int serial;
  char c;
  int rc;
  struct termios tio;
#ifndef BADCODE
  fd_set readfds;
  fd_set writefds;
  fd_set exceptfds;
  struct timeval tv;
  int max_fd;
  /* inbound and outbound keep track of */
  /* whether we have a character */
  /* already read which needs to be sent in that direction */
  /* the _char variables are the data buffer */
  int outbound;
  char outbound_char;
  int inbound;
  char inbound_char;
#endif

  keyboard = open("/dev/tty", O_RDONLY | O_NONBLOCK);
  assert(keyboard>=0);
  screen = open("/dev/tty", O_WRONLY | O_NONBLOCK);
  assert(screen>=0);
  serial = open("/dev/modem", O_RDWR | O_NONBLOCK);
  assert(serial>=0);

  if(debug) {
    fprintf(stderr, "keyboard=%d\n", keyboard);
    fprintf(stderr, "screen=%d\n", screen);
    fprintf(stderr, "serial=%d\n", serial);
  }

#ifdef BADCODE
  while(1) {
    rc=read(keyboard,&c,1);
    if(rc==1) {
      while(write(serial,&c,1) != 1)
        ;
    }
    rc=read(serial,&c,1);
    if(rc==1) {
      while(write(screen,&c,1) != 1)
        ;
    }
  }
#else
  outbound = inbound = 0;
  while(1) {
    FD_ZERO(&writefds);
    if(inbound) FD_SET(screen, &writefds);
    if(outbound) FD_SET(serial, &writefds);

    FD_ZERO(&readfds);
    if(!outbound) FD_SET(keyboard, &readfds);
    if(!inbound) FD_SET(serial, &readfds);

    max_fd = 0;
    if(screen > max_fd) max_fd=screen;
    if(keyboard > max_fd) max_fd=keyboard;
    if(serial > max_fd) max_fd=serial;
    max_fd++;
    if(debug) fprintf(stderr, "max_fd=%d\n", max_fd);

    tv.tv_sec = 10;
    tv.tv_usec = 0;

    dump_fds("read in", &readfds, max_fd);
    dump_fds("write in", &writefds, max_fd);

    rc= select(max_fd, &readfds, &writefds, NULL, &tv);
    if(rc < 0) continue;   /* error such as EINTR: skip this pass */

    dump_fds("read out", &readfds, max_fd);
    dump_fds("write out", &writefds, max_fd);

    if(FD_ISSET(keyboard, &readfds)) {
      if(debug) fprintf(stderr, "\nreading outbound\n");
      rc=read(keyboard,&outbound_char,1);
      if(rc==1) outbound=1;
      if(outbound && outbound_char == 3) exit(0);   /* control-C */
    }
    if(FD_ISSET(serial, &readfds)) {
      if(debug) fprintf(stderr, "\nreading inbound\n");
      rc=read(serial,&inbound_char,1);
      if(rc==1) inbound=1;
    }
    if(FD_ISSET(screen, &writefds)) {
      if(debug) fprintf(stderr, "\nwriting inbound\n");
      rc=write(screen,&inbound_char,1);
      if(rc==1) inbound=0;
    }
    if(FD_ISSET(serial, &writefds)) {
      if(debug) fprintf(stderr, "\nwriting outbound\n");
      rc=write(serial,&outbound_char,1);
      if(rc==1) outbound=0;
    }
  }
#endif
}
The fstat() Call
The fstat() system call returns information about the file referred to by the file descriptor filedes, placing the result in the struct stat pointed to by buf. A return value of zero is success and -1 is failure (check errno).
#include <sys/stat.h>
#include <unistd.h>
int fstat(int filedes, struct stat *buf);
Here is the definition of struct stat, borrowed from the man page:
struct stat {
    dev_t         st_dev;      /* device */
    ino_t         st_ino;      /* inode */
    mode_t        st_mode;     /* protection */
    nlink_t       st_nlink;    /* number of hard links */
    uid_t         st_uid;      /* user ID of owner */
    gid_t         st_gid;      /* group ID of owner */
    dev_t         st_rdev;     /* device type (if inode device) */
    off_t         st_size;     /* total size, in bytes */
    unsigned long st_blksize;  /* blocksize for filesystem I/O */
    unsigned long st_blocks;   /* number of blocks allocated */
    time_t        st_atime;    /* time of last access */
    time_t        st_mtime;    /* time of last modification */
    time_t        st_ctime;    /* time of last change */
};
This call is safer than its cousins stat() and even lstat().
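The following brief sketch shows fstat() in use and assumes POSIX. regular_file_size(), a hypothetical helper, checks st_mode with S_ISREG() before trusting st_size. Because it inspects the descriptor rather than a pathname, the answer describes the file that was actually opened.

```c
#define _POSIX_C_SOURCE 200809L  /* expose POSIX declarations */
#include <sys/stat.h>   /* fstat(), struct stat, S_ISREG() */
#include <sys/types.h>  /* off_t */

/* Hypothetical helper: return the size in bytes of the regular file open
   on fd, or -1 if fstat() fails or fd is not a regular file (a pipe,
   terminal, socket, ...). The result describes the opened file even if
   its name has since been renamed or replaced. */
off_t regular_file_size(int fd)
{
    struct stat st;
    if (fstat(fd, &st) == -1)
        return -1;
    if (!S_ISREG(st.st_mode))
        return -1;
    return st.st_size;
}
```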
The fchown() Call
The fchown() system call lets you change the owner and group associated with an open file.
#include <sys/types.h>
#include <unistd.h>
int fchown(int fd, uid_t owner, gid_t group);
The first parameter is the file descriptor, the second the numerical user id, and the third the numerical group id. A value of -1 for either owner or group will leave that value unchanged. Return values are zero for success and -1 for failure (check errno).
NOTE
An ordinary user may change the group of a file they own to any group they belong to. Only root may change the file's owner, or set its group to an arbitrary group.
The fchown() call is safer than its cousin chown(), which takes a pathname instead of a file descriptor.
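The following minimal sketch assumes POSIX. It changes only the group by passing (uid_t)-1 for the owner. set_group() is a hypothetical name. For an ordinary user this succeeds only on a file they own and for a group they belong to.

```c
#define _POSIX_C_SOURCE 200809L  /* expose POSIX declarations */
#include <sys/types.h>  /* uid_t, gid_t */
#include <unistd.h>     /* fchown() */

/* Hypothetical helper: change only the group of the open file fd.
   Passing (uid_t)-1 as the owner leaves the owner unchanged.
   Returns 0 on success, -1 on error (errno set by fchown()). */
int set_group(int fd, gid_t group)
{
    return fchown(fd, (uid_t)-1, group);
}
```

For example, set_group(fd, getegid()) hands a freshly created file to the caller's effective group without touching its owner.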
The fchmod() Call
The fchmod() call changes the mode (permission bits) of the file referenced by fildes to mode.
#include <sys/types.h>
#include <sys/stat.h>
int fchmod(int fildes, mode_t mode);
Modes are frequently referred to in octal, a horrid base-8 numbering system that was used to describe groups of 3 bits when some systems could not print the letters A–F required for hexadecimal notation. Remember that one of the C language's unpleasant idiosyncrasies is that any numeric constant that begins with a leading zero will be interpreted as octal. Return values are zero for success and -1 for error (check errno). Table 9.3 shows the file mode bits that may be OR-ed together to make the file mode.
TABLE 9.3 FILE MODES
Octal   Symbolic   Description
04000   S_ISUID    Set user id (setuid)
02000   S_ISGID    Set group id (setgid)
01000   S_ISVTX    Sticky bit
00400   S_IRUSR    User (owner) may read
00200   S_IWUSR    User (owner) may write
00100   S_IXUSR    User (owner) may execute/search
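The following sketch connects the octal and symbolic columns. It assumes POSIX, and make_private() is a hypothetical helper. It sets a file to owner read/write only. S_IRUSR | S_IWUSR and the octal constant 0600 are the same value (decimal 384).

```c
#define _POSIX_C_SOURCE 200809L  /* expose POSIX declarations */
#include <sys/stat.h>   /* fchmod(), S_I* mode bits */
#include <sys/types.h>  /* mode_t */

/* Hypothetical helper: make the open file readable and writable by its
   owner only. S_IRUSR | S_IWUSR is the symbolic spelling of octal 0600;
   the leading zero is what makes C read 0600 as octal.
   Returns 0 on success, -1 on error (errno set by fchmod()). */
int make_private(int fd)
{
    return fchmod(fd, S_IRUSR | S_IWUSR);
}
```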