This is a program identifier database package. These tools provide a logical extension to ctags, which is limited in that it stores only the locations of function and type *definitions*. The ID facility stores the locations of all uses of identifiers, preprocessor names, and numbers (in decimal, octal, or hex).

When fixing or enhancing a large program (particularly an unfamiliar one), it is often necessary to audit the use of global data structures in order to verify that the proposed modification will not trigger any hidden `gotchas'. Often this entails grepping through many thousands of lines of source code spread over dozens, sometimes hundreds, of source files in multiple subdirectories. This process places a significant load on computing resources and takes a long time. There is even the danger that a programmer will skip a complete audit because of the perceived cost, relying on memory and hoping that there are no booby traps.

The id-database is most useful for maintaining large programs that consist of many source files. The database is simply a two-dimensional boolean array indexed by identifier name and source-file name. For a given identifier and source file, the boolean value is TRUE if the identifier occurs in the file. The database may be queried either by identifier name or by file name. The following types of queries are supported:

* name lookup
    List all the files where an identifier occurs. The name may be a regular expression.
* name apropos
    List all the files for all identifiers that contain the substring name. Matches are case-insensitive.
* name `grep'
    Search for an identifier in all the files where it occurs. This is an optimized `grep' over all the sources: only the files that contain the identifier are searched.
* name edit
    Invoke an editor on the files where an identifier occurs, using the identifier as the initial search string.
* file lookup
    List all identifiers that occur in a file, or list the identifiers that are common to two files.
* non-unique names
    List all identifiers whose names are non-unique within some number of characters. This is useful when porting a program from a `flexnames' system to one with more limited names.
* solo
    List all identifiers that occur exactly once in a software system. This may be useful for locating identifiers that are declared but never used, or library functions that are used but never declared.

The first four queries are handled by one program. The type of query is determined by the name under which the program is invoked. The four links are lid(1) for `lookup id', aid(1) for `apropos id', gid(1) for `grep id', and eid(1) for `edit id'. One or more identifiers may be passed on the command line, either as literal strings or as regular expressions. Here are some examples:

    $ lid FILE
    FILE extern.h {fid,gets0,getsFF,idx,init,lid,mkid,opensrc,scan-asm,scan-c}.c
    $ lid FILE$
    AF_FILE mkid.c
    AF_IDFILE mkid.c
    FILE extern.h {fid,gets0,getsFF,idx,init,lid,mkid,opensrc,scan-asm,scan-c}.c
    IDFILE id.h {fid,lid,mkid}.c
    IdFILE {fid,lid}.c
    argFILE mkid.c
    gidFILE lid.c
    idFILE {init,mkid}.c
    inFILE {gets0,getsFF,scan-asm,scan-c}.c
    openSrcFILE extern.h {idx,mkid,opensrc}.c
    srcFILE {idx,mkid,opensrc}.c
    $ lid ^get
    get opensrc.c
    getAdaId getscan.c
    getAsmId extern.h {getscan,scan-asm}.c
    getCId extern.h {getscan,scan-c}.c
    getDirToName extern.h {fid,lid,paths}.c
    getId {idx,mkid}.c
    getLanguage extern.h {getscan,idx,mkid}.c
    getLispId getscan.c
    getPascalId getscan.c
    getRoffId getscan.c
    getSCCS extern.h opensrc.c
    getScanner extern.h {getscan,idx,mkid}.c
    getTeXId getscan.c
    getTextId getscan.c
    getc {gets0,getsFF,lid,scan-asm,scan-c}.c
    getchar lid.c
    getenv extern.h lid.c
    gets lid.c
    getsFF extern.h {bitsvec,fid,getsFF,lid,mkid}.c

As you can see, when a regular expression is used, it is possible to get more than one line of output.
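To make the "two-dimensional boolean array" concrete, here is a minimal in-memory sketch. It assumes at most 64 source files, so that each identifier's row of booleans fits in one `uint64_t`. The names `IdDb`, `db_add`, `files_with` and `common_ids` are hypothetical. The ID file that mkid(1) actually writes has its own layout, which this does not attempt to reproduce. The sketch shows both query directions: reading one row gives the files for an identifier, and masking one or two columns gives the identifiers for a file or the identifiers common to two files.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical limits for this sketch: one uint64_t row per identifier
   means at most 64 source files. */
#define MAX_IDS   64
#define MAX_FILES 64

typedef struct {
    const char *ids[MAX_IDS];      /* row labels: identifier names */
    const char *files[MAX_FILES];  /* column labels: source-file names */
    uint64_t row[MAX_IDS];         /* bit f of row[i]: ids[i] occurs in files[f] */
    int nids, nfiles;
} IdDb;

/* Linear search for `name`; returns its index or -1. */
static int find(const char *const *names, int n, const char *name) {
    for (int i = 0; i < n; i++)
        if (strcmp(names[i], name) == 0)
            return i;
    return -1;
}

/* Set the boolean for (id, file) to TRUE. Returns -1 if a table is full. */
static int db_add(IdDb *db, const char *id, const char *file) {
    int i = find(db->ids, db->nids, id);
    if (i < 0) {
        if (db->nids == MAX_IDS) return -1;
        i = db->nids++;
        db->ids[i] = id;
        db->row[i] = 0;
    }
    int f = find(db->files, db->nfiles, file);
    if (f < 0) {
        if (db->nfiles == MAX_FILES) return -1;
        f = db->nfiles++;
        db->files[f] = file;
    }
    db->row[i] |= (uint64_t)1 << f;
    return 0;
}

/* Query by identifier: the set of files (as a bitmask) containing `id`. */
static uint64_t files_with(const IdDb *db, const char *id) {
    int i = find(db->ids, db->nids, id);
    return i < 0 ? 0 : db->row[i];
}

/* Query by file: print the identifiers common to files `a` and `b` and
   return how many there are. Passing the same file twice lists every
   identifier in that file. */
static int common_ids(const IdDb *db, const char *a, const char *b) {
    int fa = find(db->files, db->nfiles, a);
    int fb = find(db->files, db->nfiles, b);
    if (fa < 0 || fb < 0) return 0;
    uint64_t mask = ((uint64_t)1 << fa) | ((uint64_t)1 << fb);
    int n = 0;
    for (int i = 0; i < db->nids; i++)
        if ((db->row[i] & mask) == mask) {
            printf("%s\n", db->ids[i]);
            n++;
        }
    return n;
}
```

A bit-per-file row keeps the "file lookup" intersection down to one AND per identifier. A real tool handling hundreds of files would need a wider bit vector.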
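The apropos query matches a substring without regard to case. A portable way to test one name is sketched below. `apropos_match` is a hypothetical helper. The nonstandard `strcasestr` found on some systems would do the same job, but it is not part of ISO C.

```c
#include <ctype.h>
#include <string.h>

/* Return 1 if `sub` occurs anywhere in `name`, ignoring case; 0 otherwise. */
static int apropos_match(const char *name, const char *sub) {
    size_t n = strlen(sub);
    if (n == 0)
        return 1;                       /* the empty string matches everything */
    for (; *name != '\0'; name++) {
        size_t k = 0;
        /* Compare sub against the text starting here, one folded char at a time. */
        while (k < n && name[k] != '\0' &&
               tolower((unsigned char)name[k]) == tolower((unsigned char)sub[k]))
            k++;
        if (k == n)
            return 1;
    }
    return 0;
}
```

An apropos query would presumably apply such a test to every identifier in the database and then report each match's files, as a name lookup does.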
If you wish multiple lines to be merged into one, supply the `-m' option:

    $ lid -m ^get
    ^get extern.h {bitsvec,fid,gets0,getsFF,getscan,idx,lid,mkid,opensrc,paths,scan-asm,scan-c}.c

The query program searches for numbers numerically rather than textually, so a single search can find several representations of the same number. This is best illustrated with examples:

    $ lid -a 0x10
    020 numtst.c
    0x00010 numtst.c
    0x0010 scan-c.c
    0x10 {id,radix}.h {scan-asm,stoi}.c
    16 numtst.c

The `-a' argument tells lid(1) to look for 0x10 in all radixes. (For numbers 0 through 7, lid(1) looks in all radixes by default. For numbers greater than 7, lid(1) looks only in the radix in which the argument is supplied.) You can also restrict the search to selected radixes by supplying an argument consisting of one or more of the key letters `o', `d', and `x', for octal, decimal, and hexadecimal respectively:

    $ lid -o 0x10
    020 numtst.c
    $ lid -x 16
    0x00010 numtst.c
    0x0010 scan-c.c
    0x10 {id,radix}.h {scan-asm,stoi}.c
    $ lid -d 020
    16 numtst.c

The grep interface behaves somewhat like the following command:

    $ grep -w -n `lid TRUE`

Here's some sample output from the equivalent gid command:

    $ gid TRUE
    bool.h:5: #define TRUE (0==0)
    lid.c:102: case 'm': forceMerge = TRUE; break;
    lid.c:170: Merging = TRUE;
    lid.c:204: crunching = TRUE;
    lid.c:553: hitDigits = TRUE;
    lid.c:787: return TRUE;
    mkid.c:117: Verbose = TRUE;
    mkid.c:191: keepLang = TRUE;
    scan-asm.c:79: static bool eatUnder = TRUE;
    scan-asm.c:80: static bool preProcess = TRUE;
    scan-asm.c:96: static bool newLine = TRUE;
    scan-asm.c:130: newLine = TRUE;
    scan-asm.c:141: newLine = TRUE;
    scan-asm.c:145: newLine = TRUE;
    scan-asm.c:150: newLine = TRUE;
    scan-asm.c:165: newLine = TRUE;
    scan-c.c:88: static bool eatUnder = TRUE;
    scan-c.c:101: static bool newLine = TRUE;
    scan-c.c:138: newLine = TRUE;
    scan-c.c:199: newLine = TRUE;
    scan-c.c:205: newLine = TRUE;
    scan-c.c:210: newLine = TRUE;
    wmatch.c:37: return TRUE;

Notice that each line is reported in the
same format as a C-preprocessor error message. This allows gid(1) output to be digested by any program that parses error messages, such as error(1) and gnu-emacs.

If you want to edit all files that contain an identifier, you can conveniently do so with eid(1):

    $ eid TRUE
    TRUE bool.h {lid,mkid,scan-asm,scan-c,wmatch}.c
    Edit? [y1-9^S/nq]

Before the editor is invoked, you are given the lid(1) output to review and confirm. If you want to edit all the files listed, respond with a newline or with `y'. If you want to skip some number of files into the argument list, respond with a single digit `1' through `9' to skip that many files, or do a string search to the first file you want with `^S' or `/'. If you don't want to edit anything, type `n' to go on to the next argument you gave to eid(1), or type `q' to quit altogether.

The behavior of the editing interface is controlled by three environment variables: EIDARG, EIDLDEL, and EIDRDEL. Their use is best illustrated by an example. Here is how to define them for vi(1) (using /bin/sh syntax):

    EIDARG='+/%s/'   # printf(3) string for initial search-string argument
    EIDLDEL='\<'     # left word-delimiter
    EIDRDEL='\>'     # right word-delimiter
    export EIDARG EIDLDEL EIDRDEL

`EID[LR]DEL' are placed around the identifier as left and right word delimiters, if your editor supports that notion. The whole name string is then sprintf(3)'ed into `EIDARG' to construct the initial search-string argument to the editor. If your editor can't digest such an argument, simply leave these variables undefined in the environment.

Some emacs users are appalled at the notion of starting up a fresh editor simply to follow an identifier. For those fortunate enough to have a programmable emacs such as gnu-emacs, it is fairly simple to devise a command that invokes gid(1) and digests its output as though it were /lib/cpp error strings to be examined. (Sorry, no such code is provided in this posting...)
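The numeric search described earlier (`-a', and the `o', `d', `x' key letters) can be pictured as comparing parsed values rather than text. This is a minimal sketch that assumes numbers are plain C-style integer literals. `strtoul` with base 0 already honours the `0x' and leading-`0' prefixes. Literal suffixes such as `L' or `U' are not handled. The names `RADIX_*`, `number_matches` and `default_radixes` are hypothetical, and how lid(1) does this internally may differ.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Radix key letters o, d, x expressed as bit flags. */
enum { RADIX_OCT = 1, RADIX_DEC = 2, RADIX_HEX = 4,
       RADIX_ALL = RADIX_OCT | RADIX_DEC | RADIX_HEX };

/* Classify a C-style integer literal by its prefix. */
static int literal_radix(const char *s) {
    if (s[0] == '0' && (s[1] == 'x' || s[1] == 'X'))
        return RADIX_HEX;
    if (s[0] == '0' && s[1] != '\0')
        return RADIX_OCT;
    return RADIX_DEC;
}

/* Parse the whole literal; strtoul with base 0 honours the 0x and 0 prefixes.
   Returns 1 on success, 0 if `s` is not a complete integer literal. */
static int parse_literal(const char *s, unsigned long *out) {
    char *end;
    if (!isdigit((unsigned char)s[0]))
        return 0;
    errno = 0;
    *out = strtoul(s, &end, 0);
    return *end == '\0' && errno == 0;
}

/* Default rule from the text: 0..7 match in every radix; larger numbers
   only in the radix the query was written in. */
static int default_radixes(const char *query) {
    unsigned long q;
    if (parse_literal(query, &q) && q <= 7)
        return RADIX_ALL;
    return literal_radix(query);
}

/* Does `token` denote the same value as `query`, written in one of `mask`? */
static int number_matches(const char *query, const char *token, int mask) {
    unsigned long q, t;
    if (!parse_literal(query, &q) || !parse_literal(token, &t))
        return 0;
    return q == t && (literal_radix(token) & mask) != 0;
}
```

With `RADIX_ALL` (like `-a'), `0x10` matches `020`, `16` and `0x00010`. With `RADIX_OCT` alone (like `-o'), only `020` survives.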
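Because gid(1) only opens the files that the database lists for an identifier, the remaining work per file is a word-bounded search. The sketch below shows that per-file step and prints each hit in the `file:line: text' form shown above. It assumes the file's contents are already in memory and treats letters, digits and underscore as identifier characters. `grep_word` is a hypothetical name, not gid(1)'s actual code.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* C identifier characters: letters, digits and underscore. */
static int is_id_char(char c) {
    return isalnum((unsigned char)c) || c == '_';
}

/* Print each line of `text` containing `id` as a whole word, prefixed with
   "file:line:" like a C-compiler diagnostic. Returns the number of lines. */
static int grep_word(const char *file, const char *text, const char *id) {
    size_t idlen = strlen(id);
    int lineno = 1, hits = 0;
    const char *line = text;

    if (idlen == 0)
        return 0;
    while (*line != '\0') {
        const char *eol = strchr(line, '\n');
        size_t len = eol ? (size_t)(eol - line) : strlen(line);

        for (size_t i = 0; i + idlen <= len; i++) {
            int starts = (i == 0 || !is_id_char(line[i - 1]));
            int ends = (i + idlen == len || !is_id_char(line[i + idlen]));
            if (starts && ends && memcmp(line + i, id, idlen) == 0) {
                printf("%s:%d: %.*s\n", file, lineno, (int)len, line);
                hits++;
                break;                  /* report each line only once */
            }
        }
        if (eol == NULL)
            break;
        line = eol + 1;
        lineno++;
    }
    return hits;
}
```

The word-boundary checks play the role of grep's `-w': `TRUEISH` is not reported when searching for `TRUE`.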
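The EIDARG construction above (wrap the identifier in EIDLDEL/EIDRDEL, then sprintf the result into EIDARG) might look like the following sketch. The functions are hypothetical. The check that the format contains exactly one `%s' and no other `%' directive is an added safeguard, not something the text says eid(1) does. It is there because a format string taken from the environment must match the single string argument handed to it.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Accept a format only if it contains exactly one "%s" and no other '%'
   directive, so it matches the single string argument passed below. */
static int one_string_directive(const char *fmt) {
    int count = 0;
    for (const char *p = strchr(fmt, '%'); p != NULL; p = strchr(p + 2, '%')) {
        if (p[1] != 's')
            return 0;
        count++;
    }
    return count == 1;
}

/* Wrap `id` in the delimiters, then format the result through `fmt`.
   Returns 1 on success, 0 if fmt is missing, unsafe, or the output is too long. */
static int format_eid_arg(const char *fmt, const char *ldel, const char *rdel,
                          const char *id, char *out, size_t outsz) {
    char pattern[256];
    int n;

    if (fmt == NULL || !one_string_directive(fmt))
        return 0;
    n = snprintf(pattern, sizeof pattern, "%s%s%s",
                 ldel ? ldel : "", id, rdel ? rdel : "");
    if (n < 0 || (size_t)n >= sizeof pattern)
        return 0;
    n = snprintf(out, outsz, fmt, pattern);
    return n >= 0 && (size_t)n < outsz;
}

/* Same, reading the three variables from the environment. */
static int eid_arg_from_env(const char *id, char *out, size_t outsz) {
    return format_eid_arg(getenv("EIDARG"), getenv("EIDLDEL"),
                          getenv("EIDRDEL"), id, out, outsz);
}
```

With the vi(1) settings shown earlier, the identifier `TRUE` becomes the argument `+/\<TRUE\>/`. When EIDARG is unset, the sketch returns 0 and no search argument would be passed, which matches the advice to leave the variables undefined for editors that can't digest one.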
Another type of query finds all identifiers that are non-unique within some number of characters. This is useful for finding potential portability problems when moving to a system whose compiler or linker limits the number of significant characters in a name. The `-u' argument does the trick. Here's a list of identifiers that may yield multiply-defined errors in a symbol table that only knows about the first 7 characters:

    $ lid -u7
    SCAN_TEX getscan.c
    SCAN_TEXT getscan.c
    idh_argc id.h {init,mkid}.c
    idh_argo id.h {init,mkid}.c
    idh_namc id.h {fid,mkid}.c
    idh_namo id.h {fid,init,lid,mkid}.c
    oldHashSize mkid.c
    oldHashTable mkid.c

Better yet, if you want to edit these, try:

    $ eid -u7
    ^SCAN_TE getscan.c
    Edit? [y1-9^S/nq] n
    ^idh_arg getscan.c id.h {init,mkid}.c
    Edit? [y1-9^S/nq] n
    ^idh_nam {fid,getscan}.c id.h {init,lid,mkid}.c
    Edit? [y1-9^S/nq] n
    ^oldHash {fid,getscan}.c id.h {init,lid,mkid}.c
    Edit? [y1-9^S/nq] n

Another feature of lid(1) is that pathnames are automatically adjusted for the current working directory. Large programs such as the UNIX kernel are often partitioned into subsystems whose sources live in different directories. Here are several examples of the same search conducted from different points in the UNIX kernel source hierarchy:

    $ cd /src/uts/m68k
    $ lid bdevsw
    bdevsw sys/conf.h cf/conf.c io/bio.c os/{fio,main,prf,sys3}.c
    $ cd io
    $ lid bdevsw
    bdevsw ../sys/conf.h ../cf/conf.c bio.c ../os/{fio,main,prf,sys3}.c
    $ cd ../os
    $ lid bdevsw
    bdevsw ../sys/conf.h ../cf/conf.c ../io/bio.c {fio,main,prf,sys3}.c

The database is built with mkid(1). The user supplies pathnames either on the command line or on stdin.
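One way to find the names that collide within their first N characters, as `lid -u7' reports them, is to sort the names and then compare neighbours with `strncmp`. In sorted order, names that share an N-character prefix are always adjacent. This is a sketch with a hypothetical `report_non_unique`. The text does not say how lid(1) implements `-u'.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static int cmp_names(const void *a, const void *b) {
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Sort `names` in place, then print every group of two or more names that
   agree in their first `sig` characters. Returns the number of groups. */
static int report_non_unique(const char **names, size_t count, size_t sig) {
    int groups = 0;
    size_t i = 0;

    qsort(names, count, sizeof *names, cmp_names);
    while (i < count) {
        size_t j = i + 1;
        /* After sorting, names sharing a sig-character prefix are adjacent. */
        while (j < count && strncmp(names[i], names[j], sig) == 0)
            j++;
        if (j - i > 1) {
            groups++;
            for (size_t k = i; k < j; k++)
                printf("%s\n", names[k]);
        }
        i = j;
    }
    return groups;
}
```

Fed the eight names from the `lid -u7' listing plus an unrelated `main', this reports four groups (SCAN_TE, idh_arg, idh_nam, oldHash) and leaves `main' out.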
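The pathname adjustment in the `bdevsw' examples amounts to rewriting each stored path relative to the current directory. This sketch assumes both paths are absolute and already normalized, with no `.', `..', repeated or trailing slashes. `relative_path` is a hypothetical name. The text does not say how lid(1) stores paths or learns the current directory.

```c
#include <stdio.h>
#include <string.h>

/* Rewrite absolute path `file` relative to absolute directory `cwd`.
   Assumes both are normalized: no ".", "..", repeated or trailing '/'
   (except that cwd may be "/"). Returns 1 on success, 0 if `out` is too small. */
static int relative_path(const char *cwd, const char *file,
                         char *out, size_t outsz) {
    size_t i = 0, common = 0, used = 0;
    const char *up;
    int climbs, n;

    /* Walk the shared prefix, remembering the position just past each shared '/'. */
    while (cwd[i] != '\0' && cwd[i] == file[i]) {
        if (cwd[i] == '/')
            common = i + 1;
        i++;
    }
    if (cwd[i] == '\0' && file[i] == '/') {
        up = "";            /* cwd is an ancestor of file: nothing to climb */
        common = i + 1;
    } else {
        up = cwd + common;  /* cwd components below the shared directory */
    }

    /* One "../" per remaining component of cwd. */
    climbs = (*up != '\0');
    for (const char *p = up; *p != '\0'; p++)
        if (*p == '/')
            climbs++;
    for (int k = 0; k < climbs; k++) {
        n = snprintf(out + used, outsz - used, "../");
        if (n < 0 || (size_t)n >= outsz - used)
            return 0;
        used += (size_t)n;
    }
    n = snprintf(out + used, outsz - used, "%s", file + common);
    return n >= 0 && (size_t)n < outsz - used;
}
```

For instance, from /src/uts/m68k/io the file /src/uts/m68k/sys/conf.h comes out as ../sys/conf.h, and /src/uts/m68k/io/bio.c as bio.c, matching the second `lid bdevsw' listing.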
Here's the output of the `verbose' option to mkid(1):

    $ mkid -v *.h *.c
    c: bitops.h
    c: bool.h
    c: extern.h
    c: id.h
    c: patchlevel.h
    c: radix.h
    c: string.h
    c: basename.c
    c: bitcount.c
    c: bitops.c
    c: bitsvec.c
    c: bsearch.c
    c: bzero.c
    c: document.c
    c: fid.c
    c: gets0.c
    c: getsFF.c
    c: getscan.c
    c: hash.c
    c: idx.c
    c: init.c
    c: lid.c
    c: mkid.c
    c: numtst.c
    c: opensrc.c
    c: paths.c
    c: scan-asm.c
    c: scan-c.c
    c: stoi.c
    c: tty.c
    c: uerror.c
    c: wmatch.c
    Compressing Hash Table...
    Sorting Hash Table...
    Writing `ID'...
    Names: 593, Numbers: 64, Strings: 43, Solo: 119, Total: 697
    Occurrances: 11.67, Load: 0.17, Probes: 1.07

Mkid(1) echoes the name of each file as it is scanned, prefixed by the name of the language it thinks the file is written in. It then reports how many unique names and numbers were found, how many names occurred only once, and the total for names and numbers. It also reports the average number of occurrences for all names and numbers. Finally, it gives some hash-table statistics: the load factor and the average number of open-addressed probes.

Mkid(1) can take arguments from the command line, from stdin, or from a file. A file full of filenames may also contain mkid options of the form -