From pa.dec.com!decwrl!uunet!sparky!kent Sat Feb 2 14:37:14 PST 1991
Article: 2030 of comp.sources.misc
Path: pa.dec.com!decwrl!uunet!sparky!kent
From: goer@midway.uchicago.edu (Richard L. Goerwitz)
Newsgroups: comp.sources.misc
Subject: v16i085: mtf - Map tar filenames, Part01/02
Message-ID: <1991Jan29.012143.21569@sparky.IMD.Sterling.COM>
Date: 29 Jan 91 01:21:43 GMT
Sender: kent@sparky.IMD.Sterling.COM (Kent Landfield)
Organization: University of Chicago
Lines: 618
Approved: kent@sparky.imd.sterling.com
X-Checksum-Snefru: 5d9f6bcf 21d03491 c3363fc3 4c90c7a2

Submitted-by: goer@midway.uchicago.edu (Richard L. Goerwitz)
Posting-number: Volume 16, Issue 85
Archive-name: mtf/part01

Tar archives often come packed with filenames longer than 14 chars,
and with source code that requires that the filenames be fully
preserved. This utility, mtf, runs through the tar headers, finds all
overlong filenames, renames them, updates references to them in any
text files it finds, and then rewrites the tar header checksums.

-Richard

---- Cut Here and feed the following to sh ----
#!/bin/sh
# This is a shell archive (produced by shar 3.49)
# To extract the files from this archive, save it to a file, remove
# everything above the "!/bin/sh" line above, and type "sh file_name".
#
# made 01/20/1991 23:34 UTC by goer@sophist.uchicago.edu
# Source directory /u/richard/Mtf
#
# existing files will NOT be overwritten unless -c is specified
# This format requires very little intelligence at unshar time.
# "if test", "cat", "rm", "echo", "true", and "sed" may be needed.
#
# This is part 1 of a multipart archive
# do not concatenate these parts, unpack them in order with /bin/sh
#
# This shar contains:
# length  mode       name
# ------ ---------- ------------------------------------------
#  16721 -r--r--r-- mtf.icn
#   3341 -rw-r--r-- README
#    659 -rw-r--r-- Makefile.dist
#
if test -r _shar_seq_.tmp; then
    echo 'Must unpack archives in sequence!'
    echo Please unpack part `cat _shar_seq_.tmp` next
    exit 1
fi
# ============= mtf.icn ==============
if test -f 'mtf.icn' -a X"$1" != X"-c"; then
    echo 'x - skipping mtf.icn (File already exists)'
    rm -f _shar_wnt_.tmp
else
> _shar_wnt_.tmp
echo 'x - extracting mtf.icn (Text)'
sed 's/^X//' << 'SHAR_EOF' > 'mtf.icn' &&
X#############################################################################
X#
X#  NAME: mtf3.icn
X#
X#  TITLE: map tar file
X#
X#  AUTHOR: Richard Goerwitz
X#
X#  VERSION: 3.3
X#
X#############################################################################
X#
X#  This and future versions of mtf are hereby placed in the public domain -RLG
X#
X#############################################################################
X#
X#  PURPOSE: Maps 15+ char. filenames in a tar archive to 14 chars.
X#  Handles both header blocks and the archive itself. Mtf is intended
X#  to facilitate installation of tar'd archives on systems subject to
X#  the System V 14-character filename limit.
X#
X#  USAGE: mtf inputfile [-r reportfile] [-e .extensions] [-x exceptions]
X#
X#  "Inputfile" is a tar archive. "Reportfile" is a file containing a
X#  list of files already mapped by mtf in a previous run (used to
X#  avoid clashes with filenames in use outside the current archive).
X#  The -e switch precedes a list of filename .extensions which mtf is
X#  supposed to leave unscathed by the mapping process
X#  (single-character extensions such as .c and .o are automatically
X#  preserved; -e allows the user to specify additional extensions,
X#  such as .pxl, .cpi, and .icn). The final switch, -x, precedes a
X#  list of strings which should not be mapped at all.
X#  Use this switch
X#  if, say, you have a C file with a structure.field combination such
X#  as "thisisveryverybig.hashptr" in an archive that contains a file
X#  called "thisisveryverybig.h", and you want to avoid mapping that
X#  portion of the struct name which matches the name of the overlong
X#  file (to wit, "mtf inputfile -x thisisveryverybig.hashptr"). To
X#  prevent mapping of any string (including overlong filenames)
X#  beginning, say, with "thisisvery", use "mtf inputfile -x thisisvery".
X#  Be careful with this option, or you might end up defeating the
X#  whole point of using mtf in the first place.
X#
X#  OUTPUT FORMAT: Mtf writes a mapped tar archive to stdout.
X#  When finished, it leaves a file called "map.report" in the current
X#  directory which records what filenames were mapped and how. Rename
X#  and save this file, and use it as the "reportfile" argument to any
X#  subsequent runs of mtf in this same directory. Even if you don't
X#  plan to run mtf again, this file should still be examined, just to
X#  be sure that the new filenames are acceptable, and to see if
X#  perhaps additional .extensions and/or exceptions should be
X#  specified.
X#
X#  BUGS: Mtf only maps filenames found in the main tar headers.
X#  Because of this, mtf cannot accept nested tar archives. If you try
X#  to map a tar archive within a tar file, mtf will abort with a nasty
X#  message about screwing up your files. Please note that, unless you
X#  give mtf a "reportfile" to consider, it knows nothing about files
X#  existing outside the archive. Hence, if an input archive refers to
X#  an overlong filename in another archive, mtf naturally will not
X#  know to shorten it. Mtf will, in fact, have no way of knowing that
X#  it is a filename, and not, say, an identifier in a C program.
X#  Final word of caution: Try not to use mtf on binaries. It cannot
X#  possibly preserve the correct format and alignment of strings in an
X#  executable. Same goes for compressed files.
X#  Mtf can't map filenames that it can't read!
X#
X####################################################################
X
X
Xglobal filenametbl, chunkset, short_chunkset   # see procedure mappiece(s)
Xglobal extensions, no_nos                      # ditto
X
Xrecord hblock(name,junk,size,mtime,chksum,     # tar header struct;
X              linkflag,linkname,therest)       # see readtarhdr(s)
X
X
Xprocedure main(a)
X
X    usage := "usage: mtf inputfile [-r reportfile] " ||
X             "[-e .extensions] [-x exceptions]"
X
X    *a = 0 & stop(usage)
X
X    intext := open_input_file(a[1]) & pop(a)
X
X    i := 0
X    extensions := []; no_nos := []
X    while (i +:= 1) <= *a do {
X        case a[i] of {
X            "-r"    : readin_old_map_report(a[i+:=1])
X            "-e"    : current_list := extensions
X            "-x"    : current_list := no_nos
X            default : put(current_list,a[i])
X        }
X    }
X
X    every !extensions ?:= (=".", tab(0))
X
X    # Run through all the headers in the input file, filling
X    # (global) filenametbl with the names of overlong files;
X    # make_table_of_filenames fails if there are no such files.
X    make_table_of_filenames(intext) | {
X        write(&errout,"mtf: no overlong path names to map")
X        a[1] ? (tab(find(".tar")+4), pos(0)) |
X            write(&errout,"(Is ",a[1]," even a tar archive?)")
X        exit(1)
X    }
X
X    # Now that a table of overlong filenames exists, go back
X    # through the text, remapping all occurrences of these names
X    # to new, 14-char values; also, reset header checksums, and
X    # reformat text into correctly padded 512-byte blocks.
X    # Terminate output with 512 nulls.
X    seek(intext,1)
X    every writes(output_mapped_headers_and_texts(intext))
X
X    close(intext)
X    write_report() # Record mapped file and dir names for future ref.
X    exit(0)
X
Xend
X
X
X
Xprocedure open_input_file(s)
X    intext := open("" ~== s,"r") |
X        stop("mtf: can't open ",s)
X    find("UNIX",&features) |
X        stop("mtf: I'm not tested on non-Unix systems.")
X    s[-2:0] == ".Z" &
X        stop("mtf: sorry, can't accept compressed files")
X    return intext
Xend
X
X
X
Xprocedure readin_old_map_report(s)
X
X    initial {
X        filenametbl := table()
X        chunkset := set()
X        short_chunkset := set()
X    }
X
X    mapfile := open_input_file(s)
X    while line := read(mapfile) do {
X        line ? {
X            if chunk := tab(many(~' \t')) & tab(upto(~' \t')) &
X                lchunk := move(14) & pos(0) then {
X                filenametbl[chunk] := lchunk
X                insert(chunkset,chunk)
X                insert(short_chunkset,chunk[1:16])
X            }
X            if /chunk | /lchunk
X            then stop("mtf: report file, ",s," seems mangled.")
X        }
X    }
X
Xend
X
X
X
Xprocedure make_table_of_filenames(intext)
X
X    local header # chunkset is global
X
X    # search headers for overlong filenames; for now
X    # ignore everything else
X    while header := readtarhdr(reads(intext,512)) do {
X        # tab upto the next header block
X        tab_nxt_hdr(intext,trim_str(header.size),1)
X        # record overlong filenames in several global tables, sets
X        fixpath(trim_str(header.name))
X    }
X    *\chunkset ~= 0 | fail
X    return &null
X
Xend
X
X
X
Xprocedure output_mapped_headers_and_texts(intext)
X
X    # Remember that filenametbl, chunkset, and short_chunkset
X    # (which are used by various procedures below) are global.
X    local header, newtext, full_block, block, lastblock
X
X    # Read in headers, one at a time.
X    while header := readtarhdr(reads(intext,512)) do {
X
X        # Replace overlong filenames with shorter ones, according to
X        # the conversions specified in the global hash table filenametbl
X        # (which were generated by fixpath() on the first pass).
X        header.name := left(map_filenams(header.name),100,"\x00")
X        header.linkname := left(map_filenams(header.linkname),100,"\x00")
X
X        # Use header.size field to determine the size of the subsequent text.
X        # Read in the text as one string.
X        # Map overlong filenames found in it
X        # to shorter names as specified in the global hash table filenametbl.
X        newtext := map_filenams(tab_nxt_hdr(intext,trim_str(header.size)))
X
X        # Now, find the length of newtext, and insert it into the size field.
X        header.size := right(exbase10(*newtext,8) || " ",12," ")
X
X        # Calculate the checksum of the newly retouched header.
X        header.chksum := right(exbase10(get_checksum(header),8)||"\x00 ",8," ")
X
X        # Finally, join all the header fields into a new block and write it out
X        full_block := ""; every full_block ||:= !header
X        suspend left(full_block,512,"\x00")
X
X        # Now we're ready to write out the text, padding the final block
X        # out to an even 512 bytes if necessary; the next header must start
X        # right at the beginning of a 512-byte block.
X        newtext ? {
X            while block := move(512)
X            do suspend block
X            pos(0) & next
X            lastblock := left(tab(0),512,"\x00")
X            suspend lastblock
X        }
X    }
X    # Write out a final null-filled block. Some tar programs will write
X    # out 1024 nulls at the end. Dunno why.
X    return repl("\x00",512)
X
Xend
X
X
X
Xprocedure trim_str(s)
X
X    # Knock out spaces, nulls from those crazy tar header
X    # block fields (some of which end in a space and a null,
X    # some just a space, and some just a null [anyone know
X    # why?]).
X    return s ? {
X        (tab(many(' ')) | &null) &
X            trim(tab(find("\x00")|0))
X    } \ 1
X
Xend
X
X
X
Xprocedure tab_nxt_hdr(f,size_str,firstpass)
X
X    # Tab upto the next header block. Return the bypassed text
X    # as a string if not the first pass.
X
X    local hs, next_header_offset
X
X    hs := integer("8r" || size_str)
X    next_header_offset := (hs / 512) * 512
X    hs % 512 ~= 0 & next_header_offset +:= 512
X    if 0 = next_header_offset then return ""
X    else {
X        # if this is pass no.
X        # 1 don't bother returning a value; we're
X        # just collecting long filenames;
X        if \firstpass then {
X            seek(f,where(f)+next_header_offset)
X            return
X        }
X        else {
X            return reads(f,next_header_offset)[1:hs+1] |
X                stop("mtf: error reading in ",
X                     string(next_header_offset)," bytes.")
X        }
X    }
X
Xend
X
X
X
Xprocedure fixpath(s)
X
X    # Fixpath is a misnomer of sorts, since it is used on
X    # the first pass only, and merely examines each filename
X    # in a path, using the procedure mappiece to record any
X    # overlong ones in the global table filenametbl and in
X    # the global sets chunkset and short_chunkset; no fixing
X    # is actually done here.
X
X    s2 := ""
X    s ? {
X        while piece := tab(find("/")+1)
X        do s2 ||:= mappiece(piece)
X        s2 ||:= mappiece(tab(0))
X    }
X    return s2
X
Xend
X
X
X
Xprocedure mappiece(s)
X
X    # Check s (the name of a file or dir as recorded in the tar header
X    # being examined) to see if it is over 14 chars long. If so,
X    # generate a unique 14-char version of the name, and store
X    # both values in the global hashtable filenametbl. Also store
X    # the original (overlong) file name in chunkset. Store the
X    # first fifteen chars of the original file name in short_chunkset.
X    # Sorry about all of the tables and sets. It actually makes for
X    # a reasonably efficient program. Doing away with both sets,
X    # while possible, causes a tenfold drop in execution speed!
X
X    # global filenametbl, chunkset, short_chunkset, extensions
X    local j, ending
X
X    initial {
X        /filenametbl := table()
X        /chunkset := set()
X        /short_chunkset := set()
X    }
X
X    chunk := trim(s,'/')
X    if chunk ? (tab(find(".tar")+4), pos(0)) then {
X        write(&errout, "mtf: Sorry, I can't let you do this.\n",
X                       " You've nested a tar archive within\n",
X                       " another tar archive, which makes it\n",
X                       " likely I'll f your filenames ubar.")
X        exit(2)
X    }
X    if *chunk > 14 then {
X        i := 0
X
X        if /filenametbl[chunk] then {
X            # if we have not seen this file, then...
X            repeat {
X                # ...find a new unique 14-character name for it;
X                # preserve important suffixes like ".Z", ".c", etc.
X                # First, check to see if the original filename (chunk)
X                # ends in an important extension...
X                if chunk ?
X                    (tab(find(".")),
X                     ending := move(1) || tab(match(!extensions)|any(&ascii)),
X                     pos(0)
X                     )
X                # ...If so, then leave the extension alone; mess with the
X                # middle part of the filename (e.g. file.with.extension.c ->
X                # file.with001.c).
X                then {
X                    j := (15 - *ending - 3)
X                    lchunk:= chunk[1:j] || right(string(i+:=1),3,"0") || ending
X                }
X                # If no important extension is present, then reformat the
X                # end of the file (e.g. too.long.file.name -> too.long.fil01).
X                else lchunk := chunk[1:13] || right(string(i+:=1),2,"0")
X
X                # If the resulting shorter file name has already been used...
X                if lchunk == !filenametbl
X                # ...then go back and find another (i.e. increment i & try
X                # again); else break from the repeat loop, and...
X                then next else break
X            }
X            # ...record both the old filename (chunk) and its new,
X            # mapped name (lchunk) in filenametbl. Also record the
X            # original name in chunkset, and its first fifteen chars
X            # in short_chunkset.
X            filenametbl[chunk] := lchunk
X            insert(chunkset,chunk)
X            insert(short_chunkset,chunk[1:16])
X        }
X    }
X
X    # If the filename is overlong, return lchunk (the shortened
X    # name), else return the original name (chunk). If the name,
X    # as passed to the current function, contained a trailing /
X    # (i.e. if s[-1]=="/"), then put the / back. This could be
X    # done more elegantly.
X    return (\lchunk | chunk) || ((s[-1] == "/") | "")
X
Xend
X
X
X
Xprocedure readtarhdr(s)
X
X    # Read the silly tar header into a record. Note that, as was
X    # complained about above, some of the fields end in a null, some
X    # in a space, and some in a space and a null. The procedure
X    # trim_str() may be (and in fact often _is_) used to remove this
X    # extra garbage.
X
X    this_block := hblock()
X    s ?
X    {
X        this_block.name     := move(100)    # <- to be looked at later
X        this_block.junk     := move(8+8+8)  # skip the permissions, uid, etc.
X        this_block.size     := move(12)     # <- to be looked at later
X        this_block.mtime    := move(12)
X        this_block.chksum   := move(8)      # <- to be looked at later
X        this_block.linkflag := move(1)
X        this_block.linkname := move(100)    # <- to be looked at later
X        this_block.therest  := tab(0)
X    }
X    integer(this_block.size) | fail  # If it's not an integer, we've hit
X                                     # the final (null-filled) block.
X    return this_block
X
Xend
X
X
X
Xprocedure map_filenams(s)
X
X    # Chunkset is global, and contains all the overlong filenames
X    # found in the first pass through the input file; here the aim
X    # is to map these filenames to the shortened variants as stored
X    # in filenametbl (GLOBAL).
X
X    local s2, tmp_chunk_tbl, tmp_lst
X    static new_chunklist
X    initial {
X
X        # Make sure filenames are sorted, longest first. Say we
X        # have a file called long_file_name_here.1 and one called
X        # long_file_name_here.1a. We want to check for the longer
X        # one first. Otherwise the portion of the second file which
X        # matches the first file will get remapped.
X        tmp_chunk_tbl := table()
X        every el := !chunkset
X        do insert(tmp_chunk_tbl,el,*el)
X        tmp_lst := sort(tmp_chunk_tbl,4)
X        new_chunklist := list()
X        every put(new_chunklist,tmp_lst[*tmp_lst-1 to 1 by -2])
X
X    }
X
X    s2 := ""
X    s ? {
X        until pos(0) do {
X            # first narrow the possibilities, using short_chunkset
X            if member(short_chunkset,&subject[&pos:&pos+15])
X            # then try to map from a long to a shorter 14-char filename
X            then {
X                if match(ch := !new_chunklist) & not match(!no_nos)
X                then s2 ||:= filenametbl[=ch]
X                else s2 ||:= move(1)
X            }
X            else s2 ||:= move(1)
X        }
X    }
X    return s2
X
Xend
X
X
X# From the IPL. Thanks, Ralph -
X# Author: Ralph E. Griswold
X# Date: June 10, 1988
X# exbase10(i,j) convert base-10 integer i to base j
X# The maximum base allowed is 36.
X
Xprocedure exbase10(i,j)
X
X    static digits
X    local s, d, sign
X    initial digits := &digits || &lcase
X    if i = 0 then return 0
X    if i < 0 then {
X        sign := "-"
X        i := -i
X    }
X    else sign := ""
X    s := ""
X    while i > 0 do {
X        d := i % j
X        if d > 9 then d := digits[d + 1]
X        s := d || s
X        i /:= j
X    }
X    return sign || s
X
Xend
X
X# end IPL material
X
X
Xprocedure get_checksum(r)
X
X    # Calculates the new value of the checksum field for the
X    # current header block. Note that the specification says
X    # that, when calculating this value, the chksum field must
X    # be blank-filled (all eight bytes of it).
X
X    sum := 0
X    r.chksum := "        "
X    every field := !r
X    do every sum +:= ord(!field)
X    return sum
X
Xend
X
X
X
Xprocedure write_report()
X
X    # This procedure writes out a list of filenames which were
X    # remapped (because they exceeded the SysV 14-char limit),
X    # and then notifies the user of the existence of this file.
X
X    local outtext, stbl, i, j, mapfile_name
X
X    # Get a unique name for the map.report (thereby preventing
X    # us from overwriting an older one).
X    mapfile_name := "map.report"; j := 1
X    until not close(open(mapfile_name,"r"))
X    do mapfile_name := (mapfile_name[1:11] || string(j+:=1))
X
X    (outtext := open(mapfile_name,"w")) |
X        (outtext := open(mapfile_name := "/tmp/map.report","w")) |
X            stop("mtf: Can't find a place to put map.report!")
X    stbl := sort(filenametbl,3)
X    every i := 1 to *stbl -1 by 2 do {
X        match(!no_nos,stbl[i]) |
X            write(outtext,left(stbl[i],35," ")," ",stbl[i+1])
X    }
X    write(&errout,"\nmtf: ",mapfile_name," contains the list of changes.")
X    write(&errout," Please save this list!")
X    close(outtext)
X    return &null
X
Xend
SHAR_EOF
true || echo 'restore of mtf.icn failed'
rm -f _shar_wnt_.tmp
fi
# ============= README ==============
if test -f 'README' -a X"$1" != X"-c"; then
    echo 'x - skipping README (File already exists)'
    rm -f _shar_wnt_.tmp
else
> _shar_wnt_.tmp
echo 'x - extracting README (Text)'
sed 's/^X//' << 'SHAR_EOF' > 'README' &&
XNAME: mtf
X
XLANGUAGE: Icon
X
XAUTHOR: Richard Goerwitz (goer@sophist.uchicago.edu)
X
XPURPOSE: Maps 15+ char. filenames in a tar archive to 14 chars.
XHandles both header blocks and the archive itself. Mtf is intended to
Xfacilitate installation of tar'd archives on systems subject to a
X14-character filename limit.
X
XINSTALLATION: Cp Makefile.dist to Makefile and make. If all goes
Xwell, and you have root privileges, edit the Makefile to reflect
Xyour local file structure, and make install.
X
XUSAGE: mtf inputfile [-r reportfile] [-e .extensions] [-x exceptions]
X
X"Inputfile" is a tar archive. "Reportfile" is a file containing a list
Xof files already mapped by mtf in a previous run (used to avoid
Xclashes with filenames in use outside the current archive). The -e
Xswitch precedes a list of filename .extensions which mtf is supposed
SHAR_EOF
true || echo 'restore of README failed'
fi
echo 'End of part 1'
echo 'File README is continued in part 2'
echo 2 > _shar_seq_.tmp
exit 0
exit 0 # Just in case...
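
For anyone who wants to check the header arithmetic in get_checksum()
without an Icon translator at hand, here is a rough Python sketch of the
same computation. It is not part of mtf: the function names, the constant
names, and the choice of Python are assumptions made purely for
illustration. The field offsets follow the layout read by readtarhdr()
(name 100 bytes, 24 bytes of mode/uid/gid, size 12, mtime 12, then the
8-byte chksum field), and the formatting mirrors what
output_mapped_headers_and_texts() does with the result.

```python
# Hypothetical names throughout; only the field layout comes from mtf.icn.
CHKSUM_OFFSET = 148   # 100 (name) + 24 (mode, uid, gid) + 12 (size) + 12 (mtime)
CHKSUM_LENGTH = 8


def tar_header_checksum(header: bytes) -> int:
    """Sum all 512 header bytes, counting the chksum field as eight blanks."""
    if len(header) != 512:
        raise ValueError("a tar header block is exactly 512 bytes")
    end = CHKSUM_OFFSET + CHKSUM_LENGTH
    blanked = header[:CHKSUM_OFFSET] + b" " * CHKSUM_LENGTH + header[end:]
    return sum(blanked)


def format_checksum(value: int) -> bytes:
    """Octal digits, a NUL and a space, right-justified in 8 bytes."""
    return (format(value, "o") + "\x00 ").rjust(CHKSUM_LENGTH).encode("ascii")


if __name__ == "__main__":
    # A toy header: a one-character name and nothing else.
    block = b"a" + b"\x00" * 511
    total = tar_header_checksum(block)
    print(total, format_checksum(total))
```

The point of blanking the field first is that the stored checksum must not
feed into its own computation; any header whose name, linkname, or size
field has been rewritten needs the sum recomputed this way before it is
written back out.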
--
Kent Landfield                   INTERNET: kent@sparky.IMD.Sterling.COM
Sterling Software, IMD           UUCP:     uunet!sparky!kent
Phone:    (402) 291-8300         FAX:      (402) 291-4362
Please send comp.sources.misc-related mail to kent@uunet.uu.net.