    
                      National Genealogical Society
                         Computer Interest Group
                      4527 Seventeenth Street North
                        Arlington, VA  22207-2399

               Voice: 703-525-0050
               BBS:   703-528-2612   OPUS 109/650
    
     Sysop: Don Wilson        24 hours a day        300/1200/2400 Baud 
    


                             Help Guide No. 8

                      USING THE SOUNDEX CODING SYSTEM


What it is

A soundex code is a four-character representation based on the way a name 
sounds rather than the way it is spelled. Theoretically, using this system, 
you should be able to index a name so that it can be found no matter how it 
was spelled. The system was developed by Margaret K. Odell and Robert C. 
Russell (see U.S. Patents 1261167 [1918] and 1435663 [1922]). 


Census indexes

The WPA used the soundex coding system in the 1930s to do a partial indexing 
on 3x5 cards of the 1880 (all households with a child age 10 or younger) and 
1900 censuses and a nearly full indexing of the censuses of 1910 (not all 
states completed) and 1920 (not yet released to the public).

The soundex indexes of the 1880, 1900 and 1910 census records are available 
on microfilm at the National Archives (and its branches) and many libraries 
or other archives. These microfilms also can be purchased from the National 
Archives. The names are arranged on the soundex indexes by first letter, then 
numerically within that letter, then alphabetically by the first name of the 
head of household within each different soundex code. There is usually a 
separate card for each individual within the household whose surname is 
different from that of the head of household. 
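
The filing order described above can be sketched as a sort key in Python. 
This is only an illustration: the entries are made up for the example, and 
the names cards and index_order are not taken from any actual index or 
program.

```python
# Made-up index entries: (soundex code, first name of head of household).
cards = [("P520", "William"), ("B525", "Mary"), ("P520", "Adam"), ("B252", "John")]


def index_order(card):
    """Sort key: first letter, then the three numbers, then the first name."""
    code, first_name = card
    return (code[0], int(code[1:]), first_name.lower())


for code, first_name in sorted(cards, key=index_order):
    print(code, first_name)
# B252 John
# B525 Mary
# P520 Adam
# P520 William
```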

Besides telling where the original record can be found, the microfilmed 
soundex cards sometimes give basic information about each person in the 
household, such as place of residence, age, sex, relationship to head of 
household, state born, state where parents were born, etc. However, not all 
of the information contained in the original census records is included. 


Figuring the code

Every soundex code consists of a letter and three numbers, such as B525. The 
letter is always the first letter of the surname. The numbers are assigned 
this way: 

                                SOUNDEX CODING

                  1  =  b,p,f,v      2  =  c,s,k,g,j,q,x,z 
                  3  =  d,t          4  =  l               
                  5  =  m,n          6  =  r               
                  disregard  -  a,e,i,o,u,w,y,h            
                 

To figure out a surname's code, do this:           JOHNSON
   - Eliminate any a,e,i,o,u,w,y,h                 JNSN
   - Write the first letter, as is, followed
     by the codes found in the table above         JNSN = J525
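
The table lookup above can be sketched in a few lines of Python. This is a 
minimal illustration, not the program used by the indexers. The names CODES 
and simple_code are made up for this example, and it covers only the two 
steps shown here: it does not shorten or pad the result, and it does not 
handle repeated letters.

```python
# Letter values from the SOUNDEX CODING table; a,e,i,o,u,w,y,h have no value.
CODES = {}
for digit, letters in (("1", "bpfv"), ("2", "cskgjqxz"), ("3", "dt"),
                       ("4", "l"), ("5", "mn"), ("6", "r")):
    for letter in letters:
        CODES[letter] = digit


def simple_code(surname):
    """First letter as is, then the table value of each remaining coded letter."""
    name = surname.lower()
    digits = [CODES[ch] for ch in name[1:] if ch in CODES]
    return name[0].upper() + "".join(digits)


print(simple_code("Johnson"))  # J525
```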

No matter how long or short the surname is, the soundex code is always the 
first letter of the name followed by three numbers. If you have coded the 
first letter and three numbers but still have more letters in the name, 
ignore them. If you have run out of letters in the name before you have three 
numbers, then add zeroes to the code: 

             WASHINGTON = WSNGTN = W252 (ignore the ending TN)
             KUHNE      = KN     = K500 (add zeroes to the end)
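
The length rule can be written as one short Python function. This is a 
sketch only. The name fit_to_length is made up for the example, and the 
function assumes it is handed the first letter already followed by however 
many numbers the name produced.

```python
def fit_to_length(code):
    """Keep the letter and the first three numbers; add zeroes if short."""
    letter, digits = code[0], code[1:]
    return letter + (digits + "000")[:3]


print(fit_to_length("W25235"))  # W252 (WASHINGTON, extra numbers ignored)
print(fit_to_length("K5"))      # K500 (KUHNE, zeroes added)
```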
            


Prefixes

If you have a surname with a prefix like Van, Von, De, Di, or Le, code it 
with and without the prefix because it may be listed under either code. Van 
Hoesen could be coded as VanHoesen or as Hoesen. Mac and Mc are NOT 
considered prefixes. 
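
A small Python sketch of the "code it both ways" advice follows. The names 
PREFIXES and names_to_code are made up for the example, and the sketch 
assumes the prefix is written as a separate word, as in "Van Hoesen." It 
only produces the spellings to be coded. It does not compute the codes.

```python
# Prefixes named in the text; Mac and Mc are deliberately left out.
PREFIXES = ("van", "von", "de", "di", "le")


def names_to_code(surname):
    """Return the joined full name, plus the name without its prefix if any."""
    parts = surname.split()
    names = ["".join(parts)]
    if len(parts) > 1 and parts[0].lower() in PREFIXES:
        names.append("".join(parts[1:]))
    return names


print(names_to_code("Van Hoesen"))  # ['VanHoesen', 'Hoesen']
print(names_to_code("McDonald"))    # ['McDonald']
```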


Double letters

Any double letters side by side should be treated as one letter. For example, 
LLOYD is coded as if it were spelled LOYD, and GUTIERREZ is coded as if it 
were GUTIEREZ. 


Side by side letters with the same value

You may have different letters side by side that have the same code value. 
For example, in PFISTER the P and F are both 1, and in JACKSON the C, K and S 
are all 2. These letters should be treated as one letter. PFISTER is coded as 
PSTR (P236) and JACKSON is coded as JCN (J250). 
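
Putting the rules together, a complete coding routine might look like the 
Python sketch below. It is an illustration under stated assumptions, not an 
official implementation. The names CODES and soundex are chosen for the 
example. The sketch assumes a non-empty name, and it treats letters as "side 
by side" only when nothing stands between them in the original spelling, 
which is what the examples in this guide require (JACKSON keeps its C because 
the A separates it from the J).

```python
# Letter values from the SOUNDEX CODING table; a,e,i,o,u,w,y,h have no value.
CODES = {letter: digit
         for digit, letters in (("1", "bpfv"), ("2", "cskgjqxz"), ("3", "dt"),
                                ("4", "l"), ("5", "mn"), ("6", "r"))
         for letter in letters}


def soundex(surname):
    """Code a surname by the rules in this guide (a simplified sketch)."""
    letters = [ch for ch in surname.lower() if ch.isalpha()]
    first = letters[0]
    digits = []
    previous = CODES.get(first)    # value of the letter just before, or None
    for ch in letters[1:]:
        value = CODES.get(ch)      # None for the disregarded letters
        if value is not None and value != previous:
            digits.append(value)   # skip doubles and same-value neighbors
        previous = value
    # Keep three numbers, adding zeroes if the name ran out of letters.
    return first.upper() + ("".join(digits) + "000")[:3]


print(soundex("Pfister"))    # P236
print(soundex("Jackson"))    # J250
print(soundex("Lloyd"))      # L300
print(soundex("Gutierrez"))  # G362
```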

Thus, variations in spelling or misspellings should produce the same code 
number: 

                     SMITH = S530        SMITHE = S530 
                     SMYTH = S530        SMYTHE = S530 
                    


Other variations

Note, however, that some names which are pronounced essentially the same 
produce different codes. An example is the "tz" sound in German names, which 
is normally pronounced the same as "ce" or "se." Also, the German "B" is 
often pronounced as the English "P." Thus the German name Bentz could be 
spelled that way or as Benz, Bens, Bents, Bennss, Bense, Bants and Banz, or 
as Penz, Pentz, Pence, Pens, Pense, Pents, Penns, Penze, Pentze, etc. Indeed, 
it has been found in census record indexes under all of these - and more. 
Remember: Those making the index have as hard a time reading the handwriting 
of census takers as we do. They will sometimes mistake a script "z" for a "y" 
and record Penty instead of Pentz, or mistake a "c" for an "e" and record 
Penee, for example. 
 
Therefore, to make sure you don't miss finding your ancestor, you may have to 
look under a half dozen or more different soundex codes if you are searching 
for the name PENCE (soundex code P520):

    BENTZ (and equivalents) = B532       PENTZ (and equivalents) = P532 
    BENZ  (and equivalents) = B520       PENZ  (and equivalents) = P520 
    BENTY (and equivalents) = B530       PENTY (and equivalents) = P530 
                                         PENEE                   = P500 
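
A list of codes like the one above can be built by coding every spelling you 
can think of and grouping the results. The Python sketch below is one way to 
do that. The names soundex and codes_to_search are made up for the example, 
the routine follows the simplified rules in this guide, and it assumes every 
name is non-empty. The spellings are taken from the text.

```python
from collections import defaultdict

CODES = {letter: digit
         for digit, letters in (("1", "bpfv"), ("2", "cskgjqxz"), ("3", "dt"),
                                ("4", "l"), ("5", "mn"), ("6", "r"))
         for letter in letters}


def soundex(surname):
    """Simplified soundex as described in this guide."""
    letters = [ch for ch in surname.lower() if ch.isalpha()]
    digits, previous = [], CODES.get(letters[0])
    for ch in letters[1:]:
        value = CODES.get(ch)
        if value is not None and value != previous:
            digits.append(value)
        previous = value
    return letters[0].upper() + ("".join(digits) + "000")[:3]


def codes_to_search(spellings):
    """Group variant spellings under the soundex code each one produces."""
    groups = defaultdict(list)
    for name in spellings:
        groups[soundex(name)].append(name)
    return dict(groups)


variants = ["Pence", "Penz", "Pentz", "Penty", "Penee", "Bentz", "Benz", "Peirce"]
for code, names in sorted(codes_to_search(variants).items()):
    print(code, ", ".join(names))
# B520 Benz
# B532 Bentz
# P500 Penee
# P520 Pence, Penz
# P530 Penty
# P532 Pentz
# P620 Peirce
```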
   

Think through the possible variant spellings (and misspellings and 
misreadings) of the surname you are searching before concluding that it can't 
be found in the soundex listings. Use your imagination. No mistake is beyond 
possibility! For instance, the name Pence has been indexed as Peirce (the 
reader mistook the written letter "n" for an "i-r" combination) and vice 
versa. 

There are several computer programs available on this and other BBSs that 
will figure the soundex code for any name. Look for them under some variation 
of SOUNDEX, such as SOUNDEXC.BAS.












   This material based on "Beginning Your Genealogical Research in the   
   National Archives," courtesy ROOTS-BBS, CA, Brian Mavrogeorge, sysop. 
   Expanded by Richard A. Pence for the NGS/CIG BBS. See also "Federal   
   Population Censuses,"  1790-1890 (Washington DC, National Archives,   
   1971), page 90.                                                       
  
