From hyphen@ibmpcug.co.uk Sat Apr 16 08:23:04 1994
Received: from oxmail by black.ox.ac.uk; Sat, 16 Apr 1994 08:23:03 +0100
Received: from kate.ibmpcug.co.uk by oxmail.ox.ac.uk with SMTP (PP) 
          id <17258-0@oxmail.ox.ac.uk>; Sat, 16 Apr 1994 08:22:52 +0100
To: pcl@ox.ac.uk
Subject: croatian readme
Organization: The PC User Group, UK
Cc: 
Date: Sat, 16 Apr 94 8:22:43 BST
From: D Fawthrop <hyphen@ibmpcug.co.uk>
Sender: hyphen@ibmpcug.co.uk
Message-Id: <9404160822.aa25892@kate.ibmpcug.co.uk>
Status: RO


This is a list of 28,000 Croatian words collected in early 1994 from
Hrvatski-Vjesnik, a listserver for Croatian news in America.  We believe that
the quality of the text used was good, and that spelling errors are therefore
rare.  One obvious problem, however, is that these words are in "Computer
Croatian", without accents or the special characters.

Each word is followed by its frequency of occurrence in the text examined, in a
pale imitation of the Brown corpus.  The list was produced using two utilities,
"one word" and "uniq_num", which I have placed in the Public Domain and which
are to be found in the directory "utilities".
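
For readers without the utilities to hand, the following is a minimal sketch of
the same kind of processing: split text into one word per item, then count how
often each distinct word occurs.  It is not the code of "one word" or
"uniq_num"; the function names, the a-z word pattern and the "word count"
output layout are all assumptions made for illustration.

```python
import re
from collections import Counter


def one_word_per_line(text):
    """Split text into lowercase words (assumed: plain a-z, no accents)."""
    return re.findall(r"[a-z]+", text.lower())


def count_words(words):
    """Count how many times each distinct word occurs."""
    return Counter(words)


def format_list(counts):
    """Return 'word count' lines in alphabetical order (assumed layout)."""
    return [f"{word} {counts[word]}" for word in sorted(counts)]


if __name__ == "__main__":
    # Made-up sample sentence, not taken from Hrvatski-Vjesnik.
    sample = "Hrvatska je lijepa. Hrvatska je mala zemlja."
    for line in format_list(count_words(one_word_per_line(sample))):
        print(line)
```

Because the pattern only accepts the letters a to z, it suits "Computer
Croatian" text; accented input would need a wider pattern.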

With the break-up of the former Yugoslavia there is a linguistic war going on in
parallel with the shooting war.  The language is therefore diverging rapidly
from Serbo-Croat.  We have done some rough and ready tests against a list of
Serbian words and found significant differences between the two.
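
How those tests were done is not recorded here.  One simple way to compare two
word lists is sketched below, on the assumption that each line begins with a
word, optionally followed by a count.  The function names and the three sample
words per language are invented for illustration and are not taken from either
list.

```python
def load_words(lines):
    """Collect the first field of each non-empty line as a lowercase word."""
    words = set()
    for line in lines:
        parts = line.split()
        if parts:
            words.add(parts[0].lower())
    return words


def overlap_report(croatian, serbian):
    """Summarise which Croatian words also appear in the Serbian list."""
    shared = croatian & serbian
    only_croatian = croatian - serbian
    fraction = len(shared) / len(croatian) if croatian else 0.0
    return {
        "shared": sorted(shared),
        "only_croatian": sorted(only_croatian),
        "shared_fraction": fraction,
    }


if __name__ == "__main__":
    # Toy input only; the counts are made up.
    croatian = load_words(["kruh 12", "tisuca 5", "voda 9"])
    serbian = load_words(["hleb", "hiljada", "voda"])
    print(overlap_report(croatian, serbian))
```

A comparison of this kind only looks at spelling, so it says nothing about
words that are shared but used with different frequency or meaning.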

These are the results of a project which only required some tens of thousands
of words, and which is now complete.  Should anyone want to continue the
project, good luck, get on with it!  Hrvatski-Vjesnik contains text from many
authors and on many subjects, so there are many more words available.

Dave Fawthrop <hyphen@ibmpcug.co.uk> Hyphen House, 8 Cooper Grove,
Shelf, Halifax, HX3 7RF, England. Phone/Fax/Answer : +44 274 691092
-   God loved the World so much that he gave his only Son, so that    -
- anyone who believes in him shall not perish, but have eternal life. -


