SCARRIE

SCARRIE was an RTD project (LE3-4239) in the Language Engineering sector of the Telematics programme of the European Union. The project began on Dec. 1, 1996 and was concluded on Feb. 28, 1999. The coordinator of the project was WordFinder Software AB (Växjö, Sweden). The other main partners in the project were Universitetet i Bergen, Institutionen för lingvistik at Uppsala Universitet, Center for Sprogteknologi (København) and Svenska Dagbladet (Stockholm).

The aim of the project has been to build proofreading tools for Danish, Norwegian and Swedish. In order to achieve its goals, SCARRIE has researched effective error detection and correction mechanisms for the Scandinavian languages. Resources for these languages have been integrated in the CORRie platform, which was originally developed for Dutch by Cognitech. The prototype proofreading system provides several linguistically motivated error detection and correction mechanisms at both word level and sentence level.

The work for Norwegian was coordinated at the University of Bergen. The chief researcher on the project was Victoria Rosén. The scientific coordination was done by Prof. Koenraad De Smedt.

The Norwegian part of SCARRIE has been aimed at advanced spelling correction in Bokmål. It uses word form dictionaries in combination with special mechanisms for handling multi-word expressions and for recognizing newly seen compounds, proper names and other words not present in the dictionaries. In cooperation with NTNU, a suitable Norwegian word form dictionary has been built. Each word form in this list is tagged with its lemma (basic form), its standard, style or written norm, its morphosyntactic features and, where relevant, a suggested replacement. Predictable misspellings are supplied with recommended corrections. New compounds are detected by an analysis based on rules supplied by the University of Oslo. Words that fall outside the scope of the dictionary and are likely errors are processed by the correction mechanisms, including sound-based similarity. In addition, a robust grammar was developed for detecting and correcting certain classes of errors that cannot be handled at word level, such as agreement errors. Finally, corrections are made so as to fit the written norm in which the document is written (on a scale from conservative to radical Bokmål). The main result is an implemented and tested prototype with enhanced capabilities for advanced error correction. It was tested on a limited test set, and the results compared favourably with state-of-the-art products.
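To make the dictionary-driven checking described above more concrete, here is a minimal Python sketch. It is not SCARRIE's actual implementation. It assumes a tiny hypothetical lexicon in which each word form carries a lemma, an illustrative morphosyntactic tag, the set of Bokmål norms it belongs to and an optional replacement. The norm scale, the entries and the names `WordForm`, `split_compound` and `check` are all invented for illustration. Compound recognition is reduced to splitting a word into two known forms, optionally joined by the linking letters s or e; this simple split stands in for the rule-based analysis from the University of Oslo. Words that remain unknown are only reported, where the real system would pass them on to its correction mechanisms, such as sound-based similarity.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple

# Hypothetical norm scale for Bokmål, from conservative to radical.
CONSERVATIVE, MODERATE, RADICAL = 0, 1, 2
ALL_NORMS = frozenset({CONSERVATIVE, MODERATE, RADICAL})


@dataclass(frozen=True)
class WordForm:
    form: str
    lemma: str                          # basic form
    morph: str                          # illustrative morphosyntactic tag
    norms: FrozenSet[int]               # norms in which the form is acceptable
    replacement: Optional[str] = None   # suggested correction, if any


# Tiny invented lexicon; a real word form dictionary is far larger.
LEXICON = {w.form: w for w in [
    WordForm("boken", "bok", "noun.sg.def", frozenset({CONSERVATIVE, MODERATE})),
    WordForm("boka", "bok", "noun.sg.def", frozenset({MODERATE, RADICAL}), "boken"),
    WordForm("nok", "nok", "adv", ALL_NORMS),
    WordForm("nokk", "nok", "adv", frozenset(), "nok"),  # predictable misspelling
    WordForm("arbeid", "arbeid", "noun", ALL_NORMS),
    WordForm("dag", "dag", "noun", ALL_NORMS),
]}

LINKS = ("", "s", "e")  # assumed linking elements between compound parts


def split_compound(word: str) -> Optional[Tuple[str, str, str]]:
    """Try to read `word` as known form + optional link + known form."""
    for i in range(2, len(word) - 1):  # head and tail at least two letters long
        head, tail = word[:i], word[i:]
        if tail not in LEXICON:
            continue
        for link in LINKS:
            if not head.endswith(link):
                continue
            stem = head[:len(head) - len(link)]  # strip the linking letter, if any
            if stem in LEXICON:
                return stem, link, tail
    return None


def check(word: str, norm: int) -> Tuple[str, Optional[str]]:
    """Return (status, suggestion) for one word in a document written in `norm`."""
    entry = LEXICON.get(word)
    if entry is not None:
        if norm in entry.norms:
            return "ok", None
        # Known form, but a misspelling or outside the document's norm.
        return "flag", entry.replacement
    parts = split_compound(word)
    if parts is not None:
        return "compound", "+".join(p for p in parts if p)
    # The real system would now try its correction mechanisms,
    # e.g. sound-based similarity; this sketch only reports the word.
    return "unknown", None


for w in ["boka", "nokk", "arbeidsdag", "xyzzy"]:
    print(w, check(w, CONSERVATIVE))
```

For example, `check("boka", CONSERVATIVE)` flags the form and proposes `boken`, whereas `check("boka", RADICAL)` accepts it, which mirrors the idea of correcting towards the norm the document is written in. `check("arbeidsdag", MODERATE)` is accepted as the compound `arbeid+s+dag` even though the whole word is not in the lexicon.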

An extensive overview of the project goals, organization, methodology and results can be found in the final project report.

Some examples showing the linguistic functionality for Norwegian are given in the SCARRIE showcase.

An important component of SCARRIE is its lexical database with subnorms for Bokmål.

Publications from the Norwegian part of the project (see also Koenraad De Smedt’s publication list):

There have been several reports in the media about the project, including an article in På Høyden, Nov. 17, 1999: “UiB ledende innen språkteknologi” (pdf).

Some other links related to proofreading: