
5. Recent research

The field of machine translation has seen major changes in the last few years. A large amount of current research focuses on statistical machine translation and example-based machine translation. Only a few companies currently use statistical machine translation commercially, e.g. Language Weaver (which sells translation products and services), Google (which uses its proprietary statistical MT system for some language combinations in Google's language tools) and Microsoft (which uses its proprietary statistical MT system to translate knowledge base articles). There has also been renewed interest in hybridisation, with researchers incorporating syntactic and morphological (i.e., linguistic) knowledge into statistical systems, as well as combining statistics with existing rule-based systems.

References:

1. Hutchins, J. (2005). "The history of machine translation in a nutshell".

2. Melby, Alan K. (1995). The Possibility of Language. Amsterdam: J. Benjamins. pp. 27–41.

3. Van Slype, G. (1983). "Better Translation for Better Communications". Paris: Pergamon Press.

Lecture 2. Electronic dictionaries

An electronic dictionary is a dictionary whose data exists in digital form and can be accessed through a number of different media. Electronic dictionaries can be found in several forms, including:

  • as dedicated handheld devices

  • as apps on smartphones and tablet computers, or as computer software

  • as a function built into an e-reader

  • as CD-ROMs and DVD-ROMs, typically packaged with a printed dictionary, to be installed on the user’s own computer

  • as free or paid-for online products

1. Overview

Most types of dictionary are available in electronic form. These include general-purpose monolingual and bilingual dictionaries, historical dictionaries such as the Oxford English Dictionary, monolingual learner's dictionaries, and specialized dictionaries of every type, such as medical or legal dictionaries, thesauruses, travel dictionaries, dictionaries of idioms, and pronunciation guides. A thesaurus is either a book of synonyms, often including related and contrasting words and antonyms, or a book of selected words or concepts, such as the specialized vocabulary of a particular field like medicine or music.

Most of the early electronic dictionaries were, in effect, print dictionaries made available in digital form: the content was identical, but the electronic editions provided users with more powerful search functions. Soon, however, the opportunities offered by digital media began to be exploited. Two obvious advantages are that limitations of space (and the need to optimize its use) become less pressing, so additional content can be provided, and that multimedia content, such as audio pronunciations and video clips, can be included.
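The "more powerful search functions" mentioned above can be illustrated with a small sketch. The headword list and the function names `prefix_search` and `wildcard_search` are hypothetical and not taken from any particular dictionary product; the sketch only shows two kinds of lookup that a printed dictionary cannot offer directly.

```python
import bisect
import fnmatch

# Hypothetical, tiny headword list kept in sorted order.
HEADWORDS = sorted(
    ["translate", "translation", "translator", "transit", "tread", "dictionary"]
)


def prefix_search(prefix):
    """Return all headwords that start with the given prefix."""
    # Binary search finds the first position where the prefix could occur.
    start = bisect.bisect_left(HEADWORDS, prefix)
    matches = []
    for word in HEADWORDS[start:]:
        if not word.startswith(prefix):
            break  # the list is sorted, so no later word can match
        matches.append(word)
    return matches


def wildcard_search(pattern):
    """Return headwords matching a pattern where * is any run and ? is one character."""
    return fnmatch.filter(HEADWORDS, pattern)
```

For instance, `prefix_search("transl")` collects every headword beginning with "transl", while `wildcard_search("tr?ad")` helps a user who is unsure of one letter in a word.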

Electronic dictionary databases, especially those included with software dictionaries, are often extensive and can contain up to 500,000 headwords and definitions, as well as verb conjugation tables and a grammar reference section. Bilingual electronic dictionaries and monolingual dictionaries of inflected languages often include an interactive verb conjugator and are capable of word stemming, that is, reducing an inflected form to its base form so that the right headword can be found.
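As a rough illustration of how word stemming helps with lookup, the sketch below tries the word as typed and then a few suffix-stripped variants. The headword entries and suffix rules are invented for this example; real dictionaries rely on far more complete morphological data for each language.

```python
# Toy headword list; a real dictionary database would hold far more entries.
HEADWORDS = {
    "walk": "to move on foot",
    "study": "to apply oneself to learning",
    "dictionary": "a reference work listing words",
}

# Hypothetical English suffix rules: (suffix to strip, replacement), tried in order.
SUFFIX_RULES = [
    ("ies", "y"),  # studies -> study
    ("ing", ""),   # walking -> walk
    ("ed", ""),    # walked -> walk
    ("es", ""),
    ("s", ""),     # walks -> walk
]


def candidate_stems(word):
    """Yield the lowercased word, then each form produced by a matching suffix rule."""
    word = word.lower()
    yield word
    for suffix, replacement in SUFFIX_RULES:
        # Require at least two characters to remain before the suffix.
        if word.endswith(suffix) and len(word) > len(suffix) + 1:
            yield word[: -len(suffix)] + replacement


def lookup(word):
    """Return (headword, definition) for the first candidate found, else None."""
    for stem in candidate_stems(word):
        if stem in HEADWORDS:
            return stem, HEADWORDS[stem]
    return None
```

With these rules, a query such as `lookup("studies")` resolves to the headword "study" even though "studies" itself is not an entry.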

Publishers and developers of electronic dictionaries may offer native content from their own lexicographers, licensed data from print publications, or both. For example, Babylon offers premium content from Merriam-Webster; Ultralingua offers additional premium content from Collins, Masson, and Simon & Schuster; and Paragon Software offers original content from Duden, Britannica, Harrap, Merriam-Webster and Oxford.

Writing systems

In addition to Latin script, electronic dictionaries are available in many other writing systems, including logographic and right-to-left scripts. Supported scripts and languages include Arabic, Persian, Chinese, Devanagari (the script used for Sanskrit, Hindi, and other Indian languages), Greek, Hebrew, Japanese, Korean, Cyrillic, and Thai.
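Software that displays entries in several scripts needs to know which way the text runs. The sketch below uses Python's standard `unicodedata` module and a simplified "first strong character" heuristic; the function name `is_right_to_left` is an assumption for illustration, and a full implementation would follow the Unicode Bidirectional Algorithm rather than this shortcut.

```python
import unicodedata


def is_right_to_left(text):
    """Guess text direction from the first strongly directional character."""
    for ch in text:
        direction = unicodedata.bidirectional(ch)
        if direction in ("R", "AL"):  # Hebrew-type or Arabic-type letter
            return True
        if direction == "L":  # left-to-right letter (Latin, Cyrillic, Han, ...)
            return False
    # No strong character found (e.g. only digits or punctuation).
    return False
```

Under this heuristic, Hebrew and Arabic headwords are reported as right-to-left, while Latin, Cyrillic, and Chinese headwords are not.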