USING A LANGUAGE INDEPENDENT DOMAIN MODEL FOR MULTILINGUAL INFORMATION EXTRACTION
Authors:
Saliha Azzam; Kevin Humphreys; Robert Gaizauskas; Yorick Wilks
DOI:
10.1080/088395199117252
Publication Frequency:
10 issues per year
Subjects:
Artificial Intelligence; Computer Science (General); Information & Communication Technology (ICT)
Formats available:
PDF (English)
Abstract
The volume of electronic text in different languages, particularly on the World Wide Web, is growing significantly, and users who read only a few languages increasingly struggle to obtain information from it. This article investigates some of the issues involved in achieving multilingual information extraction (IE), describes the approach adopted in the M-LaSIE-II IE system to address them, and presents the results of evaluating that approach against a small parallel corpus of English/French newswire texts. The approach assumes that a language-independent representation of the concepts relevant to a domain can be constructed, at least for the small, well-defined domains typical of IE tasks, so that multilingual IE can be carried out successfully without requiring full machine translation.