
Brian Nolan: Recent work on implementing RRG via an Arabic-to-English

This talk presents recent research and associated development work on a machine translation system called UniArab, a proof-of-concept system supporting the fundamental aspects of Arabic, such as parts of speech, agreement and tense. UniArab is based on the linking algorithm of Role and Reference Grammar (RRG), which maps from syntax to semantics and vice versa. UniArab takes Modern Standard Arabic (MSA) as input in the native orthography, parses the sentence(s) into a logical meta-representation based on the fully expanded RRG logical structures and, using this, generates grammatical English output with full agreement and morphological resolution. RRG is a functional theory of grammar that posits a direct mapping between the semantic representation of a sentence and its syntactic representation. The theory allows a sentence in a specific language to be described in terms of its logical structure and grammatical procedures, and it accounts for how semantic representations are mapped into syntactic representations. We claim that RRG is very suitable for machine translation of Arabic, and that RRG can be implemented in software as the rule-based kernel of an interlingua bridge MT engine. The version of Arabic we consider is MSA. When we mention Arabic, therefore, we mean MSA, which is distinct from Classical Arabic.


UniArab implements elements of RRG theory in software using the Java programming language and XML. In order to analyse Arabic by computer, we first extract the lexical properties of the Arabic words. From the parse, the software then creates a computer-based representation of the logical structure of the Arabic sentence(s). We use RRG theory to motivate the computational architecture of the lexicon, and we implement the RRG bidirectional linking system to build the parse and generate functions across the syntax-semantics interface. Through seven input phases, including morphological and syntactic unpacking, UniArab extracts the logical structure of an Arabic sentence. Using the XML-based metadata representing the RRG logical structure, UniArab then generates an equivalent grammatical sentence in the target language through four output phases. We also discuss the technologies used to support its development and the user interface that allows lexical items to be added directly to the lexicon in real time. The UniArab system has been tested and evaluated by generating equivalent grammatical sentences in English, via the logical structure of Arabic sentences, from MSA input, with accurate results. At present we are working to greatly extend the coverage by adding more verbs to the lexicon. This research also demonstrates that RRG is a viable linguistic model for building accurate rule-based machine translation software.
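
To make the generation side concrete, the following is a minimal Java sketch of how an RRG-style logical structure might be held in memory, serialized to XML, and linked to an English clause with subject-verb agreement. It is an illustration only and not UniArab's code: the class names (LogicalStructureSketch, Arg, VerbEntry, LogicalStructure), the XML element and attribute names, and the tense labels PRES and PAST are all assumptions made for this example. It also assumes regular noun plurals in -s and performs no XML escaping.

```java
/** Hypothetical sketch; these are not UniArab's actual classes or XML schema. */
final class LogicalStructureSketch {

    /** An argument with the features needed for English number agreement (assumed design). */
    static final class Arg {
        final String lemma;
        final boolean plural;
        Arg(String lemma, boolean plural) {
            this.lemma = lemma;
            this.plural = plural;
        }
    }

    /** A toy lexicon entry holding the English verb forms needed here (assumed design). */
    static final class VerbEntry {
        final String base, thirdSingular, past;
        VerbEntry(String base, String thirdSingular, String past) {
            this.base = base;
            this.thirdSingular = thirdSingular;
            this.past = past;
        }
    }

    /** A simplified logical structure: one predicate, a tense operator, actor and undergoer. */
    static final class LogicalStructure {
        final String predicate;
        final String tense; // "PRES" or "PAST" (labels assumed for this sketch)
        final Arg actor;
        final Arg undergoer;
        LogicalStructure(String predicate, String tense, Arg actor, Arg undergoer) {
            this.predicate = predicate;
            this.tense = tense;
            this.actor = actor;
            this.undergoer = undergoer;
        }
    }

    /** Serializes the structure to XML; element names are invented, values are not escaped. */
    static String toXml(LogicalStructure ls) {
        return "<logicalStructure predicate=\"" + ls.predicate + "\" tense=\"" + ls.tense + "\">"
                + argXml("actor", ls.actor)
                + argXml("undergoer", ls.undergoer)
                + "</logicalStructure>";
    }

    private static String argXml(String role, Arg a) {
        return "<" + role + " lemma=\"" + a.lemma + "\" number=\"" + (a.plural ? "pl" : "sg") + "\"/>";
    }

    /** Links semantics to syntax: actor becomes subject, undergoer becomes object. */
    static String toEnglish(LogicalStructure ls, VerbEntry verb) {
        String verbForm;
        if (ls.tense.equals("PAST")) {
            verbForm = verb.past;
        } else {
            // Present tense: the verb agrees in number with the actor.
            verbForm = ls.actor.plural ? verb.base : verb.thirdSingular;
        }
        return nounPhrase(ls.actor) + " " + verbForm + " " + nounPhrase(ls.undergoer);
    }

    /** Builds a definite noun phrase; assumes a regular plural in -s. */
    private static String nounPhrase(Arg a) {
        return "the " + (a.plural ? a.lemma + "s" : a.lemma);
    }

    public static void main(String[] args) {
        VerbEntry read = new VerbEntry("read", "reads", "read");
        LogicalStructure ls = new LogicalStructure(
                "read", "PRES", new Arg("boy", false), new Arg("book", false));
        System.out.println(toXml(ls));
        System.out.println(toEnglish(ls, read)); // prints "the boy reads the book"
    }
}
```

Keeping agreement features on the arguments, rather than on the surface string, is what lets the output stage choose "reads" or "read" from the same logical structure. A full system would need a far richer lexicon and operator set than this sketch assumes.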