I started my research career in speech recognition, and more specifically in statistical language modeling, a topic on which I worked for more than 15 years. I then oriented my research towards machine translation. In both areas, the methods used are based on machine learning. My expertise lies in developing original methods to model natural language for operational systems, that is, systems trained on large vocabularies (more than 65000 lexical units). I have proposed several original ideas: retrieving phrases based on class-phrases, purging impossible events from statistical language models, Cache-features language models, multilingual triggers, and others.
Since my HDR (Habilitation à Diriger des Recherches), I have oriented my research towards machine translation. My aim is to propose an alternative to the baseline methods introduced by IBM.
Another aspect of my work concerns cross-lingual sentiment analysis. Our objective is to compare the sentiment expressed in two texts written in two different languages. This is a challenging research area, for which we have proposed an original automatic method that tags a text with opinion labels by transferring annotations from the domain of movie reviews to other domains (news and talks), and from English to Arabic.
A further topic of my research concerns machine translation for vernacular languages, and more specifically for Arabic dialects. We started with Algerian Arabic dialects, which are considered under-resourced languages: they lack both corpora and Natural Language Processing (NLP) tools, although they are increasingly used in written form, especially on social media and forums. We developed PADIC (Parallel Arabic Dialect Corpus), which contains 6400 parallel sentences in Algerian (2 dialects), Moroccan, Tunisian, Palestinian, Syrian, MSA and French. Several tools have been proposed to process the Algerian dialect (morphological analyser, grapheme-to-phoneme converter, diacritization, etc.).