Arabic Syntactic Diacritics Restoration Using BERT Models.

Waleed Nazih, Yasser Hifny
Published in: Computational Intelligence and Neuroscience (2022)
The Arabic syntactic diacritics restoration problem is often solved using long short-term memory (LSTM) networks. Handcrafted features are used to augment these LSTM networks or taggers to improve performance. A transformer-based machine learning technique known as bidirectional encoder representations from transformers (BERT) has become the state-of-the-art method for natural language understanding in recent years. In this paper, we present a novel tagger based on BERT models to restore Arabic syntactic diacritics. We formulate syntactic diacritics restoration as a token sequence classification task similar to named-entity recognition (NER). Using the Arabic TreeBank (ATB) corpus, the developed BERT tagger achieves a 1.36% absolute reduction in case-ending error rate (CEER) relative to other systems.
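To make the token classification framing concrete, the following is a minimal sketch of how per-word case-ending tags could be scored. It is not the authors' implementation: the label names, the function `case_ending_error_rate`, and the toy data are hypothetical, and the sketch assumes CEER is the percentage of words whose predicted case-ending tag differs from the reference. The predicted tags stand in for the output of a tagger; no BERT model is involved here.

```python
from typing import List

# Hypothetical label set: one tag per word naming its case-ending diacritic.
LABELS = ["fatha", "damma", "kasra", "sukun", "none"]


def case_ending_error_rate(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Percentage of words whose predicted case-ending tag differs from gold.

    Assumption: every word counts toward the total, including words tagged
    "none". The paper may define the denominator differently.
    """
    errors = 0
    total = 0
    for gold_sent, pred_sent in zip(gold, pred):
        if len(gold_sent) != len(pred_sent):
            raise ValueError("gold and predicted sentences must align word by word")
        for g, p in zip(gold_sent, pred_sent):
            total += 1
            errors += g != p  # True counts as 1
    if total == 0:
        raise ValueError("no words to score")
    return 100.0 * errors / total


# Toy data: two sentences, one tag per word, as in NER-style tagging.
gold = [["damma", "fatha", "kasra"], ["fatha", "none"]]
pred = [["damma", "kasra", "kasra"], ["fatha", "none"]]

ceer = case_ending_error_rate(gold, pred)
print(f"CEER: {ceer:.2f}%")
```

Under this framing, an absolute difference between two systems is simply the difference of their CEER percentages, which is how a figure such as 1.36% absolute would be read.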
Keyphrases
  • machine learning
  • deep learning
  • neural network
  • artificial intelligence