DictDis

Domain-specific neural machine translation (NMT) systems (e.g., in educational applications)

About Dataset

Domain-specific neural machine translation (NMT) systems (e.g., in educational applications) are socially significant with the potential to help make information accessible to a diverse set of users in multilingual societies. Such NMT systems should be lexically constrained and draw from domain-specific dictionaries. Dictionaries could present multiple candidate translations for a source word/phrase due to the polysemous nature of words. The onus is then on the NMT model to choose the contextually most appropriate candidate. Prior work has largely ignored this problem and focused on the single candidate constraint setting wherein the target word or phrase is replaced by a single constraint. In this work, we present DictDis, a lexically constrained NMT system that disambiguates between multiple candidate translations derived from dictionaries. We achieve this by augmenting training data with multiple dictionary candidates to actively encourage disambiguation during training by implicitly aligning multiple candidate constraints. We demonstrate the utility of DictDis via extensive experiments on English-Hindi, English-German, and English-French datasets across a variety of domains including regulatory, finance, engineering, health and standard benchmark test datasets. In comparison with existing approaches for lexically constrained and unconstrained NMT, we demonstrate superior performance for the copy constraint and disambiguation-related measures on all domains, while also obtaining improved fluency of up to 2-3 BLEU points on some domains. We also release our test set consisting of 4K English-Hindi sentences in multiple domains.
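
To make the idea of augmenting a source sentence with multiple dictionary candidates more concrete, here is a minimal Python sketch. It is an illustration only, not the DictDis implementation: the function name `augment_source`, the separator tokens `<sep>` and `<isep>`, the whitespace-based phrase matching, and the toy English-German dictionary are all assumptions made for this example.

```python
from typing import Dict, List

# Hypothetical marker tokens, chosen for this sketch only.
SEP = "<sep>"    # starts a new group of candidates (one group per matched phrase)
ISEP = "<isep>"  # separates alternative candidates inside one group


def augment_source(sentence: str, dictionary: Dict[str, List[str]]) -> str:
    """Append all dictionary candidates for each phrase that occurs in the sentence.

    Matching is a simple case-insensitive comparison on whitespace tokens;
    a real system would need proper tokenization and normalization.
    """
    tokens = sentence.lower().split()
    parts = [sentence]
    for phrase, candidates in dictionary.items():
        phrase_tokens = phrase.lower().split()
        n = len(phrase_tokens)
        # Slide a window of length n over the sentence tokens.
        found = any(
            tokens[i:i + n] == phrase_tokens
            for i in range(len(tokens) - n + 1)
        )
        if found and candidates:
            parts.append(SEP)
            parts.append(f" {ISEP} ".join(candidates))
    return " ".join(parts)


# Toy dictionary: "bank" is polysemous, so it carries two candidates.
toy_dictionary = {
    "bank": ["Bank", "Ufer"],
    "interest rate": ["Zinssatz"],
    "river": ["Fluss"],
}

augmented = augment_source("The bank raised the interest rate", toy_dictionary)
print(augmented)
```

In this sketch the polysemous entry "bank" contributes both candidates to the augmented input, so a model trained on such inputs would have to pick the contextually appropriate one, while "river" is skipped because it does not occur in the sentence.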

Activity Overview

  • Downloads 6
  • Views 191
  • File Size 35.72 MB

Tags

  • Education
  • Sanskrit
  • MeitY
  • IndiaAI
  • IITB
  • Machine translation
  • IITBombay
  • AIkosha
  • DataforAI
  • IITBImpact
  • BharatGen

License Control

CC0 1.0 Public Domain

bob-pred.hi ( 24.81 KB )




Version Control

Version 1 (35.72 MB)
  • admin · 9 month(s) ago
    • Folder: bobdata
      • bob-pred.hi
      • bob-pred.hi.log
      • bobtest-sheet3.en
      • bobtest-sheet3.en.bpe
      • bobtest-sheet3.en.constraints
      • bobtest-sheet3.en.constraints._bpe
      • bobtest-sheet3.en.constraints.bpe
      • bobtest-sheet3.en.constraints.log
      • bobtest-sheet3.en.constraints.norm
      • bobtest-sheet3.en.leca
      • 5 more