People are going to access the Internet in their mother tongue in greater numbers as it becomes more and more pervasive. In India, many languages are the mother tongues of our people, and in recent centuries English has emerged as the lingua franca, that is, a language used to communicate between persons who “do not share a mother tongue, in particular when it is a third language, distinct from both persons’ mother tongues”.
Since English is also the lingua franca of the Internet, our significant base of English-speaking individuals has helped us make major inroads in the world of information technology, where English rules. Here we have a distinct advantage over China, though it is now being steadily eroded because of the emphasis that China is placing on teaching the English language to its students.
In Shakespeare’s Romeo and Juliet, Juliet says: “What’s in a name? That which we call a rose by any other name would smell as sweet.”
What about a rose in any other language? It not only smells the same, it is also a major literary device used by poets in other languages with as much élan.
What would Urdu poetry be without references to gulab? But how would we read Urdu poetry if we don’t know the language? In English translation? Much would be lost. What if we could read Urdu in the Hindi script? We would be nearer the original in culture and context. Gulab would still be gulab, but it would be written in a script that many would not be able to read.
Now, some computer scientists have been working on helping people read and understand information that was originally written in a language that is neither their mother tongue nor English.
The Advanced Centre for Technical Development of Punjabi, Punjabi University Patiala, has recently released Urdu-to-Devanagari script conversion software. It also does the reverse, i.e. from Devanagari to Urdu…and it works on websites.
Dr Gurpreet Singh Lehal, director and chief coordinator of the project, demonstrated the software here in my office and indeed, the results were impressive. We saw how Urdu newspapers from Pakistan, such as the Daily Jung, Nawai Waqt and Afsana, were rendered in Hindi. He also converted the Dainik Tribune website into Urdu.
One can also write an email in Urdu and it will be delivered in Hindi at the other end; similarly, an email sent in Hindi can be read in Urdu.
Dr Lehal said that his programme had been funded by The Information Society Innovation Fund (ISIF), which emphasises applying Internet technology for the benefit of Asia-Pacific users and communities. The project was awarded to Punjabi University in 2009 after a competition in which 148 competitors from 22 countries participated.
In a creditable 18 months, the team comprising Dr Lehal, Dr Virinder Singh Kalra from Manchester University UK and Tejinder Singh Saini from Punjabi University completed the project, which is now freely available on the Centre’s website (http://uh.learnpunjabi.org). We must remember that there are differences in the way the Devanagari and Arabic scripts render sounds, and thus this is not a case of simple character-by-character transliteration, which can introduce various howlers. Dr Lehal pointed out that the main challenges had been restoring the missing diacritical marks in Urdu text and resolving the lexical ambiguities in these languages, both at the level of characters and words. Dealing with split/merged words in Urdu script and the issue of multiple/zero equivalence of characters in the two scripts also proved challenging.
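To see why missing diacritics and multiple equivalence matter, consider a minimal Python sketch. The tiny character tables below are hypothetical illustrations, not the Centre's actual tables or method, which the article does not describe. The sketch transliterates gulab letter by letter, with and without the short-vowel mark (pesh) that everyday Urdu writing usually omits, and then lists three Urdu letters that all correspond to the single Devanagari letter स.

```python
# Hypothetical, deliberately tiny mapping for illustration only.
NAIVE_MAP = {
    "\u06AF": "\u0917",  # Urdu gaf  -> Devanagari ga
    "\u0644": "\u0932",  # Urdu lam  -> Devanagari la
    "\u0627": "\u093E",  # Urdu alif -> long-a vowel sign (a simplification)
    "\u0628": "\u092C",  # Urdu be   -> Devanagari ba
    "\u064F": "\u0941",  # pesh (short-u diacritic) -> short-u vowel sign
}

def naive_transliterate(urdu: str) -> str:
    """Map each character independently; unknown characters pass through."""
    return "".join(NAIVE_MAP.get(ch, ch) for ch in urdu)

# "gulab" as it is usually written (no diacritic) and with the pesh restored.
undiacritised = "\u06AF\u0644\u0627\u0628"
diacritised = "\u06AF\u064F\u0644\u0627\u0628"

print(naive_transliterate(undiacritised))  # गलाब  (short u is lost: "galab")
print(naive_transliterate(diacritised))    # गुलाब (correct: "gulab")

# Multiple equivalence: three Urdu letters share one Devanagari letter,
# so going from Devanagari back to Urdu needs word-level knowledge.
URDU_S_LETTERS = {
    "\u0633": "\u0938",  # seen -> sa
    "\u0635": "\u0938",  # suad -> sa
    "\u062B": "\u0938",  # se   -> sa
}
candidates = sorted(u for u, d in URDU_S_LETTERS.items() if d == "\u0938")
print(len(candidates))  # 3
```

The first output reads "galab" rather than "gulab", which is exactly the kind of howler a plain lookup table produces; a usable system has to restore the missing vowel from context or a dictionary.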
Dr Lehal claims that the current system has been tested on more than 200 documents and that the word-level transliteration accuracy has been found to be 98.03 per cent and 99.15 per cent for the Urdu-Hindi and Hindi-Urdu transliteration systems, respectively. That would make it a hot contender for the best system in terms of transliteration accuracy.
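For readers wondering what a word-level accuracy figure means, here is one common way to compute it, sketched in Python. It assumes the system output and a hand-checked reference are already split into the same number of words; the function name and the romanised sample words are made up for illustration, and the article does not say how the team actually measured its figures.

```python
def word_accuracy(system_words: list[str], reference_words: list[str]) -> float:
    """Percentage of positions where the system word equals the reference word."""
    if len(system_words) != len(reference_words):
        raise ValueError("word lists must be aligned and of equal length")
    if not reference_words:
        raise ValueError("reference must not be empty")
    correct = sum(s == r for s, r in zip(system_words, reference_words))
    return 100.0 * correct / len(reference_words)

# Toy, romanised example: one of the five words is wrong.
reference = ["gulab", "ka", "phool", "bahut", "sundar"]
system = ["galab", "ka", "phool", "bahut", "sundar"]

print(word_accuracy(system, reference))  # 80.0
```

On this measure, a score of 98.03 per cent means that roughly two words in every hundred come out wrong.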
It is interesting that a university dedicated to Punjabi has become a bridge between two other languages, Urdu and Hindi. I am sure that this software developed by the university will provide a bridge between people who have a natural cultural affinity, but are divided by ignorance of each other’s script.
The article is the latest in the column Bits about Bytes, published in the Lifestyle section of The Tribune on January 4, 2020.