Thursday, March 30, 2006

40% of rural India may be able to come online by 2007

This means more non-English-speaking people will be coming online, with a strong thirst for Indian-language content.

Microsoft, Reliance Infocomm and kiosk agencies such as Drishtee have reportedly chalked out plans to set up PC kiosks in Indian villages.

The government has also planned to set up 100,000 kiosks in the country’s villages by December 2007. If all goes as planned, more than 40 per cent of India’s countryside will be logged on to the Web by the end of 2007, reports the Economic Times.

At present, there are around 10,000 internet kiosks in rural India, most of which are ITC’s e-Choupals.

Meanwhile, Intel has announced its Jagruti initiative in a bid to support the spread of rural internet kiosks based on the new Intel-Powered Community PCs. These PCs would be available through Intel partners HCL and Wipro.

Tuesday, March 28, 2006

About Hindi language from Wikipedia

Hindi (हिन्दी hindī), an Indo-European language spoken mainly in North, Central, and West India, is one of the national languages of India. It is part of a dialect continuum of the Indo-Aryan family, bounded on the northwest and west by Panjābī, Sindhī, and Gujarātī; on the south by Marāthī; on the southeast by Orīyā; on the east by Bengālī; and on the north by Nepālī. Reflecting the popularity of Hindi, the BBC World Service started broadcasting news in Hindi in 1940.

There are 480 million native Hindi speakers in the world today.

Hindi also refers to a standardized register of Hindustani that was made one of the official languages of India. The grammatical description in this article concerns this standard Hindi.

Hindi is often contrasted with Urdū, another standardized form of Hindustani that is the official language of Pakistan and some states in India. The primary differences between the two are that Standard Hindi is written in Devanāgarī and has supplemented some of its Persian and Arabic vocabulary with words from Sanskrit, while Urdu is written in Nastaliq script, a variant of the Perso-Arabic script, and draws heavily on Persian and Arabic vocabulary. The term "Urdu" also includes dialects of Hindustani other than the standardized languages. Apart from these differences, linguists consider Hindi and Urdu to be the same language.

Building castles in the air ...

"If you have built castles in the air, your work need not be lost; that is where they should be. Now put the foundations under them."
- Henry David Thoreau, Walden

Sunday, March 26, 2006

Scaling the Language Barrier

"In the annals of computer comedy, one of the most famous anecdotes is about asking a speech recognition engine, 'Recognize speech?' The translation comes back: 'Wreck a nice beach.'"

Translation as decoding

"One naturally wonders if the problem of translation could conceivably be treated as a problem in cryptography. When I look at an article in Russian, I say: 'This is really written in English, but it has been coded in some strange symbols. I will now proceed to decode.'"

Warren Weaver, March 1947

Google dominates in machine translation tests

Search giant Google's ambitions to make the Web more international have gotten a slight boost from a U.S. government-run test in which its translation software beat out technology from IBM and academia.

[...]

Google's machine translation wasn't perfect, but it was well ahead of the competition. On a scale from zero to one, the company's software scored 0.5137 on the Arabic tests and 0.3531 on the Chinese tests. The University of Southern California's Information Sciences Institute came in second with a 0.4657 on Arabic tests and 0.3073 on Chinese. IBM scored 0.4646 on Arabic and 0.2571 on Chinese.

[...]

Read the complete news article on CNET.

What is Machine Translation?

Okay, this was supposed to be the first post on this blog. Here it goes: the definition of machine translation.

Machine translation (MT) is the application of computers to the task of translating texts from one natural language to another. One of the very earliest pursuits in computer science, MT has proved to be an elusive goal, but today a number of systems are available which produce output which, if not perfect, is of sufficient quality to be useful in a number of specific domains.

Steve Silberman on Natural Language Processing

A renewed international effort is gearing up to design computers and software that smash language barriers and create a borderless global marketplace.


- Steve Silberman

Koko Conversation


You might be wondering: is this a blog about machine-aided translation, or is it about monkeys? There is a lot to learn from our conversations with Koko (a gorilla).

I found it really insightful. Check it out on the PBS website.

Meaning of Langulin

Langulin is a Sanskrit word meaning "long tail". We chose this name because we are helping the long tail of media reach new readers.

Langulin is also the origin of the word "langur". The langur is a monkey found in India (mostly around the Himalayas), and langurs have a really long tail.

Langulin's Mission

To bring medium- and small-sized English magazines to India in Indian languages, using the best available technology for machine-aided translation.


Why medium and small size magazines?
  • These magazines do not have the resources to launch editions in foreign languages and enter new markets.
  • These magazines are growing fast in terms of readership.
Initially focusing on magazines covering the following subjects
  • Women
  • Parenting
  • Health & Wellness
  • Home and Garden
  • Travel
What is in it for the magazines?
  • A new readership.
  • A new revenue stream.
What is in it for the readers?
  • Access to an ocean of knowledge about the subjects they are interested in.
Long Term Vision

As the world gets smaller and political and social boundaries break down, there is a growing need to disseminate information from one country to another. Huge gaps still exist in making information accessible to everybody, irrespective of the language they use.

Our aim is to bridge that gap and make all the interesting information from around the world accessible to Indians who do not speak English.