4. Applications with Devanagari

4.1 Networking

telnet

In some installations, telnet is not 8-bit clean by default. To send Unicode keystrokes to the remote host, you need to put telnet into "outbinary" mode. There are two ways to do this:

$ telnet -L <host>

and

$ telnet
telnet> set outbinary
telnet> open <host>
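
To see why an 8-bit clean path matters, the following Python sketch imitates a channel that clears the top bit of every byte. The function strip_high_bit is a hypothetical stand-in for such a link and is not part of telnet; it only illustrates what happens to UTF-8 keystrokes when "outbinary" mode is not active.

```python
# Sketch: what a 7-bit channel does to UTF-8 keystrokes.
# strip_high_bit() is a hypothetical stand-in for a link that is not 8-bit clean.

def strip_high_bit(data: bytes) -> bytes:
    """Clear the top bit of every byte, as a 7-bit-only channel would."""
    return bytes(b & 0x7F for b in data)

text = "नमस्ते"                    # Devanagari sample text
sent = text.encode("utf-8")        # every byte of this sample has the high bit set
received = strip_high_bit(sent)

print(sent != received)                       # the bytes were altered in transit
print(strip_high_bit(b"ls -l") == b"ls -l")   # pure ASCII input is unaffected
```

Since every byte of a UTF-8 encoded Devanagari character is 0x80 or above, none of them would survive such a channel unchanged.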

Run telnet from within ncst-term.

kermit

The communications program C-Kermit ( http://www.columbia.edu/kermit/ckermit.html ) is an interactive tool for connection setup, telnet and file transfer, with support for TCP/IP and serial lines. In versions 7.0 or newer it understands the file and transfer encodings UTF-8 and UCS-2, understands the terminal encoding UTF-8, and converts between these encodings and many others. Documentation of these features can be found at http://www.columbia.edu/kermit/ckermit2.html#x6.6 .

4.2 Browsers

Netscape Navigator

Netscape 6.01 or later can display HTML documents in UTF-8 encoding. All a document needs is the following line between the <head> and </head> tags:


<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

Netscape 6.01 or newer can also display HTML and text files in UCS-2 encoding with byte-order mark.
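
A small test page for these two cases can be generated with a few lines of Python. This is only a sketch: the helper make_test_page and the page layout are assumptions made for illustration. Python's "utf-16" codec is used for the second case; for characters in the Basic Multilingual Plane, such as Devanagari, its output is identical to UCS-2 with a byte-order mark.

```python
# Sketch: build a UTF-8 test page that carries the charset declaration,
# and encode the same text with a byte-order mark for the UCS-2 case.
# make_test_page() is a hypothetical helper, not part of any browser tooling.

META = '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">'

def make_test_page(body_text: str) -> bytes:
    """Return a minimal HTML document, encoded as UTF-8."""
    html = (
        "<html>\n<head>\n" + META + "\n<title>UTF-8 test</title>\n</head>\n"
        "<body>\n<p>" + body_text + "</p>\n</body>\n</html>\n"
    )
    return html.encode("utf-8")

sample = "हिन्दी"
page = make_test_page(sample)

# The "utf-16" codec prepends the byte-order mark in the machine's byte order.
utf16_with_bom = sample.encode("utf-16")

print(len(page), utf16_with_bom[:2].hex())
```

Writing the bytes in page to a file and opening that file in the browser is a quick way to check the font and encoding settings.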

To set up Netscape so that it displays Hindi characters, follow these steps:


1. Go to Edit -> Preferences
2. Select the category Appearance -> Fonts
3. Select the language encoding "Unicode"
4. Set the variable-width and fixed-width fonts to "raghu"
5. Check the button "Always use my font settings, overriding web page font"

Also do the following to set the character coding scheme to UTF-8:


1. Go to View -> Character Coding
2. Select "Unicode (UTF-8)" from the list

Download Netscape-6 from http://www.netscape.com/computing/download/ or from the IndiX download page.

Mozilla

Mozilla milestone M16 has much better internationalization than Netscape 4. It can display HTML documents in UTF-8 encoding with support for more languages.

http://www.mozilla.org/

lynx

lynx-2.8 has an options screen (key 'O') which lets you set the display character set. When running in an ncst-term or Linux console in UTF-8 mode, set this to "UNICODE UTF-8". Note that for this setting to take effect in the current browser session, you have to confirm on the "Accept Changes" field, and for this setting to take effect in future browser sessions, you have to enable the "Save options to disk" field and then confirm it on the "Accept Changes" field.

Now, again, all a document needs is the following line between the <head> and </head> tags:

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

When you are viewing text files in UTF-8 encoding, you also need to pass the command-line option "-assume_local_charset=UTF-8" (affects only file:/... URLs) or "-assume_charset=UTF-8" (affects all URLs). In lynx-2.8.2 you can alternatively, in the options screen (key 'O'), change the assumed document character set to "utf-8".

There is also an option in the options screen, to set the "preferred document character set". But it has no effect, at least with file:/... URLs and with http://... URLs served by apache-1.3.0.

There is a spacing and line-breaking problem, however. (Look at the Russian section of x-utf8.html, or at utf-8-demo.txt.)

Also, in lynx-2.8.2, configured with --enable-prettysrc, the nice colour scheme does not work correctly any more when the display character set has been set to "UNICODE UTF-8". This is fixed by a simple patch lynx282.diff.

The Lynx developers say: "For any serious use of UTF-8 screen output with lynx, compiling with slang lib and -DSLANG_MBCS_HACK is still recommended."

Latest stable release: ftp://ftp.gnu.org/pub/gnu/lynx/lynx-2.8.2.tar.gz

http://lynx.isc.org/

General home page: http://lynx.browser.org/

http://www.slcc.edu/lynx/

Newer development snapshots: http://lynx.isc.org/current/ , ftp://lynx.isc.org/current/

Konqueror

Konqueror has good support for Unicode.

To set up Konqueror so that it displays Hindi characters, follow these steps:



1. Go to Settings -> Configure Konqueror
2. Select "Konqueror Browser" from the left pane
3. Go to the "Appearance" tab in the right pane
4. Select the charset "iso10646-1"
5. Set all fonts to "raghu" for this encoding, and also set the default encoding to "utf8"

Test pages

What is Unicode? in Hindi

Some test pages for browsers can be found at the pages of Alan Wood http://www.hclrss.demon.co.uk/unicode/#links and James Kass http://home.att.net/~jameskass/.

4.3 Editors

yudit

yudit by Gáspár Sinai ( http://czyborra.com/yudit/ ) is a first-class Unicode text editor for the X Window System. It supports simultaneous processing of many languages, input methods, and conversions for local character standards. It has facilities for entering text in all languages with only an English keyboard, using keyboard configuration maps.

It can be compiled in three versions: Xlib GUI, KDE GUI, or Motif GUI.

Customization is very easy. Typically you will first customize your font. From the font menu I chose "Unicode".

Next, you will customize your input method. The input methods "Straight", "Unicode" and "SGML" are most remarkable. For details about the other built-in input methods, look in /usr/local/share/yudit/data/.

To make a change the default for future sessions, edit your $HOME/.yuditrc file.

The general editor functionality is limited to editing, cut&paste and search&replace. No undo.

yudit can display text using a TrueType font. But it doesn't seem to support combining characters.

vim

Open vim under ncst-term. vim (as of version 6.0) has good support for UTF-8: when started in a UTF-8 locale, it assumes UTF-8 encoding for the console and for the text files being edited. It supports double-wide (CJK) characters as well as combining characters and therefore fits perfectly into a UTF-8 enabled ncst-term.

Installation: Download vim 6.0 from ftp://ftp.vim.org or from the IndiX download page. After unpacking it, configure with --enable-multibyte, then run "make" and "make install".

emacs

xemacs

nedit

gedit

gedit is an editor built on the GtkText widget. gedit-0.9.0 does not support FontSet, so you can't edit English and Hindi text together. If you choose a suitable font, however, you can work in one language at a time.

xedit

With XFree86-4.0.1, xedit is able to edit UTF-8 files if you set the locale accordingly (see above), and add the line "Xedit*international: true" to your $HOME/.Xdefaults file.

axe

As of version 6.1.2, aXe supports only 8-bit locales. If you add the line "Axe*international: true" to your $HOME/.Xdefaults file, it will simply dump core.

pico

mined98

mined98 is a small text editor by Michiel Huisjes, Achim Müller and Thomas Wolff ( http://www.inf.fu-berlin.de/~wolff/mined.html ). It lets you edit UTF-8 or 8-bit encoded files in a UTF-8 or 8-bit ncst-term. It also has powerful capabilities for entering Unicode characters.

By default mined uses an autodetection heuristic to decide between the two encodings. If you don't want to rely on heuristics, pass the command-line option -u when editing a UTF-8 file, or +u when editing an 8-bit encoded file. You can change the interpretation at any time from within the editor: it displays the encoding ("L:h" for 8-bit, "U:h" for UTF-8) in the menu line. Click on the first of these characters to change it.
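
The idea behind this kind of autodetection can be sketched in Python. This is not mined's actual heuristic, which is not described here; it is a common approach that rests on the assumption that 8-bit text containing non-ASCII characters rarely happens to form valid UTF-8 byte sequences. The function name guess_encoding is made up for this example.

```python
# Sketch of a UTF-8 versus 8-bit autodetection heuristic.
# guess_encoding() is a hypothetical helper; mined's own rules may differ.

def guess_encoding(data: bytes) -> str:
    """Return "utf-8" if data decodes cleanly as UTF-8, else "8-bit"."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "8-bit"
    return "utf-8"

print(guess_encoding("देवनागरी".encode("utf-8")))
print(guess_encoding("Müller".encode("latin-1")))
print(guess_encoding(b"plain ASCII"))   # ASCII is valid UTF-8 as well
```

A heuristic like this can guess wrong on short or unusual input, which is why explicit options such as -u and +u remain useful.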

mined knows about double-width and combining characters and displays them correctly.

mined also has very nice pull-down menus. Alas, the "Home", "End", "Delete" keys do not work.

4.4 Mailers

MIME: RFC 2279 defines UTF-8 as a MIME charset, which can be transported under the 8bit, quoted-printable and base64 encodings. The older MIME UTF-7 proposal (RFC 2152) is considered to be deprecated and should not be used any further.

Mail clients released after January 1, 1999, should be capable of sending and displaying UTF-8 encoded mails, otherwise they are considered deficient. But these mails have to carry the MIME labels


Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Simply piping a UTF-8 file into "mail" without taking care of the MIME labels will not work.
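
As an illustration of a message that carries both labels, here is a minimal sketch using the email package from the Python standard library. The addresses and the helper name build_utf8_mail are placeholders invented for this example; actually sending the message (for instance with smtplib) is left out.

```python
# Sketch: construct a mail labelled as UTF-8 text with 8bit transfer encoding.
# build_utf8_mail() and the example.org addresses are hypothetical.
from email.message import EmailMessage

def build_utf8_mail(sender: str, recipient: str, subject: str, body: str) -> EmailMessage:
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    # charset and cte yield the Content-Type and
    # Content-Transfer-Encoding headers for the body.
    msg.set_content(body, charset="utf-8", cte="8bit")
    return msg

msg = build_utf8_mail("sender@example.org", "reader@example.org",
                      "UTF-8 test", "नमस्ते\n")
print(msg["Content-Type"])
print(msg["Content-Transfer-Encoding"])
```

The two printed headers correspond to the MIME labels a receiving mail client needs in order to interpret the body as UTF-8.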

Mail client implementors should take a look at http://www.imc.org/imc-intl/ and http://www.imc.org/mail-i18n.html .

Now about the individual mail clients (or "mail user agents"):

pine

The situation for an unpatched pine version 4.10 is as follows.

Pine does not do character set conversions. But it allows you to view UTF-8 mails in a UTF-8 text window (Linux console or ncst-term).

Normally, Pine will warn about different character sets each time you view a UTF-8 encoded mail. To get rid of this warning, choose S (setup) from the Main Menu, then C (config), then change the value of "character-set" to UTF-8. This option does nothing except suppress the warnings, as Pine has no built-in knowledge of UTF-8. Also set the "pass-control-characters-as-is" option in the "Viewer Preferences" of the config menu.

To compose mails in Devanagari script, you must use the vim editor inside pine. To do this, change the value of "editor" to vim in the config menu and then set the "enable-alternate-editor-implicitly" option in the "Composer Preferences" of the config menu.

To enable 8-bit transfer with UTF-8 encoding elsewhere in pine, you must set the header values properly. Change the value of "customized-hdrs" to the following comma-separated values:

Content-Type: text/plain; charset=UTF-8, Content-Transfer-Encoding: 8bit

However, alignment remains broken in many places; replying to a mail does not cause the character set to be converted as appropriate; and the editor, pico, cannot deal with multibyte characters.

kmail

kmail (as of KDE 1.0) does not support UTF-8 mails at all.

Netscape's Mail

Netscape's Mail can send and display mails in UTF-8 encoding, but it needs a little bit of manual user intervention.

To send a UTF-8 encoded mail: After opening the "Mail" window, but before starting to compose the message, select from the menu "View -> Character Coding -> Unicode (UTF-8)". Then compose the message and send it.

When you receive a UTF-8 encoded mail, Netscape unfortunately does not display it in UTF-8 right away, and does not even give a visual clue that the mail was encoded in UTF-8. You have to manually select from the menu "View -> Character Coding -> Unicode (UTF-8)".

For displaying UTF-8 mails, Netscape uses different fonts. You can adjust your font settings in the "Edit -> Preferences -> Fonts" dialog; choose the "Unicode" font category.

emacs (rmail, vm)

mutt

mutt-1.0, as available from http://www.mutt.org/, contains only rudimentary UTF-8 support. For full UTF-8 support, there are patches by Edmund Grimley Evans at http://www.rano.demon.co.uk/mutt.html .

exmh

exmh 2.1.2 with Tk 8.4a1 can recognize and correctly display UTF-8 mails if you add the following lines to your $HOME/.Xdefaults file.


!
! Exmh
!
exmh.mimeUCharsets: utf-8
exmh.mime_utf-8_registry: iso10646
exmh.mime_utf-8_encoding: 1
exmh.mime_utf-8_plain_families: fixed
exmh.mime_utf-8_fixed_families: fixed
exmh.mime_utf-8_proportional_families: fixed
exmh.mime_utf-8_title_families: fixed

4.5 Text processing

groff

groff 1.16, the GNU implementation of the traditional Unix text processing system troff/nroff, can output UTF-8 formatted text. Simply use `groff -Tutf8' instead of `groff -Tlatin1' or `groff -Tascii'.

TeX

The teTeX 0.9 (and newer) distribution contains a Unicode adaptation of TeX, called Omega ( http://www.gutenberg.eu.org/omega/ , ftp://ftp.ens.fr/pub/tex/yannis/omega ). Together with the unicode.tex file contained in utf8-tex-0.1.tar.gz it enables you to use UTF-8 encoded sources as input for TeX. About a thousand Unicode characters are currently supported.

All that changes is that you run `omega' (instead of `tex') or `lambda' (instead of `latex'), and insert the following lines at the head of your source input.


\ocp\TexUTF=inutf8
\InputTranslation currentfile \TexUTF
\input unicode

4.6 Databases

PostgreSQL

PostgreSQL 6.4 or newer can be built with the configuration option --with-mb=UNICODE.

4.7 Other text-mode applications

There are many text-mode applications in Linux. All these applications should be run in ncst-term.

less

With less-358 ( http://www.flash.net/~marknu/less/less-358.tar.gz ) you can browse UTF-8 encoded text files in a UTF-8 ncst-term or console. Make sure that the environment variable LESSCHARSET is not set (or is set to utf-8). If you also have a LESSKEY environment variable set, make sure that the file it points to does not define LESSCHARSET. If necessary, regenerate this file using the `lesskey' command, or unset the LESSKEY environment variable.

lv

lv-4.21 by Tomio Narita ( http://www.mt.cs.keio.ac.jp/person/narita/lv/ ) is a file viewer with built-in character set converters. To view UTF-8 files in a UTF-8 console, use "lv -Au8". It can also be used to view files in CJK encodings in a UTF-8 console.

There is a small glitch: lv turns off the cursor and doesn't turn it on again.

expand, wc

Get the GNU textutils-2.0 and apply the patch textutils-2.0.diff , then configure, add "#define HAVE_MBRTOWC 1", "#define HAVE_FGETWC 1", "#define HAVE_FPUTWC 1" to config.h. In src/Makefile, modify CFLAGS and LDFLAGS so that they include the directories where libutf8 is installed. Then rebuild.

col, colcrt, colrm, column, rev, ul

Get the util-linux-2.9y package, configure it, then define ENABLE_WIDECHAR in defines.h, change the "#if 0" to "#if 1" in lib/widechar.h. In text-utils/Makefile, modify CFLAGS and LDFLAGS so that they include the directories where libutf8 is installed. Then rebuild.

figlet

figlet 2.2 has an option for UTF-8 input: "figlet -C utf8"

Base utilities

The Li18nux list of commands and utilities that ought to be made interoperable with UTF-8 is as follows. Useful information still needs to be added here; I just haven't gotten around to it yet :-)

As of glibc-2.2, regular expressions only work for 8-bit characters. In a UTF-8 locale, regular expressions that contain non-ASCII characters or that expect to match a single multibyte character with "." will not work. This affects all commands and utilities listed below.
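
The difference between byte-oriented and character-oriented matching can be demonstrated with Python's re module. This is only an analogy for the behaviour described above, not a test of glibc itself: a bytes pattern treats "." as one byte, while a string pattern treats it as one character.

```python
# Sketch: "." matched against bytes versus against characters.
import re

ka = "क"                        # DEVANAGARI LETTER KA, a single character
ka_bytes = ka.encode("utf-8")   # the same letter as three UTF-8 bytes

# Byte-oriented matching: "." consumes one byte, so it cannot cover the letter.
byte_match = re.fullmatch(rb".", ka_bytes)

# Character-oriented matching: "." consumes the whole character.
char_match = re.fullmatch(r".", ka)

print(len(ka_bytes), byte_match is None, char_match is not None)
```

A regular expression engine that only knows 8-bit characters behaves like the first case when it is handed UTF-8 data.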

alias

No info available yet.

ar

No info available yet.

arch

No info available yet.

arp

No info available yet.

asa

No info available yet.

at

As of at-3.1.8: The two uses of isalnum in at.c are invalid and should be replaced with a use of quotearg.c or an exclude list of the (fixed) list of shell metacharacters. The two uses of %8s in at.c and atd.c are invalid and should become arbitrary length.

basename

As of sh-utils-2.0i: OK.

batch

No info available yet.

bc

No info available yet.

bg

No info available yet.

bunzip2

No info available yet.

bzip2

No info available yet.

bzip2recover

No info available yet.

cal

No info available yet.

cat

No info available yet.

cd

No info available yet.

cflow

No info available yet.

chgrp

As of fileutils-4.0u: OK.

chmod

As of fileutils-4.0u: OK.

chown

As of fileutils-4.0u: OK.

chroot

As of sh-utils-2.0i: OK.

cksum

As of textutils-2.0e: OK.

clear

No info available yet.

cmp

No info available yet.

col

No info available yet.

comm

No info available yet.

command

No info available yet.

compress

No info available yet.

cp

As of fileutils-4.0u: OK.

cpio

No info available yet.

csplit

No info available yet.

ctags

No info available yet.

crontab

No info available yet.

cut

No info available yet.

date

As of sh-utils-2.0i: OK.

dd

As of fileutils-4.0u: The conv=lcase, conv=ucase options don't work correctly.

depmod

No info available yet.

df

As of fileutils-4.0u: OK.

diff

As of diffutils-2.7 (1994): diff is not locale aware; the --side-by-side mode therefore doesn't compute column width correctly, not even in ISO-8859-1 locales.

diff3

No info available yet.

dirname

As of sh-utils-2.0i: OK.

domainname

No info available yet.

du

As of fileutils-4.0u: OK.

echo

As of sh-utils-2.0i: OK.

env

As of sh-utils-2.0i: OK.

expand

No info available yet.

expr

As of sh-utils-2.0i: The operators "match", "substr", "index", "length" don't work correctly.

false

As of sh-utils-2.0i: OK.

fc

No info available yet.

fg

No info available yet.

file

No info available yet.

find

As of findutils-4.1.5: The "-ok" option is not internationalized; a patch has been submitted to the maintainer. The "-iregex" does not work correctly; this needs a fix in function find/parser.c:insert_regex.

fort77

No info available yet.

ftp[BSD]

No info available yet.

fuser

No info available yet.

getconf

No info available yet.

getopts

No info available yet.

gunzip

No info available yet.

gzip

gzip-1.3 is UTF-8 capable, but it uses only English messages in ASCII charset. Proper internationalization would require: Use gettext. Call setlocale. In function check_ofname (file gzip.c), use the function rpmatch from GNU text/sh/fileutils instead of asking for "y" or "n". The use of strlen in gzip.c:852 is wrong, needs to use the function mbswidth.
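
The reason strlen is the wrong tool for layout is that it counts bytes, not screen columns. The following Python sketch approximates a column-width computation with the standard unicodedata module. It is a rough illustration under simplifying assumptions (non-spacing marks take no column, East Asian wide characters take two, everything else takes one) and is not a reimplementation of mbswidth; the name display_width is invented for this example.

```python
# Sketch: byte length versus approximate display width of a string.
# display_width() is a hypothetical helper and only an approximation.
import unicodedata

def display_width(text: str) -> int:
    """Approximate the number of terminal columns that text occupies."""
    width = 0
    for ch in text:
        if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
            continue                 # non-spacing marks and format characters
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            width += 2               # double-wide (CJK) characters
        else:
            width += 1
    return width

sample = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT
print(len(sample.encode("utf-8")), display_width(sample))
```

For the sample above the byte count and the column count disagree, which is exactly the situation in which byte-based alignment goes wrong.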

hash

No info available yet.

head

No info available yet.

hostname

As of sh-utils-2.0i: OK.

id

As of sh-utils-2.0i: OK.

ifconfig

No info available yet.

imake

No info available yet.

insmod

No info available yet.

ipchains

No info available yet.

ipcrm

No info available yet.

ipcs

No info available yet.

ipmasqadm

No info available yet.

jobs

No info available yet.

join

No info available yet.

kerneld

No info available yet.

kill

No info available yet.

killall

No info available yet.

ksyms

No info available yet.

ldd

No info available yet.

less

No complete info available yet.

lex

No info available yet.

lilo

No info available yet.

ln

As of fileutils-4.0u: OK.

loadkeys

No info available yet.

logger

No info available yet.

logname

As of sh-utils-2.0i: OK.

lp

No info available yet.

lpc[BSD]

No info available yet.

lpr[BSD]

No info available yet.

lprm[BSD]

No info available yet.

lpq[BSD]

No info available yet.

ls

As of fileutils-4.0y: OK.

lsmod

No info available yet.

m4

No info available yet.

mailx

No info available yet.

make

No info available yet.

mesg

No info available yet.

mkdir

As of fileutils-4.0u: OK.

mkfifo

As of fileutils-4.0u: OK.

mkfs

No info available yet.

mkswap

No info available yet.

modprobe

No info available yet.

more

No info available yet.

mount

No info available yet.

mv

As of fileutils-4.0u: OK.

netstat

No info available yet.

newgrp

No info available yet.

nice

As of sh-utils-2.0i: OK.

nl

No info available yet.

nohup

As of sh-utils-2.0i: OK.

nslookup

No info available yet.

nm

No info available yet.

od

No info available yet.

passwd[BSD]

No info available yet.

paste

No info available yet.

patch

No info available yet.

pathchk

As of sh-utils-2.0i: OK.

ping

No info available yet.

printf

As of sh-utils-2.0i: OK.

pr

No info available yet.

ps

No info available yet.

pwd

As of sh-utils-2.0i: OK.

read

No info available yet.

rdev

No info available yet.

reboot

No info available yet.

renice

No info available yet.

rm

As of fileutils-4.0u: OK.

rmdir

As of fileutils-4.0u: OK.

rmmod

No info available yet.

shar[BSD]

No info available yet.

shutdown

No info available yet.

sleep

As of sh-utils-2.0i: OK.

split

No info available yet.

strings

No info available yet.

strip

No info available yet.

stty

As of sh-utils-2.0i: The string "<undef>" should not be translated; this needs a fix in function stty.c:visible.

su[BSD]

No info available yet.

sum

As of textutils-2.0e: OK.

tac

No info available yet.

tail

No info available yet.

talk

No info available yet.

tar

As of tar-1.13.17: OK, if user and group names are always ASCII.

tclsh

No info available yet.

tee

As of sh-utils-2.0i: OK.

telnet

No info available yet.

test

As of sh-utils-2.0i: OK.

time

No info available yet.

touch

As of fileutils-4.0u: OK.

tput

No info available yet.

tr

No info available yet.

true

As of sh-utils-2.0i: OK.

tsort

No info available yet.

tty

As of sh-utils-2.0i: OK.

type

No info available yet.

ulimit

No info available yet.

umask

No info available yet.

umount

No info available yet.

unalias

No info available yet.

uname

As of sh-utils-2.0i: OK.

uncompress

No info available yet.

unexpand

No info available yet.

uniq

No info available yet.

unlink

No info available yet.

uudecode

No info available yet.

uuencode

No info available yet.

wait

No info available yet.

wc

As of textutils-2.0e: wc cannot count characters; a patch has been submitted to the maintainer.
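
What counting characters means for UTF-8 input can be shown with a short Python sketch. The helper count_bytes_and_chars is hypothetical and assumes that the input is valid UTF-8; it is not the submitted patch.

```python
# Sketch: byte count versus character count for UTF-8 data.
# count_bytes_and_chars() is a hypothetical helper that assumes valid UTF-8.

def count_bytes_and_chars(data: bytes):
    """Return (number of bytes, number of Unicode characters)."""
    return len(data), len(data.decode("utf-8"))

line = "नमस्ते\n".encode("utf-8")
n_bytes, n_chars = count_bytes_and_chars(line)
print(n_bytes, n_chars)
```

For pure ASCII input the two numbers are equal; for Devanagari text the byte count is roughly three times the character count.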

who

As of sh-utils-2.0i: OK.

wish

No info available yet.

write

No info available yet.

xargs

As of findutils-4.1.5: The program uses strstr; a patch has been submitted to the maintainer.

yacc

No info available yet.

zcat

No info available yet.

4.8 Other X11 applications

Owen Taylor is currently developing a library for rendering multilingual text, called pango. http://www.labs.redhat.com/~otaylor/pango/ , http://www.pango.org/ .


