22.18 Charsets
In Emacs, charset is short for “character set”. Emacs
supports most popular charsets (such as ascii
,
iso-8859-1
, cp1250
, big5
, and unicode
), in
addition to some charsets of its own (such as emacs
,
unicode-bmp
, and eight-bit
). All supported characters
belong to one or more charsets.
Emacs normally does the right thing with respect to charsets, so that you don’t have to worry about them. However, it is sometimes helpful to know some of the underlying details about charsets.
One example is font selection (see Fonts). Each language
environment (see Language Environments) defines a priority
list for the various charsets. When searching for a font, Emacs
initially attempts to find one that can display the highest-priority
charsets. For instance, in the Japanese language environment, the
charset japanese-jisx0208
has the highest priority, so Emacs
tries to use a font whose registry
property is
‘ JISX0208.1983-0
’.
There are two commands that can be used to obtain information about
charsets. The command M-x list-charset-chars
prompts for a
charset name, and displays all the characters in that character set.
The command M-x describe-character-set
prompts for a charset
name, and displays information about that charset, including its
internal representation within Emacs.
M-x list-character-sets
displays a list of all supported
charsets. The list gives the names of charsets and additional
information to identity each charset; for more details, see the
ISO International Register of Coded Character Sets to be Used with
Escape Sequences (ISO-IR) maintained by
the Information Processing Society of Japan/Information Technology
Standards Commission of Japan (IPSJ/ITSCJ). In this list,
charsets are divided into two categories: normal charsets are
listed first, followed by supplementary charsets. A
supplementary charset is one that is used to define another charset
(as a parent or a subset), or to provide backward-compatibility for
older Emacs versions.
To find out which charset a character in the buffer belongs to, put
point before it and type C-u C-x =
(see Introduction to International Character Sets).