
[tlug] SCML: A Structural Representation for Chinese Characters
- Date: Fri, 16 Nov 2007 19:06:56 +0100
- From: Niels Kobschätzki <n.kobschaetzki@example.com>
- Subject: [tlug] SCML: A Structural Representation for Chinese Characters
Hi!
Just found this paper and it's quite interesting:
SCML: A Structural Representation for Chinese Characters
Abstract:
Chinese characters are used daily by well over a billion people.
They constitute the main writing system of China and Taiwan, form a
major part of written Japanese, and are also used in South Korea.
Anything more than a cursory glance at these characters will reveal a
high degree of structure to them, but computing systems do not
currently have a means to operate on this structure. Existing
character databases and dictionaries treat them as numerical code
points, and associate with them additional "hand-computed" data, such
as stroke count, stroke order, and other information to aid in
specific searches. Searching by a character's "shape" is effectively
impossible in these systems.
I propose a new approach to representing these characters, through an
XML-based language called SCML. This language, by encoding an abstract
form of a character, allows the direct retrieval of important
information such as stroke count and stroke order, and permits useful
but previously impossible automated analysis of characters. In
addition, the system allows the design of a view that takes abstract
SCML representations as character models and outputs glyphs based on
an aesthetic, facilitating the creation of "meta-fonts" for Chinese
characters. Finally, through the creation of a specialized database,
SCML allows for efficient structural character queries to be performed
against the body of inserted characters, thus allowing people to
search by the most obvious of a character's characteristics: its shape.
File: ftp://ftp.cs.dartmouth.edu/TR/TR2007-592.pdf
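To make the abstract's main claim concrete: once a character is stored as structure rather than as a bare code point, stroke count, stroke order and "which characters contain this component?" all come out of a simple tree walk. The following is a toy sketch in Python using only the standard library's xml.etree.ElementTree. The <char>/<comp>/<stroke> element names, the pos attribute and the pinyin stroke-type labels are invented for illustration and are NOT the SCML syntax defined in the paper. It also assumes that the document order of <stroke> elements equals writing order.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical markup, invented for this sketch (not real SCML syntax).
# 林 ("forest") = 木 on the left + 木 on the right; 口 ("mouth") has no sub-components.
CHARS = {
    "林": """
<char name="林">
  <comp name="木" pos="left">
    <stroke type="heng"/><stroke type="shu"/>
    <stroke type="pie"/><stroke type="dian"/>
  </comp>
  <comp name="木" pos="right">
    <stroke type="heng"/><stroke type="shu"/>
    <stroke type="pie"/><stroke type="na"/>
  </comp>
</char>""",
    "口": """
<char name="口">
  <stroke type="shu"/><stroke type="hengzhe"/><stroke type="heng"/>
</char>""",
}


def stroke_order(root: ET.Element) -> List[str]:
    # iter() walks the tree depth-first in document order; this sketch
    # assumes that order is the writing order of the strokes.
    return [s.get("type") for s in root.iter("stroke")]


def stroke_count(root: ET.Element) -> int:
    # Derived from the structure, not stored as separate metadata.
    return len(stroke_order(root))


def containing(component: str, db: Dict[str, ET.Element]) -> List[str]:
    # Structural query: characters with a <comp name=component> anywhere inside.
    return [ch for ch, root in db.items()
            if any(c.get("name") == component for c in root.iter("comp"))]


db = {ch: ET.fromstring(src.strip()) for ch, src in CHARS.items()}
print(stroke_count(db["林"]))   # 8
print(stroke_order(db["口"]))   # ['shu', 'hengzhe', 'heng']
print(containing("木", db))     # ['林']
```

Here the stroke count of 林 is computed from its two 木 components instead of being entered by hand. The paper's own design goes further (glyph generation from an aesthetic, and a specialized database for efficient structural queries), which this linear scan over a dict does not attempt.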
Niels