Mailing List Archive


Re: tlug: text analysis



Ulrike Schmidt <798a5047@example.com> wrote,

> Does anyone know of a program that chops texts into sentences, and,
> (more important since the first thing should not be so difficult): chops
> sentences into their basic sentences? Maybe even recording the structure
> of the original sentence?

Which language do you have in mind?  Generally, parsing (i.e.,
extracting the syntactic structure of) a natural-language
sentence is a *very hard* problem and still very much a
research issue.
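
The easier first half of the question, chopping a text into
sentences, can be roughly approximated with a regular expression.
The following is only a minimal sketch in Python: the function name
`split_sentences' and the boundary rule are assumptions made for
this illustration, not an existing tool, and abbreviations such as
"Dr. Smith" will be split wrongly.

```python
import re

# Naive rule (an assumption for this sketch): a sentence ends at '.', '!'
# or '?' when that is followed by whitespace and an uppercase letter.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-ZÄÖÜ])")


def split_sentences(text):
    """Return the sentences of `text` as a list of stripped strings."""
    pieces = _BOUNDARY.split(text.strip())
    return [piece.strip() for piece in pieces if piece.strip()]
```

Calling split_sentences() on a paragraph returns one string per
sentence.  It recovers nothing about the structure inside a
sentence, which is the hard part mentioned above.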

> And of a program that transforms verbs, nouns, adverbs into their
> standard dictionary forms?

For Japanese, there are some routines in the edict.el
package maintained by Steve (written in Emacs Lisp).  They
work quite well, but have problems with more difficult
cases -- actually, I don't use this feature very often.
For some languages, this task can also be very hard.
Consider German,

  Dieses Thema kommt immer mal wieder auf.

(As you know) the verb is `aufkommen', which is not easily
recognized because it is split into two parts in the above
sentence.
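
To make the difficulty concrete, here is a toy sketch in Python of
reattaching a separated prefix.  Everything in it is hypothetical:
the two small tables and the function `dictionary_form' are invented
for this example, and the rule "the last word is the prefix" breaks
down quickly on real text (subordinate clauses, prepositions that
look like prefixes, and so on).

```python
# Toy data, invented for this illustration only.
SEPARABLE_PREFIXES = {"ab", "an", "auf", "aus", "ein", "mit", "vor", "zu"}
FINITE_TO_INFINITIVE = {"kommt": "kommen", "fängt": "fangen", "ruft": "rufen"}


def dictionary_form(sentence):
    """Return the infinitive of the first known verb, or None if none is found."""
    tokens = sentence.rstrip(".!?").split()
    lowered = [token.lower() for token in tokens]
    for word in lowered:
        base = FINITE_TO_INFINITIVE.get(word)
        if base is None:
            continue
        last = lowered[-1]
        # If the sentence ends in a known separable prefix, glue it back on.
        if last in SEPARABLE_PREFIXES and last != word:
            return last + base
        return base
    return None
```

For the sentence above this yields `aufkommen', but only because
both `kommt' and `auf' happen to be in the hand-made tables.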

> And something that finds misspelled words and gives recommendations for
> the correct spelling?

Finally, this one is easy :-)  Use `ispell'.  It should
already be installed on your Linux machine.
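
If you want the suggestions from within a program rather than
interactively, ispell has a pipe mode (`ispell -a').  Below is a
minimal sketch in Python, assuming ispell and a dictionary are
installed; the helper names `parse_ispell_line' and `check_text'
are made up for this example.

```python
import subprocess


def parse_ispell_line(line):
    """Parse one `ispell -a` result line into (word, suggestions), or None."""
    # '&' and '?' lines carry suggestions, '#' means no suggestion was found.
    # Anything else ('*', '+', '-', the '@(#)' banner, blank lines) is skipped.
    if not line or line[0] not in "&?#":
        return None
    head, _, tail = line.partition(":")
    word = head.split()[1]
    suggestions = [item.strip() for item in tail.split(",") if item.strip()]
    return word, suggestions


def check_text(text):
    """Run `ispell -a` on `text` and return a list of (word, suggestions)."""
    # Prefix each line with '^' so ispell treats it as data, not as a command.
    data = "".join("^" + line + "\n" for line in text.splitlines())
    result = subprocess.run(
        ["ispell", "-a"], input=data, capture_output=True, text=True, check=True
    )
    parsed = (parse_ispell_line(line) for line in result.stdout.splitlines())
    return [item for item in parsed if item is not None]
```

check_text() returns one entry per word that ispell did not accept,
together with whatever replacements it proposes.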

Cheers,

Manuel
----------------------------------------------------------------
Next Nomikai: 20 November, 19:30   Tengu TokyoEkiMae 03-3275-3691
Next Technical Meeting: 12 December, 12:30 HSBC Securities Office
----------------------------------------------------------------
more info: http://tlug.linux.or.jp Sponsors: PHT, HSBC Securities
