Mailing List Archive



Re: [tlug] Good XML Parser



On Tue, 12 Dec 2006 21:11:28 +0900
Edward Middleton <edward@example.com> wrote:

> Botond Botyanszki wrote:
> > On Tue, 12 Dec 2006 11:42:40 +0100
> > "David Stibbe" <dstibbe@example.com> wrote:
> >
> >   
> >> Recently I have been able to work on my dictionary application again
> >> and finished an XML parser for JMDict, using Xerces C++
> >> (http://xml.apache.org/xerces-c/). I chose that API particularly for
> >> its SAX abilities.
> >>
> >> However, the parser is extremely slow when parsing JMDict completely.
> >> Is this because of SAX, the size of JMDict (35 MB), or something else
> >> (the machine I tested it on is a 1.6 GHz machine)?
> >>     
> >
> > I don't think the problem here is SAX. If you put a sleep() into all
> > callbacks, sure the program will be slow. Is it because of the
> > architecture of SAX? Not really.
> > Nobody will be able to answer your question. You should profile your
> > code. If it really turns out that xerces is at fault then you will
> > probably need to ditch java as well and do it in C.
> >   
> 
> If he is using xerces-c, it is the C++ version, not xerces-j, the Java
> version. Either way, as Botond said, you will need to profile. You
> might also try looking at a lighter-weight, less conformant library like
> libxml2[1]

Sorry, I didn't read carefully; the J in JMDict made me think of Java
and xerces-j.
For C++, expat could also be an option if it turns out that xerces is
slow. It's one of the fastest parsers, but it is really bare-bones,
unlike libxml2 or xerces. If you don't need much validation, it could
meet your requirements too.
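
To give an idea of what "bare-bones" means, here is a minimal sketch of
expat's callback style. It uses Python's standard-library binding
(xml.parsers.expat) only to keep the example self-contained; the C API
has the same shape (XML_ParserCreate, XML_SetElementHandler, XML_Parse).
The helper name count_entries, the "entry" tag name, and the tiny sample
document are assumptions for illustration, not taken from the real
JMDict file.

```python
import xml.parsers.expat


def count_entries(xml_bytes, tag="entry"):
    """Count start tags named `tag` using expat's start-element callback."""
    count = 0

    def on_start(name, attrs):
        # expat only reports events; any bookkeeping is up to the caller.
        nonlocal count
        if name == tag:
            count += 1

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start
    # Second argument marks this as the final chunk of input.
    parser.Parse(xml_bytes, True)
    return count


# Hypothetical miniature document standing in for the real dictionary.
sample = (
    b"<JMdict>"
    b"<entry><ent_seq>1</ent_seq></entry>"
    b"<entry><ent_seq>2</ent_seq></entry>"
    b"</JMdict>"
)
print(count_entries(sample))  # prints 2
```

For a large file, the same parser object can read from an open binary
file with parser.ParseFile(f) instead of taking the whole document as
one bytes object. Timing a run with empty callbacks against a run with
the real ones is a quick way to see whether the parser or the handler
code dominates.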

b.


