Mailing List Archive



Re: [tlug] [OT] Regular Expressions to find Japanese Text



[Dave M G (Re: [tlug] [OT] Regular Expressions to find Japanese Text) writes:]
JB > That <br> is redundant. I may remove it at some stage. Better to extract
JB > between <li> and the next <li> or the terminal </ul>.
>>
>> If I may make a suggestion:
>> 
>> I recommend that if you do remove the <br> tag, which is definitely
>> redundant, you should replace it with a closing </li> tag. This will
>> make it more compatible with strict XHTML. I think evolving towards
>> XHTML compliance with your HTML output would be a very good thing.

Done.

>> Also, in the case of parsing as I'm doing, finding the next <li> tag or
>> terminal </ul> tag might be complicated by line breaks between them. Not
>> insurmountable, just complicated, and simply not an issue if the HTML
>> was XHTML compliant.

That particular <ul> list should have </li> terminations on all sites within
24 hours.
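
For anyone parsing the list in the meantime, here is a minimal sketch of the extraction discussed above. It assumes Python's standard re module; the sample HTML and the names ITEM and list_items are made up for illustration and are not taken from the actual pages. It stops at whichever comes first of </li>, the next <li>, or the terminal </ul>, and it tolerates line breaks inside an item.

```python
import re

# Hypothetical sample: a mix of closed and unclosed <li> items,
# with one item spanning two lines.
html = """<ul>
<li>first entry
continues on a second line
<li>second entry</li>
<li>third entry
</ul>"""

# Match from <li> up to the nearest </li>, next <li>, or closing </ul>.
# re.DOTALL lets "." cross line breaks; the lookahead leaves the
# terminator unconsumed so the next match can start from it.
ITEM = re.compile(r"<li>(.*?)(?=</li>|<li>|</ul>)", re.DOTALL | re.IGNORECASE)

def list_items(text):
    """Return the text of each <li> item with runs of whitespace collapsed."""
    return [" ".join(body.split()) for body in ITEM.findall(text)]

for item in list_items(html):
    print(item)
```

A regex like this is only reasonable for a flat, known list; nested lists or attributes on the tags would call for a real HTML parser.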

JB > Well, 付ける;着ける [つける] has 25 meanings grouped in 11 senses.
JB > Readings are a bit harder to count, but I think there are entries with 
JB > 5 or 6.

>> That's very helpful to know.
>> 
>> What do you mean by "grouped in 11 senses"? That there are eleven
>> semi-colons dividing up the 25 meanings?

No, 24 semi-colons, but each new sense starts with the sense-number in
parentheses, e.g.
付ける .... (v1,vt) (1) to attach; to join; to add; to append; 
to affix; to stick; to glue; to fasten; to sew on; to apply (ointment); 
(2) to furnish (a house with); (3) to wear; to put on; (4) to keep a 
diary; ...
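
As a rough sketch of how the sense numbers could be used when parsing, the following splits a gloss string on the "(n)" markers and then on semi-colons. It assumes Python's re module; the names SENSE_MARK and split_senses are invented for this example, and the gloss is only the excerpt quoted above, cut off where the excerpt is.

```python
import re

# Only the part of the entry quoted in this message (the rest is elided).
gloss = ("(v1,vt) (1) to attach; to join; to add; to append; "
         "to affix; to stick; to glue; to fasten; to sew on; "
         "to apply (ointment); (2) to furnish (a house with); "
         "(3) to wear; to put on; (4) to keep a diary")

# A sense marker is a number alone in parentheses, e.g. "(2)".
# Parenthesised words such as "(ointment)" or "(v1,vt)" do not match.
SENSE_MARK = re.compile(r"\((\d+)\)\s*")

def split_senses(text):
    """Map each sense number to its list of meanings."""
    # With a capturing group, re.split returns
    # [text before first marker, number, body, number, body, ...].
    parts = SENSE_MARK.split(text)
    senses = {}
    for number, body in zip(parts[1::2], parts[2::2]):
        senses[int(number)] = [m.strip() for m in body.split(";") if m.strip()]
    return senses

for number, meanings in split_senses(gloss).items():
    print(number, meanings)
```

Counting semi-colons alone would not separate the senses, since they divide meanings within a sense as well as between senses.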

Jim

-- 
Jim Breen                                http://www.csse.monash.edu.au/~jwb/
Clayton School of Information Technology,               Tel: +61 3 9905 9554
Monash University, VIC 3800, Australia                  Fax: +61 3 9905 5146
(Monash Provider No. 00008C)                ジム・ブリーン@モナシュ大学

