Re: new webpage: rikai.com
- To: tlug@example.com
- Subject: Re: new webpage: rikai.com
- From: Simon Cozens <simon@example.com>
- Date: Wed, 13 Sep 2000 22:40:26 +0100
- Content-Disposition: inline
- Content-Transfer-Encoding: 8bit
- Content-Type: text/plain; charset=iso-8859-1
- In-Reply-To: <20000913201509.C8594@example.com>; from simon@example.com on Wed, Sep 13, 2000 at 08:15:09PM +0100
- References: <20000912085446.B21156@example.com> <Pine.GSO.4.21.0009132320260.19768-100000@example.com> <20000913201509.C8594@example.com>
- Reply-To: tlug@example.com
- Resent-From: tlug@example.com
- Resent-Message-ID: <eWOq3D.A.rjC.Na_v5@example.com>
- Resent-Sender: tlug-request@example.com
- Sender: Simon Cozens <simon@example.com>
- User-Agent: Mutt/1.2.5i
On Wed, Sep 13, 2000 at 08:15:09PM +0100, Simon Cozens wrote:
> Ah, I see. I'll have a free equivalent GPLed by the end of next week.

Or sooner.

This is the back-end, which you can use already; it requires you to install
ChaSen (which includes the Text::ChaSen Perl module, although that's not
installed by default) and the HTML::Parser module.

If you now do

    perl annotate < old.html > new.html

new.html will be a copy of old.html with pop-up boxes giving the deinflected
compound, kana reading and part of speech. Cool, huh?

Suggestions (and patches!) welcome. ChaSen's installed size is less than 400k,
so don't worry about having to drag down masses of stuff - you don't.

Adding the front-end HTTP proxy is trivial, since there's a module called
POE::Filter::HTTPD which does just that...

-- 
I've looked at the listing, and it's right!
    -- Joel Halpern

---cut here---
use HTML::Parser;
use Text::ChaSen;

# Single quotes keep the backslashes, so the double quotes arrive already
# escaped inside the JavaScript string in the pop-up markup below.
$cset = '<META http-equiv=\"Content-Type\" content=\"text/html; charset=EUC-JP\">';

# ChaSen output format: fields separated by double tabs, one line per word.
$res = Text::ChaSen::getopt_argv('chasen-perl', '-j', '-F',
    '%m\t\t%M\t\t%y\t\t%U(%P-)\t\t%T \t%F \n');

$p = HTML::Parser->new( api_version => 3, marked_sections => 1);
$p->handler(text => \&dotext, 'text');
$p->handler(default => sub { print @_ }, 'text'); # Just spit out markup as-is
$p->parse_file(*STDIN);

sub dotext {
    $_ = shift;
    return unless /\S/; # Forget empty things...
    for (split /([\x80-\xff]+)/) {
        unless (/[\x80-\xff]/) { print $_; next; } # Split out non-EUC
        my @morphs = split /\n/, Text::ChaSen::sparse_tostr($_); # Parse it!
        pop @morphs; # Drop the last line of the analysis
        for (@morphs) {
            my ($kanji, $deinflected, $yomi, $pos) = split /\t\t/, $_, 4;
            # "\xCC\xA4\xC3\xCE\xB8\xEC" is 未知語 ("unknown word") in EUC-JP bytes.
            if ($pos eq "\xCC\xA4\xC3\xCE\xB8\xEC") { print $kanji; next; } # Pass unknowns.
            print <<EOF;
<A HREF="javascript:" onMouseOver='
mywin = window.open("","","width=200,height=200");
mywin.document.write("$cset<B>Word</B>:$kanji<P><B>Root</B>:$deinflected<P><B>Reading</B>: $yomi<P><B>Part of Speech</B>: $pos");
mywin.document.close();
' onMouseOut='mywin.window.close(); return true;'>
$kanji</A>
EOF
        }
    }
}
# Compact, isn't it?
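For anyone who wants to poke at the two string-handling steps in the script without installing ChaSen first, here is a minimal sketch in plain Perl. The helper names split_runs and parse_fields are made up for illustration, and the sample analysis line is hand-written romaji, not real ChaSen output; only the split patterns are taken from the script above.

```perl
use strict;
use warnings;

# Split a byte string into runs of 7-bit text and runs of high-bit bytes.
# EUC-JP encodes Japanese text with bytes in the 0x80-0xff range, and the
# capturing group makes split return those runs instead of discarding them.
sub split_runs {
    my ($text) = @_;
    return grep { length } split /([\x80-\xff]+)/, $text;
}

# Break one analysis line into four double-tab-separated fields.
# The limit of 4 means anything after the third separator stays in "pos".
sub parse_fields {
    my ($line) = @_;
    my ($word, $root, $reading, $pos) = split /\t\t/, $line, 4;
    return { word => $word, root => $root, reading => $reading, pos => $pos };
}

# "\xC6\xFC\xCB\xDC" is 日本 in EUC-JP, sitting between two ASCII words.
my $mixed = "abc \xC6\xFC\xCB\xDC def";
my @runs  = split_runs($mixed);
printf "%d runs, Japanese run is %d bytes\n", scalar @runs, length $runs[1];
# prints "3 runs, Japanese run is 4 bytes"

# Hypothetical sample line, in the field order the script expects.
my $entry = parse_fields("tabeta\t\ttaberu\t\ttabeta\t\tverb");
print "root=$entry->{root} pos=$entry->{pos}\n";
# prints "root=taberu pos=verb"
```

Only the high-bit runs would be handed to the analyser; everything else is printed unchanged, which is why the script never has to understand the surrounding ASCII text.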