
Re: [tlug] [tlug-digest] re: searching for kanji strings, ignore punctuation and end of lines
- Date: Tue, 17 Jan 2006 17:28:36 +0900
- From: Edward Middleton <edward@example.com>
- Subject: Re: [tlug] [tlug-digest] re: searching for kanji strings, ignore punctuation and end of lines
- References: <200601160956.k0G9uYH9019349@example.com> <43CB7753.7060609@example.com>
- User-agent: Mail/News 1.5 (X11/20060113)
David Riggs wrote:
> Thomas said:
> Since you have to do it with a one-line Perl script:
> cat file.txt | perl -0777 -nle \
> 's/((?:\np0001a0..00..)*[^\n]*A(?:\np0001a0..00..|\.)*B(?:\np0001a0..00..|\.)*C(?:\np0001a0..00..|\.)*D(?:\np0001a0..00..|\.)*E(?:\np0001a0..00..|\.)*F[^\n]*)/\n--start--$1\n--finish--/m;print'
>
>
> will give you the lines bracketed by
> --start--
> p0001a05(00)-ghi.jklmn.op.rs.AB.
> p0001a06(00)-CD.EFtuvw.xyz.
> --finish--
>
> Thanks. I will give something like this a try.
Try this
./srch kanjistring blob
Edward
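#!/usr/bin/perl -0777
# Usage: ./srch kanjistring file...
# -0777 slurps each file whole, so a match may span several lines.
use strict;
use warnings;
use Encode ();
binmode STDOUT, ":utf8";

@ARGV or die "usage: $0 kanjistring file...\n";

# Prefix that starts every line, e.g. "p0001a05(00)-"
my $linestart = 'p........00.-';
# Split the search string into single characters
my @chars = split //, Encode::decode_utf8(shift);
# Between two characters allow any run of "newline + prefix" or "."
my $sep    = '(?:\n' . $linestart . '|\.)*';
my $start  = '(' . $linestart . '[^\n]*' . quotemeta(shift @chars);
my $end    = '[^\n]*)';
my $middle = join '', map { $sep . quotemeta($_) } @chars;
my $regexp = $start . $middle . $end;

while (<>) {
    my $file    = Encode::decode_utf8($_);
    my @matches = ($file =~ /$regexp/g);
    # Print the file name followed by every matching line or run of lines
    print "$ARGV\n", join("\n", @matches), "\n\n" if @matches;
}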
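
To see how the pattern is put together, here is a minimal sketch that builds the same kind of regex for the ASCII string ABCDEF and runs it against the two sample lines from the quoted message. The helper name build_regexp and the first, non-matching sample line are assumptions made up for illustration; they are not part of srch.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): build a pattern that
# finds the characters of $needle in order, skipping over '.' and over
# line breaks followed by a line prefix such as "p0001a05(00)-".
sub build_regexp {
    my ($needle) = @_;
    my $linestart = 'p........00.-';
    my $sep       = '(?:\n' . $linestart . '|\.)*';
    my @chars     = map { quotemeta($_) } split //, $needle;
    return '(' . $linestart . '[^\n]*' . join($sep, @chars) . '[^\n]*)';
}

# The first line is invented for this sketch; the other two are the
# sample lines from the quoted message.
my $text = "p0001a04(00)-abc.def.\n"
         . "p0001a05(00)-ghi.jklmn.op.rs.AB.\n"
         . "p0001a06(00)-CD.EFtuvw.xyz.\n";

my $regexp  = build_regexp('ABCDEF');
my @matches = ($text =~ /$regexp/g);
print join("\n--\n", @matches), "\n";
```

With this input the sketch should report a single match covering the p0001a05 and p0001a06 lines, since AB ends one line and CD.EF starts the next.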