Mailing List Archive



Re: [tlug] [tlug-digest] re: searching for kanji strings, ignore punctuation and end of lines



David Riggs wrote:
> Thomas said:
> >------
> Since you have to do it with a one line perl script.
> cat file.txt | perl -0777 -nle
> 's/((?:\np0001a0..00..)*[^\n]*A(?:\np0001a0..00..|\.)*B(?:\np0001a0..00..|\.)*C(?:\np0001a0..00..|\.)*D(?:\np0001a0..00..|\.)*E(?:\np0001a0..00..|\.)*F[^\n]*)/\n--start--\1\n--finish--/m;print'
>
>
> will give you the lines bracketed by
> --start--
> p0001a05(00)-ghi.jklmn.op.rs.AB.
> p0001a06(00)-CD.EFtuvw.xyz.
> --finish--
> ->
>
> Thanks. I will give something like this a try.
Try this (the srch script follows my signature):

./srch kanjistring blob

Edward

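#!/usr/bin/perl -0777
# Slurp each file whole (-0777) and print the lines that contain the search
# string, ignoring periods and line breaks between its characters.
# NOTE: the archive replaced the original array names with "@example.com";
# @chars and @matches are stand-in names.
require Encode;
binmode STDOUT, ":utf8";
$linestart='p........00.-';	# line label, e.g. "p0001a05(00)-"
@chars = split(/ */, Encode::decode_utf8(shift));	# one element per character
$sep = join('','(?:\n',$linestart,'|\.)*');	# line break plus label, or a period
$start = join('','(',$linestart,'[^\n]*', shift @chars );
$end = '[^\n]*)';
while(defined($_= shift @chars)){	# defined() so a "0" character is not dropped
	$middle=join '',$middle,$sep,$_;
}
$regexp=join('',$start,$middle,$end);
while(<>){
	$file=Encode::decode_utf8($_);
	@matches=($file =~/$regexp/g);
	# print the file name and every match once the file has at least one
	print "$ARGV\n",join("\n",@matches),"\n\n" if (eof && @matches>0);
}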
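
To see how the pattern is put together, here is a small self-contained sketch that runs against the sample lines from the quoted message instead of against files. The sub name build_pattern, the search string ABCDEF, and the first and last sample lines are my own assumptions for illustration; only the two middle lines come from the thread. It also runs each character through quotemeta, which the script above does not do, so a search string containing regex metacharacters is taken literally.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a pattern that finds $search even when its characters are separated
# by periods or by a line break followed by a line label.
# (build_pattern is a hypothetical helper, not part of the posted script.)
sub build_pattern {
    my ($search, $linestart) = @_;
    my @chars = map { quotemeta($_) } split //, $search;   # literal characters
    my $sep   = '(?:\n' . $linestart . '|\.)*';            # allowed filler
    return '(' . $linestart . '[^\n]*' . join($sep, @chars) . '[^\n]*)';
}

# Sample text laid out like the quoted message; the first and last lines
# are made up to show that non-matching lines are left out.
my $text = join "\n",
    'p0001a04(00)-abc.def.',
    'p0001a05(00)-ghi.jklmn.op.rs.AB.',
    'p0001a06(00)-CD.EFtuvw.xyz.',
    'p0001a07(00)-nothing.here.',
    '';

my $pattern = build_pattern('ABCDEF', 'p........00.-');
my @matches = $text =~ /$pattern/g;    # one element per match (one capture group)
print "--start--\n$_\n--finish--\n" for @matches;
```

Run as is, this should print the p0001a05 and p0001a06 lines between --start-- and --finish--, since ABCDEF spans the break between them. Joining the characters with the separator does the same job as the shift loop in the script, just in one step.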
