Mailing List Archive

stripping HTML tags with Perl



Pardon me if this is too far off-topic, but I plan to buy
Simon Cozens's book soon (once I make myself worthy), so this should be
my only question of this kind.

I'm getting my toes wet in Perl by trying to strip a file of some
strings, including HTML tags. What I have so far is below.
(The name of the file is nd2.)  The problem line is

 s/<.*?>//ig;

If the tag is <h3>, for example, the substitution above deletes only
the 3> portion; it leaves <h untouched.

I think I'll be on my way if someone can explain why that happens and
what I ought to be doing.  Thanks for any leads.

Drew Poulin  




@ARGV="/home/poulin/scripts/nd2";

$^I=".bk";         

while (<>) {
        s/diff .*?\n//ig;     #delete lines beginning with diff(sp)
        s/[0-9].*?\n//ig;
        s/\^M//ig;
        s/<.*?>//ig;
        print;
}
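
A possible explanation, offered as a sketch rather than a confirmed diagnosis: the digit substitution in the script has no ^ anchor, so it matches from the first digit anywhere on the line through the newline. For a line containing <h3>, that removes "3>" and everything after it before the tag substitution runs, which leaves "<h" with no closing ">" to match. The helper names strip_unanchored and strip_anchored and the sample line are hypothetical, chosen only for illustration. The anchored version assumes the intent was to drop lines that begin with "diff " or a digit and to remove carriage returns.

```perl
use strict;
use warnings;

# Mirrors the substitutions in the script above, applied to one line.
sub strip_unanchored {
    my ($line) = @_;
    $line =~ s/diff .*?\n//ig;
    $line =~ s/[0-9].*?\n//ig;   # no ^ anchor: matches from the first digit anywhere
    $line =~ s/<.*?>//ig;        # nothing left to match once "3>" is gone
    return $line;
}

# Assumed intent: drop whole lines by their first characters, then strip tags.
sub strip_anchored {
    my ($line) = @_;
    return '' if $line =~ /^diff /i;   # line begins with "diff "
    return '' if $line =~ /^[0-9]/;    # line begins with a digit
    $line =~ s/\r//g;                  # carriage return (Ctrl-M), not the two characters ^ and M
    $line =~ s/<[^>]*>//g;             # tags that open and close on the same line
    return $line;
}

my $sample = "<h3>Heading</h3>\n";     # made-up sample input
print strip_unanchored($sample), "\n"; # prints "<h"
print strip_anchored($sample);         # prints "Heading"
```

A regular expression like this only handles tags that start and end on one line and contain no ">" inside attribute values; for anything more involved, a real parser such as the HTML::Parser module from CPAN is probably the safer route.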

