TLUG Mailing List

stripping HTML tags with Perl

To: tlug@example.com

Subject: stripping HTML tags with Perl

From: "Drew C. Poulin" <poulin@example.com>

Date: Mon, 04 Dec 2000 13:30:53 -0800

Pardon me if this is too far off-topic, but I plan to buy
Simon Cozens's book soon (once I make myself worthy), so it should be my
only question like this.

I'm getting my toes wet in Perl by trying to strip a file of some
strings, including HTML tags. What I have so far is below.
(The name of the file is nd2.)  The problem line is

 s/<.*?>//ig;

If the tag is <h3>, for example, the substitution above deletes only
the 3> portion; it leaves <h untouched.

I think I'll be on my way if someone can explain why that happens and
what I ought to be doing.  Thanks for any leads.

Drew Poulin

@ARGV = ("/home/poulin/scripts/nd2");

$^I = ".bk";

while (<>) {
        s/diff .*?\n//ig;     #delete lines beginning with diff(sp)
        s/[0-9].*?\n//ig;
        s/\^M//ig;
        s/<.*?>//ig;
        print;
}
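
A likely explanation, sketched under assumptions: the tag-stripping
pattern itself looks fine, but the earlier substitution s/[0-9].*?\n//
is not anchored. On a line such as <h3>Title</h3> it deletes everything
from the 3 to the newline before s/<.*?>// ever runs, which leaves <h
with no closing > to match. The sketch below reproduces that and shows
one possible fix. The sub names strip_original and strip_anchored and
the sample line are made up for illustration. It also assumes that the
digit pattern was meant to drop lines that begin with a digit, and that
^M was meant as a carriage return rather than a caret followed by M.

```perl
use strict;
use warnings;

# Hypothetical helper: the four substitutions from the post, in the same order.
sub strip_original {
    my ($line) = @_;
    $line =~ s/diff .*?\n//ig;
    $line =~ s/[0-9].*?\n//ig;   # unanchored: removes from the first digit to end of line
    $line =~ s/\^M//ig;          # matches a literal caret followed by M
    $line =~ s/<.*?>//ig;
    return $line;
}

# Hypothetical variant (assumed intent): drop whole lines that begin with
# "diff " or a digit, remove carriage returns, then strip tags.
sub strip_anchored {
    my ($line) = @_;
    return '' if $line =~ /^diff /;
    return '' if $line =~ /^[0-9]/;
    $line =~ s/\r//g;            # \r is the carriage return usually displayed as ^M
    $line =~ s/<.*?>//g;
    return $line;
}

print strip_original("<h3>Title</h3>\n"), "\n";   # prints "<h"
print strip_anchored("<h3>Title</h3>\n");         # prints "Title"
```

Because the loop reads one line at a time, neither version removes a tag
that is split across two lines; a parser module such as HTML::Parser is
the more robust route for real HTML.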