Mailing List Archive
Re: stripping HTML tags with Perl
- To: tlug@example.com, turnbull@example.com
 - Subject: Re: stripping HTML tags with Perl
 - From: "Drew C. Poulin" <poulin@example.com>
 - Date: Mon, 04 Dec 2000 19:03:39 -0800
 
From: "Stephen J. Turnbull" <turnbull@example.com>

> The actual effect of this is to delete from the string "diff" (or
> "DIFF" or "dIfF" to EOL, so that

True enough. Gulp. Since I get paid by the word, that could get expensive.

> "\r" is the Perl idiom for ASCII CR (0x0D). You can use the literal
> escape with arbitrary characters, but it doesn't transport well (the

I see.

> s/<.*?>//ig;
>
> This is an oops, I think. AFAIK Perl regexps are _greedy_, matching
> the longest possible string. Thus

As Darren Cook mentions, the ? makes the match stingy (non-greedy), so that it stops at the next > as it works forward through the string. Without the ?, the .* first runs to the end of the string, then works backward and stops at the first > it finds moving in that direction, which is the last > in the string. Or so I understand.

> This avoids trashing
> the Pascal inequality test "<>" which is not a legal tag, but will
> fail miserably on stuff like
>
> <address default="<phb@example.com>">

And it still fails miserably. Taking the ? out of s/<.*?>//ig; does wipe out everything between and including the two outermost < >s, but that's probably not what you'd consider success.

Thanks for all the pointers.

Drew Poulin
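
The greedy versus non-greedy behaviour discussed above can be checked with a short script. This is only a sketch: the sample string in $html and the variable names are made up for illustration and do not come from the thread, while the string in $tricky is the example quoted above.

```perl
use strict;
use warnings;

# Hypothetical sample input, not taken from the original thread.
my $html = '<p>Hello, <b>world</b>!</p>';

# Non-greedy: .*? stops at the first > after each <,
# so each tag is removed individually.
(my $lazy = $html) =~ s/<.*?>//g;

# Greedy: .* extends to the last > in the string,
# so everything between the outermost < and > is removed.
(my $greedy = $html) =~ s/<.*>//g;

# The case quoted in the thread: a > inside an attribute value.
# The non-greedy match ends at the > inside the quotes,
# leaving the tail of the tag behind.
my $tricky = '<address default="<phb@example.com>">text';
(my $lazy_tricky = $tricky) =~ s/<.*?>//g;

print "lazy:   [$lazy]\n";
print "greedy: [$greedy]\n";
print "tricky: [$lazy_tricky]\n";
```

With the sample input, the non-greedy version leaves "Hello, world!", the greedy version leaves an empty string, and the attribute example leaves the stray characters "> followed by the text. The i modifier from the original substitution is dropped here because the pattern contains no letters, so it has no effect.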