Re: stripping HTML tags with Perl
- To: tlug@example.com, turnbull@example.com
- Subject: Re: stripping HTML tags with Perl
- From: "Drew C. Poulin" <poulin@example.com>
- Date: Mon, 04 Dec 2000 19:03:39 -0800
- Content-Transfer-Encoding: 7bit
- Content-Type: Text/Plain; charset=us-ascii
- In-Reply-To: <14892.18933.990298.358983@example.com>
- References: <20001204133053G.poulin@example.com> <14892.18933.990298.358983@example.com>
- Reply-To: tlug@example.com
- Resent-From: tlug@example.com
- Resent-Message-ID: <N2-dX.A.OcC.7qFL6@example.com>
- Resent-Sender: tlug-request@example.com
From: "Stephen J. Turnbull" <turnbull@example.com>

> The actual effect of this is to delete from the string "diff" (or
> "DIFF" or "dIfF" to EOL, so that

Honto da (it's true). Gulp. Since I get paid by the word, that could get expensive.

> "\r" is the Perl idiom for ASCII CR (0x0D). You can use the literal
> escape with arbitrary characters, but it doesn't transport well (the

I see.

> s/<.*?>//ig;
>
> This is an oops, I think. AFAIK Perl regexps are _greedy_, matching
> the longest possible string. Thus

As Darren Cook mentioned, the ? makes the quantifier stingy (non-greedy), so that it matches the next > as it works forward through the string. Without the ?, the .* jumps to the end of the string, works backward, and matches the first > it finds moving in that direction, which is the last > available. Or so I understand.

> This avoids trashing
> the Pascal inequality test "<>" which is not a legal tag, but will
> fail miserably on stuff like
>
> <address default="<phb@example.com>">

And it still fails miserably. With the ?, the match stops at the > after ".com" and leaves a stray "> behind. Taking the ? out of s/<.*?>//ig; does wipe out everything between and including the two outermost < and > (the first < and the last > on the line), but that's probably not what you'd consider success.

Thanks for all the pointers.

Drew Poulin
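To make the stingy-versus-greedy difference concrete, here is a minimal Perl sketch. The sample line is a hypothetical one built around the <address> example quoted above, and the variable names are illustrative only. The /i flag is dropped because the pattern contains no letters, so it would change nothing.

```perl
use strict;
use warnings;

# Hypothetical sample line (an assumption), built around the <address> example.
# Single quotes keep Perl from interpolating "@example".
my $html = 'Mail <b>me</b> at <address default="<phb@example.com>">here';

# Stingy (non-greedy): each match ends at the NEXT '>', so the nested
# address tag is cut short at the '>' after ".com".
(my $stingy = $html) =~ s/<.*?>//g;

# Greedy: .* runs to the end of the line and backtracks to the LAST '>',
# so one match swallows everything from the first '<' onward, text included.
(my $greedy = $html) =~ s/<.*>//g;

print "stingy: $stingy\n";   # leaves a stray "> behind
print "greedy: $greedy\n";   # loses "me" and "at"
```

Neither substitution handles the nested quote correctly: the stingy version yields `Mail me at ">here` and the greedy version yields `Mail here`, which is the "fails miserably" case from the thread in both directions.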
- Follow-Ups:
- Re: stripping HTML tags with Perl
- From: "Stephen J. Turnbull" <turnbull@example.com>
- References:
- stripping HTML tags with Perl
- From: "Drew C. Poulin" <poulin@example.com>
- stripping HTML tags with Perl
- From: "Stephen J. Turnbull" <turnbull@example.com>