Mailing List Archive


Re: stripping HTML tags with Perl



>>>>> On Tue, 5 Dec 2000 10:50:45 +0900, "Stephen J. Turnbull" <turnbull@example.com> said:

    SJT> You really want "<[^>]+>" (delete anything bracketed by "<>"
    SJT> containing some text which doesn't contain ">").  This avoids
    SJT> trashing the Pascal inequality test "<>" which is not a legal
    SJT> tag, but will fail miserably on stuff like

    SJT> <address default="<phb@example.com>">

    SJT> which may or may not be legal HTML.

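So, use s/<\/?[^>]+>//g to get rid of HTML tags. The \/? is harmless but
not strictly needed, since [^>]+ already matches the slash in a closing
tag. Like the pattern quoted above, it still stops at the first ">", so
it mishandles a ">" inside a quoted attribute value.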
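
Here is a minimal sketch of how that substitution behaves on the three
cases discussed in this thread. The helper name strip_tags and the sample
strings are made up for illustration; only the regex itself comes from
the message.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# strip_tags is a hypothetical helper name, not something from the thread.
sub strip_tags {
    my ($text) = @_;
    # "<", an optional "/", one or more non-">" characters, then ">".
    $text =~ s/<\/?[^>]+>//g;
    return $text;
}

# Ordinary opening and closing tags are removed.
print strip_tags('<p>Hello, <b>world</b>!</p>'), "\n";   # Hello, world!

# The empty pair "<>" survives, because [^>]+ needs at least one character.
print strip_tags('if a <> b then'), "\n";                # if a <> b then

# A ">" inside a quoted attribute value ends the match early,
# so the tail of the tag is left behind.
print strip_tags('<address default="<phb@example.com>">'), "\n";   # ">
```

For input where attribute values may contain ">", a real HTML parser
(for example the HTML::Parser module from CPAN) is probably a safer
choice than any single regex.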

Viktor

