Cross-Site Scripting (XSS) attacks are a type of injection in which malicious scripts are injected into otherwise benign and trusted websites. XSS attacks occur when an attacker uses a web application to send malicious code, generally in the form of a browser-side script, to a different end user. Flaws that allow these attacks to succeed are widespread and occur anywhere a web application includes user input in the output it generates without validating or encoding that input.
An attacker can use XSS to send a malicious script to an unsuspecting user. The end user’s browser has no way to know that the script should not be trusted, and will execute it. Because the browser believes the script came from a trusted source, the script can access any cookies, session tokens, or other sensitive information retained by the browser and used with that site. These scripts can even rewrite the content of the HTML page.
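When user input is meant to be shown as plain text rather than as markup, encoding it before output is usually enough to stop the browser from running it. The following is a minimal sketch using Python's standard `html.escape`; the `render_comment` function and the payload are hypothetical and only illustrate the idea.

```python
import html


def render_comment(user_input: str) -> str:
    """Wrap user input in a paragraph, encoding it so it renders as text."""
    # html.escape replaces &, < and > with entities; quote=True also
    # encodes double and single quotes, which matters inside attributes.
    return "<p>" + html.escape(user_input, quote=True) + "</p>"


# Hypothetical attack payload submitted through a comment form.
payload = "<script>alert(document.cookie)</script>"
print(render_comment(payload))
# <p>&lt;script&gt;alert(document.cookie)&lt;/script&gt;</p>
```

The browser displays the payload as literal text instead of executing it. This approach only fits cases where no HTML formatting from the user needs to survive.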
To prevent XSS when users are allowed to submit HTML, the input can be run through an HTML parser that filters it, or through an existing open-source tool such as HTML Purifier.
There are a number of open-source HTML filtering solutions out there on the web already. What sets HTML Purifier apart from them? Aren't all of these choices “secure”?
When it comes to HTML, attention to detail is key. Does it perform its filtering off a whitelist rather than an out-of-date blacklist? Does it filter every attribute in the document? Does it actually understand HTML?
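To see why a blacklist tends to fall behind, consider a deliberately naive filter that only removes `<script>` blocks. This is a hypothetical sketch, not taken from any real library; `blacklist_filter` is an illustrative name.

```python
import re


def blacklist_filter(markup: str) -> str:
    """Naive blacklist: remove <script>...</script> blocks and nothing else."""
    return re.sub(r"(?is)<script.*?>.*?</script>", "", markup)


# The filter handles the vector it was written for...
print(blacklist_filter("<script>alert(1)</script>hello"))

# ...but an event-handler attribute passes through untouched, because
# the blacklist never anticipated it.
vector = '<img src="x" onerror="alert(1)">'
print(blacklist_filter(vector))
```

The second input comes back unchanged, script and all. A whitelist inverts the default: anything not explicitly permitted, including every attribute, is dropped.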
Know thy enemy. Hackers have a huge arsenal of XSS vectors hidden within the depths of the HTML specification. HTML Purifier is effective because it decomposes the whole document into tokens, removes non-whitelisted elements, checks the well-formedness and nesting of tags, and validates all attributes according to their RFCs. HTML Purifier's comprehensive algorithms are complemented by a breadth of knowledge, ensuring that richly formatted documents pass through unstripped.
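The general shape of that pipeline (tokenize, keep only whitelisted elements and attributes, repair nesting) can be sketched in a few dozen lines. The code below is a toy illustration built on Python's standard `html.parser`, not HTML Purifier's implementation or API; the whitelist, the allowed URL schemes, and all names are assumptions made for the example, and it should not be used as a production sanitizer.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical whitelist: tag name -> attributes allowed on that tag.
ALLOWED_TAGS = {"p": set(), "b": set(), "i": set(), "a": {"href"}}
SAFE_SCHEMES = {"http", "https", "mailto"}  # absolute URLs only, by assumption
DROP_WITH_CONTENT = {"script", "style"}     # remove the element and its text


def is_safe_url(value: str) -> bool:
    # Browsers ignore whitespace and control characters inside a scheme,
    # so strip them before inspecting it.
    cleaned = "".join(ch for ch in value if ch > " ")
    try:
        return urlparse(cleaned).scheme.lower() in SAFE_SCHEMES
    except ValueError:
        return False


class WhitelistSanitizer(HTMLParser):
    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.out = []        # sanitized output fragments
        self.open_tags = []  # stack of emitted tags still open
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # non-whitelisted tag is dropped; its text is kept
        kept = []
        for name, value in attrs:
            if name not in ALLOWED_TAGS[tag] or value is None:
                continue
            if name == "href" and not is_safe_url(value):
                continue
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth or tag not in self.open_tags:
            return  # stray closing tag
        # Close any inner tags first so the output stays properly nested.
        while self.open_tags:
            current = self.open_tags.pop()
            self.out.append(f"</{current}>")
            if current == tag:
                break

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data))

    def result(self) -> str:
        # Close whatever the input left open.
        closing = "".join(f"</{t}>" for t in reversed(self.open_tags))
        return "".join(self.out) + closing


def sanitize(markup: str) -> str:
    parser = WhitelistSanitizer()
    parser.feed(markup)
    parser.close()
    return parser.result()


dirty = ('<p onclick="evil()">Hi <b>there</p><script>alert(1)</script>'
         '<a href="javascript:alert(1)">x</a>')
print(sanitize(dirty))
# <p>Hi <b>there</b></p><a>x</a>
```

The `onclick` attribute, the `<script>` element, and the `javascript:` link target are all removed, and the unclosed `<b>` is closed before its parent. A real library such as HTML Purifier covers far more of the HTML specification, including per-attribute validation that this sketch does not attempt.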