
Tidy is a well-known toolset for validating HTML and XML content. It identifies mistakes and corrects many of them automatically. To use it from Python, you can use pytidylib:

sudo apt-get install tidy
pip install pytidylib

Or alternatively:

easy_install pytidylib

Or if you are behind a proxy:

pip install pytidylib --proxy "prxy:port"

Then, in your Python source code:

from tidylib import tidy_document, tidy_fragment

htmlFragment = """
<h1>An HTML example</h1>
<a href="">my link</a>
"""
# tidyoptions is a dict of Tidy options; it must be defined before this call
htmlFragment, errors = tidy_fragment(htmlFragment, tidyoptions)
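The second return value, errors, is a plain string with one report message per line. If you want to filter or count the messages, it can help to split it into structured records. The sketch below assumes Tidy's usual report format ("line N column M - Severity: message"); parse_tidy_errors is a hypothetical helper written for this post, not part of pytidylib, and the sample string is hand-written rather than real Tidy output.

```python
import re

# Assumed shape of one Tidy report line, for example:
# "line 1 column 1 - Warning: inserting implicit <body>"
_REPORT_LINE = re.compile(
    r"line (?P<line>\d+) column (?P<column>\d+) - "
    r"(?P<severity>\w+): (?P<message>.*)"
)


def parse_tidy_errors(errors):
    """Turn a Tidy report string into a list of dicts (hypothetical helper)."""
    parsed = []
    for raw in errors.splitlines():
        match = _REPORT_LINE.match(raw.strip())
        if match is None:
            continue  # skip blank lines and lines in another format
        entry = match.groupdict()
        entry["line"] = int(entry["line"])
        entry["column"] = int(entry["column"])
        parsed.append(entry)
    return parsed


# Hand-written sample in the assumed format, not captured from a real run
sample = "line 1 column 1 - Warning: inserting implicit <body>\n"
for entry in parse_tidy_errors(sample):
    print(entry["severity"], entry["line"], entry["message"])
```

In practice you would pass the errors string returned by tidy_fragment instead of the sample. Lines that do not match the assumed format are silently skipped, so check the raw string too if the list comes back empty.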

You can pass options to tidylib for finer control over its output. Define them as a dict before calling tidy_fragment:

tidyoptions = {
    "indent": "auto",
    "indent-spaces": 2,
    "wrap": 72,
    "markup": True,
    "output-xml": False,
    "input-xml": False,
    "show-warnings": True,
    "numeric-entities": True,
    "quote-marks": True,
    "quote-nbsp": True,
    "quote-ampersand": False,
    "break-before-br": False,
    "uppercase-tags": False,
    "uppercase-attributes": False,
}
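If you reuse the same options across several calls with small variations, one possible approach is to keep a base dict and override individual keys per call. The following is a minimal sketch: BASE_OPTIONS, build_options and tidy_with are hypothetical names chosen for illustration, and the underscore-to-hyphen conversion is done by this helper itself so that options can be written as Python keyword arguments.

```python
# A small subset of the options listed above, used as shared defaults
BASE_OPTIONS = {
    "indent": "auto",
    "wrap": 72,
    "show-warnings": True,
}


def build_options(base=BASE_OPTIONS, **overrides):
    """Return a copy of base with overrides applied (hypothetical helper).

    Keyword names use underscores; they are converted to the hyphenated
    option names that Tidy expects.
    """
    options = dict(base)  # copy, so the shared defaults are never mutated
    for key, value in overrides.items():
        options[key.replace("_", "-")] = value
    return options


def tidy_with(html, **overrides):
    """Tidy an HTML fragment using the base options plus overrides."""
    # Imported lazily: requires pytidylib and the tidy library to be installed
    from tidylib import tidy_fragment
    return tidy_fragment(html, options=build_options(**overrides))


print(build_options(wrap=0, uppercase_tags=True))
```

A call such as tidy_with("<h1>An HTML example</h1>", uppercase_tags=True) would then return the same (fragment, errors) pair as tidy_fragment.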
