
Tidy is a well-known toolset for validating HTML and XML content. It identifies markup mistakes and corrects many of them automatically. To use it from Python, you can use pytidylib, a wrapper around the Tidy library:

sudo apt-get install tidy
pip install pytidylib
 

Alternatively, with easy_install:

easy_install pytidylib
 

Or if you are behind a proxy:

pip install pytidylib --proxy "prxy:port"

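Then in the Python source code (tidyoptions is the options dictionary shown further down; define it before this call):

from tidylib import tidy_document, tidy_fragment
htmlFragment = """
<h1>An HTML example</h1>
<a href="http://www.blogger.com/blogger.g?blogID=4882386696817687861#">my link</a>
"""
# tidy_fragment returns the cleaned markup and a string with Tidy's warnings and errors
htmlFragment, errors = tidy_fragment(htmlFragment, tidyoptions)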
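
The second value returned by tidy_fragment is a plain string holding Tidy's report, usually one message per line. The following is a minimal sketch of turning that report into a list that is easier to log or inspect. The helper names split_tidy_report and clean_fragment are hypothetical and not part of pytidylib, and the tidy parameter is an assumption added only so the wrapper can be tried with a stand-in function when libtidy is not installed.

```python
def split_tidy_report(errors):
    """Split Tidy's report string into a list of non-empty messages."""
    return [line.strip() for line in errors.splitlines() if line.strip()]


def clean_fragment(html, options=None, tidy=None):
    """Tidy an HTML fragment and return (cleaned_html, list_of_messages).

    `tidy` defaults to tidylib.tidy_fragment; it is imported lazily so that
    this module can be loaded even when pytidylib is missing.
    """
    if tidy is None:
        from tidylib import tidy_fragment as tidy
    cleaned, errors = tidy(html, options)
    return cleaned, split_tidy_report(errors)
```

With pytidylib installed, clean_fragment(htmlFragment, tidyoptions) should return the corrected markup together with a list of messages instead of one long string.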

You can pass options to control Tidy's behavior more finely:

tidyoptions = {
    "indent": "auto",
    "indent-spaces": 2,
    "wrap": 72,
    "markup": True,
    "output-xml": False,
    "input-xml": False,
    "show-warnings": True,
    "numeric-entities": True,
    "quote-marks": True,
    "quote-nbsp": True,
    "quote-ampersand": False,
    "break-before-br": False,
    "uppercase-tags": False,
    "uppercase-attributes": False,
}
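
If several calls share most of these settings, one possible approach is to keep a base dictionary and layer per-call overrides on top of it. This is only a sketch: merge_options is a hypothetical helper, not something pytidylib provides, and the option values below are illustrative.

```python
def merge_options(base, overrides=None):
    """Return a new options dict: `base` updated with `overrides`.

    The base dictionary is copied first, so it is never modified.
    """
    merged = dict(base)
    merged.update(overrides or {})
    return merged


# Shared defaults (a subset of the options listed above).
base_options = {"indent": "auto", "wrap": 72, "output-xml": False, "input-xml": False}

# Variant for XML content: same layout settings, XML parsing and output enabled.
xml_options = merge_options(base_options, {"output-xml": True, "input-xml": True})
```

The merged dictionary can then be passed as the second argument to tidy_fragment or tidy_document in the same way as tidyoptions.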
 
 
