Tidy is a famous toolset for validating HTML and XML content. It identifies mistakes and corrects them auto-magically. To use it from Python, you can use pytidylib, which needs both the tidy system package and the Python wrapper:
sudo apt-get install tidy
pip install pytidylib
Or alternatively, instead of pip:
easy_install pytidylib
Or if you are behind a proxy (replace prxy:port with your proxy's host and port):
pip install pytidylib --proxy "prxy:port"
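Then in the Python source code (tidyoptions is the options dictionary shown further down, and it must be defined before this call):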
from tidylib import tidy_document, tidy_fragment
htmlFragment = """
<h1>An HTML example</h1>
<a href="http://www.blogger.com/blogger.g?blogID=4882386696817687861#">my link</a>
"""
# tidy_fragment returns the cleaned markup and a string listing warnings and errors
htmlFragment, errors = tidy_fragment(htmlFragment, options=tidyoptions)
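The errors value returned above is a plain string with one message per line, so it can help to turn it into structured records before logging or filtering. Here is a minimal sketch of that idea. The helper name parse_tidy_report and the sample report are hypothetical, and the line format it expects ("line N column M - Level: message") is an assumption based on Tidy's usual output, so check it against what your Tidy version prints.

```python
import re

# Assumed shape of one report line, for example:
#   line 1 column 1 - Warning: missing <!DOCTYPE> declaration
REPORT_LINE = re.compile(
    r"^line (?P<line>\d+) column (?P<column>\d+) - (?P<level>\w+): (?P<message>.*)$"
)


def parse_tidy_report(report):
    """Split a Tidy report string into dicts, skipping lines that do not match."""
    entries = []
    for raw in report.splitlines():
        match = REPORT_LINE.match(raw.strip())
        if match is None:
            continue  # blank lines, summaries, or an unexpected format
        entries.append({
            "line": int(match.group("line")),
            "column": int(match.group("column")),
            "level": match.group("level"),
            "message": match.group("message"),
        })
    return entries


# Hand-written sample standing in for the `errors` string (not real Tidy output).
sample_report = (
    "line 1 column 1 - Warning: missing <!DOCTYPE> declaration\n"
    "line 3 column 5 - Error: <foo> is not recognized!\n"
)

for entry in parse_tidy_report(sample_report):
    print(entry["level"], entry["line"], entry["message"])
```

In real use you would pass the errors string from tidy_fragment instead of the sample, and could then keep only the entries whose level is "Error".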
You can pass options to control tidylib's behavior more finely:
tidyoptions = {
    "indent": "auto",
    "indent-spaces": 2,
    "wrap": 72,
    "markup": True,
    "output-xml": False,
    "input-xml": False,
    "show-warnings": True,
    "numeric-entities": True,
    "quote-marks": True,
    "quote-nbsp": True,
    "quote-ampersand": False,
    "break-before-br": False,
    "uppercase-tags": False,
    "uppercase-attributes": False
}
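If you call Tidy from several places, it can be handy to keep one set of default options and override only what differs per call. The sketch below is one possible way to do that, not part of pytidylib itself. DEFAULT_TIDY_OPTIONS, build_options and tidy_html_fragment are hypothetical names, and the defaults are just a subset of the options listed above. The only pytidylib call used is tidy_fragment with its options argument.

```python
# Hypothetical defaults for illustration: a subset of the options listed above.
DEFAULT_TIDY_OPTIONS = {
    "indent": "auto",
    "indent-spaces": 2,
    "wrap": 72,
    "numeric-entities": True,
}


def build_options(**overrides):
    """Return a copy of the defaults with the overrides applied.

    Keyword names use underscores (indent_spaces) and are converted to the
    hyphenated option names used in the dictionary (indent-spaces).
    """
    options = dict(DEFAULT_TIDY_OPTIONS)  # copy so the defaults stay untouched
    for name, value in overrides.items():
        options[name.replace("_", "-")] = value
    return options


def tidy_html_fragment(html, **overrides):
    """Clean an HTML fragment with pytidylib; returns (cleaned_html, report)."""
    # Imported lazily so the helpers above still work where pytidylib is missing.
    from tidylib import tidy_fragment
    return tidy_fragment(html, options=build_options(**overrides))
```

For example, tidy_html_fragment("<h1>An HTML example", wrap=0) would run Tidy with the defaults above but with the wrap option set to 0.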