Word to HTML conversion is needed in various cases, such as embedding a document's content in web pages. In this article, you will learn how to convert MS Word DOCX or DOC documents to HTML using Python. You will also learn how to control the Word to HTML conversion using different save options.
Python Library to Convert Word to HTML
To convert Word documents to HTML, we will use Aspose.Words for Python, a feature-rich API for creating and manipulating Word documents that also provides high-fidelity conversion to other formats. Aspose.Words for Python is available on PyPI, and you can install it using the following pip command.
pip install aspose-words
Convert a Word DOC to HTML in Python
The following are the steps to convert a Word document to an HTML file using Python.
- Load the Word document using the Document class.
- Create an object of the HtmlSaveOptions class.
- Enable the export of font resources by setting the HtmlSaveOptions.export_font_resources property to True.
- Convert the Word document to HTML using the Document.save() method.
The following code sample shows how to convert a DOCX file to HTML in Python.
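A minimal sketch of the steps above, assuming aspose-words is installed. The function names html_path_for and convert_word_to_html and the file name Document.docx are illustrative choices, not part of the library.

```python
from pathlib import Path


def html_path_for(word_path: str) -> str:
    """Return the input path with its extension replaced by .html (hypothetical helper)."""
    return str(Path(word_path).with_suffix(".html"))


def convert_word_to_html(word_path: str) -> str:
    """Convert a DOC/DOCX file to HTML and return the path of the HTML file."""
    # Imported inside the function so this module still loads
    # on machines where aspose-words is not installed.
    import aspose.words as aw

    # Load the Word document.
    doc = aw.Document(word_path)

    # Create the save options and export the fonts used by the document.
    options = aw.saving.HtmlSaveOptions()
    options.export_font_resources = True

    # Save the document as HTML using the options above.
    output_path = html_path_for(word_path)
    doc.save(output_path, options)
    return output_path
```

Calling convert_word_to_html("Document.docx") would write Document.html next to the input file, provided that file exists.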
Customize Word to HTML Conversion in Python
Aspose.Words for Python also provides different options to customize the Word to HTML conversion. For example, you can convert documents with round-trip information, specify the folder to save the resource files, and so on.
Convert DOC to HTML with Round-trip Information
HTML does not support all the features provided by MS Word. Therefore, to mimic the Word document in HTML, we need to save additional information, known as round-trip information. The following are the steps to turn on the export of round-trip information in Word to HTML conversion.
- Load the Word document using the Document class.
- Create an object of the HtmlSaveOptions class and set the HtmlSaveOptions.export_roundtrip_information property to True.
- Convert the Word document to HTML using the Document.save() method, passing the HTML file name and the HtmlSaveOptions object as arguments.
The following code sample shows how to export round-trip information in Word to HTML conversion.
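A minimal sketch of the round-trip steps, assuming aspose-words is installed. The helper names enable_roundtrip and convert_with_roundtrip are hypothetical; only Document, HtmlSaveOptions, export_roundtrip_information, and save() come from the library.

```python
def enable_roundtrip(options):
    """Turn on round-trip export on an HtmlSaveOptions-like object and return it."""
    options.export_roundtrip_information = True
    return options


def convert_with_roundtrip(word_path: str, html_path: str) -> None:
    """Convert a Word document to HTML, keeping round-trip information."""
    # Imported inside the function so this module still loads
    # on machines where aspose-words is not installed.
    import aspose.words as aw

    # Load the Word document.
    doc = aw.Document(word_path)

    # Create the save options with round-trip export enabled.
    options = enable_roundtrip(aw.saving.HtmlSaveOptions())

    # Save the document as HTML using the options above.
    doc.save(html_path, options)
```

Keeping the option setup in its own small function makes it easy to reuse the same configuration across several conversions.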
Specify Resources Folder in DOC to HTML Conversion
You can also specify a folder where you want to store all the resources, such as images, CSS files, and fonts. For this, you can use the HtmlSaveOptions.resource_folder property. You can also specify separate folders for fonts and images using the HtmlSaveOptions.fonts_folder and HtmlSaveOptions.images_folder properties, respectively. The following are the steps to use a separate folder to save resources in Word to HTML conversion.
- Load the Word document using the Document class.
- Create an object of the HtmlSaveOptions class and set the HtmlSaveOptions.export_font_resources property to True.
- Specify the name of the resource folder using the HtmlSaveOptions.resource_folder property.
- Convert the Word document to HTML using the Document.save() method, passing the HTML file name and the HtmlSaveOptions object as arguments.
The following code sample shows how to specify a resource folder in Word to HTML conversion.
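A minimal sketch of the resource folder steps, assuming aspose-words is installed. The helper names configure_resources and convert_with_resources are hypothetical, and creating the folder up front is a precaution rather than a documented requirement.

```python
import os


def configure_resources(options, resource_folder, fonts_folder=None, images_folder=None):
    """Point an HtmlSaveOptions-like object at the folders used for exported resources."""
    options.export_font_resources = True
    options.resource_folder = resource_folder
    # Optional overrides for fonts and images only.
    if fonts_folder:
        options.fonts_folder = fonts_folder
    if images_folder:
        options.images_folder = images_folder
    return options


def convert_with_resources(word_path: str, html_path: str, resource_folder: str) -> None:
    """Convert a Word document to HTML and store its resources in resource_folder."""
    # Imported inside the function so this module still loads
    # on machines where aspose-words is not installed.
    import aspose.words as aw

    # Make sure the target folder exists (precaution, see lead-in).
    os.makedirs(resource_folder, exist_ok=True)

    # Load the Word document.
    doc = aw.Document(word_path)

    # Create the save options and set the resource folder.
    options = configure_resources(aw.saving.HtmlSaveOptions(), resource_folder)

    # Save the document as HTML using the options above.
    doc.save(html_path, options)
```

Passing fonts_folder or images_folder to configure_resources overrides the shared resource folder for that resource type only.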
Get a Free API License
You can get a temporary license in order to use Aspose.Words for Python without evaluation limitations.
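Once you have a license file, it can be applied before any conversion. The following is a minimal sketch, assuming aspose-words is installed; the helper name apply_license and the file existence check are illustrative additions.

```python
import os


def apply_license(license_path: str) -> bool:
    """Apply an Aspose.Words license file; return False if the file is missing."""
    if not os.path.isfile(license_path):
        return False

    # Imported inside the function so this module still loads
    # on machines where aspose-words is not installed.
    import aspose.words as aw

    # Apply the license for the current process.
    aw.License().set_license(license_path)
    return True
```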
Conclusion
In this article, you have learned how to convert Word documents to HTML using Python. You have also seen how to customize the Word to HTML conversion with different save options. You can easily embed this feature in your Python applications. In addition, you can explore other features of Aspose.Words for Python using the documentation, and you can ask your questions via our forum.
See Also
Info: You may be interested in another Python API (Aspose.Slides for Python via .NET) that allows you to convert presentations to images and import images into presentations.