Microsoft Word's DOC and DOCX formats are widely used because the word processor offers a rich set of features for organizing and presenting information. However, to display such documents on a website or in a web application, you often need to convert them to HTML. In this blog post, we’ll walk you through the process of converting Word documents to HTML in Java.

Java Library to Convert Word DOC to HTML

Aspose.Words for Java is a mature and feature-rich library for working with Word documents. It lets you create, read, and modify Word documents and convert them to various formats, including HTML. You can easily configure the Aspose.Words for Java API in your applications: download the JAR file from the New Releases section, where the APIs are updated almost every month.

Alternatively, you can add Aspose.Words for Java to your Maven project with the following configuration:

Repository:

<repositories>
    <repository>
        <id>AsposeJavaAPI</id>
        <name>Aspose Java API</name>
        <url>https://repository.aspose.com/repo/</url>
    </repository>
</repositories>

Dependency:

<dependencies>
    <dependency>
        <groupId>com.aspose</groupId>
        <artifactId>aspose-words</artifactId>
        <version>23.6</version>
        <classifier>jdk17</classifier>
    </dependency>
    <dependency>
        <groupId>com.aspose</groupId>
        <artifactId>aspose-words</artifactId>
        <version>23.6</version>
        <classifier>javadoc</classifier>
    </dependency>
</dependencies>

Now we are all set for DOCX to HTML conversion in a Java application.

Convert Word (DOC/DOCX) to HTML using Java

You can convert a Word document to HTML by following the steps below:

  1. Load the source Word file (DOC or DOCX).
  2. Save the file as HTML.

The code sample below shows how to convert DOCX to HTML using Java:
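A minimal sketch of these two steps, assuming the Aspose.Words for Java JAR is on the classpath. The class name `WordToHtml`, the `convert` helper, and the file names are placeholders chosen for illustration; only `Document` and its `save` method come from the library.

```java
import com.aspose.words.Document;

class WordToHtml {

    // Loads a DOC or DOCX file and saves it as HTML.
    // With this overload, the output format is inferred from the ".html" extension.
    static void convert(String inputPath, String outputPath) throws Exception {
        Document doc = new Document(inputPath);
        doc.save(outputPath);
    }

    public static void main(String[] args) throws Exception {
        // Placeholder file names; replace them with real paths.
        convert("input.docx", "output.html");
    }
}
```

Because the format is taken from the file extension, the same call works for both DOC and DOCX input.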

Input DOCX file preview:

[Screenshot: Word to HTML in Java]

Output HTML file preview:

[Screenshot: DOCX to HTML in Java]

As these screenshots show, the document is rendered with high fidelity. The API can convert text, images, tables, and much more.

Convert Word DOCX to HTML5 in Java

HTML5 is the latest version of HTML. In response to repeated requests for HTML5 support in the Aspose.Words API, DOCX to HTML5 conversion is supported, and you can convert files by following the steps below:

  1. Load the input DOCX file.
  2. Create an HtmlSaveOptions object, specifying the SaveFormat.
  3. Set the HTML version to the enumeration value HtmlVersion.HTML_5.
  4. Save the output file.

The code snippet below shows how to convert DOCX to HTML5 in Java:
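A minimal sketch of the four steps, again assuming the Aspose.Words for Java JAR is on the classpath. The class name `WordToHtml5`, the `convert` helper, and the file names are hypothetical; `HtmlSaveOptions`, `SaveFormat.HTML`, and `HtmlVersion.HTML_5` are the library names mentioned in the steps.

```java
import com.aspose.words.Document;
import com.aspose.words.HtmlSaveOptions;
import com.aspose.words.HtmlVersion;
import com.aspose.words.SaveFormat;

class WordToHtml5 {

    // Converts a Word file to HTML5 by setting the HTML version on the save options.
    static void convert(String inputPath, String outputPath) throws Exception {
        // Step 1: load the input document.
        Document doc = new Document(inputPath);

        // Step 2: create save options for the HTML format.
        HtmlSaveOptions options = new HtmlSaveOptions(SaveFormat.HTML);

        // Step 3: request HTML5 output.
        options.setHtmlVersion(HtmlVersion.HTML_5);

        // Step 4: save using the configured options.
        doc.save(outputPath, options);
    }

    public static void main(String[] args) throws Exception {
        // Placeholder file names; replace them with real paths.
        convert("input.docx", "output-html5.html");
    }
}
```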

Convert Password-Protected Word file to HTML

DOC or DOCX files are sometimes encrypted with a password. You can convert such files to HTML as well, but you must supply the password when loading the Word file. You can follow the steps below for DOCX to HTML conversion:

  1. Initialize an object of the LoadOptions class.
  2. Set the password.
  3. Load the encrypted DOCX file.
  4. Convert DOCX to HTML.

Likewise, the following code sample shows how to convert a password-protected DOCX file to HTML using Java:
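A minimal sketch of these steps, assuming the Aspose.Words for Java JAR is on the classpath. The class name `ProtectedWordToHtml`, the `convert` helper, the file names, and the password string are placeholders; `LoadOptions` and its `setPassword` method are the library calls the steps refer to.

```java
import com.aspose.words.Document;
import com.aspose.words.LoadOptions;

class ProtectedWordToHtml {

    // Opens an encrypted Word file with the given password and saves it as HTML.
    static void convert(String inputPath, String password, String outputPath) throws Exception {
        // Steps 1 and 2: create the load options and set the password.
        LoadOptions loadOptions = new LoadOptions();
        loadOptions.setPassword(password);

        // Step 3: load the encrypted document using those options.
        Document doc = new Document(inputPath, loadOptions);

        // Step 4: save as HTML (format inferred from the ".html" extension).
        doc.save(outputPath);
    }

    public static void main(String[] args) throws Exception {
        // Placeholder values; replace them with the real path and password.
        convert("encrypted.docx", "your-password", "output.html");
    }
}
```

In a real application, the password would normally come from user input or a secure store rather than a string literal.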

Convert DOC to MHTML in Java

An MHTML file is a single file that bundles a web page together with its embedded content and media. You can convert Word files (DOC/DOCX) to MHTML with the following steps:

  1. Load the input DOCX file.
  2. Save the output MHTML file using SaveFormat.MHTML.

The code snippet below follows these steps and shows how to convert DOCX to MHTML in Java:
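A minimal sketch of the two steps, assuming the Aspose.Words for Java JAR is on the classpath. The class name `WordToMhtml`, the `convert` helper, and the file names are hypothetical; `SaveFormat.MHTML` is the library constant named in the steps.

```java
import com.aspose.words.Document;
import com.aspose.words.SaveFormat;

class WordToMhtml {

    // Loads a Word file and saves it as a single MHTML file.
    static void convert(String inputPath, String outputPath) throws Exception {
        Document doc = new Document(inputPath);

        // The format is passed explicitly, so it does not depend on the file extension.
        doc.save(outputPath, SaveFormat.MHTML);
    }

    public static void main(String[] args) throws Exception {
        // Placeholder file names; replace them with real paths.
        convert("input.docx", "output.mht");
    }
}
```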

Conclusion

Converting Word documents to HTML is a common task for many Java developers. Aspose.Words for Java simplifies this process by providing a comprehensive and reliable solution. By following the steps and code examples in this blog, you can convert your Word documents to HTML and integrate them into your Java projects with ease.

If you face any problems while setting up or testing the API, you can get in touch with us via the Free Support Forums.

See Also