Word Document (DOC/DOCX) to HTML Conversion using Java

The Microsoft Word file formats DOC and DOCX are popular because the word processor offers a rich set of features for organizing and presenting information. HTML, in turn, is the natural format for showing information in web applications. This article explains how to convert Word files (DOC/DOCX) to HTML using Java. It covers the following use cases: installing the converter API, converting Word to HTML, converting DOCX to HTML5, converting a password-protected Word file to HTML, and converting Word to MHTML.

Java DOCX to HTML Converter – Installation

To begin, configure the Aspose.Words for Java API in your application. You can download the JAR file from the Releases section, where the APIs are updated almost every month. All Java APIs offered by Aspose are also hosted in the Aspose Maven repository, so you can add the Aspose.Words for Java dependency to your Maven project with the following configuration:

Repository

<repositories>
    <repository>
        <id>AsposeJavaAPI</id>
        <name>Aspose Java API</name>
        <url>https://repository.aspose.com/repo/</url>
    </repository>
</repositories>

Dependency

<dependencies>
    <dependency>
        <groupId>com.aspose</groupId>
        <artifactId>aspose-words</artifactId>
        <version>20.6</version>
        <classifier>jdk17</classifier>
    </dependency>
    <dependency>
        <groupId>com.aspose</groupId>
        <artifactId>aspose-words</artifactId>
        <version>20.6</version>
        <classifier>javadoc</classifier>
    </dependency>
</dependencies>

Now we are all set for DOCX to HTML conversion in a Java application.

Convert Word (DOC/DOCX) to HTML using Java

You can convert Word to HTML by following the steps below:

  1. Load source Word file with DOC or DOCX extension
  2. Save the file as output HTML

The code sample below shows how to convert DOCX to HTML using Java:
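A minimal sketch of these two steps, assuming the Aspose.Words for Java dependency configured above is on the classpath. The class name `WordToHtml` and the file paths passed on the command line are placeholders chosen for illustration, not part of the API.

```java
import com.aspose.words.Document;
import com.aspose.words.SaveFormat;

public class WordToHtml {

    /** Loads a DOC or DOCX file and writes it out as HTML. */
    public static void convert(String inputPath, String outputPath) throws Exception {
        // Step 1: load the source file; the format (DOC or DOCX) is detected on load.
        Document doc = new Document(inputPath);

        // Step 2: save as HTML, naming the format explicitly instead of
        // relying on the output file extension.
        doc.save(outputPath, SaveFormat.HTML);
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical usage: java WordToHtml input.docx output.html
        if (args.length < 2) {
            System.err.println("usage: WordToHtml <input.doc|docx> <output.html>");
            return;
        }
        convert(args[0], args[1]);
    }
}
```

With default settings, images in the document are typically written as separate files next to the output HTML, so it can help to save into a dedicated folder.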

Input DOCX file preview (screenshot: Word to HTML in Java)

Output HTML file preview (screenshot: DOCX to HTML in Java)

These screenshots show the high fidelity of the document rendering. The API can convert text, images, tables, and much more.

Convert DOCX to HTML5 using Java

HTML5 is the latest version of HTML. We have noted repeated requests for HTML5 support in the Aspose.Words API. DOCX to HTML5 conversion is therefore supported, and you can convert files by following these steps:

  1. Load the input DOCX file
  2. Create an HtmlSaveOptions object, specifying the SaveFormat
  3. Set the HTML version to the enumeration value HtmlVersion.HTML_5
  4. Save the output file

The code snippet below shows how to convert DOCX to HTML5 in Java:
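The steps above can be sketched as follows, again assuming Aspose.Words for Java is on the classpath. The class name `WordToHtml5` and the command-line paths are illustrative placeholders.

```java
import com.aspose.words.Document;
import com.aspose.words.HtmlSaveOptions;
import com.aspose.words.HtmlVersion;
import com.aspose.words.SaveFormat;

public class WordToHtml5 {

    /** Converts a Word file to HTML and requests HTML5 markup. */
    public static void convert(String inputPath, String outputPath) throws Exception {
        // Step 1: load the input DOCX file.
        Document doc = new Document(inputPath);

        // Step 2: create the save options for the HTML format.
        HtmlSaveOptions options = new HtmlSaveOptions(SaveFormat.HTML);

        // Step 3: ask for HTML5 output.
        options.setHtmlVersion(HtmlVersion.HTML_5);

        // Step 4: save the output file with these options.
        doc.save(outputPath, options);
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical usage: java WordToHtml5 input.docx output.html
        if (args.length < 2) {
            System.err.println("usage: WordToHtml5 <input.docx> <output.html>");
            return;
        }
        convert(args[0], args[1]);
    }
}
```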

Convert Password-Protected Word file to HTML using Java

DOC and DOCX files are sometimes encrypted with a password. You can convert such files to HTML as well, but you need to supply the password when loading the Word file. Follow the steps below for DOCX to HTML conversion:

  1. Firstly, initialize an object of LoadOptions class
  2. Set the password
  3. Load the encrypted DOCX file
  4. Convert DOCX to HTML

Likewise, the following code sample shows how to convert password protected DOCX file to HTML using Java:
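A minimal sketch of these steps, assuming Aspose.Words for Java is on the classpath. The class name `ProtectedWordToHtml`, the command-line paths, and the password argument are placeholders for illustration.

```java
import com.aspose.words.Document;
import com.aspose.words.LoadOptions;
import com.aspose.words.SaveFormat;

public class ProtectedWordToHtml {

    /** Opens an encrypted Word file with the given password and saves it as HTML. */
    public static void convert(String inputPath, String password, String outputPath)
            throws Exception {
        // Step 1: initialize the load options.
        LoadOptions loadOptions = new LoadOptions();

        // Step 2: set the password needed to decrypt the document.
        loadOptions.setPassword(password);

        // Step 3: load the encrypted file; loading fails if the password is wrong.
        Document doc = new Document(inputPath, loadOptions);

        // Step 4: convert to HTML.
        doc.save(outputPath, SaveFormat.HTML);
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical usage: java ProtectedWordToHtml input.docx <password> output.html
        if (args.length < 3) {
            System.err.println("usage: ProtectedWordToHtml <input> <password> <output.html>");
            return;
        }
        convert(args[0], args[1], args[2]);
    }
}
```

Passing the password in as a parameter, rather than hard-coding it, keeps it out of the source code.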

Convert Word to MHTML using Java

An MHTML file is a single file that bundles the page together with its embedded content and media. You can convert Word files (DOC/DOCX) to MHTML with the following steps:

  1. Load input DOCX file
  2. Save output MHTML file using SaveFormat.MHTML

The code snippet below is based on these steps and shows how to convert DOCX to MHTML with Java:
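A minimal sketch of the two steps, assuming Aspose.Words for Java is on the classpath. The class name `WordToMhtml` and the command-line paths are illustrative placeholders.

```java
import com.aspose.words.Document;
import com.aspose.words.SaveFormat;

public class WordToMhtml {

    /** Loads a Word file and saves it as a single MHTML archive. */
    public static void convert(String inputPath, String outputPath) throws Exception {
        // Step 1: load the input DOCX (or DOC) file.
        Document doc = new Document(inputPath);

        // Step 2: save as MHTML; images and other resources are embedded in the one file.
        doc.save(outputPath, SaveFormat.MHTML);
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical usage: java WordToMhtml input.docx output.mht
        if (args.length < 2) {
            System.err.println("usage: WordToMhtml <input.doc|docx> <output.mht>");
            return;
        }
        convert(args[0], args[1]);
    }
}
```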

Conclusion

In conclusion, we have learned how to convert Word documents without needing Microsoft Word, for example from DOCX to HTML, HTML5, or MHTML, depending on your requirements. We have also seen from the screenshots that the conversion preserves the document with high fidelity across the file formats. You can try the API in your own Java environment. If you face any problem while setting up or testing the API, you can get in touch with us via the Free Support Forums.

See Also

Mail Merge in Word Documents