HTML to Text Java

HTML pages are used across the web and can combine images, drawings, and text to present information. Sometimes you only need the text, for example to export the readable content of a page to a plain TXT file. This article covers how to convert HTML to text programmatically in Java.

HTML to TXT Converter – Java API Installation

The Aspose.HTML for Java API supports HTML, MHTML, Text, and many other file formats. You can set it up by downloading its JAR file from the New Releases section, or by adding the following Maven configuration to your pom.xml to pull it from the Aspose Repository:

Repository:

<repositories>
    <repository>
        <id>snapshots</id>
        <name>repo</name>
        <url>http://repository.aspose.com/repo/</url>
    </repository>
</repositories>

Dependency:

<dependencies>
    <dependency>
        <groupId>com.aspose</groupId>
        <artifactId>aspose-html</artifactId>
        <version>22.7</version>
        <classifier>jdk17</classifier>
    </dependency>
</dependencies>

How to Convert HTML to TXT in Java

You can convert an HTML webpage to a TXT file with the following steps:

  1. Access the source HTML webpage.
  2. Specify the required properties for conversion.
  3. Convert the HTML file to TXT format.
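
To make the three steps concrete without relying on any particular library, here is a minimal sketch that uses only the HTML parser bundled with the JDK (javax.swing.text.html). It is not the Aspose.HTML API and is far less capable than a full converter; the class name PlainTextSketch and the Options holder are hypothetical names introduced purely for illustration.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class PlainTextSketch {

    /** Hypothetical option holder standing in for step 2 (conversion properties). */
    public static final class Options {
        public String lineBreak = "\n";
    }

    /** Step 3: walk the parsed HTML and keep only the text content. */
    public static String toText(Reader html, Options options) throws IOException {
        StringBuilder out = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                out.append(data); // text between tags
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                // End of a block-level element such as <p> or <h1>: start a new line.
                if (tag.isBlock()) {
                    out.append(options.lineBreak);
                }
            }

            @Override
            public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                // Empty elements such as <br> also break the line.
                if (tag == HTML.Tag.BR) {
                    out.append(options.lineBreak);
                }
            }
        };
        // 'true' tells the parser to ignore any charset declared inside the document.
        new ParserDelegator().parse(html, callback, true);
        return out.toString().trim();
    }

    public static void main(String[] args) throws IOException {
        // Step 1: access the source HTML (an inline string here; a FileReader works the same way).
        String html = "<html><body><h1>Title</h1><p>Hello <b>world</b></p></body></html>";
        // Step 2: specify the conversion properties.
        Options options = new Options();
        // Step 3: convert and print the plain text.
        System.out.println(toText(new StringReader(html), options));
    }
}
```

The sketch only strips markup and inserts line breaks after block elements. Layout, lists, tables, and linked resources are where a dedicated library earns its keep.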

The next section further elaborates on the conversion process.

Convert HTML to Text Programmatically in Java

The following steps show how to convert HTML to Text programmatically in Java:

  1. Load the input HTML file with the HTMLDocument class.
  2. Create an object of the TextSaveOptions class.
  3. Convert the HTML to a Text file.

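In code, these steps come down to three statements: construct an HTMLDocument from the input file, create a TextSaveOptions instance, and pass both to the API's conversion call together with the output path.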
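
A minimal sketch of these steps is given below. It requires the aspose-html dependency configured earlier, and it assumes the Converter.convertHTML overload that takes an HTMLDocument, a TextSaveOptions object, and an output path; check the API reference for the version you install. The class name HtmlToTextExample and the file names input.html and output.txt are placeholders.

```java
import com.aspose.html.HTMLDocument;
import com.aspose.html.converters.Converter;
import com.aspose.html.saving.TextSaveOptions;

public class HtmlToTextExample {

    // Converts the HTML file at inputPath to a plain-text file at outputPath.
    public static void convert(String inputPath, String outputPath) throws Exception {
        // Step 1: load the input HTML file.
        HTMLDocument document = new HTMLDocument(inputPath);
        try {
            // Step 2: create the save options (defaults are used here).
            TextSaveOptions options = new TextSaveOptions();

            // Step 3: convert the document and write the text file.
            // Assumed overload: convertHTML(HTMLDocument, TextSaveOptions, String).
            Converter.convertHTML(document, options, outputPath);
        } finally {
            // Release the resources held by the document.
            document.dispose();
        }
    }

    public static void main(String[] args) throws Exception {
        // Placeholder file names; replace them with your own paths.
        convert("input.html", "output.txt");
    }
}
```

Disposing of the document in a finally block releases its resources even if the conversion throws.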

Explore Aspose.HTML for Java

You can explore the API's other features in the documentation, which is organized into sections and chapters that explain the methods and properties exposed by its classes.

Conclusion

In this article, you have learned how to convert HTML to Text programmatically in Java. This is useful when you need to export the text content of an HTML page to a plain TXT file. If you have any questions, feel free to get in touch with us via the forum.
