Extract Images from Word Documents using Java

Word documents often use images to convey important information and to make the text more appealing. In some cases, you may need to extract those embedded images programmatically. This article shows how to extract images from a DOC file in Java.

Java API to Extract Images from DOC Files

Aspose.Words for Java is an API for creating, manipulating, and converting MS Word documents, and this article uses it to extract images from DOC files. You can download the API’s JAR or add it to your Java application using the following Maven configuration.

<repository>
    <id>AsposeJavaAPI</id>
    <name>Aspose Java API</name>
    <url>https://repository.aspose.com/repo/</url>
</repository>
<dependency>
    <groupId>com.aspose</groupId>
    <artifactId>aspose-words</artifactId>
    <version>21.11</version>
    <type>pom</type>
</dependency>

How to Extract Images from DOC in Java

Images in a DOC document are stored in shape objects, so retrieving them means visiting every shape in the document. In outline: load the document, collect its shape nodes, check whether each shape holds an image, and save the image data of those that do to a file.

The following code sample shows how to extract images from a DOC document in Java.
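One way to write such a sample is sketched below. It assumes the Aspose.Words for Java classes `Document`, `NodeType`, `Shape`, and `FileFormatUtil` are on the classpath through the Maven dependency above. The class name `DocImageExtractor`, the input file `input.doc`, and the output folder `extracted-images` are hypothetical names chosen for illustration.

```java
import com.aspose.words.Document;
import com.aspose.words.FileFormatUtil;
import com.aspose.words.Node;
import com.aspose.words.NodeType;
import com.aspose.words.Shape;

import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper class; the name is not part of Aspose.Words.
public class DocImageExtractor {

    // Saves every image found in the document into outputDir
    // and returns the number of images written.
    public static int extractImages(String inputFile, Path outputDir) throws Exception {
        Files.createDirectories(outputDir);

        // Load the Word document (DOC or DOCX).
        Document doc = new Document(inputFile);

        int imageIndex = 0;
        // Collect all shape nodes, including those nested deep in the tree.
        for (Node node : doc.getChildNodes(NodeType.SHAPE, true).toArray()) {
            Shape shape = (Shape) node;

            // Skip shapes that do not carry image data.
            if (!shape.hasImage()) {
                continue;
            }

            // Derive the file extension from the stored image type.
            String extension = FileFormatUtil.imageTypeToExtension(
                    shape.getImageData().getImageType());

            // Write the raw image data to a numbered file.
            Path target = outputDir.resolve("image_" + imageIndex + extension);
            shape.getImageData().save(target.toString());
            imageIndex++;
        }
        return imageIndex;
    }

    public static void main(String[] args) throws Exception {
        // "input.doc" and "extracted-images" are placeholder paths.
        int count = extractImages("input.doc", Paths.get("extracted-images"));
        System.out.println("Extracted " + count + " image(s)");
    }
}
```

Each image is saved in the format stored in the document, so no re-encoding takes place. Numbering the files by index avoids name collisions when a document contains several images of the same type.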

Java DOC Image Extractor - Get a Free License

Get a free temporary license to use Aspose.Words for Java without evaluation limitations.

Conclusion

In this article, you have learned how to extract images from a DOC document in Java and save them to the desired location. Aspose.Words for Java also provides a wide range of other features for document manipulation, which you can explore in the documentation. If you have questions, you can ask them via our forum.

See Also

Info: You may also be interested in another Java API, Aspose.Slides for Java, which allows you to convert presentations (to PDF, Word documents, etc.) and import images or other documents into presentations.