Extract Images from Word Documents in Python

A picture is worth a thousand words, which is why images are an integral part of many documents, Word documents included. They make the content more attractive and easier to follow. When parsing Word documents, you may need to extract the embedded images. This article covers how to extract images from Word DOC and DOCX files programmatically in Python.

Python Library to Extract Images from Word DOC and DOCX Documents

Aspose.Words for Python is a powerful and feature-rich library that is used to create and manipulate Word documents. We will use this library to extract images from DOCX or DOC files. You can install it in your Python applications from PyPI using the following pip command.

pip install aspose-words

Extracting Images from Word DOC in Python

Images in Word documents are represented by shape nodes. Therefore, to retrieve the images from a document, you have to go through its shapes. The following steps show how to extract images from a Word DOC or DOCX file in Python.

  • First, load the Word document using the Document class.
  • Then, retrieve all the shapes into a collection using the Document.get_child_nodes(NodeType.SHAPE, True) method.
  • Loop through the shapes and, for each one, perform the following operations:
    • Cast the node to the Shape type using the as_shape() method.
    • Check whether the shape contains an image using the Shape.has_image property.
    • Save the image to a file using the Shape.image_data.save(string) method.

The following code sample shows how to extract images from a Word DOCX document in Python.

API to Extract Images from DOC and DOCX - Get a Free API License

You can get a temporary license to use Aspose.Words for Python without evaluation limitations.

Conclusion

Images are commonly used in Word documents to make the content more appealing, and they often need to be extracted along with the text. In this article, you have learned how to extract images from Word DOC and DOCX documents in Python. You can also explore the documentation of Aspose.Words for Python. If you have any questions, feel free to let us know via our forum.
