MS Word DOC and DOCX formats are commonly used to create rich text documents. You can add text, tables, graphics, animations, and various other elements to DOC/DOCX documents. However, in some cases, for example when you want to parse and analyze the text in Word documents, you need to convert DOC/DOCX files to TXT format programmatically. This article shows how to convert a DOC or DOCX file to TXT format in Python.
Python DOC/DOCX to TXT Converter Library
To save DOC and DOCX files in TXT format, we will use Aspose.Words for Python. It is a fast library that provides a wide range of features for creating and manipulating text documents. It also offers high-quality conversion of documents to other formats. You can install the library from PyPI using the following pip command.
> pip install aspose-words
Convert DOCX to TXT in Python
Let’s see how to convert a DOCX file to TXT in Python. For this, you only need to load the DOCX file and save it as a TXT file. The following are the steps to save a DOCX file in TXT format in Python.
- Load the DOCX file using the Document class.
- Save the document as TXT using the Document.save() method.
The following code sample shows how to perform DOCX to TXT conversion in Python.
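As a minimal sketch of the two steps above, assuming aspose-words is installed and that a file such as input.docx (a hypothetical name) exists in the working directory. The helper names txt_path_for and convert_to_txt are illustrative and are not part of the library. Document and Document.save() are the calls named in the steps, and the save format is passed explicitly as SaveFormat.TEXT rather than relying on the file extension.

```python
from pathlib import Path


def txt_path_for(source: str) -> str:
    """Return the source path with its last extension replaced by .txt."""
    return str(Path(source).with_suffix(".txt"))


def convert_to_txt(source: str) -> str:
    """Load a DOC/DOCX file, save it as plain text, and return the output path."""
    # Imported inside the function so that txt_path_for stays usable
    # even on machines where aspose-words is not installed.
    import aspose.words as aw

    doc = aw.Document(source)             # step 1: load the Word document
    target = txt_path_for(source)
    doc.save(target, aw.SaveFormat.TEXT)  # step 2: save as plain text
    return target
```

Calling convert_to_txt("input.docx") would write input.txt next to the source file. The same function should also accept a .doc path, since the Document class loads both formats.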
Get a Free License
You can get a free temporary license to use Aspose.Words for Python without evaluation limitations.
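Once you have a license file, it is typically applied before any documents are loaded. The sketch below assumes the file is saved locally under a hypothetical name such as Aspose.Words.lic. The helper apply_license_if_present is illustrative and not part of the library. It relies on the library's License class and its set_license() method.

```python
import os


def apply_license_if_present(license_path: str) -> bool:
    """Apply an Aspose.Words license file if it exists; return True when applied."""
    if not os.path.exists(license_path):
        # No license file found: nothing is applied and the library
        # continues to run in evaluation mode.
        return False

    # Imported here so the existence check works without the library installed.
    import aspose.words as aw

    aw.License().set_license(license_path)
    return True
```

A call such as apply_license_if_present("Aspose.Words.lic") at program start keeps the script usable in both licensed and evaluation setups.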
In this article, you have learned how to convert DOC or DOCX files to TXT format in Python. This lets you extract the text from Word documents and save it as a plain TXT file, which makes the text easier to analyze. You can learn more about the library in the documentation, and you can post your questions on our forum.