Text is often extracted from documents for further processing, such as text analysis or classification. Alongside PDF and Word documents, PowerPoint files are a common source for text extraction. This article shows how to extract text from PowerPoint PPT in Python, either from a specific slide or from the whole presentation.
- Python Library to Extract Text from PowerPoint PPT
- Extract Text from PowerPoint PPT
Python Library to Extract Text from PowerPoint PPT
To extract text from PowerPoint PPT, we will use Aspose.Slides for Python via .NET. It is a feature-rich Python library to create and update PowerPoint presentations. Furthermore, it allows you to manipulate and convert the presentations seamlessly. You can install this library from PyPI using the following pip command.
> pip install aspose.slides
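To confirm that the installation succeeded, you can ask Python's standard library which version of the package is present. This is only a minimal sketch: `installed_version` is a hypothetical helper name introduced here, not part of Aspose.Slides, and it assumes Python 3.8 or later for `importlib.metadata`.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


# Prints the version if the pip command above succeeded, otherwise None.
print(installed_version("aspose.slides"))
```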
Extract Text from PowerPoint PPT in Python
Depending on the scenario, you may need to extract text from the whole PowerPoint presentation or only from specific slides. The following sections demonstrate text extraction in both cases.
Python: Extract Text from a Specific PPT Slide
The following are the steps to extract text from a specific slide in PPT in Python.
- First, call the PresentationFactory().get_presentation_text(string, TextExtractionArrangingMode) method to get all types of text in the presentation.
- Then, use the slide's index to read the text of that specific slide from the slides_text array.
The following are the types of text you can extract:
- Slide layout text
- Slide master text
The following code sample shows how to extract text from a specific PPT slide in Python.
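The steps above can be sketched as follows. Treat this as an illustration under stated assumptions rather than the definitive sample: the helper names `collect_slide_text` and `extract_slide_text` are hypothetical, the attribute names `text`, `layout_text`, `master_text` and `notes_text` are assumed from the library's slide text interface, and the unarranged extraction mode is an arbitrary choice.

```python
def collect_slide_text(slide_text):
    """Gather the text categories exposed by one slides_text entry."""
    return {
        "text": slide_text.text,                # text on the slide itself (assumed attribute)
        "layout_text": slide_text.layout_text,  # text from the slide's layout
        "master_text": slide_text.master_text,  # text from the slide's master
        "notes_text": slide_text.notes_text,    # speaker notes (assumed attribute)
    }


def extract_slide_text(path, slide_index):
    """Extract all text categories for a single slide (zero-based index)."""
    # Imported here so collect_slide_text can be used without the library installed.
    import aspose.slides as slides

    # Get every type of text in the presentation in one call.
    presentation_text = slides.PresentationFactory().get_presentation_text(
        path, slides.TextExtractionArrangingMode.UNARRANGED
    )
    # Pick the requested slide from the slides_text array by index.
    return collect_slide_text(presentation_text.slides_text[slide_index])
```

For example, `extract_slide_text("presentation.pptx", 0)` would return the text of the first slide, where `presentation.pptx` is a placeholder file name.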
Python Text Extraction from Whole PowerPoint PPT
The following steps demonstrate how to extract text from all the slides of a PowerPoint presentation.
- First, call the PresentationFactory().get_presentation_text(string, TextExtractionArrangingMode) method to get all types of text in the presentation.
- Then, load the presentation into a Presentation object to find out how many slides it contains.
- Iterate over the slide indices, from the first slide to the last.
- Extract the text of each slide from the slides_text array.
The following code sample shows how to extract text from a PPTX (or PPT) file in Python.
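A minimal sketch of these steps is shown below. It is written under a few assumptions: `format_slides` and `extract_presentation_text` are hypothetical helper names, the `--- Slide N ---` separator is purely illustrative, the slide count is read from the `length` property of the slide collection, and only the slide's own `text` is collected (layout, master, and notes text could be read from each entry in the same way).

```python
def format_slides(slide_texts):
    """Join the text of each slide under a numbered separator line."""
    sections = []
    for number, slide_text in enumerate(slide_texts, start=1):
        sections.append(f"--- Slide {number} ---\n{slide_text.text}")
    return "\n".join(sections)


def extract_presentation_text(path):
    """Extract the text of every slide in the presentation at `path`."""
    # Imported here so format_slides can be used without the library installed.
    import aspose.slides as slides

    # Get every type of text in the presentation in one call.
    presentation_text = slides.PresentationFactory().get_presentation_text(
        path, slides.TextExtractionArrangingMode.UNARRANGED
    )
    # Load the presentation only to learn how many slides it has.
    with slides.Presentation(path) as presentation:
        slide_count = presentation.slides.length

    # Read each slide's entry from the slides_text array by index.
    return format_slides(
        [presentation_text.slides_text[index] for index in range(slide_count)]
    )
```

Calling `print(extract_presentation_text("presentation.pptx"))` would then print the text of each slide in order, with `presentation.pptx` standing in for your own file.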
Python PPT Text Extraction Library - Get a Free License
You can use Aspose.Slides for Python without evaluation limitations by getting a temporary license.
In this article, you have learned how to extract text from PowerPoint PPT in Python, both from a specific slide and from all the slides in a presentation. You can explore other features of Aspose.Slides for Python in the documentation, and you can share your questions with us via our forum.