PDF is a popular format for sharing documents between organizations and individuals. Sometimes you need to find and replace text in a PDF document before sharing it. You can do this manually, but that is slow and error-prone, especially across many pages or files. A faster and repeatable option is to do it programmatically. In this article, you will learn how to find and replace text in PDF files using C++.
- C++ API to Find and Replace Text in PDF files
- Find and Replace Text in PDF using C++
- C++ Find and Replace Text in a Specific PDF Page
- Replace Text in PDF Page Region using C++
- Find and Replace Text in PDF files using Regular Expressions
- Get a Free License
C++ API to Find and Replace Text in PDF files
Aspose.PDF for C++ is a C++ library for working with PDF files. It provides a wide range of features for automating PDF workflows, including finding and replacing text. You can either install the API through NuGet or download it directly from the downloads section.
PM> Install-Package Aspose.PDF.Cpp
Find and Replace Text in PDF using C++
Aspose.PDF for C++ provides the TextFragmentAbsorber class for searching text in PDF documents. You initialize this class with the text you want to find and use it to retrieve all the matching text fragments. Once all the fragments are available, you loop over them and replace the text. The following are the steps to find and replace text in PDF files using C++.
- Load the PDF file using the Document class.
- Create an instance of the TextFragmentAbsorber class and initialize it with the text that you want to find in the PDF file.
- Accept the TextFragmentAbsorber for all pages using the Document->get_Pages()->Accept (System::SharedPtr<Text::TextFragmentAbsorber> visitor) method.
- Retrieve all text occurrences using the TextFragmentAbsorber->get_TextFragments() method.
- Loop through the TextFragmentCollection and update the text using the TextFragment->set_Text (System::String value) method.
- Save the updated PDF file using the Document->Save (System::String outputFileName) method.
The following is the sample code to find and replace text in the whole PDF file using C++.
C++ Find and Replace Text in a Specific PDF Page
There might be situations where you only want to find and replace text on a specific page rather than in the whole document. In this case, accept the TextFragmentAbsorber on that page only, instead of on the document's entire page collection. The following are the steps to find and replace text on a particular page in the PDF document.
- Load the PDF file using the Document class.
- Create an instance of the TextFragmentAbsorber class and initialize it with the text that you want to find in the PDF file.
- Accept the TextFragmentAbsorber for the particular page using the Document->get_Pages()->idx_get (int32_t index)->Accept (System::SharedPtr<Text::TextFragmentAbsorber> visitor) method. Page indices start at 1.
- Retrieve all text occurrences using the TextFragmentAbsorber->get_TextFragments() method.
- Loop through the TextFragmentCollection and update the text using the TextFragment->set_Text (System::String value) method.
- Save the updated PDF file using the Document->Save (System::String outputFileName) method.
The following is the sample code to find and replace text on a specific PDF page using C++.
Replace Text in PDF Page Region using C++
Instead of searching the whole page, you can restrict the search to a specific region of the page. The region is defined with the Rectangle class and assigned to the absorber's text search options. The following are the steps to find and replace text in a specific part of the PDF page.
- Load the PDF file using the Document class.
- Create an instance of the TextFragmentAbsorber class and initialize it with the text that you want to find and replace in the PDF file.
- Set the page region to search using the TextFragmentAbsorber->get_TextSearchOptions()->set_Rectangle (System::SharedPtr<Aspose::Pdf::Rectangle> value) method.
- Accept the TextFragmentAbsorber for the particular page using the Document->get_Pages()->idx_get (int32_t index)->Accept (System::SharedPtr<Text::TextFragmentAbsorber> visitor) method. Page indices start at 1.
- Retrieve all text occurrences using the TextFragmentAbsorber->get_TextFragments() method.
- Loop through the TextFragmentCollection and update the text using the TextFragment->set_Text (System::String value) method.
- Save the updated PDF file using the Document->Save (System::String outputFileName) method.
The following is the sample code to find and replace text in a specific PDF page region.
Find and Replace Text in PDF files using Regular Expressions
Aspose.PDF for C++ also lets you search text using regular expressions, so you can find patterns such as email addresses or phone numbers. To do this, pass a regular expression instead of a plain search string, and use the TextSearchOptions class to indicate that the search phrase is a regular expression. The following are the steps to find and replace text in PDF files using a regular expression.
- Load the PDF file using the Document class.
- Create an instance of the TextFragmentAbsorber class and initialize it with the regular expression you want to use.
- Create a TextSearchOptions object, passing true to its constructor to enable regular-expression search.
- Assign the TextSearchOptions object to the TextFragmentAbsorber using the TextFragmentAbsorber->set_TextSearchOptions (System::SharedPtr<Aspose::Pdf::Text::TextSearchOptions> value) method.
- Accept the TextFragmentAbsorber for all pages using the Document->get_Pages()->Accept (System::SharedPtr<Text::TextFragmentAbsorber> visitor) method.
- Retrieve all text occurrences using the TextFragmentAbsorber->get_TextFragments() method.
- Loop through the TextFragmentCollection and update the text using the TextFragment->set_Text (System::String value) method.
- Save the updated PDF file using the Document->Save (System::String outputFileName) method.
The following is the sample code to find and replace text in PDF files using a regular expression.
Get a Free License
You can try the API without evaluation limitations by requesting a free temporary license.
Conclusion
In this article, you have learned how to find and replace text in PDF files using C++. You have seen how to replace text in the whole PDF document, on a specific page, or within a particular region of a page. You have also learned how to search and replace text using a regular expression. Aspose.PDF for C++ offers many additional features for working with PDF documents, which you can explore in detail in the official documentation. If you have any questions, please feel free to contact us on the forum.