Extract Images from PDF in C#

The PDF format is widely used for read-only documents intended for sharing and printing. PDF documents often contain images alongside text, and in some cases you may need to extract those images while parsing a PDF. This article shows how to extract images from a PDF programmatically in C# .NET.

C# .NET API to Extract Images from PDF - Free Download

To extract images from a PDF, we will use Aspose.PDF for .NET. It is an API for generating and manipulating PDF documents, and it also lets you parse a PDF and extract its images. You can either download the API or install it using NuGet.

PM> Install-Package Aspose.PDF

Extracting Images from a PDF in C#

The following are the steps to extract images from a PDF in C#.

  • Load the document using the Document class.
  • Loop through the pages of the PDF document using Document.Pages collection.
  • For each page, access every XImage in the Page.Resources.Images collection.
  • Create a FileStream object for each image and save the image to it as JPEG, PNG, etc.
  • Finally, close the FileStream once the image has been written.

The following code sample shows how to extract images from the PDF.
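
A minimal sketch of the steps above. The input path `input.pdf` and the output names `image_N.jpg` are placeholders, and the sketch assumes a build of Aspose.PDF for .NET in which `XImage.Save` accepts a `System.Drawing.Imaging.ImageFormat`:

```csharp
using System.IO;
using Aspose.Pdf;

class ExtractImagesFromPdf
{
    static void Main()
    {
        // Load the PDF ("input.pdf" is a placeholder path).
        using (Document pdfDocument = new Document("input.pdf"))
        {
            int imageIndex = 1;

            // Loop through every page of the document.
            foreach (Page page in pdfDocument.Pages)
            {
                // Access every XImage in the page's resources.
                foreach (XImage image in page.Resources.Images)
                {
                    // Placeholder naming scheme: image_1.jpg, image_2.jpg, ...
                    string outputPath = $"image_{imageIndex}.jpg";

                    // The using block closes the FileStream after the image is written.
                    using (FileStream output = new FileStream(outputPath, FileMode.Create))
                    {
                        // Assumed overload: Save(Stream, System.Drawing.Imaging.ImageFormat).
                        image.Save(output, System.Drawing.Imaging.ImageFormat.Jpeg);
                    }

                    imageIndex++;
                }
            }
        }
    }
}
```

To save as PNG instead, you would pass `System.Drawing.Imaging.ImageFormat.Png` and change the file extension accordingly.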

Get a Free License

You can use Aspose.PDF for .NET without evaluation limitations using a temporary license.
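
As a rough sketch, a license file is typically applied once at application startup, before any other API calls. The file name `Aspose.PDF.lic` is a placeholder for wherever your license file is stored:

```csharp
using Aspose.Pdf;

class ApplyLicense
{
    static void Main()
    {
        // Apply the license before using other Aspose.PDF classes.
        // "Aspose.PDF.lic" is a placeholder path to your license file.
        License license = new License();
        license.SetLicense("Aspose.PDF.lic");
    }
}
```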

Conclusion

Parsing PDF files and extracting their text or images is required in many scenarios. In this article, you have learned how to extract images from PDF files programmatically in C#. You can explore the C# PDF API further in the documentation, and you can post your questions on our forum.
