{{< figure align=center src="images/Convert-HTML-Text.png" alt="Convert Extract HTML Text">}}
HTML is a markup language widely used in websites and web applications. HTML content is organized with tags. Using C#, you can easily convert HTML to plain text by ignoring opening and closing tags. You can also skip CSS, JavaScript, or any other sections as needed. This process extracts text from an HTML document. Below are the sections you can explore:
- HTML to Plain Text Converter in C#
- Convert HTML to Text File using INodeIterator in C#
- Extract Text from HTML with Different Approaches using C#
- Convert URL Webpage HTML to Text using C#
HTML to Plain Text Converter in C#
You can convert HTML to plain text with the Aspose.HTML for .NET API. You can add it to a .NET Framework-based project using the following NuGet installation command:
PM> Install-Package Aspose.Html
Convert HTML to Text File using INodeIterator in C#
Aspose.HTML for .NET uses the DOM, allowing you to iterate nodes with the INodeIterator interface. You can define a NodeFilter to exclude style, script, or other elements and extract only the text. Follow these steps to convert HTML to a plain TXT file using C#:
- Read the input HTML file
- Define a NodeFilter that rejects style and script elements
- Create an INodeIterator instance with that filter
- Read each node value into a string
- Write the text content of the HTML to a TXT file
The code below shows how to convert HTML to a plain text file using C#:
{{< gist aspose-com-gists 1608e3c92e0a1838dac8a0c351567969 "NodeIterator.cs" >}}
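As a rough companion to the steps above, here is a minimal sketch of the node-iterator approach. The file names `input.html` and `output.txt` and the class name `StyleScriptFilter` are placeholders chosen for illustration, not names taken from the gist, so adjust them to your project.

```csharp
using System;
using System.IO;
using System.Text;
using Aspose.Html;
using Aspose.Html.Dom;
using Aspose.Html.Dom.Traversal;
using Aspose.Html.Dom.Traversal.Filters;

// Hypothetical filter name: rejects text nodes that sit inside <style> or <script>.
class StyleScriptFilter : NodeFilter
{
    public override short AcceptNode(Node n)
    {
        string parentTag = n.ParentElement?.TagName;
        bool skip =
            string.Equals(parentTag, "STYLE", StringComparison.OrdinalIgnoreCase) ||
            string.Equals(parentTag, "SCRIPT", StringComparison.OrdinalIgnoreCase);
        return skip ? FILTER_REJECT : FILTER_ACCEPT;
    }
}

class Program
{
    static void Main()
    {
        // Placeholder input path; replace with your own HTML file.
        string inputPath = Path.GetFullPath("input.html");

        using (var document = new HTMLDocument(inputPath))
        {
            // Visit text nodes only, skipping those rejected by the filter.
            INodeIterator iterator = document.CreateNodeIterator(
                document, NodeFilter.SHOW_TEXT, new StyleScriptFilter());

            var sb = new StringBuilder();
            Node node;
            while ((node = iterator.NextNode()) != null)
            {
                sb.Append(node.NodeValue);
            }

            // Placeholder output path.
            File.WriteAllText("output.txt", sb.ToString());
        }
    }
}
```

To exclude other elements, add their tag names to the comparison inside `AcceptNode`.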
Extract Text from HTML with Different Approaches using C#
We covered conversion with INodeIterator. You can also extract text using custom methods or the TextContent property. Follow these steps:
- Load input HTML document
- Define a custom method that visits the nodes of the document
- Check each NodeType to see if it's an element node or a text node
- Alternatively, get the text using the TextContent property
- Save output TXT file
The code snippet below demonstrates these two approaches for converting HTML to plain text in C#:
{{< gist aspose-com-gists 1608e3c92e0a1838dac8a0c351567969 "Additional.cs" >}}
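The two approaches described above might look roughly like the following sketch. The helper name `GetContent` and the file names are assumptions made for illustration. Note that neither approach filters anything out, so text inside style or script elements, if present, ends up in the output as well.

```csharp
using System.IO;
using System.Text;
using Aspose.Html;
using Aspose.Html.Dom;

class Program
{
    // Hypothetical helper: walks the child nodes recursively and
    // concatenates the value of every text node it finds.
    static string GetContent(Node node)
    {
        var sb = new StringBuilder();
        foreach (Node child in node.ChildNodes)
        {
            if (child.NodeType == Node.ELEMENT_NODE)
            {
                // Element node: descend into its children.
                sb.Append(GetContent(child));
            }
            else if (child.NodeType == Node.TEXT_NODE)
            {
                // Text node: keep its text.
                sb.Append(child.NodeValue);
            }
        }
        return sb.ToString();
    }

    static void Main()
    {
        // Placeholder input path; replace with your own HTML file.
        string inputPath = Path.GetFullPath("input.html");

        using (var document = new HTMLDocument(inputPath))
        {
            // Approach 1: custom recursive traversal starting at <body>.
            string viaTraversal = GetContent(document.Body);
            File.WriteAllText("traversal.txt", viaTraversal);

            // Approach 2: the DOM TextContent property of the root element.
            string viaProperty = document.DocumentElement.TextContent;
            File.WriteAllText("textcontent.txt", viaProperty);
        }
    }
}
```

The custom method is the more flexible of the two, since you can add conditions on the element name before descending, while `TextContent` is the shortest way to get everything at once.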
Convert URL Webpage HTML to Text using C#
We previously converted offline HTML files to text. Now you can convert a webpage directly from its URL without saving the file. Use C# to download the page and create a TXT file. For example, convert the Aspose.HTML for .NET product page to TXT using these steps:
- Initialize an HTMLDocument object and specify the URL
- Read the text content of the HTML document
- Write the text extracted from the webpage to a TXT file
The code below shows how to convert URL Webpage HTML to Text using C#:
{{< gist aspose-com-gists 1608e3c92e0a1838dac8a0c351567969 "WebpageHTMLtoTXT.cs" >}}
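A minimal sketch of the URL workflow follows. The URL and the output file name are assumptions for illustration; substitute the page you actually want to convert. The code needs network access at run time.

```csharp
using System.IO;
using Aspose.Html;

class Program
{
    static void Main()
    {
        // Assumed address of the product page; replace with any reachable URL.
        string pageUrl = "https://products.aspose.com/html/net/";

        // HTMLDocument loads the page directly from the address.
        using (var document = new HTMLDocument(pageUrl))
        {
            // TextContent returns the text of the root element and all its descendants.
            string text = document.DocumentElement.TextContent;

            // Placeholder output path.
            File.WriteAllText("webpage.txt", text);
        }
    }
}
```

Because `TextContent` does not filter anything, combine this with a node filter such as the one described in the INodeIterator section if the page contains inline scripts or styles you want to leave out.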
Conclusion
This article shows how to convert HTML files to plain text, extract text from HTML, and convert a webpage URL to a TXT file using C#. Use whichever approach suits your case, and contact us via the Free Support Forum if you need help.
See Also
Tip: You may be interested in a free Text to GIF Converter that allows you to generate animations from texts.