PDF to HTML – Convert PDF to HTML in Python

PDF files are widely used for sharing data and information because they preserve document formatting across platforms. However, viewing a PDF requires a dedicated viewer application, and if the fonts used in the document are not available on a given platform, the text may not render correctly. Rendering PDF files in HTML format is one way around both limitations. In this article, we discuss how to convert PDF to HTML in Python.

PDF Processing API

Aspose.PDF Cloud is our REST-based solution for creating, editing, and transforming PDF files to EPUB, PS, SVG, XLSX, PPTX, DOCX, HTML, and other supported document formats. These conversions can be performed with only a few lines of code. To further facilitate our customers, we have created programming language-specific SDKs that wrap the REST API, so you get all the PDF processing capabilities within the language of your choice. In the following section, we discuss PDF to HTML conversion using Aspose.PDF Cloud SDK for Python.

The first step in using the SDK is installation. It is available for download via pip and from the GitHub repository. Execute the following command on the terminal/command prompt to install the latest version of the SDK on the system.

pip install asposepdfcloud

In case you need to add the reference directly to your Python project within the Visual Studio IDE, search for asposepdfcloud as a package under the Python environment window. Follow the steps numbered in the image below to complete the installation process.

Image 1:- Aspose.PDF Cloud for Python package.

Convert PDF to HTML in Python

Please follow the instructions below to first upload the PDF file to cloud storage and then convert it to HTML format. The resultant file is stored in the same cloud storage.

  • First, create an instance of the ApiClient class, passing the Client ID and Client Secret as arguments
  • Secondly, create an object of PdfApi, passing the ApiClient object as an argument
  • Thirdly, specify the names of the input PDF and the resultant output file
  • Now call the put_pdf_in_storage_to_html(…) method of the PdfApi class to initiate the conversion. Upon successful conversion, the output is also stored in cloud storage
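
The steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the import paths, the order of the ApiClient constructor arguments, and the keyword names name and out_path are assumptions about the installed SDK version, and the helper names create_pdf_api and convert_pdf_to_html are hypothetical. Only put_pdf_in_storage_to_html is taken from the steps above.

```python
def create_pdf_api(client_id, client_secret):
    """Steps 1 and 2: build an ApiClient, then wrap it in a PdfApi object."""
    # Imported lazily so the sketch can be loaded without the SDK installed.
    # These import paths are assumptions; check the README of your SDK version.
    import asposepdfcloud
    from asposepdfcloud.apis.pdf_api import PdfApi

    # Assumption: the constructor takes the client secret first, then the client ID.
    api_client = asposepdfcloud.api_client.ApiClient(client_secret, client_id)
    return PdfApi(api_client)


def convert_pdf_to_html(pdf_api, input_pdf, output_html):
    """Steps 3 and 4: name the input and output files, then start the conversion."""
    # Assumption: the method accepts `name` and `out_path` keyword arguments.
    # The input PDF is expected to be in cloud storage already.
    return pdf_api.put_pdf_in_storage_to_html(name=input_pdf, out_path=output_html)
```

With the SDK installed, the object returned by create_pdf_api("<Client ID>", "<Client Secret>") can be passed to convert_pdf_to_html together with placeholder file names such as "input.pdf" and "converted.html". Keeping the conversion call separate from client creation also makes it easy to try the flow against a stand-in object before using real credentials.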

PDF to HTML using cURL Command

cURL commands provide an excellent mechanism for accessing REST APIs from the command line terminal, so we can also use them to access the Aspose.PDF Cloud API. Before triggering the conversion operation, we need to generate a JWT access token based on the client credentials. Execute the following command to generate the JWT token.

curl -v "https://api.aspose.cloud/connect/token" \
-X POST \
-d "grant_type=client_credentials&client_id=88d1cda8-b12c-4a80-b1ad-c85ac483c5c5&client_secret=406b404b2df649611e508bbcfcd2a77f" \
-H "Content-Type: application/x-www-form-urlencoded" \
-H "Accept: application/json"
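
For readers who prefer to stay in Python, the same token request can be sketched with the standard library alone. The helper names build_token_request and fetch_token are hypothetical, and reading the token from an "access_token" field is an assumption based on the usual client-credentials response shape, so verify it against the actual response.

```python
import json
import urllib.parse
import urllib.request

TOKEN_URL = "https://api.aspose.cloud/connect/token"


def build_token_request(client_id, client_secret):
    """Build the same POST request that the cURL command above sends."""
    form = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    }).encode("ascii")
    headers = {
        "Content-Type": "application/x-www-form-urlencoded",
        "Accept": "application/json",
    }
    return urllib.request.Request(TOKEN_URL, data=form, headers=headers, method="POST")


def fetch_token(client_id, client_secret):
    """Send the request and return the token string (requires network access)."""
    request = build_token_request(client_id, client_secret)
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    # Assumption: the JSON body carries the token under "access_token".
    return payload["access_token"]
```

Splitting request construction from sending makes it possible to inspect the URL, headers, and form body before any credentials leave the machine.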

Once we have the JWT token, we can execute the following command to convert a PDF file available in cloud storage to HTML format. The output is returned as a stream response.

curl -v -X GET "https://api.aspose.cloud/v3.0/pdf/awesomeTable.pdf/convert/html?documentType=Xhtml&fixedLayout=true&splitCssIntoPages=false&splitIntoPages=false&fontSavingMode=AlwaysSaveAsTTF" \
-H  "accept: multipart/form-data" \
-H  "authorization: Bearer <JWT Token>" \
-o .\Documents\PDFConversion.zip
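
The same GET request can also be sketched in Python with the standard library. The helper names build_convert_request and download_html_zip are hypothetical, the token argument stands for the JWT obtained earlier, and the query parameter names simply mirror the cURL command above.

```python
import shutil
import urllib.parse
import urllib.request

BASE_URL = "https://api.aspose.cloud/v3.0/pdf"

# Query parameters copied from the cURL command above.
OPTIONS = {
    "documentType": "Xhtml",
    "fixedLayout": "true",
    "splitCssIntoPages": "false",
    "splitIntoPages": "false",
    "fontSavingMode": "AlwaysSaveAsTTF",
}


def build_convert_request(pdf_name, token, options):
    """Build the GET request for <BASE_URL>/<pdf_name>/convert/html?<options>."""
    query = urllib.parse.urlencode(options)
    url = f"{BASE_URL}/{urllib.parse.quote(pdf_name)}/convert/html?{query}"
    headers = {
        "Accept": "multipart/form-data",
        "Authorization": f"Bearer {token}",
    }
    return urllib.request.Request(url, headers=headers, method="GET")


def download_html_zip(pdf_name, token, options, out_file):
    """Send the request and stream the response body to out_file (requires network)."""
    request = build_convert_request(pdf_name, token, options)
    with urllib.request.urlopen(request) as response, open(out_file, "wb") as handle:
        shutil.copyfileobj(response, handle)
```

A call such as download_html_zip("awesomeTable.pdf", token, OPTIONS, "PDFConversion.zip") would then correspond to the cURL invocation, with the output path chosen to suit the local system.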

To convert the PDF file to HTML and save the output in cloud storage instead, use the following command.

curl -v -X PUT "https://api.aspose.cloud/v3.0/pdf/completeWorkbook.pdf/convert/html?outPath=converted.html&fixedLayout=true&splitIntoPages=false&outputFormat=Zip" \
-H  "accept: application/json" \
-H  "authorization: Bearer <JWT Token>"

Image 2:- PDF to HTML conversion preview.

Conclusion

In this article, we have discussed how to convert PDF files to HTML format. We have explored two options: a Python code snippet and a cURL command. Please visit the following link to learn about the numerous parameters supported by the PutPdfInStorageToHtml API. Please note that our cloud SDKs are developed under the MIT license, so the complete source code of Aspose.PDF Cloud SDK for Python is available on GitHub. In case you encounter any issues while using the API or have any further queries, please feel free to contact us through the free product support forum.

Related Articles

We recommend visiting the following links to learn more about