Extract PDF attachments

A PDF file consists of text and graphics, and it can also contain entire files embedded as attachments. This makes exchanging sets of documents easier and more reliable. In PDF viewers such as Adobe Acrobat, the Attachments pane provides a central place to view, insert, delete, and export attachments. If you move the PDF file to a new location, its attachments move with it. Attachments may include links to or from the parent document or to other attachments. Note that file attachments are not the same as attached comments (file-attachment annotations).

In this article, we discuss how to read information about the attachments in a PDF document and how to download those attachments using Python. No dedicated software needs to be downloaded or installed, because all the required operations are performed in the cloud.

PDF Processing API

Aspose.PDF Cloud is our REST-based API for creating and editing PDF files, converting various formats to PDF, and rendering PDF files to formats such as XLSX, PPTX, DOCX, EPUB, HTML, and JPEG, among many others. Because of its REST architecture, the API can be accessed from any platform, so you can add PDF processing capabilities to desktop, web, mobile, cloud, and hybrid applications.

Python SDK for PDF Processing

To make things easier for our customers, we have created programming SDKs so that you can use all PDF processing capabilities from the language of your choice. For Python developers, we provide Aspose.PDF Cloud SDK for Python, a wrapper around the Aspose.PDF Cloud API. To get started, first install the SDK; it is available for free via pip and from its GitHub repository. Execute the following command in a terminal or command prompt to install the latest version of the SDK on your system.

pip install asposepdfcloud

PyCharm IDE

If you are using PyCharm IDE, you may directly add the SDK as a dependency in your project.

File -> Settings -> Project -> Python Interpreter -> asposepdfcloud

Image 1:- PyCharm settings option.



Image 2:- Aspose.PDF Cloud Python package.

Free Cloud Dashboard Account

After the installation, the next step is a free subscription to our cloud services via the Aspose.Cloud dashboard. The subscription ensures that only authorized users can access our file processing services. If you have a GitHub or Google account, simply sign up with it; otherwise, click the Create a new Account button and provide the required information. Then log in to the dashboard with your credentials, expand the Applications section, and scroll down to the Client Credentials section to see your Client ID and Client Secret.


Image 3:- Client Credentials on Aspose.Cloud Dashboard.
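Rather than hardcoding the Client ID and Client Secret in your scripts, you may prefer to read them from the environment. The sketch below is one way to do that; the variable names ASPOSE_CLIENT_ID and ASPOSE_CLIENT_SECRET are hypothetical and should be adjusted to your own setup.

```python
import os


def load_credentials(env=None):
    """Return (client_id, client_secret) from environment variables.

    ASPOSE_CLIENT_ID / ASPOSE_CLIENT_SECRET are hypothetical names chosen
    for this sketch; use whatever names your deployment defines.
    """
    env = os.environ if env is None else env
    try:
        return env["ASPOSE_CLIENT_ID"], env["ASPOSE_CLIENT_SECRET"]
    except KeyError as missing:
        # Fail with a readable message instead of a bare KeyError.
        raise RuntimeError(
            f"Set {missing.args[0]} to the value shown on the Aspose.Cloud dashboard"
        ) from None
```

Passing an explicit mapping (as in `load_credentials({...})`) makes the function easy to test without touching the real environment.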

Read Attachments Information from PDF

Follow the steps below to read information about the attachments in a PDF document.

  • First, create an instance of the ApiClient class, passing the Client ID and Client Secret as arguments
  • Second, create an instance of the PdfApi class, which takes the ApiClient object as an input argument
  • Finally, call the get_document_attachments(…) method of PdfApi to fetch the details of the PDF attachments
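The steps above can be sketched as follows. This is a minimal sketch, not a definitive implementation. It assumes that the module paths `asposepdfcloud.api_client` and `asposepdfcloud.apis.pdf_api` match your installed SDK version and that `ApiClient` takes the Client Secret (app key) first and the Client ID (app SID) second. It also assumes that the PDF has already been uploaded to your Aspose Cloud storage and that the response model serializes to a dict with an `attachments` → `list` structure. Verify all of these against the SDK before relying on them. The helper names are hypothetical.

```python
def create_pdf_api(client_id, client_secret):
    """Build a PdfApi client from dashboard credentials (hypothetical helper).

    Assumption: ApiClient(app_key, app_sid), i.e. Client Secret first,
    Client ID second, as in the SDK's published examples.
    """
    # Imported lazily so this module can be loaded without the SDK installed.
    from asposepdfcloud.api_client import ApiClient
    from asposepdfcloud.apis.pdf_api import PdfApi

    return PdfApi(ApiClient(client_secret, client_id))


def attachment_names(pdf_api, file_name):
    """Return the names of all attachments in a PDF stored in cloud storage."""
    response = pdf_api.get_document_attachments(file_name)
    # Assumption: the response model's to_dict() looks like
    # {'attachments': {'list': [{'name': ...}, ...]}, 'code': 200, ...}
    data = response.to_dict()
    items = (data.get("attachments") or {}).get("list") or []
    return [item.get("name") for item in items]
```

Passing the `PdfApi` object into `attachment_names` keeps the network client separate from the logic, so the function can be exercised with a stand-in object during testing.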

For your reference, the sample PDF document used in this article can be downloaded from PdfWithEmbeddedFiles.pdf.

Read Specific Attachment Information

The API also lets you read information about a particular attachment in the PDF document. For this purpose, use the GetDocumentAttachmentByIndex method (get_document_attachment_by_index in the Python SDK). As the self link in the output shows, attachment indexes start at 1. The details of the first attachment are shown below.

{'attachment': {'check_sum': '33DCE2EE8BD095A3C4E2A67058104D35',
                'creation_date': '11/24/2008 02:02:36.000 PM',
                'description': None,
                'links': [{'href': '/PdfWithEmbeddedFiles.pdf/attachments/1',
                           'rel': 'self',
                           'title': None,
                           'type': None}],
                'mime_type': 'application/pdf',
                'modification_date': '05/03/2007 10:37:41.000 AM',
                'name': 'example1.pdf',
                'size': 10984},
 'code': 200,
 'status': 'OK'}
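The output above is a plain dictionary, and its dates are strings. The sketch below fetches one attachment by index and normalizes those fields. It assumes the SDK call is `get_document_attachment_by_index(name, attachment_index)` and that the returned model offers `to_dict()` producing the structure shown above. The date format string was derived from the sample values ('11/24/2008 02:02:36.000 PM') and may need adjusting if your output differs. `summarize_attachment` is a hypothetical helper name.

```python
from datetime import datetime

# Format of the dates in the sample output, e.g. '11/24/2008 02:02:36.000 PM'.
API_DATE_FORMAT = "%m/%d/%Y %I:%M:%S.%f %p"


def summarize_attachment(pdf_api, file_name, index):
    """Fetch one attachment's metadata by its 1-based index and normalize it."""
    response = pdf_api.get_document_attachment_by_index(file_name, index)
    info = response.to_dict()["attachment"]
    return {
        "name": info["name"],
        "mime_type": info["mime_type"],
        "size": info["size"],  # size in bytes, as reported by the API
        "created": datetime.strptime(info["creation_date"], API_DATE_FORMAT),
        "modified": datetime.strptime(info["modification_date"], API_DATE_FORMAT),
    }
```

Pass it the PdfApi instance created as described in the steps above, together with the file name and an index such as 1.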

Download Specific Attachment from PDF

Follow the steps below to download a specific attachment from the PDF document.

  • First, create an instance of the ApiClient class, passing the Client ID and Client Secret as arguments
  • Second, create an instance of the PdfApi class, which takes the ApiClient object as an input argument
  • Finally, call the get_download_document_attachment_by_index(…) method (GetDownloadDocumentAttachmentByIndex in the REST API) to download the attachment from the PDF file
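The download step can be sketched as shown below, under two assumptions that you should check against the SDK. First, the call is assumed to be `get_download_document_attachment_by_index(name, attachment_index)`. Second, like other swagger-generated Python clients, the SDK is assumed to write the binary response to a temporary file and return that file's path. `download_attachment` is a hypothetical helper name.

```python
import shutil


def download_attachment(pdf_api, file_name, index, destination):
    """Download one attachment (1-based index) and copy it to `destination`.

    Assumption: the SDK call returns the path of a temporary file that
    holds the downloaded bytes.
    """
    temp_path = pdf_api.get_download_document_attachment_by_index(file_name, index)
    # Copy out of the temporary location so the file survives cleanup.
    shutil.copyfile(temp_path, destination)
    return destination
```

If your SDK version returns raw bytes instead of a path, write them with `open(destination, "wb")` in place of `shutil.copyfile`.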

Read Attachment Information using cURL Command

REST APIs can easily be accessed via cURL commands, which can be executed from any terminal application. Since Aspose.PDF Cloud follows the REST architecture, it can be called with cURL as well. As a prerequisite, however, you need to generate a JSON Web Token (JWT) from the client credentials shown on your Aspose.Cloud dashboard. This step is mandatory because our APIs are accessible only to registered users. Execute the following command to generate the JWT.

curl -v "https://api.aspose.cloud/connect/token" \
-X POST \
-d "grant_type=client_credentials&client_id=bbf94a2c-6d7e-4020-b4d2-b9809741374e&client_secret=1c9379bb7d701c26cc87e741a29987bb" \
-H "Content-Type: application/x-www-form-urlencoded" \
-H "Accept: application/json"
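For readers who prefer Python over cURL, the same token request can be reproduced with the standard library alone. This is a sketch. It assumes that the endpoint returns a standard OAuth 2.0 JSON body whose `access_token` field holds the JWT. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

TOKEN_URL = "https://api.aspose.cloud/connect/token"


def build_token_request(client_id, client_secret):
    """Build the same form-encoded POST as the cURL command (not sent yet)."""
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    }).encode("ascii")
    return urllib.request.Request(
        TOKEN_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/json",
        },
    )


def fetch_token(client_id, client_secret, timeout=30):
    """Send the request and return the token string.

    Assumption: the response is OAuth 2.0 JSON with an 'access_token' field.
    """
    request = build_token_request(client_id, client_secret)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)["access_token"]
```

Separating request construction from sending makes it possible to inspect the request without contacting the server.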

Now that we have the JWT, execute the following command to fetch information about a specific attachment in the PDF document, replacing <JWT Token> with the access_token value returned by the previous command.

curl -v -X GET "https://api.aspose.cloud/v3.0/pdf/PdfWithEmbeddedFiles.pdf/attachments/1" \
-H  "accept: application/json" \
-H  "authorization: Bearer <JWT Token>"
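The same GET request can be issued from Python with the standard library. This is a minimal sketch. It assumes that the file already exists in your cloud storage and that the response body is JSON. The field names in the raw REST response may be capitalized differently from the SDK output shown earlier, so inspect the result before using it. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.aspose.cloud/v3.0/pdf"


def attachment_info_request(file_name, index, token):
    """Build the GET request used by the cURL command (not sent yet)."""
    url = f"{API_BASE}/{urllib.parse.quote(file_name)}/attachments/{index}"
    return urllib.request.Request(
        url,
        headers={"Accept": "application/json", "Authorization": f"Bearer {token}"},
    )


def get_attachment_info(file_name, index, token, timeout=30):
    """Send the request and return the decoded JSON body."""
    request = attachment_info_request(file_name, index, token)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```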

Download Specific Attachment using cURL Command

Execute the following command to download the second attachment of the PDF file and save it to your local drive.

curl -v -X GET "https://api.aspose.cloud/v3.0/pdf/PdfWithEmbeddedFiles.pdf/attachments/2/download" \
-H  "accept: multipart/form-data" \
-H  "authorization: Bearer <JWT Token>" \
-o Attachment.txt
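A Python equivalent of this download command is sketched below. It sends the same headers as the cURL command and streams the response body to a local file, much as `-o` does. The function names are hypothetical, and the token is assumed to come from the token request described earlier.

```python
import shutil
import urllib.parse
import urllib.request

API_BASE = "https://api.aspose.cloud/v3.0/pdf"


def download_request(file_name, index, token):
    """Build the GET request for the /attachments/{index}/download endpoint."""
    url = f"{API_BASE}/{urllib.parse.quote(file_name)}/attachments/{index}/download"
    return urllib.request.Request(
        url,
        headers={"Accept": "multipart/form-data", "Authorization": f"Bearer {token}"},
    )


def download_attachment(file_name, index, token, destination, timeout=60):
    """Stream the attachment bytes straight to `destination`, like cURL -o."""
    request = download_request(file_name, index, token)
    with urllib.request.urlopen(request, timeout=timeout) as resp, \
            open(destination, "wb") as out:
        # Copy in chunks so large attachments are not held fully in memory.
        shutil.copyfileobj(resp, out)
    return destination
```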

Conclusion

This article has explained how to read and download attachments from PDF documents, using both Python code snippets and cURL commands. Apart from attachment processing, the API provides many features for working with other elements of PDF files; details can be found in the Developer Guide. Furthermore, the complete source code of Aspose.PDF Cloud SDK for Python is available for download on GitHub. If you encounter any issues while using the API or have further questions, please feel free to contact us via the free product support forum.
