Redact PDF files
PDF files are widely used to share documents such as legal contracts, financial statements, and medical records because the format is reliable and looks the same on every device. However, these files often contain sensitive information that must stay confidential. If you need to share a PDF file that contains sensitive data, redaction is the best way to protect it. Redaction removes or blacks out the sensitive information while keeping the rest of the content intact. In this blog post, we will show you how to redact PDF files using Python.

PDF Processing API

Aspose.PDF Cloud SDK for Python is an excellent tool for redacting PDF files online. It wraps a cloud-based REST API that offers various features for working with PDF documents, such as creating, converting, and manipulating PDF files. Using this SDK, you can redact sensitive information from your PDF files online without installing desktop PDF software on your computer.

Redacting through the API offers several benefits over manual redaction. It is faster and repeatable, so the same region is treated the same way in every document. It also ensures that the sensitive information is permanently removed from the document rather than merely hidden, preventing unauthorized access to the information.

The first step is to install the SDK, which is available from PyPI and from its GitHub repository. Execute the following command in the terminal to complete the installation.

pip install asposepdfcloud

PyCharm IDE

If you are using the PyCharm IDE, you can add the SDK as a dependency directly in your project.

File -> Settings -> Project -> Python Interpreter -> asposepdfcloud

Image 1:- PyCharm settings option.

Image 2:- Aspose.Pdf Cloud Python Package.

After the installation, the next step is to obtain your client credentials from the Dashboard. If you do not have an account, sign up using the "create a new account" option.
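
Since the client credentials are secrets, it can be safer to keep them out of the source file. The snippet below is a minimal sketch of one way to do that by reading them from environment variables. The function name load_credentials and the variable names ASPOSE_CLIENT_ID and ASPOSE_CLIENT_SECRET are assumptions made for this example; they are not defined by the SDK.

```python
import os


def load_credentials(env=None):
    """Return (client_id, client_secret) read from environment variables.

    ASPOSE_CLIENT_ID and ASPOSE_CLIENT_SECRET are hypothetical names chosen
    for this sketch; the SDK itself does not look for them.
    """
    # Fall back to the real process environment when no mapping is passed in.
    env = os.environ if env is None else env
    client_id = env.get("ASPOSE_CLIENT_ID", "").strip()
    client_secret = env.get("ASPOSE_CLIENT_SECRET", "").strip()
    if not client_id or not client_secret:
        raise RuntimeError("Set ASPOSE_CLIENT_ID and ASPOSE_CLIENT_SECRET first")
    return client_id, client_secret
```

The returned values can then be passed to ApiClient in place of hard-coded strings.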

Redact PDF using Python

Follow the steps below to redact PDF content using Python:

  • Create an instance of ApiClient by passing the client credentials as arguments.
  • Initialize PdfApi, passing the ApiClient object as an argument.
  • Create a RedactionAnnotation object and call the post_page_redaction_annotations(..) method of PdfApi to add the redaction to the page.

import asposepdfcloud
from asposepdfcloud.apis.pdf_api import PdfApi
from asposepdfcloud.rest import ApiException

def redactPDF():
    try:
        # Client credentials
        client_secret = "1c9379bb7d701c26cc87e741a29987bb"
        client_id = "bbf94a2c-6d7e-4020-b4d2-b9809741374e"
        # initialize the API client instance using the client credentials
        pdf_api_client = asposepdfcloud.api_client.ApiClient(client_secret, client_id)
        # create a PdfApi instance while passing the API client as argument
        pdf_api = PdfApi(pdf_api_client)
        # input PDF file (must already be available in cloud storage)
        input_file = 'marketing.pdf'
        # create an instance of the RedactionAnnotation class
        redactAnnotation = asposepdfcloud.models.RedactionAnnotation()
        redactAnnotation.contents = 'Confidential'
        # set the color details for the annotation object
        redactAnnotation.color = asposepdfcloud.Color(a = 0, r = 66, g = 111, b = 245)
        # set the modified date for the annotation
        redactAnnotation.modified = '01/01/2018 12:00:00.000 AM'
        redactAnnotation.id = 1
        # set the annotation flag to default
        redactAnnotation.flags = [asposepdfcloud.models.AnnotationFlags.DEFAULT]
        redactAnnotation.name = 'redactName'
        # specify the rectangular region for the annotation on the page
        redactAnnotation.rect = asposepdfcloud.models.Rectangle(llx = 20, lly = 700, urx = 220, ury = 650)
        redactAnnotation.page_index = 1
        # ZIndex factor for the annotation
        redactAnnotation.z_index = 1
        # set horizontal and vertical alignment to Center
        redactAnnotation.horizontal_alignment = asposepdfcloud.models.HorizontalAlignment.CENTER
        redactAnnotation.vertical_alignment = asposepdfcloud.models.VerticalAlignment.CENTER
        # point details for the redaction annotation
        redactAnnotation.quad_point = [
            asposepdfcloud.models.Point(5, 40),
            asposepdfcloud.models.Point(10, 60)
        ]
        # annotation fill color details
        redactAnnotation.fill_color = asposepdfcloud.Color(a = 10, r = 50, g = 168, b = 182)
        # overlay text to be printed on the redaction annotation
        redactAnnotation.overlay_text = 'Confidential Data'
        # repeat the overlay text across the annotation
        redactAnnotation.repeat = True
        # set the text alignment to left aligned
        redactAnnotation.text_alignment = asposepdfcloud.models.HorizontalAlignment.LEFT
        # call the API to add the redaction annotation to the first page of the document
        response = pdf_api.post_page_redaction_annotations(name = input_file, page_number = 1, annotations = [redactAnnotation])
        # print the response in the console
        print(response)
        # print a message in the console (optional)
        print('Redaction Annotation successfully added to PDF document !')
    except ApiException as e:
        print("Exception while calling PdfApi: {0}".format(e))
        print("Code:" + str(e.code))
        print("Message:" + e.message)

if __name__ == "__main__":
    redactPDF()
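
The rectangle in the snippet above is given as lly = 700 and ury = 650, so the "lower" Y value is larger than the "upper" one. This post does not say how the service treats that ordering. If you prefer to make the intent explicit, a small helper along the following lines can reorder the corners before they are passed to Rectangle. The name normalize_rect is hypothetical and not part of the SDK.

```python
def normalize_rect(llx, lly, urx, ury):
    """Return (llx, lly, urx, ury) with the lower-left corner listed first."""
    # Sort each axis independently so the smaller value becomes the lower-left.
    left, right = sorted((llx, urx))
    bottom, top = sorted((lly, ury))
    return left, bottom, right, top


# The coordinates used in this post, reordered.
print(normalize_rect(20, 700, 220, 650))  # (20, 650, 220, 700)
```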

Blackout PDF Content using cURL Commands

You can also redact PDF files with cURL commands and Aspose.PDF Cloud. Because Aspose.PDF Cloud is a RESTful API, it can be called from any programming language or from command-line tools such as cURL. You can redact sensitive information from PDF files by blacking out text or removing it altogether. The API is secure, reliable, and scalable, making it a good choice for businesses of all sizes.

Now the first step is to execute the following command to generate the accessToken.

curl -v "https://api.aspose.cloud/connect/token" \
-X POST \
-d "grant_type=client_credentials&client_id=88d1cda8-b12c-4a80-b1ad-c85ac483c5c5&client_secret=406b404b2df649611e508bbcfcd2a77f" \
-H "Content-Type: application/x-www-form-urlencoded" \
-H "Accept: application/json"
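
For readers who prefer to stay in Python, the same token request can be sketched with the standard library alone. This is a minimal sketch, not an official client: the function name build_token_request is an assumption made for this example, and the request is only constructed here, not sent.

```python
import urllib.parse
import urllib.request

TOKEN_URL = "https://api.aspose.cloud/connect/token"


def build_token_request(client_id, client_secret):
    """Build (but do not send) the POST request shown in the cURL command."""
    # Form-encode the same three fields the cURL command passes with -d.
    form = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    }).encode("ascii")
    return urllib.request.Request(
        TOKEN_URL,
        data=form,
        method="POST",
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/json",
        },
    )
```

Passing the returned object to urllib.request.urlopen would send it. The response is expected to be JSON; the exact name of the token field is not shown in this post, so inspect the response before relying on a particular key.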

Once we have the accessToken, execute the following command to redact the information in the PDF document within the specified rectangular region (“LLX”: 20, “LLY”: 700, “URX”: 220, “URY”: 650). After a successful operation, the resultant file is saved to cloud storage.

curl -v -X POST "https://api.aspose.cloud/v3.0/pdf/{inputPDF}/pages/1/annotations/redaction?apply=true" \
-H  "accept: application/json" \
-H  "authorization: Bearer {accessToken}" \
-H  "Content-Type: application/json" \
-d '[
  {
    "Color": { "A": 0, "R": 158, "G": 50, "B": 168 },
    "Contents": "Confidential",
    "Modified": "01/18/2022 12:00:00.000 AM",
    "Id": "1",
    "Flags": [ "Default" ],
    "Name": "Name",
    "Rect": { "LLX": 20, "LLY": 700, "URX": 220, "URY": 650 },
    "PageIndex": 1,
    "ZIndex": 1,
    "HorizontalAlignment": "CENTER",
    "VerticalAlignment": "CENTER",
    "QuadPoint": [ { "X": 5, "Y": 10 } ],
    "FillColor": { "A": 10, "R": 50, "G": 168, "B": 182 },
    "BorderColor": { "A": 10, "R": 168, "G": 50, "B": 141 },
    "OverlayText": "Sensitive data",
    "Repeat": true,
    "TextAlignment": "Left"
  }
]'

Replace {inputPDF} with the name of the PDF file available in cloud storage and {accessToken} with the access token generated above.
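
The placeholder substitution can also be done programmatically. The following is a minimal sketch that assembles the same URL, headers, and JSON body using only the Python standard library. The function name build_redaction_request is an assumption made for this example, and the annotation dictionary is trimmed for brevity; whether the service accepts fewer fields than the full cURL body above is not confirmed here.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.aspose.cloud/v3.0/pdf"


def build_redaction_request(file_name, page_number, access_token, annotations):
    """Build (but do not send) the redaction POST request from the cURL command."""
    # Percent-encode the file name so spaces and similar characters are URL-safe.
    url = "{0}/{1}/pages/{2}/annotations/redaction?apply=true".format(
        BASE_URL, urllib.parse.quote(file_name), int(page_number))
    body = json.dumps(annotations).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Accept": "application/json",
            "Authorization": "Bearer " + access_token,
            "Content-Type": "application/json",
        },
    )


# Trimmed annotation for illustration; see the cURL body for the full field set.
annotation = {
    "Contents": "Confidential",
    "Rect": {"LLX": 20, "LLY": 700, "URX": 220, "URY": 650},
    "PageIndex": 1,
    "OverlayText": "Sensitive data",
}
request = build_redaction_request("marketing.pdf", 1, "<accessToken>", [annotation])
print(request.full_url)
```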

Conclusion

In conclusion, redacting PDF files is a critical task for protecting sensitive information from disclosure. Whether you choose Python or cURL commands, Aspose.PDF Cloud makes the process simpler and more efficient. So, whether you are a legal professional, a medical practitioner, or a financial analyst, learning how to redact PDF files programmatically can help you protect confidential information and comply with data protection regulations.

We highly recommend visiting the following articles to learn about: