When working with PDFs that contain hyperlinks, you often need to update outdated links, fix errors, or remove links entirely. Many PDF editors, such as Adobe Acrobat, offer these features, but manually processing multiple PDF files can still be time-consuming. Fortunately, Python provides a quicker solution. In this guide, we’ll explore how to efficiently edit or remove hyperlinks from PDF files in Python. Whether you’re maintaining professional documents or refining personal projects, these quick tips will streamline the process!
Python Library to Edit or Remove Hyperlinks from PDF
To make this process simpler and quicker, you can use a third-party Python library such as Spire.PDF, Aspose, or PyPDF2, or turn to an online PDF editor. Among them, Spire.PDF (Spire.PDF for Python) is recommended because of its easy-to-understand methods and safety. With this tool, developers can add, edit, and remove hyperlinks without hassle.
You can install it using the pip command: pip install Spire.PDF
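Before running the examples, it can help to confirm that the package is importable. The sketch below is a minimal check using only the standard library; the helper name is_package_available is made up for illustration, and the name "spire" is taken from the import statements used later in this guide.

```python
import importlib.util


def is_package_available(top_level_name: str) -> bool:
    """Return True if a top-level package with this name can be found."""
    return importlib.util.find_spec(top_level_name) is not None


# The imports in this guide start with "spire", so that is the name checked here.
if is_package_available("spire"):
    print("Spire.PDF appears to be installed.")
else:
    print("Spire.PDF not found; run: pip install Spire.PDF")
```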
How to Edit Hyperlinks in PDF Documents Quickly
Hyperlinks in PDFs enhance navigation, connect related information, and improve reader engagement. However, outdated or incorrect links can undermine the document's professionalism and credibility. In such cases, editing hyperlinks becomes essential. This section will show you how to efficiently edit hyperlinks in PDFs using Python, ensuring your documents remain polished and reliable.
Steps to edit hyperlinks in PDF:
Create a PdfDocument object, and use the PdfDocument.LoadFromFile() method to read a PDF document from the local storage.
Get a page using the PdfDocument.Pages.get_Item() method.
Get all annotations on the page with the PdfPageBase.AnnotationsWidget property.
Access the specified hyperlink and cast it to the PdfTextWebLinkAnnotationWidget object.
Set a new target address for the hyperlink through PdfTextWebLinkAnnotationWidget.Url property.
Save the updated PDF file using the PdfDocument.SaveToFile() method.
The following code example edits the address of the second hyperlink on the first page of a PDF:
from spire.pdf.common import *
from spire.pdf import *
# Create an object of PdfDocument class and load a PDF document
pdf = PdfDocument()
pdf.LoadFromFile("Sample.pdf")
# Get the first page of the document
page = pdf.Pages.get_Item(0)
# Get all annotations on the page
widgetCollection = page.AnnotationsWidget
# Get the second hyperlink annotation
annotation = widgetCollection.get_Item(1)
# Cast the hyperlink annotation to a PdfTextWebLinkAnnotationWidget object
link = PdfTextWebLinkAnnotationWidget(annotation)
# Set a new target address for the second hyperlink
link.Url = "https://www.mcafee.com/learn/understanding-trojan-viruses-and-how-to-get-rid-of-them/"
# Save the document
pdf.SaveToFile("output/ModifyPDFHyperlink.pdf")
# Close the document
pdf.Close()
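When several outdated addresses need new targets, it may be easier to keep the old and new addresses in a lookup table than to hard-code each one. The following is only a sketch of that idea in plain Python: the URLs, the URL_REPLACEMENTS table, and the replacement_for helper are all hypothetical and are not part of Spire.PDF.

```python
# Hypothetical mapping of outdated addresses to their replacements.
URL_REPLACEMENTS = {
    "http://old.example.com/docs": "https://www.example.com/docs",
    "http://old.example.com/help": "https://www.example.com/help",
}


def replacement_for(current_url, replacements):
    """Return the new address for current_url, or None if no replacement is listed."""
    # Ignore surrounding whitespace and a trailing slash when comparing.
    key = current_url.strip().rstrip("/")
    for old, new in replacements.items():
        if key == old.rstrip("/"):
            return new
    return None


print(replacement_for("http://old.example.com/docs/", URL_REPLACEMENTS))
print(replacement_for("https://unrelated.example.org", URL_REPLACEMENTS))
```

In a loop over a page's annotations, a non-None result could then be assigned to link.Url as in the example above. This assumes the Url property can be read as well as set, which is worth confirming against the library documentation.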
How to Remove Hyperlinks from PDF Instantly
When you need to print PDFs or read them offline, hyperlinks can become unnecessary or even disruptive. In those cases, deleting the links gives the document a cleaner look. In this section, we will go through how to remove hyperlinks from PDF files quickly and easily using Python.
Steps to batch remove all hyperlinks from PDFs:
Create a PdfDocument object, and use the PdfDocument.LoadFromFile() method to read a PDF document from the local storage.
Loop through all pages in the PDF file.
Get the current page using the PdfDocument.Pages.get_Item() method.
Retrieve the annotations collection on the page through PdfPageBase.AnnotationsWidget property.
Loop through the annotations and check whether each one is an instance of the PdfTextWebLinkAnnotationWidget class. If it is, remove it with the PdfAnnotationCollection.Remove() method.
Save the modified PDF document using the PdfDocument.SaveToFile() method.
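The removal step above deletes items from the same collection that is being looped over. A common way to do this safely is to walk the indices from last to first, so that a removal never shifts an item that has not been visited yet. The sketch below illustrates the idea on a plain Python list, with strings standing in for annotations; the function name is made up for illustration.

```python
def remove_matching_backwards(items, should_remove):
    """Remove, in place, every item for which should_remove(item) is true."""
    i = len(items) - 1
    while i >= 0:
        if should_remove(items[i]):
            # Deleting index i only shifts items after i, which were already visited.
            del items[i]
        i -= 1
    return items


# Stand-in data: strings play the role of annotations on a page.
annotations = ["link", "link", "note", "link", "highlight"]
remove_matching_backwards(annotations, lambda a: a == "link")
print(annotations)  # ['note', 'highlight']
```

A forward loop that removes items as it goes would skip the element right after each removal, which is why two adjacent links are a useful test case.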
Below is a code example demonstrating how to remove all hyperlinks from a PDF. The commented-out lines also show how to remove a single specified hyperlink:
from spire.pdf import *
from spire.pdf.common import *
# Create an object of PdfDocument class and load a PDF document
pdf = PdfDocument()
pdf.LoadFromFile("Sample.pdf")
# Remove the first hyperlink on the first page
#page = pdf.Pages.get_Item(0)
#page.AnnotationsWidget.RemoveAt(0)
# Remove all hyperlinks
# Loop through the pages in the document
for j in range(pdf.Pages.Count):
    # Get each page
    page = pdf.Pages.get_Item(j)
    # Get the annotations on each page
    annotations = page.AnnotationsWidget
    # Check if there are any annotations on the page
    if annotations.Count > 0:
        # Loop through the annotations from last to first
        i = annotations.Count - 1
        while i >= 0:
            # Get an annotation
            annotation = annotations.get_Item(i)
            # Check if the annotation is a hyperlink
            if isinstance(annotation, PdfTextWebLinkAnnotationWidget):
                # Remove the hyperlink annotation
                annotations.Remove(annotation)
            i -= 1
# Save the document
pdf.SaveToFile("output/RemovePDFHyperlink.pdf")
# Release the resource
pdf.Close()
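To apply the same operation to a whole folder of PDFs rather than a single file, the per-file code can be wrapped in a function and called once per document. The sketch below shows one possible shape for that loop using only the standard library. The process_folder helper and the process_one callback are hypothetical names; process_one is assumed to take a source path and a destination path and to contain the load, edit or remove, and save steps shown above.

```python
from pathlib import Path


def process_folder(input_dir, output_dir, process_one):
    """Call process_one(src, dst) for every .pdf file found directly in input_dir."""
    in_path = Path(input_dir)
    out_path = Path(output_dir)
    # Create the output folder up front in case the saving step expects it to exist.
    out_path.mkdir(parents=True, exist_ok=True)

    processed = []
    for src in sorted(in_path.glob("*.pdf")):
        dst = out_path / src.name
        process_one(str(src), str(dst))
        processed.append(dst.name)
    return processed


# Example call, where remove_links is a hypothetical function wrapping the
# load / remove / save code shown above:
# process_folder("input", "output", remove_links)
```

Passing the per-file work in as a callback keeps the folder loop independent of any particular PDF library, so the same helper can serve both the editing and the removal examples.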
The Bottom Line
This guide has covered how to edit and remove hyperlinks in PDF documents with Python. Each section includes detailed steps and a practical code example for your reference. With these in hand, managing hyperlinks in PDFs becomes simple and straightforward!