Download multiple pdf files from website

amala · December 4, 2020, 1:16am

Hi Team,

I am using Katalon for the 1st time. Please bear with me.

Can anyone point me to the steps for automatically downloading all PDF files from a website?

kazurayam · December 4, 2020, 1:57am

First, you should try searching this forum with the keyword "download file". You will find plenty of previous posts to read.

Timo_Kuisma1 · December 4, 2020, 4:06pm

hello,

check this

Timo_Kuisma1 · December 4, 2020, 6:22pm

OK,
here is code that collects all .pdf links from a page and reads the text of a linked PDF

download Jsoup .jar from

and pdfBox .jar
https://pdfbox.apache.org/download.cgi

Copy both .jar files to the Katalon project's Drivers folder.

TESTCASE:
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;
import org.apache.pdfbox.text.PDFTextStripperByArea;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

//get all pdf links from page
def baseUrl = 'http://www.testingdiaries.com/selenium-webdriver-read-pdf-content/'
List<String> linksArray = readAllPdfLinks(baseUrl)
println linksArray

//read pdf contents
String pdf = readPdfFile(linksArray.get(0)) // in this example only one .pdf link in page
println (pdf)
// Groovy's built-in assert is used here, so no Assert class import is needed
assert pdf.contains('Open the setting.xml, you can see it is like this:')
assert pdf.contains('Please add the following sentence in setting.xml before')
assert pdf.contains('You can see that I have modified the setting.xml, and if open the file in IE, it is like this:')

/*
parameter basic url
return list of .pdf links
*/
public List<String> readAllPdfLinks(def url){
	
	List<String> linksArray = new ArrayList<>()
	
	Document doc = Jsoup.connect(url).get();
	print((doc.title()));

	Elements links = doc.select("a[href]");

	println("\nLinks: "+ links.size());
	for (Element link : links) {
		if (link.absUrl("href").contains(".pdf")) {
			println("a: "+link.attr("abs:href"));
			linksArray.add(link.attr("abs:href"))
		}
	}
	return linksArray
	
}

/*
parameter pdf url
return pdf content
*/
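public String readPdfFile(String pdfUrl){
	
	URL testUrl = new URL(pdfUrl);
	BufferedInputStream bis = new BufferedInputStream(testUrl.openStream());
	try {
		// PDDocument.load is the PDFBox 2.x API; PDFBox 3.x uses Loader.loadPDF instead
		PDDocument doc = PDDocument.load(bis);
		try {
			String pdfText = new PDFTextStripper().getText(doc);
			println(pdfText);
			return pdfText;
		} finally {
			// close the document even if text extraction fails
			doc.close();
		}
	} finally {
		// close the stream even if the PDF cannot be loaded
		bis.close();
	}
}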
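
Note that the code above reads the PDF text but does not save any file to disk. If the goal is to keep the files, something like the following might work as a starting point. It is only a sketch in plain Java using the standard library: the class name PdfDownloadSketch, its method names, and the fallback file name "download.pdf" are all made up for illustration and are not part of Katalon, Jsoup, or PDFBox. It assumes each link is an absolute, well-formed URL that can be fetched without login or cookies.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only.
final class PdfDownloadSketch {

    // Derive a local file name from the last path segment of the URL,
    // ignoring any query string or fragment.
    static String fileNameFromUrl(String url) {
        String path = URI.create(url).getPath();
        String name = (path == null) ? "" : path.substring(path.lastIndexOf('/') + 1);
        // Assumed fallback name when the URL has no usable last segment.
        return name.isEmpty() ? "download.pdf" : name;
    }

    // Stream one URL into targetDir, overwriting any file with the same name.
    static Path download(String url, Path targetDir) throws IOException {
        Files.createDirectories(targetDir);
        Path target = targetDir.resolve(fileNameFromUrl(url));
        try (InputStream in = URI.create(url).toURL().openStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return target;
    }

    // Download every link in the list and return the saved paths in order.
    static List<Path> downloadAll(List<String> urls, Path targetDir) throws IOException {
        List<Path> saved = new ArrayList<>();
        for (String url : urls) {
            saved.add(download(url, targetDir));
        }
        return saved;
    }
}
```

With the list returned by readAllPdfLinks, a call such as PdfDownloadSketch.downloadAll(linksArray, Paths.get("downloads")) would save each link into a "downloads" folder (the folder name is arbitrary). Two links that end in the same file name would overwrite each other, so a real script may want to add a counter or hash to the name.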

squreshi1500 · October 20, 2025, 2:50am

Hi, yes, it is easy. Just download the Bulk PDF Downloader Chrome extension from the Chrome Web Store. You can download multiple PDFs from any website in a few seconds.
