How to download PDF files in Katalon Studio? This seems to be a frequently asked question that has never been resolved. I propose a solution here, and I have made a GitHub repository that demonstrates it.
Problem to solve
On the Internet, some web resources are distributed via URLs prefixed with the blob: scheme. The original poster found an example: blob:https://uncd215.duckcreekondemand.com/7c3bc4d2-74a4-4289-a645-6241c622f200. They wanted to develop a Katalon Test Case that downloads the PDF file and saves it to local disk, but ran into difficulty. A blob: URL requires special treatment.
What is a blob URL, and why is it used? There is a nice Stack Overflow thread that explains it.
Solution
Extract the blob URL string from the web page and trim off the leading blob: prefix. What remains is a URL string that starts with https://. Once you have an ordinary https: URL, you can easily consume it.
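The trimming step can be sketched in plain Java with the java.util.regex package. This is a minimal illustration, not the code of the Test Case itself: the class name BlobUrlText and the method name stripBlobPrefix are hypothetical, and it assumes the text taken from the page is a single "blob:" prefix followed by an http or https URL that an HTTP client can reach directly, as on the demonstration page.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the names are for illustration only.
class BlobUrlText {
    // "blob:" followed by an http(s) URL; group 1 captures the URL without the prefix.
    private static final Pattern BLOB_URL = Pattern.compile("blob:(https?://\\S+)");

    /** Returns the URL embedded in a "blob:https://..." string, or empty if the text has another shape. */
    static Optional<String> stripBlobPrefix(String text) {
        // Surrounding whitespace is common in text read from an HTML element, so trim it first.
        Matcher m = BLOB_URL.matcher(text.trim());
        return m.matches() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String text = "blob:https://kazurayam.github.io/ks_reading_pdf_from_blob_url/nisa_guidebook_202307.pdf";
        System.out.println(stripBlobPrefix(text).orElse("(not a blob URL)"));
    }
}
```

Returning an Optional rather than the raw string makes the caller decide what to do when the element text is not a blob URL, instead of silently passing an unusable string to the download step.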
Description
I have set up a web page for demonstration.
This page contains an HTML fragment like this:
...
<li id="blobURL">blob:https://kazurayam.github.io/ks_reading_pdf_from_blob_url/nisa_guidebook_202307.pdf</li>
...
Please note that this fragment contains a blob URL as the text content of the <li id="blobURL"> element.
I wrote a Katalon Test Case, TC1.
Read the source code for details.
Some comments about the tricks in it:
- The script gets a string of “blob:https://host/resource” out of an HTML element, and extracts the string “https://host/resource”. This string manipulation requires a fair amount of Groovy programming using the java.util.regex package.
- The script downloads the PDF using the Apache HttpClient API.
- The script does not use Katalon’s built-in WS keywords, as they have some bugs in dealing with binary files. See http://forum.katalon.com/t/download-the-image-file-via-rest-api/120898/2
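To illustrate the point about binary files, here is a rough Java sketch of saving a URL's bytes to disk without decoding them as text. Note that it swaps in the JDK's own URL stream for the Apache HttpClient API that the Test Case uses, only to keep the example free of external dependencies. The class name BinaryDownload and the method name download are hypothetical, and the sketch assumes the URL is reachable without authentication or cookies.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical helper class; the names are for illustration only.
class BinaryDownload {
    /** Copies the bytes served at the given URL into the target file and returns the byte count. */
    static long download(URI source, Path target) throws IOException {
        // Create the output directory if it does not exist yet.
        Path parent = target.toAbsolutePath().getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        // Stream raw bytes straight to disk; no character decoding is involved,
        // so binary content such as a PDF is written unchanged.
        try (InputStream in = source.toURL().openStream()) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java BinaryDownload <url> <target file>
        long bytes = download(URI.create(args[0]), Paths.get(args[1]));
        System.out.println("saved " + bytes + " bytes to " + args[1]);
    }
}
```

The key design choice is to treat the response as a byte stream from start to finish; converting it to a String at any point is the usual way a downloaded PDF or image gets corrupted.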
By running the Test Case TC1, I got the PDF file downloaded and saved as ./output/downloded.pdf.
