Unable to download pdf from blob url

How to download PDF files in Katalon Studio? This seems to be a frequently asked question that has never been resolved. I would like to propose a solution here. I have made a GitHub repository that demonstrates it.


Problem to solve

On the Internet, some web resources are distributed via URLs prefixed with the blob: scheme. The original poster found an example: blob:https://uncd215.duckcreekondemand.com/7c3bc4d2-74a4-4289-a645-6241c622f200. They wanted to develop a Katalon Test Case that downloads the PDF file and saves it to local disk, but ran into difficulty. A blob: URL requires special treatment.

What is a blob URL, and why is it used? There is a nice Stack Overflow thread that explains it:

Solution

Extract the blob URL string from the web page and trim off the leading blob: prefix. What remains is a URL string that starts with https://. Once you have an ordinary https: URL, you can easily consume it.
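The prefix trimming described above can be sketched as follows. This is only a minimal illustration, assuming (as on the demonstration page) that the text after blob: is a directly fetchable http(s) URL. The class name BlobUrlText and the method name stripBlobPrefix are hypothetical and are not taken from the repository. It is written in plain Java so that it compiles on its own; it has not been verified inside Katalon Studio and may need small adjustments as a Groovy script.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original Test Case.
final class BlobUrlText {
    // "blob:" followed by an http(s) URL; group 1 captures the URL without the prefix.
    // Surrounding whitespace is tolerated because element text often carries it.
    private static final Pattern BLOB =
            Pattern.compile("\\s*blob:(https?://\\S+)\\s*");

    /** Returns the embedded http(s) URL, or throws if the text has another shape. */
    static String stripBlobPrefix(String text) {
        Matcher m = BLOB.matcher(text);
        if (!m.matches()) {
            throw new IllegalArgumentException("not a blob:http(s) URL: " + text);
        }
        return m.group(1);
    }

    public static void main(String[] args) {
        String text = "blob:https://kazurayam.github.io/ks_reading_pdf_from_blob_url/nisa_guidebook_202307.pdf";
        System.out.println(stripBlobPrefix(text));
    }
}
```

Rejecting text that does not match, rather than returning it unchanged, makes it obvious when the page did not contain the expected kind of blob URL.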

Description

I have set up a web page for demonstration:

This page contains an HTML fragment like this:

    ...
    <li id="blobURL">blob:https://kazurayam.github.io/ks_reading_pdf_from_blob_url/nisa_guidebook_202307.pdf</li>
    ...

Note that this fragment contains a blob URL as the text content of the <li id="blobURL"> element.

I wrote a Katalon Test Case TC1

Read the source code for details.

Some comments about the tricks in it:

  1. The script gets a string of the form “blob:https://host/resource” out of an HTML element and extracts the string “https://host/resource” from it. This string manipulation requires a fair amount of Groovy programming using the java.util.regex package.
  2. The script downloads the PDF using the Apache HttpClient API.
  3. The script does not use Katalon’s built-in WS keywords, as they have some bugs in dealing with binary files. See http://forum.katalon.com/t/download-the-image-file-via-rest-api/120898/2
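To illustrate the download step without any Katalon keyword, here is a rough sketch. Note that it is not the code of TC1: TC1 uses the Apache HttpClient API, while this sketch swaps in the JDK's own java.net and java.nio.file classes so that it runs without extra libraries. The class name UrlDownloader, the method name download, and the target file name are hypothetical. It is plain Java and has not been verified inside Katalon Studio.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical helper using only the JDK (TC1 itself uses Apache HttpClient).
final class UrlDownloader {
    /** Streams the bytes behind the URL into the target file and returns the byte count. */
    static long download(URI source, Path target) throws IOException {
        // Create the output directory if it does not exist yet.
        Path parent = target.toAbsolutePath().getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        // Copy the raw bytes as-is, so binary content such as a PDF is not re-encoded.
        try (InputStream in = source.toURL().openStream()) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        // The URL obtained after trimming the blob: prefix on the demonstration page.
        URI source = URI.create(
                "https://kazurayam.github.io/ks_reading_pdf_from_blob_url/nisa_guidebook_202307.pdf");
        // Illustrative target path, not the one used by TC1.
        Path target = Paths.get("output", "nisa_guidebook_202307.pdf");
        System.out.println(download(source, target) + " bytes written to " + target);
    }
}
```

This plain stream copy has no timeout, redirect, or authentication handling of its own choosing; a real test would probably want the finer control that Apache HttpClient offers.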

By running the Test Case TC1, I could download the PDF file and save it as ./output/downloded.pdf, like this: