Unable to download pdf from blob url

Hi All,

I need urgent help downloading a PDF file from a blob URL.
When I click the button to access the file, a new tab opens for the PDF with a blob URL, something like this:
blob:https://uncd215.duckcreekondemand.com/7c3bc4d2-74a4-4289-a645-6241c622f200

I'm also unable to save the file by simulating keystrokes; Katalon does not recognize them.
I have tried many solutions, but nothing has helped.
Please help.


Hi,

I found this similar discussion: Read PDF having URL starting with blob - #13 by kazurayam. Can you please take a look?

I don't think @renuka.srivastav needs to dig into my previous post about "data:" URLs. There is no need to work with the java.net.URL class for a "blob:" URL.


@renuka.srivastav has a string somewhere in the target web page that looks something like this:

blob:https://uncd215.duckcreekondemand.com/7c3bc4d2-74a4-4289-a645-6241c622f200

Her/his test script should be able to extract that string and save it into a Groovy variable. The script then strips off the "blob:" prefix, which leaves an https URL:

https://uncd215.duckcreekondemand.com/7c3bc4d2-74a4-4289-a645-6241c622f200

With this https: URL, the test script should be able to get access to the resource easily.
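The prefix stripping described above can be sketched in a few lines. This is a minimal illustration, not code from the linked repository: the class name BlobPrefix and the method name are hypothetical, and it is written in plain Java so that it also runs unchanged inside a Groovy test script. It assumes the server really serves the file at the resulting https URL, as on the demo page later in this thread; a blob URL that the browser created with URL.createObjectURL refers to in-memory data, so the stripped URL may not resolve.

```java
final class BlobPrefix {
    private static final String PREFIX = "blob:";

    // Returns the URL without its leading "blob:" prefix.
    // Input that does not start with "blob:" is returned trimmed but otherwise unchanged.
    static String strip(String url) {
        String trimmed = url.trim();
        return trimmed.startsWith(PREFIX) ? trimmed.substring(PREFIX.length()) : trimmed;
    }
}
```

For example, BlobPrefix.strip("blob:https://host/resource") gives "https://host/resource".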

I would recommend the Apache HttpClient API for downloading and saving binary files such as PDF and PNG. Katalon's built-in WS.* keywords cannot save binary files properly. See Download the image file via Rest API - #3 by kazurayam for sample code.
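To show the general shape of a byte-for-byte download, here is a minimal sketch. Note the swap: instead of Apache HttpClient, which the post recommends, it uses only the JDK's own URL stream so that it has no third-party dependency. The class name BinaryDownloader and the method name are hypothetical. It assumes the URL needs no cookies or authentication; a resource behind a login would need the session handling that a full HTTP client provides.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class BinaryDownloader {
    // Streams the resource at the given URL into the target file without
    // any text decoding, so binary content such as a PDF is kept intact.
    // Returns the number of bytes written.
    static long download(String url, Path target) throws IOException {
        Path parent = target.toAbsolutePath().getParent();
        if (parent != null) {
            Files.createDirectories(parent); // make sure the output directory exists
        }
        try (InputStream in = URI.create(url).toURL().openStream()) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

The important point is that the response body is copied as a raw byte stream and never converted to a String on the way to disk.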

How to download PDF files in Katalon Studio? This seems to be a FAQ that has never been resolved. I would like to propose a solution here. I have made a GitHub repository that demonstrates it.


Problem to solve

On the Internet, some web resources are distributed via URLs with the blob: scheme. The original poster found an example: blob:https://uncd215.duckcreekondemand.com/7c3bc4d2-74a4-4289-a645-6241c622f200. She/he wanted to develop a Katalon Test Case that downloads the PDF file and saves it to local disk, but ran into difficulty. The blob: URL requires special treatment.

What is a blob URL, and why is it used? There is a nice Stack Overflow thread that explains it.

Solution

Extract the blob URL string from the web page and trim off the leading blob: prefix. What remains is a URL string that starts with https://. Once you have an ordinary https: URL, you can easily consume it.

Description

I have set up a web page for demonstration:

This page contains an HTML fragment like this:

    ...
    <li id="blobURL">blob:https://kazurayam.github.io/ks_reading_pdf_from_blob_url/nisa_guidebook_202307.pdf</li>
    ...

Note that this fragment contains a blob URL as the text content of the <li id="blobURL"> element.

I wrote a Katalon Test Case, TC1.

Read the source code for details.

Some comments about the tricks in it:

  1. The script gets a string of the form “blob:https://host/resource” out of an HTML element and extracts the string “https://host/resource” from it. This string manipulation requires a fair amount of Groovy programming using the java.util.regex package.
  2. The script downloads the PDF using the Apache HttpClient API.
  3. The script does not use Katalon’s built-in WS keywords, as they have some bugs when dealing with binary files. See Download the image file via Rest API - #2 by kazurayam
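The regex step in item 1 can be sketched as follows. This is an illustration under stated assumptions, not the code of TC1: the class name BlobUrlFinder, the method name, and the pattern are hypothetical, and it assumes the element text or page source is already available as a string. It is plain Java, which also runs inside a Groovy script.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class BlobUrlFinder {
    // "blob:" followed by an http(s) URL; group 1 captures the URL without the prefix.
    // The URL ends at whitespace, a quote, or an angle bracket.
    private static final Pattern BLOB_URL =
            Pattern.compile("blob:(https?://[^\\s\"'<>]+)");

    // Returns the first http(s) URL that appears after a "blob:" prefix, if any.
    static Optional<String> findHttpsUrl(String text) {
        Matcher m = BLOB_URL.matcher(text);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }
}
```

Searching with find() rather than matching the whole string means the same method works on the text of a single element and on a larger chunk of page source.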

By running the Test Case TC1, I got the PDF file downloaded and saved as ./output/downloded.pdf.

Hi everyone,

Thanks for all your time and effort towards solving this issue. I was able to download the PDF successfully by making some changes to the browser preferences, as per the thread below:

Thanks again.
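For readers who want to try the browser-preference route: the exact settings from that thread are not reproduced here, so the following is only a sketch of one commonly used set of Chrome preferences that makes Chrome save PDFs instead of opening them in its built-in viewer. The class name PdfDownloadPrefs and the method name are hypothetical; that these are the same preferences the poster changed is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class PdfDownloadPrefs {
    // Builds a Chrome "prefs" map that downloads PDFs to downloadDir
    // without a save dialog and without opening the built-in PDF viewer.
    static Map<String, Object> chromePrefs(String downloadDir) {
        Map<String, Object> prefs = new LinkedHashMap<>();
        prefs.put("download.default_directory", downloadDir);
        prefs.put("download.prompt_for_download", false);
        prefs.put("plugins.always_open_pdf_externally", true);
        return prefs;
    }
}
```

With plain Selenium, a map like this is passed through ChromeOptions.setExperimentalOption("prefs", prefs). In Katalon Studio, the equivalent is likely a prefs entry under the Chrome desired capabilities in the project settings.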