Is it possible to copy content of BLOB Pdf file (open in new tab) to newly created excel sheet using Katalon?

jkaur · July 15, 2025, 2:35am

I am working on a script where clicking on a link opens a BLOB PDF file in new tab of window
Now I want to create a new excel sheet and copy the content of BLOB PDF file to the newly created excel sheet.
So far I am able to download the PDF file to system but I dont know why but my script is unable to locate it and open it.
Can someone help me with How to Open and copy paste the content od PDF to Excel sheet part of code.

dineshh · July 15, 2025, 6:33am

To automate copying content from a BLOB PDF to an Excel sheet using Katalon, follow these steps:

1. Download the PDF File

Use Katalon’s built-in keywords to download the PDF:

groovy

// Set download directory
String downloadDir = System.getProperty('user.dir') + '/DownloadedFiles'

// Create directory if not exists
new File(downloadDir).mkdirs()

// Trigger download (adjust locator as needed)
WebUI.click(findTestObject('your_pdf_link_object'))

// Wait for download to complete (adjust timeout as needed)
WebUI.delay(30)

2. Locate the Downloaded PDF

Find the most recent PDF file in the download directory:

groovy

import org.apache.commons.io.comparator.LastModifiedFileComparator

File downloadDir = new File(downloadDir)
File[] files = downloadDir.listFiles({ dir, name -> name.endsWith('.pdf') } as FilenameFilter)

if (files.length == 0) {
    throw new Exception("No PDF found in download directory")
}

// Sort by last modified and get latest file
Arrays.sort(files, LastModifiedFileComparator.LASTMODIFIED_REVERSE)
File pdfFile = files[0]

3. Extract Text from PDF

Use Apache PDFBox (add JARs to Katalon’s drivers folder):

groovy

import org.apache.pdfbox.pdmodel.PDDocument
import org.apache.pdfbox.text.PDFTextStripper

String pdfText = ""
PDDocument document = PDDocument.load(pdfFile)
PDFTextStripper stripper = new PDFTextStripper()
pdfText = stripper.getText(document)
document.close()

4. Create Excel Sheet and Paste Content

Use Katalon’s Excel keywords:

groovy

import com.kms.katalon.keyword.excel.ExcelKeywords

// Create new Excel file
String excelPath = downloadDir + '/ExtractedData.xlsx'
ExcelKeywords.createExcelFile(excelPath)

// Create a sheet
ExcelKeywords.createExcelSheets(excelPath, ['PDF_Content'])

// Write PDF text to Excel (each line in a row)
String[] lines = pdfText.split('\n')
for (int i = 0; i < lines.size(); i++) {
    ExcelKeywords.setValueToCellByAddress(excelPath, 'PDF_Content!A' + (i+1), lines[i])
}

5. Full Workflow Example

groovy

import org.apache.pdfbox.pdmodel.PDDocument
import org.apache.pdfbox.text.PDFTextStripper
import com.kms.katalon.keyword.excel.ExcelKeywords
import org.apache.commons.io.comparator.LastModifiedFileComparator

// Download PDF
String downloadDir = System.getProperty('user.dir') + '/DownloadedFiles'
new File(downloadDir).mkdirs()
WebUI.click(findTestObject('pdf_link_object'))
WebUI.delay(30)

// Find latest PDF
File[] files = new File(downloadDir).listFiles({ dir, name -> name.endsWith('.pdf') } as FilenameFilter)
if (!files) {
    throw new Exception("PDF download failed")
}
Arrays.sort(files, LastModifiedFileComparator.LASTMODIFIED_REVERSE)
File pdfFile = files[0]

// Extract text
String pdfText = new PDFTextStripper().getText(PDDocument.load(pdfFile))

// Write to Excel
String excelPath = downloadDir + '/ExtractedData.xlsx'
ExcelKeywords.createExcelFile(excelPath)
ExcelKeywords.createExcelSheets(excelPath, ['PDF_Content'])
pdfText.split('\n').eachWithIndex { line, idx ->
    ExcelKeywords.setValueToCellByAddress(excelPath, "PDF_Content!A${idx+1}", line)
}

Key Troubleshooting Tips:

PDF Download Issues:

Ensure no conflicting Firefox/Chrome download dialogs exist (disable them via Katalon’s browser preferences).
Add explicit waits (WebUI.delay()) if downloads are slow.

PDFBox Setup:

Download PDFBox JARs and dependencies (fontbox, commons-logging, etc.).
Place all JARs in your Katalon project’s drivers folder.

Text Extraction:

Works best with text-based PDFs (not scanned images).
For tabular data, consider libraries like Tabula (requires additional setup).

Excel Path:

Use absolute paths (e.g., 'C:/data/export.xlsx') to avoid path resolution issues.

Elly_Tran · July 23, 2025, 4:22am

Hi @jkaur,

Welcome to our community . Have you tried the suggestions from @dineshh? Please let us know if you solve/not. Thank you

Topic		Replies	Views
Unable to download pdf from blob url Miscellaneous	4	2956	March 6, 2024
Is Katalon Studio Right for My Usecase? Web Testing katalon-studio	1	373	July 30, 2020
[KShare] How to read PDF files when working with "blob:https://" web pages Kshare katalon-studio , web-testing , support , gatedknowledge , jordan-bartley	1	633	March 12, 2024
How to Compare Contents in Files in Katalon Studio Web Testing katalon-studio	2	853	December 26, 2020
Get values of the PDF Web Testing katalon-studio	3	1102	September 27, 2023

Is it possible to copy content of BLOB Pdf file (open in new tab) to newly created excel sheet using Katalon?

1. Download the PDF File

2. Locate the Downloaded PDF

3. Extract Text from PDF

4. Create Excel Sheet and Paste Content

5. Full Workflow Example

Key Troubleshooting Tips:

Related topics