Is it possible to copy content of BLOB Pdf file (open in new tab) to newly created excel sheet using Katalon?

I am working on a script where clicking on a link opens a BLOB PDF file in new tab of window
Now I want to create a new excel sheet and copy the content of BLOB PDF file to the newly created excel sheet.
So far I am able to download the PDF file to system but I dont know why but my script is unable to locate it and open it.
Can someone help me with How to Open and copy paste the content od PDF to Excel sheet part of code.

2 Likes

To automate copying content from a BLOB PDF to an Excel sheet using Katalon, follow these steps:

1. Download the PDF File

Use Katalon’s built-in keywords to download the PDF:

groovy

// Set download directory
String downloadDir = System.getProperty('user.dir') + '/DownloadedFiles'

// Create directory if not exists
new File(downloadDir).mkdirs()

// Trigger download (adjust locator as needed)
WebUI.click(findTestObject('your_pdf_link_object'))

// Wait for download to complete (adjust timeout as needed)
WebUI.delay(30)

2. Locate the Downloaded PDF

Find the most recent PDF file in the download directory:

groovy

import org.apache.commons.io.comparator.LastModifiedFileComparator

File downloadDir = new File(downloadDir)
File[] files = downloadDir.listFiles({ dir, name -> name.endsWith('.pdf') } as FilenameFilter)

if (files.length == 0) {
    throw new Exception("No PDF found in download directory")
}

// Sort by last modified and get latest file
Arrays.sort(files, LastModifiedFileComparator.LASTMODIFIED_REVERSE)
File pdfFile = files[0]

3. Extract Text from PDF

Use Apache PDFBox (add JARs to Katalon’s drivers folder):

groovy

import org.apache.pdfbox.pdmodel.PDDocument
import org.apache.pdfbox.text.PDFTextStripper

String pdfText = ""
PDDocument document = PDDocument.load(pdfFile)
PDFTextStripper stripper = new PDFTextStripper()
pdfText = stripper.getText(document)
document.close()

4. Create Excel Sheet and Paste Content

Use Katalon’s Excel keywords:

groovy

import com.kms.katalon.keyword.excel.ExcelKeywords

// Create new Excel file
String excelPath = downloadDir + '/ExtractedData.xlsx'
ExcelKeywords.createExcelFile(excelPath)

// Create a sheet
ExcelKeywords.createExcelSheets(excelPath, ['PDF_Content'])

// Write PDF text to Excel (each line in a row)
String[] lines = pdfText.split('\n')
for (int i = 0; i < lines.size(); i++) {
    ExcelKeywords.setValueToCellByAddress(excelPath, 'PDF_Content!A' + (i+1), lines[i])
}

5. Full Workflow Example

groovy

import org.apache.pdfbox.pdmodel.PDDocument
import org.apache.pdfbox.text.PDFTextStripper
import com.kms.katalon.keyword.excel.ExcelKeywords
import org.apache.commons.io.comparator.LastModifiedFileComparator

// Download PDF
String downloadDir = System.getProperty('user.dir') + '/DownloadedFiles'
new File(downloadDir).mkdirs()
WebUI.click(findTestObject('pdf_link_object'))
WebUI.delay(30)

// Find latest PDF
File[] files = new File(downloadDir).listFiles({ dir, name -> name.endsWith('.pdf') } as FilenameFilter)
if (!files) {
    throw new Exception("PDF download failed")
}
Arrays.sort(files, LastModifiedFileComparator.LASTMODIFIED_REVERSE)
File pdfFile = files[0]

// Extract text
String pdfText = new PDFTextStripper().getText(PDDocument.load(pdfFile))

// Write to Excel
String excelPath = downloadDir + '/ExtractedData.xlsx'
ExcelKeywords.createExcelFile(excelPath)
ExcelKeywords.createExcelSheets(excelPath, ['PDF_Content'])
pdfText.split('\n').eachWithIndex { line, idx ->
    ExcelKeywords.setValueToCellByAddress(excelPath, "PDF_Content!A${idx+1}", line)
}

Key Troubleshooting Tips:

  1. PDF Download Issues:
  • Ensure no conflicting Firefox/Chrome download dialogs exist (disable them via Katalon’s browser preferences).
  • Add explicit waits (WebUI.delay()) if downloads are slow.
  1. PDFBox Setup:
  • Download PDFBox JARs and dependencies (fontbox, commons-logging, etc.).
  • Place all JARs in your Katalon project’s drivers folder.
  1. Text Extraction:
  • Works best with text-based PDFs (not scanned images).
  • For tabular data, consider libraries like Tabula (requires additional setup).
  1. Excel Path:
  • Use absolute paths (e.g., 'C:/data/export.xlsx') to avoid path resolution issues.

Hi @jkaur,

Welcome to our community . Have you tried the suggestions from @dineshh? Please let us know if you solve/not. Thank you