I am working on a script where clicking on a link opens a BLOB PDF file in new tab of window
Now I want to create a new excel sheet and copy the content of BLOB PDF file to the newly created excel sheet.
So far I am able to download the PDF file to system but I dont know why but my script is unable to locate it and open it.
Can someone help me with How to Open and copy paste the content od PDF to Excel sheet part of code.
2 Likes
To automate copying content from a BLOB PDF to an Excel sheet using Katalon, follow these steps:
1. Download the PDF File
Use Katalon’s built-in keywords to download the PDF:
groovy
// Set download directory
String downloadDir = System.getProperty('user.dir') + '/DownloadedFiles'
// Create directory if not exists
new File(downloadDir).mkdirs()
// Trigger download (adjust locator as needed)
WebUI.click(findTestObject('your_pdf_link_object'))
// Wait for download to complete (adjust timeout as needed)
WebUI.delay(30)
2. Locate the Downloaded PDF
Find the most recent PDF file in the download directory:
groovy
import org.apache.commons.io.comparator.LastModifiedFileComparator
File downloadDir = new File(downloadDir)
File[] files = downloadDir.listFiles({ dir, name -> name.endsWith('.pdf') } as FilenameFilter)
if (files.length == 0) {
throw new Exception("No PDF found in download directory")
}
// Sort by last modified and get latest file
Arrays.sort(files, LastModifiedFileComparator.LASTMODIFIED_REVERSE)
File pdfFile = files[0]
3. Extract Text from PDF
Use Apache PDFBox (add JARs to Katalon’s drivers folder):
groovy
import org.apache.pdfbox.pdmodel.PDDocument
import org.apache.pdfbox.text.PDFTextStripper
String pdfText = ""
PDDocument document = PDDocument.load(pdfFile)
PDFTextStripper stripper = new PDFTextStripper()
pdfText = stripper.getText(document)
document.close()
4. Create Excel Sheet and Paste Content
Use Katalon’s Excel keywords:
groovy
import com.kms.katalon.keyword.excel.ExcelKeywords
// Create new Excel file
String excelPath = downloadDir + '/ExtractedData.xlsx'
ExcelKeywords.createExcelFile(excelPath)
// Create a sheet
ExcelKeywords.createExcelSheets(excelPath, ['PDF_Content'])
// Write PDF text to Excel (each line in a row)
String[] lines = pdfText.split('\n')
for (int i = 0; i < lines.size(); i++) {
ExcelKeywords.setValueToCellByAddress(excelPath, 'PDF_Content!A' + (i+1), lines[i])
}
5. Full Workflow Example
groovy
import org.apache.pdfbox.pdmodel.PDDocument
import org.apache.pdfbox.text.PDFTextStripper
import com.kms.katalon.keyword.excel.ExcelKeywords
import org.apache.commons.io.comparator.LastModifiedFileComparator
// Download PDF
String downloadDir = System.getProperty('user.dir') + '/DownloadedFiles'
new File(downloadDir).mkdirs()
WebUI.click(findTestObject('pdf_link_object'))
WebUI.delay(30)
// Find latest PDF
File[] files = new File(downloadDir).listFiles({ dir, name -> name.endsWith('.pdf') } as FilenameFilter)
if (!files) {
throw new Exception("PDF download failed")
}
Arrays.sort(files, LastModifiedFileComparator.LASTMODIFIED_REVERSE)
File pdfFile = files[0]
// Extract text
String pdfText = new PDFTextStripper().getText(PDDocument.load(pdfFile))
// Write to Excel
String excelPath = downloadDir + '/ExtractedData.xlsx'
ExcelKeywords.createExcelFile(excelPath)
ExcelKeywords.createExcelSheets(excelPath, ['PDF_Content'])
pdfText.split('\n').eachWithIndex { line, idx ->
ExcelKeywords.setValueToCellByAddress(excelPath, "PDF_Content!A${idx+1}", line)
}
Key Troubleshooting Tips:
- PDF Download Issues:
- Ensure no conflicting Firefox/Chrome download dialogs exist (disable them via Katalon’s browser preferences).
- Add explicit waits (
WebUI.delay()) if downloads are slow.
- PDFBox Setup:
- Download PDFBox JARs and dependencies (fontbox, commons-logging, etc.).
- Place all JARs in your Katalon project’s
driversfolder.
- Text Extraction:
- Works best with text-based PDFs (not scanned images).
- For tabular data, consider libraries like Tabula (requires additional setup).
- Excel Path:
- Use absolute paths (e.g.,
'C:/data/export.xlsx') to avoid path resolution issues.