Hi Community members,
Do you need to extract text from a PDF file during a WebUI test, such as in the example below?
In this topic, we show you how to set up and use Katalon Studio to verify information inside a PDF file hosted on a website.
Because PDF files do not contain HTML, they have no web objects or elements that the Katalon Studio Object Spy and Recorder can use. To work around this limitation, you can add code that parses the contents of the PDF file directly from the website during a test case.
1. Adding the Test Case code
The first step to set up the code for parsing PDF files is to copy the following into your test case:
Imports:
import org.openqa.selenium.WebDriver as WebDriver
import org.openqa.selenium.remote.LocalFileDetector as LocalFileDetector
import org.openqa.selenium.support.events.EventFiringWebDriver as EventFiringWebDriver
import com.kms.katalon.core.configuration.RunConfiguration as RunConfiguration
import com.kms.katalon.core.webui.driver.DriverFactory as DriverFactory
import com.kms.katalon.selenium.driver.CRemoteWebDriver as CRemoteWebDriver
import com.kms.katalon.core.webui.driver.WebUIDriverType as WebUIDriverType
import com.kms.katalon.core.windows.keyword.WindowsBuiltinKeywords as Windows
import static com.kms.katalon.core.testobject.ObjectRepository.findWindowsObject
Function/Method Code:
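// Identify the driver (declared as the WebDriver interface so the assignment
// does not depend on the concrete driver class Katalon returns)
WebDriver driver = DriverFactory.getWebDriver()
// PDF Keyword call; url must be a String holding the address of the PDF file,
// for example a test case variable
def pdf = CustomKeywords.'com.pdf.reader.ReadPdfFromBrowser.PdfReaderUtil'(url, driver)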
// Split the text extracted from the PDF file into individual lines
def lines = pdf.split('\\r?\\n')
// Print each individual line. At this point you can modify the code
// within the loop to look for a specific piece of text or collect the data
for (String line : lines) {
    System.out.println(line)
}
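As a rough illustration of what "look for a specific piece of text or collect the data" could look like, here is a minimal sketch in plain Java that uses only the standard library. The class name PdfTextChecks and both method names are hypothetical, and the second method assumes the PDF text contains lines laid out as "Label: value", which may not match your file.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; the class and method names are illustrative only.
final class PdfTextChecks {

    // Returns every line (trimmed) that contains the expected text, case-sensitive.
    static List<String> findLines(String pdfText, String expected) {
        List<String> matches = new ArrayList<>();
        for (String line : pdfText.split("\\r?\\n")) {
            if (line.contains(expected)) {
                matches.add(line.trim());
            }
        }
        return matches;
    }

    // Assumes a "Label: value" layout. Returns the value on the first line
    // that starts with the label, or null if no line starts with it.
    static String valueAfterLabel(String pdfText, String label) {
        String prefix = label + ":";
        for (String line : pdfText.split("\\r?\\n")) {
            String trimmed = line.trim();
            if (trimmed.startsWith(prefix)) {
                return trimmed.substring(prefix.length()).trim();
            }
        }
        return null;
    }
}
```

With the string returned by the keyword, a check could then be as simple as asserting that findLines(pdf, "Total") is not empty, or comparing valueAfterLabel(pdf, "Order Number") with the value you expect.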
2. Adding the Custom Keyword code
After adding the previous code to your test case, you will need to implement a Custom Keyword that contains the code for parsing the PDF file. Custom Keywords must be contained in a package within Katalon Studio. You can see from the screenshots below how to create a package and a Custom Keyword:
In this example, the keyword is in a package we have created called "com.pdf.reader":
package com.pdf.reader
import static com.kms.katalon.core.checkpoint.CheckpointFactory.findCheckpoint
import static com.kms.katalon.core.testcase.TestCaseFactory.findTestCase
import static com.kms.katalon.core.testdata.TestDataFactory.findTestData
import static com.kms.katalon.core.testobject.ObjectRepository.findTestObject
import com.kms.katalon.core.annotation.Keyword
import com.kms.katalon.core.checkpoint.Checkpoint
import com.kms.katalon.core.cucumber.keyword.CucumberBuiltinKeywords as CucumberKW
import com.kms.katalon.core.mobile.keyword.MobileBuiltInKeywords as Mobile
import com.kms.katalon.core.model.FailureHandling
import com.kms.katalon.core.testcase.TestCase
import com.kms.katalon.core.testdata.TestData
import com.kms.katalon.core.testobject.TestObject
import com.kms.katalon.core.webservice.keyword.WSBuiltInKeywords as WS
import com.kms.katalon.core.webui.keyword.WebUiBuiltInKeywords as WebUI
import internal.GlobalVariable
import java.io.BufferedInputStream;
import java.io.File;
import java.io.RandomAccessFile;
import java.net.URL;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;
import org.apache.pdfbox.text.PDFTextStripperByArea;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
public class ReadPdfFromBrowser {

    /**
     * Downloads the PDF at the given URL and returns its text content.
     * Returns an empty string if the document is encrypted.
     */
    @Keyword
    public String PdfReaderUtil(String pdfUrl, WebDriver driver) {
        String pdfFileInText = "";
        // Give the browser time to finish loading the PDF page
        Thread.sleep(5000);
        URL url = new URL(pdfUrl);
        BufferedInputStream fileToParse = new BufferedInputStream(url.openStream());
        PDDocument pdDoc = null;
        try {
            pdDoc = PDDocument.load(fileToParse);
            if (!pdDoc.isEncrypted()) {
                PDFTextStripper tStripper = new PDFTextStripper();
                pdfFileInText = tStripper.getText(pdDoc);
            }
        } finally {
            // Release the document and the network stream
            if (pdDoc != null) {
                pdDoc.close();
            }
            fileToParse.close();
        }
        // Close the browser window that displayed the PDF
        driver.close();
        return pdfFileInText;
    }
}
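One thing the keyword does not check is whether the URL actually returns a PDF. If the server answers with something else, such as an HTML login or error page, the parser will most likely fail with an error that is hard to interpret. As a sketch only, the following plain Java helper peeks at the first bytes of the stream and looks for the "%PDF-" marker that a well-formed PDF file starts with. The class name PdfHeaderCheck and the method looksLikePdf are hypothetical and not part of Katalon Studio or PDFBox.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper; not part of Katalon Studio or PDFBox.
final class PdfHeaderCheck {

    // A well-formed PDF file begins with these ASCII bytes.
    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    // Peeks at the first bytes and rewinds, so the same stream can still be
    // handed to the parser afterwards. I/O errors are rethrown unchecked.
    static boolean looksLikePdf(BufferedInputStream in) {
        try {
            in.mark(PDF_MAGIC.length);
            byte[] head = new byte[PDF_MAGIC.length];
            int total = 0;
            while (total < head.length) {
                int n = in.read(head, total, head.length - total);
                if (n < 0) {
                    break; // stream ended before the marker was complete
                }
                total += n;
            }
            in.reset();
            return total == PDF_MAGIC.length && Arrays.equals(head, PDF_MAGIC);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In the keyword, such a check would sit between opening the stream and loading the document, so the test can fail with a clear message when the response is not a PDF.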
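3. Adding Apache PDFBox
Apache PDFBox is needed to handle PDF-specific operations within Katalon Studio. It can be downloaded from the Apache PDFBox website. The keyword code calls PDDocument.load, which belongs to the PDFBox 2.x API; PDFBox 3.x replaces it with Loader.loadPDF, so either use a 2.x .jar or adapt that call. After downloading, you need to add the .jar as an external library within Katalon Studio: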
Please note that this guide works best for PDF files that primarily contain text. The more styling and images the file includes, the more likely the parser is to run into an error.
Once all of this is implemented, the test case should be able to open the PDF file, parse the information within it, and print each individual line of text from the file.
4. Other helpful resources
The link below documents Custom Keywords for Katalon Studio that allow for further PDF management, including comparing, extracting, and saving parts of a PDF file.
https://plugin-docs.katalon.com/docs/pdf-custom-keywords/com/kms/katalon/keyword/pdf/PDF.html