Extract a particular value from a PDF

Hi,
I have a PDF that opens at a URL.
I am able to switch to the URL and read the full content of the PDF.
My requirement: I need to fetch the value of the Total field from the PDF.


Hello,

Something like this should work:

// package names below are the PDFBox 2.x ones
import org.apache.pdfbox.pdmodel.PDDocument
import org.apache.pdfbox.text.PDFTextStripper

PDDocument document = PDDocument.load(new File("C:\\Users\\xxxx\\Desktop\\pdf\\pdfcontent.pdf"))

if (!document.isEncrypted()) {

	// extracts the text of the whole document
	PDFTextStripper tStripper = new PDFTextStripper();

	String pdfFileInText = tStripper.getText(document);
	//println("Text:" + pdfFileInText);

	// split the text into lines
	def lines = pdfFileInText.split("\\r?\\n");
	//println("Textlines:" + lines);
	//define list of lists
	List<ArrayList<String>> listOfLists = new ArrayList<ArrayList<String>>();

	for (String line : lines) {
		println line
		//create dynamic list
		ArrayList<String> l = new ArrayList<>();
		l.add(line);
		listOfLists.add(l);
	}

	def word = listOfLists.get(0).get(0).split(" ");
	println(word[3]);
	def word1 = listOfLists.get(1).get(0).split(" ");
	println(word1[5]);
	def word2 = listOfLists.get(2).get(0).split(" ");
	println(word2[5]);

	List<String> total = new ArrayList<>();
	//start from line 1 because line 0 is the header line; size()-1 also skips the last line
	for(int i = 1; i < listOfLists.size()-1; i++){
		def wd = listOfLists.get(i).get(0).split(" ");
		//add here index what you will need
		total.add(wd[5]);
	}
	println(total);
}
// release the file handle once done
document.close();

Output:
GREEN
YELLOW
[GREEN, YELLOW]

My PDF content is: [image not included]
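
Since the original question asks for the Total field specifically, searching the extracted text for a labelled line may be less brittle than fixed word indexes. This is only a sketch: the class name TotalExtractor, the method extractTotal and the sample text are hypothetical, and it assumes the PDF text contains a line that starts with "Total", optionally followed by a colon, and then a number. The real layout of the PDF is not known here, so the pattern will likely need adjusting.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TotalExtractor {

    // Assumed layout: a line starting with the label "Total", an optional colon,
    // then a number such as 15.50 or 1,234.50. Adjust to the real PDF text.
    private static final Pattern TOTAL_LINE =
            Pattern.compile("^\\s*Total\\s*:?\\s*([0-9][0-9.,]*)\\s*$", Pattern.MULTILINE);

    // Returns the first value found on a "Total" line, or empty if there is none.
    public static Optional<String> extractTotal(String pdfText) {
        Matcher m = TOTAL_LINE.matcher(pdfText);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Stand-in for the string returned by tStripper.getText(document)
        String sample = "Item A 10.00\nItem B 5.50\nTotal: 15.50\n";
        System.out.println(extractTotal(sample).orElse("not found"));
    }
}
```

In the script above, pdfFileInText would be passed in place of the sample string.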


Thank you for the code.
It worked well.

Hi,
It seems that with the new Katalon update this code returns an error:
unable to resolve class org.apache.pdfbox.pdmodel.PDDocument

Any help?

Hi,

Download the correct PDFBox package and add it to the project's Drivers folder:
https://pdfbox.apache.org/download.cgi
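
To confirm that the JAR was picked up after adding it, a quick check like the one below may help. It is a sketch only: the class name ClasspathCheck and the method isOnClasspath are hypothetical helpers, not part of Katalon or PDFBox, and it simply asks the class loader whether the class named in the error message can be found.

```java
public class ClasspathCheck {

    // Returns true if the named class can be located by this class's loader.
    // The class is looked up without being initialized.
    public static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Class name taken from the error message in this thread
        String pdfBoxClass = "org.apache.pdfbox.pdmodel.PDDocument";
        System.out.println(pdfBoxClass + " available: " + isOnClasspath(pdfBoxClass));
    }
}
```

If this reports false, the JAR is probably not in the folder the project loads libraries from, or the project needs to be reopened.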
