How to assert on the content of a downloaded PDF file

Hi,

I am trying to find a way to assert the actual content of a PDF. I know there is such a thing as PDFBox (Apache PDFBox | A Java PDF Library), but I don’t know how it fits into the Katalon ecosystem. Do I need to include a jar file in my Katalon project? Which version fits? How do I write a custom keyword to do a simple content check?
I would hope that someone in the community has already done this, so that I don’t have to reinvent the wheel…

What do I have already?
I have scripts in place to:
1° Clean up a custom download directory (Chrome browser) at startup.
2° Download a PDF into that custom download directory.
3° Do some assertions on this download:

  • the download is available and results in only 1 file,
  • the file is a PDF,
  • the filename is as expected,
  • the metadata title is as expected,
  • the file size is as expected,
  • the Base64-encoded content matches an expected baseline Base64 string (if it’s a static PDF).

4° Run a cleanup of the custom download folder to leave it in a proper end state.
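
For readers looking for a starting point, the checks above that need nothing beyond the JDK could be sketched roughly as follows. This is not the script from this post (it was not shared in the thread). The class name `DownloadChecks`, its method names and its parameters are all made up for illustration. The metadata title check is left out because it needs a PDF library, and the PDF check only looks for the `%PDF-` signature at the very start of the file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Base64;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Katalon Studio or of any library in this thread.
final class DownloadChecks {

    // Every PDF file begins with this ASCII signature.
    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    /** Returns the regular files directly inside the download directory. */
    static List<Path> listDownloads(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries.filter(Files::isRegularFile).collect(Collectors.toList());
        }
    }

    /** Steps 1 and 4: deletes every regular file directly inside the directory. */
    static void cleanDirectory(Path dir) throws IOException {
        for (Path file : listDownloads(dir)) {
            Files.delete(file);
        }
    }

    /** True if the content starts with the "%PDF-" signature. */
    static boolean looksLikePdf(byte[] content) {
        return content.length >= PDF_MAGIC.length
                && Arrays.equals(Arrays.copyOf(content, PDF_MAGIC.length), PDF_MAGIC);
    }

    /** Step 3: throws AssertionError on the first check that fails. */
    static void verify(Path dir, String expectedName, long expectedSize,
                       String expectedBase64) throws IOException {
        List<Path> files = listDownloads(dir);
        check(files.size() == 1, "expected exactly 1 file but found " + files.size());

        Path pdf = files.get(0);
        byte[] content = Files.readAllBytes(pdf);
        check(looksLikePdf(content), "file does not start with %PDF-");
        check(pdf.getFileName().toString().equals(expectedName),
                "unexpected filename: " + pdf.getFileName());
        check(content.length == expectedSize, "unexpected size: " + content.length);
        check(Base64.getEncoder().encodeToString(content).equals(expectedBase64),
                "Base64 content differs from the baseline");
    }

    private static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }
}
```

A call such as `DownloadChecks.verify(downloadDir, "report.pdf", 12345L, baselineBase64)` would then cover the file count, type, name, size and baseline comparison in one go; the argument values here are placeholders.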

I can provide these scripts to anyone interested, but the one thing I’d still like to do is assert the actual content of the PDF (presumably via PDFBox or another library).
Thanks in advance!

Here is some sample code that converts a PDF into HTML.

Once you have converted it into HTML, you can write your Test Cases to assert the HTML content as you ordinarily do using WebUI.* keywords.
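
If opening the converted HTML in a browser is not convenient, a plain text check on the HTML file is another option. The following is only a rough sketch of that alternative, not the WebUI.* approach described above and not part of materialstore-mapper. The class `HtmlTextCheck` and its methods are hypothetical, the file is assumed to be UTF-8, and the tag stripping is deliberately crude. A PDF-to-HTML converter may split one visual line across several elements, so short expected strings are more likely to match than long ones.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration only.
final class HtmlTextCheck {

    /** Crude text extraction: drops tags, decodes a few entities, collapses whitespace. */
    static String visibleText(String html) {
        String noTags = html
                // remove script and style elements together with their content
                .replaceAll("(?is)<(script|style)[^>]*>.*?</\\1>", " ")
                // replace every remaining tag with a space
                .replaceAll("(?s)<[^>]*>", " ");
        String decoded = noTags
                .replace("&nbsp;", " ")
                .replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&amp;", "&");
        return decoded.replaceAll("\\s+", " ").trim();
    }

    /** True if the visible text of the HTML file contains the expected string. */
    static boolean containsText(Path htmlFile, String expected) throws IOException {
        String html = new String(Files.readAllBytes(htmlFile), StandardCharsets.UTF_8);
        String needle = expected.replaceAll("\\s+", " ").trim();
        return visibleText(html).contains(needle);
    }
}
```

Whitespace is normalized on both sides, so an expected string such as `"Invoice No. 42"` matches regardless of how the converter wrapped the line.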

See materialstore-mapper/build.gradle at master · kazurayam/materialstore-mapper · GitHub

Please note that the project materialstore-mapper is NOT a Katalon project.

I have created a new Katalon project that demonstrates how to assert text content in a downloaded PDF file.

This project is a Katalon Studio project.
It uses the com.kazurayam.materialstore.mapper.PDF2HTMLMapper class that I introduced above.

I found a few more jars which are indirectly required to run the application. Please find them in the following:

Thank you very much @kazurayam ! I will dive into this as soon as I have some free time and report back here once done.

Hi @kazurayam. I am struggling a bit with the Gradle part. I am new to this, so I don’t know how it normally works. From your readme file I can see:

So, I have your Katalon project in my pdfCheck folder and I try in my cmd to execute the “$ gradle drivers” command (after installing gradle), but that isn’t working:
C:\pdfCheck\VerifyPDFContent-master>gradle drivers
'gradle' is not recognized as an internal or external command,
operable program or batch file.

Or do I need to adapt my “build.gradle” file to match yours?

So far I have only handled dependencies by manually copying the .jar files into the Drivers folder.
Sorry for this, might be something that is obvious from your end, but I am a bit clueless.

It is obvious to me that it would take you a long time to learn Gradle from scratch. It could take longer than getting familiar with Katalon Studio.

I suppose you do not want to spend days and nights becoming skilled with Gradle.

You can do the same here. All you need to know for now is which jars are necessary. You can find that information in the build.gradle I provided.

This tells me that you installed Gradle wrongly.

How wrongly? I do not see it. It’s only you who can find it.

In order to find what’s wrong, you need to learn Gradle. If you want to find the answer, study some Gradle tutorial.

Ok, I managed without Gradle, tweaked the script a bit to get my downloaded file converted to HTML, and it’s working perfectly.

On my questions above regarding Gradle (since I detect a hint of frustration at your end*): I will definitely look into learning those Gradle skills, perhaps by finding some time with developers here who use it on a regular basis, as that seems like a really handy way to get those jars into the build/test pipeline machine without the copy-paste hassle. Whether it would take me a long time and/or days and nights is then my problem.

My questions on that topic were in no way an attempt to get you to provide me with a full-blown Gradle tutorial. I just thought that Gradle was needed for your solution to work. (That’s the problem with not knowing a topic: the answer might be as simple as “do this”, or, like in this case, there might be a whole lot more complexity, impossible to cover in a forum post.) You then rightly pointed out that I could just download and copy-paste the jars mentioned in the post.

Anyway, kudos for the solution. I definitely learned something new and appreciate the valuable time you spent on this!

*this could be just me, or could just be something lost in translation (English is not my native language), or perhaps could be a sign of the times, where everyone on the internet just replies a bit more assertively than IRL

@joost.degeyndt

This time, you didn’t need any Gradle skills because I told you which jar files you need. In the future, when you run into a similar problem (which jars are needed?) and want to find the answer yourself, you will need good Gradle (or Maven) skills.