Regex not working in Katalon

I am trying to test something using a regex.
My pattern is: (CR0\d{2})|(Godkänt|Avslag)

Group 1 should capture texts like CR001, CR002, etc.
Group 2 should capture only the strings Godkänt or Avslag.

It works fine on regex testing websites but not in Katalon. In Katalon it returns ‘null’ for Group 2. What am I doing wrong?

Hi @suvankar.chandra

Please provide the test script so that we can see what you’re doing.

Hi @ThanhTo, I am working with the PDFBox library. I extract the text from a PDF and verify a table that is present in it. I have also attached the PDF file for your reference: Kreditbeslutsfil_500786.7z (13.9 KB)

import java.util.regex.Matcher
import java.util.regex.Pattern

import org.apache.pdfbox.pdmodel.PDDocument
import org.apache.pdfbox.text.PDFTextStripper

// The lines below extract the text from the PDF file
File file = new File('C:\\**FileLocation**\\PDFFile.pdf')
PDDocument document = PDDocument.load(file)
PDFTextStripper stripper = new PDFTextStripper()
text = stripper.getText(document)
System.out.println("Text:" + text);
document.close()

//I am splitting the PDF text with new lines and spaces
def lines = text.split('(\r\n|\r|\n|\\s)', -1)

println(lines)

//regex pattern to find out Kreditregel ID and Resultat
String pattern = "(CR0\\d{2})|(Godkänt|Avslag)";
String rule = ""
String ssn =""
String outcome = ""
Map<String, String> rulesOutcomes = new HashMap<>();

// Create a Pattern object
Pattern r = Pattern.compile(pattern);

for(String line:lines){
	Matcher m = r.matcher(line);
	if (m.find( )) {
	   System.out.println("Found value: " + m.group(1) ); //Kreditregel ID
	   rule = m.group(1).replaceAll("\\s","")
	   
	   System.out.println("Found value: " + m.group(2) ); //Resultat
	   outcome = m.group(2).replaceAll("\\s","")
	}else {
	   System.out.println("NO MATCH");
	}
}

Hi,

With this regex there are no real matches:
.matches() method: false
.lookingAt() method: false
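
For context on the methods mentioned above: in java.util.regex, matches() requires the entire input to match, lookingAt() requires a match that starts at the beginning of the input, and find() looks for a match anywhere in the input. Below is a minimal sketch using the pattern from the question. The sample row and the class and method names are made up for illustration and are not taken from the PDF.

```java
import java.util.regex.Pattern;

public class FindVsMatches {
    // Same pattern as in the question; the a-umlaut is written as a Unicode
    // escape so the source compiles regardless of the file encoding.
    static final Pattern P = Pattern.compile("(CR0\\d{2})|(Godk\u00e4nt|Avslag)");

    // True only if the whole input is matched by the pattern.
    static boolean matchesWhole(String s) { return P.matcher(s).matches(); }

    // True only if a match starts at the very first character.
    static boolean matchesPrefix(String s) { return P.matcher(s).lookingAt(); }

    // True if a match occurs anywhere in the input.
    static boolean findsAnywhere(String s) { return P.matcher(s).find(); }

    public static void main(String[] args) {
        String row = "13:12 CR030 System Avslag"; // hypothetical sample row
        System.out.println(matchesWhole(row));     // false
        System.out.println(matchesPrefix(row));    // false
        System.out.println(findsAnywhere(row));    // true
        System.out.println(matchesWhole("CR030")); // true
    }
}
```

So a false result from matches() or lookingAt() on a longer string does not by itself mean the pattern is wrong. The loop in the question uses find(), which is the method that scans for the pattern inside a larger string.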

I tried to reproduce your problem using the code you showed and the PDF file as input.
I got the following output:

line "Beslutsunderlag" has NO MATCH
line "Ansökningsnummer" has NO MATCH
line "500786" has NO MATCH
line "Ansökningsdatum" has NO MATCH
line "2020-03-10" has NO MATCH
line "Produkt" has NO MATCH
line "Lånelöfte" has NO MATCH
line "Skandia" has NO MATCH
line "106" has NO MATCH
line "55" has NO MATCH
line "Stockholm" has NO MATCH
line "Telefon:" has NO MATCH
line "08" has NO MATCH
line "788" has NO MATCH
line "10" has NO MATCH
line "00" has NO MATCH
line "skandia.se" has NO MATCH
line "1/5Intern" has NO MATCH
line "SidaKlassificering" has NO MATCH
line "1" has NO MATCH
line "Kundinformation" has NO MATCH
line "Kunduppgifter" has NO MATCH
line "huvudlåntagare" has NO MATCH
line "Personnummer" has NO MATCH
line "199003122455" has NO MATCH
line "Förnamn" has NO MATCH
line "Skandia" has NO MATCH
line "Extranamn" has NO MATCH
line "Efternamn" has NO MATCH
line "Mocksson" has NO MATCH
line "Civilstånd" has NO MATCH
line "Ensamstående" has NO MATCH
line "C/O" has NO MATCH
line "Adress" has NO MATCH
line "123456" has NO MATCH
line "Gatuadress" has NO MATCH
line "Lindhagensgatan" has NO MATCH
line "86" has NO MATCH
line "Postnummer" has NO MATCH
line "11218" has NO MATCH
line "Postort" has NO MATCH
line "Stockholm" has NO MATCH
line "Mobilnummer" has NO MATCH
line "0796765985" has NO MATCH
line "E-postadress" has NO MATCH
line "suvankar1990@gmail.com" has NO MATCH
line "Sysselsättning" has NO MATCH
line "Fast/Tillsvidareanställd" has NO MATCH
line "Arbetsgivare" has NO MATCH
line "Capgemini" has NO MATCH
line "Inkomst" has NO MATCH
line "(från" has NO MATCH
line "UC)" has NO MATCH
line "0" has NO MATCH
line "Inkomst" has NO MATCH
line "(Angiven)" has NO MATCH
line "50" has NO MATCH
line "000" has NO MATCH
line "Valuta" has NO MATCH
line "SEK" has NO MATCH
line "Totalt" has NO MATCH
line "antal" has NO MATCH
line "barn" has NO MATCH
line "i" has NO MATCH
line "hushållet" has NO MATCH
line "Totalt" has NO MATCH
line "antal" has NO MATCH
line "barn" has NO MATCH
line "i" has NO MATCH
line "hushållet," has NO MATCH
line "heltid" has NO MATCH
line "2" has NO MATCH
line "Kreditrisk" has NO MATCH
line "Kreditrisk" has NO MATCH
line "PD" has NO MATCH
line "Skandia" has NO MATCH
line "106" has NO MATCH
line "55" has NO MATCH
line "Stockholm" has NO MATCH
line "Telefon:" has NO MATCH
line "08" has NO MATCH
line "788" has NO MATCH
line "10" has NO MATCH
line "00" has NO MATCH
line "skandia.se" has NO MATCH
line "2/5Intern" has NO MATCH
line "SidaKlassificering" has NO MATCH
line "Skuldkvot" has NO MATCH
line "Belåningsgrad" has NO MATCH
line "(sökt" has NO MATCH
line "belopp" has NO MATCH
line "inräknat)" has NO MATCH
line "Riskklass" has NO MATCH
line "Lånelöftesbelopp" has NO MATCH
line "kreditbeslut" has NO MATCH
line "baserats" has NO MATCH
line "på" has NO MATCH
line "350" has NO MATCH
line "000" has NO MATCH
line "Överskott/underskott" has NO MATCH
line "(KALP)" has NO MATCH
line "3" has NO MATCH
line "Beslut" has NO MATCH
line "3.1" has NO MATCH
line "Kreditregler" has NO MATCH
line "Datum" has NO MATCH
line "Kreditregel" has NO MATCH
line "ID" has NO MATCH
line "Beskrivning" has NO MATCH
line "Handläggare" has NO MATCH
line "Kommentar" has NO MATCH
line "Resultat" has NO MATCH
line "20-03-10" has NO MATCH
line "13:12" has NO MATCH
Found value: CR030
line ""Internal" has NO MATCH
line "engagement" has NO MATCH
line "check" has NO MATCH
line "(only" has NO MATCH
line "for" has NO MATCH
line "Private" has NO MATCH
line "loan" has NO MATCH
line "and" has NO MATCH
line "Mortgage" has NO MATCH
line "loans)"" has NO MATCH
line "-" has NO MATCH
line "199003122455" has NO MATCH
line "System" has NO MATCH
Found value: Godkänt
line "20-03-10" has NO MATCH
line "13:12" has NO MATCH
Found value: CR077
line ""Applicant" has NO MATCH
line "is" has NO MATCH
line "in" has NO MATCH
line "Fraud" has NO MATCH
line "list"" has NO MATCH
line "-" has NO MATCH
line "199003122455" has NO MATCH
line "System" has NO MATCH
Found value: Godkänt
line "20-03-10" has NO MATCH
line "13:12" has NO MATCH
Found value: CR081
line "Risk" has NO MATCH
line "för" has NO MATCH
line "bedrägeri" has NO MATCH
line "Företag" has NO MATCH
line "-" has NO MATCH
line "199003122455" has NO MATCH
line "System" has NO MATCH
Found value: Avslag
line "20-03-10" has NO MATCH
line "13:12" has NO MATCH
Found value: CR032
line "Temporary" has NO MATCH
line "or" has NO MATCH
line "project" has NO MATCH
line "based" has NO MATCH
line "employment" has NO MATCH
line "System" has NO MATCH
Found value: Godkänt
line "20-03-10" has NO MATCH
line "13:12" has NO MATCH
Found value: CR007
line "Skyddad" has NO MATCH
line "personuppgift" has NO MATCH
line "-" has NO MATCH
line "199003122455" has NO MATCH
line "System" has NO MATCH
Found value: Godkänt
line "20-03-10" has NO MATCH
line "13:12" has NO MATCH
Found value: CR008
line "-" has NO MATCH
line "199003122455" has NO MATCH
line "System" has NO MATCH
Found value: Godkänt
line "20-03-10" has NO MATCH
line "13:12" has NO MATCH
Found value: CR011
line "'Customer" has NO MATCH
line "has" has NO MATCH
line "BOX-" has NO MATCH
line "adress" has NO MATCH
line "in" has NO MATCH
line "big" has NO MATCH
line "cities'" has NO MATCH
line "-" has NO MATCH
line "199003122455" has NO MATCH
line "System" has NO MATCH
Found value: Godkänt
line "20-03-10" has NO MATCH
line "13:12" has NO MATCH
Found value: CR012
line "Customer" has NO MATCH
line "has" has NO MATCH
line "FACK-" has NO MATCH
line "System" has NO MATCH
Found value: Godkänt
line "Skandia" has NO MATCH
line "106" has NO MATCH
line "55" has NO MATCH
line "Stockholm" has NO MATCH
line "Telefon:" has NO MATCH
line "08" has NO MATCH
line "788" has NO MATCH
line "10" has NO MATCH
line "00" has NO MATCH
line "skandia.se" has NO MATCH
line "3/5Intern" has NO MATCH
line "SidaKlassificering" has NO MATCH
line "adress" has NO MATCH
line "-" has NO MATCH
line "199003122455" has NO MATCH
line "20-03-10" has NO MATCH
line "13:12" has NO MATCH
Found value: CR013
line "Customer" has NO MATCH
line "has" has NO MATCH
line "Poste" has NO MATCH
line "restante-address" has NO MATCH
line "-" has NO MATCH
line "199003122455" has NO MATCH
line "System" has NO MATCH
Found value: Godkänt
line "20-03-10" has NO MATCH
line "13:12" has NO MATCH
Found value: CR015
line "Foreign" has NO MATCH
line "resident" has NO MATCH
line "and" has NO MATCH
line "have" has NO MATCH
line "at" has NO MATCH
line "least" has NO MATCH
line "ONE" has NO MATCH
line "late-" has NO MATCH
line "payment" has NO MATCH
line "-" has NO MATCH
line "199003122455" has NO MATCH
line "System" has NO MATCH
Found value: Godkänt
line "20-03-10" has NO MATCH
line "13:12" has NO MATCH
Found value: CR024
line "Kreditupplysning" has NO MATCH
line "saknas" has NO MATCH
line "-" has NO MATCH
line "199003122455" has NO MATCH
line "System" has NO MATCH
Found value: Avslag
line "20-03-10" has NO MATCH
line "13:12" has NO MATCH
Found value: CR035
line "Debt" has NO MATCH
line "remediation" has NO MATCH
line "-" has NO MATCH
line "199003122455" has NO MATCH
line "System" has NO MATCH
Found value: Godkänt
line "20-03-10" has NO MATCH
line "13:12" has NO MATCH
Found value: CR036
line "Skuldsaldo" has NO MATCH
line "UC" has NO MATCH
line "-" has NO MATCH
line "199003122455" has NO MATCH
line "System" has NO MATCH
Found value: Godkänt
line "20-03-10" has NO MATCH
line "13:12" has NO MATCH
Found value: CR047
line "If" has NO MATCH
line "customer" has NO MATCH
line "has" has NO MATCH
line "lost" has NO MATCH
line "their" has NO MATCH
line "Drivers" has NO MATCH
line "license" has NO MATCH
line "-" has NO MATCH
line "199003122455" has NO MATCH
line "System" has NO MATCH
Found value: Godkänt
line "20-03-10" has NO MATCH
line "13:12" has NO MATCH
Found value: CR048
line "If" has NO MATCH
line "customer" has NO MATCH
line "has" has NO MATCH
line "lost" has NO MATCH
line "their" has NO MATCH
line "Passport" has NO MATCH
line "-" has NO MATCH
line "199003122455" has NO MATCH
line "System" has NO MATCH
Found value: Godkänt
line "20-03-10" has NO MATCH
line "13:12" has NO MATCH
Found value: CR049
line "If" has NO MATCH
line "customer" has NO MATCH
line "has" has NO MATCH
line "lost" has NO MATCH
line "their" has NO MATCH
line "ID" has NO MATCH
line "document" has NO MATCH
line "-" has NO MATCH
line "199003122455" has NO MATCH
line "System" has NO MATCH
Found value: Godkänt
line "20-03-10" has NO MATCH
line "13:12" has NO MATCH
Found value: CR050
line "Marital" has NO MATCH
line "status" has NO MATCH
line "differs" has NO MATCH
line "from" has NO MATCH
line "what" has NO MATCH
line "the" has NO MATCH
line "customer" has NO MATCH
line "has" has NO MATCH
line "entered" has NO MATCH
line "in" has NO MATCH
line "application" has NO MATCH
line "-" has NO MATCH
line "199003122455" has NO MATCH
line "System" has NO MATCH
Found value: Godkänt
line "20-03-10" has NO MATCH
line "13:12" has NO MATCH
Found value: CR051
line "-" has NO MATCH
line "199003122455" has NO MATCH
line "System" has NO MATCH
Found value: Godkänt
line "20-03-10" has NO MATCH
line "13:12" has NO MATCH
Found value: CR071
line "Skyddad" has NO MATCH
line "adress" has NO MATCH
line "-" has NO MATCH
line "199003122455" has NO MATCH
line "System" has NO MATCH
Found value: Godkänt
line "Skandia" has NO MATCH
line "106" has NO MATCH
line "55" has NO MATCH
line "Stockholm" has NO MATCH
line "Telefon:" has NO MATCH
line "08" has NO MATCH
line "788" has NO MATCH
line "10" has NO MATCH
line "00" has NO MATCH
line "skandia.se" has NO MATCH
line "4/5Intern" has NO MATCH
line "SidaKlassificering" has NO MATCH
line "20-03-10" has NO MATCH
line "13:12" has NO MATCH
Found value: CR072
line "customer" has NO MATCH
line "is" has NO MATCH
line "emigrated" has NO MATCH
line "-" has NO MATCH
line "199003122455" has NO MATCH
line "System" has NO MATCH
Found value: Godkänt
line "20-03-10" has NO MATCH
line "13:12" has NO MATCH
Found value: CR073
line "customer" has NO MATCH
line "is" has NO MATCH
line "deceased" has NO MATCH
line "-" has NO MATCH
line "199003122455" has NO MATCH
line "System" has NO MATCH
Found value: Godkänt
line "20-03-10" has NO MATCH
line "13:12" has NO MATCH
Found value: CR074
line "UC" has NO MATCH
line "Investigation" has NO MATCH
line "real" has NO MATCH
line "-" has NO MATCH
line "199003122455" has NO MATCH
line "System" has NO MATCH
Found value: Godkänt
line "20-03-10" has NO MATCH
line "13:12" has NO MATCH
Found value: CR075
line "UC" has NO MATCH
line "investigation" has NO MATCH
line "spec" has NO MATCH
line "-" has NO MATCH
line "199003122455" has NO MATCH
line "System" has NO MATCH
Found value: Godkänt
line "20-03-10" has NO MATCH
line "13:12" has NO MATCH
Found value: CR076
line "Lost" has NO MATCH
line "id" has NO MATCH
line "documents" has NO MATCH
line "-" has NO MATCH
line "199003122455" has NO MATCH
line "System" has NO MATCH
Found value: Godkänt
line "20-03-10" has NO MATCH
line "13:12" has NO MATCH
Found value: CR019
line "PD" has NO MATCH
line "för" has NO MATCH
line "högt" has NO MATCH
line "System" has NO MATCH
Found value: Godkänt
line "20-03-10" has NO MATCH
line "13:12" has NO MATCH
Found value: CR028
line "PD" has NO MATCH
line "saknas" has NO MATCH
line "System" has NO MATCH
Found value: Avslag
line "20-03-10" has NO MATCH
line "13:12" has NO MATCH
Found value: CR041
line "All" has NO MATCH
line "applicants" has NO MATCH
line "have" has NO MATCH
line "currency" has NO MATCH
line "SEK" has NO MATCH
line "System" has NO MATCH
Found value: Godkänt
line "20-03-10" has NO MATCH
line "13:12" has NO MATCH
Found value: CR052
line "Kontrollera" has NO MATCH
line "inkomst" has NO MATCH
line "-" has NO MATCH
line "199003122455" has NO MATCH
line "System" has NO MATCH
Found value: Godkänt
line "3.2" has NO MATCH
line "Kreditbeslut" has NO MATCH
line "Datum" has NO MATCH
line "Handläggare" has NO MATCH
line "Kommentar" has NO MATCH
line "Resultat" has NO MATCH
line "20-03-10" has NO MATCH
line "13:12" has NO MATCH
line "System" has NO MATCH
Found value: Avslag
line "Skandia" has NO MATCH
line "106" has NO MATCH
line "55" has NO MATCH
line "Stockholm" has NO MATCH
line "Telefon:" has NO MATCH
line "08" has NO MATCH
line "788" has NO MATCH
line "10" has NO MATCH
line "00" has NO MATCH
line "skandia.se" has NO MATCH
line "5/5Intern" has NO MATCH
line "SidaKlassificering" has NO MATCH
line "" has NO MATCH

This contains some lines of successful MATCH:

...
Found value: Godkänt
...
Found value: CR052
...
Found value: Godkänt
...

Your code works OK, doesn’t it?
Do you find any other problem?

Nope! Group 2 is not working.

I tried using replaceAll to remove the newlines and turn the multi-line input into a single-line string. It still didn't work.

But the same code does not work on my system. It is not able to find values for group(2) - (Godkänt/Avslag).
Did you change anything in the code? How did it work for you?

hi,

What about the example I sent you? Does that one work?

These two are different: | and /.
Significant difference, isn’t it?
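
To illustrate the difference: in a Java regex, | means alternation, while / has no special meaning and matches a literal slash. The following is a small sketch. The class and helper names are hypothetical.

```java
import java.util.regex.Pattern;

public class SlashVsPipe {
    // Returns true if the regex matches anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // The a-umlaut is written as a Unicode escape to avoid source-encoding issues.
        String pipe = "(Godk\u00e4nt|Avslag)";   // either word
        String slash = "(Godk\u00e4nt/Avslag)";  // the two words joined by a literal slash

        System.out.println(found(pipe, "Avslag"));                  // true
        System.out.println(found(slash, "Avslag"));                 // false
        System.out.println(found(slash, "Godk\u00e4nt/Avslag"));    // true
    }
}
```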

Hi! @Timo_Kuisma1

Yes, thank you for your program. It works perfectly.
But what I was trying to achieve was to write the regex as a single pattern, so that I can get the intended result in a single loop.
The code I shared previously, like your code, uses 2 different patterns (1 for Rule, 1 for Outcome), which reduces performance and efficiency.
I guess I have to live with that for now. But your code serves my purpose!
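
A single pattern in a single loop is possible in principle. This is a hedged sketch, not the code from either attachment: it assumes that each rule ID appears in the text before its outcome, and it remembers the most recent rule ID until an outcome word is found. The class name, method name, and sample text are invented for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RuleOutcomes {
    // Same alternation as in the question (a-umlaut as a Unicode escape).
    static final Pattern P = Pattern.compile("(CR0\\d{2})|(Godk\u00e4nt|Avslag)");

    // Scans the text once and pairs each rule ID with the next outcome word.
    static Map<String, String> collect(String text) {
        Map<String, String> result = new LinkedHashMap<>();
        String currentRule = null;
        Matcher m = P.matcher(text);
        while (m.find()) {
            if (m.group(1) != null) {
                // The left alternative matched: remember the rule ID.
                currentRule = m.group(1);
            } else if (currentRule != null) {
                // The right alternative matched: store the outcome for the pending rule.
                result.put(currentRule, m.group(2));
                currentRule = null;
            }
            // An outcome with no pending rule (e.g. an overall decision row) is skipped.
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample text, loosely shaped like the table rows.
        String text = "13:12 CR030 check System Avslag\n"
                    + "13:12 CR081 risk System Avslag\n"
                    + "13:12 System Avslag";
        System.out.println(collect(text)); // {CR030=Avslag, CR081=Avslag}
    }
}
```

Using one Matcher over the whole text also avoids splitting the text into words first.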

Can you please share your code? I am wondering how it worked for you. This is exactly what I need!

Yes. I am using this regex - (CR0\d{2})|(Godkänt|Avslag)

(Godkänt/Avslag) - By this I meant (Godkänt OR Avslag).
Apologies for the confusion.

hi,

(CR0\d{2})|(Godkänt|Avslag)

This does not return any match when the pattern is used with re.match.

Yes. I am testing the text from the PDF file in https://regex101.com/

My regex is able to find the text I need to find. Also, it is able to put them in different group in the mentioned website. I do not understand, what am I doing wrong here.

Screenshot from regex101.com for your reference!

hello,

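This works:
// Maps each rule ID (group 2) to its outcome (group 3)
Map<String, String> testValues = new HashMap<>();
// Group 1: time such as 13:12, group 2: rule ID, group 3: the text after " System "
rulePattern = "(?m)^(\\d+:\\d+)\\s(\\w+\\d+).* System (\\w.*)";
// Create a Pattern object
r = Pattern.compile(rulePattern);
// Now create matcher object.
// Note: spaceMoved is not defined in this snippet; it holds the input lines to scan
for (String line : spaceMoved) {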
    m = r.matcher(line);
    if (m.find()) {
        System.out.println("Found value: " + m.group(0));
        System.out.println("Found value: " + m.group(1));
        System.out.println("Found value: " + m.group(2));
        System.out.println("Found value: " + m.group(3));
        testValues.put(m.group(2),m.group(3));
    }
}
System.out.println(testValues);

RESULT
{CR041=Godkänt, CR081=Avslag, CR036=Godkänt, CR015=Godkänt, CR012=Godkänt, CR013=Godkänt, CR035=Godkänt, CR032=Godkänt, CR076=Godkänt, CR077=Godkänt, CR011=Godkänt, CR019=Godkänt, CR030=Godkänt, CR074=Godkänt, CR052=Godkänt, CR075=Godkänt, CR050=Godkänt, CR072=Godkänt, CR051=Godkänt, CR073=Godkänt, CR071=Godkänt, CR049=Godkänt, CR028=Avslag, CR047=Godkänt, CR048=Godkänt, CR024=Avslag, CR007=Godkänt, CR008=Godkänt}

Here is the code I used.

Script1583964528873.groovy.zip (826 Bytes)

It is 99% the same as your original. No significant difference.

Strange! I do not understand why it doesn’t work for me. Could it be related to the Katalon version? I am using version 7.2.6.

It failed for group(2). I only changed my file path and copied the rest from your code.

Ah, I made a mistake. I sent an old version.

I made a change to your original code a bit. Please try this:

Script1583964528873.groovy.zip (860 Bytes)


I added an if (m.group(2) != null) { ... } else { ... } for robustness.

Yes, m.group(2) returned null. Your original code did not have a guarding if (m.group(2)) { ... } else {...}, so it may throw an NPE.


Why did m.group(2) return null? … I do not know. I haven’t looked at it carefully enough. But it would depend on what the input PDF looks like.

Anyway, I think your code should be more defensive about m.group(x) returning null, as it is quite likely to happen.
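
One likely explanation, based on the documented behaviour of java.util.regex.Matcher: group(n) returns null when the match succeeded but group n did not take part in it. With an alternation such as (CR0\d{2})|(Godkänt|Avslag), each match comes from only one side of the |, so the other group is null for that match. Below is a minimal sketch of that behaviour together with a null guard. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NullGroupDemo {
    // Same alternation as in the question (a-umlaut as a Unicode escape).
    static final Pattern P = Pattern.compile("(CR0\\d{2})|(Godk\u00e4nt|Avslag)");

    // Returns group(2) of the first match, or null if there is no match
    // or if group 2 did not take part in the match.
    static String group2Of(String token) {
        Matcher m = P.matcher(token);
        return m.find() ? m.group(2) : null;
    }

    // Null-safe classification of a single token.
    static String describe(String token) {
        Matcher m = P.matcher(token);
        if (!m.find()) {
            return "NO MATCH";
        }
        if (m.group(1) != null) {
            return "rule=" + m.group(1);
        }
        return "outcome=" + m.group(2);
    }

    public static void main(String[] args) {
        System.out.println(group2Of("CR030"));  // null: only group 1 matched
        System.out.println(describe("CR030"));  // rule=CR030
        System.out.println(describe("Avslag")); // outcome=Avslag
        System.out.println(describe("System")); // NO MATCH
    }
}
```

Calling replaceAll on m.group(1) or m.group(2) without such a check fails on every token where the other alternative matched.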

Did you mean that you saw the stack trace with the message java.lang.NullPointerException: Cannot invoke method replaceAll() on null object?

If so, I regret that you did not show us a screenshot of the stack trace first.

Yes! I should have considered that! Thanks, this is the solution, I was looking for!