I’ve been searching through the Web Testing forums to find the best practice for dealing with web content where Katalon appears to either:
-not recognize a character,
-consider it a special character, or
-treat it as a reserved character (one that needs some sort of escape character).
That being said, in my example below I am reading through a CSV file and comparing each value with the page title of the https://duckduckgo.com/ website.
When Katalon reads my CSV file, it does not appear to recognize the dash-like character in the title.
On the other hand, the title of the DuckDuckGo site contains U+002D - HYPHEN-MINUS.
These two characters are different.
Why does your CSV file contain U+2014 — [EM DASH] rather than U+002D - [HYPHEN-MINUS]?
I do not know. Most probably you typed it manually, though you may not be aware of it.
package my

public class StringUtils {

    /**
     * Convert the input string, escaping every non-ASCII character,
     * i.e. every UTF-16 code unit whose value is 128 or greater.
     *
     * E.g.,
     * String s = "Hello\u2010world"
     * println(s) // --> "Hello‐world"
     * println(StringUtils.escapeNonAsciiChars(s)) // --> "Hello\u2010world"
     */
    static String escapeNonAsciiChars(String str) {
        StringBuilder sb = new StringBuilder()
        for (int i = 0; i < str.length(); i++) {
            // Work on UTF-16 code units rather than code points, so that a
            // character outside the BMP is escaped as its two surrogates
            // instead of one malformed five-digit escape.
            char c = str.charAt(i)
            if (c < 128) {
                sb.append(c)
            } else {
                sb.append("\\u").append(String.format("%04X", (int) c))
            }
        }
        return sb.toString()
    }
}
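If the escaped output alone does not tell you which character you are looking at, something like the following might help. It is only a sketch in plain Java using the standard JDK; the class name `NonAsciiReport` and the sample value are made up for illustration and are not part of Katalon. It lists each non-ASCII code point together with its Unicode name, as reported by `Character.getName`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not a Katalon API): reports every non-ASCII code point in a string.
final class NonAsciiReport {

    // Returns one line per non-ASCII code point: char index, U+XXXX notation, Unicode name.
    static List<String> describe(String text) {
        List<String> lines = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            if (cp >= 128) {
                lines.add(String.format("index %d: U+%04X %s", i, cp, Character.getName(cp)));
            }
            // Advance by one or two chars so surrogate pairs are visited once.
            i += Character.charCount(cp);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Made-up sample value containing U+2014 where a hyphen-minus was expected.
        for (String line : describe("Foo \u2014 Bar")) {
            System.out.println(line); // prints "index 4: U+2014 EM DASH"
        }
    }
}
```

Running this against the value read from the CSV and against the title returned by the browser should show right away whether the two strings differ only in the dash.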
Hi Russ. Sorry I did not reply. For some reason I did not receive any alert e-mails from Katalon. I will attach my sample CSV to my latest post. Thanks for sharing your observations.
Between (1) and (2), you must have opened some GUI tool and pasted the string you copied at (1). I guess the tool converted the U+002D - [HYPHEN-MINUS] into something else when you pasted the string.
Some sophisticated GUI applications aimed at non-programmers interfere and silently convert some characters into others. U+002D - [HYPHEN-MINUS] is especially troublesome, because some applications treat it in a special manner.
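If you cannot control which tool touches the test data, one possible workaround is to normalize the dash-like characters on both sides before comparing. The following is just a sketch in plain Java, assuming that treating these characters as equivalent is acceptable for your test; `DashNormalizer` and the sample strings are hypothetical, not something Katalon provides.

```java
// Hypothetical helper (not a Katalon API): maps common dash look-alikes to U+002D.
final class DashNormalizer {

    // Replaces U+2010..U+2015 (hyphen, non-breaking hyphen, figure dash, en dash,
    // em dash, horizontal bar) and U+2212 (minus sign) with U+002D HYPHEN-MINUS.
    static String normalize(String text) {
        return text.replaceAll("[\\u2010-\\u2015\\u2212]", "-");
    }

    // Compares two strings after normalizing the dashes on both sides.
    static boolean sameIgnoringDashStyle(String expected, String actual) {
        return normalize(expected).equals(normalize(actual));
    }

    public static void main(String[] args) {
        String fromCsv = "Foo \u2014 Bar"; // made-up value with an em dash
        String fromPage = "Foo - Bar";     // made-up value with a hyphen-minus
        System.out.println(fromCsv.equals(fromPage));                 // false
        System.out.println(sameIgnoringDashStyle(fromCsv, fromPage)); // true
    }
}
```

This hides the difference rather than fixing the data, so it is probably better to correct the CSV itself when the exact character matters.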
Let me show you an example. The Markdown dialect of Discourse, on which this Katalon forum is hosted, renders consecutive U+002D - [HYPHEN-MINUS] characters in a markup document:
HYPHENS IN --- BETWEEN
into an EM DASH in the presentation view:
HYPHENS IN — BETWEEN
Which tool did you use? I guess you used Microsoft Excel, but I am not sure. If you remember which GUI tool you used, why not try to reproduce your problematic CSV?
After the last issue with Katalon and Excel I switched to using CSV files. Since I have to use Windows at work, I’m using Notepad++ as the text editor. I copied the element from Chrome Developer to the CSV using the below command.
I am not sure what “Katalon Data Viewer” is. I don't think I have ever used it.
But I guess the “Katalon Data Viewer” does not handle character encoding carefully enough (it has a bug). Possibly it reads character streams as if they were encoded in ISO-8859-1 (Latin-1) rather than UTF-8.
I think the Katalon team should be notified of this bug.
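To illustrate what I mean by the encoding guess: I do not know how the Data Viewer actually reads the file, so the following plain-Java sketch only shows what such a mismatch would do to an EM DASH. The class name `EncodingMismatchDemo` is made up for this post.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo (not Katalon code): UTF-8 bytes wrongly decoded as ISO-8859-1.
final class EncodingMismatchDemo {

    // Encodes the text as UTF-8, then decodes those bytes as if they were ISO-8859-1.
    static String misreadAsLatin1(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Renders each UTF-16 code unit as U+XXXX so invisible control characters show up.
    static String codeUnits(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) text.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String emDash = "\u2014";
        System.out.println(codeUnits(emDash));                  // U+2014
        System.out.println(codeUnits(misreadAsLatin1(emDash))); // U+00E2 U+0080 U+0094
    }
}
```

A single EM DASH turns into three characters, two of which are invisible control characters. If the viewer shows something like "â" followed by blanks or boxes where the dash should be, that would be consistent with this guess.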