Working with Page Titles and Content with Special Characters


I’ve been searching through the Web Testing forums to find the best practice for dealing with web content where Katalon appears to do one of the following:

- not recognize a character;
- treat it as a special character;
- treat it as a reserved character (one that needs some sort of escape character).

In my example below, I read a CSV file and compare its values with the page title on the website.

When Katalon reads my CSV file, it does not appear to recognize the dash-like character in the title.

This is an example of my CSV file:

“DuckDuckGo — Privacy, simplified.”,,Tired at DuckDuckGo of being tracked online? We can help.

I copied the content directly from Google Chrome Developer Tools (Copy element).

This is the failure message I get when the character is not recognized:

Test Cases/1. Test Setup/1. Smoke Test - URL, Page Title and Content check from a file FAILED.
Assertion failed: 

assert WebUI.getWindowTitle() == Test_Title
             |                |  |
             |                |  DuckDuckGo � Privacy, simplified.
             |                false
             DuckDuckGo — Privacy, simplified.
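To see exactly where two strings like these diverge, a small helper can report the first differing position and the UTF-16 code units found there. This is a hypothetical Java sketch (`FirstDifference` is not a Katalon class); it also assumes the `�` shown in the assertion output is U+FFFD REPLACEMENT CHARACTER, which the console text alone does not prove.

```java
import java.util.Locale;

// Hypothetical helper (not part of Katalon): finds where two strings first differ
class FirstDifference {

    /** Returns the index of the first differing char, or -1 if the strings are equal. */
    static int indexOfFirstDifference(String a, String b) {
        int n = Math.min(a.length(), b.length());
        for (int i = 0; i < n; i++) {
            if (a.charAt(i) != b.charAt(i)) {
                return i;
            }
        }
        // no difference in the common prefix: equal, or one string is longer
        return a.length() == b.length() ? -1 : n;
    }

    /** Formats the chars at the first difference as U+XXXX values. */
    static String describe(String expected, String actual) {
        int i = indexOfFirstDifference(expected, actual);
        if (i < 0) {
            return "strings are equal";
        }
        return "index " + i + ": expected " + codeUnitAt(expected, i) + ", actual " + codeUnitAt(actual, i);
    }

    private static String codeUnitAt(String s, int i) {
        return i < s.length() ? String.format(Locale.ROOT, "U+%04X", (int) s.charAt(i)) : "<end of string>";
    }

    public static void main(String[] args) {
        String csvTitle  = "DuckDuckGo \uFFFD Privacy, simplified.";  // assumed to be U+FFFD
        String pageTitle = "DuckDuckGo \u2014 Privacy, simplified.";  // what WebUI.getWindowTitle() returned
        System.out.println(describe(csvTitle, pageTitle));
        // index 11: expected U+FFFD, actual U+2014
    }
}
```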

Any recommendations or documentation I should read to handle situations like this are welcome.

Thank you.

Perhaps it’s a bug, but I’m not sure what encoding Katalon expects/supports for text files.

Do you happen to know what encoding your csv is using? utf-8? ANSI? Are you able to change it, or are you not in control of its production?
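If the CSV is read in your own code (for example, a custom keyword), one way to stop depending on the platform default encoding is to decode with an explicit charset. A minimal sketch, assuming the file is either UTF-8 or Windows "ANSI", taken here to be windows-1252 (the ANSI code page of US English Windows); the class and method names are hypothetical. It relies on `Files.readAllLines` throwing a `MalformedInputException`, a subclass of `CharacterCodingException`, when the bytes are not valid in the requested charset.

```java
import java.io.IOException;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.List;

class CsvReading {

    // Assumption for this sketch: "ANSI" means windows-1252 (US English Windows)
    static final Charset ANSI = Charset.forName("windows-1252");

    /** Reads the file as UTF-8; if the bytes are not valid UTF-8, re-reads it as windows-1252. */
    static List<String> readLinesUtf8OrCp1252(Path file) throws IOException {
        try {
            return Files.readAllLines(file, StandardCharsets.UTF_8);
        } catch (CharacterCodingException e) {
            // e.g. a lone 0x97 byte, which is how windows-1252 stores an em dash
            return Files.readAllLines(file, ANSI);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("titles", ".csv");
        // simulate an ANSI-encoded CSV containing an em dash
        Files.write(tmp, Collections.singletonList("DuckDuckGo \u2014 Privacy, simplified."), ANSI);
        String first = readLinesUtf8OrCp1252(tmp).get(0);
        System.out.println(first.indexOf('\u2014'));  // 11: the em dash survived
        Files.delete(tmp);
    }
}
```

A file that is plain ASCII decodes identically either way, so the fallback only matters once non-ASCII characters such as the em dash appear.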

There are several hyphen-like characters in Unicode; please learn to tell them apart.
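For reference, this sketch prints the code point, glyph and Unicode name of a few hyphen-like characters, using the standard `Character.getName` method (Java 7 and later). The selection is hand-picked, not exhaustive.

```java
// Prints the code point, glyph, and Unicode name of several hyphen-like characters
class HyphenLikeChars {

    // A hand-picked selection, not an exhaustive list
    static final int[] CODE_POINTS = {
        0x002D, // HYPHEN-MINUS: what the keyboard's "-" key types
        0x2010, // HYPHEN
        0x2011, // NON-BREAKING HYPHEN
        0x2012, // FIGURE DASH
        0x2013, // EN DASH
        0x2014, // EM DASH
        0x2015, // HORIZONTAL BAR
        0x2212  // MINUS SIGN
    };

    public static void main(String[] args) {
        for (int cp : CODE_POINTS) {
            String glyph = new String(Character.toChars(cp));
            // Character.getName returns the official Unicode name, e.g. "EM DASH"
            System.out.printf("U+%04X %s %s%n", cp, glyph, Character.getName(cp));
        }
    }
}
```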


You showed the following string from your CSV file:

“DuckDuckGo — Privacy, simplified.”

This string contains U+2014 EM DASH (—).

On the other hand, the title of the DuckDuckGo site contains U+002D HYPHEN-MINUS (-).

These two characters are different.

Why does your CSV file contain U+2014 [EM DASH] rather than U+002D [HYPHEN-MINUS]? I do not know. Most probably you typed it manually, perhaps without being aware of it.

How did I find this?

I will tell you.

I made a custom keyword, my.StringUtils:

package my

public class StringUtils {

	/**
	 * Convert the input string,
	 * escaping every non-ASCII character (Unicode code point of 128 or larger).
	 * E.g.,
	 * String s = "Hello\u2010world"
	 * println(s)                                    // --> "Hello‐world"
	 * println(StringUtils.escapeNonAsciiChars(s))   // --> "Hello\u2010world"
	 */
	static String escapeNonAsciiChars(String str) {
		StringBuilder sb = new StringBuilder()
		int i = 0
		while (i < str.length()) {
			int codepoint = str.codePointAt(i)
			if (codepoint < 128) {
				// ASCII: copy the character unchanged
				sb.appendCodePoint(codepoint)
			} else {
				// non-ASCII: write a backslash, "u", and the code point in hex
				sb.append("\\u").append(String.format("%04X", codepoint))
			}
			// advance by 2 chars for a surrogate pair, otherwise by 1
			i += Character.charCount(codepoint)
		}
		return sb.toString()
	}
}

I made a Test Case TC1:

def Test_Title = 'DuckDuckGo — Privacy, simplified.'
def escapedTitle = my.StringUtils.escapeNonAsciiChars(Test_Title)
println escapedTitle

When I ran the test case, I got the following output in the console.

2021-02-11 09:52:05.134 INFO  c.k.katalon.core.main.TestCaseExecutor   - START Test Cases/checkDuckDuckGoPageTitle
2021-02-11 09:52:06.901 DEBUG testcase.checkDuckDuckGoPageTitle        - 1: Test_Title = "DuckDuckGo — Privacy, simplified."
2021-02-11 09:52:06.910 DEBUG testcase.checkDuckDuckGoPageTitle        - 2: escapedTitle = StringUtils.escapeNonAsciiChars(Test_Title)
2021-02-11 09:52:06.961 DEBUG testcase.checkDuckDuckGoPageTitle        - 3: println(escapedTitle)
DuckDuckGo \u2014 Privacy, simplified.
2021-02-11 09:52:06.980 INFO  c.k.katalon.core.main.TestCaseExecutor   - END Test Cases/checkDuckDuckGoPageTitle

I am sure you have \u2014 (EM Dash) in your CSV file.

Hi Russ. Sorry I did not reply sooner; for some reason I did not receive any e-mail alerts from Katalon. I will attach my sample CSV in my next post. Thanks for sharing your observations.

Did you ever notice the message at the top of the forum page?

I did see the visual alert after I signed in to the Katalon forum, just no e-mail alert.

I’ve attached a copy of my CSV for your reference.


Just so you two know, I am inherently Super Efficient :) … I used copy and paste twice:

(1) from Chrome Developer Tools, to get the page title;

(2) then from the error message in Katalon.

So … not as smart as Kaz … but Highly Efficient … LOL

Is the solution to save my CSV file in a particular encoding? I assume you want to review my CSV first.
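If the CSV turns out to be saved in a Windows "ANSI" code page, one option is to convert it to UTF-8 once and keep it that way. A minimal sketch, assuming the source encoding is windows-1252; `reencode` is a hypothetical name, and it overwrites the file in place, so keep a backup. Note that `new String(bytes, charset)` silently substitutes anything it cannot decode, so the source charset has to be right.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class Reencode {

    /** Rewrites the file, decoding its bytes with {@code from} and encoding them with {@code to}. */
    static void reencode(Path file, Charset from, Charset to) throws IOException {
        String text = new String(Files.readAllBytes(file), from);
        Files.write(file, text.getBytes(to));
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("titles", ".csv");
        Files.write(tmp, new byte[]{(byte) 0x97});  // an em dash in windows-1252
        reencode(tmp, Charset.forName("windows-1252"), StandardCharsets.UTF_8);
        byte[] bytes = Files.readAllBytes(tmp);
        System.out.println(bytes.length);  // 3: E2 80 94, the UTF-8 form of U+2014
        Files.delete(tmp);
    }
}
```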

Thank you for sharing the zip file.
In the CSV file I did find a strange character.

I cannot tell why the strange character is there. Only you can know that; nobody else can.

How to fix this? You can edit the CSV file manually with your favourite text editor. Do you need any other method?
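Another method, if editing by hand is impractical, is to normalize dash-like characters on both sides before comparing. A minimal sketch with a hypothetical `normalizeDashes` helper that maps U+2010 through U+2015 and U+2212 MINUS SIGN to U+002D HYPHEN-MINUS. Two caveats: the test becomes deliberately blind to which dash the page uses, and it cannot recover a character already lost in decoding, since U+FFFD REPLACEMENT CHARACTER is not a dash and passes through unchanged.

```java
import java.util.regex.Pattern;

// Hypothetical helper: maps dash-like characters to U+002D HYPHEN-MINUS
class DashNormalizer {

    // U+2010..U+2015 (HYPHEN through HORIZONTAL BAR) plus U+2212 MINUS SIGN
    private static final Pattern DASH_LIKE = Pattern.compile("[\\u2010-\\u2015\\u2212]");

    static String normalizeDashes(String s) {
        return DASH_LIKE.matcher(s).replaceAll("-");
    }

    public static void main(String[] args) {
        System.out.println(normalizeDashes("DuckDuckGo \u2014 Privacy, simplified."));
        // DuckDuckGo - Privacy, simplified.
    }
}
```

In a Groovy test case this might be used as `assert DashNormalizer.normalizeDashes(WebUI.getWindowTitle()) == DashNormalizer.normalizeDashes(Test_Title)`.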

I am sure Katalon is not guilty.

Between (1) and (2), you must have opened some GUI tool and pasted into it the string copied at (1). I guess that tool converted U+002D [HYPHEN-MINUS] into something else when you pasted the string.

Some sophisticated GUI applications for non-programmers interfere with text and silently convert some characters into others. U+002D [HYPHEN-MINUS] in particular can be troublesome, because some applications treat it in a special manner.

Let me show you an example. The Markdown renderer of Discourse, the platform this Katalon Forum is hosted on, renders consecutive U+002D [HYPHEN-MINUS] characters in a markup document into an EM DASH in the presentation view.
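To make that kind of silent conversion concrete, here is a toy smart-punctuation pass, loosely modeled on the markdown-it typographer convention (`---` becomes EM DASH, `--` becomes EN DASH). The real rules have extra conditions, other applications use their own, and the class name is hypothetical.

```java
// Toy smart-punctuation pass, loosely modeled on the markdown-it typographer:
// "---" becomes EM DASH (U+2014) and "--" becomes EN DASH (U+2013).
class ToyTypographer {

    static String smarten(String typed) {
        // replace the longer run first, so "---" is not split into "--" + "-"
        return typed.replace("---", "\u2014").replace("--", "\u2013");
    }

    public static void main(String[] args) {
        String typed = "DuckDuckGo --- Privacy, simplified.";
        String shown = smarten(typed);
        System.out.println(shown);                // the dashes were replaced without asking
        System.out.println(shown.equals(typed));  // false
    }
}
```

The displayed text no longer equals what was typed, even though the two look almost the same on screen.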


Which tool did you use? I guess you used the famous Microsoft Excel, but I am not sure. If you remember which GUI tool you used, why not try to reproduce your mischievous CSV?

Hi Kaz,

After the last issue with Katalon and Excel, I switched to using CSV files. Since I have to use Windows at work, I use Notepad++ as my text editor. I copied the element from Chrome Developer Tools into the CSV using the Copy element command.

Do you have a recommended text editor for Windows?

I was wrong.

In the screenshot you provided last, I found an EM-DASH-like character (—) in the <title> element.

My PC uses LANG=ja_JP, not LANG=en_US.

When I open the page, the <title> text contains an EM-DASH-like character as well:

<title>DuckDuckGo — プライバシー保護をシンプルに。</title>

Which language setting does your PC use? Not LANG=en_US?

If you open the URL on a different PC with LANG=en_US, the <title> text may differ, depending on the language setting.
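One way to check whether the title is localized is to request the page with an explicit language preference. A minimal, Java 8 compatible sketch using `HttpURLConnection`; the names are hypothetical, and it is an assumption that the site chooses the `<title>` language from the `Accept-Language` header (a cookie or another signal could also be involved). During a Katalon run, the browser sends its own `Accept-Language` value based on its language settings.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

class LocalizedRequest {

    /** Prepares a GET request with an explicit Accept-Language header; nothing is sent yet. */
    static HttpURLConnection openWithLanguage(String url, String languageTag) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("GET");
        conn.setRequestProperty("Accept-Language", languageTag);
        return conn;
    }

    public static void main(String[] args) throws IOException {
        HttpURLConnection conn = openWithLanguage("https://duckduckgo.com/", "en-US");
        // the request goes out only when getInputStream() (or connect()) is called
        System.out.println(conn.getRequestProperty("Accept-Language"));  // en-US
    }
}
```

Reading `conn.getInputStream()` would then send the request and return the HTML, whose `<title>` can be compared across language tags such as `en-US` and `ja-JP`.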

For reference, this is what Notepad++ shows when I open the CSV.

It looks like my employer has set US English as the default language in Chrome.

@kazurayam @Russ_Thomas

Hi Guys,

I ended up copying the character from the CSV into a web-based character identification tool and was able to confirm it is an EM DASH.
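The same identification can also be done offline, without another round of copy and paste through a browser. A sketch with a hypothetical `identify` helper that lists each non-ASCII character with its index, code point and Unicode name.

```java
import java.util.Locale;

// Hypothetical offline character identifier: lists every non-ASCII character
// in a string with its index, code point, and Unicode name.
class CharIdentifier {

    static String identify(String s) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            if (cp >= 128) {
                // Character.getName returns null for unassigned code points
                sb.append(String.format(Locale.ROOT, "index %d: U+%04X %s%n", i, cp, Character.getName(cp)));
            }
            i += Character.charCount(cp);  // skip both halves of a surrogate pair
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(identify("DuckDuckGo \u2014 Privacy, simplified."));
        // index 11: U+2014 EM DASH
    }
}
```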

But look at how the Katalon Data Viewer does not recognize the character.

I am not sure what “Katalon Data Viewer” is. It seems I have never used it.

But I guess the “Katalon Data Viewer” is not careful enough about character encoding (it has a bug). Possibly it reads the character stream as ISO-8859-1 (Latin-1) rather than UTF-8.
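The encoding hypothesis can be checked by reproducing both common mix-ups for the em dash. In this sketch (`roundTrip` is a hypothetical name), UTF-8 bytes decoded as ISO-8859-1 turn one em dash into three characters, while the windows-1252 byte 0x97 decoded as UTF-8 is invalid and becomes a single U+FFFD. So, assuming the `�` in the assertion output really is U+FFFD, it would also be consistent with an ANSI (windows-1252) file being read as UTF-8; the thread does not establish which case applies.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Reproduces two encoding mix-ups for the EM DASH (U+2014)
class MojibakeDemo {

    static final String EM_DASH = "\u2014";

    /** Encodes with one charset and decodes with another, as a mismatched reader would. */
    static String roundTrip(String s, Charset writeAs, Charset readAs) {
        return new String(s.getBytes(writeAs), readAs);
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");
        // (a) UTF-8 file read as ISO-8859-1: bytes E2 80 94 become three characters
        String a = roundTrip(EM_DASH, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        // (b) windows-1252 file read as UTF-8: byte 0x97 is invalid UTF-8, so it becomes U+FFFD
        String b = roundTrip(EM_DASH, cp1252, StandardCharsets.UTF_8);
        System.out.println(a.length());         // 3
        System.out.println(b.equals("\uFFFD"));  // true
    }
}
```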

I think the Katalon Team should be notified of this bug.


I thought so all along…


Could you put this to the “Bug” category?