TestClosure --- executing Groovy Closures in multi-Threads in a Test Case simultaneously

The following post may be relevant:

Solution
We have to use a different user-data directory for each profile. There is no need to create a profile manually in the Chrome browser, and no need to pass the --profile-directory argument in Chrome options. You can still keep sessions and history by giving each ChromeDriver instance a different user-data-dir path.
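A minimal sketch of the idea in the quote above, written in plain Java so that it runs without Selenium on the classpath. It only builds one `--user-data-dir=...` argument per browser instance; the class name `UserDataDirs`, the method `argumentsFor` and the temporary base directory are hypothetical names chosen for illustration. With Selenium, each argument would then be handed to `ChromeOptions.addArguments(...)` before the `ChromeDriver` is created.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class UserDataDirs {

    /**
     * Creates one directory per profile name under baseDir and returns the
     * matching Chrome command-line arguments, one per browser instance.
     */
    public static List<String> argumentsFor(Path baseDir, List<String> profileNames)
            throws IOException {
        List<String> args = new ArrayList<>();
        for (String name : profileNames) {
            Path dir = baseDir.resolve(name);
            Files.createDirectories(dir);   // does nothing if the directory already exists
            args.add("--user-data-dir=" + dir.toAbsolutePath());
        }
        return args;
    }

    public static void main(String[] argv) throws IOException {
        // Hypothetical location; a real test would use a stable directory
        // so that the cache survives between runs.
        Path base = Files.createTempDirectory("chrome-user-data");
        List<String> names = List.of("Katalon1", "Katalon2", "Katalon3", "Katalon4");
        for (String arg : argumentsFor(base, names)) {
            // With Selenium: options.addArguments(arg); new ChromeDriver(options)
            System.out.println(arg);
        }
    }
}
```

Because every instance gets its own directory, two Chrome processes never share the same profile files, which is the point of the quoted advice.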

In the TestClosure project I have a Test Case which opens 4 Chrome browser windows and visits several URLs starting with http://www.katalon.com/. The Test Case drives 4 threads in parallel. I expected the test to run faster than with 1 thread.

However, to my surprise, the test ran very slowly. See the following measurement.

It took 60 seconds for this test case script to open https://www.katalon.com/. When I manually open the same URL, it opens in a few seconds. It does not take 60 seconds. Why does it take so long for my test case script to open https://www.katalon.com/?

I have studied this issue for the last 10 months. Now I have found the reason.

My program opened a Chrome browser like this:

			// open a browser window for this TestClosure
			WebUI.openBrowser('')

This is the usual way in Katalon Studio scripting. This call opens a Chrome browser with a completely empty cache.

As you know, this browser has no HTML files cached, no CSS cached, no JavaScript cached, no GIF/JPEG/PNG images cached. When my script navigates to the URL https://www.katalon.com/, yes, the page will eventually open. But it takes many seconds to download all these resources.

Now I have realised that WebUI.openBrowser('') is not enough to make my test scripts with the TestClosure framework run faster. I need to invent a way to launch Chrome browsers with custom profiles Katalon1, Katalon2, Katalon3 and Katalon4, which keep the web resources cached and reusable.

Given a new way of launching Chrome with a custom profile, I will prepare these 4 profiles to cache the web resources of the target URL https://www.katalon.com prior to testing. Then my test script should launch 4 Chrome browsers with these well-prepared custom profiles, which supply cached web resources. Then the test script will run much faster.

But 60 seconds??? It shouldn’t take that long, regardless.

Good eyes you have.

60 seconds is the timeout I set in Katalon Studio to wait for the page load. It seems the page did not finish loading within that duration.

I noticed that “http://www.katalon.com/” has some links to URLs on external hosts, and somehow the HTTP session to one of those external URLs keeps going longer than the timeout; or maybe the HTTP session is left open forever, so that Katalon Studio does not recognise the end of page loading. But I see this long page load only when I open the browser from an automation script using WebUI.openBrowser(''); I do not see it when I open the page manually. It is puzzling. I think that this puzzle has nothing to do with multi-threading. It probably has something to do with how I launch the Chrome browser: with a profile or without one, how the desired capabilities are set, etc.

Anyway, I haven’t looked at the nature of this external URL carefully enough. I will look into the external URLs that never seem to finish loading. If I find that the session really never finishes and is not really important, then I will shorten the timeout setting: 60 → 20 seconds, for example.


I have found one reason for the 60 seconds.

I used a secret method of launching the Chrome browser with an existing user profile, not a default profile. I developed a class to do this. The class took a very long time to copy the files under an existing profile’s directory to a temporary directory. The files include browsing history, cached pages and images, etc. When I specified my “Default” Chrome profile, which contained a great many files (nearly 30,000), it took 30 seconds to copy them to a temporary directory. I examined a different case. I altered my code to launch Chrome with another profile, which contained almost no history or cache files. Then it took just a few seconds to open the browser. Well, there are a lot of pitfalls before launching a browser with XxxxDriver.


I no longer recommend that others follow my TestClosure approach.

See the following post for the reason why I don’t.

Primarily because

However, I do not recommend that you follow this way, because any problems in this system are difficult to analyse and fix. I haven’t developed any easy-to-use debugging facility for “multi-threaded Test Cases” (I am not capable of developing such a magical thing). I only have the stack trace messages emitted by exceptions. I am OK with that, but you might not be. I do not think I can support you remotely.


also worth mentioning, quoting:
I learned a simple & fundamental lesson: I need more machine + network resources to process more tasks in a limited time frame.

A year and 10 months have passed since I wrote:

Today, I happened to find the reason why my tests ran so slowly. Smart Wait was enabled in this project. That was why it took so long for my tests to open https://www.katalon.com/.

When I disabled Smart Wait, my tests ran dramatically faster!

I guess that the http://www.katalon.com/ pages employ some AJAX technique. The HTML DOM of the pages keeps changing; the AJAX event stream never stops. Therefore Smart Wait keeps waiting until its timer expires after 30 seconds.
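To illustrate the guess above: a wait that returns only after the page has been quiet for a while can never return early if the page never goes quiet. The sketch below is a toy model in Java, not Katalon's Smart Wait implementation; the class name, the tick-based clock and the parameters `quietTicks` and `timeoutTicks` are all assumptions made for illustration.

```java
public class QuietWaitModel {

    /**
     * Simulates a "wait until the page is quiet" loop on a discrete clock.
     * activity[t] is true when the page changed during tick t; ticks beyond
     * the end of the array are treated as quiet.
     * Returns how many ticks the wait consumed before giving control back.
     */
    public static int ticksWaited(boolean[] activity, int quietTicks, int timeoutTicks) {
        int quietRun = 0;
        for (int t = 0; t < timeoutTicks; t++) {
            boolean changed = t < activity.length && activity[t];
            quietRun = changed ? 0 : quietRun + 1;
            if (quietRun >= quietTicks) {
                return t + 1;          // page settled: stop waiting early
            }
        }
        return timeoutTicks;           // never settled: the whole timeout is spent
    }

    public static void main(String[] args) {
        boolean[] settles = {true, true, false, false, false};
        boolean[] neverSettles = new boolean[30];
        java.util.Arrays.fill(neverSettles, true);

        System.out.println(ticksWaited(settles, 3, 30));       // 5
        System.out.println(ticksWaited(neverSettles, 3, 30));  // 30
    }
}
```

In this model a page that settles costs only a few ticks, while a page with a never-ending event stream costs the full timeout on every wait, which would add up quickly across many keyword calls.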

My mistake! I should have been more cautious about the Smart Wait and the nature of the target web application. Now I am pleased that I could resolve this long-outstanding issue for me.

I understand that multi-threading was not the cause of the slowness. Multi-threaded execution of Test Cases by “TestClosure” might be a useful trick after all, though I previously thought it wasn’t.

I have updated the project

and published a new release 0.25.1. With it I ran an experiment to see how much multi-threading can improve the performance of a test case. I used my MacBook Air (M1, 8GB memory) and Katalon Studio Arm64 9.0, with a 6Mbps Internet connection.

When I execute Test Suites/demo/demo3D_1_2_4Threads, a test case processes 18 URLs. The test case runs multiple Java Threads. Each thread executes a Groovy closure. Each closure visits a URL, takes a screenshot and saves it to disk. I executed the test case with 1 thread, 2 threads and 4 threads. The measured durations were as follows:

ID         duration  graph
1 Thread   05:06     #########1#########2#########3#
2 Threads  03:54     #########1#########2####
4 Threads  02:51     #########1########

A single # means 10 seconds.

It took approximately 5 minutes to process 18 URLs with a single thread.
With 2 threads, it took approximately 4 minutes.
With 4 threads, it took approximately 3 minutes.

@kazurayam interesting …
anyway, I saw this sentence in your git project:

It is pointless to give the maximum number of threads for TestClosureCollectionExecutor with a value larger (8, 16, 32) than the number of cores you have.

Is this based on experiments, or an assumption?
Why do I ask? Well, there are two known methods for parallelization in Java (and not only in Java).
Processes and threads; a short description can be found here:
Process vs Thread: What's the Difference? - javatpoint.

Not sure exactly how ExecutorService actually works, but it seems to be an interface to help create and execute threads (and async on top of that, which is yet another clever mechanism for this case, as there is no obvious reason for inter-thread communication).

In the case of processes, it makes sense to limit their number to numproc (or sometimes numproc - 1) because they are handled by the kernel scheduler. With more processes than cores, some will be queued until resources are released.

However, with threads, the limitations usually come from other factors (available memory / heap size, I/O waits, etc.).

So, if I am guessing right and the ExecutorService is using just threading, you can experiment a bit by increasing the pool size until you find an optimum (there is no guarantee it will perform better since, as I mentioned, a lot of other factors are involved, but it is worth trying).

You can also keep an eye on the task manager and see how many Java processes are running when you execute the tests.

If you see just a single process (but with plenty of threads, of course), then everything is just threaded; if you see more than one, it may be forking processes under the hood, so the limitation to numproc starts to make sense.
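The experiment suggested above could be sketched as follows, using only the JDK. The tasks here just sleep, which is an assumption standing in for I/O-bound work such as waiting on the network; real browser-driving tasks also consume CPU and memory in the browser processes, so real numbers may look quite different. The class name `PoolSizeExperiment` and its parameters are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PoolSizeExperiment {

    /** Runs taskCount sleeping tasks on a fixed pool and returns the elapsed milliseconds. */
    public static long elapsedMillis(int poolSize, int taskCount, long sleepMillis)
            throws Exception {
        List<Callable<Integer>> tasks = new ArrayList<>();
        for (int i = 0; i < taskCount; i++) {
            final int id = i;
            tasks.add(() -> {
                Thread.sleep(sleepMillis);   // stand-in for waiting on I/O
                return id;
            });
        }
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            long start = System.nanoTime();
            for (Future<Integer> f : pool.invokeAll(tasks)) {
                f.get();                     // surfaces any task failure
            }
            return (System.nanoTime() - start) / 1_000_000;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("cores = " + Runtime.getRuntime().availableProcessors());
        for (int poolSize : new int[] {1, 2, 4, 8, 16, 32}) {
            System.out.println(poolSize + " threads: "
                    + elapsedMillis(poolSize, 32, 100) + " ms");
        }
    }
}
```

For purely sleeping tasks the elapsed time should keep shrinking as the pool grows past the core count, which is the behaviour this post anticipates for I/O-bound work; whether the same holds when each task drives a real browser is exactly what the experiment would have to show.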

Just a guess, no backing experiments.

I will erase this sometime in the future.

I am going to do one more variation of the experiment. In the current experiments, the controller class opens and closes a browser window for each invocation of a Groovy Closure. I took this design because it is easy to implement. But, as you know, launching a browser window is heavy processing. It would be a good idea to launch a limited number of browser windows (as many as the number of threads), and to reuse each browser window for multiple closure invocations. I expect that the test will run a bit faster.

worth trying, but you may face another issue.
not sure if the next thread in the queue will know which browser instance to use.
so, you may also have to pass the driver instance from one thread to another, and in this case async threads may not help.
you will have to use inter-thread communication, but the bottleneck may be the browser instance.
some funny stuff may happen if you start the browser from the main thread and execute two or more ‘navigate to’ calls at the same time. so each thread will have to wait until the previous one releases the browser, which may not help at all.
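One common way to address the concern above is to keep the browsers in a blocking queue, so that a task borrows a browser exclusively and returns it when done. The sketch below is an assumption about how this could be done, not the TestClosure implementation; `FakeBrowser` is a hypothetical stand-in for a WebDriver-backed window, and the URLs are placeholders.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class BrowserPoolSketch {

    /** Hypothetical stand-in for a WebDriver-backed browser window. */
    public static class FakeBrowser {
        public final int id;
        public final AtomicInteger visits = new AtomicInteger();
        FakeBrowser(int id) { this.id = id; }
        String visit(String url) {
            visits.incrementAndGet();
            return "browser" + id + " visited " + url;
        }
    }

    /** Visits every URL with at most poolSize browsers; returns the browsers for inspection. */
    public static List<FakeBrowser> visitAll(List<String> urls, int poolSize) throws Exception {
        List<FakeBrowser> browsers = new ArrayList<>();
        BlockingQueue<FakeBrowser> idle = new ArrayBlockingQueue<>(poolSize);
        for (int i = 0; i < poolSize; i++) {
            FakeBrowser b = new FakeBrowser(i);
            browsers.add(b);
            idle.add(b);
        }
        List<Callable<String>> tasks = new ArrayList<>();
        for (String url : urls) {
            tasks.add(() -> {
                FakeBrowser b = idle.take();   // blocks until a browser is free
                try {
                    return b.visit(url);       // only this task uses b right now
                } finally {
                    idle.put(b);               // hand the browser to the next task
                }
            });
        }
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            for (Future<String> f : pool.invokeAll(tasks)) {
                f.get();                       // surfaces any task failure
            }
        } finally {
            pool.shutdown();
        }
        return browsers;
    }
}
```

With as many browsers as threads, `take()` should rarely block, and no two tasks ever navigate the same window at the same time.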

will dig a bit. i think forking processes may be a better approach for this, so that each process uses a dedicated browser and then executes the tasks either sequentially or threaded.
with processes, you should actually benefit from the cpu cores available.
i think i once saw some doc about a process executor similar to the one you used (or i may be confusing it with python…)
anyway, there is no guarantee this approach will be better, since multiprocessing is usually used to solve cpu-bound / compute problems, while threads are better when the problem to solve is IO-bound

@kazurayam I think the equivalent of ExecutorService for multiprocessing may be

worth to explore, imho

My code uses java.util.concurrent.ExecutorService

Line#115:

		// create Thread pool
		ExecutorService executorService = Executors.newFixedThreadPool(capacity)

		List<Future<String>> futures = executorService.invokeAll(loadedTestClosures)
                ...

I won’t do that. It makes the code too messy.

Unfortunately I am not able to find a good explanatory guide for Java about when to use threads or processes.
And how. And when.
So, although it is for Python, a good introductory article can be found here, from which I learned a lot about this:

… and from there, following the links, there are more examples of how to choose the right solution, how to tune, and so on.

Since there is no clear guidance for Java on this matter (or at least I am not able to find any), I will just throw in that link and leave it there for the moment.
I played with the Python examples and what is written there makes sense, at least to me. However, I don’t have the bandwidth right now to achieve the same with Java, even with the basic concepts understood.
So I will just leave that link here for somebody willing to actually explore.

Well, I modified the code and did a measurement. The result was as follows.

ID                                  duration  graph
Renew browser windows, 1 Thread     05:06     #########1#########2#########3#
Renew browser windows, 2 Threads    03:54     #########1#########2####
Renew browser windows, 4 Threads    02:51     #########1########
Reuse a browser window, 1 Thread    04:05     #########1#########2#####
Reuse 2 browser windows, 2 Threads  02:57     #########1########
Reuse 4 browser windows, 4 Threads  02:00     #########1###

Previously, my code renewed the browser window for every page visit; 1 Thread processed 18 URLs in approximately 5 minutes.
This time, my new code reused browser windows for multiple page visits; 1 Thread processed the same job in approximately 4 minutes, and 4 Threads processed it in 2 minutes. I think the code change was effective enough to make the run faster.
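The durations in the table can be turned into speedup factors relative to the slowest case (renew browser windows, 1 Thread). The following is a small Java sketch of that arithmetic; the class and method names are hypothetical, and the durations are copied from the table above.

```java
public class SpeedupTable {

    /** Converts "mm:ss" to seconds. */
    public static int toSeconds(String mmss) {
        String[] parts = mmss.split(":");
        return Integer.parseInt(parts[0]) * 60 + Integer.parseInt(parts[1]);
    }

    /** Speedup of a run relative to the baseline duration. */
    public static double speedup(String baseline, String run) {
        return (double) toSeconds(baseline) / toSeconds(run);
    }

    public static void main(String[] args) {
        String baseline = "05:06";   // renew browser windows, 1 Thread
        String[][] rows = {
            {"Renew, 1 Thread",  "05:06"},
            {"Renew, 2 Threads", "03:54"},
            {"Renew, 4 Threads", "02:51"},
            {"Reuse, 1 Thread",  "04:05"},
            {"Reuse, 2 Threads", "02:57"},
            {"Reuse, 4 Threads", "02:00"},
        };
        for (String[] row : rows) {
            System.out.printf("%-18s %s  x%.2f%n",
                    row[0], row[1], speedup(baseline, row[1]));
        }
    }
}
```

With these figures, reusing 4 browser windows on 4 Threads is about 2.5 times faster than the baseline (306 seconds against 120 seconds), while going from 1 Thread to 4 Threads gives roughly a 1.8x gain when renewing windows and roughly a 2x gain when reusing them, well short of 4x in both variants.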


Why can the test which reuses a browser window for multiple page visits run faster than the test which renews the browser window for every single page visit?

The reason is simple: Browser’s content cache worked.

A browser can cache web resources (images, css, js, etc) and reuse them to respond faster. In fact my test visits a single site http://www.katalon.com/ and its sub-pages, which share a lot of web resources. If my test reuses a browser window to visit multiple sub-pages, the cached resources contribute to a faster response.

On the other hand, WebDriver always opens a browser with an empty cache. Therefore if I open and close the browser window for every single visit to the site, the cache does NOT contribute to a faster response.

By the way, the Parallel mode of Test Suite Collection uses the multi-process approach, not multi-threading. See the following screenshot:

I have a Test Suite Collection named TCS, which comprises 4 Test Suites TS1, TS2, TS3 and TS4. Each Test Suite calls the same Test Case named blocker, which just repeats

    WebUI.delay(1)
    println "Hi"

100 times. When I executed it, I saw that 4 processes were created. I could see them in the OS's monitor app.

So, if you want to try parallel execution of Test Cases with multiple processes, the easiest way is to use Test Suite Collection in parallel mode.

Why am I not satisfied with the parallelism offered by Test Suite Collection? Why did I take on the TestClosure project? What was my motivation?

I am not concerned about the choice of Multi-Process / Multi-Thread at all.

I wanted to implement the following “init - Map - Reduce” architecture, where the “Map” stages are executed in parallel, and the “Reduce” stage must be executed after all of the “Map” stages have finished.

+------+           +-------+
| init | ----+---> | Map 1 | ----+
+------+     |     +-------+     |      +--------+
            ...                  +----> | Reduce |
             |     +-------+     |      +--------+
             +---> | Map N | ----+
                   +-------+
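The shape of the diagram above can be sketched with `java.util.concurrent.ExecutorService`. This is a minimal illustration in Java, not the TestClosure source: the class name `InitMapReduceSketch` is hypothetical, and the "Map" work here is a trivial stand-in (the length of a string) for visiting a URL and taking a screenshot.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class InitMapReduceSketch {

    /** Runs init, then the Map stages in parallel, then a single Reduce stage. */
    public static int run(List<String> urls, int threads) throws Exception {
        // init: build one Map task per input
        List<Callable<Integer>> mapTasks = new ArrayList<>();
        for (String url : urls) {
            mapTasks.add(() -> url.length());   // stand-in for the real per-URL work
        }

        // Map: invokeAll blocks until every task has completed
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<Integer>> futures;
        try {
            futures = pool.invokeAll(mapTasks);
        } finally {
            pool.shutdown();
        }

        // Reduce: runs only after all Map stages have finished
        int total = 0;
        for (Future<Integer> f : futures) {
            total += f.get();
        }
        return total;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run(List.of("a", "bb", "ccc"), 2));   // 6
    }
}
```

The barrier between Map and Reduce comes for free from `invokeAll`, which returns only when all submitted tasks are done.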

I was not satisfied with the parallel mode of Test Suite Collection because it does not let me implement the “Reduce” step.

So, I designed the “TestClosure” project to let me build this MapReduce-like architecture cheaply. See the following source code if you are interested

This is just a single Test Case script (which uses my custom Groovy classes); it implements the “init - Map - Reduce” architecture that I wanted. TestClosure is cheap. It does not require any change in Katalon products, and it does not require any paid cloud-based infrastructure.
