Overview
I am trying to do web scraping with Katalon Studio.
I'm using it because it's what I already have set up (otherwise I would have used my own Selenium WebDriver, Groovy, Apache POI, … solution right from the start).
I quickly realized that, due to the nature of this project and the website I am scraping, I should set up a rotating residential proxy (or something similar) with Selenium WebDriver, to minimize the risk of getting IP-banned.
What I came up with
I decided to use OxyLabs, as it offers city-specific rotating residential proxies, which is perfect for my use case.
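With OxyLabs, the city targeting is encoded entirely in the proxy username. The helper below is a hypothetical sketch (`buildOxylabsUsername` is not an OxyLabs or Katalon API) that assembles a username following the same `customer-<user>-cc-<country>-st-<state>-city-<city>` pattern used in the `WebDriverUtils` code and the cURL call in this post. The pattern is inferred from those examples, not from OxyLabs' documentation, so check their docs before using other targeting options.

```groovy
// Hypothetical helper: assembles an OxyLabs proxy username with the same
// segment pattern as the one used in this post, e.g.
//   customer-<user>-cc-us-st-us_indiana-city-indianapolis
String buildOxylabsUsername(String user, String country, String state, String city) {
    if (!user) {
        // guard against a null or empty account name
        throw new IllegalArgumentException('OxyLabs username is missing')
    }
    // toString() turns the GString into a plain java.lang.String
    return "customer-${user}-cc-${country}-st-${state}-city-${city}".toString()
}

println buildOxylabsUsername('alice', 'us', 'us_indiana', 'indianapolis')
// prints: customer-alice-cc-us-st-us_indiana-city-indianapolis
```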
How I tried integrating with it
I created a utility class, WebDriverUtils, to handle all WebDriver concerns, including the proxy setup. Here is the code:
```groovy
package me.mikewarren.myCaseScraper.utils

import org.openqa.selenium.WebDriver
import org.openqa.selenium.chrome.ChromeDriver
import org.openqa.selenium.chrome.ChromeOptions

import com.kms.katalon.core.configuration.RunConfiguration
import com.kms.katalon.core.webui.driver.DriverFactory

import net.lightbody.bmp.BrowserMobProxyServer
import net.lightbody.bmp.client.ClientUtil
import net.lightbody.bmp.proxy.auth.AuthType

public final class WebDriverUtils {
    private static BrowserMobProxyServer proxy;

    public static void SetUpDriver() {
        // create the download directory if it doesn't exist yet
        File downloadDir = new File(this.GetDownloadDirectory());
        if (!downloadDir.exists()) {
            downloadDir.mkdirs();
        }

        System.setProperty('webdriver.chrome.driver', DriverFactory.getChromeDriverPath())

        ChromeOptions options = new ChromeOptions();
        options.setExperimentalOption('prefs', [
            'download.prompt_for_download' : false,
            'download.default_directory' : downloadDir.getPath(),
        ]);
        // route the browser through the local BrowserMob proxy, which chains to OxyLabs
        options.setProxy(ClientUtil.createSeleniumProxy(this.getProxy()))
        options.addArguments("--disable-web-security");
        options.addArguments("--ignore-urlfetcher-cert-requests");
        options.setAcceptInsecureCerts(true)

        WebDriver driver = new ChromeDriver(options);
        driver.manage().window().maximize();
        DriverFactory.changeWebDriver(driver);
    }

    public static BrowserMobProxyServer getProxy() {
        if (proxy != null)
            return proxy;

        proxy = new BrowserMobProxyServer();
        proxy.setTrustAllServers(true);
        // upstream (chained) proxy: the OxyLabs residential entry point
        proxy.setChainedProxy(new InetSocketAddress("pr.oxylabs.io", 7777));
        proxy.chainedProxyAuthorization("customer-${System.getenv('OXYLABS_USERNAME')}-cc-us-st-us_indiana-city-indianapolis".toString(),
            System.getenv("OXYLABS_PASSWORD"),
            AuthType.BASIC);
        proxy.start(0); // 0 = listen on any free local port
        return proxy;
    }

    public static String GetDownloadDirectory() {
        return "${RunConfiguration.getProjectDir()}/downloads";
    }

    public static void CloseDriver() {
        DriverFactory.closeWebDriver();
        if (this.proxy != null) {
            this.proxy.stop();
            // a stopped BrowserMobProxyServer is not meant to be started again, so clear
            // the reference and let getProxy() build a fresh one for the next test case
            this.proxy = null;
        }
    }
}
```
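One thing worth ruling out, since the cURL test below runs in a shell where `OXYLABS_USERNAME` and `OXYLABS_PASSWORD` are clearly set: whether Katalon Studio's JVM sees those variables at all. Environment variables are inherited when a process starts, so variables exported only in a Git Bash session, or set after Katalon Studio was launched, are invisible to it. In that case the GString in `getProxy()` silently renders as `customer-null-...`. The sketch below (the `requireEnv` name is hypothetical) fails fast instead; the optional `env` parameter exists only so it can be exercised without touching the real environment.

```groovy
// Hypothetical fail-fast lookup: throws if a required environment variable is
// missing or blank, instead of letting "${System.getenv(...)}" become "null".
String requireEnv(String name, Map<String, String> env = System.getenv()) {
    String value = env.get(name)
    if (value == null || value.trim().isEmpty()) {
        throw new IllegalStateException("Environment variable ${name} is not visible to this JVM")
    }
    return value
}

// Possible use inside WebDriverUtils.getProxy():
//   String user = requireEnv('OXYLABS_USERNAME')
//   String pass = requireEnv('OXYLABS_PASSWORD')
```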
And in a Test Listener, I do something like:
```groovy
import com.kms.katalon.core.annotation.AfterTestCase
import com.kms.katalon.core.annotation.BeforeTestCase
import com.kms.katalon.core.context.TestCaseContext

import me.mikewarren.myCaseScraper.utils.FailureReporter
import me.mikewarren.myCaseScraper.utils.WebDriverUtils
import me.mikewarren.myCaseScraper.utils.openAI.OpenAIUtils

class NewTestListener {
    private List<String> getNoBrowserOpenList() {
        return [
            /^Test Cases\/Unit Tests\/.+$/,
        ];
    }

    private boolean isOnTestCaseList(TestCaseContext testCaseContext, List<String> testCaseList) {
        for (String regex : testCaseList) {
            if ((testCaseContext.getTestCaseId() =~ regex).matches()) {
                return true;
            }
        }
        return false;
    }

    /**
     * Executes before every test case starts.
     * @param testCaseContext related information of the executed test case.
     */
    @BeforeTestCase
    def sampleBeforeTestCase(TestCaseContext testCaseContext) {
        if (this.isOnTestCaseList(testCaseContext, getNoBrowserOpenList()))
            return;

        WebDriverUtils.SetUpDriver();
    }

    /**
     * Executes after every test case ends.
     * @param testCaseContext related information of the executed test case.
     */
    @AfterTestCase
    def sampleAfterTestCase(TestCaseContext testCaseContext) {
        OpenAIUtils.GetInstance().close();

        if (this.isOnTestCaseList(testCaseContext, getNoBrowserOpenList()))
            return;

        if (!testCaseContext.getTestCaseStatus().equals("PASSED"))
            FailureReporter.GetInstance().report(testCaseContext);

        WebDriverUtils.CloseDriver();
    }
}
```
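For reference, the regex check in the listener above can be written more compactly with Groovy's `any` and the whole-string match operator `==~`. This sketch takes the test case id as a plain `String` (the listener would pass `testCaseContext.getTestCaseId()`) so it can run without Katalon; the example ids other than the `Test Cases/Unit Tests/` prefix are made up for illustration.

```groovy
// Compact equivalent of isOnTestCaseList: true if the id fully matches any regex.
// ==~ compiles the right-hand string as a regex and requires a full match,
// just like (id =~ regex).matches() in the original listener.
boolean isOnTestCaseList(String testCaseId, List<String> regexes) {
    return regexes.any { String regex -> testCaseId ==~ regex }
}

List<String> noBrowserOpenList = [/^Test Cases\/Unit Tests\/.+$/]
println isOnTestCaseList('Test Cases/Unit Tests/ParserTest', noBrowserOpenList)   // true
println isOnTestCaseList('Test Cases/Scrapers/OxyLabsIpCheck', noBrowserOpenList) // false
```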
What happened when I used this?
Unfortunately, when I create a Test Case to… test this out…
```groovy
import org.openqa.selenium.By
import org.openqa.selenium.WebDriver
import org.openqa.selenium.WebElement

import com.kms.katalon.core.webui.driver.DriverFactory
import com.kms.katalon.core.webui.keyword.WebUiBuiltInKeywords as WebUI

WebUI.navigateToUrl("https://ip.oxylabs.io/")

final WebDriver driver = DriverFactory.getWebDriver()

WebElement preElement = driver.findElement(By.cssSelector("pre"));
final String ipAddress = preElement.getText();

// an empty <pre> should fail the test too, not just a null
assert ipAddress != null && !ipAddress.isEmpty()

System.out.println("Your IP is ${ipAddress}");
```
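A stricter check than "the text is not empty" is to confirm the page actually returned an IP address, so an error page rendered inside `<pre>` can't pass by accident. The `isIpv4` helper below is a hypothetical sketch that only recognizes dotted-quad IPv4 addresses (the proxy exit IP in the cURL output in this post is IPv4); it would need extending if IPv6 responses are possible.

```groovy
// Hypothetical helper: true only for a dotted-quad IPv4 address (octets 0-255).
boolean isIpv4(String text) {
    if (text == null) return false
    // limit -1 keeps trailing empty segments, so "1.2.3." is rejected
    String[] parts = text.trim().split(/\./, -1)
    if (parts.length != 4) return false
    return parts.every { String p -> p ==~ /\d{1,3}/ && p.toInteger() <= 255 }
}

// Possible use in the Katalon Test Case:
//   assert isIpv4(ipAddress) : "Unexpected response from ip.oxylabs.io: ${ipAddress}"
```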
It fails!
When I take a look at the browser window, I see:
How do we know that this failure isn't because of an internet issue?
Because I ran this Test Case, and wrote this post, on my apartment machine, which I am remoted into. I'm not even sitting in front of the machine that I ran this testing project on (and that I'm writing this very post on!!); if its internet connection were down, I couldn't be remoted into it.
How do we know that the third-party proxy is even working?!
When I run the cURL version of this Test Case, through the same proxy, I get the following:
```
FCP@LAPTOP-ELPA5ODM MINGW64 ~/OneDrive/Desktop/Software development/MyCaseScraper (refactor/miwarren/testingDirectory)
$ curl -x pr.oxylabs.io:7777 -U "customer-$OXYLABS_USERNAME-cc-us-st-us_indiana-city-indianapolis:$OXYLABS_PASSWORD" https://ip.oxylabs.io/location
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   980  100   980    0     0   1415      0 --:--:-- --:--:-- --:--:--  1416
{"ip":"174.194.5.250","providers":{"dbip":{"country":"US","asn":"AS6167","org_name":"Verizon Business","city":"Plainfield","zip_code":"","time_zone":"","meta":"\u003ca href='https://db-ip.com'\u003eIP Geolocation by DB-IP\u003c/a\u003e"},"ip2location":{"country":"US","asn":"","org_name":"","city":"London","zip_code":"40741","time_zone":"-04:00","meta":"This site or product includes IP2Location LITE data available from \u003ca href=\"https://lite.ip2location.com\"\u003ehttps://lite.ip2location.com\u003c/a\u003e."},"ipinfo":{"country":"US","asn":"AS6167","org_name":"Verizon Business","city":"","zip_code":"","time_zone":"","meta":"\u003cp\u003eIP address data powered by \u003ca href=\"https://ipinfo.io\" \u003eIPinfo\u003c/a\u003e\u003c/p\u003e"},"maxmind":{"country":"US","asn":"AS6167","org_name":"CELLCO-PART","city":"Fishers","zip_code":"","time_zone":"-04:00","meta":"This product includes GeoLite2 Data created by MaxMind, available from https://www.maxmind.com."}}}
```
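The `/location` response is JSON, so the same data can be logged from inside a test for a side-by-side comparison with what the browser sees. The sketch below uses Groovy's `JsonSlurper` to pull out the exit IP and each geolocation provider's city guess (in the output above, the providers disagree with one another). The `summarizeLocation` name and the shape of its result are hypothetical; the field names `ip`, `providers`, and `city` are taken from the response above.

```groovy
import groovy.json.JsonSlurper

// Hypothetical helper: extracts the exit IP and each provider's city guess
// from an ip.oxylabs.io/location response body.
Map<String, Object> summarizeLocation(String json) {
    def parsed = new JsonSlurper().parseText(json)
    // providers is a JSON object keyed by provider name ("dbip", "maxmind", ...)
    Map<String, String> cities = parsed.providers.collectEntries { String name, Map info ->
        [(name): info.city]
    }
    return [ip: parsed.ip, cities: cities]
}

def summary = summarizeLocation('{"ip":"174.194.5.250","providers":{"dbip":{"city":"Plainfield"},"maxmind":{"city":"Fishers"}}}')
println summary.ip      // 174.194.5.250
println summary.cities  // [dbip:Plainfield, maxmind:Fishers]
```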
How do we know that this is even close to something to do with Katalon Studio?!
I create a brand-new Groovy project in IntelliJ IDEA.
I make the dependencies section of that project's build.gradle look like this:
```groovy
dependencies {
    implementation 'org.apache.groovy:groovy:4.0.14'
    testImplementation platform('org.junit:junit-bom:5.9.1')
    testImplementation 'org.junit.jupiter:junit-jupiter'
    // https://mvnrepository.com/artifact/org.seleniumhq.selenium/selenium-java
    implementation 'org.seleniumhq.selenium:selenium-java:3.141.59'
    // https://mvnrepository.com/artifact/net.lightbody.bmp/browsermob-core
    implementation 'net.lightbody.bmp:browsermob-core:2.1.5'
}
```
I copy my WebDriverUtils class over to that brand-new project. Obviously, some of it is Katalon-specific, so I had to stub those parts out:
```groovy
import org.openqa.selenium.WebDriver

final class DriverFactory {
    private static WebDriver driver;

    public static void changeWebDriver(WebDriver driver) {
        this.driver = driver;
    }

    public static String getChromeDriverPath() {
        return "${System.getProperty("user.home")}\\.katalon\\packages\\Katalon_Studio_Windows_64-9.6.0\\Katalon_Studio_Windows_64-9.6.0\\configuration\\resources\\drivers\\chromedriver_win32\\chromedriver.exe";
    }

    public static WebDriver getWebDriver() {
        return driver;
    }

    public static void closeWebDriver() {
        if (this.driver != null) {
            this.driver.quit();
        }
    }
}

final class RunConfiguration {
    public static String getProjectDir() {
        return "${System.getProperty('user.home')}\\Desktop\\Software development\\MyCaseScraper_Selenium\\src\\test"
    }
}
```
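The `RunConfiguration` stub hard-codes Windows backslashes. If the stub project ever needs to run elsewhere, `java.nio.file.Paths.get` can join the same segments with the platform's own separator. The `stubProjectDir` name is hypothetical; the directory segments are copied from the stub above.

```groovy
import java.nio.file.Paths

// Hypothetical portable variant of RunConfiguration.getProjectDir():
// Paths.get joins the segments using the current platform's separator.
String stubProjectDir(String home = System.getProperty('user.home')) {
    return Paths.get(home, 'Desktop', 'Software development', 'MyCaseScraper_Selenium', 'src', 'test').toString()
}

println stubProjectDir()
```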
I bring the Test Case over and translate it into JUnit:
```groovy
import com.mikewarren.myCaseScraperSelenium.utils.DriverFactory
import com.mikewarren.myCaseScraperSelenium.utils.WebDriverUtils;
import org.junit.jupiter.api.AfterEach;
import org.junit.jupiter.api.BeforeEach
import org.junit.jupiter.api.Test
import org.openqa.selenium.By
import org.openqa.selenium.WebDriver
import org.openqa.selenium.WebElement;

class OxyLabsIntegrationTest {
    @BeforeEach
    void setUp() {
        WebDriverUtils.SetUpDriver();
    }

    @Test
    void test() {
        final WebDriver driver = DriverFactory.getWebDriver()
        driver.navigate().to('https://ip.oxylabs.io/')
        sleep(5 * 1000)

        WebElement preElement = driver.findElement(By.cssSelector("pre"));
        assert preElement.getText() != ''
        System.out.println("Your IP is " + preElement.getText());
    }

    @AfterEach
    void tearDown() {
        WebDriverUtils.CloseDriver();
    }
}
```
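One difference between the two tests: this JUnit version sleeps five seconds before reading `<pre>`, while the Katalon Test Case reads it right after `WebUI.navigateToUrl`. Navigation normally blocks until the page has loaded, so timing is probably not the culprit, but it is cheap to rule out without a fixed sleep. The `waitUntil` helper below is a hypothetical, dependency-free polling sketch, not a Katalon or Selenium API; Selenium's own `WebDriverWait` would serve the same purpose.

```groovy
// Hypothetical polling helper: calls `condition` every `intervalMs` until it
// returns a Groovy-truthy value (non-null, non-empty), or throws once
// `timeoutMs` has elapsed. Exceptions thrown by the condition (e.g.
// NoSuchElementException while the page loads) are treated as "not yet".
Object waitUntil(long timeoutMs, long intervalMs, Closure condition) {
    long deadline = System.currentTimeMillis() + timeoutMs
    while (true) {
        Object result = null
        try {
            result = condition.call()
        } catch (Exception ignored) {
            // not ready yet; try again after the interval
        }
        if (result) {
            return result
        }
        if (System.currentTimeMillis() >= deadline) {
            throw new IllegalStateException("Condition not met within ${timeoutMs} ms")
        }
        Thread.sleep(intervalMs)
    }
}

// Possible use in the Katalon Test Case:
//   String ipAddress = waitUntil(10000, 250) { driver.findElement(By.cssSelector('pre')).getText() }
```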
I run that JUnit Test Case in that brand-new project, and it passes.
Over here in Katalon Studio land, however, that test case fails…
What should we do here?!

