Integrating OxyLabs Proxy with Katalon Studio

Overview

I am trying to do web scraping with Katalon Studio!

I'm doing this because it's what I already have set up (otherwise I would have just used my own solution of Selenium WebDriver, Groovy, Apache POI, ... right from the start).

I quickly realized that, due to the nature of this project and the website I am scraping, I should get a rotating residential proxy (or something similar) set up with Selenium WebDriver, to minimize the risk of getting IP-banned.

What I came up with

I decided to use OxyLabs, as it offers city-specific rotating residential proxies, which are perfect for my use case.

How I tried integrating with it

I created a utils class called WebDriverUtils, to handle all WebDriver concerns, including the proxy business. Here is the code:

package me.mikewarren.myCaseScraper.utils

import org.openqa.selenium.WebDriver
import org.openqa.selenium.chrome.ChromeDriver
import org.openqa.selenium.chrome.ChromeOptions

import com.kms.katalon.core.configuration.RunConfiguration
import com.kms.katalon.core.webui.driver.DriverFactory

import net.lightbody.bmp.BrowserMobProxyServer
import net.lightbody.bmp.client.ClientUtil
import net.lightbody.bmp.proxy.auth.AuthType

public final class WebDriverUtils {
	private static BrowserMobProxyServer proxy;

	public static void SetUpDriver() {
		// create if not exists the download directory
		File downloadDir = new File(this.GetDownloadDirectory());
		if (!downloadDir.exists()) {
			downloadDir.mkdirs();
		}

		System.setProperty('webdriver.chrome.driver', DriverFactory.getChromeDriverPath())

		ChromeOptions options = new ChromeOptions();

		options.setExperimentalOption('prefs', [
			'download.prompt_for_download' : false,
			'download.default_directory' : downloadDir.getPath(),
		]);

		options.setProxy(ClientUtil.createSeleniumProxy(this.getProxy()))
		
		options.addArguments("--disable-web-security");
		options.addArguments("--ignore-urlfetcher-cert-requests");

		options.setAcceptInsecureCerts(true)

		WebDriver driver = new ChromeDriver(options);
		driver.manage().window().maximize();

		DriverFactory.changeWebDriver(driver);
	}

	public static BrowserMobProxyServer getProxy() {
		if (proxy != null)
			return proxy;

		proxy = new BrowserMobProxyServer();

		proxy.setTrustAllServers(true);

		proxy.setChainedProxy(new InetSocketAddress("pr.oxylabs.io", 7777));
		proxy.chainedProxyAuthorization("customer-${System.getenv('OXYLABS_USERNAME')}-cc-us-st-us_indiana-city-indianapolis".toString(),
				System.getenv("OXYLABS_PASSWORD"),
				AuthType.BASIC);

		proxy.start(0);
		return proxy;
	}

	public static String GetDownloadDirectory() {
		return "${RunConfiguration.getProjectDir()}/downloads";
	}

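	public static void CloseDriver() {
		DriverFactory.closeWebDriver();
		if (this.proxy != null) {
			this.proxy.stop();
			// getProxy() would otherwise hand this stopped proxy to the next test case,
			// and a stopped BrowserMobProxyServer cannot be restarted
			proxy = null;
		}
	}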
}

and, in a Test Listener, I do something like:

import com.kms.katalon.core.annotation.AfterTestCase
import com.kms.katalon.core.annotation.BeforeTestCase
import com.kms.katalon.core.context.TestCaseContext

import me.mikewarren.myCaseScraper.utils.FailureReporter
import me.mikewarren.myCaseScraper.utils.WebDriverUtils
import me.mikewarren.myCaseScraper.utils.openAI.OpenAIUtils

class NewTestListener {
	private List<String> getNoBrowserOpenList() {
		return [
			/^Test Cases\/Unit Tests\/.+$/,
		];
	}
	
	private boolean isOnTestCaseList(TestCaseContext testCaseContext, List<String> testCaseList) {
		for (String regex : testCaseList) {
			if ((testCaseContext.getTestCaseId() =~ regex).matches()) {
				return true;
			}
		}
		
		return false;
	}
	/**
	 * Executes before every test case starts.
	 * @param testCaseContext related information of the executed test case.
	 */
	@BeforeTestCase
	def sampleBeforeTestCase(TestCaseContext testCaseContext) {
		if (this.isOnTestCaseList(testCaseContext, getNoBrowserOpenList()))
			return;
		
		WebDriverUtils.SetUpDriver();
	}

	/**
	 * Executes after every test case ends.
	 * @param testCaseContext related information of the executed test case.
	 */
	@AfterTestCase
	def sampleAfterTestCase(TestCaseContext testCaseContext) {
		OpenAIUtils.GetInstance().close();
		
		if (this.isOnTestCaseList(testCaseContext, getNoBrowserOpenList()))
			return;
		
		if (!testCaseContext.getTestCaseStatus().equals("PASSED"))
			FailureReporter.GetInstance().report(testCaseContext);
		
		WebDriverUtils.CloseDriver();
	}
}
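One pitfall in getProxy() above: System.getenv returns null for an unset variable, and a Groovy GString interpolates null as the text "null", so a missing OXYLABS_USERNAME quietly becomes a username like customer-null-cc-us-..., which would presumably just be rejected by the proxy with no obvious hint why. A minimal Java sketch of a fail-fast lookup (RequiredEnv is a hypothetical helper name; the same check translates directly to Groovy):

```java
import java.util.Map;

final class RequiredEnv {
    private RequiredEnv() {}

    /** Returns env[name], failing fast with a clear message if it is missing or blank. */
    static String require(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.isBlank()) {
            throw new IllegalStateException("Environment variable " + name + " is not set");
        }
        return value;
    }

    /** Same check against the real process environment, e.g. require("OXYLABS_USERNAME"). */
    static String require(String name) {
        return require(System.getenv(), name);
    }
}
```

Taking the environment as a Map parameter is only there so the check can be exercised without touching real environment variables.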

What happened when you used this?

Unfortunately, when I create a Test Case to... test this out...

import org.openqa.selenium.By
import org.openqa.selenium.WebDriver
import org.openqa.selenium.WebElement

import com.kms.katalon.core.webui.driver.DriverFactory
import com.kms.katalon.core.webui.keyword.WebUiBuiltInKeywords as WebUI

WebUI.navigateToUrl("https://ip.oxylabs.io/")

final WebDriver driver = DriverFactory.getWebDriver()

WebElement preElement = driver.findElement(By.cssSelector("pre"));

final String ipAddress = preElement.getText();
assert ipAddress != null

System.out.println("Your IP is ${ipAddress}");

It fails!

When I take a look at the browser window, I see:

How do we know that this failure isn't because of an internet issue?

Because I ran this Test Case, and wrote this post, on my apartment machine, which I am remoted into. I'm not even in front of the machine that I built and ran this testing project on (and wrote this very post on!!).

How do we know that the third-party proxy is even working?!

When I do the cURL request version of this Test Case, I get the following:


FCP@LAPTOP-ELPA5ODM MINGW64 ~/OneDrive/Desktop/Software development/MyCaseScraper (refactor/miwarren/testingDirectory)
$ curl -x pr.oxylabs.io:7777 -U "customer-$OXYLABS_USERNAME-cc-us-st-us_indiana-city-indianapolis:$OXYLABS_PASSWORD" https://ip.oxylabs.io/location
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   980  100   980    0     0   1415      0 --:--:-- --:--:-- --:--:--  1416
{"ip":"174.194.5.250","providers":{"dbip":{"country":"US","asn":"AS6167","org_name":"Verizon Business","city":"Plainfield","zip_code":"","time_zone":"","meta":"\u003ca href='https://db-ip.com'\u003eIP Geolocation by DB-IP\u003c/a\u003e"},"ip2location":{"country":"US","asn":"","org_name":"","city":"London","zip_code":"40741","time_zone":"-04:00","meta":"This site or product includes IP2Location LITE data available from \u003ca href=\"https://lite.ip2location.com\"\u003ehttps://lite.ip2location.com\u003c/a\u003e."},"ipinfo":{"country":"US","asn":"AS6167","org_name":"Verizon Business","city":"","zip_code":"","time_zone":"","meta":"\u003cp\u003eIP address data powered by \u003ca href=\"https://ipinfo.io\" \u003eIPinfo\u003c/a\u003e\u003c/p\u003e"},"maxmind":{"country":"US","asn":"AS6167","org_name":"CELLCO-PART","city":"Fishers","zip_code":"","time_zone":"-04:00","meta":"This product includes GeoLite2 Data created by MaxMind, available from https://www.maxmind.com."}}}
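For reference, curl's -U (--proxy-user) option sends those credentials to the proxy as HTTP Basic auth, i.e. a Proxy-Authorization header; that is also the kind of header BrowserMob's chainedProxyAuthorization(..., AuthType.BASIC) is meant to add on its upstream connection. A minimal sketch of how that header value is formed (ProxyAuthHeader is a hypothetical name, and the credentials are dummies):

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class ProxyAuthHeader {
    private ProxyAuthHeader() {}

    /** Proxy-Authorization value for HTTP Basic auth: "Basic " + base64("username:password"). */
    static String basic(String username, String password) {
        String credentials = username + ":" + password;
        String encoded = Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
        return "Basic " + encoded;
    }

    public static void main(String[] args) {
        // Dummy credentials, not real OxyLabs ones.
        System.out.println(basic("user", "pass")); // Basic dXNlcjpwYXNz
    }
}
```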

How do we know that this is even close to something to do with Katalon Studio?!

I create a brand new Groovy project in IntelliJ IDEA.

I make the dependencies section of that project's build.gradle look like this:

dependencies {
    implementation 'org.apache.groovy:groovy:4.0.14'
    testImplementation platform('org.junit:junit-bom:5.9.1')
    testImplementation 'org.junit.jupiter:junit-jupiter'

    // https://mvnrepository.com/artifact/org.seleniumhq.selenium/selenium-java
    implementation 'org.seleniumhq.selenium:selenium-java:3.141.59'
    // https://mvnrepository.com/artifact/net.lightbody.bmp/browsermob-core
    implementation 'net.lightbody.bmp:browsermob-core:2.1.5'
}

I copy and paste my WebDriverUtils class over to that brand-new project. Obviously, there's some Katalon-specific stuff that I had to stub out:

import org.openqa.selenium.WebDriver

final class DriverFactory {
    private static WebDriver driver;

    public static void changeWebDriver(WebDriver driver) {
        this.driver = driver;
    }

    public static String getChromeDriverPath() {
        return "${System.getProperty("user.home")}\\.katalon\\packages\\Katalon_Studio_Windows_64-9.6.0\\Katalon_Studio_Windows_64-9.6.0\\configuration\\resources\\drivers\\chromedriver_win32\\chromedriver.exe";
    }

    public static WebDriver getWebDriver() {
        return driver;
    }

    public static void closeWebDriver() {
        if (this.driver != null) {
            this.driver.quit();
        }
    }
}
final class RunConfiguration {
    public static String getProjectDir() {
        return "${System.getProperty('user.home')}\\Desktop\\Software development\\MyCaseScraper_Selenium\\src\\test"
    }
}

I bring over the Test Case and translate it into JUnit:

import com.mikewarren.myCaseScraperSelenium.utils.DriverFactory
import com.mikewarren.myCaseScraperSelenium.utils.WebDriverUtils;
import org.junit.jupiter.api.AfterEach;
import org.junit.jupiter.api.BeforeEach
import org.junit.jupiter.api.Test
import org.openqa.selenium.By
import org.openqa.selenium.WebDriver
import org.openqa.selenium.WebElement;

class OxyLabsIntegrationTest {

    @BeforeEach
    void setUp() {
        WebDriverUtils.SetUpDriver();
    }

    @Test
    void test() {
        final WebDriver driver = DriverFactory.getWebDriver()

        driver.navigate().to('https://ip.oxylabs.io/')

        sleep(5 * 1000)

        WebElement preElement = driver.findElement(By.cssSelector("pre"));

        assert preElement.getText() != ''

        System.out.println("Your IP is " + preElement.getText());
    }


    @AfterEach
    void tearDown() {
        WebDriverUtils.CloseDriver();
    }
}

I run that JUnit Test Case in that brand-new project, and it passes.

Over here in Katalon Studio land, however, that same test case fails...

What should we do here?!

@kazurayam

UPDATE

When I "make double sure" that the HTTP and SSL proxy addresses are set to the proxy address,

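// Proxy here is org.openqa.selenium.Proxy
final String proxyAddress = 'pr.oxylabs.io:7777';

Proxy proxy = ClientUtil.createSeleniumProxy(this.getProxy())
// Overwrite the local BrowserMob address with the OxyLabs endpoint,
// so Chrome now connects to OxyLabs directly instead of through BrowserMob
proxy.setHttpProxy(proxyAddress)
proxy.setSslProxy(proxyAddress)

options.setProxy(proxy)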

and run the Test Case, I see a login prompt from pr.oxylabs.io:7777:

I wasn't seeing this prompt when running the tests in my IntelliJ IDEA project.

@kazurayam

Do you happen to know what I can do to solve this?

Sorry, I wouldn't dare go into another proxy labyrinth.

I found the answer in this blog post! Let's explain it, and clean up my code...

What was going on

In my attempts up to now, I got it to where an authorization alert spawned on the page.


It turns out that there may be something wrong with the version of BrowserMob Proxy that Katalon Studio is using...

Looking under the hood...

Let's look at the source code for the ClientUtil.createSeleniumProxy methods:

	public static Proxy createSeleniumProxy(BrowserMobProxy browserMobProxy) {
		return createSeleniumProxy(browserMobProxy, getConnectableAddress());
	}

	public static Proxy createSeleniumProxy(BrowserMobProxy browserMobProxy, InetAddress connectableAddress) {
		return createSeleniumProxy(new InetSocketAddress(connectableAddress, browserMobProxy.getPort()));
	}

	public static Proxy createSeleniumProxy(InetSocketAddress connectableAddressAndPort) {
		Proxy proxy = new Proxy();
		proxy.setProxyType(ProxyType.MANUAL);
		String proxyStr = String.format("%s:%d", connectableAddressAndPort.getHostString(),
				connectableAddressAndPort.getPort());
		proxy.setHttpProxy(proxyStr);
		proxy.setSslProxy(proxyStr);
		return proxy;
	}

	public static InetAddress getConnectableAddress() {
		try {
			return InetAddress.getLocalHost();
		} catch (UnknownHostException var1) {
			throw new RuntimeException("Could not resolve localhost", var1);
		}
	}

Several things are worth pointing out here:

  • getConnectableAddress() simply returns InetAddress.getLocalHost(), i.e. this machine, which is where the BrowserMob proxy itself listens
  • nowhere in createSeleniumProxy(InetSocketAddress connectableAddressAndPort) does any of the chained proxy authorization that we set in our getProxy() show up!
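To make the second point concrete, here is a stdlib-only re-creation of the string that createSeleniumProxy(InetSocketAddress) formats and hands to both setHttpProxy and setSslProxy. It is just the host and port of the local BrowserMob listener; the class name and the port 8080 are placeholders for illustration:

```java
import java.net.InetSocketAddress;

final class SeleniumProxyString {
    private SeleniumProxyString() {}

    /** Mirrors the "%s:%d" formatting in ClientUtil.createSeleniumProxy(InetSocketAddress). */
    static String format(InetSocketAddress address) {
        return String.format("%s:%d", address.getHostString(), address.getPort());
    }

    public static void main(String[] args) {
        // createUnresolved avoids a DNS lookup; 8080 stands in for BrowserMob's actual port.
        InetSocketAddress local = InetSocketAddress.createUnresolved("localhost", 8080);
        System.out.println(format(local)); // localhost:8080, no credentials anywhere
    }
}
```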

What's the answer?

Examining the code in that blog post:

        // Set proxy authentication
        String proxyAuth = proxyUsername + ":" + proxyPassword;
        proxy.setProxyType(Proxy.ProxyType.MANUAL);
        proxy.setHttpProxy(proxyAuth + "@" + proxyAddress + ":" + proxyPort);
        proxy.setSslProxy(proxyAuth + "@" + proxyAddress + ":" + proxyPort);

we see our answer:

We should prefix the proxy address string with the proxy credentials, as "${userName}:${password}@"! It's that simple!
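A minimal sketch of building that credential-prefixed string, in the same user:password@host:port shape as the blog snippet. Percent-encoding the two credentials is my own addition, so that a ':' or '@' inside a password can't be mistaken for a separator; AuthenticatedProxyAddress is a hypothetical name and the credentials are dummies:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class AuthenticatedProxyAddress {
    private AuthenticatedProxyAddress() {}

    /** Percent-encodes one credential; URLEncoder writes spaces as '+', so swap those for %20. */
    private static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8).replace("+", "%20");
    }

    /** Builds "user:password@host:port", as in the blog snippet, with the credentials encoded. */
    static String build(String user, String password, String host, int port) {
        return encode(user) + ":" + encode(password) + "@" + host + ":" + port;
    }

    public static void main(String[] args) {
        System.out.println(build("customer-demo", "p@ss:word", "pr.oxylabs.io", 7777));
        // customer-demo:p%40ss%3Aword@pr.oxylabs.io:7777
    }
}
```

Keep in mind that Chrome may not honor credentials embedded in its proxy setting and may show its own login prompt instead.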

Time to clean up my code...

UPDATE

It seems that, in the Katalon Test Case, the IP address that comes back is my real one...

but the plain Selenium WebDriver (non-Katalon-Studio) version returns a proxy address...


I was able to finally get this working, AND pass the IP address test!

First, I made a change to the IP Address Test itself:

The change to the IP Address Test Case

I changed the assertion in the IP Address Test Case so that it asserts that ipAddress doesn't start with the first two octets of my computer's IP address (which I hard-code into the Test Case; obviously, it would be different on other machines).
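A minimal sketch of that check, comparing whole octets rather than a raw string prefix (a plain startsWith("174.19") would also match 174.194.x.x, for example). IpCheck is a hypothetical name, and 203.0.113.7 is a documentation-range address standing in for the real machine IP:

```java
final class IpCheck {
    private IpCheck() {}

    /** Returns the first two octets of a dotted IPv4 address, e.g. "174.194" for "174.194.5.250". */
    static String firstTwoOctets(String ipv4) {
        String[] parts = ipv4.trim().split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("Not a dotted IPv4 address: " + ipv4);
        }
        return parts[0] + "." + parts[1];
    }

    /** True when the IP reported through the proxy does not share the real IP's first two octets. */
    static boolean looksProxied(String reportedIp, String realIp) {
        return !firstTwoOctets(reportedIp).equals(firstTwoOctets(realIp));
    }

    public static void main(String[] args) {
        System.out.println(looksProxied("174.194.5.250", "203.0.113.7")); // true
    }
}
```

The trim() matters because text read from a page element can carry stray whitespace or a trailing newline.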

The changes that I made to the WebDriverUtils code...

I made several changes to the WebDriverUtils class, especially...

The changes I made to WebDriverUtils.SetUpDriver()

The first change was the one that made the authorization modal spawn:

setting the HTTP and SSL proxy URLs to the URL of the proxy:

final String proxyAddress = 'pr.oxylabs.io:7777';

Proxy proxy = ClientUtil.createSeleniumProxy(this.getProxy())
proxy.setHttpProxy(proxyAddress)
proxy.setSslProxy(proxyAddress)

options.setProxy(proxy)

The second change was to wrap my driver (which I made a field) in an EventFiringWebDriver, eventDriver, which is what gets passed to the DriverFactory:

// EventFiringWebDriver is org.openqa.selenium.support.events.EventFiringWebDriver
EventFiringWebDriver eventDriver = new EventFiringWebDriver(driver);
eventDriver.register(new InitialNavigationEventListener());

DriverFactory.changeWebDriver(eventDriver);

Why? Because we need to register an initial navigation event listener that, as the name implies, listens for the initial navigation and performs an action only on that initial navigation!

We implement it thus:

package me.mikewarren.myCaseScraper.utils

import org.openqa.selenium.By
import org.openqa.selenium.OutputType
import org.openqa.selenium.WebDriver
import org.openqa.selenium.WebElement
import org.openqa.selenium.support.events.WebDriverEventListener

import com.kms.katalon.core.configuration.RunConfiguration

public class InitialNavigationEventListener implements WebDriverEventListener {
	
	private boolean hasInitialNavigationHappened = false;

	@Override
	public void afterAlertAccept(WebDriver arg0) {
		// TODO Auto-generated method stub
		
	}

	@Override
	public void afterAlertDismiss(WebDriver arg0) {
		// TODO Auto-generated method stub
		
	}

	@Override
	public void afterChangeValueOf(WebElement arg0, WebDriver arg1, CharSequence[] arg2) {
		// TODO Auto-generated method stub
		
	}

	@Override
	public void afterClickOn(WebElement arg0, WebDriver arg1) {
		// TODO Auto-generated method stub
		
	}

	@Override
	public void afterFindBy(By arg0, WebElement arg1, WebDriver arg2) {
		// TODO Auto-generated method stub
		
	}

	@Override
	public <X> void afterGetScreenshotAs(OutputType<X> arg0, X arg1) {
		// TODO Auto-generated method stub
		
	}

	@Override
	public void afterGetText(WebElement arg0, WebDriver arg1, String arg2) {
		// TODO Auto-generated method stub
		
	}

	@Override
	public void afterNavigateBack(WebDriver arg0) {
		// TODO Auto-generated method stub
		
	}

	@Override
	public void afterNavigateForward(WebDriver arg0) {
		// TODO Auto-generated method stub
		
	}

	@Override
	public void afterNavigateRefresh(WebDriver arg0) {
		// TODO Auto-generated method stub
		
	}

	@Override
	public void afterNavigateTo(String url, WebDriver driver) {
		if (this.hasInitialNavigationHappened)
			return;
			
		Runtime.getRuntime().exec("${RunConfiguration.getProjectDir()}/proxyLogin.exe");
			
		this.hasInitialNavigationHappened = true;
		WebDriverUtils.WaitForProxyAuthentication(5);
	}

	@Override
	public void afterScript(String arg0, WebDriver arg1) {
		// TODO Auto-generated method stub
		
	}

	@Override
	public void afterSwitchToWindow(String arg0, WebDriver arg1) {
		// TODO Auto-generated method stub
		
	}

	@Override
	public void beforeAlertAccept(WebDriver arg0) {
		// TODO Auto-generated method stub
		
	}

	@Override
	public void beforeAlertDismiss(WebDriver arg0) {
		// TODO Auto-generated method stub
		
	}

	@Override
	public void beforeChangeValueOf(WebElement arg0, WebDriver arg1, CharSequence[] arg2) {
		// TODO Auto-generated method stub
		
	}

	@Override
	public void beforeClickOn(WebElement arg0, WebDriver arg1) {
		// TODO Auto-generated method stub
		
	}

	@Override
	public void beforeFindBy(By arg0, WebElement arg1, WebDriver arg2) {
		// TODO Auto-generated method stub
		
	}

	@Override
	public <X> void beforeGetScreenshotAs(OutputType<X> arg0) {
		// TODO Auto-generated method stub
		
	}

	@Override
	public void beforeGetText(WebElement arg0, WebDriver arg1) {
		// TODO Auto-generated method stub
		
	}

	@Override
	public void beforeNavigateBack(WebDriver arg0) {
		// TODO Auto-generated method stub
		
	}

	@Override
	public void beforeNavigateForward(WebDriver arg0) {
		// TODO Auto-generated method stub
		
	}

	@Override
	public void beforeNavigateRefresh(WebDriver arg0) {
		// TODO Auto-generated method stub
		
	}

	@Override
	public void beforeNavigateTo(String arg0, WebDriver arg1) {
		// TODO Auto-generated method stub
		
	}

	@Override
	public void beforeScript(String arg0, WebDriver arg1) {
		// TODO Auto-generated method stub
		
	}

	@Override
	public void beforeSwitchToWindow(String arg0, WebDriver arg1) {
		// TODO Auto-generated method stub
		
	}

	@Override
	public void onException(Throwable arg0, WebDriver arg1) {
		// TODO Auto-generated method stub
		
	}
	
}

There are a lot of methods, all but one of them empty, but the important one is this one:

	@Override
	public void afterNavigateTo(String url, WebDriver driver) {
		if (this.hasInitialNavigationHappened)
			return;
			
		Runtime.getRuntime().exec("${RunConfiguration.getProjectDir()}/proxyLogin.exe");
			
		this.hasInitialNavigationHappened = true;
		WebDriverUtils.WaitForProxyAuthentication(5);
	}

There is quite a bit happening here:

  • we have some flag: hasInitialNavigationHappened, which only gets set once the initial navigation happens
  • we run our AutoIT executable, proxyLogin.exe (we'll discuss this later)
  • finally, we wait for the proxy authentication to finish. For now, we define "finished" simply as there being no more modals present...
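The hasInitialNavigationHappened flag is a run-once guard. A minimal stdlib sketch of the same idea, with AtomicBoolean.compareAndSet swapped in for the plain boolean so the check and the set happen in one step; RunOnce is a hypothetical helper, and in the listener the action would be launching proxyLogin.exe:

```java
import java.util.concurrent.atomic.AtomicBoolean;

final class RunOnce {
    private final AtomicBoolean hasRun = new AtomicBoolean(false);
    private final Runnable action;

    RunOnce(Runnable action) {
        this.action = action;
    }

    /** Runs the action on the first call only; returns true if this call ran it. */
    boolean runIfFirst() {
        // compareAndSet flips false -> true exactly once, even if two threads race here.
        if (!hasRun.compareAndSet(false, true)) {
            return false;
        }
        action.run();
        return true;
    }
}
```

One design difference: here the flag flips before the action runs, so a failing action is not retried on later navigations, whereas the listener above sets its flag after exec, so a failed launch would be attempted again.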

WebDriverUtils.WaitForProxyAuthentication()

The code for that is thus:

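	// Requires, in WebDriverUtils:
	//   import com.kms.katalon.core.model.FailureHandling
	//   import com.kms.katalon.core.util.KeywordUtil
	//   import com.kms.katalon.core.webui.keyword.WebUiBuiltInKeywords as WebUI
	private static boolean wasProxyAuthenticationDone = false;

	public static boolean WaitForProxyAuthentication(int timeOut) {
		final long startTime = System.currentTimeMillis();
		// Keep checking while authentication is NOT done AND we are still within the time-out.
		// (With ||, the loop would never exit if authentication never happened.)
		while ((!this.wasProxyAuthenticationDone) && (System.currentTimeMillis() < startTime + timeOut * 1000)) {
			// true once no alert is present (checking for up to 1 second)
			this.wasProxyAuthenticationDone = WebUI.verifyAlertNotPresent(1, FailureHandling.OPTIONAL);
		}

		if (!this.wasProxyAuthenticationDone)
			KeywordUtil.markFailed("Proxy authentication was NOT done after ${(System.currentTimeMillis() - startTime) / 1000} seconds...")

		return this.wasProxyAuthenticationDone;
	}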
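Stripped of the Katalon calls, that method is a poll-until-timeout loop. A stdlib-only sketch of the same pattern (Poll.until is a hypothetical helper); in the Katalon version, the condition would be WebUI.verifyAlertNotPresent(1, FailureHandling.OPTIONAL), which already takes a timeout of its own, so the interval could be 0:

```java
import java.util.function.BooleanSupplier;

final class Poll {
    private Poll() {}

    /**
     * Re-evaluates condition until it returns true or timeoutMillis has elapsed,
     * sleeping intervalMillis between checks. Returns the last result.
     */
    static boolean until(BooleanSupplier condition, long timeoutMillis, long intervalMillis) {
        final long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        boolean satisfied = condition.getAsBoolean();
        // Loop only while NOT satisfied AND time remains, so it stops as soon as either changes.
        while (!satisfied && System.nanoTime() < deadline) {
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for callers
                return false;
            }
            satisfied = condition.getAsBoolean();
        }
        return satisfied;
    }
}
```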

AutoIT

This is a Windows program that is almost as old as Java itself. It's older than YouTube!

It is meant for writing BASIC-like scripts that automate tasks in Windows, which makes it perfect for handling things that Selenium WebDriver/Katalon Studio can't, like this proxy login dialog.

To start, download the full AutoIT installation here. Once it's downloaded, we go through the installation instructions...

On the Start Menu, we open the SciTE Script Editor under the folder AutoIT v3. Then, we create our script:

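; Type the username into the focused Username field of the proxy login dialog
Send(EnvGet("OXYLABS_USERNAME"))
; Move focus to the Password field
Send("{TAB}")
; Flag 1 sends the text raw, so characters like ! + ^ # { } in the password aren't read as special keys
Send(EnvGet("OXYLABS_PASSWORD"), 1)
; Submit the dialog
Send("{ENTER}")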

which we save as proxyLogin.au3 in our Project's folder.

Of note, EnvGet is the AutoIT function for reading environment variables.

Once it's saved, we go to Tools > Build to build our executable program. We should build it into our Project's folder as well...
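On the Java/Groovy side, the listener launches this executable with Runtime.getRuntime().exec(String). That overload splits the command string on whitespace, which can be fragile when the project path contains spaces (like the "Software development" folders in this post). A minimal sketch using ProcessBuilder instead, which takes the path as a single argument; ProxyLoginLauncher is a hypothetical name, and whether to wait for the script to finish is a judgment call (the listener above fires and forgets, then polls for the alert):

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.concurrent.TimeUnit;

final class ProxyLoginLauncher {
    private ProxyLoginLauncher() {}

    /** The command as a one-element list, so spaces in the path are never split apart. */
    static List<String> command(String projectDir) {
        Path exe = Paths.get(projectDir, "proxyLogin.exe");
        return List.of(exe.toString());
    }

    /** Starts proxyLogin.exe and waits up to timeoutSeconds; returns true if it exited in time. */
    static boolean runAndWait(String projectDir, long timeoutSeconds)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command(projectDir))
                .inheritIO() // show the script's output (if any) in the test console
                .start();
        return process.waitFor(timeoutSeconds, TimeUnit.SECONDS);
    }
}
```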

We run our Test Case, but we see a stray ChromeDriver window open...

If we were to run the Test Case with WebDriverUtils as it is right now, we would have a stray Chrome driver window open every time we run it.

Why is that?

Remember that we have driver, but haven't done anything with it except wrap an EventFiringWebDriver around it? That is the cause of this. Let's fix it right now:

  • first, make driver a private static WebDriver field
  • second, in our CloseDriver() method, call driver.quit() if driver != null:
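	public static void CloseDriver() {
		DriverFactory.closeWebDriver();

		if (this.driver != null) {
			this.driver.quit();
			driver = null;
		}

		if (this.proxy != null) {
			this.proxy.stop();
			// getProxy() would otherwise hand this stopped proxy to the next test case
			proxy = null;
		}
	}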
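One more thing worth hedging against: if DriverFactory.closeWebDriver() or driver.quit() throws, the proxy.stop() after it never runs, and the BrowserMob port stays open. A minimal stdlib sketch of running each cleanup step independently and rethrowing the first failure with any later ones attached as suppressed exceptions (Teardown.runAll is a hypothetical helper; in CloseDriver() each step would wrap one of the calls above):

```java
final class Teardown {
    private Teardown() {}

    /** Runs every step even if an earlier one throws; rethrows the first failure, others suppressed. */
    static void runAll(Runnable... steps) {
        RuntimeException first = null;
        for (Runnable step : steps) {
            try {
                step.run();
            } catch (RuntimeException e) {
                if (first == null) {
                    first = e;
                } else {
                    first.addSuppressed(e);
                }
            }
        }
        if (first != null) {
            throw first;
        }
    }
}
```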

Conclusion

Now you are ready to begin scraping, without fear of exposing your machine's IP address to the site(s) you are scraping!

There are other measures you should take to avoid getting caught (and blocked) by the site(s) you want to scrape, but we can cover those in other posts...

IMPORTANT UPDATE

For some odd reason, the proxy in this post isn't based in Indianapolis (which is what I was looking for), despite my specifying it in getProxy():

		proxy.chainedProxyAuthorization("customer-${System.getenv('OXYLABS_USERNAME')}-cc-us-st-us_indiana-city-indianapolis".toString(),
				System.getenv("OXYLABS_PASSWORD"),
				AuthType.BASIC);

To fix this, we have to put the location targeting into the AutoIT script instead, replacing

Send(EnvGet("OXYLABS_USERNAME"))

with

Send("customer-" & EnvGet("OXYLABS_USERNAME") & "-cc-us-st-us_indiana-city-indianapolis")

I did this, and the Indiana-specific target I was trying to scrape started working!

NOTE: If you are following these instructions and wish to scrape from another city, I encourage you to follow the original link.
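For other locations, here is a minimal sketch of assembling the targeted username in the same shape as the Indianapolis one used in this post. The component formats (lowercase country code, us_-prefixed state, lowercase city) are only inferred from that one example, so treat them as assumptions and check OxyLabs' documentation; OxylabsUsername is a hypothetical name:

```java
final class OxylabsUsername {
    private OxylabsUsername() {}

    /** Builds customer-USER-cc-COUNTRY-st-STATE-city-CITY, the shape used for Indianapolis in this post. */
    static String cityTargeted(String user, String country, String state, String city) {
        return "customer-" + user + "-cc-" + country + "-st-" + state + "-city-" + city;
    }

    public static void main(String[] args) {
        // "demo" stands in for the value of OXYLABS_USERNAME.
        System.out.println(cityTargeted("demo", "us", "us_indiana", "indianapolis"));
        // customer-demo-cc-us-st-us_indiana-city-indianapolis
    }
}
```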
