Hey @Monty_Bagati, thanks for the detailed breakdown. I’d like to add some nuance, because a few of these points conflate two very different things, and that distinction matters a lot for anyone trying to pick a strategy for their automation.
Quick intro for context: I’m Julien Mer, creator of OculiX, the modernized fork of SikuliX (the image-based automation project born at MIT CSAIL in 2009). I’ve been working on visual test automation for 20+ years, and OculiX is my attempt to bring the 15-year-old SikuliX codebase onto a 2026 stack: Java 25, a modern Recorder, an MCP server, PaddleOCR and Tesseract, VNC/SSH/Citrix remote sessions, and Apple Silicon support. It’s MIT-licensed and actively maintained, and it has long since superseded the Sikuli 1.0.2 that Katalon bundles internally. That’s directly relevant to one of the “known limitations” in your own 11.0.0 release notes, which explicitly mentions a Sikuli 1.0.2 × Java 21 incompatibility. OculiX doesn’t have that problem because it runs on Java 25.
Now to the technical points:
1. Image comparison ≠ image recognition. The whole post treats “visual testing” as one block, but it isn’t one. Two fundamentally different techniques get lumped together:
- Visual regression testing (diffing a screenshot against a baseline), à la Applitools, Percy, Katalon Visual Testing. This one IS resolution-sensitive, because you’re comparing the whole viewport pixel by pixel against a baseline captured at one specific size.
- Visual automation via template matching (OpenCV cross-correlation with a similarity threshold), à la SikuliX/OculiX. This one is NOT fundamentally resolution-sensitive. A similarity threshold below 1.0 tolerates sub-pixel shifts, anti-aliasing differences, and small scaling differences. Multi-scale matching (re-running the search with the template resized by up to ±20%) handles DPI scaling natively. Region-constrained search handles scrollable elements without requiring a fixed viewport.
These are not the same technology and conflating them leads to advice that’s right for one and wrong for the other.
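To make the threshold idea concrete, here is a minimal Java sketch of similarity-based template matching, assuming grayscale screenshots held as `int[row][col]` arrays. It uses zero-mean normalized cross-correlation, the same family of score OpenCV computes with `TM_CCOEFF_NORMED`; the `TemplateMatch` class and its methods are hypothetical illustrations, not the OculiX or SikuliX API. In the demo the template is a uniformly brighter copy of the on-screen patch: a pixel diff would flag every pixel, while the correlation score is still 1.0.

```java
/** Hypothetical sketch of threshold-based template matching using zero-mean
 *  normalized cross-correlation. Images are grayscale int[row][col] arrays;
 *  these names are illustrations, not the OculiX/SikuliX API. */
public final class TemplateMatch {

    /** Top-left corner of the best window and its similarity score. */
    public record Match(int x, int y, double score) {}

    /** Correlation score of the template placed with its top-left corner at (x, y). */
    public static double ncc(int[][] img, int[][] tpl, int x, int y) {
        int h = tpl.length, w = tpl[0].length, n = h * w;
        double meanI = 0, meanT = 0;
        for (int r = 0; r < h; r++)
            for (int c = 0; c < w; c++) {
                meanI += img[y + r][x + c];
                meanT += tpl[r][c];
            }
        meanI /= n;
        meanT /= n;
        double num = 0, varI = 0, varT = 0;
        for (int r = 0; r < h; r++)
            for (int c = 0; c < w; c++) {
                double di = img[y + r][x + c] - meanI; // subtracting means removes brightness offsets
                double dt = tpl[r][c] - meanT;
                num += di * dt;
                varI += di * di;
                varT += dt * dt;
            }
        double denom = Math.sqrt(varI * varT);
        return denom == 0 ? 0 : num / denom; // flat patches carry no shape information
    }

    /** Slides the template over the whole image and keeps the best-scoring window. */
    public static Match best(int[][] img, int[][] tpl) {
        Match best = new Match(-1, -1, -1);
        for (int y = 0; y + tpl.length <= img.length; y++)
            for (int x = 0; x + tpl[0].length <= img[0].length; x++) {
                double s = ncc(img, tpl, x, y);
                if (s > best.score()) best = new Match(x, y, s);
            }
        return best;
    }

    /** Returns the best match only if it clears the similarity threshold, else null. */
    public static Match find(int[][] img, int[][] tpl, double minSimilarity) {
        Match m = best(img, tpl);
        return m.score() >= minSimilarity ? m : null;
    }

    public static void main(String[] args) {
        int[][] screen = {
            {10, 10, 10, 10, 10},
            {10, 200, 50, 10, 10},
            {10, 60, 220, 10, 10},
            {10, 10, 10, 10, 10},
        };
        // Same shape as the on-screen patch at (1, 1), but 30 grey levels brighter.
        int[][] button = {{230, 80}, {90, 250}};
        System.out.println(find(screen, button, 0.85)); // found at x=1, y=1
    }
}
```

Production engines run the sliding step through an optimized `matchTemplate` call rather than nested loops; the brute-force version only shows what the similarity number means and why a threshold below 1.0 absorbs rendering noise.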
2. “XPath is resolution-independent”: true, but irrelevant for the use cases that matter here. The whole premise of visual automation is the scenarios where there is no DOM at all:
- Citrix / RDP / ICA remote desktops (the actual thread context, I suspect)
- Legacy Flash / Silverlight / Java Applet apps
- Canvas-based apps (CAD tools, map viewers, games, dashboards, engineering software)
- Electron / Unity / WPF native windows that expose nothing over accessibility trees
- Mainframe 3270 emulators
None of these have XPath or CSS selectors. Recommending “use stable locators” doesn’t help because there are no locators. This is the exact niche where SikuliX/OculiX have been the default answer since 2009.
3. DPI scaling is handled at the framework level in mature visual tools, not worked around by the user. Telling users to “standardize on 100% DPI, use headless Chrome, force 1920×1080” is a workflow constraint, not a framework solution. Proper visual automation abstracts this away:
- Pattern("x.png").similar(0.85) absorbs minor rendering differences from anti-aliasing and sub-pixel positioning.
- Multi-scale matching resolves DPI scaling variance without user intervention.
- OCR (Tesseract, PaddleOCR) is the orthogonal answer to “find this text”: recognized text stays stable across resolutions far better than a pixel template does.
- Region-of-interest constraints let you search inside a known window even if its position shifts.
In OculiX specifically, we expose all four primitives as MCP tools so any LLM agent can do visual navigation without caring about the underlying resolution.
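Multi-scale matching and region-of-interest constraints compose naturally. The following is a hypothetical Java sketch, not OculiX’s actual implementation: it rescales the template with nearest-neighbour sampling over a list of scale factors and evaluates only positions where the rescaled template fits entirely inside a given region. The scale list (0.8 to 1.2, mirroring the ±20% range mentioned earlier) and the `Region`/`Hit` names are assumptions for illustration.

```java
import java.util.List;

/** Hypothetical sketch of multi-scale, region-constrained template matching.
 *  Not the OculiX/SikuliX API. Images are grayscale int[row][col] arrays. */
public final class MultiScaleSearch {

    public record Region(int x, int y, int w, int h) {}
    public record Hit(int x, int y, double scale, double score) {}

    /** Nearest-neighbour resize of the template to round(size * scale). */
    public static int[][] resize(int[][] src, double scale) {
        int sh = src.length, sw = src[0].length;
        int dh = Math.max(1, (int) Math.round(sh * scale));
        int dw = Math.max(1, (int) Math.round(sw * scale));
        int[][] dst = new int[dh][dw];
        for (int r = 0; r < dh; r++)
            for (int c = 0; c < dw; c++)
                dst[r][c] = src[r * sh / dh][c * sw / dw];
        return dst;
    }

    /** Zero-mean normalized cross-correlation of tpl at (x, y); 0 for flat patches. */
    static double ncc(int[][] img, int[][] tpl, int x, int y) {
        int h = tpl.length, w = tpl[0].length;
        double mi = 0, mt = 0;
        for (int r = 0; r < h; r++)
            for (int c = 0; c < w; c++) { mi += img[y + r][x + c]; mt += tpl[r][c]; }
        mi /= h * w;
        mt /= h * w;
        double num = 0, vi = 0, vt = 0;
        for (int r = 0; r < h; r++)
            for (int c = 0; c < w; c++) {
                double di = img[y + r][x + c] - mi, dt = tpl[r][c] - mt;
                num += di * dt; vi += di * di; vt += dt * dt;
            }
        double d = Math.sqrt(vi * vt);
        return d == 0 ? 0 : num / d;
    }

    /** Tries every scale, only at positions where the scaled template fits inside roi
     *  (roi is assumed to lie within the image). Returns null below minSim. */
    public static Hit find(int[][] img, int[][] tpl, Region roi, List<Double> scales, double minSim) {
        Hit best = null;
        for (double s : scales) {
            int[][] t = resize(tpl, s);
            for (int y = roi.y(); y + t.length <= roi.y() + roi.h(); y++)
                for (int x = roi.x(); x + t[0].length <= roi.x() + roi.w(); x++) {
                    double score = ncc(img, t, x, y);
                    if (best == null || score > best.score()) best = new Hit(x, y, s, score);
                }
        }
        return best != null && best.score() >= minSim ? best : null;
    }

    public static void main(String[] args) {
        // Captured template: 5x5 with distinct grey levels (made-up data).
        int[][] tpl = new int[5][5];
        for (int r = 0; r < 5; r++)
            for (int c = 0; c < 5; c++) tpl[r][c] = 10 * (r * 5 + c + 1);

        // Screen of 8 rows x 12 cols with the element rendered 20% smaller at (2, 2).
        int[][] screen = new int[8][12];
        int[][] small = resize(tpl, 0.8);
        for (int r = 0; r < small.length; r++)
            for (int c = 0; c < small[0].length; c++) screen[2 + r][2 + c] = small[r][c];

        List<Double> scales = List.of(0.8, 0.9, 1.0, 1.1, 1.2);
        System.out.println(find(screen, tpl, new Region(0, 0, 12, 8), scales, 0.85)); // x=2, y=2, scale=0.8
        System.out.println(find(screen, tpl, new Region(7, 0, 5, 8), scales, 0.85)); // null: not in region
    }
}
```

In the demo only the 0.8 scale reproduces the on-screen element exactly, so the search reports that scale without any change to the environment, and a region that doesn’t contain the element yields no match at all.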
4. Offset-based clicks are a last resort, not a recommended pattern. The whole Solution 2 section built on WebUI.clickOffset(testObject, offsetX, offsetY) is the brittle path. A well-designed visual automation flow doesn’t click at an offset from some element’s center. It captures a smaller, stable sub-image of the actual target button and clicks the center of the match, so the click point follows the button wherever, and at whatever size, it is found. That makes it resolution-adaptive by construction.
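The difference between the two click strategies comes down to a little coordinate arithmetic. This hypothetical Java sketch (the `ClickPoint` helper and all coordinates are made up for illustration; it does not call `WebUI.clickOffset` or any Katalon API) compares the center of a matched rectangle with a fixed pixel offset from a reference element’s center when the display goes from 100% to 150% scaling.

```java
/** Hypothetical helper comparing two ways of choosing a click point. */
public final class ClickPoint {

    public record Point(int x, int y) {}

    /** Centre of a matched rectangle: it moves and scales with the match itself. */
    public static Point matchCenter(int x, int y, int w, int h) {
        return new Point(x + w / 2, y + h / 2);
    }

    /** The brittle pattern: a fixed pixel offset from a reference element's centre. */
    public static Point fixedOffset(Point reference, int dx, int dy) {
        return new Point(reference.x() + dx, reference.y() + dy);
    }

    /** True if p lies inside the rectangle (x, y, w, h). */
    public static boolean inside(Point p, int x, int y, int w, int h) {
        return p.x() >= x && p.x() < x + w && p.y() >= y && p.y() < y + h;
    }

    public static void main(String[] args) {
        // At 100%: button at (400, 300), 40x20; reference element centred at (520, 310).
        // An offset of -100 px was tuned to land on the button's centre (420, 310).
        Point offset100 = fixedOffset(new Point(520, 310), -100, 0);

        // At 150%: every coordinate and size is multiplied by 1.5.
        int bx = 600, by = 450, bw = 60, bh = 30;
        Point offset150 = fixedOffset(new Point(780, 465), -100, 0); // offset not rescaled
        Point match150 = matchCenter(bx, by, bw, bh);                 // from the template match

        System.out.println("offset @100%: " + offset100 + " hits=" + inside(offset100, 400, 300, 40, 20));
        System.out.println("offset @150%: " + offset150 + " hits=" + inside(offset150, bx, by, bw, bh));
        System.out.println("match  @150%: " + match150 + " hits=" + inside(match150, bx, by, bw, bh));
    }
}
```

The tuned offset lands at (680, 465) after scaling, past the button’s right edge at x = 660, while the match center (630, 465) stays on target with no per-resolution tuning.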
In practice, if someone is facing “tests break when I change screen resolution”, the question is rarely about “standardizing the environment” — it’s about the right tool for the job :
- Standard web app with an accessible DOM → locator-based (Selenium, Playwright, Katalon Web UI).
- Remote desktop / no-DOM app / legacy UI / cross-OS visual flow → template-matching-based (OculiX, SikuliX).
- Text-heavy UI where you want “click the button that says X” → OCR-based search (OculiX exposes this natively via the oculix_find_text MCP tool).
Mixing the wrong approach and the wrong tool is where resolution-fragility comes from, not from visual automation itself.
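For the OCR route, the step after recognition is turning “the button that says X” into a click point. The sketch below assumes the OCR engine has already returned recognized words with bounding boxes in reading order; the `Word` record and `find` method are hypothetical and are not the interface of `oculix_find_text`, Tesseract, or PaddleOCR. It matches the phrase as consecutive words, case-insensitively and ignoring trailing punctuation, and returns the center of their combined box.

```java
import java.util.List;
import java.util.Locale;
import java.util.Optional;

/** Hypothetical sketch: locate a phrase in OCR output and compute a click point. */
public final class TextTarget {

    /** One recognized word and its bounding box (assumed OCR output shape). */
    public record Word(String text, int x, int y, int w, int h) {}
    public record Point(int x, int y) {}

    /** Lower-cases and drops trailing punctuation, so "Save:" or "OK." still match. */
    private static String norm(String s) {
        return s.toLowerCase(Locale.ROOT).replaceAll("\\p{Punct}+$", "");
    }

    /** Finds the phrase as consecutive words and returns the centre of their union box. */
    public static Optional<Point> find(List<Word> words, String phrase) {
        String[] target = phrase.trim().split("\\s+");
        for (int i = 0; i + target.length <= words.size(); i++) {
            boolean ok = true;
            for (int k = 0; k < target.length && ok; k++)
                ok = norm(words.get(i + k).text()).equals(norm(target[k]));
            if (!ok) continue;
            int x0 = Integer.MAX_VALUE, y0 = Integer.MAX_VALUE;
            int x1 = Integer.MIN_VALUE, y1 = Integer.MIN_VALUE;
            for (int k = 0; k < target.length; k++) { // union of the matched words' boxes
                Word w = words.get(i + k);
                x0 = Math.min(x0, w.x());
                y0 = Math.min(y0, w.y());
                x1 = Math.max(x1, w.x() + w.w());
                y1 = Math.max(y1, w.y() + w.h());
            }
            return Optional.of(new Point((x0 + x1) / 2, (y0 + y1) / 2));
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<Word> words = List.of(
            new Word("Cancel", 100, 400, 50, 16),
            new Word("Save", 200, 400, 36, 16),
            new Word("changes.", 240, 400, 60, 16));
        System.out.println(find(words, "save changes")); // centre of the two-word box
        System.out.println(find(words, "Delete"));       // Optional.empty
    }
}
```

Because the click point is derived from where the text was recognized, it does not depend on screen resolution, only on the OCR engine reading the label correctly.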
Happy to go deeper on any of these if you want. And if Katalon is interested in updating its bundled SikuliX past the 2015 vintage (your release notes hint that it’s a pain point), we’re very open to collaboration. OculiX is MIT-licensed, so you could literally ship a fresh, Java 25-compatible oculixapi.jar in your next release, and the Sikuli 1.0.2 × Java 21 incompatibility would go away on its own.
Cheers.
Julien