Comparing 2 Excel Files for differences

Hi,

This is the first time I’ve tried this so I’ll attempt to explain it as best I can:

We run an overnight webjob that create new billings which is then output into excel and saved locally.
What I need to be able to do is compare yesterdays excel file with todays file and check for differences if this is possible at all.
The filename is Bill_2020-10-13 based on the todays date - the single sheet inside that file will always have the same name.

I’ve got literally nothing in place yet so any starting point would be great

hello,

read both files row by row to list and compare then results
here some where to start
https://www.codejava.net/coding/how-to-read-excel-files-in-java-using-apache-poi

I would use WinMerge with xdocdiff plugin

http://freemind.s57.xrea.com/xdocdiffPlugin/en/index.html

hi @kazurayam

I guess this Winmerge use is not easy to automate

As you know, we can invoke any command in the commandline from Katalon’s TestCase script:

And WinMerge has the Commandline Mode:

https://manual.winmerge.org/en/Command_line.html

So, a Katalon’s TestCase script can start WinMerge.

But how to let it compile another format of diff report, how to stop it automatically? — I do not know. Not necessary for @christian perhaps.

Thanks both,

I will update if I run into any issues along the way but thanks for the help.

Hello,

I did some playing with this…
something like this way

TESTCASE:
import static com.kms.katalon.core.checkpoint.CheckpointFactory.findCheckpoint
import static com.kms.katalon.core.testcase.TestCaseFactory.findTestCase
import static com.kms.katalon.core.testdata.TestDataFactory.findTestData
import static com.kms.katalon.core.testobject.ObjectRepository.findTestObject
import static com.kms.katalon.core.testobject.ObjectRepository.findWindowsObject

import java.io.File

import com.kms.katalon.core.checkpoint.Checkpoint as Checkpoint
import com.kms.katalon.core.cucumber.keyword.CucumberBuiltinKeywords as CucumberKW
import com.kms.katalon.core.mobile.keyword.MobileBuiltInKeywords as Mobile
import com.kms.katalon.core.model.FailureHandling as FailureHandling
import com.kms.katalon.core.testcase.TestCase as TestCase
import com.kms.katalon.core.testdata.TestData as TestData
import com.kms.katalon.core.testng.keyword.TestNGBuiltinKeywords as TestNGKW
import com.kms.katalon.core.testobject.TestObject as TestObject
import com.kms.katalon.core.webservice.keyword.WSBuiltInKeywords as WS
import com.kms.katalon.core.webui.keyword.WebUiBuiltInKeywords as WebUI
import com.kms.katalon.core.windows.keyword.WindowsBuiltinKeywords as Windows
import internal.GlobalVariable as GlobalVariable

import org.apache.commons.io.FileUtils

String downloadPath = "C:/Users/" + System.getProperty("user.name")+ "/Downloads/"
String yesterdayFilePath = "C:/Users/" + System.getProperty("user.name")+ "/Downloads/yesterday/"
String currentFilePath = "C:/Users/" + System.getProperty("user.name")+ "/Downloads/current/"


//get yesterday filepath & filename
def fileLatest = getLastDownloadedFile(downloadPath)

println fileLatest[0]
println fileLatest[1]

String yesterdayFile = fileLatest[0]
String yesterdayFileName = fileLatest[1]

//copy latest file to another directory
copyFile(yesterdayFile, yesterdayFilePath)

//TODO download current file from UI to download folder

//get current file from download folder
fileLatest = getLastDownloadedFile(downloadPath)

println fileLatest[0]
println fileLatest[1]

String currentFile = fileLatest[0]
String currentFileName = fileLatest[1]

//copy latest file to another directory
copyFile(currentFile, currentFilePath)

String yFilePath = yesterdayFile
String cFilePath = currentFile

def count =  CustomKeywords.'readExcelRows.ReadRows.getCountOfRows'(yFilePath)
println count

int failureCounter = 0

List<String> yesterday = new ArrayList<>()
List<String> today = new ArrayList<>()
boolean isEqual = false
int startRow = 1
int endRow = 2
for (int i = 1; i < 10; i++){ //TODO change value 10 to value of count of used rows in excel sheet 0
	
	//read yesterday
	yesterday = CustomKeywords.'readExcelRows.ReadRows.readExcelRows'(startRow, endRow, yFilePath )
	
	//read current excel
	today = CustomKeywords.'readExcelRows.ReadRows.readExcelRows'(startRow, endRow, cFilePath )
	
	//do diff
	isEqual = compareLists(yesterday, today)
	if (isEqual){
		println("rows ["+startRow+"] are equal")
	}
	else{
		println("rows ["+startRow+"] are not equal")
		failureCounter++
	}
	startRow++
	endRow++
}

//if no failures then remove files from the temp folders
if(failureCounter == 0){
	File yes = new File(yesterdayFilePath)
	File curr = new File(currentFilePath)
	deleteFileFromFolder(yes)
	deleteFileFromFolder(curr)
}

//if test pass then delete temp files from folder tomorrow & current implemnted in listener
public static deleteFileFromFolder(File directory){
	
	FileUtils.cleanDirectory(directory);
}

public static boolean compareLists(List<String> yesterday, List<String>today){
	
	// intersection as set
	Set<String> intersect = new HashSet<String>(yesterday);
	intersect.retainAll(today);

	// intersection/union as list
	List<String> intersectList = new ArrayList<String>();
	intersectList.addAll(yesterday);
	intersectList.addAll(today);
	intersectList.retainAll(intersect);

	// original lists are structurally unmodified
	System.out.println(yesterday);
	System.out.println(today);
	
	boolean isEqual = yesterday.equals(today);
	
	return isEqual
}

public static copyFile(String file1, String file2) throws IOException
{
	File srcFile = new File(file1);
	File trgDir = new File(file2);

	FileUtils.copyFileToDirectory(srcFile, trgDir);
}

public static deleteFile(String path)
{
	File file = new File(path);
	  
	if(file.delete())
	{
		System.out.println("File deleted successfully");
	}
	else
	{
		System.out.println("Failed to delete the file");
	}
}


public static List<String> getLastDownloadedFile(String downloadPath) {
	
	List<String> array = new ArrayList<>()
	
	File choice = null;
	try {
		File fl = new File(downloadPath);
		File[] files = fl.listFiles(new FileFilter() {
			public boolean accept(File file) {
				return file.isFile();
			}
		});
		//Sleep to download file if not required can be removed
		Thread.sleep(2000);
		long lastMod = Long.MIN_VALUE;

		for (File file : files) {
			if (file.lastModified() > lastMod) {
				choice = file;
				lastMod = file.lastModified();
			}
		}
	} catch (Exception e) {
		System.out.println("Exception while getting the last download file :"
				+ e.getMessage());
	}
	array.add(choice)
	array.add(choice.getName())
	return array;
}

CUSTOM KEYWORD:
package readExcelRows

import org.apache.poi.openxml4j.exceptions.InvalidFormatException;
import org.apache.poi.ss.usermodel.*;
import org.apache.poi.xssf.usermodel.*;

import com.kms.katalon.core.annotation.Keyword

public class ReadRows {

	Workbook workbook

	@Keyword
	public int getCountOfRows(String path){

		// Creating a Workbook from an Excel file (.xls or .xlsx)
		workbook = WorkbookFactory.create(new File(path));
		//Sheet sheet = workbook.getSheet("sheet0");
		Sheet sheet = workbook.getSheetAt(0);
		int lastRowNum = sheet.getLastRowNum()

		int used = sheet.getRow(0).getPhysicalNumberOfCells();
		println "rows used "+used

		int last = sheet.getRow(0).getLastCellNum();
		println "rows last "+last

		return lastRowNum;
	}

	@Keyword
	public List<String> readExcelRows(int start, int end, String path) throws IOException, InvalidFormatException{

		List<String> excelValues = new ArrayList<String>();

		// Creating a Workbook from an Excel file (.xls or .xlsx)
		workbook = WorkbookFactory.create(new File(path));

		boolean last = false

		boolean actualValue;
		int lastRowNum = sheet.getLastRowNum()
		lastRowNum++
		if (start == lastRowNum){
			last = true
		}
		//System.out.println("sheet rows "+lastRowNum);
		int startRow = 0
		int endRow = 0;
		for (Row row: sheet) {
			for(Cell cell: row) {
				if (startRow >= start && endRow < end && last==false){

					if (cell == null || cell.getCellType() == Cell.CELL_TYPE_BLANK) {
						excelValues.add("**no value**");
					}else{
						String cellValue = dataFormatter.formatCellValue(cell);
						excelValues.add(cellValue);
					}

				}
				if (last){
					String cellValue = dataFormatter.formatCellValue(cell);
					excelValues.add(cellValue);
				}
				//}
			}
			startRow++;
			endRow++;
		}


		// Closing the workbook
		//workbook.close();

		return excelValues;
	}
}

some sample results
[1, CA-2016-152156, 08/11/2016, 11/11/2016, Second Class, CG-12520, Claire Gute, Consumer, United States, Henderson, Kentucky, 42420, South, FUR-BO-10001798, Furniture, Bookcases, Bush Somerset Collection Bookcase, 261,96, 2, 0, 41,9136]
[1, CA-2016-152156, 08/11/2016, 11/11/2016, Second Class, CG-12520, Claire Gute, Consumer, United States, Henderson, Kentucky, 42420, South, FUR-BO-10001798, Furniture, Bookcases, Bush Somerset Collection Bookcase, 261,96, 2, 0, 41,9136]
rows [1] are equal
2020-10-24 16:05:41.684 INFO k.k.c.m.CustomKeywordDelegatingMetaClass - readExcelRows.ReadRows.readExcelRows is PASSED
2020-10-24 16:05:42.284 INFO k.k.c.m.CustomKeywordDelegatingMetaClass - readExcelRows.ReadRows.readExcelRows is PASSED
[2, CA-2016-152156, 08/11/2016, 11/11/2016, Second Class, CG-12520, Claire Gute, Consumer, United States, Henderson, Kentucky, 42420, South, FUR-CH-10000454, Furniture, Chairs, Hon Deluxe Fabric Upholstered Stacking Chairs, Rounded Back, 731,94, 3, 0, 219,582]
[2, CA-2016-152156, 08/11/2016, 11/11/2016, Second Class, CG-12520, Claire Gute, Consumer, United States, Henderson, Kentucky, 42420, South, FUR-CH-10000454, Furniture, Chairs, Hon Deluxe Fabric Upholstered Stacking Chairs, Rounded Back, 731,94, 3, 0, 219,582]
rows [2] are equal
2020-10-24 16:05:42.805 INFO k.k.c.m.CustomKeywordDelegatingMetaClass - readExcelRows.ReadRows.readExcelRows is PASSED
2020-10-24 16:05:43.367 INFO k.k.c.m.CustomKeywordDelegatingMetaClass - readExcelRows.ReadRows.readExcelRows is PASSED
[3, CA-2016-138688, 12/06/2016, 16/06/2016, Second Class, DV-13045, Darrin Van Huff, Corporate, United States, Los Angeles, California, 90036, West, OFF-LA-10000240, Office Supplies, Labels, Self-Adhesive Address Labels for Typewriters by Universal, 14,62, 2, 0, 6,8714]
[3, CA-2016-138688, 12/06/2016, 16/06/2016, Second Class, DV-13045, Darrin Van Huff, Corporate, United States, Los Angeles, California, 90036, West, OFF-LA-10000240, Office Supplies, Labels, Self-Adhesive Address Labels for Typewriters by Universal, 14,62, 2, 0, 6,8714]
rows [3] are equal
2020-10-24 16:05:43.914 INFO k.k.c.m.CustomKeywordDelegatingMetaClass - readExcelRows.ReadRows.readExcelRows is PASSED
2020-10-24 16:05:44.377 INFO k.k.c.m.CustomKeywordDelegatingMetaClass - readExcelRows.ReadRows.readExcelRows is PASSED
[4, US-2015-108966, 11/10/2015, 18/10/2015, Standard Class, SO-20335, Sean O’Donnell, Consumer, United States, Fort Lauderdale, Florida, 33311, South, FUR-TA-10000577, Furniture, Tables, Bretford CR4500 Series Slim Rectangular Table, 957,5775, 5, 0,45, -383,031]
[4, US-2015-108966, 11/10/2015, 18/10/2015, Standard Class, SO-20335, Sean O’Donnell, Consumer, United States, Fort Lauderdale, Florida, 33311, South, FUR-TA-10000577, Furniture, Tables, Bretford CR4500 Series Slim Rectangular Table, 957,5775, 5, 0,45, -383,031]
rows [4] are equal
2020-10-24 16:05:45.188 INFO k.k.c.m.CustomKeywordDelegatingMetaClass - readExcelRows.ReadRows.readExcelRows is PASSED
2020-10-24 16:05:45.793 INFO k.k.c.m.CustomKeywordDelegatingMetaClass - readExcelRows.ReadRows.readExcelRows is PASSED
[5, US-2015-108966, 11/10/2015, 18/10/2015, Standard Class, SO-20335, Sean O’Donnell, Consumer, United States, Fort Lauderdale, Florida, 33311, South, OFF-ST-10000760, Office Supplies, Storage, Eldon Fold 'N Roll Cart System, 22,368, 2, 0,2, 2,5164]
[5, US-2015-108966, 11/10/2015, 18/10/2015, Standard Class, SO-20335, Sean O’Donnell, Consumer, United States, Fort Lauderdale, Florida, 33311, South, OFF-ST-10000760, Office Supplies, Storage, Eldon Fold 'N Roll Cart System, 22,368, 2, 0,2, 2,5164]
rows [5] are equal
2020-10-24 16:05:46.263 INFO k.k.c.m.CustomKeywordDelegatingMetaClass - readExcelRows.ReadRows.readExcelRows is PASSED
2020-10-24 16:05:46.706 INFO k.k.c.m.CustomKeywordDelegatingMetaClass - readExcelRows.ReadRows.readExcelRows is PASSED
[6, CA-2014-115812, 09/06/2014, 14/06/2014, Standard Class, BH-11710, Brosina Hoffman, Consumer, United States, Los Angeles, California, 90032, West, FUR-FU-10001487, Furniture, Furnishings, Eldon Expressions Wood and Plastic Desk Accessories, Cherry Wood, 48,86, 7, 0, 14,1694]
[6, CA-2014-115812, 09/06/2014, 14/06/2014, Standard Class, BH-11710, Brosina Hoffman, Consumer, United States, Los Angeles, California, 90032, West, FUR-FU-10001487, Furniture, Furnishings, Eldon Expressions Wood and Plastic Desk Accessories, Cherry Wood, 48,86, 7, 0, 14,1694]
rows [6] are equal
2020-10-24 16:05:47.372 INFO k.k.c.m.CustomKeywordDelegatingMetaClass - readExcelRows.ReadRows.readExcelRows is PASSED
2020-10-24 16:05:48.014 INFO k.k.c.m.CustomKeywordDelegatingMetaClass - readExcelRows.ReadRows.readExcelRows is PASSED
[7, CA-2014-115812, 09/06/2014, 14/06/2014, Standard Class, BH-11710, Brosina Hoffman, Consumer, United States, Los Angeles, California, 90032, West, OFF-AR-10002833, Office Supplies, Art, Newell 322, 7,28, 4, 0, 1,9656]
[7, CA-2014-115812, 09/06/2014, 14/06/2014, Standard Class, BH-11710, Brosina Hoffman, Consumer, United States, Los Angeles, California, 90032, West, OFF-AR-10002833, Office Supplies, Art, Newell 322, 7,28, 4, 0, 1,9656]
rows [7] are equal
2020-10-24 16:05:48.586 INFO k.k.c.m.CustomKeywordDelegatingMetaClass - readExcelRows.ReadRows.readExcelRows is PASSED
2020-10-24 16:05:49.061 INFO k.k.c.m.CustomKeywordDelegatingMetaClass - readExcelRows.ReadRows.readExcelRows is PASSED
[8, CA-2014-115812, 09/06/2014, 14/06/2014, Standard Class, BH-11710, Brosina Hoffman, Consumer, United States, Los Angeles, California, 90032, West, TEC-PH-10002275, Technology, Phones, Mitel 5320 IP Phone VoIP phone, 907,152, 6, 0,2, 90,7152]
[8, CA-2014-115812, 09/06/2014, 14/06/2014, Standard Class, BH-11710, Brosina Hoffman, Consumer, United States, Los Angeles, California, 90032, West, TEC-PH-10002275, Technology, Phones, Mitel 5320 IP Phone VoIP phone, 907,152, 6, 0,2, 90,7152]
rows [8] are equal
2020-10-24 16:05:49.602 INFO k.k.c.m.CustomKeywordDelegatingMetaClass - readExcelRows.ReadRows.readExcelRows is PASSED
2020-10-24 16:05:50.241 INFO k.k.c.m.CustomKeywordDelegatingMetaClass - readExcelRows.ReadRows.readExcelRows is PASSED
[9, CA-2014-115812, 09/06/2014, 14/06/2014, Standard Class, BH-11710, Brosina Hoffman, Consumer, United States, Los Angeles, California, 90032, West, OFF-BI-10003910, Office Supplies, Binders, DXL Angle-View Binders with Locking Rings by Samsill, 18,504, 3, 0,2, 5,7825]
[9, CA-2014-115812, 09/06/2014, 14/06/2014, Standard Class, BH-11710, Brosina Hoffman, Consumer, United States, Los Angeles, California, 90032, West, OFF-BI-10003910, Office Supplies, Binders, DXL Angle-View Binders with Locking Rings by Samsill, 18,504, 3, 0,2, 5,7825]
rows [9] are equal