Interesting problem.
According to Wikipedia, there are many Unicode characters that look like "-" or "ー" and are very similar to each other, but have different Unicode code points:
U+002D - HYPHEN-MINUS
U+2010 ‐ HYPHEN
U+2212 − MINUS SIGN
U+2013 – EN DASH
U+2014 — EM DASH
In your target HTML, which dash-like character is used? It is hard to tell just by looking.
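Since these characters are hard to tell apart visually, one way to find out which one a page actually uses is to print the code point and Unicode name of each dash-like character it contains. Below is a minimal Python sketch using the standard unicodedata module. The DASH_LIKE set and the describe_dashes helper are illustrative names of my own. The set covers only the characters listed above plus "ー" (U+30FC).

```python
import unicodedata

# Illustrative set: the characters listed above plus U+30FC ("ー").
DASH_LIKE = {"\u002D", "\u2010", "\u2212", "\u2013", "\u2014", "\u30FC"}


def describe_dashes(text: str) -> list[str]:
    """Return 'U+XXXX NAME' for every dash-like character found in text."""
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch)}"
        for ch in text
        if ch in DASH_LIKE
    ]


print(describe_dashes("e\u2013mail \u2212 1"))
# ['U+2013 EN DASH', 'U+2212 MINUS SIGN']
```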
How can you verify a message containing those ambiguous dash-like characters? My bet is to disregard them. You need a trick; I will post my solution later.
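One minimal way to "disregard" them is to delete every dash-like character from both strings before comparing. This assumes it is acceptable to ignore dashes entirely. It is only a sketch and not necessarily the solution promised above. messages_match and DASH_LIKE are hypothetical names, and the character set is limited to the characters mentioned here.

```python
# Assumed set of dash-like characters to ignore (from the list above plus "ー").
DASH_LIKE = "\u002D\u2010\u2212\u2013\u2014\u30FC"

# Translation table that deletes every character in DASH_LIKE.
_STRIP_DASHES = str.maketrans("", "", DASH_LIKE)


def messages_match(expected: str, actual: str) -> bool:
    """Compare two messages while ignoring every dash-like character."""
    return expected.translate(_STRIP_DASHES) == actual.translate(_STRIP_DASHES)


print(messages_match("Sign-in failed", "Sign\u2010in failed"))  # True
```

Note that this also treats "Signin" and "Sign-in" as equal. If that is too loose, you could map each dash-like character to "-" (for example with str.maketrans({c: "-" for c in DASH_LIKE})) instead of deleting it.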