> Another nuance was found in ruby, which cannot scan the haystack with invalid UTF-8 byte sequences.
This is extremely basic Ruby: UTF-8 encoded strings must be valid UTF-8. This is not unique to Ruby. If I recall correctly, Python 3 does the same thing.
This person is a senior engineer on their Team page. All they had to do was google "ArgumentError: invalid byte sequence in UTF-8". Or ask a coworker... the company has Ruby on Rails applications. headdesk
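For anyone who hasn't run into it, here is a minimal sketch of the behavior being discussed, not the benchmark's actual code. The helper names (`broken_utf8`, `match_error`, `byte_search`) are made up for illustration; `valid_encoding?`, `scrub`, `b` and `force_encoding` are standard `String` methods.

```ruby
# Hypothetical helper: builds a string tagged as UTF-8 that contains a stray 0xFF byte.
def broken_utf8
  "abc\xFFdef".b.force_encoding(Encoding::UTF_8)
end

# Hypothetical helper: returns the ArgumentError raised by the match, or nil if none.
def match_error(haystack, pattern)
  haystack =~ pattern
  nil
rescue ArgumentError => e
  e
end

# Hypothetical helper: searches the raw bytes instead of UTF-8 characters.
# Assumes the pattern is ASCII-only; the result is a byte offset.
def byte_search(haystack, pattern)
  haystack.b =~ pattern
end

haystack = broken_utf8
puts haystack.valid_encoding?               # false
puts match_error(haystack, /def/).inspect   # the error hit when searching the string as UTF-8
puts haystack.scrub("?")                    # abc?def
puts byte_search(haystack, /def/)           # 4
```

`scrub` replaces the bad bytes before searching, while `b` returns a binary copy so the search skips UTF-8 validation entirely. Which one fits depends on whether you need character offsets or only a match.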
burntsushi 2 hours ago [-]
The nuance is specifically relevant here because neither of the other two regex engines benchmarked has this requirement. It's doubly relevant because that means running a regex search doesn't require a UTF-8 validation step, and is therefore likely beneficial from a perf perspective, depending on the workload.
kayodelycaon 1 hour ago [-]
That’s a good point. I hadn’t considered it because I’ve hit the validation error long before getting to search. It is possible to avoid string operations with careful coding prior to the search.
Edit: After a little testing, the strings can be read from and written to files without triggering validation. Presumably this applies to sockets as well.
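A rough sketch of that kind of test, assuming the default external encoding is UTF-8 and no internal encoding is set (so no transcoding happens on read or write). The `round_trip` helper is hypothetical, and this only covers files, not sockets.

```ruby
require "tempfile"

# Hypothetical helper: writes raw bytes to a temp file, reads them back as a
# UTF-8-tagged string, writes that string to a second file, and returns both
# the string and the bytes that ended up in the second file.
def round_trip(bytes)
  Tempfile.create("haystack") do |src|
    src.binmode
    src.write(bytes)
    src.flush

    # Reading tags the string as UTF-8 but does not validate it.
    text = File.read(src.path, encoding: "UTF-8")

    Tempfile.create("copy") do |dst|
      # Writing does not validate either; the bytes pass through as-is.
      File.write(dst.path, text)
      [text, File.binread(dst.path)]
    end
  end
end

text, copied = round_trip("abc\xFFdef".b)
puts text.encoding          # UTF-8
puts text.valid_encoding?   # false
puts copied == "abc\xFFdef".b
```

Nothing raises here because no step inspects the characters. The error only shows up once an operation such as a regex match has to interpret the bytes as UTF-8.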
yxhuvud 4 hours ago [-]
Eww, pretending to support utf8 matchers while not supporting them at all was not pretty to see.
gitroom 3 hours ago [-]
Honestly that part bugs me, fake support is worse than no support imo