bug#43598: replace-in-string: finishing touches

Lars Ingebrigtsen
Lars Ingebrigtsen <[hidden email]> writes:

> That is, we could just say "the results are undefined if the strings
> contain raw bytes".  Well, rather, if both strings are raw bytes, or
> none of them are, then it's well-defined, but not otherwise.

Or...  OK, I've never actually looked at the strings this closely, I've
just used the various accessors which hide all the complexity.

So: "a\377ø" is a multibyte string with five bytes (the "raw byte" is in
the private plane).

"a\377a" is a unibyte string with three bytes.

So searching for "\377" (one-byte unibyte string) and (make-string 1
255) (two-byte multibyte string) should be well-defined in either
combination here?

"\377" is in both "a\377ø" and "a\377a".

(make-string 1 255) is in neither "a\377ø", nor "a\377a".

And:

(eq (elt (make-string 1 255) 0) (elt "\377" 0))
=> t

But, like, whatevs.
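
The byte counts and multibyteness claimed above can be checked directly with `multibyte-string-p' and `string-bytes'.  This is only a sketch for evaluating in *scratch* or ielm; the expected values in the comments are the ones stated in this message, and they assume the literal "ø" is read as a multibyte (UTF-8) character.

```elisp
;; "a\377ø": the literal ø forces a multibyte string, so \377 is
;; stored as a two-byte "raw byte" character.
(multibyte-string-p "a\377ø")          ; => t
(string-bytes "a\377ø")                ; => 5  (1 + 2 + 2)

;; "a\377a": only ASCII plus an octal escape below 256, so the
;; reader produces a unibyte string.
(multibyte-string-p "a\377a")          ; => nil
(string-bytes "a\377a")                ; => 3

;; The two candidate search strings.
(string-bytes "\377")                  ; => 1  (unibyte)
(multibyte-string-p (make-string 1 255)) ; => t
(string-bytes (make-string 1 255))     ; => 2  (the character ÿ)

;; The accessors hide the difference: both yield the integer 255.
(eq (elt (make-string 1 255) 0) (elt "\377" 0)) ; => t
```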

--
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no



Eli Zaretskii
> From: Lars Ingebrigtsen <[hidden email]>
> Date: Fri, 25 Sep 2020 01:18:13 +0200
> Cc: [hidden email]
>
> So: "a\377ø" is a multibyte string with five bytes (the "raw byte" is in
> the private plane).
>
> "a\377a" is a unibyte string with three bytes.
>
> So searching for "\377" (one-byte unibyte string) and (make-string 1
> 255) (two-byte multibyte string) should be well-defined in either
> combination here?
>
> "\377" is in both "a\377ø" and "a\377a".
>
> (make-string 1 255) is in neither "a\377ø", nor "a\377a".
>
> And:
>
> (eq (elt (make-string 1 255) 0) (elt "\377" 0))
> => t

Would it help to always convert the first argument of
replace-in-string to a multibyte string, before replacing?



Lars Ingebrigtsen
Eli Zaretskii <[hidden email]> writes:

> Would it help to always convert the first argument of
> replace-in-string to a multibyte string, before replacing?

Yes, but not when the third argument is a unibyte string.

I've now done the conversion in the new string-search C-level function,
converting the search string both ways, depending on what the HAYSTACK
string is.  I'm not 100% sure that I'm doing the right thing here, but
it seems to pass all the test cases I could come up with.  I
wrote it very late last night, though, so...  :-/
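
The idea of converting the search string to match the HAYSTACK can be sketched in Lisp as below.  This is not the C code in string-search; `my/normalize-needle' is a hypothetical helper written only for illustration, and returning nil when the needle cannot be represented as unibyte is an assumption of this sketch.

```elisp
(defun my/normalize-needle (needle haystack)
  "Return NEEDLE converted to the same representation as HAYSTACK.
Hypothetical helper, not part of Emacs.  Return nil if NEEDLE is
multibyte, HAYSTACK is unibyte, and NEEDLE contains characters
that have no unibyte form (so it cannot occur in HAYSTACK)."
  (cond
   ;; Same representation already: nothing to do.
   ((eq (multibyte-string-p needle) (multibyte-string-p haystack))
    needle)
   ;; Unibyte needle, multibyte haystack: bytes >= 128 become
   ;; raw-byte characters.
   ((multibyte-string-p haystack)
    (string-to-multibyte needle))
   ;; Multibyte needle, unibyte haystack: only ASCII and raw-byte
   ;; characters can be converted; anything else signals an error.
   (t
    (condition-case nil
        (string-to-unibyte needle)
      (error nil)))))
```

With this helper, "\377" becomes a two-byte raw-byte string when the haystack is "a\377ø" and stays unibyte for "a\377a", while (make-string 1 255) has no unibyte form and yields nil for the unibyte haystack.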

--
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no