bug#43598: replace-in-string: finishing touches

Lars Ingebrigtsen
Mattias Engdegård <[hidden email]> writes:

> The new replace-in-string function is welcome but needs a few tweaks
> before we can call it done:
> 1. It doesn't quite work correctly with raw bytes:
>   (replace-in-string "\377" "x" "a\377b")
>   => "axb"
>   (replace-in-string "\377" "x" "a\377ø")
>   => "a\377ø"
> The easiest solution is to reimplement it in terms of
> replace-regexp-in-string for now, and optimise it later (although I
> feel a bit bad undoing Lars's pretty handiwork...)

The point of the function is to have something very lightweight, so if
it's reimplemented on top of replace-regexp-in-string, there's not much
point to having the function at all.
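For comparison, here is a minimal sketch of the replace-regexp-in-string approach Mattias suggests. The name `my-replace-in-string` is hypothetical, and this is not the committed code. It quotes FROM-STRING with `regexp-quote`, binds `case-fold-search` to nil so matching stays case-sensitive, and passes the FIXEDCASE and LITERAL arguments so TO-STRING is inserted verbatim:

```elisp
;; Hypothetical name: a sketch of "reimplement it in terms of
;; replace-regexp-in-string", not the actual Emacs implementation.
(defun my-replace-in-string (from-string to-string in-string)
  "Replace every literal occurrence of FROM-STRING in IN-STRING with TO-STRING."
  ;; `string-match' (used internally) honours `case-fold-search',
  ;; which defaults to t, so bind it to nil for exact matching.
  (let ((case-fold-search nil))
    ;; `regexp-quote' makes FROM-STRING match literally; the trailing
    ;; t t (FIXEDCASE, LITERAL) disable case conversion and `\\&'
    ;; substitution in TO-STRING.
    (replace-regexp-in-string (regexp-quote from-string) to-string
                              in-string t t)))
```

Every call still goes through the regexp engine, which is exactly the overhead Lars wants to avoid.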

> We have messy semantics here, because string-equal does not equate
> "\377" and (string-to-multibyte "\377"), but string-match-p does...

Yes, I don't even know what the semantics should be.

(string-replace "\377" "x" "a\377ø")
=> "axø"

would make sense, but what about

(string-replace "\270" "x" "a\377ø")
=> ?

(\270 is the last byte of ø in UTF-8.)

Doing anything here wouldn't make much sense at all, which means... we
could just throw up our hands and say "don't do that, then", which is
approximately what string-equal does.
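The string-equal/string-match-p mismatch Mattias mentions can be checked with a small probe like the one below. The helper name is hypothetical. Going by the description in this thread, string-equal should return nil while string-match-p finds a match:

```elisp
;; Hypothetical helper: compares how string-equal and string-match-p
;; treat a unibyte "\377" versus its multibyte (raw-byte) counterpart.
(defun my-raw-byte-probe ()
  "Return a plist comparing unibyte and multibyte versions of byte 255."
  (let ((uni "\377")                           ; unibyte string, one byte
        (multi (string-to-multibyte "\377")))  ; multibyte raw-byte char
    (list :string-equal (string-equal uni multi)
          :string-match-p (string-match-p (regexp-quote uni) multi))))
```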

> 2. It is documented always to return a new string, but that's a tad
> over-generous nowadays; very few string functions do that. If we drop
> that guarantee, we get some optimisation opportunities:
> - it can return the input string itself if no matches were found (a
> fairly common case)
> - it can be marked pure, not just side-effect-free, so that the byte
> compiler can constant-propagate through calls to it

Yup, good idea.
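A minimal sketch of what dropping the fresh-string guarantee allows, assuming Emacs 28 or later for `string-search`. The name `my-string-replace` is hypothetical, and signalling an error on an empty FROM-STRING is this sketch's own choice. When nothing matches, it returns IN-STRING itself, and `(declare (pure t))` lets the byte compiler fold calls whose arguments are constants:

```elisp
;; Hypothetical name; assumes `string-search' (Emacs 28+).
(defun my-string-replace (from-string to-string in-string)
  "Replace FROM-STRING with TO-STRING in IN-STRING.
May return IN-STRING itself when there is no match."
  ;; Pure: the result depends only on the arguments, so calls with
  ;; constant arguments can be evaluated at compile time.
  (declare (pure t))
  (when (equal from-string "")
    (error "Empty FROM-STRING"))
  (let ((start 0)
        (pieces nil)
        pos)
    ;; Collect the text between matches plus TO-STRING, in reverse.
    (while (setq pos (string-search from-string in-string start))
      (push (substring in-string start pos) pieces)
      (push to-string pieces)
      (setq start (+ pos (length from-string))))
    (if (null pieces)
        in-string                       ; no match: no copy is made
      (push (substring in-string start) pieces)
      (apply #'concat (nreverse pieces)))))
```

The trade-off is that callers can no longer assume the result is a fresh copy, so destructively modifying it (for example with `aset`) might also modify the argument.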

> 3. The name is somewhat unfortunate since a function by that name in
> XEmacs uses regexp matching.
> In fact, the new function probably broke prolog-mode because of that
> (see prolog-replace-in-string).
> While we can fix prolog-mode, we can't easily fix code outside the
> Emacs tree that may have similar problems.
> Perhaps we should rename it to string-replace, in line with the modern
> naming convention discussed some time ago.

string-replace seems like a good name.
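The prolog-mode breakage follows a common compatibility-shim pattern. The sketch below is a hypothetical approximation, not the actual prolog-replace-in-string. It tests `fboundp` to detect XEmacs, whose `replace-in-string` takes (STR REGEXP NEWTEXT &optional LITERAL). An Emacs `replace-in-string` that matches literally and takes (FROM TO IN) also passes that test, so the wrong function gets called with its arguments in the wrong order:

```elisp
;; Hypothetical helper approximating an XEmacs-compat shim.
(defun my-compat-replace (str regexp newtext)
  "Replace matches of REGEXP in STR with NEWTEXT."
  ;; The `fboundp' check was meant to detect XEmacs.  Any Emacs
  ;; function named `replace-in-string' would also satisfy it, even
  ;; one with literal matching and a different argument order.
  (if (fboundp 'replace-in-string)
      (replace-in-string str regexp newtext)
    (replace-regexp-in-string regexp newtext str)))
```

With the new function named string-replace instead, the `fboundp` check stays false in Emacs and code like this keeps taking the replace-regexp-in-string branch.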

(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no