Oddities with dynamic modules

Eli Zaretskii
Having written the documentation of the module API, I couldn't help
but notice a few oddities about its repertory.  I list below the
issues that caused me to raise a brow, for the record:

 . Why do we have functions to access vectors, but not functions to
   access lists?  I always thought lists were more important for Emacs
   than vectors.  If we are asking users to use 'intern' to access
   'car' and 'cdr', why not 'aref' and 'aset'?

 . Why aren't there API functions to _create_ lists and vectors?

 . Using 'funcall' is unnecessarily cumbersome, because the function
   to be called is passed as an 'emacs_value'.  Why don't we have a
   variant that just accepts a name of a Lisp-callable function as a C
   string?

 . Why does 'intern' only handle pure ASCII symbol names?  It's not
   like supporting non-ASCII names is hard.

 . I could understand why equality predicates are not provided in the
   API, but I don't understand why we do provide 'eq' there.  Is it
   that much more important than the other predicates?

IOW, if the API was supposed to be minimal, it looks like it isn't;
and if it wasn't supposed to be minimal, then some important/popular
functions are strangely missing, for reasons I couldn't wrap my head
around.

Thanks.

Re: Oddities with dynamic modules

Kaushal Modi
On Thu, Oct 11, 2018 at 2:14 PM Eli Zaretskii <[hidden email]> wrote:
>
> Having written the documentation of the module API,

Thanks for writing up all that documentation!
https://git.savannah.gnu.org/cgit/emacs.git/commit/?h=emacs-26&id=ce8b4584a3c69e5c4abad8a0a9c3781ce8c0c1f8

I am not in a position to comment on most of your questions, as I am
using the Modules feature to get around my lack of C knowledge :)

>  . Why aren't there API functions to _create_ lists and vectors?
>
>  . Using 'funcall' is unnecessarily cumbersome, because the function
>    to be called is passed as an 'emacs_value'.  Why don't we have a
>    variant that just accepts a name of a Lisp-callable function as a C
>    string?

+1

I needed to create some syntactic sugar in Nim (which compiles to C) to
get around that limitation:

proc MakeList*(env: ptr emacs_env;
               listArray: openArray[emacs_value]): emacs_value =
  ## Return an Emacs-Lisp ``list``.
  Funcall(env, "list", listArray)


proc MakeCons*(env: ptr emacs_env; consCar, consCdr: emacs_value): emacs_value =
  ## Return an Emacs-Lisp ``cons``.
  Funcall(env, "cons", [consCar, consCdr])

It would be nice to have an API for list (and cons).

>  . Why does 'intern' only handle pure ASCII symbol names?  It's not
>    like supporting non-ASCII names is hard.
>
>  . I could understand why equality predicates are not provided in the
>    API, but I don't understand why we do provide 'eq' there.  Is it
>    that much more important than the other predicates?

I had the same question too. I find equal more useful than eq.

Re: Oddities with dynamic modules

Philipp Stephani
In reply to this post by Eli Zaretskii
On Thu, 11 Oct 2018 at 20:13, Eli Zaretskii <[hidden email]> wrote:

>
> Having written the documentation of the module API, I couldn't help
> but notice a few oddities about its repertory.  I list below the
> issues that caused me to raise a brow, for the record:
>
>  . Why do we have functions to access vectors, but not functions to
>    access lists?  I always thought lists were more important for Emacs
>    than vectors.  If we are asking users to use 'intern' to access
>    'car' and 'cdr', why not 'aref' and 'aset'?
>
>  . Why aren't there API functions to _create_ lists and vectors?

I guess these are mostly historical. These were introduced in
https://github.com/aaptel/emacs-dynamic-module/commit/016e8b6ffdfb861806957bb84c419a3d65caedb7,
but I don't remember the background.

>
>  . Using 'funcall' is unnecessarily cumbersome, because the function
>    to be called is passed as an 'emacs_value'.  Why don't we have a
>    variant that just accepts a name of a Lisp-callable function as a C
>    string?

Convenience is not a design goal of the module API. The primary design
goals are robustness, stability, simplicity, and minimalism.

>
>  . Why does 'intern' only handle pure ASCII symbol names?  It's not
>    like supporting non-ASCII names is hard.

Unfortunately it is, due to Emacs underspecifying encoding. If we can
manage to write an 'intern' function that accepts UTF-8 strings and
only UTF-8 strings, I'm all for it.
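
A minimal sketch of the check that "UTF-8 strings and only UTF-8 strings" implies, for illustration. The helper name `is_valid_utf8` is hypothetical and is not part of emacs-module.h; it only tests well-formedness (no overlong forms, no surrogates, nothing above U+10FFFF) and says nothing about how Emacs would represent the resulting symbol.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of emacs-module.h: return true if the
   LEN bytes at S are well-formed UTF-8.  */
bool
is_valid_utf8 (const unsigned char *s, size_t len)
{
  assert (s != NULL || len == 0);
  size_t i = 0;
  while (i < len)
    {
      unsigned char b = s[i];
      size_t extra;       /* continuation bytes expected after B */
      unsigned long cp;   /* code point being decoded */
      unsigned long min;  /* smallest code point allowed at this length */

      if (b < 0x80)
        {
          i++;            /* plain ASCII byte */
          continue;
        }
      else if ((b & 0xE0) == 0xC0)
        { extra = 1; cp = b & 0x1F; min = 0x80; }
      else if ((b & 0xF0) == 0xE0)
        { extra = 2; cp = b & 0x0F; min = 0x800; }
      else if ((b & 0xF8) == 0xF0)
        { extra = 3; cp = b & 0x07; min = 0x10000; }
      else
        return false;     /* stray continuation byte or invalid lead byte */

      if (len - i - 1 < extra)
        return false;     /* sequence is cut off at the end of the input */
      for (size_t k = 1; k <= extra; k++)
        {
          unsigned char c = s[i + k];
          if ((c & 0xC0) != 0x80)
            return false; /* not a continuation byte */
          cp = (cp << 6) | (c & 0x3F);
        }
      /* Reject overlong encodings, surrogates, and out-of-range values.  */
      if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return false;
      i += extra + 1;
    }
  return true;
}
```

A wrapper could run a check like this before passing a name on, and signal an error for anything that fails it.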

>
>  . I could understand why equality predicates are not provided in the
>    API, but I don't understand why we do provide 'eq' there.  Is it
>    that much more important than the other predicates?

Yes, it represents a fundamental property of objects.

>
> IOW, if the API was supposed to be minimal, it looks like it isn't;
> and if it wasn't supposed to be minimal, then some important/popular
> functions are strangely missing, for reasons I couldn't wrap my head
> around.

It is *mostly* minimal. A *completely* minimal API would not even have
integer and floating-point conversion functions, as those can be
written using the string functions. But that would be far less simple
and robust.
"eq" and "is_not_nil" are special in that they implement access to
fundamental object properties and can't fail, so they are fundamental
enough to deserve an entry in the module table.
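
To make the point about integer conversion concrete, here is a rough sketch of what module code would have to do if integers could only cross the boundary as decimal text. Both function names are hypothetical and standalone; they do not call the module API, and they only illustrate the extra failure modes (buffer too small, trailing garbage, overflow) that direct integer conversion avoids.

```c
#include <assert.h>
#include <errno.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical: write VALUE as decimal text into BUF of size BUFSIZE.
   Returns false if the text does not fit.  */
bool
int_to_text (intmax_t value, char *buf, size_t bufsize)
{
  assert (buf != NULL);
  int n = snprintf (buf, bufsize, "%jd", value);
  return n >= 0 && (size_t) n < bufsize;
}

/* Hypothetical: parse decimal TEXT into *RESULT.  Returns false on
   empty input, trailing garbage, or a value outside intmax_t.  */
bool
text_to_int (const char *text, intmax_t *result)
{
  assert (text != NULL && result != NULL);
  char *end;
  errno = 0;
  intmax_t v = strtoimax (text, &end, 10);
  if (end == text || *end != '\0' || errno == ERANGE)
    return false;
  *result = v;
  return true;
}
```

Every caller would need to handle each of these `false` returns, which is the robustness cost referred to above.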

The best source to answer the "why" questions is still the original
design document:
https://lists.gnu.org/archive/html/emacs-devel/2015-02/msg00960.html

Re: Oddities with dynamic modules

Eli Zaretskii
> From: Philipp Stephani <[hidden email]>
> Date: Sun, 10 Feb 2019 21:23:18 +0100
> Cc: Emacs developers <[hidden email]>
>
> >  . Using 'funcall' is unnecessarily cumbersome, because the function
> >    to be called is passed as an 'emacs_value'.  Why don't we have a
> >    variant that just accepts a name of a Lisp-callable function as a C
> >    string?
>
> Convenience is not a design goal of the module API. The primary design
> goals are robustness, stability, simplicity, and minimalism.

I thought simplicity and convenience of use trumps simplicity of the
implementation, so it is strange to read arguments to the contrary.
IME, inconvenient interfaces are the main reason for them being
unstable, but that's me.

> >  . Why does 'intern' only handle pure ASCII symbol names?  It's not
> >    like supporting non-ASCII names is hard.
>
> Unfortunately it is, due to Emacs underspecifying encoding. If we can
> manage to write an 'intern' function that accepts UTF-8 strings and
> only UTF-8 strings, I'm all for it.

What are the problems you have in mind?  After all, this is already
possible by means of 2 more function calls, as the example in the
manual shows.  Are there any problems to do the same under the hood,
instead of requiring users to do that explicitly in their module code?

> >  . I could understand why equality predicates are not provided in the
> >    API, but I don't understand why we do provide 'eq' there.  Is it
> >    that much more important than the other predicates?
>
> Yes, it represents a fundamental property of objects.

How is that relevant?  Equality predicates are used very frequently
when dealing with Lisp objects; 'eq' is not different from others in
that respect.

> > IOW, if the API was supposed to be minimal, it looks like it isn't;
> > and if it wasn't supposed to be minimal, then some important/popular
> > functions are strangely missing, for reasons I couldn't wrap my head
> > around.
>
> It is *mostly* minimal. A *completely* minimal API would not even have
> integer and floating-point conversion functions, as those can be
> written using the string functions. But that would be far less simple
> and robust.
> "eq" and "is_not_nil" are special in that they implement access to
> fundamental object properties and can't fail, so they are fundamental
> enough to deserve an entry in the module table.

I cannot follow this reasoning, sorry.  It sounds like you are saying
that the decision what to implement and what not justifies itself
because it's there.  All I can say is that as someone who wrote a
couple of lines of code in Emacs, the stuff that is in the API and the
omissions look quite arbitrary to me.

> The best source to answer the "why" questions is still the original
> design document:
> https://lists.gnu.org/archive/html/emacs-devel/2015-02/msg00960.html

Which part(s) of that long document answer these questions, please?

Re: Oddities with dynamic modules

Yuri Khan
On Mon, Feb 11, 2019 at 10:46 PM Eli Zaretskii <[hidden email]> wrote:

> > Convenience is not a design goal of the module API. The primary design
> > goals are robustness, stability, simplicity, and minimalism.
>
> I thought simplicity and convenience of use trumps simplicity of the
> implementation, so it is strange to read arguments to the contrary.
> IME, inconvenient interfaces are the main reason for them being
> unstable, but that's me.

A good middle ground is a minimalistic (but sufficient) API at the
host side, plus idiomatic convenience wrapper libraries for each
client language. (The latter need not be maintained by the host API
maintainer.)

Re: Oddities with dynamic modules

Philipp Stephani
On Mon, 11 Feb 2019 at 17:04, Yuri Khan <[hidden email]> wrote:

>
> On Mon, Feb 11, 2019 at 10:46 PM Eli Zaretskii <[hidden email]> wrote:
>
> > > Convenience is not a design goal of the module API. The primary design
> > > goals are robustness, stability, simplicity, and minimalism.
> >
> > I thought simplicity and convenience of use trumps simplicity of the
> > implementation, so it is strange to read arguments to the contrary.
> > IME, inconvenient interfaces are the main reason for them being
> > unstable, but that's me.
>
> A good middle ground is a minimalistic (but sufficient) API at the
> host side, plus idiomatic convenience wrapper libraries for each
> client language. (The latter need not be maintained by the host API
> maintainer.)

Yes, that's exactly what Daniel suggested in his initial design. It's
also what largely seems to be happening: people have written idiomatic
wrappers for Rust, Go, and probably other languages. Thus I'd say this
is working as intended.

Re: Oddities with dynamic modules

Philipp Stephani
In reply to this post by Eli Zaretskii
On Mon, 11 Feb 2019 at 16:46, Eli Zaretskii <[hidden email]> wrote:

>
> > From: Philipp Stephani <[hidden email]>
> > Date: Sun, 10 Feb 2019 21:23:18 +0100
> > Cc: Emacs developers <[hidden email]>
> >
> > >  . Using 'funcall' is unnecessarily cumbersome, because the function
> > >    to be called is passed as an 'emacs_value'.  Why don't we have a
> > >    variant that just accepts a name of a Lisp-callable function as a C
> > >    string?
> >
> > Convenience is not a design goal of the module API. The primary design
> > goals are robustness, stability, simplicity, and minimalism.
>
> I thought simplicity and convenience of use trumps simplicity of the
> implementation, so it is strange to read arguments to the contrary.

Simplicity tends to be the opposite of convenience. Interface
simplicity definitely trumps convenience and implementation
simplicity.

>
> > >  . Why does 'intern' only handle pure ASCII symbol names?  It's not
> > >    like supporting non-ASCII names is hard.
> >
> > Unfortunately it is, due to Emacs underspecifying encoding. If we can
> > manage to write an 'intern' function that accepts UTF-8 strings and
> > only UTF-8 strings, I'm all for it.
>
> What are the problems you have in mind?  After all, this is already
> possible by means of 2 more function calls, as the example in the
> manual shows.  Are there any problems to do the same under the hood,
> instead of requiring users to do that explicitly in their module code?

If users want to have a truly generic "intern" function, they need to
do some legwork anyway because the signature of the "intern"
environment function doesn't allow embedded null bytes. It's also
unclear how non-Unicode symbols should be represented (if at all).
Given that almost all uses of "intern" will use ASCII symbols, it's
fine to restrict the API in this way. I've provided an example for a
wrapper function that allows at least interning of arbitrary Unicode
strings: https://phst.eu/emacs-modules#intern.
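
As a rough illustration of the "legwork" involved, a wrapper first has to decide whether a name can go to the plain `intern` entry point at all. The helper below is hypothetical (it is not taken from the linked page or from emacs-module.h) and assumes the name is given as a pointer plus length, so that embedded null bytes can be detected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: return true if the LEN bytes at NAME are pure
   ASCII with no embedded null byte, i.e. the name can be handed to an
   ASCII-only, null-terminated `intern' unchanged.  */
bool
fits_plain_intern (const char *name, size_t len)
{
  assert (name != NULL || len == 0);
  for (size_t i = 0; i < len; i++)
    {
      unsigned char c = (unsigned char) name[i];
      if (c == 0 || c >= 0x80)
        return false;   /* embedded null or non-ASCII byte */
    }
  return true;
}
```

Names for which this returns false would need the longer route mentioned earlier in the thread: build a Lisp string first, then call the Lisp-level `intern` on it through `funcall`.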

>
> > >  . I could understand why equality predicates are not provided in the
> > >    API, but I don't understand why we do provide 'eq' there.  Is it
> > >    that much more important than the other predicates?
> >
> > Yes, it represents a fundamental property of objects.
>
> How is that relevant?  Equality predicates are used very frequently
> when dealing with Lisp objects; 'eq' is not different from others in
> that respect.

I don't recollect the reasoning, but Daniel stated that "eq" is
strictly necessary, so you might want to ask him.

>
> > > IOW, if the API was supposed to be minimal, it looks like it isn't;
> > > and if it wasn't supposed to be minimal, then some important/popular
> > > functions are strangely missing, for reasons I couldn't wrap my head
> > > around.
> >
> > It is *mostly* minimal. A *completely* minimal API would not even have
> > integer and floating-point conversion functions, as those can be
> > written using the string functions. But that would be far less simple
> > and robust.
> > "eq" and "is_not_nil" are special in that they implement access to
> > fundamental object properties and can't fail, so they are fundamental
> > enough to deserve an entry in the module table.
>
> I cannot follow this reasoning, sorry.  It sounds like you are saying
> that the decision what to implement and what not justifies itself
> because it's there.  All I can say is that as someone who wrote a
> couple of lines of code in Emacs, the stuff that is in the API and the
> omissions look quite arbitrary to me.

Please see Daniel's original reasoning for the design. I'm not
claiming perfection for the module API, but it definitely strikes a
very good balance between minimalism and robustness.

>
> > The best source to answer the "why" questions is still the original
> > design document:
> > https://lists.gnu.org/archive/html/emacs-devel/2015-02/msg00960.html
>
> Which part(s) of that long document answer these questions, please?

Mostly the first sentence "We want an ABI powerful enough to let C
modules interact with Emacs, but decoupled enough to let the Emacs
core evolve independently."

Re: Oddities with dynamic modules

Eli Zaretskii
In reply to this post by Philipp Stephani
> From: Philipp Stephani <[hidden email]>
> Date: Thu, 21 Mar 2019 21:04:07 +0100
> Cc: Eli Zaretskii <[hidden email]>, Emacs developers <[hidden email]>
>
> > > I thought simplicity and convenience of use trumps simplicity of the
> > > implementation, so it is strange to read arguments to the contrary.
> > > IME, inconvenient interfaces are the main reason for them being
> > > unstable, but that's me.
> >
> > A good middle ground is a minimalistic (but sufficient) API at the
> > host side, plus idiomatic convenience wrapper libraries for each
> > client language. (The latter need not be maintained by the host API
> > maintainer.)
>
> Yes, that's exactly what Daniel suggested in his initial design. It's
> also what largely seems to be happening: people have written idiomatic
> wrappers for Rust, Go, and probably other languages.

Where's the wrapper for C?

> Thus I'd say this is working as intended.

I still find it strange (and can already hardly remember what I wrote
several months ago, so perhaps try to respond sooner next time?).

Re: Oddities with dynamic modules

Eli Zaretskii
In reply to this post by Philipp Stephani
> From: Philipp Stephani <[hidden email]>
> Date: Thu, 21 Mar 2019 21:12:05 +0100
> Cc: Emacs developers <[hidden email]>
>
> > How is that relevant?  Equality predicates are used very frequently
> > when dealing with Lisp objects; 'eq' is not different from others in
> > that respect.
>
> I don't recollect the reasoning, but Daniel stated that "eq" is
> strictly necessary, so you might want to ask him.
> [...]
> > > It is *mostly* minimal. A *completely* minimal API would not even have
> > > integer and floating-point conversion functions, as those can be
> > > written using the string functions. But that would be far less simple
> > > and robust.
> > > "eq" and "is_not_nil" are special in that they implement access to
> > > fundamental object properties and can't fail, so they are fundamental
> > > enough to deserve an entry in the module table.
> >
> > I cannot follow this reasoning, sorry.  It sounds like you are saying
> > that the decision what to implement and what not justifies itself
> > because it's there.  All I can say is that as someone who wrote a
> > couple of lines of code in Emacs, the stuff that is in the API and the
> > omissions look quite arbitrary to me.
>
> Please see Daniel's original reasoning for the design.

I asked the questions after reading that, so please believe me that I
didn't find answers to my questions there.

> Mostly the first sentence "We want an ABI powerful enough to let C
> modules interact with Emacs, but decoupled enough to let the Emacs
> core evolve independently."

That means a judgment call, and I was questioning the judgment.
Saying that someone made a call doesn't explain why the decision was
what it was.


Re: Oddities with dynamic modules

Philipp Stephani
In reply to this post by Eli Zaretskii
On Thu, 21 Mar 2019 at 21:17, Eli Zaretskii <[hidden email]> wrote:

>
> > From: Philipp Stephani <[hidden email]>
> > Date: Thu, 21 Mar 2019 21:04:07 +0100
> > Cc: Eli Zaretskii <[hidden email]>, Emacs developers <[hidden email]>
> >
> > > > I thought simplicity and convenience of use trumps simplicity of the
> > > > implementation, so it is strange to read arguments to the contrary.
> > > > IME, inconvenient interfaces are the main reason for them being
> > > > unstable, but that's me.
> > >
> > > A good middle ground is a minimalistic (but sufficient) API at the
> > > host side, plus idiomatic convenience wrapper libraries for each
> > > client language. (The latter need not be maintained by the host API
> > > maintainer.)
> >
> > Yes, that's exactly what Daniel suggested in his initial design. It's
> > also what largely seems to be happening: people have written idiomatic
> > wrappers for Rust, Go, and probably other languages.
>
> Where's the wrapper for C?

It seems like there isn't enough interest in modules written in C for
such a wrapper. Or there is a wrapper and I just don't know about it.
The module API is usable as-is, and writing and maintaining a wrapper
is nontrivial work. For other languages wrappers are necessary, but C
users can use the plain API directly, and that's often a reasonable
choice. For example, the FFI module
(https://github.com/tromey/emacs-ffi/blob/master/ffi-module.c) is
complex enough that a wrapper library probably wouldn't make it much
shorter.

Re: Oddities with dynamic modules

Philipp Stephani
In reply to this post by Eli Zaretskii
On Thu, 21 Mar 2019 at 21:25, Eli Zaretskii <[hidden email]> wrote:

>
> > From: Philipp Stephani <[hidden email]>
> > Mostly the first sentence "We want an ABI powerful enough to let C
> > modules interact with Emacs, but decoupled enough to let the Emacs
> > core evolve independently."
>
> That means a judgment call, and I was questioning the judgment.
> Saying that someone made a call doesn't explain why the decision was
> what it was.
>

Well, feel free to CC Daniel and ask him directly. I agree with his
judgment call, but can't know what he might have been thinking
exactly.
TBH I haven't spent that much time on trying to figure out the exact
reasoning because potentially superfluous functions can't be removed
any more, so this question seems to be of largely historical interest.

Re: Oddities with dynamic modules

Eli Zaretskii
In reply to this post by Philipp Stephani
> From: Philipp Stephani <[hidden email]>
> Date: Thu, 21 Mar 2019 21:32:07 +0100
> Cc: Yuri Khan <[hidden email]>, Emacs developers <[hidden email]>
>
> > > Yes, that's exactly what Daniel suggested in his initial design. It's
> > > also what largely seems to be happening: people have written idiomatic
> > > wrappers for Rust, Go, and probably other languages.
> >
> > Where's the wrapper for C?
>
> It seems like there isn't enough interest in modules written in C for
> such a wrapper.

I think if we maintain that this is the job for a wrapper, we should
have such a wrapper in Emacs.

Re: Oddities with dynamic modules

Eli Zaretskii
In reply to this post by Philipp Stephani
> From: Philipp Stephani <[hidden email]>
> Date: Thu, 21 Mar 2019 21:34:39 +0100
> Cc: Emacs developers <[hidden email]>
>
> potentially superfluous functions can't be removed any more, so this
> question seems to be of largely historical interest.

No, it's not only of historical interest, because we can add
functions.

Re: Oddities with dynamic modules

Philipp Stephani
In reply to this post by Eli Zaretskii
On Thu, 21 Mar 2019 at 21:46, Eli Zaretskii <[hidden email]> wrote:

>
> > From: Philipp Stephani <[hidden email]>
> > Date: Thu, 21 Mar 2019 21:32:07 +0100
> > Cc: Yuri Khan <[hidden email]>, Emacs developers <[hidden email]>
> >
> > > > Yes, that's exactly what Daniel suggested in his initial design. It's
> > > > also what largely seems to be happening: people have written idiomatic
> > > > wrappers for Rust, Go, and probably other languages.
> > >
> > > Where's the wrapper for C?
> >
> > It seems like there isn't enough interest in modules written in C for
> > such a wrapper.
>
> I think if we maintain that this is the job for a wrapper, we should
> have such a wrapper in Emacs.

I certainly wouldn't mind having such a wrapper in Emacs core.

Re: Oddities with dynamic modules

Philipp Stephani
In reply to this post by Eli Zaretskii
On Thu, 21 Mar 2019 at 21:51, Eli Zaretskii <[hidden email]> wrote:

>
> > From: Philipp Stephani <[hidden email]>
> > Date: Thu, 21 Mar 2019 21:34:39 +0100
> > Cc: Emacs developers <[hidden email]>
> >
> > potentially superfluous functions can't be removed any more, so this
> > question seems to be of largely historical interest.
>
> No, it's not only of historical interest, because we can add
> functions.

That's true, and I agree for those we should find some clearer criteria
than "best judgment" or vague philosophical principles like
"simplicity."
Each addition should be discussed separately. Possible criteria could be:
1. Is it possible to obtain the functionality by calling existing
functions? ("completeness")
2. Is it very difficult to replicate the functionality with the
existing API, and the difficulty would be reduced significantly by
introducing a new function? ("simplicity")
3. Is there a huge performance benefit in introducing a specialized function?
If none of these are fulfilled, then the function should probably not
be added ("simplicity").
For example, I'd vote for adding timespec and bignum conversion
functions based on (2).

creating unibyte strings (was: Oddities with dynamic modules)

Stefan Monnier
>> > potentially superfluous functions can't be removed any more, so this
>> > question seems to be of largely historical interest.
>> No, it's not only of historical interest, because we can add
>> functions.

Which reminds me: could someone add to the module API a primitive to
build a *unibyte* string?  Currently it seems the only way to do that is
by building a multibyte string and then encoding it with utf-8, which is
both inefficient and risky (I know it's supposed to return exactly the
originally intended bytes, but it's a very roundabout way to do it
with lots of opportunity for bugs along the way).


        Stefan


Re: creating unibyte strings (was: Oddities with dynamic modules)

Eli Zaretskii
> From: Stefan Monnier <[hidden email]>
> Date: Thu, 21 Mar 2019 21:26:32 -0400
>
> Which reminds me: could someone add to the module API a primitive to
> build a *unibyte* string?

I don't like adding such a primitive.  We don't want to proliferate
unibyte strings in Emacs through that back door, because manipulating
unibyte strings involves subtle issues many Lisp programmers are not
aware of.

Instead, how about doing that via vectors of byte values?  If there
are Emacs primitives that currently only accept strings, we could
extend them.

Re: Oddities with dynamic modules

Eli Zaretskii
In reply to this post by Philipp Stephani
> From: Philipp Stephani <[hidden email]>
> Date: Thu, 21 Mar 2019 21:58:11 +0100
> Cc: Emacs developers <[hidden email]>
>
> > No, it's not only of historical interest, because we can add
> > functions.
>
> That's true, and I agree for those we should find some clearer criteria
> than "best judgment" or vague philosophical principles like
> "simplicity."
> [...]
> For example, I'd vote for adding timespec and bignum conversion
> functions based on (2).

I agree with the last proposal.

But I also think that we need a lot more convenience wrappers even for
existing functionalities.  Any non-trivial module whose code I ever
saw is clear evidence of that, as they all introduce their own
wrappers for practically the same purposes.

Re: creating unibyte strings

Stefan Monnier
In reply to this post by Eli Zaretskii
>> Which reminds me: could someone add to the module API a primitive to
>> build a *unibyte* string?
> I don't like adding such a primitive.  We don't want to proliferate
> unibyte strings in Emacs through that back door, because manipulating
> unibyte strings involves subtle issues many Lisp programmers are not
> aware of.

I don't see what's subtle about "unibyte" strings, as long as you
understand that these are strings of *bytes* instead of strings
of *characters* (i.e. they're `int8[]` rather than `wchar_t[]`).

"Multibyte" strings are just as subtle (maybe more so even), yet we
rightly don't hesitate to offer a primitive way to construct them.

> Instead, how about doing that via vectors of byte values?

What's the advantage?  That seems even more convoluted: create a Lisp
vector of the right size (i.e. 8x the size of your string on a 64bit
system), loop over your string turning each byte into a Lisp integer
(with the reverted API, this involves allocation of an `emacs_value`
box), then pass that to `concat`?

It's probably going to be even less efficient than going through utf-8
and back.
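
The size estimate above can be written out as a small calculation. This is only a sketch under a stated assumption: that each vector slot holds one machine word (taken here as `sizeof (void *)`), with object headers and any per-value boxing ignored. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical estimate: bytes of slot storage needed to hold NBYTES
   byte values in a vector with one machine word per slot.  Headers
   and per-element boxing are not counted.  */
size_t
byte_vector_payload (size_t nbytes)
{
  assert (nbytes <= SIZE_MAX / sizeof (void *));  /* avoid overflow */
  return nbytes * sizeof (void *);
}
```

On a system with 8-byte pointers, a 1 MiB byte string would need 8 MiB of vector slots under this assumption, before counting the per-element conversion work.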

Think about cases where the module receives byte strings from the disk
or the network and needs to pass that to `decode-coding-string`.
And consider that we might be talking about megabytes of strings.


        Stefan

Re: creating unibyte strings

Eli Zaretskii
> From: Stefan Monnier <[hidden email]>
> Cc: [hidden email]
> Date: Fri, 22 Mar 2019 08:33:02 -0400
>
> >> Which reminds me: could someone add to the module API a primitive to
> >> build a *unibyte* string?
> > I don't like adding such a primitive.  We don't want to proliferate
> > unibyte strings in Emacs through that back door, because manipulating
> > unibyte strings involves subtle issues many Lisp programmers are not
> > aware of.
>
> I don't see what's subtle about "unibyte" strings, as long as you
> understand that these are strings of *bytes* instead of strings
> of *characters* (i.e. they're `int8[]` rather than `wchar_t[]`).

That's the subtlety, right there.  Handling such "strings" in Emacs
Lisp can produce strange and unexpected results for someone who is not
aware of the difference and its implications.

> "Multibyte" strings are just as subtle (maybe more so even), yet we
> rightly don't hesitate to offer a primitive way to construct them.

Because we succeed in hiding the subtleties in that case, so the
multibyte nature is not really visible on the Lisp level, unless you
try very hard to make it so.

> > Instead, how about doing that via vectors of byte values?
>
> What's the advantage?  That seems even more convoluted: create a Lisp
> vector of the right size (i.e. 8x the size of your string on a 64bit
> system), loop over your string turning each byte into a Lisp integer
> (with the reverted API, this involves allocation of an `emacs_value`
> box), then pass that to `concat`?

That's one way, but I'm sure I can come up with a simpler one. ;-)

> It's probably going to be even less efficient than going through utf-8
> and back.

I doubt that.  It's just an assignment.  And it's a rare situation
anyway.

> Think about cases where the module receives byte strings from the disk
> or the network and needs to pass that to `decode-coding-string`.
> And consider that we might be talking about megabytes of strings.

They don't need to decode, they just need to arrange for it to be
UTF-8.
