bug#33653: 27.0.50; Change Gnus obarrays-as-hash-tables into real hash tables

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

bug#33653: 27.0.50; Change Gnus obarrays-as-hash-tables into real hash tables

Eric Abrahamsen-2

Here's the next thing: turning Gnus' obarrays-as-hash-tables into real
hash tables. Gnus currently stores information about groups by coercing
group names to unibyte, interning them in custom obarrays, and then
setting their symbol-value to whatever value needs to be stored. I think
all this was written before Emacs had actual hash tables.

I think real hash tables are better for a few reasons:

1. Hash table lookups seem to be marginally faster than obarray lookups,
2. It's "less weird" for contributors and bug hunters,
3. It allows us to reduces the amount of encoding/decoding going on:
group names can stay strings instead of being forced into symbols.

I've pushed a branch, scratch/gnus-hashtables, with two commits in it.
The first changes all the obarrays to hash tables. Apart from simply
replacing function calls, there were a few bumps.

1. Gnus uses `text-property-any' to find and compare group-name text
   properties; I made a new `gnus-text-property-search' which behaves
   similarly, but compares with equal.
2. Some of the "hash tables" were simply storing the value t, ie just
   used for membership. I left these as hash tables, but I think in many
   cases simple string-in-list membership tests would suffice.
3. The hash table in gnus-async.el didn't appear to be doing anything --
   it seemed to be redundant with `gnus-async-article-alist'. I've
   removed it, but it might need to be put back if I'm missing something.
4. The creation of `gnus-newsgroup-dependencies' was the most fiddly,
   and I added tests for that. I'm not entirely convinced that
   `gnus-thread-loop-p' behaves the way it's meant to, it appears to
   only check for direct loops between parent and child, not parent and
   descendant.
5. The old return value of (gnus-gethash <group-name>
   gnus-newsrc-hashtb) was kind of a slice into `gnus-newsrc-alist': it
   behaved a bit like `member', but also included the group *before* the
   group you wanted, as well as all those after, so you could traverse
   the list in either direction. It now no longer preserves the order of
   `gnus-newsrc-alist' (this ordering wasn't actually used in many
   places), and instead there's a new variable `gnus-group-list' which
   records the proper sort order of the groups.

I feel fairly confident that all this is working okay. The second commit
I do *not* feel very confident about, and it's more of a "let's see how
things break" attempt. In theory, we should now be able to limit group
name encoding/decoding to the boundaries of Gnus -- reading/writing
active files, and talking to servers. Within Gnus, the group names can
remain decoded strings.

In the second commit I've just gone in and pulled out all the decoding I
can find, changed all the 'raw-text encoding options to 'undecided, and
deleted the `mm-disable-multibyte's. I have no confidence that I've
covered all the bases, though I have been using this branch for a couple
of weeks, with nnml, nnmaildir, and nnimap groups with multibyte names,
and so far nothing has broken. I'm less confident about nntp.

I'll continue working on this, and I hope I can get some feedback here,
particularly on the second commit.

Thanks, Eric



Reply | Threaded
Open this post in threaded view
|

bug#33653: 27.0.50; Change Gnus obarrays-as-hash-tables into real hash tables

Eric Abrahamsen-2
Eric Abrahamsen <[hidden email]> writes:

> Here's the next thing: turning Gnus' obarrays-as-hash-tables into real
> hash tables. Gnus currently stores information about groups by coercing
> group names to unibyte, interning them in custom obarrays, and then
> setting their symbol-value to whatever value needs to be stored. I think
> all this was written before Emacs had actual hash tables.

Also hounding Lars for his opinion...




Reply | Threaded
Open this post in threaded view
|

bug#33653: 27.0.50; Change Gnus obarrays-as-hash-tables into real hash tables

Eric Abrahamsen-2

On 12/11/18 20:23 PM, Lars Ingebrigtsen wrote:

> Eric Abrahamsen <[hidden email]> writes:
>
>>> Here's the next thing: turning Gnus' obarrays-as-hash-tables into real
>>> hash tables. Gnus currently stores information about groups by coercing
>>> group names to unibyte, interning them in custom obarrays, and then
>>> setting their symbol-value to whatever value needs to be stored. I think
>>> all this was written before Emacs had actual hash tables.
>>
>> Also hounding Lars for his opinion...
>
> Using real hash tables sounds nice, if there's no negative performance
> impact.

Cool! In fact it seems to be a hair faster.

If you have a moment for this, would you look at the second commit in
the scratch/gnus-hashtables branch? The first commit seems fairly
stable, but the second has to do with encoding -- if group names never
need to be unibyte, it should be possible to leave them decoded inside
Gnus itself.

Besides active files and server communication, are there any other
places where group names would need to be encoded?

Thanks,
Eric



Reply | Threaded
Open this post in threaded view
|

bug#33653: 27.0.50; Change Gnus obarrays-as-hash-tables into real hash tables

Eric Abrahamsen-2
In reply to this post by Eric Abrahamsen-2

On 12/11/18 20:23 PM, Lars Ingebrigtsen wrote:

> Eric Abrahamsen <[hidden email]> writes:
>
>>> Here's the next thing: turning Gnus' obarrays-as-hash-tables into real
>>> hash tables. Gnus currently stores information about groups by coercing
>>> group names to unibyte, interning them in custom obarrays, and then
>>> setting their symbol-value to whatever value needs to be stored. I think
>>> all this was written before Emacs had actual hash tables.
>>
>> Also hounding Lars for his opinion...
>
> Using real hash tables sounds nice, if there's no negative performance
> impact.

I've pushed another commit to scratch/gnus-hashtables, fixing a small
bug in gnus-bklg.el. Otherwise, I've been using this branch for a couple
months now, with no ill effects.

I guess what I'd like to do is squash the first commit with the third
(the changes that actually implement the hash tables), but leave the
second aside for now (the one that stops doing internal
encoding/decoding).

That ought to be done eventually, but I'll need more time to figure it
out, and would like to get more exposure for this refactoring in the
meantime.

Eric