Here's the next thing: turning Gnus' obarrays-as-hash-tables into real hash tables. Gnus currently stores information about groups by coercing group names to unibyte, interning them in custom obarrays, and then setting their symbol-value to whatever value needs to be stored. I think all this was written before Emacs had actual hash tables. I think real hash tables are better for a few reasons: 1. Hash table lookups seem to be marginally faster than obarray lookups, 2. It's "less weird" for contributors and bug hunters, 3. It allows us to reduces the amount of encoding/decoding going on: group names can stay strings instead of being forced into symbols. I've pushed a branch, scratch/gnus-hashtables, with two commits in it. The first changes all the obarrays to hash tables. Apart from simply replacing function calls, there were a few bumps. 1. Gnus uses `text-property-any' to find and compare group-name text properties; I made a new `gnus-text-property-search' which behaves similarly, but compares with equal. 2. Some of the "hash tables" were simply storing the value t, ie just used for membership. I left these as hash tables, but I think in many cases simple string-in-list membership tests would suffice. 3. The hash table in gnus-async.el didn't appear to be doing anything -- it seemed to be redundant with `gnus-async-article-alist'. I've removed it, but it might need to be put back if I'm missing something. 4. The creation of `gnus-newsgroup-dependencies' was the most fiddly, and I added tests for that. I'm not entirely convinced that `gnus-thread-loop-p' behaves the way it's meant to, it appears to only check for direct loops between parent and child, not parent and descendant. 5. The old return value of (gnus-gethash <group-name> gnus-newsrc-hashtb) was kind of a slice into `gnus-newsrc-alist': it behaved a bit like `member', but also included the group *before* the group you wanted, as well as all those after, so you could traverse the list in either direction. It now no longer preserves the order of `gnus-newsrc-alist' (this ordering wasn't actually used in many places), and instead there's a new variable `gnus-group-list' which records the proper sort order of the groups. I feel fairly confident that all this is working okay. The second commit I do *not* feel very confident about, and it's more of a "let's see how things break" attempt. In theory, we should now be able to limit group name encoding/decoding to the boundaries of Gnus -- reading/writing active files, and talking to servers. Within Gnus, the group names can remain decoded strings. In the second commit I've just gone in and pulled out all the decoding I can find, changed all the 'raw-text encoding options to 'undecided, and deleted the `mm-disable-multibyte's. I have no confidence that I've covered all the bases, though I have been using this branch for a couple of weeks, with nnml, nnmaildir, and nnimap groups with multibyte names, and so far nothing has broken. I'm less confident about nntp. I'll continue working on this, and I hope I can get some feedback here, particularly on the second commit. Thanks, Eric |
Eric Abrahamsen <[hidden email]> writes:
> Here's the next thing: turning Gnus' obarrays-as-hash-tables into real > hash tables. Gnus currently stores information about groups by coercing > group names to unibyte, interning them in custom obarrays, and then > setting their symbol-value to whatever value needs to be stored. I think > all this was written before Emacs had actual hash tables. Also hounding Lars for his opinion... |
On 12/11/18 20:23 PM, Lars Ingebrigtsen wrote: > Eric Abrahamsen <[hidden email]> writes: > >>> Here's the next thing: turning Gnus' obarrays-as-hash-tables into real >>> hash tables. Gnus currently stores information about groups by coercing >>> group names to unibyte, interning them in custom obarrays, and then >>> setting their symbol-value to whatever value needs to be stored. I think >>> all this was written before Emacs had actual hash tables. >> >> Also hounding Lars for his opinion... > > Using real hash tables sounds nice, if there's no negative performance > impact. Cool! In fact it seems to be a hair faster. If you have a moment for this, would you look at the second commit in the scratch/gnus-hashtables branch? The first commit seems fairly stable, but the second has to do with encoding -- if group names never need to be unibyte, it should be possible to leave them decoded inside Gnus itself. Besides active files and server communication, are there any other places where group names would need to be encoded? Thanks, Eric |
In reply to this post by Eric Abrahamsen-2
On 12/11/18 20:23 PM, Lars Ingebrigtsen wrote: > Eric Abrahamsen <[hidden email]> writes: > >>> Here's the next thing: turning Gnus' obarrays-as-hash-tables into real >>> hash tables. Gnus currently stores information about groups by coercing >>> group names to unibyte, interning them in custom obarrays, and then >>> setting their symbol-value to whatever value needs to be stored. I think >>> all this was written before Emacs had actual hash tables. >> >> Also hounding Lars for his opinion... > > Using real hash tables sounds nice, if there's no negative performance > impact. I've pushed another commit to scratch/gnus-hashtables, fixing a small bug in gnus-bklg.el. Otherwise, I've been using this branch for a couple months now, with no ill effects. I guess what I'd like to do is squash the first commit with the third (the changes that actually implement the hash tables), but leave the second aside for now (the one that stops doing internal encoding/decoding). That ought to be done eventually, but I'll need more time to figure it out, and would like to get more exposure for this refactoring in the meantime. Eric |
Free forum by Nabble | Edit this page |