bug#38011: 27.0.50; [PATCH] WIP on allowing Gnus backends to return header data directly

classic Classic list List threaded Threaded
13 messages Options
Reply | Threaded
Open this post in threaded view
|

bug#38011: 27.0.50; [PATCH] WIP on allowing Gnus backends to return header data directly

Eric Abrahamsen-2

Right now, Gnus backends return article header data by writing it in a
parseable format into the `nntp-server-buffer', and returning one of the
symbols 'nov or 'headers, indicating how the data should be parsed.

This isn't great because it requires backends to first munge their data
into a format that looks like the NNTP format, which is then parsed
again, which is an extra layer of data transformation. It also makes use
of the `nntp-server-buffer', which is something I'd like to work on
reducing because it causes problems with threading and introduces
potential encoding bugs.

This patch provides the possibility for backends to return their own
headers (ie a list of vectors), though it doesn't actually change any of
the backends to do that -- that will be another patch.

I have one question at this stage: the 'nov or 'headers value gets
stored into the `gnus-headers-retrieved-by' variable. That variable is
later checked in a couple of places like so:

      (when (and gnus-fetch-old-headers
                 (eq gnus-headers-retrieved-by 'nov))
        (if (eq gnus-fetch-old-headers 'invisible)
            (gnus-build-all-threads)
          (gnus-build-old-threads)))

If the variable is 'headers, the `gnus-build-*-threads' functions don't
get called at all.

What's the difference between 'nov and 'headers, and why can we build
threads in one case and not the other? If backends were to return their
own headers, what value should they return? I'll also note that the nnir
version of this function returns 'nov regardless of what the "real"
backend function returned -- why is that?

I would love to use some other, more direct, heuristic to decide about
building threads or not, and get rid of the
'nov/'headers/gnus-headers-retrieved-by stuff altogether, but I don't
yet know how to do that.

Eric


0001-WIP-on-allowing-Gnus-backends-to-return-headers-dire.patch (4K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

bug#38011: 27.0.50; [PATCH] WIP on allowing Gnus backends to return header data directly

Lars Ingebrigtsen
Eric Abrahamsen <[hidden email]> writes:

> This patch provides the possibility for backends to return their own
> headers (ie a list of vectors), though it doesn't actually change any of
> the backends to do that -- that will be another patch.

Great!

> I have one question at this stage: the 'nov or 'headers value gets
> stored into the `gnus-headers-retrieved-by' variable. That variable is
> later checked in a couple of places like so:
>
>       (when (and gnus-fetch-old-headers
> (eq gnus-headers-retrieved-by 'nov))
> (if (eq gnus-fetch-old-headers 'invisible)
>    (gnus-build-all-threads)
>  (gnus-build-old-threads)))
>
> If the variable is 'headers, the `gnus-build-*-threads' functions don't
> get called at all.
>
> What's the difference between 'nov and 'headers, and why can we build
> threads in one case and not the other? If backends were to return their
> own headers, what value should they return?

It's not about threading per se, but about displaying information about
already-read articles (or more precisely -- about articles not in the
set that was requested).  Threads are build no matter how the backend
delivers the data.

If nn-*retrieve-headers supports NOV fetching, then certain Gnus
variables about filling in threads with "old" articles is switched on,
because fetching extra NOV headers is fast (you just say "fetch 100-150"
instead of "fetch 100-110,120-130,150").  With the backends that fetch
head by head, this is slow, so it's not done.

It's basically a distinction between NNTP and almost all the other
backends.

--
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no



Reply | Threaded
Open this post in threaded view
|

bug#38011: 27.0.50; [PATCH] WIP on allowing Gnus backends to return header data directly

Eric Abrahamsen-2
Lars Ingebrigtsen <[hidden email]> writes:

> Eric Abrahamsen <[hidden email]> writes:
>
>> This patch provides the possibility for backends to return their own
>> headers (ie a list of vectors), though it doesn't actually change any of
>> the backends to do that -- that will be another patch.
>
> Great!
>
>> I have one question at this stage: the 'nov or 'headers value gets
>> stored into the `gnus-headers-retrieved-by' variable. That variable is
>> later checked in a couple of places like so:
>>
>>       (when (and gnus-fetch-old-headers
>> (eq gnus-headers-retrieved-by 'nov))
>> (if (eq gnus-fetch-old-headers 'invisible)
>>    (gnus-build-all-threads)
>>  (gnus-build-old-threads)))
>>
>> If the variable is 'headers, the `gnus-build-*-threads' functions don't
>> get called at all.
>>
>> What's the difference between 'nov and 'headers, and why can we build
>> threads in one case and not the other? If backends were to return their
>> own headers, what value should they return?
>
> It's not about threading per se, but about displaying information about
> already-read articles (or more precisely -- about articles not in the
> set that was requested).  Threads are build no matter how the backend
> delivers the data.
>
> If nn-*retrieve-headers supports NOV fetching, then certain Gnus
> variables about filling in threads with "old" articles is switched on,
> because fetching extra NOV headers is fast (you just say "fetch 100-150"
> instead of "fetch 100-110,120-130,150").  With the backends that fetch
> head by head, this is slow, so it's not done.
>
> It's basically a distinction between NNTP and almost all the other
> backends.

Okay, that makes sense. So it's basically a flag saying header retrieval
is cheap enough that we might as well pull in more old messages than we
otherwise would. For example, nnmaildir builds and stores nov data
(which drives people with large maildirs insane because it takes
enormous amounts of time and space), so it returns 'nov, but if we ever
got rid of that (or made it optional) it would return 'headers.

I'm going to assume that any backend capable of returning its own
headers is probably going to... return 'headers. At any rate, as I
implement this for various backends, I'll start with the ones that
return 'headers to begin with. At some point, though, I'd still like to
get rid of this flag and build the distinction into the backend
functions themselves.

Hmmm... nnmaildir needs some love.

Eric



Reply | Threaded
Open this post in threaded view
|

bug#38011: 27.0.50; [PATCH] WIP on allowing Gnus backends to return header data directly

Eric Abrahamsen-2
Eric Abrahamsen <[hidden email]> writes:

> Lars Ingebrigtsen <[hidden email]> writes:
>
>> Eric Abrahamsen <[hidden email]> writes:
>>
>>> This patch provides the possibility for backends to return their own
>>> headers (ie a list of vectors), though it doesn't actually change any of
>>> the backends to do that -- that will be another patch.
>>
>> Great!
>>
>>> I have one question at this stage: the 'nov or 'headers value gets
>>> stored into the `gnus-headers-retrieved-by' variable. That variable is
>>> later checked in a couple of places like so:
Okay, here's the patch as it stands -- I'll do some testing first. Andy,
I'm copying you in because this touches nnir.el, and nnselect.el will
need to be edited accordingly.


0001-WIP-on-allowing-Gnus-backends-to-return-headers-dire.patch (6K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

bug#38011: 27.0.50; [PATCH] WIP on allowing Gnus backends to return header data directly

Lars Ingebrigtsen
In reply to this post by Eric Abrahamsen-2
Eric Abrahamsen <[hidden email]> writes:

> Okay, that makes sense. So it's basically a flag saying header retrieval
> is cheap enough that we might as well pull in more old messages than we
> otherwise would.

Yup.

> Hmmm... nnmaildir needs some love.

Indeed.

--
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no



Reply | Threaded
Open this post in threaded view
|

bug#38011: 27.0.50; [PATCH] WIP on allowing Gnus backends to return header data directly

Eric Abrahamsen-2
I knew this seemed too simple...

Andy Cohen pointed out that `gnus-get-newsgroup-headers{,-xover}' were
doing more work (running hooks, and the alter function) that needed to
also be done if headers were returned directly. There's a bunch of other
overlap between those two functions (and between
`gnus-get-newsgroup-headers' and `nnheader-parse-naked-head') that Andy
is working on refactoring (I hope).

In the meantime there are several other places in the code that use
`gnus-retrieve-headers', and will need to be updated to handle the
possibility that headers have been returned directly. What I'd really
like to do, as much as possible, is to switch to `gnus-fetch-headers',
so we get our headers back and don't need to know how that happened.
This will also avoid ugliness like `nnvirtual-convert-headers'.

I think I can probably figure out how to do this in most spots.
gnus-agent and gnus-cache are particularly gnarly, though: what does
-braid-nov/-braid-heads actually do?

Eric



Reply | Threaded
Open this post in threaded view
|

bug#38011: 27.0.50; [PATCH] WIP on allowing Gnus backends to return header data directly

Lars Ingebrigtsen
Eric Abrahamsen <[hidden email]> writes:

> I think I can probably figure out how to do this in most spots.
> gnus-agent and gnus-cache are particularly gnarly, though: what does
> -braid-nov/-braid-heads actually do?

Take two sets of nov/head headers and combine them.  Basically sort |
uniq, but they're already sorted.

--
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no



Reply | Threaded
Open this post in threaded view
|

bug#38011: 27.0.50; [PATCH] WIP on allowing Gnus backends to return header data directly

Eric Abrahamsen-2

On 11/08/19 22:03 PM, Lars Ingebrigtsen wrote:
> Eric Abrahamsen <[hidden email]> writes:
>
>> I think I can probably figure out how to do this in most spots.
>> gnus-agent and gnus-cache are particularly gnarly, though: what does
>> -braid-nov/-braid-heads actually do?
>
> Take two sets of nov/head headers and combine them.  Basically sort |
> uniq, but they're already sorted.

Okay good -- that should be as easy (easier) to do with an actual list
of headers than with text in a buffer.



Reply | Threaded
Open this post in threaded view
|

bug#38011: 27.0.50; [PATCH] WIP on allowing Gnus backends to return header data directly

Lars Ingebrigtsen
Eric Abrahamsen <[hidden email]> writes:

> Okay good -- that should be as easy (easier) to do with an actual list
> of headers than with text in a buffer.

Yup; much easier.

--
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no



Reply | Threaded
Open this post in threaded view
|

bug#38011: 27.0.50; [PATCH] WIP on allowing Gnus backends to return header data directly

Eric Abrahamsen-2
In reply to this post by Eric Abrahamsen-2
Lars Ingebrigtsen <[hidden email]> writes:

> Eric Abrahamsen <[hidden email]> writes:
>
>> Okay good -- that should be as easy (easier) to do with an actual list
>> of headers than with text in a buffer.
>
> Yup; much easier.

Okay I've had a little more, uh, free time recently to work on Gnus
stuff, and am nearly ready with a patch for this. It got gnarly in
gnus-agent.el, but I think I've got a handle on it.

I have one question right now, about `nnvirtual-update-xref-header',
which needs to be rewritten from altering header text in a buffer, to
altering the `mail-header-xref' value of an already-parsed header.

First of all, it unconditionally adds an xref header to our group and
article, even if there wasn't one before. Then it does the following.

I'm not sure how to describe this in plain English. I would say it is
replacing all the server names with our "prefix", but that doesn't
explain the deletion of the group name *and* the article number in the
second "(when (re-search-forward" below.

Can someone explain what exactly this function is supposed to do?

(save-restriction
    (narrow-to-region (point)
                      (or (search-forward "\t" (point-at-eol) t)
                          (point-at-eol)))
    (goto-char (point-min))
    (when (re-search-forward "Xref: *[^\n:0-9 ]+ *" nil t)
      (replace-match "" t t))
    (goto-char (point-min))
    (when (re-search-forward
           (concat (regexp-quote (gnus-group-real-name group)) ":[0-9]+")
           nil t)
      (replace-match "" t t))
    (unless (eobp)
      (insert " ")
      (when (not (string= "" prefix))
        (while (re-search-forward "[^ ]+:[0-9]+" nil t)
          (save-excursion
            (goto-char (match-beginning 0))
            (insert prefix))))))



Reply | Threaded
Open this post in threaded view
|

bug#38011: 27.0.50; [PATCH] WIP on allowing Gnus backends to return header data directly

Eric Abrahamsen-2
Lars Ingebrigtsen <[hidden email]> writes:

> Eric Abrahamsen <[hidden email]> writes:
>
>> Can someone explain what exactly this function is supposed to do?
>
> Somebody should have written more comments when they wrote that...  and,
> like, made it correct.
>
> It transforms this:
>
> 64007 Formatting labels used for URL buttons in Gnus articles Narendra
> Joshi <[hidden email]> Sun, 19 Apr 2020 00:31:13 +0200
> <[hidden email]>
> <[hidden email]> 3614 14 Xref:
> reader01.eternal-september.org gnu.emacs.help:57603
>
> Into this:
>
> 64007 Formatting labels used for URL buttons in Gnus articles Narendra
> Joshi <[hidden email]> Sun, 19 Apr 2020 00:31:13 +0200
> <[hidden email]>
> <[hidden email]> 3614 14 Xref: marnie gnu.emacs.help:57603
> 01.eternal-september.org
>
> Which is wrong, of course -- the "01.eternal-september.org" thing
> shouldn't be there.
>
> Anyway, what it's supposed to do it rewrite
>
> Xref: reader01.eternal-september.org foo.bar:2523 gnu.emacs.help:57603 zot.bar:3242
>
> to
>
> Xref: whatever gnu.emacs.help:57603 foo.bar:2523 zot.bar:3242
>
> That is, put the group/article we're really selecting first in the Xref
> header, but leave the remaining as they were.  This is because we need
> those to mark the article as read in those other groups, but we
> primarily need to know where this article really came from (the first
> entry).

Slowly, slowly, I'm getting this done. I'm still a bit confused here,
though. The xref elements look like they're not supposed to have spaces
in them, but the existing code does this:

(insert "Xref: " sysname " " group ":")
(princ article (current-buffer))

Which leaves a space between sysname and group.

You say the existing xrefs should be left as they are, but the code adds
"prefix" to them. Should this be added unconditionally?

Here's the new version of the function, operating on a header struct.
Does this look right to you?

Thanks,
Eric

(defun nnvirtual-update-xref-header (header group prefix sysname)
  "Add xref to component GROUP to HEADER.
Also add a server PREFIX any existing xref lines."
  (let ((bits (split-string (mail-header-xref header)
                            nil t "[[:blank:]]"))
        (art-no (mail-header-number header)))
    (setq bits
          (mapcar (lambda (bit)
                    (concat prefix bit))
                  bits))
    (setf (mail-header-xref header)
          (mapconcat #'identity
                     (cons (format "%s %s:%d"
                                   sysname group art-no)
                           bits)
                     " "))))



Reply | Threaded
Open this post in threaded view
|

bug#38011: 27.0.50; [PATCH] WIP on allowing Gnus backends to return header data directly

Lars Ingebrigtsen
Eric Abrahamsen <[hidden email]> writes:

>> Xref: reader01.eternal-september.org foo.bar:2523 gnu.emacs.help:57603 zot.bar:3242

[...]

> Slowly, slowly, I'm getting this done. I'm still a bit confused here,
> though. The xref elements look like they're not supposed to have spaces
> in them, but the existing code does this:
>
> (insert "Xref: " sysname " " group ":")
> (princ article (current-buffer))
>
> Which leaves a space between sysname and group.

I'm not quite sure I understand the question?  The sysname is just a
part of the syntax of the Xref header and isn't used for anything by
Gnus, as far as I know.  So there has to be a space?  It's certainly not
part of the group name.

> You say the existing xrefs should be left as they are, but the code adds
> "prefix" to them. Should this be added unconditionally?

Uhm...  I think so?  But I'm not sure.

> Here's the new version of the function, operating on a header struct.
> Does this look right to you?
>
> Thanks,
> Eric
>
> (defun nnvirtual-update-xref-header (header group prefix sysname)
>   "Add xref to component GROUP to HEADER.
> Also add a server PREFIX any existing xref lines."
>   (let ((bits (split-string (mail-header-xref header)
>    nil t "[[:blank:]]"))
> (art-no (mail-header-number header)))
>     (setq bits
>  (mapcar (lambda (bit)
>    (concat prefix bit))
>  bits))
>     (setf (mail-header-xref header)
>  (mapconcat #'identity
>     (cons (format "%s %s:%d"
>                                    sysname group art-no)
>   bits)
>     " "))))

I think so.  The body of the let form is perhaps more easily expressed
as

(setf (mail-header-xref header)
      (concat (format "%s %s:%d " sysname group art-no)
              (mapconcat (lambda (bit)
    (concat prefix bit))
                         bits " ")))

?

--
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no



Reply | Threaded
Open this post in threaded view
|

bug#38011: 27.0.50; [PATCH] WIP on allowing Gnus backends to return header data directly

Eric Abrahamsen-2
Lars Ingebrigtsen <[hidden email]> writes:

> Eric Abrahamsen <[hidden email]> writes:
>
>>> Xref: reader01.eternal-september.org foo.bar:2523 gnu.emacs.help:57603 zot.bar:3242
>
> [...]
>
>> Slowly, slowly, I'm getting this done. I'm still a bit confused here,
>> though. The xref elements look like they're not supposed to have spaces
>> in them, but the existing code does this:
>>
>> (insert "Xref: " sysname " " group ":")
>> (princ article (current-buffer))
>>
>> Which leaves a space between sysname and group.
>
> I'm not quite sure I understand the question?  The sysname is just a
> part of the syntax of the Xref header and isn't used for anything by
> Gnus, as far as I know.  So there has to be a space?  It's certainly not
> part of the group name.
TBH I only just went and read the RFC for this -- something I'd been
trying to avoid!

>> You say the existing xrefs should be left as they are, but the code adds
>> "prefix" to them. Should this be added unconditionally?
>
> Uhm...  I think so?  But I'm not sure.

Looking over the code again, I think it's best to only add if the prefix
isn't already there.

>> Here's the new version of the function, operating on a header struct.
>> Does this look right to you?
>>
>> Thanks,
>> Eric
>>
>> (defun nnvirtual-update-xref-header (header group prefix sysname)
>>   "Add xref to component GROUP to HEADER.
>> Also add a server PREFIX any existing xref lines."
>>   (let ((bits (split-string (mail-header-xref header)
>>    nil t "[[:blank:]]"))
>> (art-no (mail-header-number header)))
>>     (setq bits
>>  (mapcar (lambda (bit)
>>    (concat prefix bit))
>>  bits))
>>     (setf (mail-header-xref header)
>>  (mapconcat #'identity
>>     (cons (format "%s %s:%d"
>>                                    sysname group art-no)
>>   bits)
>>     " "))))
>
> I think so.  The body of the let form is perhaps more easily expressed
> as
>
> (setf (mail-header-xref header)
>       (concat (format "%s %s:%d " sysname group art-no)
>      (mapconcat (lambda (bit)
>     (concat prefix bit))
> bits " ")))
>
> ?
Sure, this was just my halfway-there muddle.

I've cleaned this branch, squashed it, and am preparing to test for a
while. I'm attaching the full diff in case anyone wants to read it :)

A net removal of 562 lines with, I hope, no change in behavior.



0001-Allow-gnus-retrieve-headers-to-return-headers-direct.patch (47K) Download Attachment