Help please! To track down GC trying to free an already freed object.

Alan Mackenzie
Hello, Emacs.

I get this problem after a recent merge of master into
/scratch/accurate-warning-pos (my branch where I'm trying to implement
correct source positions in the byte compiler's warning messages).  This
was a large merge, including bringing in the portable dumper.

Emacs aborts at mark_object L+179 (in alloc.c), because a pseudovector
being freed already has type PVEC_FREE, i.e. has been freed already.
This object is a "symbol with position", a type of pseudovector which
doesn't yet exist outside of this scratch branch.

At a guess, I'm setting some data structure in the C code to a Lisp
structure containing this object, but failing to apply static protection
to this C variable.  Or something like that.

This failure occurs during the byte compilation of .../lisp/registry.el
in a make or make bootstrap.  The failure only occurs when this byte
compilation is started as -batch from the command line.  So my use of
GDB is from the command line, not within a running Emacs.

With GDB, I can break at the creation of this symbol-with-position
object and again at its (first) freeing with this breakpoint:

    break setup_on_free_list if (v == 0x5555561d0450)

.  However, this isn't helping me to track down the Lisp object which
still references this symbol-with-position.  I've tried to find the
address of Emacs's data segment, so as to be able to search through it
for 0x5555561d0455 in GDB, but this doesn't feel like a very useful
thing to do.
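
A side note on the two addresses above: the breakpoint uses 0x5555561d0450, while the search value is 0x5555561d0455. The difference is presumably the Lisp_Object type tag stored in the low bits of the pointer. The Python sketch below is a toy model of that tagging, not Emacs code. The three tag bits and the tag value 5 for vectorlike objects are assumptions about this particular build, and the function names are made up for illustration.

```python
from typing import Tuple

# Toy model of low-bit pointer tagging.  GCTYPEBITS = 3 and the tag value 5
# for vectorlike objects are assumptions about the Emacs build in question.
GCTYPEBITS = 3
TAG_MASK = (1 << GCTYPEBITS) - 1
LISP_VECTORLIKE = 5  # assumed tag value


def make_lisp_ptr(address: int, tag: int) -> int:
    """Combine an 8-byte-aligned heap address with a type tag."""
    if address & TAG_MASK:
        raise ValueError("address must be aligned to 8 bytes")
    return address | tag


def untag(obj: int) -> Tuple[int, int]:
    """Split a tagged word back into (address, tag)."""
    return obj & ~TAG_MASK, obj & TAG_MASK


if __name__ == "__main__":
    # The untagged address is what the breakpoint condition compares against;
    # the tagged word is what would be stored in any referencing Lisp object.
    print(hex(make_lisp_ptr(0x5555561d0450, LISP_VECTORLIKE)))  # 0x5555561d0455
```

Under these assumptions, a memory search for a referencing object has to look for the tagged value, not the raw address.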

Could somebody who has experience in this sort of thing please suggest
how I might proceed with the debugging, or possibly offer me some other
sort of help or hints.

Thanks in advance!

--
Alan Mackenzie (Nuremberg, Germany).

Re: Help please! To track down GC trying to free an already freed object.

Eli Zaretskii
> Date: Tue, 2 Apr 2019 11:25:37 +0000
> From: Alan Mackenzie <[hidden email]>
>
> With GDB, I can break at the creation of this symbol-with-position
> object and again at its (first) freeing with this breakpoint:
>
>     break setup_on_free_list if (v == 0x5555561d0450)
>
> .  However, this isn't helping me to track down the Lisp object which
> still references this symbol-with-position.  I've tried to find the
> address of Emacs's data segment, so as to be able to search through it
> for 0x5555561d0455 in GDB, but this doesn't feel like a very useful
> thing to do.
>
> Could somebody who has experience in this sort of thing please suggest
> how I might proceed with the debugging, or possibly offer me some other
> sort of help or hints.

The usual method of debugging such problems is described in etc/DEBUG,
it basically uses the last_marked[] array.  You start with the object
at last_marked[last_marked_index - 1], and go backwards (in circular
manner), comparing the objects you find in the array with those you
see in the call-stack frames that call mark_* functions.  Just be very
careful when you print the objects; e.g., never use 'pp', because the
function it calls cannot handle marked objects.
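
The backwards, wrap-around walk described above can be sketched as a toy model. The Python below only illustrates the indexing, not the Emacs implementation: the class MarkLog, its method names, and the tiny buffer size are made up for the example.

```python
LAST_MARKED_SIZE = 8  # deliberately small; the real array is larger


class MarkLog:
    """Toy circular log of the most recently marked objects."""

    def __init__(self, size=LAST_MARKED_SIZE):
        self.slots = [None] * size
        self.index = 0  # next slot to be written, like last_marked_index

    def record(self, obj):
        """Store obj and advance the index, wrapping at the end."""
        self.slots[self.index] = obj
        self.index = (self.index + 1) % len(self.slots)

    def newest_first(self):
        """Yield recorded objects from most recent to oldest."""
        size = len(self.slots)
        for step in range(1, size + 1):
            # index - 1 is the newest entry; Python's % wraps negatives.
            obj = self.slots[(self.index - step) % size]
            if obj is None:  # slot never written: nothing older to show
                break
            yield obj


if __name__ == "__main__":
    log = MarkLog(4)
    for i in range(6):
        log.record("obj%d" % i)
    print(list(log.newest_first()))
```

In a debugger session the same arithmetic applies by hand: start at the index minus one, step backwards, and continue from the last slot once slot 0 has been examined.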

If you already tried this, please ask more specific questions.

Re: Help please! To track down GC trying to free an already freed object.

Daniel Colascione
In reply to this post by Alan Mackenzie
> Hello, Emacs.
>
> I get this problem after a recent merge of master into
> /scratch/accurate-warning-pos (my branch where I'm trying to implement
> correct source positions in the byte compiler's warning messages).  This
> was a large merge, including bringing in the portable dumper.
>
> Emacs aborts at mark_object L+179 (in alloc.c), because a pseudovector
> being freed already has type PVEC_FREE, i.e. has been freed already.
> This object is a "symbol with position", a type of pseudovector which
> doesn't yet exist outside of this scratch branch.

Out of curiosity, why do we need a new C-level type here?

> At a guess, I'm setting some data structure in the C code to a Lisp
> structure containing this object, but failing to apply static protection
> to this C variable.  Or something like that.
>
> This failure occurs during the byte compilation of .../lisp/registry.el
> in a make or make bootstrap.  The failure only occurs when this byte
> compilation is started as -batch from the command line.  So my use of
> GDB is from the command line, not within a running Emacs.
>
> With GDB, I can break at the creation of this symbol-with-position
> object and again at its (first) freeing with this breakpoint:
>
>     break setup_on_free_list if (v == 0x5555561d0450)
>
> .  However, this isn't helping me to track down the Lisp object which
> still references this symbol-with-position.  I've tried to find the
> address of Emacs's data segment, so as to be able to search through it
> for 0x5555561d0455 in GDB, but this doesn't feel like a very useful
> thing to do.
>
> Could somebody who has experience in this sort of thing please suggest
> how I might proceed with the debugging, or possibly offer me some other
> sort of help or hints.
>
> Thanks in advance!

rr is incredibly helpful for debugging this sort of problem. See
https://rr-project.org/. You can record an rr session containing the
crash, replay it, get to the crash, and then reverse-next, reverse-finish,
and reverse-continue your way through the GC, running it in reverse until
you find whatever it is that made mark_object on the dead object happen.
Hardware watchpoints with rr are also very useful and work great in
reverse mode: just use watch -l myvar and reverse-continue to see who last
wrote a memory location, or use rwatch to see who last *read* a location.
(The -l is important since it enables the use of hardware watchpoints.)



Re: Help please! To track down GC trying to free an already freed object.

Eli Zaretskii
> Date: Tue, 2 Apr 2019 12:09:59 -0700
> From: "Daniel Colascione" <[hidden email]>
> Cc: [hidden email]
>
> rr is incredibly helpful for debugging this sort of problem. See
> https://rr-project.org/. You can record an rr session containing the
> crash, replay it, get to the crash, and then reverse-next, reverse-finish,
> and reverse-continue your way through the GC, running it in reverse until
> you find whatever it is that made mark_object on the dead object happen.

GDB supports reverse execution as well, on some platforms.

Re: Help please! To track down GC trying to free an already freed object.

Alan Mackenzie
In reply to this post by Daniel Colascione
Hello, Daniel.

On Tue, Apr 02, 2019 at 12:09:59 -0700, Daniel Colascione wrote:
> > Hello, Emacs.

> > I get this problem after a recent merge of master into
> > /scratch/accurate-warning-pos (my branch where I'm trying to implement
> > correct source positions in the byte compiler's warning messages).  This
> > was a large merge, including bringing in the portable dumper.

> > Emacs aborts at mark_object L+179 (in alloc.c), because a pseudovector
> > being freed already has type PVEC_FREE, i.e. has been freed already.
> > This object is a "symbol with position", a type of pseudovector which
> > doesn't yet exist outside of this scratch branch.

> Out of curiosity, why do we need a new C-level type here?

It's to help solve a bug in the byte compiler, which up until recently
was intractable.  The byte compiler frequently (?usually) reports
incorrect line/column numbers in its warning messages.  This is due to
the kludge it uses to keep track of them.

The only current candidate for a fix is for the reader, on a flag being
bound to non-nil, to return "symbols with position" rather than standard
symbols.  The "position" associated with the symbol is its textual
offset from the beginning of the construct in the source file being read.

These symbols with position are implemented as pseudovectors with type
PVEC_SYMBOL_WITH_POS and behave as ordinary symbols for all purposes,
except for when a warning message is being output, when the position
supplies a correct file/line number for the message.

This works and works well.  However it causes an unacceptable slowdown in
Emacs (around 8 - 15 per cent).  I'm working on a fix for this, and have
made substantial progress.

The topic was discussed at length in emacs-devel starting November last
year in posts whose Subject: contained "scratch/accurate-warning-pos".

--
Alan Mackenzie (Nuremberg, Germany).

Re: Help please! To track down GC trying to free an already freed object.

Daniel Colascione
> Hello, Daniel.
>
> On Tue, Apr 02, 2019 at 12:09:59 -0700, Daniel Colascione wrote:
>> > Hello, Emacs.
>
>> > I get this problem after a recent merge of master into
>> > /scratch/accurate-warning-pos (my branch where I'm trying to implement
>> > correct source positions in the byte compiler's warning messages).  This
>> > was a large merge, including bringing in the portable dumper.
>
>> > Emacs aborts at mark_object L+179 (in alloc.c), because a pseudovector
>> > being freed already has type PVEC_FREE, i.e. has been freed already.
>> > This object is a "symbol with position", a type of pseudovector which
>> > doesn't yet exist outside of this scratch branch.
>
>> Out of curiosity, why do we need a new C-level type here?
>
> It's to help solve a bug in the byte compiler, which up until recently
> was intractable.  The byte compiler frequently (?usually) reports
> incorrect line/column numbers in its warning messages.  This is due to
> the kludge it uses to keep track of them.
>
> The only current candidate for a fix is for the reader, on a flag being
> bound to non-nil, to return "symbols with position" rather than standard
> symbols.  The "position" associated with the symbol is its textual
> offset from the beginning of the construct in the source file being read.

So if I read symbol foo from file1.el and symbol foo from file2.el, I get
two different symbol-with-location instances, each tagged with a different
source location? Do these symbol objects compare eq to each other?


Re: Help please! To track down GC trying to free an already freed object.

Alan Mackenzie
In reply to this post by Eli Zaretskii
Hello, Eli.

On Tue, Apr 02, 2019 at 18:04:22 +0300, Eli Zaretskii wrote:
> > Date: Tue, 2 Apr 2019 11:25:37 +0000
> > From: Alan Mackenzie <[hidden email]>

> > With GDB, I can break at the creation of this symbol-with-position
> > object and again at its (first) freeing with this breakpoint:

> >     break setup_on_free_list if (v == 0x5555561d0450)

> > .  However, this isn't helping me to track down the Lisp object which
> > still references this symbol-with-position.  I've tried to find the
> > address of Emacs's data segment, so as to be able to search through it
> > for 0x5555561d0455 in GDB, but this doesn't feel like a very useful
> > thing to do.

> > Could somebody who has experience in this sort of thing please suggest
> > how I might proceed with the debugging, or possibly offer me some other
> > sort of help or hints.

> The usual method of debugging such problems is described in etc/DEBUG,

Apologies, I didn't see this.  I read quite a bit of etc/DEBUG, but for
some reason completely missed the bit about GC problems.

> it basically uses the last_marked[] array.  You start with the object
> at last_marked[last_marked_index - 1], and go backwards (in circular
> manner), comparing the objects you find in the array with those you
> see in the call-stack frames that call mark_* functions.  Just be very
> careful when you print the objects; e.g., never use 'pp', because the
> function it calls cannot handle marked objects.

I'm having some difficulty seeing the entire last_marked array with GDB.
I will try to find a solution in the GDB manual.

> If you already tried this, please ask more specific questions.

No, I hadn't.  I didn't know about last_marked.  I'll see if I can get
further with its help.  Thanks!

--
Alan Mackenzie (Nuremberg, Germany).

Re: Help please! To track down GC trying to free an already freed object.

Alan Mackenzie
In reply to this post by Eli Zaretskii
Hello, Eli.

On Tue, Apr 02, 2019 at 22:21:26 +0300, Eli Zaretskii wrote:
> > Date: Tue, 2 Apr 2019 12:09:59 -0700
> > From: "Daniel Colascione" <[hidden email]>
> > Cc: [hidden email]
> >
> > rr is incredibly helpful for debugging this sort of problem. See
> > https://rr-project.org/. You can record an rr session containing the
> > crash, replay it, get to the crash, and then reverse-next, reverse-finish,
> > and reverse-continue your way through the GC, running it in reverse until
> > you find whatever it is that made mark_object on the dead object happen.

> GDB supports reverse execution as well, on some platforms.

On my GNU/Linux system, I tried to run 'reverse-next', and got the error
message:

    Target multi-thread does not support this command.

.  :-(  I suppose I could reconfigure without multi threading, but then
the bug (which is reproducible) probably wouldn't happen in the same
place.

--
Alan Mackenzie (Nuremberg, Germany).

Re: Help please! To track down GC trying to free an already freed object.

Alan Mackenzie
In reply to this post by Daniel Colascione
Hello again, Daniel.

On Tue, Apr 02, 2019 at 13:33:02 -0700, Daniel Colascione wrote:
> > Hello, Daniel.

> > On Tue, Apr 02, 2019 at 12:09:59 -0700, Daniel Colascione wrote:
> >> > Hello, Emacs.

> >> > I get this problem after a recent merge of master into
> >> > /scratch/accurate-warning-pos (my branch where I'm trying to implement
> >> > correct source positions in the byte compiler's warning messages).  This
> >> > was a large merge, including bringing in the portable dumper.

> >> > Emacs aborts at mark_object L+179 (in alloc.c), because a pseudovector
> >> > being freed already has type PVEC_FREE, i.e. has been freed already.
> >> > This object is a "symbol with position", a type of pseudovector which
> >> > doesn't yet exist outside of this scratch branch.

> >> Out of curiosity, why do we need a new C-level type here?

> > It's to help solve a bug in the byte compiler, which up until recently
> > was intractable.  The byte compiler frequently (?usually) reports
> > incorrect line/column numbers in its warning messages.  This is due to
> > the kludge it uses to keep track of them.

> > The only current candidate for a fix is for the reader, on a flag being
> > bound to non-nil, to return "symbols with position" rather than standard
> > symbols.  The "position" associated with the symbol is its textual
> > offset from the beginning of the construct in the source file being read.

> So if I read symbol foo from file1.el and symbol foo from file2.el, I get
> two different symbol-with-location instances, each tagged with a different
> source location? Do these symbol objects compare eq to each other?

They do, yes.  Otherwise the byte compiler wouldn't work, as it
frequently compares a symbol-with-position with a constant ("ordinary")
symbol using eq.

However, it is envisaged that the flag symbols-with-pos-enabled will be bound
to non-nil only by the byte compiler.  The reader resets this position to
zero for each top-level form it reads.
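
As a rough illustration of the comparison rule described above, here is a toy Python model. It is not the C implementation: the class names, the intern helper, and lisp_eq are invented for the sketch, and it assumes that with the flag on, positions are stripped from both arguments before an identity comparison.

```python
class Symbol:
    """A bare symbol; identity is what eq compares."""

    def __init__(self, name):
        self.name = name


class SymbolWithPos:
    """A symbol annotated with a source position (toy pseudovector)."""

    def __init__(self, sym, pos):
        self.sym = sym
        self.pos = pos


_obarray = {}


def intern(name):
    """Return the unique Symbol object for name, creating it if needed."""
    return _obarray.setdefault(name, Symbol(name))


def strip_pos(obj, enabled):
    """Return the bare symbol if the flag is on and obj carries a position."""
    if enabled and isinstance(obj, SymbolWithPos):
        return obj.sym
    return obj


def lisp_eq(a, b, symbols_with_pos_enabled=False):
    """Identity comparison, optionally ignoring position annotations."""
    return (strip_pos(a, symbols_with_pos_enabled)
            is strip_pos(b, symbols_with_pos_enabled))
```

With the flag off, a positioned foo and the bare foo are distinct objects; with it on, they compare equal, which is the behaviour the byte compiler relies on.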

--
Alan Mackenzie (Nuremberg, Germany).

Re: Help please! To track down GC trying to free an already freed object.

Daniel Colascione
In reply to this post by Alan Mackenzie
> Hello, Eli.
>
> On Tue, Apr 02, 2019 at 22:21:26 +0300, Eli Zaretskii wrote:
>> > Date: Tue, 2 Apr 2019 12:09:59 -0700
>> > From: "Daniel Colascione" <[hidden email]>
>> > Cc: [hidden email]
>> >
>> > rr is incredibly helpful for debugging this sort of problem. See
>> > https://rr-project.org/. You can record an rr session containing the
>> > crash, replay it, get to the crash, and then reverse-next, reverse-finish,
>> > and reverse-continue your way through the GC, running it in reverse until
>> > you find whatever it is that made mark_object on the dead object happen.
>
>> GDB supports reverse execution as well, on some platforms.
>
> On my GNU/Linux system, I tried to run 'reverse-next', and got the error
> message:
>
>     Target multi-thread does not support this command.
>
> .  :-(  I suppose I could reconfigure without multi threading, but then
> the bug (which is reproducible) probably wouldn't happen in the same
> place.

I don't think I've ever gotten pure-GDB reverse execution to work
correctly. rr Just Works for me in every instance I've tried it.



Re: Help please! To track down GC trying to free an already freed object.

Eli Zaretskii
In reply to this post by Alan Mackenzie
> Date: Tue, 2 Apr 2019 20:46:53 +0000
> From: Alan Mackenzie <[hidden email]>
> Cc: Daniel Colascione <[hidden email]>, [hidden email]
>
> > GDB supports reverse execution as well, on some platforms.
>
> On my GNU/Linux system, I tried to run 'reverse-next', and got the error
> message:
>
>     Target multi-thread does not support this command.

I think you are supposed to record the execution, and then say

  (gdb) target record-core

or

  (gdb) target record-btrace

before the reverse execution is available.

But I was always able to debug GC problems by using the last_marked array.

Re: Help please! To track down GC trying to free an already freed object.

Eli Zaretskii
In reply to this post by Alan Mackenzie
> Date: Tue, 2 Apr 2019 20:42:37 +0000
> From: Alan Mackenzie <[hidden email]>
> Cc: [hidden email]
>
> I'm having some difficulty seeing the entire last_marked array with GDB.
> I will try to find a solution in the GDB manual.

You want "set print elements unlimited", I think.

However, my recommendation is to examine the array one element at a
time, moving back to the previous one only when you understand what
the element you've looked at is and whether it is or isn't related to
the problem.  Also, the last_marked array is written cyclically, so you
may need to wrap around the index to see the objects in the right
order.

Re: Help please! To track down GC trying to free an already freed object.

Alan Mackenzie
In reply to this post by Eli Zaretskii
Hello, Eli.

On Wed, Apr 03, 2019 at 07:39:35 +0300, Eli Zaretskii wrote:
> > Date: Tue, 2 Apr 2019 20:46:53 +0000
> > From: Alan Mackenzie <[hidden email]>
> > Cc: Daniel Colascione <[hidden email]>, [hidden email]

> > > GDB supports reverse execution as well, on some platforms.

> > On my GNU/Linux system, I tried to run 'reverse-next', and got the error
> > message:

> >     Target multi-thread does not support this command.

> I think you are supposed to record the execution, and then say

>   (gdb) target record-core

> or

>   (gdb) target record-btrace

> before the reverse execution is available.

Yes.  I thought there was something missing.  ;-)  There's no mention of
such recording in the GDB manual's "Reverse Execution" page, nor any
cross reference to "Process Record and Replay" there.

I'll try again and see if I can get it working.

> But I was always able to debug GC problems by using the last_marked array.

The problem I think I'm up against is that the symbol-with-pos object is
not being marked at a particular garbage_collect_1, and thus gets freed
prematurely.

I intend to get the hex values of the Lisp_Objects which constitute the
list in which the symbol-with-pos is embedded and search for these in
last_marked.  Putting a conditional breakpoint on Fcons slows down Emacs
somewhat.  ;-)

--
Alan Mackenzie (Nuremberg, Germany).

Re: Help please! To track down GC trying to free an already freed object.

Eli Zaretskii
> Date: Wed, 3 Apr 2019 10:01:13 +0000
> Cc: [hidden email], [hidden email]
> From: Alan Mackenzie <[hidden email]>
>
> The problem I think I'm up against is that the symbol-with-pos object is
> not being marked at a particular garbage_collect_1, and thus gets freed
> prematurely.
>
> I intend to get the hex values of the Lisp_Objects which constitute the
> list in which the symbol-with-pos is embedded and search for these in
> last_marked.  Putting a conditional breakpoint on Fcons slows down Emacs
> somewhat.  ;-)

GDB has memory-search commands, see the node "Searching Memory" in the
GDB manual.  Maybe this can help.
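
What such a search amounts to can be shown with a small Python sketch over a byte buffer standing in for a memory region. This is only a model of the idea, not a GDB interface: find_word and its parameters are hypothetical, and it assumes 8-byte, little-endian, word-aligned Lisp_Object values as on x86-64.

```python
def find_word(memory, base, value, word_size=8):
    """Return addresses of aligned little-endian words equal to value.

    memory is a bytes object holding a copy of the region, and base is
    the address at which that region starts in the debugged process.
    """
    pattern = value.to_bytes(word_size, "little")
    hits = []
    for offset in range(0, len(memory) - word_size + 1, word_size):
        if memory[offset:offset + word_size] == pattern:
            hits.append(base + offset)
    return hits


if __name__ == "__main__":
    words = [1, 0x5555561d0455, 7, 0x5555561d0455]
    region = b"".join(w.to_bytes(8, "little") for w in words)
    print([hex(a) for a in find_word(region, 0x1000, 0x5555561d0455)])
```

Each hit is the address of a slot that still holds the tagged value, which is a candidate for the object that references the freed pseudovector.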

Re: Help please! To track down GC trying to free an already freed object.

Paul Eggert
In reply to this post by Alan Mackenzie
Alan Mackenzie wrote:
> There's no mention of
> such recording in the GDB manual's "Reverse Execution" page, nor any
> cross reference to "Process Record and Replay" there.

I filed a bug report for that here:

https://sourceware.org/bugzilla/show_bug.cgi?id=24417

Re: Help please! To track down GC trying to free an already freed object.

Alan Mackenzie
In reply to this post by Eli Zaretskii
Hello, Eli.

On Wed, Apr 03, 2019 at 07:43:22 +0300, Eli Zaretskii wrote:
> > Date: Tue, 2 Apr 2019 20:42:37 +0000
> > From: Alan Mackenzie <[hidden email]>
> > Cc: [hidden email]

> > I'm having some difficulty seeing the entire last_marked array with GDB.
> > I will try to find a solution in the GDB manual.

> You want "set print elements unlimited", I think.

> However, my recommendation is to examine the array one element at a
> time, moving back to the previous one only when you understand what
> the element you've looked at is and whether it is or isn't related to
> the problem.  Also, the last_marked array is written cyclically, so you
> may need to wrap around the index to see the objects in the right
> order.

I've found the bug.

In the garbage collection, it's necessary for Qsymbols_with_pos_enabled
to be bound to nil.  (That's the variable which enables symbols with
position).

I had bound that variable to nil in Fgarbage_collect, not noticing that
there are calls to the C function garbage_collect which bypass the
primitive.  This was the bug.

As a result, the pseudovector (Symbol "nil" at position 339) was caught
by a NILP, causing it not to get marked.  So it got swept away, even
though it was still live.
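
The failure mode can be reproduced in miniature. The following Python is a toy mark-and-sweep, not alloc.c: all names are invented, and the only point it makes is that a position-aware nil test inside the marker skips a live "nil at position 339" object, so the sweep then frees it.

```python
class Symbol:
    def __init__(self, name):
        self.name = name


class SymbolWithPos:
    def __init__(self, sym, pos):
        self.sym = sym
        self.pos = pos


class Cons:
    def __init__(self, car, cdr):
        self.car = car
        self.cdr = cdr


NIL = Symbol("nil")  # treated as static: never on the heap, never freed


def nilp(obj, symbols_with_pos_enabled):
    """Nil test that looks through position annotations when the flag is on."""
    if symbols_with_pos_enabled and isinstance(obj, SymbolWithPos):
        obj = obj.sym
    return obj is NIL


def mark(root, marked, symbols_with_pos_enabled):
    """Add the id of every object reachable from root to the set marked."""
    stack = [root]
    while stack:
        obj = stack.pop()
        if nilp(obj, symbols_with_pos_enabled) or id(obj) in marked:
            continue  # with the flag on, a positioned nil is skipped here
        marked.add(id(obj))
        if isinstance(obj, Cons):
            stack.extend([obj.car, obj.cdr])
        elif isinstance(obj, SymbolWithPos):
            stack.append(obj.sym)


def sweep(heap, marked):
    """Return the heap objects that were not marked, i.e. would be freed."""
    return [obj for obj in heap if id(obj) not in marked]
```

Running the marker with the flag forced off, whichever entry point starts the collection, keeps the positioned nil alive in this model.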

So I've spent several days on this, but as a consolation I now know GDB
much better than I did before.  ;-).  My branch now builds successfully.

Thanks for all the help!

--
Alan Mackenzie (Nuremberg, Germany).

Re: Help please! To track down GC trying to free an already freed object.

Alex Gramiak
In reply to this post by Alan Mackenzie
Alan Mackenzie <[hidden email]> writes:

> Hello again, Daniel.
>
> On Tue, Apr 02, 2019 at 13:33:02 -0700, Daniel Colascione wrote:
>
>> So if I read symbol foo from file1.el and symbol foo from file2.el, I get
>> two different symbol-with-location instances, each tagged with a different
>> source location? Do these symbol objects compare eq to each other?
>
> They do, yes.  Otherwise the byte compiler wouldn't work, as it
> frequently compares a symbol-with-position with a constant ("ordinary")
> symbol using eq.
>
> However, it is envisaged that the flag symbols-with-pos-enabled will be bound
> to non-nil only by the byte compiler.  The reader resets this position to
> zero for each top-level form it reads.

I apologize if this topic already reached its conclusion, but IMO
having eq return true for two different object types is quite
surprising behaviour. Is it out of the question to leave eq alone and
introduce, e.g., eq-excluding-position that strips possible positions
before comparison?

Re: Help please! To track down GC trying to free an already freed object.

Alan Mackenzie
Hello, Alex.

On Thu, Apr 04, 2019 at 22:49:22 -0600, Alex wrote:
> Alan Mackenzie <[hidden email]> writes:

> > On Tue, Apr 02, 2019 at 13:33:02 -0700, Daniel Colascione wrote:

> >> So if I read symbol foo from file1.el and symbol foo from file2.el,
> >> I get two different symbol-with-location instances, each tagged with
> >> a different source location? Do these symbol objects compare eq to
> >> each other?

> > They do, yes.  Otherwise the byte compiler wouldn't work, as it
> > frequently compares a symbol-with-position with a constant
> > ("ordinary") symbol using eq.

> > However, it is envisaged that the flag symbols-with-pos-enabled will be bound
> > to non-nil only by the byte compiler.  The reader resets this position to
> > zero for each top-level form it reads.

> I apologize if this topic already reached its conclusion, but IMO
> having eq return true for two different object types is quite
> surprising behaviour.

We are comparing two symbols, both of which are 'foo, but one of which is
annotated with its position in a source file.  The two symbols are the
same symbol.

I understand the reaction to the idea, though.  Even though the
representation of these two objects is different, conceptually they are
the same object.

But consider: on a make bootstrap I did last night, there were 332
warning messages from the byte compiler.  Of these, only 80 gave the
correct line/column position, the other 252 being wrong.  There have been
several bug reports from users complaining about such false positions.
This is what I'm trying to fix.

> Is it out of the question to leave eq alone and introduce, e.g.,
> eq-excluding-position that strips possible positions before comparison?

It is, rather.  To implement this would involve rewriting everything
which calls eq and is used by the byte compiler, to call
eq-excluding-position instead.  These functions would need to exist in
two versions.  There are rather a lot of functions which use eq.  ;-)

My actual strategy is to have two versions of each C primitive used by
the byte compiler, and to switch over to the "symbol-with-position"
version at the start of the byte compiler.
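
One way to picture this strategy is a dispatch table whose entries are swapped for the duration of a compilation. The Python below is only a sketch under that assumption: the table PRIMITIVES, the byte_compiling context manager, and both eq variants are hypothetical names, not anything from the branch.

```python
import contextlib


class Symbol:
    def __init__(self, name):
        self.name = name


class SymbolWithPos:
    def __init__(self, sym, pos):
        self.sym = sym
        self.pos = pos


def bare(obj):
    """Strip a position annotation, if any."""
    return obj.sym if isinstance(obj, SymbolWithPos) else obj


def eq_plain(a, b):
    """Fast version: plain identity, no extra type check."""
    return a is b


def eq_with_pos(a, b):
    """Slower version: ignores position annotations."""
    return bare(a) is bare(b)


# Callers look primitives up here instead of calling a fixed function.
PRIMITIVES = {"eq": eq_plain}


@contextlib.contextmanager
def byte_compiling():
    """Switch to the position-aware primitives for the duration of the block."""
    saved = dict(PRIMITIVES)
    PRIMITIVES["eq"] = eq_with_pos
    try:
        yield
    finally:
        PRIMITIVES.clear()
        PRIMITIVES.update(saved)
```

The design intent this models is that ordinary code keeps paying only for the plain comparison, and the extra check is confined to the time spent inside the compiler.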

--
Alan Mackenzie (Nuremberg, Germany).

Comparing symbol-with-position using eq (was: Help please! To track down GC trying to free an already freed object.)

Alex Gramiak
Hello, Alan.

Alan Mackenzie <[hidden email]> writes:

> On Thu, Apr 04, 2019 at 22:49:22 -0600, Alex wrote:
>
>> I apologize if this topic already reached its conclusion, but IMO
>> having eq return true for two different object types is quite
>> surprising behaviour.
>
> We are comparing two symbols, both of which are 'foo, but one of which is
> annotated with its position in a source file.  The two symbols are the
> same symbol.

Is it not comparing a symbol with a pseudovector containing that symbol
and a position?

> I understand the reaction to the idea, though.  Even though the
> representation of these two objects is different, conceptually they are
> the same object.

Similar objects, but I don't believe that's enough for eq. Consider that
it's regarded non-portable in Lisp to compare integers with eq since the
same number may be represented by different objects, or (eq 3 3.0), or
(eq (list 1 2) (list 1 2)).

> But consider: on a make bootstrap I did last night, there were 332
> warning messages from the byte compiler.  Of these, only 80 gave the
> correct line/column position, the other 252 being wrong.  There have been
> several bug reports from users complaining about such false positions.
> This is what I'm trying to fix.

I agree that it's a problem very much worth fixing; thank you for
working on it.

>> Is it out of the question to leave eq alone and introduce, e.g.,
>> eq-excluding-position that strips possible positions before comparison?
>
> It is, rather.  To implement this would involve rewriting everything
> which calls eq and is used by the byte compiler, to call
> eq-excluding-position instead.  These functions would need to exist in
> two versions.  There are rather a lot of functions which use eq.  ;-)

Why would you need to rewrite the helper procedures that the byte
compiler uses? What about stripping the position at each relevant call
site?

Re: Comparing symbol-with-position using eq

Alan Mackenzie
Hello, Alex.

On Fri, Apr 05, 2019 at 11:05:59 -0600, Alex wrote:
> Hello, Alan.

> Alan Mackenzie <[hidden email]> writes:

> > On Thu, Apr 04, 2019 at 22:49:22 -0600, Alex wrote:

> >> I apologize if this topic already reached its conclusion, but IMO
> >> having eq return true for two different object types is quite
> >> surprising behaviour.

> > We are comparing two symbols, both of which are 'foo, but one of which is
> > annotated with its position in a source file.  The two symbols are the
> > same symbol.

> Is it not comparing a symbol with a pseudovector containing that symbol
> and a position?

At the machine code level, that is what it's doing, yes.

> > I understand the reaction to the idea, though.  Even though the
> > representation of these two objects is different, conceptually they are
> > the same object.

> Similar objects, but I don't believe that's enough for eq. Consider that
> it's regarded non-portable in Lisp to compare integers with eq since the
> same number may be represented by different objects, or (eq 3 3.0), or
> (eq (list 1 2) (list 1 2)).

The point is that comparing 'foo with (Symbol "foo" at 339) with `eq',
and returning t doesn't do any harm.  On the contrary, it enables correct
source positions to be output in byte compiler warning messages.  That it
does no harm is verified by the fact that a make bootstrap with such
annotated symbols works.

However, there is a slight slowdown in this Emacs, compared with the
master branch.  The powers that be have intimated that this slowdown is
unacceptable, so I'm having to make more far reaching changes in the C
code to confine this slowdown to byte compilation.

> > But consider: on a make bootstrap I did last night, there were 332
> > warning messages from the byte compiler.  Of these, only 80 gave the
> > correct line/column position, the other 252 being wrong.  There have been
> > several bug reports from users complaining about such false positions.
> > This is what I'm trying to fix.

> I agree that it's a problem very much worth fixing; thank you for
> working on it.

It's a difficult problem.  The idea of annotating symbols with a source
position (this was Stefan M.'s idea) is the only idea which has even come
close to solving this problem.  I was struggling with another approach
back in 2016 which involved keeping the source location in a hash table
indexed by the corresponding cons cell.  This effort collapsed from the
sheer tedium of the changes needed, coupled with the unlikelihood of
getting the changes working, to say nothing of the fact it would have
rendered the byte compiler unreadable.

> >> Is it out of the question to leave eq alone and introduce, e.g.,
> >> eq-excluding-position that strips possible positions before comparison?

> > It is, rather.  To implement this would involve rewriting everything
> > which calls eq and is used by the byte compiler, to call
> > eq-excluding-position instead.  These functions would need to exist in
> > two versions.  There are rather a lot of functions which use eq.  ;-)

> Why would you need to rewrite the helper procedures that the byte
> compiler uses? What about stripping the position at each relevant call
> site?

I'm not sure what you mean here.  If by "relevant call site" you mean
"places where `eq' is used", there are just too many of them.  They're in
the C code as well as the Lisp.  If you mean "places where the helper
procedures are called", then that stripping the positions would negate
the whole point of the symbols with positions, since it is these helper
procedures which output warning messages.

Or did you mean something else?

--
Alan Mackenzie (Nuremberg, Germany).
