Emacs port to gcc -fcheck-pointer-bounds

classic Classic list List threaded Threaded
15 messages Options
Reply | Threaded
Open this post in threaded view
|

Emacs port to gcc -fcheck-pointer-bounds

Paul Eggert
In <https://debbugs.gnu.org/29600> I published patches to port Emacs to the
-fcheck-pointer-bounds option of GCC, so that I can debug Emacs with hardware
pointer bounds checking on platforms that support it (such as the Kaby Lake chip
in my year-old laptop running Ubuntu 17.10). This entails changing the
fundamental Emacs internal word from an integer to a pointer of the same width -
which is not as big a deal as one might think, as the commonly-used EMACS_INT
type does not change and Emacs users and Emacs Lisp programmers should not
notice any change.

I would like to install these patches on 'master' soon, and am mentioning this
on emacs-devel to give a heads-up to the few but hardy volunteers who work on
the low-level part of the Emacs implementation.

Reply | Threaded
Open this post in threaded view
|

Re: Emacs port to gcc -fcheck-pointer-bounds

Eli Zaretskii
> From: Paul Eggert <[hidden email]>
> Date: Wed, 6 Dec 2017 23:34:54 -0800
> Cc: Pip Cet <[hidden email]>
>
> In <https://debbugs.gnu.org/29600> I published patches to port Emacs to the
> -fcheck-pointer-bounds option of GCC, so that I can debug Emacs with hardware
> pointer bounds checking on platforms that support it (such as the Kaby Lake chip
> in my year-old laptop running Ubuntu 17.10). This entails changing the
> fundamental Emacs internal word from an integer to a pointer of the same width -
> which is not as big a deal as one might think, as the commonly-used EMACS_INT
> type does not change and Emacs users and Emacs Lisp programmers should not
> notice any change.

Thanks.

It's a large patch, so I think the discussion could benefit from an
overview of the main points of the implementation.

In particular, how would this work in a build --with-wide-int?

Reply | Threaded
Open this post in threaded view
|

Re: Emacs port to gcc -fcheck-pointer-bounds

Pip Cet
In reply to this post by Paul Eggert
How does this work? Is sizeof(void *) == 24? (I don't have hardware
that supports this, unfortunately, and no software emulation appears
to be available).

I would suggest an explicit __bnd_check_pointer_bounds in
make_lisp_symbol until the GCC issue is fixed, as an out-of-bounds
symbol index seems a very real possibility for a bug. (And maybe
lisp.h should include ptr-bounds.h, as we'll probably need it in the
allocation functions there?)

The rest of the patch looks good to me, though it's unfortunate that
NIL_IS_ZERO's definition is becoming less futureproof (it's a minor
detail, but switching to a macro there might be better).

On Thu, Dec 7, 2017 at 7:34 AM, Paul Eggert <[hidden email]> wrote:

> In <https://debbugs.gnu.org/29600> I published patches to port Emacs to the
> -fcheck-pointer-bounds option of GCC, so that I can debug Emacs with
> hardware pointer bounds checking on platforms that support it (such as the
> Kaby Lake chip in my year-old laptop running Ubuntu 17.10). This entails
> changing the fundamental Emacs internal word from an integer to a pointer of
> the same width - which is not as big a deal as one might think, as the
> commonly-used EMACS_INT type does not change and Emacs users and Emacs Lisp
> programmers should not notice any change.
>
> I would like to install these patches on 'master' soon, and am mentioning
> this on emacs-devel to give a heads-up to the few but hardy volunteers who
> work on the low-level part of the Emacs implementation.

Reply | Threaded
Open this post in threaded view
|

Re: Emacs port to gcc -fcheck-pointer-bounds

Paul Eggert
In reply to this post by Eli Zaretskii
On 12/08/2017 05:45 AM, Eli Zaretskii wrote:
> It's a large patch, so I think the discussion could benefit from an
> overview of the main points of the implementation.
>
> In particular, how would this work in a build --with-wide-int?

The short answer is, "It doesn't"; this is enforced by an '# error
"pointer-checking not supported with wide integers"' in src/lisp.h. The
longer answer is that I could add support for it, but I doubt whether
it's worth the trouble.

The basic idea of -fcheck-pointer-bounds is that you cannot lie to GCC
that a pointer is an integer: if a value is intended to be a pointer you
must declare it to be a pointer. You are allowed to calculate an
out-of-range pointer so long as you don't use it, and you are allowed to
cast it to some other type or to void *. The hardware/software
combination checks the bounds some (but not all, alas) pointer
dereferencing operations. The idea is to catch the most common cases
that might be victimized.

The -mmpx implementation does not change the size of pointers: they are
still 8 bytes on x86-64 and are still 4 bytes on x86. Instead, the
hardware keeps a cache that maps addresses of pointers to the pointers'
bounds. The compiler generates code that deduces a pointer's bounds from
the cache when the program loads a pointer from memory. It's the
compiler's responsibility to keep track of the bounds of pointers that
it keeps in registers, so that when it stores these pointers the cache
can be updated. The reason for all this complexity is to support
programs where only some modules have been built with
-fcheck-pointer-bounds, and where pointers are passed back and forth
between modules.

The -mmpx implementation is jury-rigged, and I do not recommend it for
production: it is so slow and buggy that not a lot of people use it, and
quite possibly it will be withdrawn from GCC some day. However, for
debugging Emacs it is useful and I found a real bug in emacs-26 with it.
Unlike -fsanitize=address, -fcheck-pointer-bounds works with a dumped
Emacs. (Well, it works halfway: since the cache doesn't survive
undumping, the dumped Emacs cannot check pointers created before Emacs
was dumped.) In contrast, -fsanitize=address makes a dumped Emacs crash
immediately.


Reply | Threaded
Open this post in threaded view
|

Re: Emacs port to gcc -fcheck-pointer-bounds

Paul Eggert
In reply to this post by Pip Cet
On 12/08/2017 08:13 AM, Pip Cet wrote:
> I would suggest an explicit __bnd_check_pointer_bounds in
> make_lisp_symbol until the GCC issue is fixed, as an out-of-bounds
> symbol index seems a very real possibility for a bug.

Thanks, that's a good suggestion, and I'll give it a whirl. I did try it
on an earlier iteration and don't recall why I yanked it.


Reply | Threaded
Open this post in threaded view
|

Re: Emacs port to gcc -fcheck-pointer-bounds

Eli Zaretskii
In reply to this post by Paul Eggert
> From: Paul Eggert <[hidden email]>
> Date: Fri, 8 Dec 2017 14:06:25 -0800
> Cc: [hidden email], [hidden email]
>
> On 12/08/2017 05:45 AM, Eli Zaretskii wrote:
> > It's a large patch, so I think the discussion could benefit from an
> > overview of the main points of the implementation.
> >
> > In particular, how would this work in a build --with-wide-int?
>
> The short answer is, "It doesn't"

I suspected that much.  Too bad, but not a catastrophe.

What about the more general question I asked: can you say a few words
about the idea of your implementation of the support for
"-fcheck-pointer-bounds"?

Reply | Threaded
Open this post in threaded view
|

Re: Emacs port to gcc -fcheck-pointer-bounds

Paul Eggert
On 12/09/2017 12:33 AM, Eli Zaretskii wrote:
> can you say a few words
> about the idea of your implementation of the support for
> "-fcheck-pointer-bounds"?


Sure. Three basic points.

1. The Emacs C code should store pointer values only in objects declared to be
of type pointer. Otherwise, every time Emacs converted an integer to a pointer,
machine code generated by -fcheck-pointer-bounds would disable bounds checking
for that pointer (which would defeat the point of bounds checking).

This is what the first patch does. I like this patch anyway, since it cleans up
the Emacs internals a bit and it doesn't significantly affect performance in the
typical case where -fcheck-pointer-bounds is not used.

This first patch does not mean Emacs can't cast integers to pointers; that's OK.
Emacs just can't cast pointers to integers and back again and then dereference
the result and expect pointer-bounds checking to catch errors there.

2. With the 1st patch installed, building with -fcheck-pointer-bounds makes
Emacs crash due to some false alarms. A typical example is that Emacs takes two
individually valid but unrelated pointers P and Q, computes Q-P, and then later
dereferences by computing P[Q - P], which crashes because Q-P falls outside P's
bounds. The 2nd patch inserts the minimal changes to Emacs to avoid these
crashes, by widening P's bounds in such cases.

3. The downside of the 2nd patch is that pointer bounds are often made too wide,
so bounds checking won't catch some errors that it could easily catch. To fix
some of this, the 3rd patch tightens pointer bounds when that is easy. This
patch does not attempt to tighten bounds in all cases, as that would involve too
many changes to the code and would make bounds-checking even slower than it is.
It merely tightens bounds in a few strategic places, mostly in allocators, so
that bounds errors are likely to be caught. It's a cost/benefit guesswork where
I've tried to minimize development and runtime cost while maximizing
error-catching benefit.


Reply | Threaded
Open this post in threaded view
|

Re: Emacs port to gcc -fcheck-pointer-bounds

Eli Zaretskii
> From: Paul Eggert <[hidden email]>
> Cc: [hidden email], [hidden email]
> Date: Sat, 9 Dec 2017 23:10:43 -0800
>
> 1. The Emacs C code should store pointer values only in objects declared to be
> of type pointer. Otherwise, every time Emacs converted an integer to a pointer,
> machine code generated by -fcheck-pointer-bounds would disable bounds checking
> for that pointer (which would defeat the point of bounds checking).
>
> This is what the first patch does. I like this patch anyway, since it cleans up
> the Emacs internals a bit and it doesn't significantly affect performance in the
> typical case where -fcheck-pointer-bounds is not used.

But if we install this patch regardless of -fcheck-pointer-bounds,
then the --with-wide-int build will cease working, won't it?

Thanks for the other explanations.

Reply | Threaded
Open this post in threaded view
|

Re: Emacs port to gcc -fcheck-pointer-bounds

Paul Eggert
Eli Zaretskii wrote:
> But if we install this patch regardless of -fcheck-pointer-bounds,
> then the --with-wide-int build will cease working, won't it?

No, --with-wide-int still works fine, because it causes LISP_WORDS_ARE_POINTERS
to be false, so src/lisp.h falls back on using EMACS_INT for the basic Lisp
type, just as before.

Reply | Threaded
Open this post in threaded view
|

Re: Emacs port to gcc -fcheck-pointer-bounds

Eli Zaretskii
> Cc: [hidden email], [hidden email]
> From: Paul Eggert <[hidden email]>
> Date: Sun, 10 Dec 2017 23:54:59 -0800
>
> Eli Zaretskii wrote:
> > But if we install this patch regardless of -fcheck-pointer-bounds,
> > then the --with-wide-int build will cease working, won't it?
>
> No, --with-wide-int still works fine, because it causes LISP_WORDS_ARE_POINTERS
> to be false, so src/lisp.h falls back on using EMACS_INT for the basic Lisp
> type, just as before.

So we will have two different data types implementing a Lisp object,
one --with-wide-int and another without it?  That sounds suboptimal to
me.

Why not reserve the new representation to the -fcheck-pointer-bounds
builds only?  Such builds will anyway be used only by Emacs developers
(not even by all of them), and for those who do a different
representation won't be a significant obstacle.

Reply | Threaded
Open this post in threaded view
|

Re: Emacs port to gcc -fcheck-pointer-bounds

Paul Eggert
On 12/11/2017 07:26 AM, Eli Zaretskii wrote:
> So we will have two different data types implementing a Lisp object,
> one --with-wide-int and another without it?  That sounds suboptimal to
> me.

I figured out a simple fix to get it to work --with-wide-int (see first
attached patch), so I hope this issue is moot now, and boldly installed
the patches into master, along with the attached followup patches.

> Why not reserve the new representation to the -fcheck-pointer-bounds
> builds only?

Although we could change lisp.h to do that, I'd rather not. Having
Lisp_Object be an opaque pointer type is a win even for non-developers
who merely make minor changes to the C code and are not using
-fcheck-pointer-bounds or --enable-check-lisp-object-type or anything
like that. In practice it's error-prone when Lisp_Object is an integer
type, and if we can easily avoid these errors we should.

Here's another way to think about it. Our current practice of defaulting
to --enable-check-lisp-object-type for developers is an outgrowth of the
fact that integer Lisp_Objects are so error-prone. Unfortunately, this
practice is dicey in its own right, as it means developers are dealing
with different object code than non-developers. I would favor going back
to the old practice of disabling --enable-check-lisp-object-type by
default, even for developers, once we've shaken out the change that I
just installed. That way, developers and non-developers will default to
more-similar machine code.

0001-Port-fcheck-pointer-bounds-to-with-wide-int.patch (1K) Download Attachment
0002-Fix-recently-introduced-cast-typo.patch (995 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Emacs port to gcc -fcheck-pointer-bounds

Eli Zaretskii
> Cc: [hidden email], [hidden email]
> From: Paul Eggert <[hidden email]>
> Date: Tue, 12 Dec 2017 15:35:01 -0800
>
> Here's another way to think about it. Our current practice of defaulting
> to --enable-check-lisp-object-type for developers is an outgrowth of the
> fact that integer Lisp_Objects are so error-prone. Unfortunately, this
> practice is dicey in its own right, as it means developers are dealing
> with different object code than non-developers. I would favor going back
> to the old practice of disabling --enable-check-lisp-object-type by
> default, even for developers, once we've shaken out the change that I
> just installed. That way, developers and non-developers will default to
> more-similar machine code.

But using a (fake) pointer is only marginally safer than using an
integer, isn't it?

Reply | Threaded
Open this post in threaded view
|

Re: Emacs port to gcc -fcheck-pointer-bounds

Paul Eggert
On 12/13/2017 08:20 AM, Eli Zaretskii wrote:
> using a (fake) pointer is only marginally safer than using an
> integer, isn't it?

The fake pointer catches (at compile-time) common faults like the one
the attached patch fixes, where an int was passed where a Lisp_Object
was expected. These are the most important faults that
--enable-check-lisp-object-type catches.

We could say that the fake pointer is only marginally safer, in the
sense that --enable-check-lisp-object-type is only marginally safer than
--disable-check-lisp-object-type. However, this marginal safety is
useful; and once you have the fake pointer,
--enable-check-lisp-object-type doesn't buy much extra safety that is
useful.


0001-Fix-type-typo-on-Solaris.patch (821 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Emacs port to gcc -fcheck-pointer-bounds

Stefan Monnier
>> using a (fake) pointer is only marginally safer than using an
>> integer, isn't it?
> The fake pointer catches (at compile-time) common faults like the one the
> attached patch fixes, where an int was passed where a Lisp_Object was
> expected. These are the most important faults
> that --enable-check-lisp-object-type catches.

Indeed, it doesn't catch things like `x + n` since adding a constant to
a pointer is also a valid operation, but it does catch the vast majority
of problems.

> and once you have the fake pointer, --enable-check-lisp-object-type
> doesn't buy much extra safety that is useful.

It does give us some extra checking, but not very much, indeed.
Maybe we can turn it into a no-op.


        Stefan


Reply | Threaded
Open this post in threaded view
|

Re: Emacs port to gcc -fcheck-pointer-bounds

Paul Eggert
On 12/13/2017 11:17 AM, Stefan Monnier wrote:
>> The fake pointer catches (at compile-time) common faults like the one
>> the
>> attached patch fixes, where an int was passed where a Lisp_Object was
>> expected. These are the most important faults that
>> --enable-check-lisp-object-type catches.
> ... it doesn't catch things like `x + n` since adding a constant to a
> pointer is also a valid operation

Actually it catches even (x + n), because Lisp_Object is 'union Lisp_X
*', and the union type is deliberately incomplete so the C compiler does
not know its size and cannot multiply n by sizeof (union Lisp_X). The C
Standard requires a diagnostic for (x + n) and practice compilers
invariably issue at least a warning.

There are some things it doesn't catch. Most of these (e.g.,
'Lisp_Object x = 0;', or 'Lisp Object x = FOO, y = BAR; return x == y;')
are harmless annoyances. The only worrisome thing not caught is
converting between void * and Lisp_Object, e.g., 'Lisp_Object z = malloc
(n);'. However, to my mind it's overkill to
--enable-check-lisp-object-type by default just to catch this, as void *
is dangerous with every C pointer type and there's little extra harm to
making it dangerous with Lisp_Object too.

> Maybe we can turn it into a no-op.

Yes, that's my thought too.