Using __builtin_expect (likely/unlikely macros)

classic Classic list List threaded Threaded
27 messages Options
12
Reply | Threaded
Open this post in threaded view
|

Using __builtin_expect (likely/unlikely macros)

Alex Gramiak
Would using these, if available, be acceptable? I noticed that some code
in configure.ac already defines HAVE___BUILTIN_EXPECT, though I couldn't
see what part of autoconf does this.

Good candidates here include the conditionals around emacs_abort/fatal,
and possibly in cases like the *alloc procedures and xfree.

Even if there's no noticeable speedup, I think that the likely/unlikely
macros are a nice way to indicate that a branch is exceedingly
rare/common.

  #define unlikely(expr) __builtin_expect(!!(expr), 0)
  #define likely(expr) __builtin_expect(!!(expr), 1)

Reply | Threaded
Open this post in threaded view
|

Re: Using __builtin_expect (likely/unlikely macros)

Paul Eggert
Alex Gramiak wrote:
> the likely/unlikely
> macros are a nice way to indicate that a branch is exceedingly
> rare/common.

The cost (in making the C code harder to read, write and maintain) so often
exceeds that benefit that I'd rather avoid these macros unless there's a good
performance case for putting them in.

Reply | Threaded
Open this post in threaded view
|

Re: Using __builtin_expect (likely/unlikely macros)

Alex Gramiak
Paul Eggert <[hidden email]> writes:

> Alex Gramiak wrote:
>> the likely/unlikely
>> macros are a nice way to indicate that a branch is exceedingly
>> rare/common.
>
> The cost (in making the C code harder to read, write and maintain) so often
> exceeds that benefit that I'd rather avoid these macros unless there's a good
> performance case for putting them in.

I doubt that any performance benefit would be noticeable, at least
without some microbenchmarks that I don't have.

Though I don't think that these macros, used sparingly around code such
as emacs_abort, would make it harder to read/write/maintain. I'm
certainly not suggesting to throw them around without care.

Still, I understand the resistance to include them.

Reply | Threaded
Open this post in threaded view
|

Re: Using __builtin_expect (likely/unlikely macros)

Paul Eggert
Alex Gramiak wrote:
> Though I don't think that these macros, used sparingly around code such
> as emacs_abort, would make it harder to read/write/maintain.

These macro calls would not help near calls to emacs_abort, as it should already
be obvious to a careful human reader that the jump to emacs_abort is the road
less traveled. (That's also obvious to GCC, since emacs_abort is _Noreturn.)

Reply | Threaded
Open this post in threaded view
|

Re: Using __builtin_expect (likely/unlikely macros)

Alex Gramiak
Paul Eggert <[hidden email]> writes:

> Alex Gramiak wrote:
>> Though I don't think that these macros, used sparingly around code such
>> as emacs_abort, would make it harder to read/write/maintain.
>
> These macro calls would not help near calls to emacs_abort, as it should already
> be obvious to a careful human reader that the jump to emacs_abort is the road
> less traveled. (That's also obvious to GCC, since emacs_abort is _Noreturn.)

To human readers, yes, but from what I can tell, GCC is mixed on this.
When applying the following diff that surrounds emacs_abort in
bytecode.c and xdisp.c, the assembly in both applied/unapplied is the
same for bytecode.c, but not for xdisp.c (even without the LIKELY
cases). I tested using gcc -O2 -S.

Also, I tried putting a LIKELY in lisp_h_CHECK_TYPE (seems like a nice
place for one) and that yielded different assembly for fns.c, even
though wrong_type_argument is _Noreturn.


diff --git a/src/bytecode.c b/src/bytecode.c
index 40977799bf..ce1a7bd254 100644
--- a/src/bytecode.c
+++ b/src/bytecode.c
@@ -39,6 +39,7 @@ along with GNU Emacs.  If not, see <https://www.gnu.org/licenses/>.  */
 #ifndef BYTE_CODE_SAFE
 # define BYTE_CODE_SAFE false
 #endif
+#define unlikely(expr) (__builtin_expect (expr, 0))
 
 /* Define BYTE_CODE_METER to generate a byte-op usage histogram.  */
 /* #define BYTE_CODE_METER */
@@ -404,7 +405,7 @@ exec_byte_code (Lisp_Object bytestr, Lisp_Object vector, Lisp_Object maxdepth,
       int op;
       enum handlertype type;
 
-      if (BYTE_CODE_SAFE && ! (stack_base <= top && top < stack_lim))
+      if (unlikely (BYTE_CODE_SAFE && ! (stack_base <= top && top < stack_lim)))
  emacs_abort ();
 
 #ifdef BYTE_CODE_METER
@@ -664,9 +665,9 @@ exec_byte_code (Lisp_Object bytestr, Lisp_Object vector, Lisp_Object maxdepth,
  op_branch:
   op -= pc - bytestr_data;
  op_relative_branch:
-  if (BYTE_CODE_SAFE
+  if (unlikely (BYTE_CODE_SAFE
       && ! (bytestr_data - pc <= op
-    && op < bytestr_data + bytestr_length - pc))
+    && op < bytestr_data + bytestr_length - pc)))
     emacs_abort ();
   quitcounter += op < 0;
   if (!quitcounter)
@@ -1397,7 +1398,7 @@ exec_byte_code (Lisp_Object bytestr, Lisp_Object vector, Lisp_Object maxdepth,
        number of cases is less, which uses a simple vector for linear
        search as the jump table.  */
             Lisp_Object jmp_table = POP;
-    if (BYTE_CODE_SAFE && !HASH_TABLE_P (jmp_table))
+    if (unlikely (BYTE_CODE_SAFE && !HASH_TABLE_P (jmp_table)))
               emacs_abort ();
             Lisp_Object v1 = POP;
             ptrdiff_t i;
@@ -1426,7 +1427,7 @@ exec_byte_code (Lisp_Object bytestr, Lisp_Object vector, Lisp_Object maxdepth,
     if (i >= 0)
       {
  Lisp_Object val = HASH_VALUE (h, i);
- if (BYTE_CODE_SAFE && !FIXNUMP (val))
+ if (unlikely (BYTE_CODE_SAFE && !FIXNUMP (val)))
   emacs_abort ();
  op = XFIXNUM (val);
  goto op_branch;
@@ -1436,8 +1437,8 @@ exec_byte_code (Lisp_Object bytestr, Lisp_Object vector, Lisp_Object maxdepth,
 
  CASE_DEFAULT
  CASE (Bconstant):
-  if (BYTE_CODE_SAFE
-      && ! (Bconstant <= op && op < Bconstant + const_length))
+  if (unlikely (BYTE_CODE_SAFE
+                        && ! (Bconstant <= op && op < Bconstant + const_length)))
     emacs_abort ();
   PUSH (vectorp[op - Bconstant]);
   NEXT;
diff --git a/src/xdisp.c b/src/xdisp.c
index a88fc698b8..9872f69cb0 100644
--- a/src/xdisp.c
+++ b/src/xdisp.c
@@ -344,7 +344,8 @@ along with GNU Emacs.  If not, see <https://www.gnu.org/licenses/>.  */
 
 /* Holds the list (error).  */
 static Lisp_Object list_of_error;
-
+#define UNLIKELY(expr) (__builtin_expect (expr, 0))
+#define LIKELY(expr) (__builtin_expect (expr, 1))
 #ifdef HAVE_WINDOW_SYSTEM
 
 /* Test if overflow newline into fringe.  Called with iterator IT
@@ -8322,7 +8323,7 @@ compute_stop_pos_backwards (struct it *it)
       reseat_1 (it, pos, false);
       compute_stop_pos (it);
       /* We must advance forward, right?  */
-      if (it->stop_charpos <= charpos)
+      if (UNLIKELY (it->stop_charpos <= charpos))
  emacs_abort ();
     }
   while (charpos > BEGV && it->stop_charpos >= it->end_charpos);
@@ -8371,7 +8372,7 @@ handle_stop_backwards (struct it *it, ptrdiff_t charpos)
  it->current.string_pos = string_pos (charpos, it->string);
       compute_stop_pos (it);
       /* We must advance forward, right?  */
-      if (it->stop_charpos <= it->prev_stop)
+      if (UNLIKELY (it->stop_charpos <= it->prev_stop))
  emacs_abort ();
       charpos = it->stop_charpos;
     }
@@ -11443,7 +11444,7 @@ pop_message_unwind (void)
 void
 check_message_stack (void)
 {
-  if (!NILP (Vmessage_stack))
+  if (UNLIKELY (!NILP (Vmessage_stack)))
     emacs_abort ();
 }
 
@@ -15495,7 +15496,7 @@ set_cursor_from_row (struct window *w, struct glyph_row *row,
       /* Need to compute x that corresponds to GLYPH.  */
       for (g = row->glyphs[TEXT_AREA], x = row->x; g < glyph; g++)
  {
-  if (g >= row->glyphs[TEXT_AREA] + row->used[TEXT_AREA])
+  if (UNLIKELY (g >= row->glyphs[TEXT_AREA] + row->used[TEXT_AREA]))
     emacs_abort ();
   x += g->pixel_width;
  }
@@ -16827,9 +16828,9 @@ redisplay_window (Lisp_Object window, bool just_this_one_p)
 
   /* Some sanity checks.  */
   CHECK_WINDOW_END (w);
-  if (Z == Z_BYTE && CHARPOS (opoint) != BYTEPOS (opoint))
+  if (UNLIKELY (Z == Z_BYTE && CHARPOS (opoint) != BYTEPOS (opoint)))
     emacs_abort ();
-  if (BYTEPOS (opoint) < CHARPOS (opoint))
+  if (UNLIKELY (BYTEPOS (opoint) < CHARPOS (opoint)))
     emacs_abort ();
 
   if (mode_line_update_needed (w))
@@ -19318,9 +19319,9 @@ try_window_id (struct window *w)
       adjust_window_ends (w, last_text_row, false);
       eassert (w->window_end_bytepos >= 0);
     }
-  else if (first_unchanged_at_end_row == NULL
-   && last_text_row == NULL
-   && last_text_row_at_end == NULL)
+  else if (LIKELY (first_unchanged_at_end_row == NULL
+                   && last_text_row == NULL
+                   && last_text_row_at_end == NULL))
     {
       /* Displayed to end of window, but no line containing text was
  displayed.  Lines were deleted at the end of the window.  */
@@ -21015,7 +21016,7 @@ find_row_edges (struct it *it, struct glyph_row *row,
    which puts the iterator at the beginning of the next line, in
    the logical order. */
  row->maxpos = it->current.pos;
-      else if (max_pos == min_pos && it->method != GET_FROM_BUFFER)
+      else if (LIKELY (max_pos == min_pos && it->method != GET_FROM_BUFFER))
  /* A line that is entirely from a string/image/stretch...  */
  row->maxpos = row->minpos;
       else
@@ -25210,7 +25211,7 @@ display_string (const char *string, Lisp_Object lisp_string, Lisp_Object face_st
  }
       break;
     }
-  else if (x + glyph->pixel_width >= it->first_visible_x)
+  else if (LIKELY (x + glyph->pixel_width >= it->first_visible_x))
     {
       /* Glyph is at least partially visible.  */
       ++it->hpos;
@@ -27847,7 +27848,7 @@ produce_special_glyphs (struct it *it, enum display_element_type what)
   spec_glyph_lookup_face (XWINDOW (it->window), &glyph);
  }
     }
-  else if (what == IT_TRUNCATION)
+  else if (LIKELY (what == IT_TRUNCATION))
     {
       /* Truncation glyph.  */
       SET_GLYPH_FROM_CHAR (glyph, '$');
Reply | Threaded
Open this post in threaded view
|

Re: Using __builtin_expect (likely/unlikely macros)

Eli Zaretskii
> From: Alex Gramiak <[hidden email]>
> Date: Mon, 15 Apr 2019 18:16:02 -0600
> Cc: [hidden email]
>
> To human readers, yes, but from what I can tell, GCC is mixed on this.
> When applying the following diff that surrounds emacs_abort in
> bytecode.c and xdisp.c, the assembly in both applied/unapplied is the
> same for bytecode.c, but not for xdisp.c (even without the LIKELY
> cases). I tested using gcc -O2 -S.

I'm not sure whether you put LIKELY and UNLIKELY somewhat randomly,
just for testing, or did you really think each place is
likely/unlikely as in the change, but at least some places in xdisp.c
are wrong: they use LIKELY where UNLIKELY is more appropriate, abd
vice versa.

Reply | Threaded
Open this post in threaded view
|

Re: Using __builtin_expect (likely/unlikely macros)

Paul Eggert
In reply to this post by Alex Gramiak
Alex Gramiak wrote:
>> These macro calls would not help near calls to emacs_abort, as it should already
>> be obvious to a careful human reader that the jump to emacs_abort is the road
>> less traveled. (That's also obvious to GCC, since emacs_abort is _Noreturn.)

> To human readers, yes, but from what I can tell, GCC is mixed on this.

Then we should fix GCC, if the code it generates has a performance problem
(whatever it is, it's quite small). That'd be better than littering the Emacs
source code with __builtin_expect or UNLIKELY calls. The GCC manual recommends
against manually inserting such calls; performance nerds are supposed to use
-fprofile-arcs instead. In my experience the calls are typically more trouble
than they're worth.

Reply | Threaded
Open this post in threaded view
|

Re: Using __builtin_expect (likely/unlikely macros)

Alex Gramiak
In reply to this post by Eli Zaretskii
Eli Zaretskii <[hidden email]> writes:

> I'm not sure whether you put LIKELY and UNLIKELY somewhat randomly,
> just for testing, or did you really think each place is
> likely/unlikely as in the change, but at least some places in xdisp.c
> are wrong: they use LIKELY where UNLIKELY is more appropriate, abd
> vice versa.

Out of curiosity, which places would that be? I only put UNLIKELY calls
around the checks for emacs_abort (which I presume to be unlikely), and
the LIKELY calls are of the form:

  else if (LIKELY (<cond>))
    {
      ...
    }
  else
    emacs_abort ();

So if already at the "else if" clause, it's likely that the condition is
true, otherwise an abort would take place.

Reply | Threaded
Open this post in threaded view
|

Re: Using __builtin_expect (likely/unlikely macros)

Stefan Monnier
In reply to this post by Paul Eggert
>>> These macro calls would not help near calls to emacs_abort, as it
>>> should already be obvious to a careful human reader that the jump to
>>> emacs_abort is the road less traveled. (That's also obvious to GCC,
>>> since emacs_abort is _Noreturn.)
>> To human readers, yes, but from what I can tell, GCC is mixed on this.
> Then we should fix GCC, if the code it generates has a performance problem
> (whatever it is, it's quite small).

FWIW, the fact that a function is "_Noreturn" doesn't necessarily mean
that a call to it is unlikely (in many cases it is, I guess, but
definitely not all), so maybe GCC maintainers consciously decided not to
link the two.

> The GCC manual recommends against manually inserting such calls;

It's likely based on some past experiments that showed programmers
aren't very good at understanding what is likely and what isn't in
their code.

BTW I think instead of marking branches as likely or unlikely, I'd
prefer to tell GCC that some functions "should be slow"
(e.g. emacs_abort) so it optimizes the code paths that don't go through
those functions to the detriment of those that do.


        Stefan


Reply | Threaded
Open this post in threaded view
|

Re: Using __builtin_expect (likely/unlikely macros)

Paul Eggert
Stefan Monnier wrote:

> FWIW, the fact that a function is "_Noreturn" doesn't necessarily mean
> that a call to it is unlikely (in many cases it is, I guess, but
> definitely not all), so maybe GCC maintainers consciously decided not to
> link the two.

Possibly they did. In hindsight I'd argue that was a mistake. If one has no
other evidence about the likelihood of a branch, a branch to a _Noreturn call
should default to being unlikely.

> BTW I think instead of marking branches as likely or unlikely, I'd
> prefer to tell GCC that some functions "should be slow"
> (e.g. emacs_abort) so it optimizes the code paths that don't go through
> those functions to the detriment of those that do.

GCC has the function attribute 'cold' for that. This is less intrusive than
__builtin_expect and so would be preferable. Still, the GCC manual says that
__attribute__ ((cold)) is ignored when profile feedback is available, which is
another indication that people who care about performance should be using
-fprofile-use etc. And as far as I know __attribute__ ((cold)) is rarely used:
even glibc uses it only once, in obscure code never used on GNU/Linux.
Presumably this is partly because the attribute didn't exist until about five
years ago.

Reply | Threaded
Open this post in threaded view
|

Re: Using __builtin_expect (likely/unlikely macros)

Eli Zaretskii
In reply to this post by Alex Gramiak
> From: Alex Gramiak <[hidden email]>
> Cc: [hidden email],  [hidden email]
> Date: Mon, 15 Apr 2019 23:33:51 -0600
>
> Eli Zaretskii <[hidden email]> writes:
>
> > I'm not sure whether you put LIKELY and UNLIKELY somewhat randomly,
> > just for testing, or did you really think each place is
> > likely/unlikely as in the change, but at least some places in xdisp.c
> > are wrong: they use LIKELY where UNLIKELY is more appropriate, abd
> > vice versa.
>
> Out of curiosity, which places would that be? I only put UNLIKELY calls
> around the checks for emacs_abort (which I presume to be unlikely), and
> the LIKELY calls are of the form:
>
>   else if (LIKELY (<cond>))
>     {
>       ...
>     }
>   else
>     emacs_abort ();

OK, it's possible that I don't understand the exact semantics of
__builtin_expect in these situations.

The problem is that you used LIKELY like this:

  if (A)
    do_A;
  else if (B)
    do_B;
  else if (C)
    do_C;
  else if (LIKELY (D))
    do_D;
  else
    cant_happen ();

Essentially, the above is a moral equivalent of a 'switch' with the
'default' case aborting because it "cannot happen".  In such code, the
order of the clauses doesn't necessarily tell anything about their
likelihood; up front, they all are equally "likely".  So using LIKELY
only in the last one sends a wrong signal: that last condition is
neither more nor less likely than all the others.  Actually, in some
cases it might be _less_ likely than the preceding ones, because if I
knew that some of these conditions happens much more frequently, I'd
test it first.

Now, it's possible that the effect of __builtin_expect doesn't care
about this issue.  The GCC manual doesn't help to figure out whether
this is the case, because it only talks about a simple case of a
single 'if' clause, and doesn't tell any details about what GCC is
allowed to do when it sees __builtin_expect.  But just by looking at
how the code looks, I immediately raised a brow.

Reply | Threaded
Open this post in threaded view
|

Re: Using __builtin_expect (likely/unlikely macros)

Alex Gramiak
Eli Zaretskii <[hidden email]> writes:

> OK, it's possible that I don't understand the exact semantics of
> __builtin_expect in these situations.
>
> The problem is that you used LIKELY like this:
>
>   if (A)
>     do_A;
>   else if (B)
>     do_B;
>   else if (C)
>     do_C;
>   else if (LIKELY (D))
>     do_D;
>   else
>     cant_happen ();
>
> Essentially, the above is a moral equivalent of a 'switch' with the
> 'default' case aborting because it "cannot happen".  In such code, the
> order of the clauses doesn't necessarily tell anything about their
> likelihood; up front, they all are equally "likely".  So using LIKELY
> only in the last one sends a wrong signal: that last condition is
> neither more nor less likely than all the others.  Actually, in some
> cases it might be _less_ likely than the preceding ones, because if I
> knew that some of these conditions happens much more frequently, I'd
> test it first.

It was my understanding that since an else if is equivalent to else { if
... }, it would only affect the last two branches. Though I could easily
be wrong here.

> Now, it's possible that the effect of __builtin_expect doesn't care
> about this issue.  The GCC manual doesn't help to figure out whether
> this is the case, because it only talks about a simple case of a
> single 'if' clause, and doesn't tell any details about what GCC is
> allowed to do when it sees __builtin_expect.  But just by looking at
> how the code looks, I immediately raised a brow.

Right, considering the confusion it would be counterproductive to use
them in this fashion. A workaround to the confusion would be to do:

  else
    {
      if (LIKELY (D))
        do_D;
      else
        cant_happen ();
    }

Reply | Threaded
Open this post in threaded view
|

Re: Using __builtin_expect (likely/unlikely macros)

Alex Gramiak
In reply to this post by Paul Eggert
Paul Eggert <[hidden email]> writes:

>> BTW I think instead of marking branches as likely or unlikely, I'd
>> prefer to tell GCC that some functions "should be slow"
>> (e.g. emacs_abort) so it optimizes the code paths that don't go through
>> those functions to the detriment of those that do.
>
> GCC has the function attribute 'cold' for that. This is less intrusive than
> __builtin_expect and so would be preferable. Still, the GCC manual says that
> __attribute__ ((cold)) is ignored when profile feedback is available, which is
> another indication that people who care about performance should be using
> -fprofile-use etc. And as far as I know __attribute__ ((cold)) is rarely used:
> even glibc uses it only once, in obscure code never used on GNU/Linux.
> Presumably this is partly because the attribute didn't exist until about five
> years ago.

It would still be good to use it for default builds, no? The GCC
documentation states:

  It is thus useful to mark functions used to handle unlikely
  conditions, such as perror, as cold to improve optimization of hot
  functions that do call marked functions in rare occasions.

So the error/signal calls, wrong_type_argument, etc. would be good
places for this. It doesn't indicate anything to the programmer at call
sites, but that point is controversial, so slapping a few attributes
down seems like a better way to go.

Reply | Threaded
Open this post in threaded view
|

Re: Using __builtin_expect (likely/unlikely macros)

Paul Eggert
On 4/16/19 9:10 AM, Alex Gramiak wrote:
> It would still be good to use it for default builds, no? The GCC
> documentation states:
>
>   It is thus useful to mark functions used to handle unlikely
>   conditions, such as perror, as cold to improve optimization of hot
>   functions that do call marked functions in rare occasions.

That argument would be stronger if functions like 'perror' actually used
__attribute__ ((cold)), which they don't (at least, not in GNU systems).

There is a cost and a benefit to adding these attributes. The benefit is
that they should improve runtime performance very slightly, and (more
important) they should help nonexpert human readers know that a function
is rarely called. The cost is that they make the code harder to
maintain, thus placing a burden on maintainers. Perhaps I'm biased as I
would bear the cost and get none of the benefit; still, the overall
cost-benefit ratio doesn't look all that favorable for Emacs.

That being said, it might make sense for a few obviously-rarely-called
functions like 'emacs-abort' to be marked with __attribute__ ((cold)),
so long as we don't turn this into a mission to mark all cold functions
(which would cost us more than it would benefit). That is what GCC
itself does, with its own functions. However, I'd like to see
performance figures. Could you try it out on the benchmark of 'cd lisp
&& time make compile-always'?

Reply | Threaded
Open this post in threaded view
|

Re: Using __builtin_expect (likely/unlikely macros)

Alex Gramiak
Paul Eggert <[hidden email]> writes:

> That being said, it might make sense for a few obviously-rarely-called
> functions like 'emacs-abort' to be marked with __attribute__ ((cold)),
> so long as we don't turn this into a mission to mark all cold functions
> (which would cost us more than it would benefit). That is what GCC
> itself does, with its own functions. However, I'd like to see
> performance figures. Could you try it out on the benchmark of 'cd lisp
> && time make compile-always'?

Right, I agree that if used, they should be used sparingly. I tested
three versions a few times each with both 'make' and 'make -j4':

a) Regular Emacs master.
b) The below diff with only the _Cold attribute
c) The below diff with both _Cold and _Hot attributes

a) Normal
real    4:17.97s
user    3:57.18s
sys     20.394s

real    1:17.67s
user    4:23.78s
sys     18.888s

b) Cold
real    4:10.92s
user    3:50.34s
sys     20.178s

real    1:15.77s
user    4:16.73s
sys     18.943s

c) Hot/Cold
real    4:11.43s
user    3:51.07s
sys     19.961s

real    1:16.01s
user    4:17.63s
sys     18.662s

So not much of a difference. For some reason the Hot/Cold performed
consistently worse than Cold.

I also tested startup/shutdown with perf:

 Performance counter stats for '../emacs-normal -f kill-emacs' (20 runs):

            762.17 msec task-clock:u              #    0.844 CPUs utilized            ( +-  0.23% )
                 0      context-switches:u        #    0.000 K/sec                  
                 0      cpu-migrations:u          #    0.000 K/sec                  
            12,941      page-faults:u             #    0.017 M/sec                    ( +-  0.01% )
     2,998,322,125      cycles:u                  #    3.934 GHz                      ( +-  0.06% )
     1,392,869,413      stalled-cycles-frontend:u #   46.45% frontend cycles idle     ( +-  0.15% )
       982,206,843      stalled-cycles-backend:u  #   32.76% backend cycles idle      ( +-  0.18% )
     4,874,186,825      instructions:u            #    1.63  insn per cycle        
                                                  #    0.29  stalled cycles per insn  ( +-  0.01% )
     1,037,929,374      branches:u                # 1361.802 M/sec                    ( +-  0.01% )
        17,930,471      branch-misses:u           #    1.73% of all branches          ( +-  0.16% )
     1,209,539,215      L1-dcache-loads:u         # 1586.960 M/sec                    ( +-  0.01% )
        42,346,229      L1-dcache-load-misses:u   #    3.50% of all L1-dcache hits    ( +-  0.05% )
         9,088,647      LLC-loads:u               #   11.925 M/sec                    ( +-  0.29% )
   <not supported>      LLC-load-misses:u                                          

           0.90325 +- 0.00441 seconds time elapsed  ( +-  0.49% )



 Performance counter stats for '../emacs.cold -f kill-emacs' (20 runs):

            755.94 msec task-clock:u              #    0.845 CPUs utilized            ( +-  0.24% )
                 0      context-switches:u        #    0.000 K/sec                  
                 0      cpu-migrations:u          #    0.000 K/sec                  
            12,941      page-faults:u             #    0.017 M/sec                    ( +-  0.01% )
     2,976,036,365      cycles:u                  #    3.937 GHz                      ( +-  0.06% )
     1,374,451,779      stalled-cycles-frontend:u #   46.18% frontend cycles idle     ( +-  0.14% )
       990,227,732      stalled-cycles-backend:u  #   33.27% backend cycles idle      ( +-  0.18% )
     4,878,661,927      instructions:u            #    1.64  insn per cycle        
                                                  #    0.28  stalled cycles per insn  ( +-  0.00% )
     1,038,495,525      branches:u                # 1373.782 M/sec                    ( +-  0.00% )
        17,859,906      branch-misses:u           #    1.72% of all branches          ( +-  0.16% )
     1,209,345,531      L1-dcache-loads:u         # 1599.792 M/sec                    ( +-  0.00% )
        42,444,358      L1-dcache-load-misses:u   #    3.51% of all L1-dcache hits    ( +-  0.06% )
         9,204,368      LLC-loads:u               #   12.176 M/sec                    ( +-  0.41% )
   <not supported>      LLC-load-misses:u                                          

           0.89430 +- 0.00217 seconds time elapsed  ( +-  0.24% )


 Performance counter stats for '../emacs.hot-cold -f kill-emacs' (20 runs):

            761.97 msec task-clock:u              #    0.845 CPUs utilized            ( +-  0.20% )
                 0      context-switches:u        #    0.000 K/sec                  
                 0      cpu-migrations:u          #    0.000 K/sec                  
            12,947      page-faults:u             #    0.017 M/sec                    ( +-  0.01% )
     2,989,750,359      cycles:u                  #    3.924 GHz                      ( +-  0.04% )
     1,383,312,275      stalled-cycles-frontend:u #   46.27% frontend cycles idle     ( +-  0.12% )
       994,643,853      stalled-cycles-backend:u  #   33.27% backend cycles idle      ( +-  0.13% )
     4,879,318,990      instructions:u            #    1.63  insn per cycle        
                                                  #    0.28  stalled cycles per insn  ( +-  0.00% )
     1,038,584,045      branches:u                # 1363.022 M/sec                    ( +-  0.00% )
        17,863,736      branch-misses:u           #    1.72% of all branches          ( +-  0.13% )
     1,209,327,347      L1-dcache-loads:u         # 1587.103 M/sec                    ( +-  0.00% )
        42,501,374      L1-dcache-load-misses:u   #    3.51% of all L1-dcache hits    ( +-  0.05% )
         9,201,311      LLC-loads:u               #   12.076 M/sec                    ( +-  0.28% )
   <not supported>      LLC-load-misses:u                                          

           0.90132 +- 0.00201 seconds time elapsed  ( +-  0.22% )


Which again shows a slight improvement with the Cold attributes, and
still shows the hot attributes degrading performance. Perhaps I was too
overzealous with the hot tagging?


hot-cold.diff (17K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Using __builtin_expect (likely/unlikely macros)

Alex Gramiak
Alex Gramiak <[hidden email]> writes:
> Perhaps I was too overzealous with the hot tagging?

I definitely was, since I removed the _Hot attributes from the
*malloc/free procedures, which turned out to be in bad judgment.
Removing those yielded:

 Performance counter stats for 'src/emacs.hot-cold -f kill-emacs' (20 runs):

            751.22 msec task-clock:u              #    0.841 CPUs utilized            ( +-  0.14% )
                 0      context-switches:u        #    0.000 K/sec                  
                 0      cpu-migrations:u          #    0.000 K/sec                  
            12,928      page-faults:u             #    0.017 M/sec                    ( +-  0.02% )
     2,959,142,867      cycles:u                  #    3.939 GHz                      ( +-  0.05% )
     1,350,640,657      stalled-cycles-frontend:u #   45.64% frontend cycles idle     ( +-  0.12% )
       972,612,724      stalled-cycles-backend:u  #   32.87% backend cycles idle      ( +-  0.15% )
     4,865,119,525      instructions:u            #    1.64  insn per cycle        
                                                  #    0.28  stalled cycles per insn  ( +-  0.00% )
     1,035,582,401      branches:u                # 1378.539 M/sec                    ( +-  0.00% )
        17,794,068      branch-misses:u           #    1.72% of all branches          ( +-  0.14% )
     1,206,398,515      L1-dcache-loads:u         # 1605.925 M/sec                    ( +-  0.00% )
        42,095,141      L1-dcache-load-misses:u   #    3.49% of all L1-dcache hits    ( +-  0.05% )
         9,057,830      LLC-loads:u               #   12.058 M/sec                    ( +-  0.39% )
   <not supported>      LLC-load-misses:u                                          

           0.89309 +- 0.00484 seconds time elapsed  ( +-  0.54% )

I also realized that I completely forgot putting the attribute on
emacs_abort. With the _Cold attribute on emacs_abort:


 Performance counter stats for 'src/emacs -f kill-emacs' (20 runs):

            760.73 msec task-clock:u              #    0.846 CPUs utilized            ( +-  0.22% )
                 0      context-switches:u        #    0.000 K/sec                  
                 0      cpu-migrations:u          #    0.000 K/sec                  
            12,925      page-faults:u             #    0.017 M/sec                    ( +-  0.02% )
     2,991,729,656      cycles:u                  #    3.933 GHz                      ( +-  0.04% )
     1,388,332,047      stalled-cycles-frontend:u #   46.41% frontend cycles idle     ( +-  0.13% )
       976,840,303      stalled-cycles-backend:u  #   32.65% backend cycles idle      ( +-  0.17% )
     4,867,077,504      instructions:u            #    1.63  insn per cycle        
                                                  #    0.29  stalled cycles per insn  ( +-  0.00% )
     1,036,158,051      branches:u                # 1362.059 M/sec                    ( +-  0.00% )
        17,860,346      branch-misses:u           #    1.72% of all branches          ( +-  0.19% )
     1,207,924,887      L1-dcache-loads:u         # 1587.852 M/sec                    ( +-  0.00% )
        42,358,754      L1-dcache-load-misses:u   #    3.51% of all L1-dcache hits    ( +-  0.06% )
         9,046,859      LLC-loads:u               #   11.892 M/sec                    ( +-  0.32% )
   <not supported>      LLC-load-misses:u                                          

           0.89896 +- 0.00200 seconds time elapsed  ( +-  0.22% )


 Performance counter stats for 'src/emacs.cold -f kill-emacs' (20 runs):

            755.78 msec task-clock:u              #    0.845 CPUs utilized            ( +-  0.18% )
                 0      context-switches:u        #    0.000 K/sec                  
                 0      cpu-migrations:u          #    0.000 K/sec                  
            12,929      page-faults:u             #    0.017 M/sec                    ( +-  0.02% )
     2,978,556,838      cycles:u                  #    3.941 GHz                      ( +-  0.05% )
     1,370,387,120      stalled-cycles-frontend:u #   46.01% frontend cycles idle     ( +-  0.12% )
       978,514,384      stalled-cycles-backend:u  #   32.85% backend cycles idle      ( +-  0.16% )
     4,866,672,758      instructions:u            #    1.63  insn per cycle        
                                                  #    0.28  stalled cycles per insn  ( +-  0.00% )
     1,035,997,172      branches:u                # 1370.762 M/sec                    ( +-  0.00% )
        17,838,674      branch-misses:u           #    1.72% of all branches          ( +-  0.13% )
     1,206,937,944      L1-dcache-loads:u         # 1596.939 M/sec                    ( +-  0.00% )
        42,110,067      L1-dcache-load-misses:u   #    3.49% of all L1-dcache hits    ( +-  0.05% )
         9,088,714      LLC-loads:u               #   12.026 M/sec                    ( +-  0.28% )
   <not supported>      LLC-load-misses:u                                          

           0.89401 +- 0.00185 seconds time elapsed  ( +-  0.21% )



 Performance counter stats for 'src/emacs.hot-cold -f kill-emacs' (20 runs):

            752.56 msec task-clock:u              #    0.846 CPUs utilized            ( +-  0.18% )
                 0      context-switches:u        #    0.000 K/sec                  
                 0      cpu-migrations:u          #    0.000 K/sec                  
            12,923      page-faults:u             #    0.017 M/sec                    ( +-  0.01% )
     2,973,502,618      cycles:u                  #    3.951 GHz                      ( +-  0.05% )
     1,368,004,926      stalled-cycles-frontend:u #   46.01% frontend cycles idle     ( +-  0.11% )
       974,077,949      stalled-cycles-backend:u  #   32.76% backend cycles idle      ( +-  0.13% )
     4,865,128,800      instructions:u            #    1.64  insn per cycle        
                                                  #    0.28  stalled cycles per insn  ( +-  0.00% )
     1,035,577,546      branches:u                # 1376.070 M/sec                    ( +-  0.00% )
        17,721,035      branch-misses:u           #    1.71% of all branches          ( +-  0.17% )
     1,206,420,627      L1-dcache-loads:u         # 1603.086 M/sec                    ( +-  0.00% )
        42,129,928      L1-dcache-load-misses:u   #    3.49% of all L1-dcache hits    ( +-  0.04% )
         9,033,444      LLC-loads:u               #   12.004 M/sec                    ( +-  0.40% )
   <not supported>      LLC-load-misses:u                                          

           0.88928 +- 0.00161 seconds time elapsed  ( +-  0.18% )



hot-cold.diff (16K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Using __builtin_expect (likely/unlikely macros)

Stefan Monnier
In reply to this post by Alex Gramiak
> Which again shows a slight improvement with the Cold attributes, and
> still shows the hot attributes degrading performance.  Perhaps I was too
> overzealous with the hot tagging?

I think it's pretty tricky to decide which functions should be "hot", so
I'm not surprised you get worse results there: That matches the past
experience with programmer-annotated likelihood ;-)

For "cold" I think the meaning is pretty clear: assume this code is
almost never executed which can both mean that functions that call it
can be optimized under the assumption that those calls are unlikely, and
at the same time spend less time optimizing the function (and move it
"out of the way" to a different page).

But for "hot", it's not clear whether it means "called very often" or
"we spend a lot of time inside of it".  E.g. it would never have
occurred to me to mark  redisplay_internal as "hot".


        Stefan

Reply | Threaded
Open this post in threaded view
|

Re: Using __builtin_expect (likely/unlikely macros)

Konstantin Kharlamov
In reply to this post by Alex Gramiak
FWIW I was in a similar search not so long ago, and I was told that
e.g. "cold" attribute can sometimes produce unbearably slow code
https://gcc.gnu.org/ml/gcc-help/2019-01/msg00035.html

В Вт, апр 16, 2019 at 14:50, Alex Gramiak <[hidden email]>
написал:

> Paul Eggert <[hidden email]> writes:
>
>>  That being said, it might make sense for a few
>> obviously-rarely-called
>>  functions like 'emacs-abort' to be marked with __attribute__
>> ((cold)),
>>  so long as we don't turn this into a mission to mark all cold
>> functions
>>  (which would cost us more than it would benefit). That is what GCC
>>  itself does, with its own functions. However, I'd like to see
>>  performance figures. Could you try it out on the benchmark of 'cd
>> lisp
>>  && time make compile-always'?
>
> Right, I agree that if used, they should be used sparingly. I tested
> three versions a few times each with both 'make' and 'make -j4':
>
> a) Regular Emacs master.
> b) The below diff with only the _Cold attribute
> c) The below diff with both _Cold and _Hot attributes
>
> a) Normal
> real    4:17.97s
> user    3:57.18s
> sys     20.394s
>
> real    1:17.67s
> user    4:23.78s
> sys     18.888s
>
> b) Cold
> real    4:10.92s
> user    3:50.34s
> sys     20.178s
>
> real    1:15.77s
> user    4:16.73s
> sys     18.943s
>
> c) Hot/Cold
> real    4:11.43s
> user    3:51.07s
> sys     19.961s
>
> real    1:16.01s
> user    4:17.63s
> sys     18.662s
>
> So not much of a difference. For some reason the Hot/Cold performed
> consistently worse than Cold.
>
> I also tested startup/shutdown with perf:
>
>  Performance counter stats for '../emacs-normal -f kill-emacs' (20
> runs):
>
>             762.17 msec task-clock:u              #    0.844 CPUs
> utilized            ( +-  0.23% )
>                  0      context-switches:u        #    0.000 K/sec
>                  0      cpu-migrations:u          #    0.000 K/sec
>             12,941      page-faults:u             #    0.017 M/sec    
>                 ( +-  0.01% )
>      2,998,322,125      cycles:u                  #    3.934 GHz      
>                 ( +-  0.06% )
>      1,392,869,413      stalled-cycles-frontend:u #   46.45% frontend
> cycles idle     ( +-  0.15% )
>        982,206,843      stalled-cycles-backend:u  #   32.76% backend
> cycles idle      ( +-  0.18% )
>      4,874,186,825      instructions:u            #    1.63  insn per
> cycle
>                                                   #    0.29  stalled
> cycles per insn  ( +-  0.01% )
>      1,037,929,374      branches:u                # 1361.802 M/sec    
>                 ( +-  0.01% )
>         17,930,471      branch-misses:u           #    1.73% of all
> branches          ( +-  0.16% )
>      1,209,539,215      L1-dcache-loads:u         # 1586.960 M/sec    
>                 ( +-  0.01% )
>         42,346,229      L1-dcache-load-misses:u   #    3.50% of all
> L1-dcache hits    ( +-  0.05% )
>          9,088,647      LLC-loads:u               #   11.925 M/sec    
>                 ( +-  0.29% )
>    <not supported>      LLC-load-misses:u
>
>            0.90325 +- 0.00441 seconds time elapsed  ( +-  0.49% )
>
>
>
>  Performance counter stats for '../emacs.cold -f kill-emacs' (20
> runs):
>
>             755.94 msec task-clock:u              #    0.845 CPUs
> utilized            ( +-  0.24% )
>                  0      context-switches:u        #    0.000 K/sec
>                  0      cpu-migrations:u          #    0.000 K/sec
>             12,941      page-faults:u             #    0.017 M/sec    
>                 ( +-  0.01% )
>      2,976,036,365      cycles:u                  #    3.937 GHz      
>                 ( +-  0.06% )
>      1,374,451,779      stalled-cycles-frontend:u #   46.18% frontend
> cycles idle     ( +-  0.14% )
>        990,227,732      stalled-cycles-backend:u  #   33.27% backend
> cycles idle      ( +-  0.18% )
>      4,878,661,927      instructions:u            #    1.64  insn per
> cycle
>                                                   #    0.28  stalled
> cycles per insn  ( +-  0.00% )
>      1,038,495,525      branches:u                # 1373.782 M/sec    
>                 ( +-  0.00% )
>         17,859,906      branch-misses:u           #    1.72% of all
> branches          ( +-  0.16% )
>      1,209,345,531      L1-dcache-loads:u         # 1599.792 M/sec    
>                 ( +-  0.00% )
>         42,444,358      L1-dcache-load-misses:u   #    3.51% of all
> L1-dcache hits    ( +-  0.06% )
>          9,204,368      LLC-loads:u               #   12.176 M/sec    
>                 ( +-  0.41% )
>    <not supported>      LLC-load-misses:u
>
>            0.89430 +- 0.00217 seconds time elapsed  ( +-  0.24% )
>
>
>  Performance counter stats for '../emacs.hot-cold -f kill-emacs' (20
> runs):
>
>             761.97 msec task-clock:u              #    0.845 CPUs
> utilized            ( +-  0.20% )
>                  0      context-switches:u        #    0.000 K/sec
>                  0      cpu-migrations:u          #    0.000 K/sec
>             12,947      page-faults:u             #    0.017 M/sec    
>                 ( +-  0.01% )
>      2,989,750,359      cycles:u                  #    3.924 GHz      
>                 ( +-  0.04% )
>      1,383,312,275      stalled-cycles-frontend:u #   46.27% frontend
> cycles idle     ( +-  0.12% )
>        994,643,853      stalled-cycles-backend:u  #   33.27% backend
> cycles idle      ( +-  0.13% )
>      4,879,318,990      instructions:u            #    1.63  insn per
> cycle
>                                                   #    0.28  stalled
> cycles per insn  ( +-  0.00% )
>      1,038,584,045      branches:u                # 1363.022 M/sec    
>                 ( +-  0.00% )
>         17,863,736      branch-misses:u           #    1.72% of all
> branches          ( +-  0.13% )
>      1,209,327,347      L1-dcache-loads:u         # 1587.103 M/sec    
>                 ( +-  0.00% )
>         42,501,374      L1-dcache-load-misses:u   #    3.51% of all
> L1-dcache hits    ( +-  0.05% )
>          9,201,311      LLC-loads:u               #   12.076 M/sec    
>                 ( +-  0.28% )
>    <not supported>      LLC-load-misses:u
>
>            0.90132 +- 0.00201 seconds time elapsed  ( +-  0.22% )
>
>
> Which again shows a slight improvement with the Cold attributes, and
> still shows the hot attributes degrading performance. Perhaps I was
> too
> overzealous with the hot tagging?
>



Reply | Threaded
Open this post in threaded view
|

Re: Using __builtin_expect (likely/unlikely macros)

Paul Eggert
Konstantin Kharlamov wrote:
> I was told that e.g. "cold"
> attribute can sometimes produce unbearably slow code
> https://gcc.gnu.org/ml/gcc-help/2019-01/msg00035.html

Although cold functions can be slow, it appears that overall it's a win for
Emacs to mark _Noreturn error function declarations as cold: on my platform,
'make compile-always' ran about 1.3% faster. So I installed the attached patch
into master. (Like Stefan, I'm wary of marking functions 'hot' so I didn't do that.)

This patch also adds a convenience macro AVOID for the now-common pattern
'_Noreturn ATTRIBUTE_COLD void'.

0001-Mark-_Noreturn-error-functions-as-cold.txt (36K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Using __builtin_expect (likely/unlikely macros)

Konstantin Kharlamov
Thanks. I wonder if "AVOID" macro may sound more clear about its
intended use as "NEVER_RETURN" or "DIE"?

On Чт, Apr 18, 2019 at 01:25, Paul Eggert <[hidden email]> wrote:

> Konstantin Kharlamov wrote:
>> I was told that e.g. "cold" attribute can sometimes produce
>> unbearably slow code
>> https://gcc.gnu.org/ml/gcc-help/2019-01/msg00035.html
>
> Although cold functions can be slow, it appears that overall it's a
> win for Emacs to mark _Noreturn error function declarations as cold:
> on my platform, 'make compile-always' ran about 1.3% faster. So I
> installed the attached patch into master. (Like Stefan, I'm wary of
> marking functions 'hot' so I didn't do that.)
>
> This patch also adds a convenience macro AVOID for the now-common
> pattern '_Noreturn ATTRIBUTE_COLD void'.



12