bug#34757: Invalid bytecode from byte compiler

classic Classic list List threaded Threaded
10 messages Options
Reply | Threaded
Open this post in threaded view
|

bug#34757: Invalid bytecode from byte compiler

chuntaro
Invalid bytecode is output and error occurs when executed.

Byte compile the following code.

bug.el
----------------------------------------------------------------
;;; -*- lexical-binding: t; -*-

(let ((a 2))
  (print 1)
  (setq a 1))
----------------------------------------------------------------

$ emacs --batch -f batch-byte-compile bug.el

bug.elc
----------------------------------------------------------------
;ELC
...
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;


(funcall 2 'print 1)
1
----------------------------------------------------------------

$ emacs --batch -l bug.elc
Invalid function: 2

It occurs in versions 25.2, 26.1 and HEAD.



Reply | Threaded
Open this post in threaded view
|

bug#34757: Invalid bytecode from byte compiler

Eli Zaretskii
> From: chuntaro <[hidden email]>
> Date: Tue, 5 Mar 2019 17:01:00 +0900
>
> Invalid bytecode is output and error occurs when executed.
>
> Byte compile the following code.
>
> bug.el
> ----------------------------------------------------------------
> ;;; -*- lexical-binding: t; -*-
>
> (let ((a 2))
>   (print 1)
>   (setq a 1))
> ----------------------------------------------------------------
>
> $ emacs --batch -f batch-byte-compile bug.el
>
> bug.elc
> ----------------------------------------------------------------
> ;ELC
> ...
> ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
> ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
>
>
> (funcall 2 'print 1)
> 1
> ----------------------------------------------------------------
>
> $ emacs --batch -l bug.elc
> Invalid function: 2

Why is that a problem?  The Lisp code is invalid, and the error
message says that much.



Reply | Threaded
Open this post in threaded view
|

bug#34757: Invalid bytecode from byte compiler

Michael Heerdegen
Eli Zaretskii <[hidden email]> writes:

> > From: chuntaro <[hidden email]>
> > Date: Tue, 5 Mar 2019 17:01:00 +0900
> >
> > Invalid bytecode is output and error occurs when executed.
> >
> > Byte compile the following code.
> >
> > bug.el
> > ----------------------------------------------------------------
> > ;;; -*- lexical-binding: t; -*-
> >
> > (let ((a 2))
> >   (print 1)
> >   (setq a 1))
> > ----------------------------------------------------------------
> >
> > $ emacs --batch -f batch-byte-compile bug.el
> >
> > bug.elc
> > ----------------------------------------------------------------
> > ;ELC
> > ...
> > ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
> > ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
> >
> >
> > (funcall 2 'print 1)
> > 1
> > ----------------------------------------------------------------
> >
> > $ emacs --batch -l bug.elc
> > Invalid function: 2
>
> Why is that a problem?  The Lisp code is invalid, and the error
> message says that much.

Please take a closer look: The Lisp code in the compiled file is
invalid, but the code in the .el isn't.

Michael.



Reply | Threaded
Open this post in threaded view
|

bug#34757: Invalid bytecode from byte compiler

Eli Zaretskii
> From: Michael Heerdegen <[hidden email]>
> Cc: chuntaro <[hidden email]>,  [hidden email]
> Date: Fri, 08 Mar 2019 14:50:33 +0100
>
> Please take a closer look: The Lisp code in the compiled file is
> invalid, but the code in the .el isn't.

Right, sorry for my confusion.



Reply | Threaded
Open this post in threaded view
|

bug#34757: Invalid bytecode from byte compiler

Pip Cet
In reply to this post by chuntaro
On Tue, Mar 5, 2019 at 3:30 PM chuntaro <[hidden email]> wrote:
>
> Invalid bytecode is output and error occurs when executed.

If I'm looking at this correctly, the problem is that the byte code
which is generated in an intermediate step:

0       constant  2
1       constant  print
2       constant  1
3       call      1
4       discard
5       constant  3
6       return

is considered a "trivial function" by byte-compile-out-toplevel, which
assumes that all values on the stack are used by the call.

We could fix byte-compile-out-toplevel to properly analyze how many
stack arguments the call takes, but this patch simply treats forms
like this as nontrivial:

diff --git a/lisp/emacs-lisp/bytecomp.el b/lisp/emacs-lisp/bytecomp.el
index 0b8f8824b4c..4e54e08ce14 100644
--- a/lisp/emacs-lisp/bytecomp.el
+++ b/lisp/emacs-lisp/bytecomp.el
@@ -3025,6 +3025,7 @@ byte-compile-out-toplevel
                        (or (null (cdr rest))
                            (and (memq output-type '(file progn t))
                                 (cdr (cdr rest))
+                                (eql (length body) (cdr (car rest)))
                                 (eq (car (nth 1 rest)) 'byte-discard)
                                 (progn (setq rest (cdr rest)) t))))
                   (setq maycall nil)    ; Only allow one real function call.

emacs-34757.diff (922 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

bug#34757: Invalid bytecode from byte compiler

Eli Zaretskii
> From: Pip Cet <[hidden email]>
> Date: Fri, 8 Mar 2019 21:13:37 +0000
> Cc: [hidden email]
>
> On Tue, Mar 5, 2019 at 3:30 PM chuntaro <[hidden email]> wrote:
> >
> > Invalid bytecode is output and error occurs when executed.
>
> If I'm looking at this correctly, the problem is that the byte code
> which is generated in an intermediate step:
>
> 0       constant  2
> 1       constant  print
> 2       constant  1
> 3       call      1
> 4       discard
> 5       constant  3
> 6       return
>
> is considered a "trivial function" by byte-compile-out-toplevel, which
> assumes that all values on the stack are used by the call.
>
> We could fix byte-compile-out-toplevel to properly analyze how many
> stack arguments the call takes, but this patch simply treats forms
> like this as nontrivial:
>
> diff --git a/lisp/emacs-lisp/bytecomp.el b/lisp/emacs-lisp/bytecomp.el
> index 0b8f8824b4c..4e54e08ce14 100644
> --- a/lisp/emacs-lisp/bytecomp.el
> +++ b/lisp/emacs-lisp/bytecomp.el
> @@ -3025,6 +3025,7 @@ byte-compile-out-toplevel
>                         (or (null (cdr rest))
>                             (and (memq output-type '(file progn t))
>                                  (cdr (cdr rest))
> +                                (eql (length body) (cdr (car rest)))
>                                  (eq (car (nth 1 rest)) 'byte-discard)
>                                  (progn (setq rest (cdr rest)) t))))
>                    (setq maycall nil)    ; Only allow one real function call.

Stefan, any comments?



Reply | Threaded
Open this post in threaded view
|

bug#34757: Invalid bytecode from byte compiler

Stefan Monnier
>> diff --git a/lisp/emacs-lisp/bytecomp.el b/lisp/emacs-lisp/bytecomp.el
>> index 0b8f8824b4c..4e54e08ce14 100644
>> --- a/lisp/emacs-lisp/bytecomp.el
>> +++ b/lisp/emacs-lisp/bytecomp.el
>> @@ -3025,6 +3025,7 @@ byte-compile-out-toplevel
>>                         (or (null (cdr rest))
>>                             (and (memq output-type '(file progn t))
>>                                  (cdr (cdr rest))
>> +                                (eql (length body) (cdr (car rest)))
>>                                  (eq (car (nth 1 rest)) 'byte-discard)
>>                                  (progn (setq rest (cdr rest)) t))))
>>                    (setq maycall nil)    ; Only allow one real function call.
>
> Stefan, any comments?

Looks good to me, thanks.

I'm personally running with the following patch instead, which is much
more risky and hasn't been sufficiently tested yet.


        Stefan


diff --git a/lisp/emacs-lisp/bytecomp.el b/lisp/emacs-lisp/bytecomp.el
index f46cab2c17..573b0489d4 100644
*** a/lisp/emacs-lisp/bytecomp.el
--- b/lisp/emacs-lisp/bytecomp.el
***************
*** 2990,3059 ****
        (setq byte-compile-output
     (byte-optimize-lapcode byte-compile-output)))
 
!   ;; Decompile trivial functions:
!   ;; only constants and variables, or a single funcall except in lambdas.
!   ;; Except for Lisp_Compiled objects, forms like (foo "hi")
!   ;; are still quicker than (byte-code "..." [foo "hi"] 2).
!   ;; Note that even (quote foo) must be parsed just as any subr by the
!   ;; interpreter, so quote should be compiled into byte-code in some contexts.
!   ;; What to leave uncompiled:
!   ;; lambda -> never.  we used to leave it uncompiled if the body was
!   ;;   a single atom, but that causes confusion if the docstring
!   ;;   uses the (file . pos) syntax.  Besides, now that we have
!   ;;   the Lisp_Compiled type, the compiled form is faster.
!   ;; eval -> atom, quote or (function atom atom atom)
!   ;; progn -> as <<same-as-eval>> or (progn <<same-as-eval>> atom)
!   ;; file -> as progn, but takes both quotes and atoms, and longer forms.
!   (let (rest
! (maycall (not (eq output-type 'lambda))) ; t if we may make a funcall.
! tmp body)
!     (cond
!      ;; #### This should be split out into byte-compile-nontrivial-function-p.
!      ((or (eq output-type 'lambda)
!  (nthcdr (if (eq output-type 'file) 50 8) byte-compile-output)
!  (assq 'TAG byte-compile-output) ; Not necessary, but speeds up a bit.
!  (not (setq tmp (assq 'byte-return byte-compile-output)))
!  (progn
!    (setq rest (nreverse
! (cdr (memq tmp (reverse byte-compile-output)))))
!    (while
!                 (cond
!                  ((memq (car (car rest)) '(byte-varref byte-constant))
!                   (setq tmp (car (cdr (car rest))))
!                   (if (if (eq (car (car rest)) 'byte-constant)
!                           (or (consp tmp)
!                               (and (symbolp tmp)
!                                    (not (macroexp--const-symbol-p tmp)))))
!                       (if maycall
!                           (setq body (cons (list 'quote tmp) body)))
!                     (setq body (cons tmp body))))
!                  ((and maycall
!                        ;; Allow a funcall if at most one atom follows it.
!                        (null (nthcdr 3 rest))
!                        (setq tmp (get (car (car rest)) 'byte-opcode-invert))
!                        (or (null (cdr rest))
!                            (and (memq output-type '(file progn t))
!                                 (cdr (cdr rest))
!                                 (eq (car (nth 1 rest)) 'byte-discard)
!                                 (progn (setq rest (cdr rest)) t))))
!                   (setq maycall nil) ; Only allow one real function call.
!                   (setq body (nreverse body))
!                   (setq body (list
!                               (if (and (eq tmp 'funcall)
!                                        (eq (car-safe (car body)) 'quote)
!       (symbolp (nth 1 (car body))))
!                                   (cons (nth 1 (car body)) (cdr body))
!                                 (cons tmp body))))
!                   (or (eq output-type 'file)
!                       (not (delq nil (mapcar 'consp (cdr (car body))))))))
!      (setq rest (cdr rest)))
!    rest))
!       (let ((byte-compile-vector (byte-compile-constants-vector)))
! (list 'byte-code (byte-compile-lapcode byte-compile-output)
!      byte-compile-vector byte-compile-maxdepth)))
!      ;; it's a trivial function
!      ((cdr body) (cons 'progn (nreverse body)))
!      ((car body)))))
 
  ;; Given BODY, compile it and return a new body.
  (defun byte-compile-top-level-body (body &optional for-effect)
--- 2993,3013 ----
        (setq byte-compile-output
     (byte-optimize-lapcode byte-compile-output)))
 
!   (cond
!    ((and (not (eq output-type 'lambda)) ;;The caller really wants (byte-code ...)
!          (null (cddr byte-compile-output))
!          (eq (car (nth 0 byte-compile-output)) 'byte-constant)
!          (eq (car (nth 1 byte-compile-output)) 'byte-return))
!     ;; Special case for trivial code returning a function.
!     ;; This is so that when compiling #'(lambda ...) we return
!     ;; a #[...] byte-code object instead of a (byte-code "...")
!     ;; expression that returns this object.  It's not indispensable,
!     ;; but it's cleaner.
!     (macroexp-quote (cadr (nth 0 byte-compile-output))))
!    (t
!     (let ((byte-compile-vector (byte-compile-constants-vector)))
!       (list 'byte-code (byte-compile-lapcode byte-compile-output)
!    byte-compile-vector byte-compile-maxdepth)))))
 
  ;; Given BODY, compile it and return a new body.
  (defun byte-compile-top-level-body (body &optional for-effect)



Reply | Threaded
Open this post in threaded view
|

bug#34757: Invalid bytecode from byte compiler

Pip Cet
On Fri, Mar 15, 2019 at 2:08 PM Stefan Monnier <[hidden email]> wrote:

> >> diff --git a/lisp/emacs-lisp/bytecomp.el b/lisp/emacs-lisp/bytecomp.el
> >> index 0b8f8824b4c..4e54e08ce14 100644
> >> --- a/lisp/emacs-lisp/bytecomp.el
> >> +++ b/lisp/emacs-lisp/bytecomp.el
> >> @@ -3025,6 +3025,7 @@ byte-compile-out-toplevel
> >>                         (or (null (cdr rest))
> >>                             (and (memq output-type '(file progn t))
> >>                                  (cdr (cdr rest))
> >> +                                (eql (length body) (cdr (car rest)))
> >>                                  (eq (car (nth 1 rest)) 'byte-discard)
> >>                                  (progn (setq rest (cdr rest)) t))))
> >>                    (setq maycall nil)    ; Only allow one real function call.
> >
> > Stefan, any comments?
>
> Looks good to me, thanks.
>
> I'm personally running with the following patch instead, which is much
> more risky and hasn't been sufficiently tested yet.

Just to be sure I understand correctly, you would like to remove the
decompilation of trivial function calls entirely?

That seems good to me.

It seems the special case is necessary to avoid compilation errors,
and that it's used for (byte-compile 3), so I think the comment could
be improved a little.

I'm not sure this case can actually happen without doing something
silly, but (byte-compile '(internal-get-closed-var 0)) throws an error
with Stefan's patch, because the byte code is (byte-constant . 0)
(byte-return).



Reply | Threaded
Open this post in threaded view
|

bug#34757: Invalid bytecode from byte compiler

Stefan Monnier
> Just to be sure I understand correctly, you would like to remove the
> decompilation of trivial function calls entirely?

Yes, tho the main motivation was to try and figure out what the
decompilation is useful for.

This only affects "open code" (i.e. not the content of functions, which
are already never decompiled), so the impact should be minor and it
removes a rather complicated and brittle chunk of code whose purpose is
not clear.  It was originally introduced when we didn't have
byte-compiled function objects, so its main purpose was one of
performance, to avoid pessimizing the code by replacing trivial function
calls with more costly (byte-code "...") expressions but nowadays such
(byte-code "...") expressions only occur basically at the top-level of
.elc files where such pessimization would be unnoticeable in terms
of performance.

It does impact the readability of .elc files, OTOH, so I'm not
completely happy with the result when considering the few cases where
I was happy to be able to make sense of a .elc file to better understand
the source of a problem (after all, that's why I wrote the
elisp-byte-code-mode).

> It seems the special case is necessary to avoid compilation errors,

I haven't found it to be really necessary, no.

> and that it's used for (byte-compile 3), so I think the comment could
> be improved a little.

(byte-compile 3) seems to work correctly here even without the
special case.  It returns (byte-code "\300\207" [3] 1) which is indeed
a correct expression that evaluates to 3 (just like the argument to
`byte-compile` was an expression whose evaluation returns 3).

Let's not forget that what `byte-compile` tries to do is to preserve the
invariant that

    (eval EXP) ≃ (eval (byte-compile EXP))

This said, if you remove the special case, you will bump into
a corner-case bug in `byte-compile` which happens because that function
incorrectly considers that `byte-compile-top-level` returns a value
whereas in reality it returns an expression (just like `byte-compile`):
the decompilation happens to turn expressions that return constant
values (like byte-compiled functions) into their value (as an
optimization which relies on the fact that those objects are
self-evaluating), so if you remove it, you then bump into this bug of
byte-compile.  The patch below would fix this bug.

But indeed the decompilation of constants is handy since that's what
people expect from `byte-compile`.  When I (byte-compile '(lambda (x)
(foo))) I expect to receive a byte-compiled function, and not
a byte-code expression which when evaluated will return that
byte-compiled function.

> I'm not sure this case can actually happen without doing something
> silly, but (byte-compile '(internal-get-closed-var 0)) throws an error
> with Stefan's patch, because the byte code is (byte-constant . 0)
> (byte-return).

This source code is arguably invalid, so it's not a real problem, but
indeed we should burp in that case.  I can't remember enough of how
internal-get-closed-var is handled to know where'd the bug be, offhand.


        Stefan


diff --git a/lisp/emacs-lisp/bytecomp.el b/lisp/emacs-lisp/bytecomp.el
index f46cab2c17..ae17553d0c 100644
--- a/lisp/emacs-lisp/bytecomp.el
+++ b/lisp/emacs-lisp/bytecomp.el
@@ -2674,7 +2674,11 @@ byte-compile
          (setq fun (byte-compile-top-level fun nil 'eval)))
         (if macro (push 'macro fun))
         (if (symbolp form)
-            (fset form fun)
+            ;; byte-compile returns an *expression* equivalent to the
+            ;; `fun' expression, so we need to evaluate it, tho normally
+            ;; this is not needed because the expression is just a constant
+            ;; byte-code object, which is self-evaluating.
+            (fset form (eval fun t))
           fun)))))))
 
 (defun byte-compile-sexp (sexp)



Reply | Threaded
Open this post in threaded view
|

bug#34757: Invalid bytecode from byte compiler

Pip Cet
On Fri, Mar 15, 2019 at 8:30 PM Stefan Monnier <[hidden email]> wrote:
>
> > Just to be sure I understand correctly, you would like to remove the
> > decompilation of trivial function calls entirely?
>
> Yes, tho the main motivation was to try and figure out what the
> decompilation is useful for.

Thanks for explaining!

> This only affects "open code" (i.e. not the content of functions, which
> are already never decompiled), so the impact should be minor and it
> removes a rather complicated and brittle chunk of code whose purpose is
> not clear.  It was originally introduced when we didn't have
> byte-compiled function objects, so its main purpose was one of
> performance, to avoid pessimizing the code by replacing trivial function
> calls with more costly (byte-code "...") expressions but nowadays such
> (byte-code "...") expressions only occur basically at the top-level of
> .elc files where such pessimization would be unnoticeable in terms
> of performance.

I agree completely, for what it's worth.

> It does impact the readability of .elc files, OTOH, so I'm not
> completely happy with the result when considering the few cases where
> I was happy to be able to make sense of a .elc file to better understand
> the source of a problem (after all, that's why I wrote the
> elisp-byte-code-mode).

I can speak only for myself, but I think the byte-compiler "magically"
deciding to turn (rare) top-level non-defuns into plain code rather
than byte code is confusing. It's much better with your patches.

> > It seems the special case is necessary to avoid compilation errors,
>
> I haven't found it to be really necessary, no.

Well, you fixed it with the second patch.

> > and that it's used for (byte-compile 3), so I think the comment could
> > be improved a little.
>
> (byte-compile 3) seems to work correctly here even without the
> special case.  It returns (byte-code "\300\207" [3] 1) which is indeed
> a correct expression that evaluates to 3 (just like the argument to
> `byte-compile` was an expression whose evaluation returns 3).

No argument here. The case branch affects what happens when
(byte-compile 3) is called, but isn't necessary for the result to be
correct, and the comment can be misread to imply it's irrelevant in
this case.

> Let's not forget that what `byte-compile` tries to do is to preserve the
> invariant that
>
>     (eval EXP) ≃ (eval (byte-compile EXP))

I think byte-compile does different things for different arguments:
the behavior for symbols and other expressions is simply different.

> This said, if you remove the special case, you will bump into
> a corner-case bug in `byte-compile` which happens because that function
> incorrectly considers that `byte-compile-top-level` returns a value
> whereas in reality it returns an expression (just like `byte-compile`):
> the decompilation happens to turn expressions that return constant
> values (like byte-compiled functions) into their value (as an
> optimization which relies on the fact that those objects are
> self-evaluating), so if you remove it, you then bump into this bug of
> byte-compile.  The patch below would fix this bug.

I don't think that was a bug, but it was an unfortunate wrinkle in the
(undocumented) API of byte-compile-top-level.

> But indeed the decompilation of constants is handy since that's what
> people expect from `byte-compile`.  When I (byte-compile '(lambda (x)
> (foo))) I expect to receive a byte-compiled function, and not
> a byte-code expression which when evaluated will return that
> byte-compiled function.

I think it's more than handy: it's how I'd read the current
documentation, and how I'd expect a function called byte-compile to
behave.

> > I'm not sure this case can actually happen without doing something
> > silly, but (byte-compile '(internal-get-closed-var 0)) throws an error
> > with Stefan's patch, because the byte code is (byte-constant . 0)
> > (byte-return).
>
> This source code is arguably invalid, so it's not a real problem, but

The source code is invalid, but the LAP code is valid-looking, and I
can't conclude it cannot be generated by valid source code being
passed to `byte-compile' somehow.

> diff --git a/lisp/emacs-lisp/bytecomp.el b/lisp/emacs-lisp/bytecomp.el
> index f46cab2c17..ae17553d0c 100644
> --- a/lisp/emacs-lisp/bytecomp.el
> +++ b/lisp/emacs-lisp/bytecomp.el
> @@ -2674,7 +2674,11 @@ byte-compile
>           (setq fun (byte-compile-top-level fun nil 'eval)))
>          (if macro (push 'macro fun))
>          (if (symbolp form)
> -            (fset form fun)
> +            ;; byte-compile returns an *expression* equivalent to the

I think this would be clearer if it read "byte-compile-top-level
returns an *expression*..."