bug#43439: [PATCH] doprnt improvements

classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

bug#43439: [PATCH] doprnt improvements

Paul Eggert
Improve doprnt performance, internal checking, and internal
documentation.  On my platform (Ubuntu 18.04.5 x86-64), this
improved CPU speed of ‘make -C lisp compile-always’ by 6%.
This patch implements some of my suggestions in Bug#8545,
with further changes suggested by Eli Zaretskii.
* src/doprnt.c: Improve comments.
(SIZE_BOUND_EXTRA): Now at top level, for parse_format_integer.
(parse_format_integer): New static function, containing some of
the old doprint.  Fix a bug that caused doprnt to infloop on
formats like "%10s" that Emacs does not use.  We could simplify
doprnt further if we dropped support for these never-used formats.
(doprnt): Omit FORMAT_END argument, since it’s always NULL,
which means doprnt must call strlen on FORMAT; doing this means
doprnt needs just one pass over FORMAT, not two.  All callers changed.
Assume C99 to make code clearer.  Do not use malloc or alloca
to allocate a copy of the format FMTCPY; instead, use a small
fixed-size array FMTSTAR, and use '*' in that array to represent
width and precision, passing them as separate int arguments.
Use eassume to pacify GCC in switch statements.  Drop support for
"%S" which is never used and which would cause GCC to warn anyway.
* src/lisp.h (doprnt): Give it ATTRIBUTE_FORMAT_PRINTF (3, 0),
since GCC can grok doprnt's new API.
---
 src/doprnt.c | 223 ++++++++++++++++++++++++++-------------------------
 src/lisp.h   |   4 +-
 src/sysdep.c |   2 +-
 src/xdisp.c  |   5 +-
 4 files changed, 117 insertions(+), 117 deletions(-)

diff --git a/src/doprnt.c b/src/doprnt.c
index b0ba12552b..f154578c0d 100644
--- a/src/doprnt.c
+++ b/src/doprnt.c
@@ -28,6 +28,7 @@
    . For %s and %c, when field width is specified (e.g., %25s), it accounts for
      the display width of each character, according to char-width-table.  That
      is, it does not assume that each character takes one column on display.
+     Nor does it assume that each character is a single byte.
 
    . If the size of the buffer is not enough to produce the formatted string in
      its entirety, it makes sure that truncation does not chop the last
@@ -42,38 +43,41 @@
      Emacs can handle.
 
    OTOH, this function supports only a small subset of the standard C formatted
-   output facilities.  E.g., %u and %ll are not supported, and precision is
-   ignored %s and %c conversions.  (See below for the detailed documentation of
-   what is supported.)  However, this is okay, as this function is supposed to
-   be called from `error' and similar functions, and thus does not need to
-   support features beyond those in `Fformat_message', which is used
-   by `error' on the Lisp level.  */
+   output facilities.  E.g., %u is not supported, precision is ignored
+   in %s and %c conversions, and although %lld often works it is not
+   supported and code should use something like %"pM"d with intmax_t instead.
+   (See below for the detailed documentation of what is supported.)
+   However, this is okay, as this function is supposed to be called
+   from 'error' and similar C functions, and thus does not need to
+   support all the features of 'Fformat_message', which is used by the
+   Lisp 'error' function.  */
 
 /* In the FORMAT argument this function supports ` and ' as directives
    that output left and right quotes as per ‘text-quoting style’.  It
    also supports the following %-sequences:
 
    %s means print a string argument.
-   %S is treated as %s, for loose compatibility with `Fformat_message'.
    %d means print a `signed int' argument in decimal.
    %o means print an `unsigned int' argument in octal.
    %x means print an `unsigned int' argument in hex.
    %e means print a `double' argument in exponential notation.
    %f means print a `double' argument in decimal-point notation.
    %g means print a `double' argument in exponential notation
-      or in decimal-point notation, whichever uses fewer characters.
+      or in decimal-point notation, depending on the value;
+      this is often (though not always) the shorter of the two notations
    %c means print a `signed int' argument as a single character.
    %% means produce a literal % character.
 
-   A %-sequence may contain optional flag, width, and precision specifiers, and
-   a length modifier, as follows:
+   A %-sequence other than %% may contain optional flags, width, precision,
+   and length, as follows:
 
      %<flags><width><precision><length>character
 
    where flags is [+ -0], width is [0-9]+, precision is .[0-9]+, and length
    is empty or l or the value of the pD or pI or PRIdMAX (sans "d") macros.
-   Also, %% in a format stands for a single % in the output.  A % that
-   does not introduce a valid %-sequence causes undefined behavior.
+   A % that does not introduce a valid %-sequence causes undefined behavior.
+   ASCII bytes in FORMAT other than % are copied through as-is;
+   non-ASCII bytes should not appear in FORMAT.
 
    The + flag character inserts a + before any positive number, while a space
    inserts a space before any positive number; these flags only affect %d, %o,
@@ -99,7 +103,9 @@
 
    For %e, %f, and %g sequences, the number after the "." in the precision
    specifier says how many decimal places to show; if zero, the decimal point
-   itself is omitted.  For %s and %S, the precision specifier is ignored.  */
+   itself is omitted.  For %d, %o, and %x sequences, the precision specifies
+   the minimum number of digits to appear.  Precision specifiers are
+   not supported for other %-sequences.  */
 
 #include <config.h>
 #include <stdio.h>
@@ -115,9 +121,29 @@
    another macro.  */
 #include "character.h"
 
-/* Generate output from a format-spec FORMAT,
-   terminated at position FORMAT_END.
-   (*FORMAT_END is not part of the format, but must exist and be readable.)
+/* Enough to handle floating point formats with large numbers.  */
+enum { SIZE_BOUND_EXTRA = DBL_MAX_10_EXP + 50 };
+
+/* Parse FMT as an unsigned decimal integer, putting its value into *VALUE.
+   Return the address of the first byte after the integer.
+   If FMT is not an integer, return FMT and store zero into *VALUE.  */
+static char const *
+parse_format_integer (char const *fmt, int *value)
+{
+  int n = 0;
+  bool overflow = false;
+  for (; '0' <= *fmt && *fmt <= '9'; fmt++)
+    {
+      overflow |= INT_MULTIPLY_WRAPV (n, 10, &n);
+      overflow |= INT_ADD_WRAPV (n, *fmt - '0', &n);
+    }
+  if (overflow || min (PTRDIFF_MAX, SIZE_MAX) - SIZE_BOUND_EXTRA < n)
+    error ("Format width or precision too large");
+  *value = n;
+  return fmt;
+}
+
+/* Generate output from a format-spec FORMAT.
    Output goes in BUFFER, which has room for BUFSIZE chars.
    BUFSIZE must be positive.  If the output does not fit, truncate it
    to fit and return BUFSIZE - 1; if this truncates a multibyte
@@ -128,15 +154,11 @@
    Integers are passed as C integers.  */
 
 ptrdiff_t
-doprnt (char *buffer, ptrdiff_t bufsize, const char *format,
- const char *format_end, va_list ap)
+doprnt (char *buffer, ptrdiff_t bufsize, const char *format, va_list ap)
 {
   const char *fmt = format; /* Pointer into format string.  */
   char *bufptr = buffer; /* Pointer into output buffer.  */
 
-  /* Enough to handle floating point formats with large numbers.  */
-  enum { SIZE_BOUND_EXTRA = DBL_MAX_10_EXP + 50 };
-
   /* Use this for sprintf unless we need something really big.  */
   char tembuf[SIZE_BOUND_EXTRA + 50];
 
@@ -150,103 +172,91 @@ doprnt (char *buffer, ptrdiff_t bufsize, const char *format,
   char *big_buffer = NULL;
 
   enum text_quoting_style quoting_style = text_quoting_style ();
-  ptrdiff_t tem = -1;
-  char *string;
-  char fixed_buffer[20]; /* Default buffer for small formatting. */
-  char *fmtcpy;
-  int minlen;
-  char charbuf[MAX_MULTIBYTE_LENGTH + 1]; /* Used for %c.  */
-  USE_SAFE_ALLOCA;
-
-  if (format_end == 0)
-    format_end = format + strlen (format);
-
-  fmtcpy = (format_end - format < sizeof (fixed_buffer) - 1
-    ? fixed_buffer
-    : SAFE_ALLOCA (format_end - format + 1));
 
   bufsize--;
 
   /* Loop until end of format string or buffer full. */
-  while (fmt < format_end && bufsize > 0)
+  while (*fmt && bufsize > 0)
     {
       char const *fmt0 = fmt;
       char fmtchar = *fmt++;
       if (fmtchar == '%')
  {
-  ptrdiff_t size_bound = 0;
   ptrdiff_t width;  /* Columns occupied by STRING on display.  */
   enum {
     pDlen = sizeof pD - 1,
     pIlen = sizeof pI - 1,
-    pMlen = sizeof PRIdMAX - 2
+    pMlen = sizeof PRIdMAX - 2,
+    maxmlen = max (max (1, pDlen), max (pIlen, pMlen))
   };
   enum {
     no_modifier, long_modifier, pD_modifier, pI_modifier, pM_modifier
   } length_modifier = no_modifier;
   static char const modifier_len[] = { 0, 1, pDlen, pIlen, pMlen };
-  int maxmlen = max (max (1, pDlen), max (pIlen, pMlen));
   int mlen;
+  char charbuf[MAX_MULTIBYTE_LENGTH + 1]; /* Used for %c.  */
 
-  /* Copy this one %-spec into fmtcpy.  */
-  string = fmtcpy;
+  /* Width and precision specified by this %-sequence.  */
+  int wid = 0, prec = -1;
+
+  /* FMTSTAR will be a "%*.*X"-like version of this %-sequence.
+     Start by putting '%' into FMTSTAR.  */
+  char fmtstar[sizeof "%-+ 0*.*d" + maxmlen];
+  char *string = fmtstar;
   *string++ = '%';
-  while (fmt < format_end)
+
+  /* Copy at most one instance of each flag into FMTSTAR.  */
+  bool minusflag = false, plusflag = false, zeroflag = false,
+    spaceflag = false;
+  for (;; fmt++)
     {
-      *string++ = *fmt;
-      if ('0' <= *fmt && *fmt <= '9')
+      *string = *fmt;
+      switch (*fmt)
  {
-  /* Get an idea of how much space we might need.
-     This might be a field width or a precision; e.g.
-     %1.1000f and %1000.1f both might need 1000+ bytes.
-     Parse the width or precision, checking for overflow.  */
-  int n = *fmt - '0';
-  bool overflow = false;
-  while (fmt + 1 < format_end
- && '0' <= fmt[1] && fmt[1] <= '9')
-    {
-      overflow |= INT_MULTIPLY_WRAPV (n, 10, &n);
-      overflow |= INT_ADD_WRAPV (n, fmt[1] - '0', &n);
-      *string++ = *++fmt;
-    }
-
-  if (overflow
-      || min (PTRDIFF_MAX, SIZE_MAX) - SIZE_BOUND_EXTRA < n)
-    error ("Format width or precision too large");
-  if (size_bound < n)
-    size_bound = n;
+ case '-': string += !minusflag; minusflag = true; continue;
+ case '+': string += !plusflag; plusflag = true; continue;
+ case ' ': string += !spaceflag; spaceflag = true; continue;
+ case '0': string += !zeroflag; zeroflag = true; continue;
  }
-      else if (! (*fmt == '-' || *fmt == ' ' || *fmt == '.'
-  || *fmt == '+'))
- break;
-      fmt++;
+      break;
     }
 
+  /* Parse width and precision, putting "*.*" into FMTSTAR.  */
+  if ('1' <= *fmt && *fmt <= '9')
+    fmt = parse_format_integer (fmt, &wid);
+  if (*fmt == '.')
+    fmt = parse_format_integer (fmt + 1, &prec);
+  *string++ = '*';
+  *string++ = '.';
+  *string++ = '*';
+
   /* Check for the length modifiers in textual length order, so
      that longer modifiers override shorter ones.  */
   for (mlen = 1; mlen <= maxmlen; mlen++)
     {
-      if (format_end - fmt < mlen)
- break;
       if (mlen == 1 && *fmt == 'l')
  length_modifier = long_modifier;
-      if (mlen == pDlen && memcmp (fmt, pD, pDlen) == 0)
+      if (mlen == pDlen && strncmp (fmt, pD, pDlen) == 0)
  length_modifier = pD_modifier;
-      if (mlen == pIlen && memcmp (fmt, pI, pIlen) == 0)
+      if (mlen == pIlen && strncmp (fmt, pI, pIlen) == 0)
  length_modifier = pI_modifier;
-      if (mlen == pMlen && memcmp (fmt, PRIdMAX, pMlen) == 0)
+      if (mlen == pMlen && strncmp (fmt, PRIdMAX, pMlen) == 0)
  length_modifier = pM_modifier;
     }
 
+  /* Copy optional length modifier and conversion specifier
+     character into FMTSTAR, and append a NUL.  */
   mlen = modifier_len[length_modifier];
-  memcpy (string, fmt + 1, mlen);
-  string += mlen;
+  string = mempcpy (string, fmt, mlen + 1);
   fmt += mlen;
   *string = 0;
 
-  /* Make the size bound large enough to handle floating point formats
+  /* An idea of how much space we might need.
+     This might be a field width or a precision; e.g.
+     %1.1000f and %1000.1f both might need 1000+ bytes.
+     Make it large enough to handle floating point formats
      with large numbers.  */
-  size_bound += SIZE_BOUND_EXTRA;
+  ptrdiff_t size_bound = max (wid, prec) + SIZE_BOUND_EXTRA;
 
   /* Make sure we have that much.  */
   if (size_bound > size_allocated)
@@ -257,48 +267,49 @@ doprnt (char *buffer, ptrdiff_t bufsize, const char *format,
       sprintf_buffer = big_buffer;
       size_allocated = size_bound;
     }
-  minlen = 0;
+  int minlen = 0;
+  ptrdiff_t tem;
   switch (*fmt++)
     {
     default:
-      error ("Invalid format operation %s", fmtcpy);
+      error ("Invalid format operation %s", fmt0);
 
-/*    case 'b': */
-    case 'l':
     case 'd':
       switch (length_modifier)
  {
  case no_modifier:
   {
     int v = va_arg (ap, int);
-    tem = sprintf (sprintf_buffer, fmtcpy, v);
+    tem = sprintf (sprintf_buffer, fmtstar, wid, prec, v);
   }
   break;
  case long_modifier:
   {
     long v = va_arg (ap, long);
-    tem = sprintf (sprintf_buffer, fmtcpy, v);
+    tem = sprintf (sprintf_buffer, fmtstar, wid, prec, v);
   }
   break;
  case pD_modifier:
  signed_pD_modifier:
   {
     ptrdiff_t v = va_arg (ap, ptrdiff_t);
-    tem = sprintf (sprintf_buffer, fmtcpy, v);
+    tem = sprintf (sprintf_buffer, fmtstar, wid, prec, v);
   }
   break;
  case pI_modifier:
   {
     EMACS_INT v = va_arg (ap, EMACS_INT);
-    tem = sprintf (sprintf_buffer, fmtcpy, v);
+    tem = sprintf (sprintf_buffer, fmtstar, wid, prec, v);
   }
   break;
  case pM_modifier:
   {
     intmax_t v = va_arg (ap, intmax_t);
-    tem = sprintf (sprintf_buffer, fmtcpy, v);
+    tem = sprintf (sprintf_buffer, fmtstar, wid, prec, v);
   }
   break;
+ default:
+  eassume (false);
  }
       /* Now copy into final output, truncating as necessary.  */
       string = sprintf_buffer;
@@ -311,13 +322,13 @@ doprnt (char *buffer, ptrdiff_t bufsize, const char *format,
  case no_modifier:
   {
     unsigned v = va_arg (ap, unsigned);
-    tem = sprintf (sprintf_buffer, fmtcpy, v);
+    tem = sprintf (sprintf_buffer, fmtstar, wid, prec, v);
   }
   break;
  case long_modifier:
   {
     unsigned long v = va_arg (ap, unsigned long);
-    tem = sprintf (sprintf_buffer, fmtcpy, v);
+    tem = sprintf (sprintf_buffer, fmtstar, wid, prec, v);
   }
   break;
  case pD_modifier:
@@ -325,15 +336,17 @@ doprnt (char *buffer, ptrdiff_t bufsize, const char *format,
  case pI_modifier:
   {
     EMACS_UINT v = va_arg (ap, EMACS_UINT);
-    tem = sprintf (sprintf_buffer, fmtcpy, v);
+    tem = sprintf (sprintf_buffer, fmtstar, wid, prec, v);
   }
   break;
  case pM_modifier:
   {
     uintmax_t v = va_arg (ap, uintmax_t);
-    tem = sprintf (sprintf_buffer, fmtcpy, v);
+    tem = sprintf (sprintf_buffer, fmtstar, wid, prec, v);
   }
   break;
+ default:
+  eassume (false);
  }
       /* Now copy into final output, truncating as necessary.  */
       string = sprintf_buffer;
@@ -344,22 +357,18 @@ doprnt (char *buffer, ptrdiff_t bufsize, const char *format,
     case 'g':
       {
  double d = va_arg (ap, double);
- tem = sprintf (sprintf_buffer, fmtcpy, d);
+ tem = sprintf (sprintf_buffer, fmtstar, wid, prec, d);
  /* Now copy into final output, truncating as necessary.  */
  string = sprintf_buffer;
  goto doit;
       }
 
-    case 'S':
-      string[-1] = 's';
-      FALLTHROUGH;
     case 's':
-      if (fmtcpy[1] != 's')
- minlen = atoi (&fmtcpy[1]);
+      minlen = minusflag ? -wid : wid;
       string = va_arg (ap, char *);
       tem = strnlen (string, STRING_BYTES_BOUND + 1);
       if (tem == STRING_BYTES_BOUND + 1)
- error ("String for %%s or %%S format is too long");
+ error ("String for %%s format is too long");
       width = strwidth (string, tem);
       goto doit1;
 
@@ -432,14 +441,12 @@ doprnt (char *buffer, ptrdiff_t bufsize, const char *format,
  string = charbuf;
  string[tem] = 0;
  width = strwidth (string, tem);
- if (fmtcpy[1] != 'c')
-  minlen = atoi (&fmtcpy[1]);
+ minlen = minusflag ? -wid : wid;
  goto doit1;
       }
 
     case '%':
       /* Treat this '%' as normal.  */
-      fmt0 = fmt - 1;
       break;
     }
  }
@@ -450,13 +457,13 @@ doprnt (char *buffer, ptrdiff_t bufsize, const char *format,
  src = uLSQM, srclen = sizeof uLSQM - 1;
       else if (quoting_style == CURVE_QUOTING_STYLE && fmtchar == '\'')
  src = uRSQM, srclen = sizeof uRSQM - 1;
-      else if (quoting_style == STRAIGHT_QUOTING_STYLE && fmtchar == '`')
- src = "'", srclen = 1;
       else
  {
-  while (fmt < format_end && !CHAR_HEAD_P (*fmt))
-    fmt++;
-  src = fmt0, srclen = fmt - fmt0;
+  if (quoting_style == STRAIGHT_QUOTING_STYLE && fmtchar == '`')
+    fmtchar = '\'';
+  eassert (ASCII_CHAR_P (fmtchar));
+  *bufptr++ = fmtchar;
+  continue;
  }
 
       if (bufsize < srclen)
@@ -479,8 +486,6 @@ doprnt (char *buffer, ptrdiff_t bufsize, const char *format,
   xfree (big_buffer);
 
   *bufptr = 0; /* Make sure our string ends with a '\0' */
-
-  SAFE_FREE ();
   return bufptr - buffer;
 }
 
@@ -495,10 +500,9 @@ doprnt (char *buffer, ptrdiff_t bufsize, const char *format,
 ptrdiff_t
 esprintf (char *buf, char const *format, ...)
 {
-  ptrdiff_t nbytes;
   va_list ap;
   va_start (ap, format);
-  nbytes = doprnt (buf, TYPE_MAXIMUM (ptrdiff_t), format, 0, ap);
+  ptrdiff_t nbytes = doprnt (buf, TYPE_MAXIMUM (ptrdiff_t), format, ap);
   va_end (ap);
   return nbytes;
 }
@@ -534,10 +538,9 @@ evxprintf (char **buf, ptrdiff_t *bufsize,
 {
   for (;;)
     {
-      ptrdiff_t nbytes;
       va_list ap_copy;
       va_copy (ap_copy, ap);
-      nbytes = doprnt (*buf, *bufsize, format, 0, ap_copy);
+      ptrdiff_t nbytes = doprnt (*buf, *bufsize, format, ap_copy);
       va_end (ap_copy);
       if (nbytes < *bufsize - 1)
  return nbytes;
diff --git a/src/lisp.h b/src/lisp.h
index a24898004d..957ca41702 100644
--- a/src/lisp.h
+++ b/src/lisp.h
@@ -4034,8 +4034,8 @@ #define FLOAT_TO_STRING_BUFSIZE 350
 extern void syms_of_print (void);
 
 /* Defined in doprnt.c.  */
-extern ptrdiff_t doprnt (char *, ptrdiff_t, const char *, const char *,
- va_list);
+extern ptrdiff_t doprnt (char *, ptrdiff_t, const char *, va_list)
+  ATTRIBUTE_FORMAT_PRINTF (3, 0);
 extern ptrdiff_t esprintf (char *, char const *, ...)
   ATTRIBUTE_FORMAT_PRINTF (2, 3);
 extern ptrdiff_t exprintf (char **, ptrdiff_t *, char const *, ptrdiff_t,
diff --git a/src/sysdep.c b/src/sysdep.c
index e161172a79..790ae084d3 100644
--- a/src/sysdep.c
+++ b/src/sysdep.c
@@ -2192,7 +2192,7 @@ snprintf (char *buf, size_t bufsize, char const *format, ...)
   if (size)
     {
       va_start (ap, format);
-      nbytes = doprnt (buf, size, format, 0, ap);
+      nbytes = doprnt (buf, size, format, ap);
       va_end (ap);
     }
 
diff --git a/src/xdisp.c b/src/xdisp.c
index 615f0ca7cf..213c2a464a 100644
--- a/src/xdisp.c
+++ b/src/xdisp.c
@@ -11269,13 +11269,10 @@ vmessage (const char *m, va_list ap)
  {
   if (m)
     {
-      ptrdiff_t len;
       ptrdiff_t maxsize = FRAME_MESSAGE_BUF_SIZE (f);
       USE_SAFE_ALLOCA;
       char *message_buf = SAFE_ALLOCA (maxsize + 1);
-
-      len = doprnt (message_buf, maxsize, m, 0, ap);
-
+      ptrdiff_t len = doprnt (message_buf, maxsize, m, ap);
       message3 (make_string (message_buf, len));
       SAFE_FREE ();
     }
--
2.17.1




Reply | Threaded
Open this post in threaded view
|

bug#43439: [PATCH] doprnt improvements

Eli Zaretskii
> From: Paul Eggert <[hidden email]>
> Date: Tue, 15 Sep 2020 18:50:51 -0700
> Cc: Paul Eggert <[hidden email]>
>
> (doprnt): Omit FORMAT_END argument, since it’s always NULL,
> which means doprnt must call strlen on FORMAT; doing this means
> doprnt needs just one pass over FORMAT, not two.  All callers changed.

This loses a feature.  Emacs traditionally supports strings with
embedded null characters, and this feature is in line with that.  It
is true that it is currently unused, but why is it a good idea to
remove it?

If the problem is the slight inefficiency caused by the call to
strlen, we could instead solve it in the callers: all the formats I've
seen are const strings, so the value of FORMAT_END can be computed at
compile time, and used instead of passing NULL.

> Drop support for
> "%S" which is never used and which would cause GCC to warn anyway.

This is an old compatibility feature, I'd rather not drop it.  Who
knows what code relies on the fact that 'message' and 'format-message'
support it?

Thanks.



Reply | Threaded
Open this post in threaded view
|

bug#43439: [PATCH] doprnt improvements

Paul Eggert
On 9/16/20 7:58 AM, Eli Zaretskii wrote:

> Emacs traditionally supports strings with
> embedded null characters, and this feature is in line with that.  It
> is true that it is currently unused, but why is it a good idea to
> remove it?

It's a good idea not only because the feature is currently unused and its
support complicates and adds bugs to the code, but also because it would be a
bad idea to ever use the feature.

The Emacs feature is for Lisp strings. Emacs does not (and for API reasons, it
cannot practically) rely on embedded NULs in C strings. Among other things, if
we tried to use C-style printf formats with embedded NULs, GCC's warnings about
formats not matching their arguments would stop working. These GCC warnings are
quite useful for preventing bugs in Emacs's C code and have helped to catch many
such bugs, and we should not give them up.

More generally, the vestigial support for NULs and %S in doprnt's C formats
dates back to long ago, before GCC warned about these features. It may have made
sense back then but it does not make sense now. Any C-level formatting facility
that supports NULs and %S should not attempt to use the longstanding printf API
that is incompatible with such support - it should be a separate facility. And
Emacs C code already has a non-printf facility for formatting with NULs and %S -
e.g., Fformat and Fformat_message - so it doesn't need yet another one.

> If the problem is the slight inefficiency caused by the call to
> strlen, we could instead solve it in the callers: all the formats I've
> seen are const strings, so the value of FORMAT_END can be computed at
> compile time, and used instead of passing NULL.

This would require unnecessary complication of the code and runtime overhead.
doprnt is called directly only in few places: esprintf, evxprintf, vmessage, and
the (rarely-if-ever used) snprintf replacement. If we modified callers to do
what you're suggesting, we'd need to modify the callers of these functions, and
their callers too, all the way back to the original ancestor call that specifies
the format. This would clutter and bloat the code and would add runtime cost to
all the callers. For example, we'd have to change all calls to the 'error'
function from something like this:

   if (ret < GNUTLS_E_SUCCESS)
     error ("GnuTLS AEAD cipher %s/%s initialization failed: %s",
           gnutls_cipher_get_name (gca), desc, emacs_gnutls_strerror (ret));

to something like this:

   if (ret < GNUTLS_E_SUCCESS)
     {
       char const msg[] = "GnuTLS AEAD cipher %s/%s initialization failed: %s";
       error (msg, sizeof msg - 1,
             gnutls_cipher_get_name (gca), desc, emacs_gnutls_strerror (ret));
     }

Of course we could invent a new macro ERROR to package this up, but such a macro
would still be less efficient than what we have, and worse it would not always
work, for cases like this:

   if (NILP (tem) || (XBUFFER (tem) != current_buffer))
     error (for_region ? "The mark is not set now, so there is no region"
           : "The mark is not set now");

or like this:

       error (format, string); // in x_check_errors

And of course this could be gotten around as well, but we're now talking about a
reasonably large amount of code surgery that will hurt code readability and
runtime performance, all to support a low-level C feature that Emacs does not
use and won't ever reasonably use.

Instead, we should leave most of the C code alone and just adjust 'doprnt' and
its very few callers.

>> Drop support for
>> "%S" which is never used and which would cause GCC to warn anyway.
>
> This is an old compatibility feature, I'd rather not drop it.  Who
> knows what code relies on the fact that 'message' and 'format-message'
> support it?

I know because I checked all the code. No Emacs C code uses %S. And none is
likely to use it in the future because GCC warns about it if you try. (To be
specific: GCC warns unless you use %S compatibly with its standard C meaning,
which differs from that of Emacs doprnt - which is yet another compatibility
minefield if we insist on keeping doprnt's unused %S feature.)



Reply | Threaded
Open this post in threaded view
|

bug#43439: [PATCH] doprnt improvements

Paul Eggert
On 9/18/20 12:30 AM, Eli Zaretskii wrote:

> How about ... we modify doprint to exit when either it finds
> NUL or reaches the character specified by FORMAT_END?  This will allow
> us to keep some of the feature, and I think the amount of changes will
> be smaller.  It should also not be much slower than what you propose.

Better yet, let's leave doprnt's API unchanged, and add a function evsnprintf
(named by analogy from esprintf) whose API is like C vsnprintf but which does
formatting the Emacs way. We can avoid duplication of code by implementing
doprnt in terms of evsnprintf. This fixes the performance issue with current
Emacs, and avoids the need for evsnprintf having to check for both NULs and
FORMAT_END etc. Updated patch attached.

0001-New-function-evsnprintf-to-speed-clean-up-doprnt.patch (23K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

bug#43439: [PATCH] doprnt improvements

Eli Zaretskii
> Cc: [hidden email]
> From: Paul Eggert <[hidden email]>
> Date: Thu, 15 Oct 2020 10:58:49 -0700
>
> On 9/18/20 12:30 AM, Eli Zaretskii wrote:
>
> > How about ... we modify doprint to exit when either it finds
> > NUL or reaches the character specified by FORMAT_END?  This will allow
> > us to keep some of the feature, and I think the amount of changes will
> > be smaller.  It should also not be much slower than what you propose.
>
> Better yet, let's leave doprnt's API unchanged, and add a function evsnprintf
> (named by analogy from esprintf) whose API is like C vsnprintf but which does
> formatting the Emacs way.

No, let's not, please.  I didn't agree to modifying doprnt in
significant ways, so you are now suggesting to do an even more radical
modification, just under another name?  This is moving away of a
potential compromise point, not towards it.



Reply | Threaded
Open this post in threaded view
|

bug#43439: [PATCH] doprnt improvements

Eli Zaretskii
> Cc: [hidden email]
> From: Paul Eggert <[hidden email]>
> Date: Thu, 15 Oct 2020 11:50:48 -0700
>
> On 10/15/20 11:12 AM, Eli Zaretskii wrote:
> > I didn't agree to modifying doprnt in
> > significant ways, so you are now suggesting to do an even more radical
> > modification, just under another name?
>
> If you'd rather have the patch keep doprnt entirely as-is (i.e., not change
> doprnt's implementation at all), I can easily modify the patch to do that.

No, I want to continue calling doprnt directly, not replace all its
calls with a call to another function.  doprnt by itself is useless
unless it is used by the relevant primitives.  I see no need to
replace it with another function, because doprnt works and works well.

> All current Emacs code that calls doprnt would benefit from switching to the
> proposed evsnprintf function, an API that is simpler and faster and that has
> better static checking with GCC.

Yes, that's exactly where we disagree.  I made my proposal to find
some kind of middle ground, and was disappointed to see you suggesting
to move even farther from a potential agreement.

In general, I'm against messing with code that has been stable for
ages, for ephemeral benefits or minor stylistic reasons.  If nothing
else, it gets in the way of maintaining Emacs because code I've known
for years and could find with my eyes closed constantly shifts and
changes under my feet.  Another example of this is that src/lisp.h
macros look nowadays completely different from what they were several
years ago.  This need to constantly unlearn that which was burned into
my muscle memory is not pleasant at all.  Changes that take us forward
because they are needed for new and improved features are welcome and
justified, but there are no new features in all those changes,
including in the doprnt patch.  I wish this fever would stop.



Reply | Threaded
Open this post in threaded view
|

bug#43439: [PATCH] doprnt improvements

Paul Eggert
On 10/15/20 12:05 PM, Eli Zaretskii wrote:
> I want to continue calling doprnt directly, not replace all its
> calls with a call to another function.

OK, then attached is a patch that does things that way. This patch affects only
the implementation of doprnt; no doprnt callers are affected. The code should
have most of the proposed performance benefits of the earlier patches I proposed.

0001-Improve-doprnt-performance.patch (18K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

bug#43439: [PATCH] doprnt improvements

Paul Eggert
On 10/17/20 11:32 AM, Eli Zaretskii wrote:
> this is still nowhere near the compromise I suggested.
> And it loses %S, something I didn't agree to.

Attached is a revised patch that implements what you suggested, and it does not
lose %S.

Here is what you suggested in
<https://debbugs.gnu.org/cgi/bugreport.cgi?bug=43439#14>:

> How about a compromise: we modify doprint to exit when either it finds
> NUL or reaches the character specified by FORMAT_END?

and the attached patch implements this proposed API change.

0001-Improve-doprnt-performance.patch (18K) Download Attachment