newline cache

classic Classic list List threaded Threaded
21 messages Options
12
Reply | Threaded
Open this post in threaded view
|

newline cache

Richard Stallman
[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

I recall that some months ago I encountered a bug in the newline cache
which caused mail-fetch-field to include a extra newline in the value
of the X-RMAIL-ATTRIBUTES field.  ISTR that the bug was fixed.

But it happened again, in Emacs as rebuilt on March 19 from fairly
recent sources.

There was little hope I can find a way to reproduce it from a
standing start -- I could hardly hope to get the newline cache just
right.

Does anyone remember what the fix was for that bug?

--
Dr Richard Stallman
President, Free Software Foundation
51 Franklin St
Boston MA 02110
USA
www.fsf.org  www.gnu.org
Skype: No way! That's nonfree (freedom-denying) software.
  Use Ekiga or an ordinary phone call.


Reply | Threaded
Open this post in threaded view
|

Re: newline cache

Eli Zaretskii
> Date: Mon, 21 Apr 2014 12:28:03 -0400
> From: Richard Stallman <[hidden email]>
>
> I recall that some months ago I encountered a bug in the newline cache
> which caused mail-fetch-field to include a extra newline in the value
> of the X-RMAIL-ATTRIBUTES field.  ISTR that the bug was fixed.
>
> But it happened again, in Emacs as rebuilt on March 19 from fairly
> recent sources.
>
> There was little hope I can find a way to reproduce it from a
> standing start -- I could hardly hope to get the newline cache just
> right.
>
> Does anyone remember what the fix was for that bug?

I've seen a couple of problems in the current pretest which
disappeared once I turned off the cache in the Rmail buffer.  I've
tried to see what does Rmail do that triggers this, but came up
empty-handed.  Help is welcome.

In general, each insertion and each deletion in any buffer must call
invalidate_buffer_caches.  I tried to find any code that doesn't, but
couldn't find anything.

Another possibility is that the tricky optimizations in find_newline
have some subtle bug, so close scrutiny of that code might find the
cause.

Reply | Threaded
Open this post in threaded view
|

Re: newline cache

Richard Stallman
[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

    I've seen a couple of problems in the current pretest which
    disappeared once I turned off the cache in the Rmail buffer.  I've
    tried to see what does Rmail do that triggers this, but came up
    empty-handed.

I investigated  what was happening when the bug occurred.
(forward-line 1) moved across an extra newline, down to point-max.
But I don't know how the newline cache works, so I did not try to
study that code.

One possible approach is to write a builtin function to check the newline cache
contents for valid correspondence to the buffer contents.
I would not mind making Rmail run that in a post-command-hook.
Then I could find out which command causes the cache to become incorrect.

But someone else will have to write that builtin function.

--
Dr Richard Stallman
President, Free Software Foundation
51 Franklin St
Boston MA 02110
USA
www.fsf.org  www.gnu.org
Skype: No way! That's nonfree (freedom-denying) software.
  Use Ekiga or an ordinary phone call.


Reply | Threaded
Open this post in threaded view
|

Re: newline cache

Stefan Monnier
> One possible approach is to write a builtin function to check the
> newline cache contents for valid correspondence to the buffer
> contents.

Maybe add a monitoring code, which (when enabled at compile-time) always
checks the return value of that cache against the non-cached result?


        Stefan

Reply | Threaded
Open this post in threaded view
|

Re: newline cache

Eli Zaretskii
In reply to this post by Richard Stallman
> Date: Tue, 22 Apr 2014 01:37:50 -0400
> From: Richard Stallman <[hidden email]>
> CC: [hidden email]
>
>     I've seen a couple of problems in the current pretest which
>     disappeared once I turned off the cache in the Rmail buffer.  I've
>     tried to see what does Rmail do that triggers this, but came up
>     empty-handed.
>
> I investigated  what was happening when the bug occurred.
> (forward-line 1) moved across an extra newline, down to point-max.

When forward-line errs, it's too late, because the cache is already
corrupted.  We need to catch the primitive that causes the corruption.

> But I don't know how the newline cache works, so I did not try to
> study that code.

The cache code is in region-cache.c.

> One possible approach is to write a builtin function to check the newline cache
> contents for valid correspondence to the buffer contents.
> I would not mind making Rmail run that in a post-command-hook.
> Then I could find out which command causes the cache to become incorrect.
>
> But someone else will have to write that builtin function.

I did it, and installed that on the emacs-24 branch (will be merged to
trunk soon).  If you are tracking the trunk, and don't want to wait
for the merge, you can install the patch below in your sources and
rebuild.

The function I added returns an array with 2 sub-arrays, one with
newline positions according to the cache, the other with newline
positions according to the actual buffer contents.  Check these two
sub-arrays to be 'equal', and if they aren't, the cache was corrupted.

Here's the patch that I installed:

--- src/search.c 2014-03-16 16:28:34 +0000
+++ src/search.c 2014-04-22 17:37:35 +0000
@@ -3108,6 +3108,170 @@ DEFUN ("regexp-quote", Fregexp_quote, Sr
  out - temp,
  STRING_MULTIBYTE (string));
 }
+
+/* Like find_newline, but doesn't use the cache, and only searches forward.  */
+static ptrdiff_t
+find_newline1 (ptrdiff_t start, ptrdiff_t start_byte, ptrdiff_t end,
+       ptrdiff_t end_byte, ptrdiff_t count, ptrdiff_t *shortage,
+       ptrdiff_t *bytepos, bool allow_quit)
+{
+  if (count > 0)
+    {
+      if (!end)
+ end = ZV, end_byte = ZV_BYTE;
+    }
+  else
+    {
+      if (!end)
+ end = BEGV, end_byte = BEGV_BYTE;
+    }
+  if (end_byte == -1)
+    end_byte = CHAR_TO_BYTE (end);
+
+  if (shortage != 0)
+    *shortage = 0;
+
+  immediate_quit = allow_quit;
+
+  if (count > 0)
+    while (start != end)
+      {
+        /* Our innermost scanning loop is very simple; it doesn't know
+           about gaps, buffer ends, or the newline cache.  ceiling is
+           the position of the last character before the next such
+           obstacle --- the last character the dumb search loop should
+           examine.  */
+ ptrdiff_t tem, ceiling_byte = end_byte - 1;
+
+ if (start_byte == -1)
+  start_byte = CHAR_TO_BYTE (start);
+
+        /* The dumb loop can only scan text stored in contiguous
+           bytes. BUFFER_CEILING_OF returns the last character
+           position that is contiguous, so the ceiling is the
+           position after that.  */
+ tem = BUFFER_CEILING_OF (start_byte);
+ ceiling_byte = min (tem, ceiling_byte);
+
+        {
+          /* The termination address of the dumb loop.  */
+  unsigned char *lim_addr = BYTE_POS_ADDR (ceiling_byte) + 1;
+  ptrdiff_t lim_byte = ceiling_byte + 1;
+
+  /* Nonpositive offsets (relative to LIM_ADDR and LIM_BYTE)
+     of the base, the cursor, and the next line.  */
+  ptrdiff_t base = start_byte - lim_byte;
+  ptrdiff_t cursor, next;
+
+  for (cursor = base; cursor < 0; cursor = next)
+    {
+              /* The dumb loop.  */
+      unsigned char *nl = memchr (lim_addr + cursor, '\n', - cursor);
+      next = nl ? nl - lim_addr : 0;
+
+              if (! nl)
+ break;
+      next++;
+
+      if (--count == 0)
+ {
+  immediate_quit = 0;
+  if (bytepos)
+    *bytepos = lim_byte + next;
+  return BYTE_TO_CHAR (lim_byte + next);
+ }
+            }
+
+  start_byte = lim_byte;
+  start = BYTE_TO_CHAR (start_byte);
+        }
+      }
+
+  immediate_quit = 0;
+  if (shortage)
+    *shortage = count;
+  if (bytepos)
+    {
+      *bytepos = start_byte == -1 ? CHAR_TO_BYTE (start) : start_byte;
+      eassert (*bytepos == CHAR_TO_BYTE (start));
+    }
+  return start;
+}
+
+DEFUN ("newline-cache-check", Fnewline_cache_check, Snewline_cache_check,
+       0, 1, 0,
+       doc: /* Check the newline cache of BUFFER against buffer contents.
+
+BUFFER defaults to the current buffer.
+
+Value is an array of 2 sub-arrays of buffer positions for newlines,
+the first based on the cache, the second based on actually scanning
+the buffer.  If the buffer doesn't have a cache, the value is nil.  */)
+  (Lisp_Object buffer)
+{
+  struct buffer *buf;
+  struct region_cache *nlcache;
+  ptrdiff_t shortage, nl_count_cache, nl_count_buf;
+  Lisp_Object cache_newlines, buf_newlines, val;
+  ptrdiff_t from, from_byte, found, i;
+
+  if (NILP (buffer))
+    buf = current_buffer;
+  else
+    {
+      CHECK_BUFFER (buffer);
+      buf = XBUFFER (buffer);
+    }
+  if (buf->base_buffer)
+    buf = buf->base_buffer;
+
+  /* If the buffer doesn't have a newline cache, return nil.  */
+  if (NILP (BVAR (buf, cache_long_scans))
+      || buf->newline_cache == NULL)
+    return Qnil;
+
+  /* How many newlines are there according to the cache?  */
+  find_newline (BUF_BEG (buf), BUF_BEG_BYTE (buf),
+ BUF_Z (buf), BUF_Z_BYTE (buf),
+ TYPE_MAXIMUM (ptrdiff_t), &shortage, NULL, true);
+  nl_count_cache = TYPE_MAXIMUM (ptrdiff_t) - shortage;
+
+  /* Create vector and populate it.  */
+  cache_newlines = make_uninit_vector (nl_count_cache);
+  for (from = BUF_BEG( buf), found = from, i = 0;
+       from < BUF_Z (buf);
+       from = found, i++)
+    {
+      ptrdiff_t from_byte = CHAR_TO_BYTE (from);
+
+      found = find_newline (from, from_byte, 0, -1, 1, &shortage, NULL, true);
+      if (shortage == 0)
+ ASET (cache_newlines, i, make_number (found - 1));
+    }
+
+  /* Now do the same, but without using the cache.  */
+  find_newline1 (BUF_BEG (buf), BUF_BEG_BYTE (buf),
+ BUF_Z (buf), BUF_Z_BYTE (buf),
+ TYPE_MAXIMUM (ptrdiff_t), &shortage, NULL, true);
+  nl_count_buf = TYPE_MAXIMUM (ptrdiff_t) - shortage;
+  buf_newlines = make_uninit_vector (nl_count_buf);
+  for (from = BUF_BEG( buf), found = from, i = 0;
+       from < BUF_Z (buf);
+       from = found, i++)
+    {
+      ptrdiff_t from_byte = CHAR_TO_BYTE (from);
+
+      found = find_newline1 (from, from_byte, 0, -1, 1, &shortage, NULL, true);
+      if (shortage == 0)
+ ASET (buf_newlines, i, make_number (found - 1));
+    }
+
+  /* Construct the value and return it.  */
+  val = make_uninit_vector (2);
+  ASET (val, 0, cache_newlines);
+  ASET (val, 1, buf_newlines);
+  return val;
+}
 
 void
 syms_of_search (void)
@@ -3180,4 +3344,5 @@ is to bind it with `let' around a small
   defsubr (&Smatch_data);
   defsubr (&Sset_match_data);
   defsubr (&Sregexp_quote);
+  defsubr (&Snewline_cache_check);
 }


Reply | Threaded
Open this post in threaded view
|

Re: newline cache

Eli Zaretskii
In reply to this post by Stefan Monnier
> From: Stefan Monnier <[hidden email]>
> Cc: Eli Zaretskii <[hidden email]>, [hidden email]
> Date: Tue, 22 Apr 2014 10:28:19 -0400
>
> Maybe add a monitoring code, which (when enabled at compile-time) always
> checks the return value of that cache against the non-cached result?

As I explained in a previous message, when the value returned by the
cache is wrong, it's too late, because the corruption happened some
time in the past.  It's like debugging memory allocation bugs by
catching exceptions in 'free'.

Reply | Threaded
Open this post in threaded view
|

Re: newline cache

Richard Stallman
In reply to this post by Eli Zaretskii
[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

    > I investigated  what was happening when the bug occurred.
    > (forward-line 1) moved across an extra newline, down to point-max.

    When forward-line errs, it's too late, because the cache is already
    corrupted.

The point is, this is how I know the problem is in the newline cache.
A year ago I debugged the subroutines of Fforward_line and verified
that the problem happened in the call to the newline cache code.

    The cache code is in region-cache.c.

I don't have time to study and debug that.

--
Dr Richard Stallman
President, Free Software Foundation
51 Franklin St
Boston MA 02110
USA
www.fsf.org  www.gnu.org
Skype: No way! That's nonfree (freedom-denying) software.
  Use Ekiga or an ordinary phone call.


Reply | Threaded
Open this post in threaded view
|

Re: newline cache

Richard Stallman
In reply to this post by Eli Zaretskii
[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

    The function I added returns an array with 2 sub-arrays, one with
    newline positions according to the cache, the other with newline
    positions according to the actual buffer contents.

That may take a very long time for my RMAIL buffer, which is 4 meg.  I
don't think I could tolerate having that run after each Rmail command.

To make this fast enough for me to use it to localize the bug, it
needs to operate only on a specified part of the buffer.  Maybe that
will make it fast enough.  If not, it needs to be optimized in other
ways too.

--
Dr Richard Stallman
President, Free Software Foundation
51 Franklin St
Boston MA 02110
USA
www.fsf.org  www.gnu.org
Skype: No way! That's nonfree (freedom-denying) software.
  Use Ekiga or an ordinary phone call.


Reply | Threaded
Open this post in threaded view
|

Re: newline cache

Eli Zaretskii
In reply to this post by Richard Stallman
> Date: Wed, 23 Apr 2014 01:31:24 -0400
> From: Richard Stallman <[hidden email]>
> CC: [hidden email]
>
>     > I investigated  what was happening when the bug occurred.
>     > (forward-line 1) moved across an extra newline, down to point-max.
>
>     When forward-line errs, it's too late, because the cache is already
>     corrupted.
>
> The point is, this is how I know the problem is in the newline cache.

I don't doubt that.  An even easier way to make sure is set
cache-long-scans to a nil value.

>     The cache code is in region-cache.c.
>
> I don't have time to study and debug that.

That's okay, I just mentioned that for if/when you will.

Reply | Threaded
Open this post in threaded view
|

Re: newline cache

Eli Zaretskii
In reply to this post by Richard Stallman
> Date: Wed, 23 Apr 2014 01:31:25 -0400
> From: Richard Stallman <[hidden email]>
> CC: [hidden email]
>
>     The function I added returns an array with 2 sub-arrays, one with
>     newline positions according to the cache, the other with newline
>     positions according to the actual buffer contents.
>
> That may take a very long time for my RMAIL buffer, which is 4 meg.  I
> don't think I could tolerate having that run after each Rmail command.

I suggest to try it; you might be surprised.  I tried it on a 7MB mbox
file, and didn't see any significant slowdown.  The reason is simple:
the mbox buffer is almost always narrowed, and find_newline, which is
the workhorse of the function I wrote, and also the main suspect, only
looks within the restriction.

In any case, if you decide to use the debugging code, please use the
patch below, which fixes a stupid thinko in the previous version.

> To make this fast enough for me to use it to localize the bug, it
> needs to operate only on a specified part of the buffer.

Narrowing already does that.  Anyway, it is impossible to predict in
advance in which portions of the buffer the corruption will happen, at
least not with the level of understanding of this bug that I have now.

FWIW, I played with this post-command-hook in a large mbox buffer, and
couldn't reproduce any problems yet.  So something else might be at
work here.

Here's an updated patch:

=== modified file 'src/search.c'
--- src/search.c 2014-03-16 16:28:34 +0000
+++ src/search.c 2014-04-23 15:21:25 +0000
@@ -3108,6 +3108,187 @@ DEFUN ("regexp-quote", Fregexp_quote, Sr
  out - temp,
  STRING_MULTIBYTE (string));
 }
+
+/* Like find_newline, but doesn't use the cache, and only searches forward.  */
+static ptrdiff_t
+find_newline1 (ptrdiff_t start, ptrdiff_t start_byte, ptrdiff_t end,
+       ptrdiff_t end_byte, ptrdiff_t count, ptrdiff_t *shortage,
+       ptrdiff_t *bytepos, bool allow_quit)
+{
+  if (count > 0)
+    {
+      if (!end)
+ end = ZV, end_byte = ZV_BYTE;
+    }
+  else
+    {
+      if (!end)
+ end = BEGV, end_byte = BEGV_BYTE;
+    }
+  if (end_byte == -1)
+    end_byte = CHAR_TO_BYTE (end);
+
+  if (shortage != 0)
+    *shortage = 0;
+
+  immediate_quit = allow_quit;
+
+  if (count > 0)
+    while (start != end)
+      {
+        /* Our innermost scanning loop is very simple; it doesn't know
+           about gaps, buffer ends, or the newline cache.  ceiling is
+           the position of the last character before the next such
+           obstacle --- the last character the dumb search loop should
+           examine.  */
+ ptrdiff_t tem, ceiling_byte = end_byte - 1;
+
+ if (start_byte == -1)
+  start_byte = CHAR_TO_BYTE (start);
+
+        /* The dumb loop can only scan text stored in contiguous
+           bytes. BUFFER_CEILING_OF returns the last character
+           position that is contiguous, so the ceiling is the
+           position after that.  */
+ tem = BUFFER_CEILING_OF (start_byte);
+ ceiling_byte = min (tem, ceiling_byte);
+
+        {
+          /* The termination address of the dumb loop.  */
+  unsigned char *lim_addr = BYTE_POS_ADDR (ceiling_byte) + 1;
+  ptrdiff_t lim_byte = ceiling_byte + 1;
+
+  /* Nonpositive offsets (relative to LIM_ADDR and LIM_BYTE)
+     of the base, the cursor, and the next line.  */
+  ptrdiff_t base = start_byte - lim_byte;
+  ptrdiff_t cursor, next;
+
+  for (cursor = base; cursor < 0; cursor = next)
+    {
+              /* The dumb loop.  */
+      unsigned char *nl = memchr (lim_addr + cursor, '\n', - cursor);
+      next = nl ? nl - lim_addr : 0;
+
+              if (! nl)
+ break;
+      next++;
+
+      if (--count == 0)
+ {
+  immediate_quit = 0;
+  if (bytepos)
+    *bytepos = lim_byte + next;
+  return BYTE_TO_CHAR (lim_byte + next);
+ }
+            }
+
+  start_byte = lim_byte;
+  start = BYTE_TO_CHAR (start_byte);
+        }
+      }
+
+  immediate_quit = 0;
+  if (shortage)
+    *shortage = count;
+  if (bytepos)
+    {
+      *bytepos = start_byte == -1 ? CHAR_TO_BYTE (start) : start_byte;
+      eassert (*bytepos == CHAR_TO_BYTE (start));
+    }
+  return start;
+}
+
+DEFUN ("newline-cache-check", Fnewline_cache_check, Snewline_cache_check,
+       0, 1, 0,
+       doc: /* Check the newline cache of BUFFER against buffer contents.
+
+BUFFER defaults to the current buffer.
+
+Value is an array of 2 sub-arrays of buffer positions for newlines,
+the first based on the cache, the second based on actually scanning
+the buffer.  If the buffer doesn't have a cache, the value is nil.  */)
+  (Lisp_Object buffer)
+{
+  struct buffer *buf, *old = NULL;
+  ptrdiff_t shortage, nl_count_cache, nl_count_buf;
+  Lisp_Object cache_newlines, buf_newlines, val;
+  ptrdiff_t from, found, i;
+
+  if (NILP (buffer))
+    buf = current_buffer;
+  else
+    {
+      CHECK_BUFFER (buffer);
+      buf = XBUFFER (buffer);
+      old = current_buffer;
+    }
+  if (buf->base_buffer)
+    buf = buf->base_buffer;
+
+  /* If the buffer doesn't have a newline cache, return nil.  */
+  if (NILP (BVAR (buf, cache_long_scans))
+      || buf->newline_cache == NULL)
+    return Qnil;
+
+  /* find_newline can only work on the current buffer.  */
+  if (old != NULL)
+    set_buffer_internal_1 (buf);
+
+  /* How many newlines are there according to the cache?  */
+  find_newline (BEGV, BEGV_BYTE, ZV, ZV_BYTE,
+ TYPE_MAXIMUM (ptrdiff_t), &shortage, NULL, true);
+  nl_count_cache = TYPE_MAXIMUM (ptrdiff_t) - shortage;
+
+  /* Create vector and populate it.  */
+  cache_newlines = make_uninit_vector (nl_count_cache);
+
+  if (nl_count_cache)
+    {
+      for (from = BEGV, found = from, i = 0; from < ZV; from = found, i++)
+ {
+  ptrdiff_t from_byte = CHAR_TO_BYTE (from);
+
+  found = find_newline (from, from_byte, 0, -1, 1, &shortage,
+ NULL, true);
+  if (shortage != 0 || i >= nl_count_cache)
+    break;
+  ASET (cache_newlines, i, make_number (found - 1));
+ }
+      /* Fill the rest of slots with an invalid position.  */
+      for ( ; i < nl_count_cache; i++)
+ ASET (cache_newlines, i, make_number (-1));
+    }
+
+  /* Now do the same, but without using the cache.  */
+  find_newline1 (BEGV, BEGV_BYTE, ZV, ZV_BYTE,
+ TYPE_MAXIMUM (ptrdiff_t), &shortage, NULL, true);
+  nl_count_buf = TYPE_MAXIMUM (ptrdiff_t) - shortage;
+  buf_newlines = make_uninit_vector (nl_count_buf);
+  if (nl_count_buf)
+    {
+      for (from = BEGV, found = from, i = 0; from < ZV; from = found, i++)
+ {
+  ptrdiff_t from_byte = CHAR_TO_BYTE (from);
+
+  found = find_newline1 (from, from_byte, 0, -1, 1, &shortage,
+ NULL, true);
+  if (shortage != 0 || i >= nl_count_buf)
+    break;
+  ASET (buf_newlines, i, make_number (found - 1));
+ }
+      for ( ; i < nl_count_buf; i++)
+ ASET (buf_newlines, i, make_number (-1));
+    }
+
+  /* Construct the value and return it.  */
+  val = make_uninit_vector (2);
+  ASET (val, 0, cache_newlines);
+  ASET (val, 1, buf_newlines);
+
+  if (old != NULL)
+    set_buffer_internal_1 (old);
+  return val;
+}
 
 void
 syms_of_search (void)
@@ -3180,4 +3361,5 @@ is to bind it with `let' around a small
   defsubr (&Smatch_data);
   defsubr (&Sset_match_data);
   defsubr (&Sregexp_quote);
+  defsubr (&Snewline_cache_check);
 }


Reply | Threaded
Open this post in threaded view
|

Re: newline cache

Richard Stallman
[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

    I suggest to try it; you might be surprised.  I tried it on a 7MB mbox
    file, and didn't see any significant slowdown.  The reason is simple:
    the mbox buffer is almost always narrowed, and find_newline, which is
    the workhorse of the function I wrote, and also the main suspect, only
    looks within the restriction.

Well, I could arrange to narrow to the current message
before running the test.  Maybe that will do enough.

--
Dr Richard Stallman
President, Free Software Foundation
51 Franklin St
Boston MA 02110
USA
www.fsf.org  www.gnu.org
Skype: No way! That's nonfree (freedom-denying) software.
  Use Ekiga or an ordinary phone call.


Reply | Threaded
Open this post in threaded view
|

Re: newline cache

Eli Zaretskii
> Date: Wed, 23 Apr 2014 20:33:10 -0400
> From: Richard Stallman <[hidden email]>
> Cc: [hidden email]
>
>     I suggest to try it; you might be surprised.  I tried it on a 7MB mbox
>     file, and didn't see any significant slowdown.  The reason is simple:
>     the mbox buffer is almost always narrowed, and find_newline, which is
>     the workhorse of the function I wrote, and also the main suspect, only
>     looks within the restriction.
>
> Well, I could arrange to narrow to the current message
> before running the test.  Maybe that will do enough.

Some additional info about this.

I didn't install newline-cache-check in a post-command-hook, primarily
because I was unsure which buffer to check the cache in.  Instead, I
waited for some problem to happen first.  This came in the form of a
qp-encoded mail that Rmail didn't succeed in decoding for display,
complaining thusly:

  Warning: corrupt attribute header in message

I then invoked newline-cache-check in INBOX, which came up OK.  Next,
I invoked newline-cache-check in " *message-viewer INBOX*", and bingo:
it detected a corruption.  The viewer buffer is narrowed to a single
email message, as it should be.  The cache thinks that there are only
2 newlines in the narrowed region, while the truth is there are much
more, and what the cache thinks is actually a tiny subset of the
truth:

(newline-cache-check (get-buffer " *message-viewer INBOX*"))
 => [[27793944 27797783] [27788878 27788912 27788938 27788985 27789043 27789110 27789123 27789161 27789182 27789233 27789306 27789344 27789365 27789416 27789488 27789503 27789575 27789645 27789660 27789731 27789771 27789809 27789830 27789881 27789986 27790091 27790187 27790220 27790282 27790355 27790836 27790897 27790959 27791021 27791083 27791145 27791207 27791269 27791331 27791393 27791455 27791486 27791590 27792014 27792056 27792563 27792636 27792680 27792719 27792762 27792840 27792873 27792947 27792965 27793010 27793054 27793112 27793144 27793173 27793174 27793251 27793328 27793405 27793482 27793559 27793636 27793713 27793790 27793867 27793944 27794020 27794097 27794174 27794251 27794328 27794405 27794482 27794559 27794635 27794712 27794789 27794866 27794943 27795020 27795097 27795174 27795251 27795328 27795405 27795482 27795559 27795636 27795713 27795788 27795863 27795939 27796015 27796091 27796166 27796243 27796320 27796397 27796474 27796551 27796628 27796705 27796782 27796859

So now I will install a check in post-command-hook and try to see what
causes the corruption.

(As you see, my Rmail buffer is very large, about 29MB, but I don't
think it matters, because the check is always done in a region
narrowed to a single email message, and how large can that be?)

Reply | Threaded
Open this post in threaded view
|

Re: newline cache

Eli Zaretskii
> Date: Fri, 25 Apr 2014 12:20:06 +0300
> Sun-Java-System-SMTP-Warning: Lines longer than SMTP allows found and
> truncated.
> From: Eli Zaretskii <[hidden email]>
> Cc: [hidden email]
>
> So now I will install a check in post-command-hook and try to see what
> causes the corruption.

Based on the few incidents I had until now, it looks like the cache
gets corrupted after deleting several mails, then saving the mailbox
buffer.  Not sure yet what exactly causes this.

Reply | Threaded
Open this post in threaded view
|

Re: newline cache

Eli Zaretskii
> Date: Sat, 26 Apr 2014 22:58:51 +0300
> From: Eli Zaretskii <[hidden email]>
> Cc: [hidden email]
>
> Based on the few incidents I had until now, it looks like the cache
> gets corrupted after deleting several mails, then saving the mailbox
> buffer.  Not sure yet what exactly causes this.

One more important fact: not each time I delete messages and then save
the mailbox, I get the cache corrupted.  So some other factor is at
work here.

Reply | Threaded
Open this post in threaded view
|

Re: newline cache

Jarek Czekalski
Is it a bug that exhibits through line highlighting? I get text buffers easily malformed in such a way, that instead of a single line being highlighted, it highlights several lines. And the END key goes to the end of this pseudo block of lines. The only way I see to recover is closing the buffer and reopening. Revert does not help.

I'll try to come up with a recipe.
Reply | Threaded
Open this post in threaded view
|

Re: newline cache

Eli Zaretskii
> Date: Tue, 29 Apr 2014 01:41:41 -0700 (PDT)
> From: Jarek Czekalski <[hidden email]>
>
> Is it a bug that exhibits through line highlighting? I get text buffers
> easily malformed in such a way, that instead of a single line being
> highlighted, it highlights several lines. And the END key goes to the end of
> this pseudo block of lines. The only way I see to recover is closing the
> buffer and reopening. Revert does not help.

If the problem goes away when you set cache-long-scans to nil, then
yes, this is related to the newline cache.

Reply | Threaded
Open this post in threaded view
|

Re: newline cache

Damien Wyart-2
In reply to this post by Eli Zaretskii
> > Based on the few incidents I had until now, it looks like the cache
> > gets corrupted after deleting several mails, then saving the mailbox
> > buffer. Not sure yet what exactly causes this.

* Eli Zaretskii <[hidden email]> [2014-04-27 05:42]:
> One more important fact: not each time I delete messages and then save
> the mailbox, I get the cache corrupted. So some other factor is at
> work here.

Hi Eli,

Was there some progress on this problem?

24.4 is approaching and I think killing this bug before would be nice.

Thanks,
--
Damien

Reply | Threaded
Open this post in threaded view
|

Re: newline cache

Stefan Monnier
> Was there some progress on this problem?
> 24.4 is approaching and I think killing this bug before would be nice.

I don't see this bug in the list of "important" bugs.
Does it have a bug-tracking number?


        Stefan

Reply | Threaded
Open this post in threaded view
|

Re: newline cache

Eli Zaretskii
In reply to this post by Damien Wyart-2
> Date: Wed, 21 May 2014 10:41:13 +0200
> From: Damien Wyart <[hidden email]>
> Cc: [hidden email]
>
> > > Based on the few incidents I had until now, it looks like the cache
> > > gets corrupted after deleting several mails, then saving the mailbox
> > > buffer. Not sure yet what exactly causes this.
>
> * Eli Zaretskii <[hidden email]> [2014-04-27 05:42]:
> > One more important fact: not each time I delete messages and then save
> > the mailbox, I get the cache corrupted. So some other factor is at
> > work here.
>
> Hi Eli,
>
> Was there some progress on this problem?
>
> 24.4 is approaching and I think killing this bug before would be nice.

No real progress, unfortunately.  The bug is very elusive, almost a
Heisenbug, so I cannot reliably reproduce it, and thus still don't
have a simple recipe starting from "emacs -Q" that I could use to
investigate it and verify that it's fixed.

Without a simple reliable recipe, it is extremely hard to find the
root cause, because the newline cache is consulted _a_lot_, and on top
of that Rmail swaps buffer text, which makes traces even harder to
understand and interpret.

I'm beginning to think that maybe the only practical solution, at
least for 24.4, is to disable caching in Rmail buffers (since I saw no
reports of similar problems in any other modes).

Reply | Threaded
Open this post in threaded view
|

Re: newline cache

Eli Zaretskii
In reply to this post by Damien Wyart-2
> Date: Wed, 21 May 2014 10:41:13 +0200
> From: Damien Wyart <[hidden email]>
> Cc: [hidden email]
>
> > > Based on the few incidents I had until now, it looks like the cache
> > > gets corrupted after deleting several mails, then saving the mailbox
> > > buffer. Not sure yet what exactly causes this.
>
> * Eli Zaretskii <[hidden email]> [2014-04-27 05:42]:
> > One more important fact: not each time I delete messages and then save
> > the mailbox, I get the cache corrupted. So some other factor is at
> > work here.
>
> Hi Eli,
>
> Was there some progress on this problem?
>
> 24.4 is approaching and I think killing this bug before would be nice.

No real progress, unfortunately.  The bug is very elusive, almost a
Heisenbug, so I cannot reliably reproduce it, and thus still don't
have a simple recipe starting from "emacs -Q" that I could use to
investigate it and verify that it's fixed.

Without a simple reliable recipe, it is extremely hard to find the
root cause, because the newline cache is consulted _a_lot_, and on top
of that Rmail swaps buffer text, which makes traces even harder to
understand and interpret.

I'm beginning to think that maybe the only practical solution, at
least for 24.4, is to disable caching in Rmail buffers (since I saw no
reports of similar problems in any other modes).

Volunteers are welcome to help debug this, or provide additional
details.

12