# bug#27526: 25.1; Nonconformance to Unicode bidirectionality algorithm due to paragraph separator

## bug#27526: 25.1; Nonconformance to Unicode bidirectionality algorithm due to paragraph separator

According to the Emacs manual (section 37.26, Bidirectional Display):

> Emacs provides a “Full Bidirectionality” class implementation of the UBA, consistent with the requirements of the Unicode Standard v8.0.

And again (section 22.19, Bidirectional Editing):

> Emacs implements the Unicode Bidirectional Algorithm described in the Unicode Standard Annex #9, for reordering of bidirectional text for display.

However, these statements are false. Emacs does not implement the Unicode Bidirectional Algorithm correctly, and therefore does not even provide 'Implicit bidirectionality', which is the minimal level of conformance listed in section 4.2 'Explicit Formatting Character' of the Unicode 8.0.0 Bidirectional Algorithm specification (www.unicode.org/reports/tr9/tr9-33.html), let alone 'Full bidirectionality'.

The reason has to do with the way the Emacs bidi implementation recognizes separate paragraphs, which is inconsistent with the Unicode specification. The Unicode Bidirectional Algorithm specifies (section 3, 'Basic Display Algorithm'):

> The algorithm reorders text only within a paragraph; characters in one paragraph have no effect on characters in a different paragraph. Paragraphs are divided by the Paragraph Separator or appropriate Newline Function (for guidelines on the handling of CR, LF, and CRLF, see Section 4.4, Directionality, and Section 5.8, Newline Guidelines of [Unicode]).

However, Emacs, by its own admission (section 22.19, Bidirectional Editing), takes the following approach:

> Paragraph boundaries are empty lines, i.e., lines consisting entirely of whitespace characters.

I'll repeat: according to Unicode a paragraph ends with a paragraph separator. What constitutes a paragraph separator is specified precisely in section 5.8 'Newline Guidelines' of The Unicode Standard version 8.0.0. For instance, on a Mac OS X system, it is LF (line feed, U+000A). The formatting effects of the bidi algorithm must not cross the paragraph separator boundary.
And yet in Emacs the formatting extends beyond the paragraph separator, and this is the case on all operating systems. Consider, for instance, the following example.

ILLUSTRATION: An English paragraph directly following a Hebrew paragraph is formatted like Hebrew text. http://imgur.com/3eyrUfA

The first, Hebrew paragraph is formatted correctly; however, the second, English paragraph is formatted wrongly, as though it were a Hebrew paragraph: it is right-justified, the question mark appears on the left, and so does the cursor. Once an empty paragraph is inserted between the two paragraphs, the English paragraph is formatted correctly.

ILLUSTRATION: When paragraphs are separated by an empty paragraph, they are formatted correctly. http://imgur.com/ZsHGkwf

This is not just a theoretical question of conformance to standards; this problem has practical consequences. Consider, for instance, a LaTeX document for typesetting Hebrew text. Normally, in order to eliminate the usual leading indentation of the first line of a paragraph, a \noindent command is placed at the beginning of the paragraph. However, because the Unicode bidi algorithm determines the directionality of a paragraph based on its first word, the Hebrew text is formatted like English text. This is not a problem; it is to be expected.

ILLUSTRATION: A LaTeX document for typesetting a Hebrew paragraph with no indentation of the first line. http://imgur.com/xYUkZKr

One way to resolve this is to explicitly change the directionality of the paragraph. However, disregarding the fact that this is not currently possible due to a separate Emacs bug, even if it were possible, it would affect the placement of the backslash at the beginning of the \noindent command, which would no longer look like a LaTeX command.

ILLUSTRATION: Explicitly changing the directionality of the paragraph. http://imgur.com/sPcVReA

(Note: This is a screenshot of Microsoft Word, since due to a bug, Emacs doesn't currently allow changing the automatically determined directionality of a paragraph.)

So the best way to resolve this problem would be to place the \noindent command in a separate paragraph. Unfortunately, here Emacs' faulty implementation of the Unicode bidi algorithm rears its ugly head. Since Emacs doesn't recognize the paragraph separator for what it is, it will format the Hebrew text wrongly, as though it were English text.

ILLUSTRATION: Putting the \noindent in a separate paragraph results in the Hebrew text being formatted like English text. http://imgur.com/44ds6rK

Placing an empty paragraph between the \noindent command and the Hebrew text will resolve the formatting problem inside the Emacs editor, but now the \noindent command, which only affects the current LaTeX paragraph (LaTeX paragraphs are ended by an empty line), no longer eliminates the indentation of the first line of the Hebrew paragraph in the typeset file.

In GNU Emacs 25.1.1 (x86_64-apple-darwin13.4.0, NS appkit-1265.21 Version 10.9.5 (Build 13F1911))
 of 2016-09-21 built on builder10-9.porkrind.org
Windowing system distributor 'Apple', version 10.3.1504
Configured using:
 'configure --with-ns '--enable-locallisppath=/Library/Application Support/Emacs/${version}/site-lisp:/Library/Application Support/Emacs/site-lisp' --with-modules'

Configured features:
NOTIFY ACL GNUTLS LIBXML2 ZLIB TOOLKIT_SCROLL_BARS NS MODULES

Important settings:
  value of $LANG: en_US.UTF-8
  locale-coding-system: utf-8-unix

Major mode: Fundamental

Minor modes in effect:
  ivy-mode: t
  shell-dirtrack-mode: t
  projectile-mode: t
  helm-descbinds-mode: t
  async-bytecomp-package-mode: t
  tooltip-mode: t
  global-eldoc-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  blink-cursor-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  buffer-read-only: t
  column-number-mode: t
  line-number-mode: t
  transient-mark-mode: t

Recent messages:
ad-handle-definition: ‘ibuffer’ got redefined
Turn on helm-projectile key bindings
For information about GNU Emacs and the GNU system, type C-h C-a.

Load-path shadows: /Users/itaiberli/.emacs.d/elpa/seq-2.20/seq hides /Applications/Emacs.app/Contents/Resources/lisp/emacs-lisp/seq Features: (shadow sort mail-extr emacsbug message rfc822 mml mml-sec epg mm-decode mm-bodies mm-encode mail-parse rfc2231 mailabbrev gmm-utils mailheader sendmail rfc2047 rfc2045 ietf-drums mail-utils colir color counsel jka-compr esh-util etags xref project swiper reftex reftex-vars two-column ivy delsel ivy-overlay helm-projectile helm-files rx image-dired tramp tramp-compat tramp-loaddefs trampver shell pcomplete format-spec dired-x dired-aux ffap helm-tags helm-bookmark helm-adaptive helm-info bookmark pp helm-external helm-net browse-url xml url url-proxy url-privacy url-expand url-methods url-history url-cookie url-domsuf url-util url-parse auth-source gnus-util mm-util help-fns mail-prsvr password-cache url-vars mailcap helm-buffers helm-grep helm-regexp helm-utils helm-locate helm-help helm-types projectile grep compile comint ansi-color ring ibuf-ext ibuffer thingatpt helm-descbinds helm easy-mmode helm-source cl-seq eieio-compat eieio eieio-core helm-multi-match helm-lib dired helm-config helm-easymenu cl-macs async-bytecomp async advice edmacro kmacro finder-inf tex-site info package epg-config seq byte-opt gv bytecomp byte-compile cl-extra help-mode easymenu cconv cl-loaddefs pcase cl-lib time-date mule-util tooltip eldoc electric uniquify ediff-hook vc-hooks lisp-float-type mwheel ns-win ucs-normalize term/common-win tool-bar dnd fontset image regexp-opt fringe tabulated-list newcomment elisp-mode lisp-mode prog-mode register page menu-bar rfn-eshadow timer select scroll-bar mouse jit-lock font-lock syntax facemenu font-core frame cl-generic cham georgian utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao korean japanese eucjp-ms cp51932 hebrew greek romanian slovak czech european ethiopic indian cyrillic chinese charscript case-table epa-hook jka-cmpr-hook help simple abbrev minibuffer cl-preloaded nadvice loaddefs 
button faces cus-face macroexp files text-properties overlay sha1 md5 base64 format env code-pages mule custom widget hashtable-print-readable backquote kqueue cocoa ns multi-tty make-network-process emacs) Memory information: ((conses 16 312045 13704)  (symbols 48 30403 0)  (miscs 40 88 192)  (strings 32 51754 11765)  (string-bytes 1 1669992)  (vectors 16 50218)  (vector-slots 8 844617 7052)  (floats 8 564 218)  (intervals 56 242 111)  (buffers 976 18))
## bug#27526: Explicit directionality marks CAN be inserted!

I'd like to retract the statement I made in the LaTeX example that inserting explicit directionality marks doesn't work in Emacs. It does.
## bug#27526: 25.1; Nonconformance to Unicode bidirectionality algorithm due to paragraph separator

> From: Itai Berli <[hidden email]>
> Date: Thu, 29 Jun 2017 12:16:00 +0300
>
> I'll repeat: according to Unicode a paragraph ends with a paragraph separator. What constitutes a paragraph separator is specified precisely in section 5.8 'Newline Guidelines' of The Unicode Standard version 8.0.0. For instance, on a MacOS X system, it is LF (line feed, Unicode 000A). The formatting effects of the bidi algorithm must not cross the paragraph separator boundary.
>
> And yet in Emacs the formatting extends beyond the paragraph separator, and this is the case on all operating systems. Consider, for instance, the following example.

The UBA allows applications to employ "higher-level protocols" when deciding on base paragraph direction. See section 4.3 in UAX#9 and specifically clause HL1 there.

This is what Emacs does: it applies its own heuristics for this decision. The reason for that is that Emacs's implementation of the UBA must work reasonably well in plain-text buffers, where typically long paragraphs are broken into lines by newline characters (which are paragraph separators according to the UBA), and many times the partition into lines is done by auto-fill or similar features, thus making the first character of the next line fairly arbitrary. Using the UBA paragraph-direction determination would then produce unacceptable results, whereby the direction of a part of a paragraph could change in unpredictable ways when text is refilled.

> Consider, for instance, a LaTeX document for typesetting Hebrew text. Normally in order to eliminate the usual leading indentation of the first line of a paragraph, a \noindent command is placed at the beginning of the paragraph. However, because the Unicode bidi algorithm determines the directionality of a paragraph based on its first word, the Hebrew text is formatted like English text. This is not a problem; it is to be expected.

The Emacs bidirectional display doesn't have special facilities for marked-up text, such as TeX and HTML/XML. Because those markups use punctuation characters for their markup, doing so in RTL context can produce unpleasant results in the default display, as you point out.

You can alleviate this to some extent by (in your case) starting the paragraph with an RLM control character before \noindent, optionally followed by an LRM or enclosing \noindent in LRE..PDF (so that the backslash displays to the left of "noindent"). This is admittedly a bit awkward, but I think the results are still acceptable.

I will gladly work with anyone who'd volunteer to introduce features required to better support markup languages. This will require low-level display changes and some support from the relevant major modes to use those features. For now, the demand was sufficiently low (I think you are about the second person to raise the issue since bidirectional display debuted in Emacs 24.1) to keep this issue low on our TODO.

> One way to resolve this is to explicitly change the directionality of the paragraph, however, disregarding the fact that this is not currently possible due to a separate Emacs bug, even if it were possible, it would affect the placement of the backslash at the beginning of the \noindent command, which will no longer look like a LaTeX command.

I think my suggestion above fixes this latter issue as well.

Thanks.
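
The RLM plus LRE..PDF workaround described in this message could be wrapped in a small command. This is only a sketch: the command name `my-insert-rtl-noindent` is hypothetical and not part of Emacs or of this thread, and whether a given TeX engine tolerates these control characters in the source is not addressed here.

```elisp
;; Hypothetical helper; the name is illustrative only.
(defun my-insert-rtl-noindent ()
  "Insert a \\noindent that reads left to right inside an RTL paragraph."
  (interactive)
  (insert #x200f        ; U+200F RIGHT-TO-LEFT MARK: a strong RTL character first
          #x202a        ; U+202A LEFT-TO-RIGHT EMBEDDING (LRE)
          "\\noindent"  ; kept LTR, so the backslash displays on the left
          #x202c        ; U+202C POP DIRECTIONAL FORMATTING (PDF)
          " "))
```

Calling `M-x my-insert-rtl-noindent` at the start of a Hebrew paragraph inserts the wrapped command; the Hebrew text is then typed after it.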
## bug#27526: 25.1; Nonconformance to Unicode bidirectionality algorithm due to paragraph separator

> The UBA allows applications to employ "higher-level protocols" when deciding on base paragraph direction. See section 4.3 in UAX#9 and specifically clause HL1 there.
>
> This is what Emacs does: it applies its own heuristics for this decision. The reason for that is that Emacs's implementation of the UBA must work reasonably well in plain-text buffers, where typically long paragraphs are broken into lines by newline characters (which are paragraph separators according to the UBA), and many times the partition into lines is done by auto-fill or similar features, thus making the first character of the next line fairly arbitrary. Using the UBA paragraph-direction determination would then produce unacceptable results, whereby the direction of a part of a paragraph could change in unpredictable ways when text is refilled.

As I understand it, the "higher-level protocols" provision is intended to allow for such things as table cells, elements of structured markup languages, and word processors that use an idiosyncratic implementation of a paragraph separator *under the hood*. It is not intended for plain running text; for this the standard specifies explicitly what the paragraph separators for every operating system are.

> typically long paragraphs are broken into lines by newline characters

I see no evidence of the validity of this statement on my system (Emacs 25.1.1). But even if this were so, it would still not merit *hard-coding* the paragraph separator as a blank line, as there are situations (such as the one I presented in my bug report) that require a different configuration.

> You can alleviate this to some extent by ...(in your case) starting the paragraph with an RLM control character before \noindent, optionally followed by an LRM or enclosing \noindent in LRE..PDF (so that the backslash displays to the left of "noindent"). This is admittedly a bit awkward, but I think the results are still acceptable.

As you mentioned, the solution is cumbersome. It might have been acceptable if this were the sole issue, but this example illustrates just one of several problems that arise due to the current paragraph separator convention.

In conclusion, and on a personal note, I implore you to change this behavior, and to do so as soon as possible, and not only for specialized markup documents, but for every document. I am currently working on my thesis. Emacs is useless to me as a text editor of Hebrew texts without this feature. This is no exaggeration. The original reason I chose Emacs over other editors was the combination of AUCTeX and the promise of full Unicode compatibility. AUCTeX has delivered on its promise, but in the area of Unicode, as far as my needs are concerned, it is as if there were no Unicode support at all, and I will sadly be forced to look for a different editor.
## bug#27526: 25.1; Nonconformance to Unicode bidirectionality algorithm due to paragraph separator

I'd like to add another reason why this behavior is problematic: it breaks interoperability with other plain text editors, since the text will not be displayed the same way. Consider, for instance, the very same plain text file

in GEdit: http://imgur.com/Iw4yrdQ

in Emacs: http://imgur.com/7kfWseE

Finally, the question of whether Emacs behavior is consistent with the UBA specifications is debatable, since when UBA section 3 states "Paragraphs may also be determined by higher-level protocols" the question is what exactly the "also" means: is it that the higher-level protocols (HLP) can decide that a newline character is not a paragraph boundary, as Emacs does, or is it that the HLP can only declare paragraph boundaries in addition to paragraph separator characters?
## bug#27526: 25.1; Nonconformance to Unicode bidirectionality algorithm due to paragraph separator

> From: Itai Berli <[hidden email]>
> Date: Tue, 4 Jul 2017 13:42:19 +0300
>
> I'd like to add another reason why this behavior is problematic: it breaks interoperability with other plain text editors, since the text will not be displayed the same way. Consider, for instance, the very same plain text file
>
> in GEdit: http://imgur.com/Iw4yrdQ
>
> in Emacs: http://imgur.com/7kfWseE

As I already explained, the behavior of GEdit is unacceptable for Emacs, because most modes derived from Text mode tend to deal with buffers where lines are broken by newlines, so potentially switching paragraph direction just because a newline happens to be there would have devastating effect on the text as displayed. This is perhaps in contrast with other editors and word-processors which mostly deal with long lines without hard newlines. That's why the notion of paragraph in Emacs's UBA implementation was chosen to fit the traditional Emacs definition of paragraph in text-mode and its derivatives.

> Finally, the question of whether Emacs behavior is consistent with the UBA specifications is debatable, since when UBA section 3 states "Paragraphs may also be determined by higher-level protocols" the question is what exactly the "also" means: is it that the higher-level protocols (HLP) can decide that a newline character is not a paragraph boundary, as Emacs does, or is it that the HLP can only declare paragraph boundaries in addition to paragraph separator characters?

It is clear from the context and the example following the above sentence that "also" doesn't mean "in addition".

However, the main issue is not the paragraph boundary, the main issue is how the base direction of the paragraph is determined. Because no matter where the paragraph boundary is, if the base direction is not recalculated there, then the fact that the boundary is there doesn't matter.

From Section 4.3 Higher-Level Protocols of the UAX#9:

  HL1. Override P3, and set the paragraph embedding level explicitly. This does not apply when deciding how to treat FSI in rule X5c.

  . A higher-level protocol may set any paragraph level. This can be done on the basis of the context, such as on a table cell, paragraph, document, or system level. (P2 may be skipped if P3 is overridden). [...]

  . A higher-level protocol may apply rules equivalent to P2 and P3 but default to level 1 (RTL) rather than 0 (LTR) to match overall RTL context.

  . A higher-level protocol may use an entirely different algorithm that heuristically auto-detects the paragraph embedding level based on the paragraph text and its context. For example, it could base it on whether there are more RTL characters in the text than LTR. As another example, when the paragraph contains no strong characters, its direction could be determined by the levels of the paragraphs before and after.

And Section 3.3.1, which describes the P1, P2, and P3 paragraph-level rules, says:

  Whenever a higher-level protocol specifies the paragraph level, rules P2 and P3 may be overridden: see HL1.

So an application is allowed to override _all_ of the paragraph-level rules, and do what suits it best. And based on some non-negligible experience with bidi-aware applications, I submit that an application that does _not_ employ some higher-level protocol for base paragraph direction will violate user expectations when working with plain text. E.g., try reading in MS Outlook an unformatted text message which has a lot of RTL text mixed with LTR. It's unreadable; I always copy/paste it into Emacs, and only then I'm able to read it.
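
For readers unfamiliar with the default rules that HL1 lets an application override, here is a rough Emacs Lisp sketch of P2 and P3: find the first strong character and take the direction from it, defaulting to left-to-right. It is a simplification written for this note, not Emacs's actual implementation; the function name `my-uba-default-direction` is hypothetical, and the sketch ignores the isolate handling that the full rule P2 requires.

```elisp
;; Simplified sketch of UBA rules P2/P3; not how Emacs itself decides.
(defun my-uba-default-direction (text)
  "Return the base direction of TEXT from its first strong character."
  (catch 'direction
    (dolist (ch (string-to-list text))
      ;; `get-char-code-property' returns the Bidi_Class as a symbol.
      (pcase (get-char-code-property ch 'bidi-class)
        ('L (throw 'direction 'left-to-right))
        ((or 'R 'AL) (throw 'direction 'right-to-left))))
    ;; P3: no strong character was found, so default to level 0 (LTR).
    'left-to-right))

(my-uba-default-direction "\\noindent שלום") ; => left-to-right
(my-uba-default-direction "שלום \\noindent") ; => right-to-left
```

The first call shows the LaTeX case from the original report: the backslash is a neutral character, so the `n` of `\noindent` is the first strong character and the whole paragraph becomes left-to-right.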
## bug#27526: 25.1; Nonconformance to Unicode bidirectionality algorithm due to paragraph separator

> From: Itai Berli <[hidden email]>
> Date: Tue, 4 Jul 2017 18:57:33 +0300
>
> How about letting the user decide what's best for them? Would it be possible to add an option to Emacs that a user can set, say, in their .emacs file, which will determine whether the bidi implementation will consider the newline character as the paragraph separator or an empty line?

Could be. I'd need to carefully review the code to say for sure. Originally, the regexp which defines where paragraph begins was customizable, but it led to grave bugs, so I removed that. Maybe a more restricted facility could avoid such pitfalls.
## bug#27526: 25.1; Nonconformance to Unicode bidirectionality algorithm due to paragraph separator

If you can do it, that'll be fantastic.

And while you're perusing the code, perhaps you can see if it is also possible to allow the user to decide whether they want the bidi control characters to be visible or not.
## bug#27526: 25.1; Nonconformance to Unicode bidirectionality algorithm due to paragraph separator

> From: Itai Berli <[hidden email]>
> Date: Tue, 4 Jul 2017 19:37:04 +0300
>
> And while you're perusing the code, perhaps you can see if it is also possible to allow the user to decide whether they want the bidi control characters to be visible or not

You can do that already: just customize glyphless-char-display-control to be 'zero-width' for the 'format-control' class, and these characters will become invisible. Didn't I mention that up-thread?
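
For those who prefer an init-file setting over the Customize interface, the suggestion above might look roughly like the following. This is a sketch, assuming the option keeps its usual alist form of `(CLASS . METHOD)` pairs; it copies the current value so that the other classes keep their existing display methods.

```elisp
;; In the init file: make format-control characters (RLM, LRM, LRE, PDF, ...)
;; display with zero width, leaving the other glyphless classes unchanged.
(let ((control (copy-alist glyphless-char-display-control)))
  (setf (alist-get 'format-control control) 'zero-width)
  ;; `customize-set-variable' runs the option's setter, so the change
  ;; takes effect without restarting Emacs.
  (customize-set-variable 'glyphless-char-display-control control))
```

A plain `setq` is deliberately avoided here, because the option relies on its setter function to refresh the display.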
## bug#27526: 25.1; Nonconformance to Unicode bidirectionality algorithm due to paragraph separator

You did, but it would be much nicer for a noob like me to be able to simply type in my .emacs file something like: (bidi.markers.visible false), or maybe even

(bidi.markers.ALM null)

(bidi.markers.RLM ⊲)

(bidi.markers.LRM ⊳)

...

Isn't the Bidi feature important and complicated enough to merit its own tailored set of customizable parameters?
## bug#27526: 25.1; Nonconformance to Unicode bidirectionality algorithm due to paragraph separator

> From: Itai Berli <[hidden email]>
> Date: Tue, 4 Jul 2017 20:01:25 +0300
>
> You did, but it would be much nicer for a noob like me to be able to simply type in my .emacs file something like: (bidi.markers.visible false), or maybe even
>
> (bidi.markers.ALM null)
>
> (bidi.markers.RLM ⊲)
>
> (bidi.markers.LRM ⊳)

Sorry, I don't see why the exact way to customize this is so important. glyphless-char-display-control is a user-level customizable variable, not some obscure feature that requires Lisp programming to tailor it to your needs.

> Isn't the Bidi feature important and complicated enough to merit its own tailored set of customizable parameters?

It does have its private customizations, but this one isn't one of them, and I don't see why it should be. The characters of the Cf general category are quite a few, and Emacs handles them all the same, because they all have the same nature.
## bug#27526: 25.1; Nonconformance to Unicode bidirectionality algorithm due to paragraph separator

Is there any progress with allowing the user to customize the end-of-paragraph mark to be the OS paragraph separator character?
## bug#27526: 25.1; Nonconformance to Unicode bidirectionality algorithm due to paragraph separator

> From: Itai Berli <[hidden email]>
> Date: Wed, 12 Jul 2017 18:10:19 +0300
>
> Is there any progress with allowing the user to customize the end-of-paragraph mark to be the OS paragraph separator character?

No, I didn't yet have time to work on that. (And I think you were talking about the newline character, not the paragraph separator character.)
## bug#27526: 25.1; Nonconformance to Unicode bidirectionality algorithm due to paragraph separator

> I think you were talking about the newline character, not the paragraph separator character.

On UNIX and contemporary macOS it's U+000A (LF), on Windows it's the sequence U+000D U+000A (CR LF).
## bug#27526: 25.1; Nonconformance to Unicode bidirectionality algorithm due to paragraph separator

> From: Itai Berli <[hidden email]>
> Date: Wed, 12 Jul 2017 18:52:10 +0300
>
> > I think you were talking about the newline character, not the paragraph separator character.
>
> On UNIX and contemporary macOS it's U+000A (LF), on Windows it's the sequence U+000D U+000A (CR LF).

Not in the Emacs buffer: there we have only the newline (a.k.a. "LF") characters.
## bug#27526: 25.1; Nonconformance to Unicode bidirectionality algorithm due to paragraph separator

> Date: Tue, 04 Jul 2017 19:18:39 +0300
> From: Eli Zaretskii <[hidden email]>
> Cc: [hidden email]
>
> > From: Itai Berli <[hidden email]>
> > Date: Tue, 4 Jul 2017 18:57:33 +0300
> >
> > How about letting the user decide what's best for them? Would it be possible to add an option to Emacs that a user can set, say, in their .emacs file, which will determine whether the bidi implementation will consider the newline character as the paragraph separator or an empty line?
>
> Could be. I'd need to carefully review the code to say for sure. Originally, the regexp which defines where paragraph begins was customizable, but it led to grave bugs, so I removed that. Maybe a more restricted facility could avoid such pitfalls.

It turned out to be relatively easy, so I implemented this on the master branch of the Emacs Git repository. There are two new variables that you should set to "^" to get the behavior you wanted.

I hope you can build the master branch and see whether the new facilities solve your case.

Thanks.
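
The message above does not name the two variables. In later Emacs releases the Emacs Lisp manual documents them as `bidi-paragraph-start-re` and `bidi-paragraph-separate-re`; assuming those are the variables meant here, and that they are per-buffer, a per-mode setup might look like the sketch below. The function name `my-bidi-paragraph-per-line` is hypothetical, and `LaTeX-mode-hook` is the AUCTeX hook, chosen only because the original report concerns LaTeX.

```elisp
;; Assumes an Emacs build that includes the two variables (master at the
;; time of this message, or a later release).
(defun my-bidi-paragraph-per-line ()
  "Treat every line as its own paragraph for bidi reordering."
  ;; Both variables should be changed together.
  (setq-local bidi-paragraph-start-re "^")
  (setq-local bidi-paragraph-separate-re "^"))

(add-hook 'LaTeX-mode-hook #'my-bidi-paragraph-per-line)
```

With this in place, a `\noindent` on its own line no longer determines the direction of the Hebrew line that follows it.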
## bug#27526: 25.1; Nonconformance to Unicode bidirectionality algorithm due to paragraph separator

Thanks. I've never built Emacs from source. I think it might be easier for me to wait till this patch makes it to the official release.
## bug#27526: 25.1; Nonconformance to Unicode bidirectionality algorithm due to paragraph separator

On Jul 18, 2017, at 0:16, Itai Berli <[hidden email]> wrote:

> Thanks. I've never built Emacs from source. I think it might be easier for me to wait till this patch makes it to the official release.

It's actually pretty easy to build from source. The easiest way (that depends on your platform) is to install the version that corresponds to HEAD. The slightly less trivial way is to get the code from Savannah:

https://savannah.gnu.org/projects/emacs

Clone the code and follow the instructions.

I got used to doing that a few weeks ago and it is fascinating to see all the new features pouring in every day.

Jean-Christophe