bug#46933: Possible bugs in filepos-to-bufferpos / bufferpos-to-filepos


Eli Zaretskii
> From: handa <[hidden email]>
> Cc: [hidden email], [hidden email]
> Date: Sun, 28 Mar 2021 23:29:41 +0900
>
> > In any case, the problem is not with encoding, the problem is with
> > decoding.  Encoding doesn't have this problem because we always encode
> > more than enough (we use the value of BYTE as the count of
> > _characters_ to encode, so for ISO-2022 encoding it is usually much
> > more than needed).  By contrast, when decoding, we decode exactly
> > BYTE+1 bytes, which then hits the problem if that offset is inside a
> > shift sequence.
>
> Then, that implementation should be changed.
>
> Any coding system can have :post-read-conversion and
> :pre-write-conversion functions, it is not guaranteed that encoded byte
> length is greater than the number of characters.

Agreed, but AFAICT, ISO-2022-JP doesn't have any of these attributes,
right?




handa
In article <[hidden email]>, Eli Zaretskii <[hidden email]> writes:

> > Any coding system can have :post-read-conversion and
> > :pre-write-conversion functions, it is not guaranteed that encoded byte
> > length is greater than the number of characters.

> Agreed, but AFAICT, ISO-2022-JP doesn't have any of these attributes,
> right?

Yes, but one can add them by coding-system-put.

By the way, what is the purpose of filepos-to-bufferpos?  Why was that
function introduced?

---
K. Handa
[hidden email]




Eli Zaretskii
> From: handa <[hidden email]>
> Cc: [hidden email], [hidden email]
> Date: Fri, 02 Apr 2021 00:14:02 +0900
>
> In article <[hidden email]>, Eli Zaretskii <[hidden email]> writes:
>
> > > Any coding system can have :post-read-conversion and
> > > :pre-write-conversion functions, it is not guaranteed that encoded byte
> > > length is greater than the number of characters.
>
> > Agreed, but AFAICT, ISO-2022-JP doesn't have any of these attributes,
> > right?
>
> Yes, but one can add them by coding-system-put.

Leaving the :pre-write/:post-read-conversion use case aside, do we
have some means of finding where an ISO-2022 shift-in/out sequence
begins and ends, so that we never try to decode a partial sequence (and
produce "characters" that are not really in the original buffer)?
If not, where can I find a description of every kind of such
sequence, i.e. sequences that modify the decoder state without
producing any characters?
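
To make the question concrete, here is a minimal Python sketch (not Emacs code) of locating escape sequences in a 7-bit ISO-2022 byte stream. It assumes the general ISO/IEC 2022 (ECMA-35) layout: ESC (0x1B), zero or more intermediate bytes in 0x20-0x2F, then one final byte in 0x30-0x7E. The function names and the sample bytes are hypothetical, and the sketch does not handle an offset that falls between the two bytes of a double-byte character.

```python
ESC = 0x1B

def escape_sequence_spans(data: bytes):
    """Return (start, end) index pairs of ESC I... F sequences in data."""
    spans = []
    i, n = 0, len(data)
    while i < n:
        if data[i] != ESC:
            i += 1
            continue
        j = i + 1
        # Skip intermediate bytes (0x20-0x2F).
        while j < n and 0x20 <= data[j] <= 0x2F:
            j += 1
        # Include the final byte (0x30-0x7E) if it is present.
        if j < n and 0x30 <= data[j] <= 0x7E:
            j += 1
        spans.append((i, j))
        i = j
    return spans

def safe_prefix_length(data: bytes, offset: int) -> int:
    """Back OFFSET up to the start of an escape sequence it falls inside."""
    for start, end in escape_sequence_spans(data):
        if start < offset < end:
            return start
    return offset

# Hypothetical sample: ASCII, a JIS X 0208 run, then ASCII again.
sample = b"abc\x1b$BF|K\\8l\x1b(Bdef"
```

With this sample, an offset of 4 or 5 lands inside the first designation sequence and is moved back to 3, so a decoder given `sample[:3]` never sees a partial sequence.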

(UTF-8 has the same issue, btw, but in that case we have a simpler
solution.)
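
The message does not say what the simpler UTF-8 solution is. One plausible reading, offered here only as an assumption, is that UTF-8 continuation bytes always have the bit pattern 10xxxxxx, so an offset that falls inside a multibyte character can be moved back to the character's first byte without any decoder state. A minimal Python sketch with a hypothetical function name:

```python
def utf8_char_start(data: bytes, offset: int) -> int:
    """Move OFFSET back to the first byte of the UTF-8 character it is in."""
    # Continuation bytes look like 10xxxxxx, i.e. (byte & 0xC0) == 0x80.
    while 0 < offset < len(data) and (data[offset] & 0xC0) == 0x80:
        offset -= 1
    return offset

# "a" is 1 byte, "é" is 2 bytes, "日" is 3 bytes in UTF-8.
encoded = "a\u00e9\u65e5".encode("utf-8")
prefix = encoded[:utf8_char_start(encoded, 5)]
```

Here offset 5 is the last byte of the three-byte character, so it is moved back to 3 and `prefix` decodes cleanly.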