compute ISBN-10, char-to-int?

Emacs - Help mailing list
Another quality release from
uXu and THE SECRET EMPIRE! This time it sure
was a long while since the previous one, if
anyone remembers that one... hm... anyway,
perhaps _the next one_ will be ISBN-13?
Or wasn't that NASA's bad luck number?
Typing this in 2017, when the US astronauts,
since abandoning their space shuttle project,
have been reduced to mere *passengers* onboard
Russian Soyuz crafts? You know what I'm saying?

Anyway, questions:

that `char-to-int' looks a bit strange...

?

;;; -*- lexical-binding: t -*-

;; This file: http://user.it.uu.se/~embe8573/emacs-init/isbn-new.el
;;            https://dataswamp.org/~incal/emacs-init/isbn-new.el

;; Old ISBN stuff, partially still in use: (?)
;;   https://dataswamp.org/~incal/emacs-init/isbn.el

;; NOTE: This isn't a replacement of the old
;;       stuff URLd above, this is an all new
;;       little project to compute ISBN
;;       checksums with Elisp!

;; here is how the ISBN-10 stuff works:
;;   https://dataswamp.org/~incal/books/isbn.txt

(require 'cl-lib)

(defun char-to-int (c)
  (string-to-number (char-to-string c) ))
;; test:
;; (char-to-int ?0)

(defun checksum-isbn-10 (isbn)
  "Return the ISBN-10 check digit of ISBN, or \"X\" if it is 10."
  (let* ((isbn-list      (string-to-list isbn))
         (isbn-numbers   (remove ?- isbn-list))         ; drop hyphens
         (isbn-numbers-9 (cl-subseq isbn-numbers 0 9))  ; first nine digits only
         (isbn-ints      (mapcar #'char-to-int isbn-numbers-9))
         (sum            0))
    ;; weights 10, 9, ..., 2 for the first nine digits
    (cl-loop for e in isbn-ints
             for i downfrom 10
             do (cl-incf sum (* e i)))
    (let ((checksum (mod (- 11 (mod sum 11)) 11)))
      (if (= 10 checksum) "X" checksum))))


;; 9 tests from [1]:
;;
;; (checksum-isbn-10 "91-7054-940-0")  ; 0 (#o0, #x0, ?\C-@)
;; (checksum-isbn-10 "0-201-53992-6")  ; 6 (#o6, #x6, ?\C-f)
;; (checksum-isbn-10 "91-85668-01-X")  ; "X"
;; (checksum-isbn-10 "91-7089-710-7")  ; 7 (#o7, #x7, ?\C-g)
;; (checksum-isbn-10 "9177988515")     ; 5 (#o5, #x5, ?\C-e)
;; (checksum-isbn-10 "0312168144")     ; 4 (#o4, #x4, ?\C-d)
;; (checksum-isbn-10 "1-4012-0622-0")  ; 0 (#o0, #x0, ?\C-@)
;; (checksum-isbn-10 "91-510-6483-9")  ; 9 (#o11, #x9, ?\C-i)
;; (checksum-isbn-10 "91-88930-23-8")  ; 8 (#o10, #x8, ?\C-h)
;;
;;
;; [1] https://dataswamp.org/~incal/books/books.bib
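
`checksum-isbn-10' computes the check digit from the first nine digits. A related check, sketched here under the same assumption as the original (hyphens are the only separators), validates a complete ISBN-10: the weighted sum of all ten digits, weights 10 down to 1, with a final X counting as 10, must be divisible by 11. The name `my-isbn-10-valid-p' is hypothetical and not part of isbn-new.el.

```elisp
;;; -*- lexical-binding: t -*-
(require 'cl-lib)

;; Hypothetical helper, not part of isbn-new.el.
(defun my-isbn-10-valid-p (isbn)
  "Return non-nil if ISBN, a string with optional hyphens, is a valid ISBN-10.
Valid means: the weighted sum of all ten digits, with weights
10, 9, ..., 1 and a final X counting as 10, is divisible by 11."
  (let ((chars (remove ?- (string-to-list isbn))))
    (and (= (length chars) 10)
         ;; the first nine characters must be ASCII digits
         (cl-every (lambda (c) (<= ?0 c ?9)) (cl-subseq chars 0 9))
         ;; the tenth may also be X (check value 10)
         (let ((check-char (nth 9 chars)))
           (or (<= ?0 check-char ?9) (memq check-char '(?X ?x))))
         (= 0 (mod (cl-loop for c in chars
                            for w downfrom 10
                            sum (* w (if (memq c '(?X ?x)) 10 (- c ?0))))
                   11)))))

(my-isbn-10-valid-p "91-85668-01-X")  ; t
(my-isbn-10-valid-p "4294967296")     ; nil
```

Summing over all ten positions avoids the mixed integer/"X" return value of the check-digit function: the result is simply t or nil.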

--
underground experts united
http://user.it.uu.se/~embe8573
https://dataswamp.org/~incal


Re: compute ISBN-10, char-to-int?

Tomas Zerolo
On Thu, Sep 05, 2019 at 03:32:55AM +0200, Emanuel Berg via Users list for the GNU Emacs text editor wrote:
> Another quality release from
> uXu and THE SECRET EMPIRE [...]

> Anyway, questions:
>
> that `char-to-int' looks a bit strange...

[...]

> (defun char-to-int (c)
>   (string-to-number (char-to-string c) ))

It looks a bit roundabout, sure. But how would you do it without
making any assumptions about the underlying encoding?

If you have no scruples, and since chars in Emacs Lisp are simply
integers, and since encoding is almost-UTF-8 which is basically
ASCII, you could do

(defun char-to-int (c)
  (- c ?0))

but...

  - it's ugly
  - I don't know if that addresses your question

It would be faster, yes. But that will start to count once we
have ISBN-4294967296 or something. By then, you'll have a faster
computer, too ;-)
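
The two `char-to-int' definitions agree on ?0 through ?9 but not on other input: `string-to-number' returns 0 for a string that doesn't start with a number, so the original maps ?X to 0, while (- c ?0) maps it to 40 (?X is 88). A sketch of a stricter variant; the function names are hypothetical, chosen so all three can be loaded side by side.

```elisp
;;; -*- lexical-binding: t -*-
(require 'cl-lib)

;; Emanuel's original definition, renamed.
(defun char-to-int-via-string (c)
  (string-to-number (char-to-string c)))

;; Tomas's definition, renamed.
(defun char-to-int-via-offset (c)
  (- c ?0))

;; Hypothetical stricter variant: same arithmetic as the offset
;; version, but it signals an error for anything outside ?0 ... ?9
;; instead of quietly returning a wrong number.
(defun char-to-int-strict (c)
  (if (<= ?0 c ?9)
      (- c ?0)
    (error "Not a decimal digit: %c" c)))

(char-to-int-via-string ?X)   ; 0
(char-to-int-via-offset ?X)   ; 40
(char-to-int-strict ?7)       ; 7
```

In `checksum-isbn-10' only the first nine characters reach `char-to-int', so an X check digit never hits either version; the difference matters once other input can slip through.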

Cheers
-- t

Re: compute ISBN-10, char-to-int?

Emacs - Help mailing list
tomas wrote:

> It looks a bit roundabout, sure. But how
> would you do it without making any
> assumptions about the underlying encoding?
>
> If you have no scruples, and since chars in
> Emacs Lisp are simply integers, and since
> encoding is almost-UTF-8 which is basically
> ASCII, you could do
>
> (defun char-to-int (c)
>   (- c ?0))
>
> but...
>
>   - it's ugly

Why, I think it's great! New version:
  <https://dataswamp.org/~incal/emacs-init/isbn-new.el>

>   - I don't know if that addresses your
>     question

Me neither... wait, what was my
question exactly?

Yeah, why isn't there a "char-to-int" in
vanilla Emacs already? Should be a pretty
standard thing, right?
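
The closest bundled thing I know of is `cl-digit-char-p' from cl-lib (added around Emacs 25.1, if memory serves), which returns a digit character's value in an optional RADIX (default 10), or nil for a non-digit. A short sketch, assuming that function is available:

```elisp
;;; -*- lexical-binding: t -*-
(require 'cl-lib)

(cl-digit-char-p ?7)       ; 7
(cl-digit-char-p ?x)       ; nil
(cl-digit-char-p ?f 16)    ; 15

;; e.g. the digit list that checksum-isbn-10 builds, without a helper:
(mapcar #'cl-digit-char-p (remove ?- (string-to-list "0-201-53992-6")))
;; (0 2 0 1 5 3 9 9 2 6)
```

Returning nil rather than a number for a non-digit makes bad input visible, at the cost of having to handle nil.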

> It would be faster, yes. But that will start
> to count once we have ISBN-4294967296 or
> something. By then, you'll have a faster
> computer, too ;-)

... I'm not following? :)

That isn't a valid ISBN-10, the checksum
(check digit) is 3:

  (mod (- 11 (mod (+ (* 4 10)
                     (* 2  9)
                     (* 9  8)
                     (* 4  7)
                     (* 9  6)
                     (* 6  5)
                     (* 7  4)
                     (* 2  3)
                     (* 9  2))
                  11)) 11) ; 3

  (checksum-isbn-10 "4294967296") ; 3
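
The sum above is one instance of the rule that `checksum-isbn-10' implements. Writing d_1, ..., d_9 for the first nine digits (notation introduced here), the weights run from 10 down to 2:

```latex
S = \sum_{i=1}^{9} (11 - i)\, d_i, \qquad
c = \bigl(11 - (S \bmod 11)\bigr) \bmod 11, \qquad
c = 10 \text{ is written } \mathrm{X}.

\text{For } 4294967296:\quad S = 294,\quad 294 \bmod 11 = 8,\quad c = (11 - 8) \bmod 11 = 3.
```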



Re: compute ISBN-10, char-to-int?

Eli Zaretskii
In reply to this post by Tomas Zerolo
> Date: Thu, 5 Sep 2019 08:44:37 +0200
> From: <[hidden email]>
>
> [...] since chars in Emacs Lisp are simply integers, and since
> encoding is almost-UTF-8 which is basically ASCII [...]

Actually, encoding is not relevant here.  We are not talking about how
characters are stored in buffers and strings, we are talking about the
characters themselves.  A character is represented by an integer whose
value is that character's Unicode codepoint.
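
Eli's point, illustrated: a character literal is read as a plain integer, its code point, and an integer that is a valid code point is accepted wherever a character is expected. The comments assume the snippet is read as UTF-8, hence the coding cookie.

```elisp
;;; -*- lexical-binding: t; coding: utf-8 -*-
(require 'cl-lib)

;; A character literal is read as a plain integer: its code point.
(integerp ?0)         ; t
?0                    ; 48, i.e. #x30, U+0030 DIGIT ZERO
?λ                    ; 955, i.e. #x3BB, U+03BB GREEK SMALL LETTER LAMDA
;; ...and integers that are valid code points work as characters.
(characterp 955)      ; t
(string 48 955)       ; "0λ"
```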

Re: compute ISBN-10, char-to-int?

Tomas Zerolo
On Thu, Sep 05, 2019 at 09:08:55PM +0300, Eli Zaretskii wrote:

> > Date: Thu, 5 Sep 2019 08:44:37 +0200
> > From: <[hidden email]>
> >
> > [...] since chars in Emacs Lisp are simply integers, and since
> > encoding is almost-UTF-8 which is basically ASCII [...]
>
> Actually, encoding is not relevant here.  We are not talking about how
> characters are stored in buffers and strings, we are talking about the
> characters themselves.  A character is represented by an integer whose
> value is that character's Unicode codepoint.
Well, if it's always Unicode code point, then we can make the above
"official".

Thanks, Eli
-- t

Re: compute ISBN-10, char-to-int?

Emacs - Help mailing list
tomas wrote:

> Well, if it's always Unicode code point, then
> we can make the above "official".

Am I normal or orthogonal? I still don't
understand this isn't already in Emacs.
Perhaps it's in some prominent ELPA library that one
should have by now but doesn't, and everybody
else has it, like in school but with
bubble gum instead?



Re: compute ISBN-10, char-to-int?

Eli Zaretskii
> Date: Thu, 05 Sep 2019 21:16:06 +0200
> From: Emanuel Berg via Users list for the GNU Emacs text editor <[hidden email]>
>
> I still don't understand this isn't already in Emacs.

IMO, it isn't important enough to be in Emacs.  That one particular
application needs it is not yet a sign it should be in core.

Re: compute ISBN-10, char-to-int?

Tomas Zerolo
On Fri, Sep 06, 2019 at 09:58:28AM +0300, Eli Zaretskii wrote:
> > Date: Thu, 05 Sep 2019 21:16:06 +0200
> > From: Emanuel Berg via Users list for the GNU Emacs text editor <[hidden email]>
> >
> > I still don't understand this isn't already in Emacs.
>
> IMO, it isn't important enough to be in Emacs.  That one particular
> application needs it is not yet a sign it should be in core.

And -- hey. It's a (short) one-liner, after all!

Cheers
-- t

Re: compute ISBN-10, char-to-int?

Emacs - Help mailing list
tomas wrote:

> And -- hey. It's a (short) one-liner,
> after all!

Conversions between types and representations
are the building blocks of the universe.

You might as well have to spill milk over your
chemistry book to get H2O into it.

Milk is 87% water:

   https://www.dairyherd.com/article/whole-lot-water-goes-milk



Re: compute ISBN-10, char-to-int?

Eli Zaretskii
> Date: Fri, 06 Sep 2019 17:07:26 +0200
> From: Emanuel Berg via Users list for the GNU Emacs text editor <[hidden email]>
>
> Conversions between types and representations
> are the building blocks of the universe.

But what we are discussing here is not a conversion.  Characters in
Emacs _are_ integers.

Re: compute ISBN-10, char-to-int?

Stefan Monnier
In reply to this post by Tomas Zerolo
>> > [...] since chars in Emacs Lisp are simply integers, and since
>> > encoding is almost-UTF-8 which is basically ASCII [...]
>> Actually, encoding is not relevant here.  We are not talking about how
>> characters are stored in buffers and strings, we are talking about the
>> characters themselves.  A character is represented by an integer whose
>> value is that character's Unicode codepoint.
> Well, if it's always Unicode code point, then we can make the above
> "official".

(- c ?0) and (+ n ?0) work not just with Unicode code points, but with
code points in any character set that is sane enough to put the digits
from 0 to 9 consecutively in this order.  That's the case in ASCII,
EBCDIC, and all other charsets I know.
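
Stefan's two expressions as a pair of helpers, a sketch with hypothetical names: the assertions document the intended 0 to 9 range, and the round trip shows the two are inverses on the ten ASCII digits. In the ISBN setting, (+ n ?0) is what turns a numeric check digit back into a character.

```elisp
;;; -*- lexical-binding: t -*-
(require 'cl-lib)

;; Hypothetical helper names, for illustration only.
(defun my-digit-to-char (n)
  "Return the ASCII digit character for the integer N, 0 <= N <= 9."
  (cl-assert (<= 0 n 9))
  (+ n ?0))

(defun my-char-to-digit (c)
  "Return the integer value of the ASCII digit character C."
  (cl-assert (<= ?0 c ?9))
  (- c ?0))

;; The two are inverses on 0 ... 9:
(cl-loop for n from 0 to 9
         always (= n (my-char-to-digit (my-digit-to-char n))))  ; t

;; e.g. a check digit as a one-character string, with 10 as "X":
(let ((check 3))
  (string (if (= check 10) ?X (my-digit-to-char check))))      ; "3"
```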

I think the main threat could come from a charset where digits appear
multiple times (e.g. some kind of iso-2022 system where some of the
sub-charsets also include digits), in which case (+ n ?0) would still
work but (- c ?0) could occasionally fail.  But, I don't know of such
a charset either.
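
Unicode itself contains many digit sets besides ?0 ... ?9 (fullwidth, Arabic-Indic, Devanagari, and so on), which gives a concrete case of the asymmetry described above: (- c ?0) yields nonsense for a fullwidth 7, while (+ n ?0) always lands on an ASCII digit. If such input matters, the sketch below assumes the `decimal-digit-value' property that `get-char-code-property' exposes from Emacs's Unicode tables.

```elisp
;;; -*- lexical-binding: t -*-
(require 'cl-lib)

;; U+FF17 FULLWIDTH DIGIT SEVEN and U+0667 ARABIC-INDIC DIGIT SEVEN
;; are both "7", but neither is ?7 (U+0037).
(- ?7 ?0)              ; 7
(- ?\uFF17 ?0)         ; 65255, not 7

;; The Unicode property tables carry the digit value:
(get-char-code-property ?\uFF17 'decimal-digit-value)   ; 7
(get-char-code-property ?\u0667 'decimal-digit-value)   ; 7

;; The other direction is unaffected: it always yields an ASCII digit.
(= (+ 7 ?0) ?7)        ; t
```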


        Stefan


Re: compute ISBN-10, char-to-int?

Eli Zaretskii
> From: Stefan Monnier <[hidden email]>
> Date: Sat, 07 Sep 2019 13:13:45 -0400
>
> I think the main threat could come from a charset where digits appear
> multiple times (e.g. some kind of iso-2022 system where some of the
> sub-charsets also include digits), in which case (+ n ?0) would still
> work but (- c ?0) could occasionally fail.  But, I don't know of such
> a charset either.

All true, but not really relevant to the issue at hand, because in
Emacs characters are always Unicode codepoints.

Re: compute ISBN-10, char-to-int?

Stefan Monnier
> All true, but not really relevant to the issue at hand, because in
> Emacs characters are always Unicode codepoints.

That hasn't always been the case, tho: they started as basically ASCII,
and then switched to some iso-2022 system (in Emacs-20) before getting
to the current unicode (in Emacs-23).

I must admit that it seems highly unlikely it will change in the
foreseeable future, but it's always a possibility (Unicode has its
shortcomings, so it's possible that we'll be using something else next
century).


        Stefan


Re: compute ISBN-10, char-to-int?

Eli Zaretskii
> From: Stefan Monnier <[hidden email]>
> Date: Sat, 07 Sep 2019 13:45:35 -0400
>
> I must admit that it seems highly unlikely it will change in the
> foreseeable future, but it's always a possibility

More than highly unlikely, I'd say.

Re: compute ISBN-10, char-to-int?

Stefan Monnier
>> I must admit that it seems highly unlikely it will change in the
>> foreseeable future, but it's always a possibility
> More than highly unlikely, I'd say.

Agreed.  But even for those paranoid enough to worry about that,
(+ n ?0) and (- c ?0) should be safe ways to convert between a digit
character and its numerical value.


        Stefan


Re: compute ISBN-10, char-to-int?

Emacs - Help mailing list
In reply to this post by Stefan Monnier
Stefan Monnier wrote:

> (- c ?0) and (+ n ?0) work not just with
> Unicode code points, but with code points in
> any character set that is sane enough to put
> the digits from 0 to 9 consecutively in this
> order. That's the case in ASCII, EBCDIC, and
> all other charsets I know.

EBCDIC = Extended Binary-Coded Decimal Interchange Code

http://www.barrcentral.com/help/3270/appendix_b._ascii_and_ebcdic_tables.htm?sa=X&ved=2ahUKEwiTptvPpcTkAhWqpIsKHVtRCRgQ9QF6BAgLEAI
   
Seems to have something to do with IBM...
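
For reference, EBCDIC also keeps the ten digits consecutive, at byte values #xF0 through #xF9 (ASCII has them at #x30 through #x39), so the same subtraction works with a different origin. A sketch that operates on raw byte values rather than Emacs characters; the names are hypothetical.

```elisp
;;; -*- lexical-binding: t -*-
(require 'cl-lib)

;; Hypothetical: EBCDIC digit bytes run from #xF0 ("0") to #xF9 ("9").
;; The input is a raw byte value, not an Emacs character.
(defconst my-ebcdic-digit-zero #xF0)

(defun my-ebcdic-byte-to-int (b)
  "Return the value of the EBCDIC digit byte B."
  (cl-assert (<= my-ebcdic-digit-zero b (+ my-ebcdic-digit-zero 9)))
  (- b my-ebcdic-digit-zero))

(mapcar #'my-ebcdic-byte-to-int '(#xF4 #xF2 #xF9))   ; (4 2 9)
```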



Re: compute ISBN-10, char-to-int?

Perry Smith-2


> On Sep 9, 2019, at 12:47 PM, Emanuel Berg via Users list for the GNU Emacs text editor <[hidden email]> wrote:
>
> Stefan Monnier wrote:
>
>> (- c ?0) and (+ n ?0) work not just with
>> Unicode code points, but with code points in
>> any character set that is sane enough to put
>> the digits from 0 to 9 consecutively in this
>> order. That's the case in ASCII, EBCDIC, and
>> all other charsets I know.
>
> EBCDIC = Extended Binary-Coded Decimal Interchange Code
>
> http://www.barrcentral.com/help/3270/appendix_b._ascii_and_ebcdic_tables.htm?sa=X&ved=2ahUKEwiTptvPpcTkAhWqpIsKHVtRCRgQ9QF6BAgLEAI
>
> Seems to have something to do with IBM...
IBM has the concept of “code pages”… I think that concept is not unique to IBM.  Examples would be all of the ISO-8859-nn code pages, where there is one (more or less) for each country or region.  (And then you get into terminal-specific code pages and all sorts of fun mental illnesses.)

I wrote a web app talking to an old IBM system and, to my shock… EBCDIC also has code pages, roughly one per region or country.

