LISPUSER

LISPMEMOLisp is a programmable programming language. -- John Foderaro

(top)  (memo)  (rss)

浮動小数点演算 と Consing

某掲示板で,浮動小数点演算時に consing してしまって効率が悪いという話が でていたので,手元の環境でちょっと試してみました.SBCL 0.9.9.31, LispWorks for Windows 4.4.6 では consing なしの演算が可能でした. CLISP では現在 consing なしの浮動小数点演算には成功していません.

(defun float-test ()
  (declare (optimize (speed 3) (space 0) (safety 0) (debug 0)
            #+lispworks (float 0)))
  (let ((a  #(1.0d0 2.0d0 3.0d0 4.0d0 5.0d0))
        (p1 #(1 2 3))
        (p2 #(1 2 3 1 2 3)))
    (declare (type simple-vector a))
    (declare (type simple-vector p1 p2))
    #+lispworks (time (lw-float-sum a p1))
    #+lispworks (time (lw-float-sum a p2))
    #+sbcl (time (sbcl-float-sum a p1))
    #+sbcl (time (sbcl-float-sum a p2))
    #+clisp (time (clisp-float-sum a p1))
    #+clisp (time (clisp-float-sum a p2))
    ))

(defun sbcl-float-sum (array positions)
  (declare (type simple-vector array positions))
  (loop for n of-type fixnum across positions
       summing (aref array n) of-type double-float))

(defun lw-float-sum (array positions)
  (declare (optimize (speed 3) (space 0) (safety 0) (debug 0)
                      #+lispworks (float 0)))
  (declare (type simple-vector array positions))
  ;; humm ?
  (loop for n of-type fixnum across positions
        summing (the double-float (svref array (the fixnum n)))
                of-type double-float))

(defun clisp-float-sum (array positions)
  (declare (optimize (speed 3) (space 0) (safety 0) (debug 0)))
  (declare (type simple-vector array positions))
  (loop for n of-type fixnum across positions
        summing (the double-float (svref array (the fixnum n)))
                of-type double-float))

この中で LispWorks の LOOP マクロの summing 節に対する型指定 of-type double-float で結果がかわってきたのでちょっと調べてみました.

まず SBCL での LOOP マクロの展開してみます.

CL-USER> (pprint (macroexpand-1 '(loop for n of-type fixnum across positions
                                       summing (aref array n) of-type double-float)))

(BLOCK NIL
  (LET ((N 0)
        (#:LOOP-ACROSS-VECTOR-4944 POSITIONS)
        (#:LOOP-ACROSS-INDEX-4945 0)
        (#:LOOP-ACROSS-LIMIT-4946 0))
    (DECLARE (TYPE FIXNUM #:LOOP-ACROSS-LIMIT-4946)
             (TYPE FIXNUM #:LOOP-ACROSS-INDEX-4945)
             (TYPE (OR (MEMBER NIL) VECTOR) #:LOOP-ACROSS-VECTOR-4944)
             (TYPE FIXNUM N))
    (LET ((#:LOOP-SUM-4947 0.0d0))
      (DECLARE (TYPE DOUBLE-FLOAT #:LOOP-SUM-4947))
      (SB-LOOP::LOOP-BODY
       ((SETQ #:LOOP-ACROSS-LIMIT-4946
                (LENGTH #:LOOP-ACROSS-VECTOR-4944)))
       ((WHEN (>= #:LOOP-ACROSS-INDEX-4945 #:LOOP-ACROSS-LIMIT-4946)
          (GO SB-LOOP::END-LOOP))
        (SB-LOOP::LOOP-REALLY-DESETQ N
                                     (AREF #:LOOP-ACROSS-VECTOR-4944
                                           #:LOOP-ACROSS-INDEX-4945))
        NIL
        (SB-LOOP::LOOP-REALLY-DESETQ #:LOOP-ACROSS-INDEX-4945
                                     (1+ #:LOOP-ACROSS-INDEX-4945)))
       ((SETQ #:LOOP-SUM-4947 (+ #:LOOP-SUM-4947 (AREF ARRAY N))))
       ((WHEN (>= #:LOOP-ACROSS-INDEX-4945 #:LOOP-ACROSS-LIMIT-4946)
          (GO SB-LOOP::END-LOOP))
        (SB-LOOP::LOOP-REALLY-DESETQ N
                                     (AREF #:LOOP-ACROSS-VECTOR-4944
                                           #:LOOP-ACROSS-INDEX-4945))
        NIL
        (SB-LOOP::LOOP-REALLY-DESETQ #:LOOP-ACROSS-INDEX-4945
                                     (1+ #:LOOP-ACROSS-INDEX-4945)))
       ((RETURN-FROM NIL #:LOOP-SUM-4947))))))
; No value
CL-USER>

つづいて LispWorks の LOOP マクロの展開結果 ()

CL-USER 7 > (pprint (macroexpand-1 '(loop for n of-type fixnum across positions
                                          summing (svref array (the fixnum n)) of-type double-float)))

(BLOCK NIL
  (MACROLET ((LOOP-FINISH () '(GO #:|end-loop-48049|)))
    (LET ((#:|across-expr-48052| POSITIONS) (#:|across-length-48053| 0) (#:|across-counter-48054| 0) (N 0))
      (DECLARE (TYPE VECTOR #:|across-expr-48052|))
      (DECLARE (TYPE FIXNUM #:|across-length-48053|))
      (DECLARE (TYPE FIXNUM #:|across-counter-48054|))
      (DECLARE (TYPE FIXNUM N))
      (LET ((#:|accumulator-48050| (LOAD-TIME-VALUE '0.0)))
        (DECLARE (TYPE DOUBLE-FLOAT #:|accumulator-48050|))
        (TAGBODY (PROGN
                   (LET ((#:|temp-48055| (SYSTEM::ACROSS-CHECK-IS-VECTOR #:|across-expr-48052|)))
                     (SETQ #:|across-length-48053| #:|temp-48055|))
                   (WHEN (OR (= 0 #:|across-length-48053|)) (GO #:|end-loop-48049|))
                   (SETQ N (AREF #:|across-expr-48052| 0)))
         #:|begin-loop-48048| (INCF #:|accumulator-48050| (SVREF ARRAY (THE FIXNUM N)))
                 (PROGN
                   (LET ((#:|temp-48056| (THE FIXNUM (1+ #:|across-counter-48054|))))
                     (SETQ #:|across-counter-48054| #:|temp-48056|))
                   (WHEN (OR (= #:|across-counter-48054| #:|across-length-48053|)) (GO #:|end-loop-48049|))
                   (SETQ N (AREF #:|across-expr-48052| #:|across-counter-48054|)))
                 (GO #:|begin-loop-48048|)
         #:|end-loop-48049| (RETURN-FROM NIL #:|accumulator-48050|))))))

CL-USER 8 > (pprint (macroexpand-1 '(loop for n of-type fixnum across positions
                                          summing (the double-float (svref array (the fixnum n))) of-type double-float)))

(BLOCK NIL
  (MACROLET ((LOOP-FINISH () '(GO #:|end-loop-48059|)))
    (LET ((#:|across-expr-48062| POSITIONS) (#:|across-length-48063| 0) (#:|across-counter-48064| 0) (N 0))
      (DECLARE (TYPE VECTOR #:|across-expr-48062|))
      (DECLARE (TYPE FIXNUM #:|across-length-48063|))
      (DECLARE (TYPE FIXNUM #:|across-counter-48064|))
      (DECLARE (TYPE FIXNUM N))
      (LET ((#:|accumulator-48060| (LOAD-TIME-VALUE '0.0)))
        (DECLARE (TYPE DOUBLE-FLOAT #:|accumulator-48060|))
        (TAGBODY (PROGN
                   (LET ((#:|temp-48065| (SYSTEM::ACROSS-CHECK-IS-VECTOR #:|across-expr-48062|)))
                     (SETQ #:|across-length-48063| #:|temp-48065|))
                   (WHEN (OR (= 0 #:|across-length-48063|)) (GO #:|end-loop-48059|))
                   (SETQ N (AREF #:|across-expr-48062| 0)))
         #:|begin-loop-48058| (INCF #:|accumulator-48060| (THE DOUBLE-FLOAT (SVREF ARRAY (THE FIXNUM N))))
                 (PROGN
                   (LET ((#:|temp-48066| (THE FIXNUM (1+ #:|across-counter-48064|))))
                     (SETQ #:|across-counter-48064| #:|temp-48066|))
                   (WHEN (OR (= #:|across-counter-48064| #:|across-length-48063|)) (GO #:|end-loop-48059|))
                   (SETQ N (AREF #:|across-expr-48062| #:|across-counter-48064|)))
                 (GO #:|begin-loop-48058|)
         #:|end-loop-48059| (RETURN-FROM NIL #:|accumulator-48060|))))))

CL-USER 9 > 

前者と後者の違いは accumulator を足す時の式が (incf accumulator (svref array n)) に展開されるか (incf [accumulator] (the double-float (svrefarray n))) に展開されるか,の二点のみです.したがって,ここがメモリ消費 量の違いとなっていると考えられます.summing の of-type では計算結果は double-float だと指定されていますが,計算途中の式の型指定が不十分なため LispWorks が box しているのでしょう.確認のため,incf をする関数を定義して disassemble してみます.

CL-USER 33 > (defun f (x y) (declare (optimize (speed 3) (safety 0) (debug 0) (float 0))
               (double-float x))
               (incf x y))
F

CL-USER 34 > (disassemble *)
206969B2:
       0:      55               push  ebp
       1:      89E5             move  ebp, esp
       3:      83EC0C           sub   esp, C
       6:      C70424450C0000   move  [esp], C45
      13:      50               push  eax
      14:      8B7D08           move  edi, [ebp+8]
      17:      DD4704           fldl  [edi+4]
      20:      DD5DF8           fstpl [ebp-8]
      23:      83EC0C           sub   esp, C
      26:      C70424450C0000   move  [esp], C45
      33:      DD45F8           fldl  [ebp-8]
      36:      DD5C2404         fstpl [esp+4]
      40:      B501             moveb ch, 1
      42:      FF1528231120     call  [20112328]       ; SYSTEM::RAW-FAST-BOX-DOUBLE
      48:      8B4DF0           move  ecx, [ebp-10]
      51:      0AC8             orb   cl, al
      53:      753E             jne   L2
      55:      89C7             move  edi, eax
      57:      037DF0           add   edi, [ebp-10]
      60:      7037             jo    L2
L1:   62:      DD4704           fldl  [edi+4]
      65:      FF75F4           push  [ebp-C]
      68:      FF75F0           push  [ebp-10]
      71:      8B75F8           move  esi, [ebp-8]
      74:      8975F0           move  [ebp-10], esi
      77:      8B75FC           move  esi, [ebp-4]
      80:      8975F4           move  [ebp-C], esi
      83:      8B7500           move  esi, [ebp]
      86:      8975F8           move  [ebp-8], esi
      89:      83ED08           sub   ebp, 8
      92:      8B750C           move  esi, [ebp+C]
      95:      897504           move  [ebp+4], esi
      98:      DD5D0C           fstpl [ebp+C]
     101:      C74508450C0000   move  [ebp+8], C45
     108:      B501             moveb ch, 1
     110:      C9               leave 
     111:      FF2528231120     jmp   [20112328]       ; SYSTEM::RAW-FAST-BOX-DOUBLE
L2:  117:      50               push  eax
     118:      8B45F0           move  eax, [ebp-10]
     121:      E81AC13100       call  209B2B4A         ; #<function 209B2B4A>
     126:      89C7             move  edi, eax
     128:      EBBC             jmp   L1
NIL


CL-USER 35 > (defun f (x y)
               (declare (optimize (speed 3) (safety 0) (debug 0) (float 0)) (double-float x))
               (incf x (the double-float y)))
F

CL-USER 36 > (disassemble *)
2068401A:
       0:      55               push  ebp
       1:      89E5             move  ebp, esp
       3:      83EC14           sub   esp, 14
       6:      C7042445140000   move  [esp], 1445
      13:      8B7D08           move  edi, [ebp+8]
      16:      DD4704           fldl  [edi+4]
      19:      DD4004           fldl  [eax+4]
      22:      DEC1             faddp st(1), st
      24:      FF75F0           push  [ebp-10]
      27:      FF75EC           push  [ebp-14]
      30:      8B75F4           move  esi, [ebp-C]
      33:      8975EC           move  [ebp-14], esi
      36:      8B75F8           move  esi, [ebp-8]
      39:      8975F0           move  [ebp-10], esi
      42:      8B75FC           move  esi, [ebp-4]
      45:      8975F4           move  [ebp-C], esi
      48:      8B7500           move  esi, [ebp]
      51:      8975F8           move  [ebp-8], esi
      54:      83ED08           sub   ebp, 8
      57:      8B750C           move  esi, [ebp+C]
      60:      897504           move  [ebp+4], esi
      63:      DD5D0C           fstpl [ebp+C]
      66:      C74508450C0000   move  [ebp+8], C45
      73:      B501             moveb ch, 1
      75:      C9               leave 
      76:      FF2528231120     jmp   [20112328]       ; SYSTEM::RAW-FAST-BOX-DOUBLE
NIL

CL-USER 37 > 

このように型指定のありなしで box 処理の回数がかわっています.

ちなみに SBCL でも summing の後の the double-float 指定は入れたほうが処 理速度が速いです.コンパイラが文句いっているように + の第二引数の型が確 定していない場合には GENERIC-+ が呼ばれるため,メソッド呼び出しが入りま す.LispWorks と同様 the double-float 指定をしておけば,コンパイル時に インライン展開されてるため,関数呼び出しのコスト分お特です.ただし GENERIC-+ が consing しないため,今回の目標である consing なしの演算と いう今回の目標からははずれますが….

CL-USER> (defun f (x y)
           (declare (optimize (speed 3) (space 0) (safety 0) (debug 0)) (type double-float x))
           (the double-float (+ x y)))
; in: LAMBDA NIL
;     (+ X Y)
; 
; note: unable to
;   optimize
; due to type uncertainty:
;   The second argument is a NUMBER, not a RATIONAL.
; 
; note: unable to
;   optimize
; due to type uncertainty:
;   The second argument is a NUMBER, not a SINGLE-FLOAT.
; 
; note: unable to
;   optimize
; due to type uncertainty:
;   The second argument is a NUMBER, not a (COMPLEX SINGLE-FLOAT).
; 
; note: unable to
;   optimize
; due to type uncertainty:
;   The second argument is a NUMBER, not a (COMPLEX DOUBLE-FLOAT).
; 
; note: forced to do GENERIC-+ (cost 10)
;       unable to do inline float arithmetic (cost 2) because:
;       The second argument is a NUMBER, not a DOUBLE-FLOAT.
; 
; compilation unit finished
;   printed 5 notes
STYLE-WARNING: redefining F in DEFUN
F
CL-USER> (disassemble *)
; 0B23FB8E:       E8BD05DCF5       CALL #x1000150             ; GENERIC-+
                                                              ; no-arg-parsing entry point
;       93:       8BE3             MOV ESP, EBX
;       95:       8B4DF8           MOV ECX, [EBP-8]
;       98:       8B45FC           MOV EAX, [EBP-4]
;       9B:       83C102           ADD ECX, 2
;       9E:       8BE5             MOV ESP, EBP
;       A0:       8BE8             MOV EBP, EAX
;       A2:       FFE1             JMP ECX
;       A4:       90               NOP
;       A5:       90               NOP
;       A6:       90               NOP
;       A7:       90               NOP
;       A8:       CC0A             BREAK 10                   ; error trap
;       AA:       02               BYTE #X02
;       AB:       18               BYTE #X18                  ; INVALID-ARG-COUNT-ERROR
;       AC:       CD               BYTE #XCD                  ; EBX
; 
NIL
CL-USER> (defun f (x y)
           (declare (optimize (speed 3) (space 0) (safety 0) (debug 0)) (type double-float x))
           (the double-float (+ x (the double-float y))))
; in: LAMBDA NIL
;     (SB-INT:NAMED-LAMBDA F
;                        (X Y)
;                        (DECLARE
;                         (OPTIMIZE (SPEED 3) (SPACE 0) (SAFETY 0) (DEBUG 0))
;                         (TYPE DOUBLE-FLOAT X))
;                        (BLOCK F (THE DOUBLE-FLOAT (+ X (THE DOUBLE-FLOAT Y)))))
; ==>
;   #'(SB-INT:NAMED-LAMBDA F
;                          (X Y)
;                          (DECLARE
;                           (OPTIMIZE (SPEED 3) (SPACE 0) (SAFETY 0) (DEBUG 0))
;                           (TYPE DOUBLE-FLOAT X))
;                          (BLOCK F
;                            (THE DOUBLE-FLOAT (+ X (THE DOUBLE-FLOAT Y)))))
; 
; note: doing float to pointer coercion (cost 13) to "<return value>"
; 
; compilation unit finished
;   printed 1 note
STYLE-WARNING: redefining F in DEFUN
F
CL-USER> (disassemble *)
; 0B2A6753:       DDD9             FSTPD FR1                  ; no-arg-parsing entry point
;       55:       DD4701           FLDD [EDI+1]
;       58:       D9C9             FXCH FR1
;       5A:       D8C1             FADDD FR1
;       5C:       9B               WAIT
;       5D:       64               BYTE #X64
;       5E:       C6054800000004   MOV BYTE PTR [#x48], 4
;       65:       BA10000000       MOV EDX, 16
;       6A:       64               BYTE #X64
;       6B:       031520000000     ADD EDX, [#x20]
;       71:       64               BYTE #X64
;       72:       3B1524000000     CMP EDX, [#x24]
;       78:       7607             JBE L0
;       7A:       E861BCDBFC       CALL #x80623E0             ; alloc_overflow_edx
;       7F:       EB0A             JMP L1
;       81: L0:   64               BYTE #X64
;       82:       891520000000     MOV [#x20], EDX
;       88:       83EA10           SUB EDX, 16
;       8B: L1:   C70216030000     MOV DWORD PTR [EDX], 790
;       91:       8D5207           LEA EDX, [EDX+7]
;       94:       DD5201           FSTD [EDX+1]
;       97:       64               BYTE #X64
;       98:       C6054800000000   MOV BYTE PTR [#x48], 0
;       9F:       64               BYTE #X64
;       A0:       803D4C00000000   CMP BYTE PTR [#x4C], 0
;       A7:       7402             JEQ L2
;       A9:       CC09             BREAK 9                    ; pending interrupt trap
;       AB: L2:   8B4DF8           MOV ECX, [EBP-8]
;       AE:       8B45FC           MOV EAX, [EBP-4]
;       B1:       83C102           ADD ECX, 2
;       B4:       8BE5             MOV ESP, EBP
;       B6:       8BE8             MOV EBP, EAX
;       B8:       FFE1             JMP ECX
;       BA:       90               NOP
;       BB:       90               NOP
;       BC:       90               NOP
;       BD:       90               NOP
;       BE:       90               NOP
;       BF:       90               NOP
;       C0:       CC0A             BREAK 10                   ; error trap
;       C2:       02               BYTE #X02
;       C3:       18               BYTE #X18                  ; INVALID-ARG-COUNT-ERROR
;       C4:       CD               BYTE #XCD                  ; EBX
; 
NIL
CL-USER> 

浮動小数点演算から話がズレまくってますが,結論としては LOOP マクロで of-type だけでなく計算過程で使われる式にもちゃんと型指定を入れたほうが 効率の良いコードが生成される事が確認できました.

posted: 2006/02/19 21:30 | permanent link to this entry | Tags: LISP

(top)  (memo)  (rss)