(top)  (memo)  (rss)
某掲示板で,浮動小数点演算時に consing してしまって効率が悪いという話が でていたので,手元の環境でちょっと試してみました.SBCL 0.9.9.31, LispWorks for Windows 4.4.6 では consing なしの演算が可能でした. CLISP では現在 consing なしの浮動小数点演算には成功していません.
(defun float-test ()
(declare (optimize (speed 3) (space 0) (safety 0) (debug 0)
#+lispworks (float 0)))
(let ((a #(1.0d0 2.0d0 3.0d0 4.0d0 5.0d0))
(p1 #(1 2 3))
(p2 #(1 2 3 1 2 3)))
(declare (type simple-vector a))
(declare (type simple-vector p1 p2))
#+lispworks (time (lw-float-sum a p1))
#+lispworks (time (lw-float-sum a p2))
#+sbcl (time (sbcl-float-sum a p1))
#+sbcl (time (sbcl-float-sum a p2))
#+clisp (time (clisp-float-sum a p1))
#+clisp (time (clisp-float-sum a p2))
))
(defun sbcl-float-sum (array positions)
(declare (type simple-vector array positions))
(loop for n of-type fixnum across positions
summing (aref array n) of-type double-float))
(defun lw-float-sum (array positions)
(declare (optimize (speed 3) (space 0) (safety 0) (debug 0)
#+lispworks (float 0)))
(declare (type simple-vector array positions))
;; humm ?
(loop for n of-type fixnum across positions
summing (the double-float (svref array (the fixnum n)))
of-type double-float))
(defun clisp-float-sum (array positions)
(declare (optimize (speed 3) (space 0) (safety 0) (debug 0)))
(declare (type simple-vector array positions))
(loop for n of-type fixnum across positions
summing (the double-float (svref array (the fixnum n)))
of-type double-float))
この中で LispWorks の LOOP マクロの summing 節に対する型指定 of-type double-float で結果がかわってきたのでちょっと調べてみました.
まず SBCL での LOOP マクロの展開してみます.
CL-USER> (pprint (macroexpand-1 '(loop for n of-type fixnum across positions
summing (aref array n) of-type double-float)))
(BLOCK NIL
(LET ((N 0)
(#:LOOP-ACROSS-VECTOR-4944 POSITIONS)
(#:LOOP-ACROSS-INDEX-4945 0)
(#:LOOP-ACROSS-LIMIT-4946 0))
(DECLARE (TYPE FIXNUM #:LOOP-ACROSS-LIMIT-4946)
(TYPE FIXNUM #:LOOP-ACROSS-INDEX-4945)
(TYPE (OR (MEMBER NIL) VECTOR) #:LOOP-ACROSS-VECTOR-4944)
(TYPE FIXNUM N))
(LET ((#:LOOP-SUM-4947 0.0d0))
(DECLARE (TYPE DOUBLE-FLOAT #:LOOP-SUM-4947))
(SB-LOOP::LOOP-BODY
((SETQ #:LOOP-ACROSS-LIMIT-4946
(LENGTH #:LOOP-ACROSS-VECTOR-4944)))
((WHEN (>= #:LOOP-ACROSS-INDEX-4945 #:LOOP-ACROSS-LIMIT-4946)
(GO SB-LOOP::END-LOOP))
(SB-LOOP::LOOP-REALLY-DESETQ N
(AREF #:LOOP-ACROSS-VECTOR-4944
#:LOOP-ACROSS-INDEX-4945))
NIL
(SB-LOOP::LOOP-REALLY-DESETQ #:LOOP-ACROSS-INDEX-4945
(1+ #:LOOP-ACROSS-INDEX-4945)))
((SETQ #:LOOP-SUM-4947 (+ #:LOOP-SUM-4947 (AREF ARRAY N))))
((WHEN (>= #:LOOP-ACROSS-INDEX-4945 #:LOOP-ACROSS-LIMIT-4946)
(GO SB-LOOP::END-LOOP))
(SB-LOOP::LOOP-REALLY-DESETQ N
(AREF #:LOOP-ACROSS-VECTOR-4944
#:LOOP-ACROSS-INDEX-4945))
NIL
(SB-LOOP::LOOP-REALLY-DESETQ #:LOOP-ACROSS-INDEX-4945
(1+ #:LOOP-ACROSS-INDEX-4945)))
((RETURN-FROM NIL #:LOOP-SUM-4947))))))
; No value
CL-USER>
つづいて LispWorks の LOOP マクロの展開結果 ()
CL-USER 7 > (pprint (macroexpand-1 '(loop for n of-type fixnum across positions
summing (svref array (the fixnum n)) of-type double-float)))
(BLOCK NIL
(MACROLET ((LOOP-FINISH () '(GO #:|end-loop-48049|)))
(LET ((#:|across-expr-48052| POSITIONS) (#:|across-length-48053| 0) (#:|across-counter-48054| 0) (N 0))
(DECLARE (TYPE VECTOR #:|across-expr-48052|))
(DECLARE (TYPE FIXNUM #:|across-length-48053|))
(DECLARE (TYPE FIXNUM #:|across-counter-48054|))
(DECLARE (TYPE FIXNUM N))
(LET ((#:|accumulator-48050| (LOAD-TIME-VALUE '0.0)))
(DECLARE (TYPE DOUBLE-FLOAT #:|accumulator-48050|))
(TAGBODY (PROGN
(LET ((#:|temp-48055| (SYSTEM::ACROSS-CHECK-IS-VECTOR #:|across-expr-48052|)))
(SETQ #:|across-length-48053| #:|temp-48055|))
(WHEN (OR (= 0 #:|across-length-48053|)) (GO #:|end-loop-48049|))
(SETQ N (AREF #:|across-expr-48052| 0)))
#:|begin-loop-48048| (INCF #:|accumulator-48050| (SVREF ARRAY (THE FIXNUM N)))
(PROGN
(LET ((#:|temp-48056| (THE FIXNUM (1+ #:|across-counter-48054|))))
(SETQ #:|across-counter-48054| #:|temp-48056|))
(WHEN (OR (= #:|across-counter-48054| #:|across-length-48053|)) (GO #:|end-loop-48049|))
(SETQ N (AREF #:|across-expr-48052| #:|across-counter-48054|)))
(GO #:|begin-loop-48048|)
#:|end-loop-48049| (RETURN-FROM NIL #:|accumulator-48050|))))))
CL-USER 8 > (pprint (macroexpand-1 '(loop for n of-type fixnum across positions
summing (the double-float (svref array (the fixnum n))) of-type double-float)))
(BLOCK NIL
(MACROLET ((LOOP-FINISH () '(GO #:|end-loop-48059|)))
(LET ((#:|across-expr-48062| POSITIONS) (#:|across-length-48063| 0) (#:|across-counter-48064| 0) (N 0))
(DECLARE (TYPE VECTOR #:|across-expr-48062|))
(DECLARE (TYPE FIXNUM #:|across-length-48063|))
(DECLARE (TYPE FIXNUM #:|across-counter-48064|))
(DECLARE (TYPE FIXNUM N))
(LET ((#:|accumulator-48060| (LOAD-TIME-VALUE '0.0)))
(DECLARE (TYPE DOUBLE-FLOAT #:|accumulator-48060|))
(TAGBODY (PROGN
(LET ((#:|temp-48065| (SYSTEM::ACROSS-CHECK-IS-VECTOR #:|across-expr-48062|)))
(SETQ #:|across-length-48063| #:|temp-48065|))
(WHEN (OR (= 0 #:|across-length-48063|)) (GO #:|end-loop-48059|))
(SETQ N (AREF #:|across-expr-48062| 0)))
#:|begin-loop-48058| (INCF #:|accumulator-48060| (THE DOUBLE-FLOAT (SVREF ARRAY (THE FIXNUM N))))
(PROGN
(LET ((#:|temp-48066| (THE FIXNUM (1+ #:|across-counter-48064|))))
(SETQ #:|across-counter-48064| #:|temp-48066|))
(WHEN (OR (= #:|across-counter-48064| #:|across-length-48063|)) (GO #:|end-loop-48059|))
(SETQ N (AREF #:|across-expr-48062| #:|across-counter-48064|)))
(GO #:|begin-loop-48058|)
#:|end-loop-48059| (RETURN-FROM NIL #:|accumulator-48060|))))))
CL-USER 9 >
前者と後者の違いは accumulator を足す時の式が (incf accumulator (svref array n)) に展開されるか (incf [accumulator] (the double-float (svrefarray n))) に展開されるか,の二点のみです.したがって,ここがメモリ消費 量の違いとなっていると考えられます.summing の of-type では計算結果は double-float だと指定されていますが,計算途中の式の型指定が不十分なため LispWorks が box しているのでしょう.確認のため,incf をする関数を定義して disassemble してみます.
CL-USER 33 > (defun f (x y) (declare (optimize (speed 3) (safety 0) (debug 0) (float 0))
(double-float x))
(incf x y))
F
CL-USER 34 > (disassemble *)
206969B2:
0: 55 push ebp
1: 89E5 move ebp, esp
3: 83EC0C sub esp, C
6: C70424450C0000 move [esp], C45
13: 50 push eax
14: 8B7D08 move edi, [ebp+8]
17: DD4704 fldl [edi+4]
20: DD5DF8 fstpl [ebp-8]
23: 83EC0C sub esp, C
26: C70424450C0000 move [esp], C45
33: DD45F8 fldl [ebp-8]
36: DD5C2404 fstpl [esp+4]
40: B501 moveb ch, 1
42: FF1528231120 call [20112328] ; SYSTEM::RAW-FAST-BOX-DOUBLE
48: 8B4DF0 move ecx, [ebp-10]
51: 0AC8 orb cl, al
53: 753E jne L2
55: 89C7 move edi, eax
57: 037DF0 add edi, [ebp-10]
60: 7037 jo L2
L1: 62: DD4704 fldl [edi+4]
65: FF75F4 push [ebp-C]
68: FF75F0 push [ebp-10]
71: 8B75F8 move esi, [ebp-8]
74: 8975F0 move [ebp-10], esi
77: 8B75FC move esi, [ebp-4]
80: 8975F4 move [ebp-C], esi
83: 8B7500 move esi, [ebp]
86: 8975F8 move [ebp-8], esi
89: 83ED08 sub ebp, 8
92: 8B750C move esi, [ebp+C]
95: 897504 move [ebp+4], esi
98: DD5D0C fstpl [ebp+C]
101: C74508450C0000 move [ebp+8], C45
108: B501 moveb ch, 1
110: C9 leave
111: FF2528231120 jmp [20112328] ; SYSTEM::RAW-FAST-BOX-DOUBLE
L2: 117: 50 push eax
118: 8B45F0 move eax, [ebp-10]
121: E81AC13100 call 209B2B4A ; #<function 209B2B4A>
126: 89C7 move edi, eax
128: EBBC jmp L1
NIL
CL-USER 35 > (defun f (x y)
(declare (optimize (speed 3) (safety 0) (debug 0) (float 0)) (double-float x))
(incf x (the double-float y)))
F
CL-USER 36 > (disassemble *)
2068401A:
0: 55 push ebp
1: 89E5 move ebp, esp
3: 83EC14 sub esp, 14
6: C7042445140000 move [esp], 1445
13: 8B7D08 move edi, [ebp+8]
16: DD4704 fldl [edi+4]
19: DD4004 fldl [eax+4]
22: DEC1 faddp st(1), st
24: FF75F0 push [ebp-10]
27: FF75EC push [ebp-14]
30: 8B75F4 move esi, [ebp-C]
33: 8975EC move [ebp-14], esi
36: 8B75F8 move esi, [ebp-8]
39: 8975F0 move [ebp-10], esi
42: 8B75FC move esi, [ebp-4]
45: 8975F4 move [ebp-C], esi
48: 8B7500 move esi, [ebp]
51: 8975F8 move [ebp-8], esi
54: 83ED08 sub ebp, 8
57: 8B750C move esi, [ebp+C]
60: 897504 move [ebp+4], esi
63: DD5D0C fstpl [ebp+C]
66: C74508450C0000 move [ebp+8], C45
73: B501 moveb ch, 1
75: C9 leave
76: FF2528231120 jmp [20112328] ; SYSTEM::RAW-FAST-BOX-DOUBLE
NIL
CL-USER 37 >
このように型指定のありなしで box 処理の回数がかわっています.
ちなみに SBCL でも summing の後の the double-float 指定は入れたほうが処 理速度が速いです.コンパイラが文句いっているように + の第二引数の型が確 定していない場合には GENERIC-+ が呼ばれるため,メソッド呼び出しが入りま す.LispWorks と同様 the double-float 指定をしておけば,コンパイル時に インライン展開されてるため,関数呼び出しのコスト分お特です.ただし GENERIC-+ が consing しないため,今回の目標である consing なしの演算と いう今回の目標からははずれますが….
CL-USER> (defun f (x y)
(declare (optimize (speed 3) (space 0) (safety 0) (debug 0)) (type double-float x))
(the double-float (+ x y)))
; in: LAMBDA NIL
; (+ X Y)
;
; note: unable to
; optimize
; due to type uncertainty:
; The second argument is a NUMBER, not a RATIONAL.
;
; note: unable to
; optimize
; due to type uncertainty:
; The second argument is a NUMBER, not a SINGLE-FLOAT.
;
; note: unable to
; optimize
; due to type uncertainty:
; The second argument is a NUMBER, not a (COMPLEX SINGLE-FLOAT).
;
; note: unable to
; optimize
; due to type uncertainty:
; The second argument is a NUMBER, not a (COMPLEX DOUBLE-FLOAT).
;
; note: forced to do GENERIC-+ (cost 10)
; unable to do inline float arithmetic (cost 2) because:
; The second argument is a NUMBER, not a DOUBLE-FLOAT.
;
; compilation unit finished
; printed 5 notes
STYLE-WARNING: redefining F in DEFUN
F
CL-USER> (disassemble *)
; 0B23FB8E: E8BD05DCF5 CALL #x1000150 ; GENERIC-+
; no-arg-parsing entry point
; 93: 8BE3 MOV ESP, EBX
; 95: 8B4DF8 MOV ECX, [EBP-8]
; 98: 8B45FC MOV EAX, [EBP-4]
; 9B: 83C102 ADD ECX, 2
; 9E: 8BE5 MOV ESP, EBP
; A0: 8BE8 MOV EBP, EAX
; A2: FFE1 JMP ECX
; A4: 90 NOP
; A5: 90 NOP
; A6: 90 NOP
; A7: 90 NOP
; A8: CC0A BREAK 10 ; error trap
; AA: 02 BYTE #X02
; AB: 18 BYTE #X18 ; INVALID-ARG-COUNT-ERROR
; AC: CD BYTE #XCD ; EBX
;
NIL
CL-USER> (defun f (x y)
(declare (optimize (speed 3) (space 0) (safety 0) (debug 0)) (type double-float x))
(the double-float (+ x (the double-float y))))
; in: LAMBDA NIL
; (SB-INT:NAMED-LAMBDA F
; (X Y)
; (DECLARE
; (OPTIMIZE (SPEED 3) (SPACE 0) (SAFETY 0) (DEBUG 0))
; (TYPE DOUBLE-FLOAT X))
; (BLOCK F (THE DOUBLE-FLOAT (+ X (THE DOUBLE-FLOAT Y)))))
; ==>
; #'(SB-INT:NAMED-LAMBDA F
; (X Y)
; (DECLARE
; (OPTIMIZE (SPEED 3) (SPACE 0) (SAFETY 0) (DEBUG 0))
; (TYPE DOUBLE-FLOAT X))
; (BLOCK F
; (THE DOUBLE-FLOAT (+ X (THE DOUBLE-FLOAT Y)))))
;
; note: doing float to pointer coercion (cost 13) to "<return value>"
;
; compilation unit finished
; printed 1 note
STYLE-WARNING: redefining F in DEFUN
F
CL-USER> (disassemble *)
; 0B2A6753: DDD9 FSTPD FR1 ; no-arg-parsing entry point
; 55: DD4701 FLDD [EDI+1]
; 58: D9C9 FXCH FR1
; 5A: D8C1 FADDD FR1
; 5C: 9B WAIT
; 5D: 64 BYTE #X64
; 5E: C6054800000004 MOV BYTE PTR [#x48], 4
; 65: BA10000000 MOV EDX, 16
; 6A: 64 BYTE #X64
; 6B: 031520000000 ADD EDX, [#x20]
; 71: 64 BYTE #X64
; 72: 3B1524000000 CMP EDX, [#x24]
; 78: 7607 JBE L0
; 7A: E861BCDBFC CALL #x80623E0 ; alloc_overflow_edx
; 7F: EB0A JMP L1
; 81: L0: 64 BYTE #X64
; 82: 891520000000 MOV [#x20], EDX
; 88: 83EA10 SUB EDX, 16
; 8B: L1: C70216030000 MOV DWORD PTR [EDX], 790
; 91: 8D5207 LEA EDX, [EDX+7]
; 94: DD5201 FSTD [EDX+1]
; 97: 64 BYTE #X64
; 98: C6054800000000 MOV BYTE PTR [#x48], 0
; 9F: 64 BYTE #X64
; A0: 803D4C00000000 CMP BYTE PTR [#x4C], 0
; A7: 7402 JEQ L2
; A9: CC09 BREAK 9 ; pending interrupt trap
; AB: L2: 8B4DF8 MOV ECX, [EBP-8]
; AE: 8B45FC MOV EAX, [EBP-4]
; B1: 83C102 ADD ECX, 2
; B4: 8BE5 MOV ESP, EBP
; B6: 8BE8 MOV EBP, EAX
; B8: FFE1 JMP ECX
; BA: 90 NOP
; BB: 90 NOP
; BC: 90 NOP
; BD: 90 NOP
; BE: 90 NOP
; BF: 90 NOP
; C0: CC0A BREAK 10 ; error trap
; C2: 02 BYTE #X02
; C3: 18 BYTE #X18 ; INVALID-ARG-COUNT-ERROR
; C4: CD BYTE #XCD ; EBX
;
NIL
CL-USER>
浮動小数点演算から話がズレまくってますが,結論としては LOOP マクロで of-type だけでなく計算過程で使われる式にもちゃんと型指定を入れたほうが 効率の良いコードが生成される事が確認できました.
posted: 2006/02/19 21:30 | permanent link to this entry | Tags: LISP