A Small Performance Improvement in math/big
29 Nov 2018
Alexander Döring
Smallpdf
29 Nov 2018
Alexander Döring
Smallpdf
Package big implements arbitrary-precision arithmetic (big numbers).
The following numeric types are supported:
Int signed integers Rat rational numbers Float floating-point numbers
type Int func NewInt(x int64) *Int func (z *Int) Add(x, y *Int) *Int func (z *Int) Mul(x, y *Int) *Int ...
type Int func NewInt(x int64) *Int func (z *Int) Add(x, y *Int) *Int func (z *Int) Mul(x, y *Int) *Int ...
Squaring is a special case of Mul
With the basic block
type nat []Word
with
type Word uint
we can define
type Int struct { neg bool abs nat }
func (z *Int) Mul(x, y *Int) *Int { if x == y { z.abs = z.abs.sqr(x.abs) z.neg = false return z } z.abs = z.abs.mul(x.abs, y.abs) z.neg = len(z.abs) > 0 && x.neg != y.neg return z }
func (z nat) mul(x, y nat) nat { ... if n < karatsubaThreshold { z = basicMul(x, y) return z.norm() } ... karatsuba(z, x0, y0) ... return z.norm() }
Schoolbook multiplication
54 * 31 --------- 5*3 + 5*1 + 4*3 + 4*1 --------- = 5*3 * 100 + 5*1 * 10 + 4*3 * 10 + 4*1
using 4 multiplications (n^2)
10Calculate product of differences
54 * 31 --------- 5*3 + (5-4)*(1-3) + 5*3 + 4*1 + 4*1
using 3 multiplications (n^log(3))
11This is possible.
12func (z nat) mul(x, y nat) nat { ... if n < karatsubaThreshold { z = basicMul(x, y) return z.norm() } ... karatsuba(z, x0, y0) ... return z.norm() }
func karatsuba(z, x, y nat)
Avoid allocations:
Example:
Split x
into 2 parts:
n := len(x) n2 := n >> 1 x1, x0 := x[n2:], x[0:n2]
func (z *Int) Mul(x, y *Int) *Int { if x == y { z.abs = z.abs.sqr(x.abs) z.neg = false return z } z.abs = z.abs.mul(x.abs, y.abs) z.neg = len(z.abs) > 0 && x.neg != y.neg return z }
func (z nat) sqr(x nat) nat { ... if n < basicSqrThreshold { basicMul(z, x, x) return z.norm() } if n < karatsubaSqrThreshold { basicSqr(z, x) return z.norm() } ... karatsuba(z, x0, x0) ... return z.norm() }
func (z nat) sqr(x nat) nat { ... if n < basicSqrThreshold { basicMul(z, x, x) return z.norm() } if n < karatsubaSqrThreshold { basicSqr(z, x) return z.norm() } ... karatsubaSqr(z, x0) ... return z.norm() }
Calculate product of differences
54 * 31 --------- 5*3 + (5-4)*(1-3) + 5*3 + 4*1 + 4*1
using 3 multiplications (n^log(3))
18Specialize karatsuba
53 * 53 -------- 5*5 + (5-3)*(3-5) + 5*5 + 3*3 + 3*3
func (z *Int) Mul(x, y *Int) *Int
now uses
for x!=y
:
basicMul
karatsuba
for x==y
:
basicMul
basicSqr
karatsubaSqr
for mul:
var karatsubaThreshold = 40
for sqr:
var basicSqrThreshold = 20 var karatsubaSqrThreshold = 400
for mul:
var karatsubaThreshold = 40
for sqr:
var basicSqrThreshold = 20 var karatsubaSqrThreshold = 260
Run in math/big
:
go test -v -calibrate --run TestCalibrate
Output:
... Calibrating threshold between basicSqr(x) and karatsubaSqr(x) Looking for a timing difference for x between 200 - 500 words by 10 step words = 200 deltaT = -980ns ( -7%) is karatsubaSqr(x) better: false words = 210 deltaT = -773ns ( -5%) is karatsubaSqr(x) better: false words = 220 deltaT = -695ns ( -4%) is karatsubaSqr(x) better: false words = 230 deltaT = -570ns ( -3%) is karatsubaSqr(x) better: false words = 240 deltaT = -458ns ( -2%) is karatsubaSqr(x) better: false words = 250 deltaT = -63ns ( 0%) is karatsubaSqr(x) better: false words = 260 deltaT = 118ns ( 0%) is karatsubaSqr(x) better: true threshold found words = 270 deltaT = 377ns ( 1%) is karatsubaSqr(x) better: true words = 280 deltaT = 765ns ( 3%) is karatsubaSqr(x) better: true words = 290 deltaT = 673ns ( 2%) is karatsubaSqr(x) better: true words = 300 deltaT = 502ns ( 1%) is karatsubaSqr(x) better: true words = 310 deltaT = 629ns ( 2%) is karatsubaSqr(x) better: true words = 320 deltaT = 1.011µs ( 3%) is karatsubaSqr(x) better: true ...
Using:
benchstat old.txt new.txt
We get:
name old time/op new time/op delta NatSqr/8-4 131ns ± 1% 131ns ± 1% ~ (p=0.870 n=46+49) NatSqr/10-4 175ns ± 1% 175ns ± 1% +0.38% (p=0.000 n=49+47) NatSqr/20-4 426ns ± 1% 429ns ± 1% +0.84% (p=0.000 n=46+48) NatSqr/30-4 702ns ± 2% 699ns ± 1% -0.38% (p=0.011 n=46+44) NatSqr/50-4 1.44µs ± 2% 1.43µs ± 1% -0.54% (p=0.010 n=48+48) NatSqr/80-4 2.85µs ± 1% 2.87µs ± 1% +0.68% (p=0.000 n=47+47) NatSqr/100-4 4.06µs ± 1% 4.07µs ± 1% +0.29% (p=0.000 n=46+45) NatSqr/200-4 13.4µs ± 1% 13.5µs ± 1% +0.73% (p=0.000 n=48+48) NatSqr/300-4 28.5µs ± 1% 28.2µs ± 1% -1.22% (p=0.000 n=46+48) NatSqr/500-4 81.9µs ± 1% 67.0µs ± 1% -18.25% (p=0.000 n=48+48) NatSqr/800-4 161µs ± 1% 140µs ± 1% -13.29% (p=0.000 n=47+48) NatSqr/1000-4 245µs ± 1% 207µs ± 1% -15.17% (p=0.000 n=49+49)
2 of 17 TryBots failed: Failed on nacl-amd64p32 Failed on windows-amd64-2016
Nacl:
FAIL: TestSqr (0.00s) panic: runtime error: index out of range
Windows:
unexpected fault address 0xc000800000 fatal error: fault [signal 0xc0000005 code=0x0 addr=0xc000800000 pc=0x5ea20e]
Both when calling basicSqr
from karatsubaSqr
.
Looks like an issue with assembly.
Using purego
build tag:
Problem is reproducable locally.
Requirements for basicSqr
(Comment):
len(x) > 0, len(z) >= 2*len(x)
Requirements for basicSqr
(Comment):
len(x) > 0, len(z) >= 2*len(x)
Actual requirements for basicSqr
:
len(x) > 0, len(z) == 2*len(x)
29 Nov 2018
Alexander Döring
Smallpdf