A Small Performance Improvement in math/big

29 Nov 2018

Alexander Döring

Smallpdf

Topics Touched

internals of math/big
some math
performance

Issue 23221

Package math/big

Package big implements arbitrary-precision arithmetic (big numbers).

The following numeric types are supported:

Int     signed integers
Rat     rational numbers
Float   floating-point numbers

Int

type Int

func NewInt(x int64) *Int

func (z *Int) Add(x, y *Int) *Int

func (z *Int) Mul(x, y *Int) *Int

...

Int

type Int

func NewInt(x int64) *Int

func (z *Int) Add(x, y *Int) *Int

func (z *Int) Mul(x, y *Int) *Int

...

Squaring is a special case of Mul

Looking Inside

With the basic block

type nat []Word

with

type Word uint

we can define

type Int struct {
    neg bool
    abs nat
}

Method Int.Mul

func (z *Int) Mul(x, y *Int) *Int {
    if x == y {
        z.abs = z.abs.sqr(x.abs)
        z.neg = false
        return z
    }
    z.abs = z.abs.mul(x.abs, y.abs)
    z.neg = len(z.abs) > 0 && x.neg != y.neg
    return z
}

Method nat.mul

func (z nat) mul(x, y nat) nat {

    ...

    if n < karatsubaThreshold {
        z = basicMul(x, y)
        return z.norm()
    }

    ...
    karatsuba(z, x0, y0)
    ...

    return z.norm()
}

Function basicMul

Schoolbook multiplication

   54 * 31
   ---------
   5*3
+        5*1
+        4*3
+              4*1
   ---------
=  5*3 * 100 + 5*1 * 10 + 4*3 * 10 + 4*1

using 4 multiplications (n^2)

Idea Karatsuba

Calculate product of differences

   54 * 31
   ---------
   5*3
+        (5-4)*(1-3) + 5*3 + 4*1
+                                      4*1

using 3 multiplications (n^log(3))

Idea Karastuba

Use karatsuba recursively
Until we can use basicMul
Length has to be equal and even

This is possible.

Looking Back: Method nat.mul

func (z nat) mul(x, y nat) nat {

    ...

    if n < karatsubaThreshold {
        z = basicMul(x, y)
        return z.norm()
    }

    ...
    karatsuba(z, x0, y0)
    ...

    return z.norm()
}

Function karatsuba

func karatsuba(z, x, y nat)

Avoid allocations:

Use z to store intermediate results
Reslice x, y and z

Example:

Split x into 2 parts:

     n := len(x)
    n2 := n >> 1              
x1, x0 := x[n2:], x[0:n2]

Looking Back: Method int.Mul

func (z *Int) Mul(x, y *Int) *Int {
    if x == y {
        z.abs = z.abs.sqr(x.abs)
        z.neg = false
        return z
    }
    z.abs = z.abs.mul(x.abs, y.abs)
    z.neg = len(z.abs) > 0 && x.neg != y.neg
    return z
}

Method nat.Sqr before go 1.11

func (z nat) sqr(x nat) nat {

    ...

    if n < basicSqrThreshold {
        basicMul(z, x, x)
        return z.norm()
    }
    if n < karatsubaSqrThreshold {
        basicSqr(z, x)
        return z.norm()
    }

    ...
    karatsuba(z, x0, x0)
    ...

    return z.norm()
}

Method nat.Sqr in go 1.11

func (z nat) sqr(x nat) nat {

    ...

    if n < basicSqrThreshold {
        basicMul(z, x, x)
        return z.norm()
    }
    if n < karatsubaSqrThreshold {
        basicSqr(z, x)
        return z.norm()
    }

    ...
    karatsubaSqr(z, x0)
    ...

    return z.norm()
}

Looking Back: Idea Karatsuba

Calculate product of differences

   54 * 31
   ---------
   5*3
+        (5-4)*(1-3) + 5*3 + 4*1
+                                      4*1

using 3 multiplications (n^log(3))

Idea KaratsubaSqr

Specialize karatsuba

    53 * 53
    --------
    5*5
+        (5-3)*(3-5) + 5*5 + 3*3
+                                   3*3

Recap Int.Mul

func (z *Int) Mul(x, y *Int) *Int

now uses

for x!=y:

basicMul
karatsuba

for x==y:

basicMul
basicSqr
karatsubaSqr

Thresholds Before

for mul:

var karatsubaThreshold = 40

for sqr:

var basicSqrThreshold = 20
var karatsubaSqrThreshold = 400

Threshold Now

for mul:

var karatsubaThreshold = 40

for sqr:

var basicSqrThreshold = 20
var karatsubaSqrThreshold = 260

Threshold Calibration

Run in math/big:

go test -v -calibrate --run TestCalibrate

Output:

...
Calibrating threshold between basicSqr(x) and karatsubaSqr(x)
Looking for a timing difference for x between 200 - 500 words by 10 step
words = 200 deltaT =     -980ns (  -7%) is karatsubaSqr(x) better: false
words = 210 deltaT =     -773ns (  -5%) is karatsubaSqr(x) better: false
words = 220 deltaT =     -695ns (  -4%) is karatsubaSqr(x) better: false
words = 230 deltaT =     -570ns (  -3%) is karatsubaSqr(x) better: false
words = 240 deltaT =     -458ns (  -2%) is karatsubaSqr(x) better: false
words = 250 deltaT =      -63ns (   0%) is karatsubaSqr(x) better: false
words = 260 deltaT =      118ns (   0%) is karatsubaSqr(x) better: true  threshold  found
words = 270 deltaT =      377ns (   1%) is karatsubaSqr(x) better: true
words = 280 deltaT =      765ns (   3%) is karatsubaSqr(x) better: true
words = 290 deltaT =      673ns (   2%) is karatsubaSqr(x) better: true
words = 300 deltaT =      502ns (   1%) is karatsubaSqr(x) better: true
words = 310 deltaT =      629ns (   2%) is karatsubaSqr(x) better: true
words = 320 deltaT =    1.011µs (   3%) is karatsubaSqr(x) better: true
...

Issue 25580

Performance Improvement

Using:

benchstat old.txt new.txt

We get:

name           old time/op  new time/op  delta
NatSqr/8-4      131ns ± 1%   131ns ± 1%     ~     (p=0.870 n=46+49)
NatSqr/10-4     175ns ± 1%   175ns ± 1%   +0.38%  (p=0.000 n=49+47)
NatSqr/20-4     426ns ± 1%   429ns ± 1%   +0.84%  (p=0.000 n=46+48)
NatSqr/30-4     702ns ± 2%   699ns ± 1%   -0.38%  (p=0.011 n=46+44)
NatSqr/50-4    1.44µs ± 2%  1.43µs ± 1%   -0.54%  (p=0.010 n=48+48)
NatSqr/80-4    2.85µs ± 1%  2.87µs ± 1%   +0.68%  (p=0.000 n=47+47)
NatSqr/100-4   4.06µs ± 1%  4.07µs ± 1%   +0.29%  (p=0.000 n=46+45)
NatSqr/200-4   13.4µs ± 1%  13.5µs ± 1%   +0.73%  (p=0.000 n=48+48)
NatSqr/300-4   28.5µs ± 1%  28.2µs ± 1%   -1.22%  (p=0.000 n=46+48)
NatSqr/500-4   81.9µs ± 1%  67.0µs ± 1%  -18.25%  (p=0.000 n=48+48)
NatSqr/800-4    161µs ± 1%   140µs ± 1%  -13.29%  (p=0.000 n=47+48)
NatSqr/1000-4   245µs ± 1%   207µs ± 1%  -15.17%  (p=0.000 n=49+49)

Mail In CL

wait a couple of weeks
fix some things
trybots run

Trybots

2 of 17 TryBots failed:
Failed on nacl-amd64p32
Failed on windows-amd64-2016

Nacl:

FAIL: TestSqr (0.00s)
panic: runtime error: index out of range

Windows:

unexpected fault address 0xc000800000
fatal error: fault
[signal 0xc0000005 code=0x0 addr=0xc000800000 pc=0x5ea20e]

Both when calling basicSqr from karatsubaSqr.

Trybot problem

Looks like an issue with assembly.

Using purego build tag:
Problem is reproducable locally.

Trybot problem

Requirements for basicSqr (Comment):

len(x) > 0, len(z) >= 2*len(x)

Trybot problem

Requirements for basicSqr (Comment):

len(x) > 0, len(z) >= 2*len(x)

Actual requirements for basicSqr:

len(x) > 0, len(z) == 2*len(x)

Related Issues

goissues.org/math/big

Thank you

29 Nov 2018

Alexander Döring

Smallpdf