Toggle Mode

Why Linear Congruence Generators aren't really random & how to break them

Libertarians, look away. This post might cause an existential crisis by reminding you of how difficult randomness is to attain. Determinists, welcome.
image by xkcd, comic 221

The other day, I was messing with Java and its Math.Random() static method. For those not familiar with what this method does, it generates a ‘random’ double in the range $[0,1)$. I messed around with this method a bit before I figured out I could transform this range to any $[a,b]$ I wanted, $a,b \in \mathbb{Z}$, by doing:1

int randInt = (int) ((b-a+1)*Math.Random()+a) 

Pretty neat, right?

So I went on and decided to generate a few random numbers to test out what I had:

int a = 1;
int b = 10;
int numSamples = 20;
ArrayList<Integer> l = new ArrayList<>();
for (int i=0; i<numSamples; i++){
    int rand = (int) ((b - a + 1) * Math.random() + a);
    l.add(rand);
}
for (int num: l){
    System.out.print(num+", ");
}
3, 2, 8, 4, 1, 5, 5, 3, 10, 1, 9, 1, 1, 2, 9, 1, 2, 3, 2, 7, 
Process finished with exit code 0

To my surprise, this ‘random’ number generator had generated $1$ 5 times, making $\frac 1 4$ of all numbers generated a $1$

Perhaps it was just the scarce sample size, I thought, and I decided to better visualize my data:

int a = 1;
int b = 10;
int numSamples = 30;
ArrayList<Integer> l = new ArrayList<>();
HashMap<Integer, Integer> hashMap = new HashMap<>();

// initialize map 
for (int i = a; i <= b; i++) {
    hashMap.put(i, 0);
}

// insert random ints into list
for (int i = 0; i < numSamples; i++) {
    int rand = (int) ((b - a + 1) * Math.random() + a);
    l.add(rand);
}

// move list to map
for (int i : l) {
    hashMap.put(i, hashMap.get(i) + 1);
}

// print
System.out.println("Results:");
for (int key : hashMap.keySet()) {
    System.out.println(key + ": " + hashMap.get(key));
}
Results:
1: 1
2: 1
3: 6
4: 5
5: 3
6: 0
7: 4
8: 6
9: 1
10: 3

Process finished with exit code 0

$3,8$ 6 times?! That doesn’t seem random at all! Repeating with 100 samples:

Results:
1: 4
2: 15
3: 10
4: 15
5: 10
6: 11
7: 6
8: 13
9: 6
10: 10

Process finished with exit code 0

Hmm… Still very out of proportion..

I eventually tried a million samples and the results, although more uniform, still seemed unpredictable. Why was this? I went and asked ChatGPT for answers.

Introduction to LCGs

GPT postulated that the inaccuracy comes from my several floating arithmetic2 operations perhaps resulting in errors— exacerbated by my casting of the double to an int.

However, it also mentioned that these results may just be coincidental, as Math.Random() uses a Linear Congruential Generator (LCG) to generate numbers, which is statistically reliable but not cryptgraphically so.

This concession piqued my interest. Why wasn’t something random seeming not reliable? And why was the distinction between statistical reliability and cryptographical reliablity important? So I set out to find more about these LCGs, coming across their definition.

Given an initial state $X_0$, an LCG generates subsequent 'random' numbers by using: $$X_{n+1}=aX_n +b \pmod m$$ where $a$ is called the multiplier, $b$ the addend, and $m$ the modulus or base.

The state in this definition is derived from some source of entropy, like the system’s time.

So it then takes the state, and using some absurdly large modulus and weird addends and multipliers, determines the next state. For example, for $X_0=23, a=13, b=5, m=48$, we have:

\begin{align*} X_1 &= aX_0 +b \pmod m = 13(23)+5 \pmod{48} = 16\\ X_2 &= aX_1 +b \pmod m = 13(17)+5 \pmod{48} =21\\ X_3 &= aX_2 +b \pmod m = 13(47)+5 \pmod{48} =38\\ \vdots \end{align*}

In fact, this sequence has been proven to only repeat after $m$ terms by the Hull-Dobell Theorem. That is, this will generate a sequence of random numbers for 48 times, after which it returns to $X_0$ and repeats. You can test this out yourself by implementing an lcg() method yourself and running it in a loop. 3

And, as it turns out the sequences generated by such an algorithm are indeed statistically random4. So what about cryptographic reliability? We know the method is deterministic, that is, we can find the next numbers of the sequence given some— but only if we also know the parameters $a, b, m$. But we don’t!

Well, it turns out that a little bit of number theory magic and wizardry can be performed to find $a, b, m$ by manipulating terms of the sequence.

Breaking LCGs

This section will be dedicated to reviewing a method on how to break LCGs, motivated (but heavily adjusted and standardized5) by James Reeds’ initial demonstration in 1977.

To break an LCG, it suffices to be able to predict the next term in a sequence of given consecutive terms by finding $a, b, m$.

Finding the modulus

We begin by getting any 3 numbers generated by a LCG, call them $X_n, X_{n+1}, X_{n+2}$. Then, by the definition of an LCG:

\begin{align} X_n &= aX_{n-1} +b &&\pmod m \tag{I}\\ X_{n+1} &= aX_{n} +b &&\pmod m \tag{II}\\ X_{n+2} &= aX_{n+1} +b &&\pmod m \tag{III} \end{align}

Recall that the definition of modular arithmetic is as follows:

For $X,a,b,c,d,M \in \mathbb{Z}$: $$X=a \pmod b \iff (X-a)\mid b \iff \frac{b}{(X-a)}=k$$ where $a\mid b$ means a divides b, so $\frac{b}{a}=k$ for some $k\in\mathbb{Z}$.
It is also useful to demonstrate some facts whose proofs are trivial and left as an exercise to the reader.

So we may subtract $\text{(I)}$ from $\text{(II)}$ and $\text{(II)}$ from $\text{(III)}$ to obtain the following, eliminating $b$:

\begin{align} X_{n+1}-X_{n} &= aX_n-aX_{n-1} = a(X_n-X_{n-1}) &&\pmod m\tag{IV}\\ X_{n+2}-X_{n+1} &= aX_{n+1}-aX_n = a(X_{n+1}-X_n) &&\pmod m\tag{V} \end{align}

Then, multiply $\text{(IV)}$ by $(X_{n+1}-X_n)$ and $\text{(V)}$ by $(X_n-X_{n-1})$ and subtract the resulting equations to eliminate $a$:

\begin{align} (X_{n+1}-X_{n})(X_{n+1}-X_n) &= \cancel{a(X_n-X_{n-1})(X_{n+1}-X_n)} &&\pmod m\\ -(X_{n+2}-X_{n+1})(X_n-X_{n-1}) &= \cancel{a(X_{n+1}-X_n)(X_n-X_{n-1})} &&\pmod m \\ (X_{n+1}-X_{n})^2-(X_{n+2}-X_{n+1})(X_n-X_{n-1}) &= 0 \pmod m \tag{VI} \end{align}

So many $X$s! But now we have gotten rid of both $a, b$, albeit at the cost of our eyes having to see some of the most disgusting notation on Earth. So let’s try to clean this up;

You may notice that $(X_{n+2}-X_{n+1})$ in (VI) can be rewritten as $(X_{(n+1)+1}-X_{(n+1)})$

Also, $(X_{n}-X_{n-1})=(X_{(n-1)+1}-X_{(n-1)})$

This is great because now we can set:

\begin{align*} S_n &= X_{n+1}-X_n &&\pmod m \\ \text{and so, (VI) becomes:}\\ Q_n &= S_{n+1}^2-S_{n+2} S_n &&\pmod m\\ \text{which we just showed:}\\ Q_n &= 0 &&\pmod m \end{align*}

Okay, now take a deep breath. The hardest part is almost over.

Now, this indicates that $Q_n$ is a multiple of $m$. That is, $Q_n =k_n m$ for some $0<=k<Q_n$. Furthermore, we notice that $Q_n/m=k_n$ implies that $m$ is a divisor of $Q_n$.

Consider $L_n$, a set of all divisors of $Q_n$. Then:

$$L_n \equiv \{1, d_1, d_2, d_3, \ldots \}$$

Where $m$ must be in this list. Now, if $m$ is in the interior of the set, then it isn’t really mathematically striking. However, if $m$ is the largest element of this set, then it is called the Greatest Divisor of $Q_n$, which can be forcefully computed by several algorithms6.

These algorithms can be extended to find the Greatest Common Divisor of two (or more) numbers, denoted by $\gcd(a,b)$.

With this in mind, consider $\gcd(Q_{n},Q_{n+1})$. Observe7:

\begin{align*} \gcd(Q_{n},Q_{n+1})&=\gcd(k_n m,k_{n+1}m)\\ &=m\gcd(k_n, k_{n+1}) \end{align*}

But we wish to seek $\gcd(Q_{n},Q_{n+1})=m$

\begin{align*} \gcd(Q_{n},Q_{n+1})&=m\\ m\gcd(k_n, k_{n+1})&=m\\ \gcd(k_n, k_{n+1})&=1 \end{align*}

To those of you who are familiar with number theory, this just lit the brightest bulb in your head.

To those of you who aren’t, here’s why:

In number theory, it is a well-known fact that the probability that 2 integers are co-prime8 is $\frac{6}{\pi^2}$, $\pi$ sneaking its way into the probability through the Reimann-Zeta function, which can be used to express the probability that $n$ integers are co-prime with $\frac{1}{\zeta(n)}$9

Euler, in one of his many strokes of genius, proved that $\zeta(2)=\frac{\pi^2}{6}$10. Thus the probability of our two integers being co-prime is $\frac{1}{\zeta(2)}=\frac{6}{\pi^2}\approx 60\%$. And this probability increases exponentially for greater numbers of integers given, allowing us to find $m$ with fantastic certainty— if we simply generate $\gcd(Q_1, Q_2, Q_3, \ldots, Q_{n-3})$.

Luckily for us, any good computer is able to do this with high efficiency. Thus, $m$ is found.11

class modulusFinder{
    private int[] outputs; // the lcg outputs, len>=3
    private int[] Sn; // Xn+1-Xn
    private int[] Qn; // Sn+1^2 -Sn+2*Sn
    private int mod;

    public modulusFinder(int[] outputs){
        this.outputs=outputs;
        Sn= new int[outputs.length-1];
        for (int i=0; i<outputs.length-1; i++){
            Sn[i]=outputs[i+1]-outputs[i];
        }
        Qn=new int[Sn.length];
        for (int i=0; i< Sn.length-2; i++){
            Qn[i]=Sn[i+1]*Sn[i+1]-Sn[i+2]*Sn[i];
        }
        mod=gcdArr(Qn); // call gcd of all Qn
    }
    /* finds the gcd of two numbers using a modified euclidean algo */
    public static int gcd(int x, int y){
        if (y == 0) return x;
        return gcd(y, x%y);
    }
    /* calls gcd on all elements of an array */
    public static int gcdArr(int[] arr) {
        int result = arr[0];
        for (int i : arr) {
            result = gcd(result, i);
            if (result == 1) return 1;
        }
        return result;
    }

    /* what we built up to. */
    public void getModulus(){
        System.out.println("The modulus m is: "+mod);
    }
}

Finding the multiplier

From here on out, everything becomes quite trivial. Recall $\text{(V)}$:

\begin{align*} X_{n+2}-X_{n+1} &= a(X_{n+1}-X_n) &&\pmod m \tag{V}\\ a &= (X_{n+2}-X_{n+1})(X_{n+1}-X_n)^{-1} &&\pmod m \end{align*}

where $(X_{n+1}-X_n)^{-1}$ is the modular inverse of the argument12 with respect to $m$. Thus, $a$ is found.

/* returns coefficients of bezouts identity & the gcd. */
public static int[] extendedgcd(int a, int b){
    if (b==0){
        int[] base = {a, 1, 0}; // the base case has gcd(a,0)=a(1)+0(0)
        return base;
    }
    int[] subdiv = extendedgcd(b, a%b); // break up the problem 
    int gcd =subdiv[0]; 
    int subdivX = subdiv[1];
    int subdivY = subdiv[2];
    int divX = subdivY;
    int divY= subdivX-(a/b)*subdivY;
    int[] div = {gcd, divX, divY};
    return div;
}
/* takes the inverse, with precondition of coprimality of parameters */
public static int modInv(int a, int m){
    int[] result = extendedgcd(a,m);
    int inv = result[1];
    return (inv%m +m) % m;
}

Finding the addend

Finally, recall $\text{(II)}$.

\begin{align*} X_{n+1} &= aX_{n} +b &&\pmod m \tag{II}\\ b &= X_{n+1}-aX_{n} &&\pmod m \end{align*}

Plain and simple, we are done!

Final Code 13

The following code breaks any LCG, implementing the math we outlined in this blog, provided you have over 5 outputs of the LCG and there is no processing done to the outputs.

class lcgCracker{
    private int[] outputs; // the lcg outputs, len>=5, preferably more
    private int[] Sn; // Xn+1-Xn
    private int[] Qn; // Sn+1^2 -Sn+2*Sn
    private int mod;
    private int mult;
    private int add;
    private int next;

    public lcgCracker(int[] outputs){
        this.outputs=outputs;
        Sn= new int[outputs.length-1];
        for (int i=0; i<outputs.length-1; i++){
            Sn[i]=outputs[i+1]-outputs[i];
        }
        Qn=new int[Sn.length];
        for (int i=0; i< Sn.length-2; i++){
            Qn[i]=Sn[i+1]*Sn[i+1]-Sn[i+2]*Sn[i];
        }
        mod=gcdArr(Qn); // call gcd of all Qn
        // compute multiplier (a)
        mult = ((outputs[1] - outputs[2]) * modInv((outputs[0] - outputs[1]), mod)) % mod;
        if (mult < 0) mult += mod; // ensure positive

        // compute addend (b)
        add = (outputs[1] - mult * outputs[0]) % mod;
        if (add < 0) add += mod; // ensure positive

        // Predict the next term
        next = (mult * outputs[outputs.length - 1] + add) % mod;
        if (next < 0) next += mod; // ensure positive
    }

    /* returns coefficients of bezouts identity & the gcd. */
    public static int[] extendedgcd(int a, int b){
        if (b==0){
            int[] base = {a, 1, 0}; // the base case has gcd(a,0)=a(1)+0(0)
            return base;
        }
        int[] subdiv = extendedgcd(b, a%b); // break up the problem
        int gcd =subdiv[0];
        int subdivX = subdiv[1];
        int subdivY = subdiv[2];
        int divX = subdivY;
        int divY= subdivX-(a/b)*subdivY; // see footnote 13
        int[] div = {gcd, divX, divY};
        return div;
    }

    /* calls gcd on all elements of an array */
    public static int gcdArr(int[] arr) {
        int result = arr[0];
        for (int i : arr) {
            result = Math.abs(extendedgcd(result, i)[0]);
            if (result == 1) return 1;
        }
        return result;
    }

    /* takes the inverse, with precondition of coprimality of parameters */
    public static int modInv(int a, int m){
        a = (a % m + m) % m; // this code is because of java's modulo operator shenanigans.
        int[] result = extendedgcd(a,m);
        int inv = result[1];
        return (inv%m +m) % m;
    }
    public void runCracker(){
        System.out.println("The multiplier is: "+mult);
        System.out.println("The addend is: "+add);
        System.out.println("The modulus is: "+mod);
        System.out.println("And the next term in the lcg is: "+ next);
    }
}

Java’s LCG: Attempts at cryptographic reliability

I encourage you to try running this code with your own LCG implementation and test it out! See if our math checks out!

However, if you try to call this class on outputs generated from, say Java’s LCG, it won’t return the right parameters at all. Why is this?

After some digging, I found the code responsible for Java’s Math.Random() method:

/**
     * The internal state associated with this pseudorandom number generator.
     * (The specs for the methods in this class describe the ongoing
     * computation of this value.)
     */
    private final AtomicLong seed;

    private static final long multiplier = 0x5DEECE66DL; // 25214903917
    private static final long addend = 0xBL; // 11
    private static final long mask = (1L << 48) - 1; // 1L is 1, shift to left 48 units for 10000000.... then -1 to get 01111... (48 times)

    protected int next(int bits) {
        long oldseed, nextseed;
        AtomicLong seed = this.seed;
        do {
            oldseed = seed.get();
            nextseed = (oldseed * multiplier + addend) & mask;
        } while (!seed.compareAndSet(oldseed, nextseed));
        return (int)(nextseed >>> (48 - bits)); // shift nextseed by 48-bits right ; fill the left with 48-bits zeroes
    }

    @Override
    public long nextLong() {
        // it's okay that the bottom word remains signed.
        return ((long)(next(32)) << 32) + next(32); //gens a 64 bit number, upper half is next(32), lower half is next(32) (second call)
    }

    default double nextDouble() {
        return (nextLong() >>> (Double.SIZE - Double.PRECISION)) * 0x1.0p-53; //extract top 53 bits only, and multiply by 2exp-53 to map to a decimal in 0,1
    }
// the math class then calls this nextdouble method for .Random()

So much processing of next(bits)! The makers of Java’s LCG probably realized that LCGs… well, aren’t the most secure, and decided to add several layers of processing to make it a bit more secure;

Despite all this processing, however, the output is still not random.

Frieze, Kannan, Lagarias outlined an algorithm in a 1984 IEEE presentation which mathematically proved that all LCGs are predictable in polynomial time, given that two fifths of the original sequence’s leading bits were provided. This built on prior solutions to breaking LCGs with all the bits of the outputs, and demonstrated that—contrary to prior belief— manipulation of LCG outputs does NOT make them secure.

Frieze, Kannan, Lagarias, et al. later went on to strengthen their result to any linear congruences manipulating the outputs in 1988.

Conclusion

Well wasn’t that one hell of a trip? We’ve just built a Java class to break LCGs!

What I hope to convey through this post is the immense power that a solid grasp of mathematics— in this case, number theory— can have in Computer Science.

If you’ve taken a look at the timeline on my home page, you’ll know that my passion for math predates my interest in CS. But once I saw how seamlessly math and CS complement14 each other, I found myself yearning to dive deeper into the intersection15 between these two fields. It’s a space that continues to fuel my curiosity, and I’m excited to explore it further.

  1. $[0,1)\to (0,b-a+1)\to (a,b+1) \to [a,b]$ 

  2. See Floating-point arithmetic accuracy problems 

  3.         public static int lcg(int x, int a, int b, int m){
                return (a*x+b)%m;}
    
    
        int state=23;
        int xn=state;
        int count=0;
        while ((count==0) || (xn!=state)){
            xn = lcg(xn,13,5,48);
            count++;
        }
        System.out.println(count);
    

  4. See Knuth’s The Art of Computer Programming Chapter 3.2, in which he proves this fact. 

  5. Reeds’ original note does not develop a solid algorithm to break LCGs, it only serves to demonstrate that they are breakable. 

  6. See the Euclidean Algorithm

  7. To show that $\gcd(ma,mb)=m\gcd(a,b)$ holds, consider Bezout’s Identity (to prove this, use the division algorithm!), which states that $\gcd(a,b)=ax+by$ for some $x,y\in\mathbb{Z}$. Then;

    $$\gcd(ma,mb)=max+mby=m(ax+by)=m\gcd(a,b)$$

  8. They share no prime factors, ie: their $\gcd()$ is 1, which is exactly what we have. 

  9. See Probability of coprimality of many integers 

  10. I also have a ground-breaking proof of this problem but its too complicated to $\LaTeX$ and won’t fit in the footnotes anyway. 

  11. Of course, this requires at least 2 outputs of $Q_n$, meaning it requires 5 LCG outputs, with which it will break the LCG with roughly 60% certainty $\left(\frac{6}{\pi^2}\right)$. With 6 outputs, this becomes $\frac{1}{\zeta(3)}\approx 83\%$ 

  12. This can also done using an extension of the Euclidean algorithm called the Extended Euclidean Algorithm. It does this by noting that since the inverse $X$ of $a \pmod m$ exists iff:

    \begin{align*} aX &= 1 &&\pmod m\\ \implies \frac{aX-1}{m}&=k\implies aX+m(-k)=1 \end{align*}

    which prompts the usage of Bezout’s identity to find the coefficients through recording the remainders of the algorithm. If the coprimality clause is not met, then one can adjust the formula through algebra and try again. Perhaps I shall make a blog post building up the modular inverse from the division algorithm..? 

  13. How the extended GCD algorithm works, in short, is by reducing our problem to $\gcd(b, b\pmod a)$ (verify this is the same as $\gcd(a,b)$ using Definition of modular arithmetics).

    \begin{align*} \gcd(b, b\pmod a)&=\gcd(a,b)\\ &= bx'+(b\pmod a)y'\\ &=bx'+\left(a-\lfloor a/b \rfloor b\right)y'\\ &=bx'+ay'-b\lfloor a/b \rfloor y'\\ &=ay'+b\left(x'-\lfloor a/b \rfloor y'\right)\\ \gcd(a,b)&=ay'+b\left(x'-\lfloor a/b \rfloor y'\right) \end{align*}

    where $x’, y’$ are the bezout coefficients of the subproblem. We know this will always terminate at {gcd, 1, 0}, since $x’=1, y’=0$ because $\gcd(a,0)=a=a(1)+0(0)$, and so we exploit that to find the rest, moving upwards. 

  14. math pun! 

  15. ok, this is the last math pun i promise..