Thursday, March 20, 2008

Extended Euclidean Algorithm Factorization Method

Extended Euclidean Algorithm Factorization Method

The idea for this method came from the desire to minimize the work involved in factoring a number N using a Fermat difference of squares method which uses a factor base that can become large for large N. In that process, we discovered a method to factorize N that looks very simple and very promising. It is based on the solution of linear Diophantine equations using the extended Euclidean algorithm (EEA) and subsequent quadratic equations.

Method

We start with the usual equation N=p*q in which p and q are primes with p>q for simplicity. As it stands, this equation is not very useful if we were asked to factorize N. The idea was thus to look for a way to represent N or equivalently p and q in a form that is better suited for the factorization problem. It turned out that an arithmetic progression representation of p and q is the best possible representation for the problem at hand, namely, the factorization of N=p*q. The first one that comes to mind is the 6k±1 because any prime number, except the first two (2 and 3), can be represented as an element of this progression.

Before we describe the method, we will present the notation that will be used.

It turned out that it is useful to further split the progressions 6k±1 into 3 different ones, each on the basis of the sum of digits (Sd) property of the integers. It should be noted that the Sd is periodic and has a period of 3 within each of these two 6k±1 arithmetic progressions. This is done to group the elements with the same Sd in the same progression as seen below.

The 6k+1 can be transformed into:

F1 = 7 + 18k with Sd =7

F2 = 13 + 18k with Sd =4

F3 = 19 + 18k with Sd =1

and the 6k-1 can be transformed into:

F4 = 5 + 18k with Sd =5

F5 = 11 + 18k with Sd =2

F6 = 17 + 18k with Sd =8

We now give the multiplication tables for the sum of digits M1, M2 and M3 representing all the possible multiplications of numbers of the form:

M1 = (6k1 + 1)*(6k2 + 1)

M2 = (6k1 - 1)*(6k2 - 1)

M3 = (6k1 - 1)*(6k2 + 1)

M1

1

4

7

1

1



4

4

7


7

7

1

4

M2

2

5

8

2

4



5

1

7


8

7

4

1

M3

1

4

7

2

2

8

5

5

5

2

8

8

8

5

2

Looking at these matrices, we see that there are 4 ways to generate a number with a sum of digits of 1, 4 or 7 but only 3 ways for a number with Sd of 2, 5 or 8.

We will provide few examples to show how the method works. We are looking to factorize N=p*q with a given sum of digits. We will provide an example for each sum of digits Sd.

Example 1 SD=7 N=31*373=11563

We know that N with Sd=7 could have been the product of a 1*7, a 7*1, a 4*4, a 5*5 or a 8*2 pair of (p,q). Let us assume that it is a 4*4 product. N can thus be represented as:

N = (13 + 18*k1)*(13 + 18*k2)

= 132 + 18*13(k1 + k2) + 182*k1*k2

We can write L = (N - 132)/18 and get L = 13*(k1 + k2) + 18*k1*k2.

At this point, we let x = k1 + k2 and y = k1*k2 and we end up with the equation

L = 13x + 18y

It is a simple linear Diophantine equation the solution of which will provide us with a pair of (x,y) which can then be used to find the indices k1 and k2 and build the factors p and q from them. We require of x and y to have the same sign and to be integers.

All the calculations were done on Dario Alejandro Alpern’s web site

http://www.alpertron.com.ar/QUAD.HTM

L = 13x + 18y becomes 633 = 13x + 18y

The solution is:

x = 21 + 18t

y = 20 - 13t

Whenever possible, we always try the pair (x,y) with t=0. In our case we have

x = 21 = k1 + k2

y = 20 = k1* k2

Solving for k1, we get the quadratic equation

k12 – 21*k1 + 20 = 0,

which admits solutions k=1, 20. We use these values to build the factors

q = 13 + 18*k1 = 13 + 18*1 = 31

p = 13 + 18*k2 = 13 + 18*20 = 373

At this point, we have factored N=11563=31*373. In theory, we should also have tried all the possible combinations of sum of digits products that can lead to N. We skip this step at this time.

We will make the important remark that had we used the simple hyperbolic equation

18k1*k2 + 13k1 + 13k2 = 633, we would have gotten k1 and k2 directly. However, this way does not really factorize N but uses instead the factorization of N as a prerequisite to find the integer solutions to the hyperbolic equation.

At this point, we will show how easy it is to factor N when it is of the form 1*1, 4*4 or 7*7, 2*2 or 8*8. But before we will say a word about what this method is about and one on what this method is not about.

What this method is not about:

It is not about setting the quadratic equation with the parameter t by solving

x = 21 + 18t = k1 + k2

y = 20 - 13t = k1* k2

to get the following quadratic equation in (k,t)

k12 – (21 + 18t)*k1 + 20 – 13t = 0

then solving it in terms of t by requiring the discriminant

(21+18t)^2-4*(20-13t) to be a perfect square. This will lead us to a difference of squares equation which will not help us factor N. It brings to mind a picture of a cat chasing its tail. To factor N, we transform the original problem into finding a solution of a linear Diophantine equation in terms of a parameter t. To find the correct value of the parameter t, we setup a quadratic equation that will lead us right back to where we started: solving a Fermat difference of squares equation to factor N.

What this method is about:

Once the linear Diophantine equation has been setup and solved in term of the parameter t, we use approximations to x and y together with the fact that (x,y) must be positive (or at least have the same sign) to find the interval of the allowed values of t. Once we have the interval for t, we use the lowest value of t (very often a negative value) to start the iterative process with the quadratic equation.

In the previous example, the condition of (x,y) positive would have required t to be

-1 ≤ t 1

So, the iteration should have been started with t=-1 and the discriminant of the quadratic equation will not have been found to be a perfect square. We then will have tried the value t=0 which would have been found to make the discriminant a perfect square.

Later on, we will provide few very good approximations to x,y which in general will restrict the t interval to being as small as possible thus making it possible for the iterative process to converge to the correct value of k1 and k2 after trying few values of t.

Example 2 SD=1 N=73*199=14527

Again, we will not consider all the possible cases and instead will assume the number N to be of the form 1*1. It can therefore be represented as

N = (19 + 18k1)*(19 + 18k2) = 192 + 18*19*(k1 + k2) + 182*k1*k2

L = (N - 192)/18 = 787 = 19(k1 + k2) + 18*k1k2 and again

787 = 19x + 18y

The solution is

x = 13 + 18t

y = 30 - 19t

Again, we always try the simplest solution, the one with t=0 and we get

x = k1 + k2 = 13

y = k1*k2 = 30

and we get the quadratic equation k12 – 13*k1 + 30 = 0 with solutions

k1=3, 30 so that we get for the factors p and q

q = 19 + 3*18 = 73

p = 19 + 10*18 = 199

We will make the remark that it looks like if the number is a 1*1, 4*4, 7*7….the t=0 is always the solution. But in fact, it is not always the case.

Example 3 SD=4 N=97*223=21631

We again skip the step that will show that the products 1*1, 2*2 or 8*5 cannot produce the solution. N, being a number with Sd=4 can be represented as a 7*7 product

N = (7 + 18*k1)*(7 + 18*k2)

= 49 + 18*7*(k1 + k2) + 182*k1*k2

again L = (N - 49)/18 = 7x + 18y

1199 = 7x + 18y with solution

x = 17 + 18t

y = 60 – 7t

Again, we try the t=0 solution

x = k1 + k2 = 17

y = k1*k2 = 60 and we get the quadratic

k12 – 17*k1 + 60 = 0 giving

k1=5, 12 and

q = 7 + 5*18 = 97

p = 7 + 12*18 = 223

We see that it seems easy to factor numbers coming from products of pairs of (p,q) with sum of digits of the type i*i , the diagonal elements of M1, M2 and M3. These types of numbers represent 9 elements out of a total of 21 or 3/7.

We will give yet another way to find a lower and upper bound for t when dealing with this type of numbers ( Sd=i*i) thru an example.

Let’s look at N=145*289=41905, a Sd=1 number which can be represented as

N = (19 + 18k1)*(19 + 18k2) with solutions

x = 58 + 18t

y = 67 – 19t

In this case, restricting (x,y) to positive values leads to


x = 58 + 18t > 0 or t > -58/18=-3

y = 67 – 19t > 0 or t < 19="3

-3 ≤ t 3

It is the best possible bound when requiring (x,y) > 0.

However, √N = 204 so (204-19)/18 = 10.2 = k where k is an approximate value of k1 or k2. So we can say that 102 < y =" k1*k2 <>2. Now we need to determine the bounds from these inequalities.

67 – 19t > 100

67 – 19t <>

which gives t=-2 as the only possible value. This t value is the exact value as can be easily checked. Of course, we should have checked that N does not admit any other sum of digits representation.

We should always determine the t interval using both the condition that the pair (x,y) must be positive (or of the same sign) and the one providing an upper and lower bound for y=k1*k2 and use whichever is more convenient.

At this point we need to show that the linear Diophantine equation method works just as well for cross products in the sum of digits, ie numbers of the form 1*7, 1*4, 2*5…

Example 4 SD=1=4*7 N=49*61=2989

N can be represented as

N = (13 + 18*k1)*(7 + 18*k2)

= 13*7 + 18*(13k2 + 7k1) + 182*k1*k2

L = (N-91)/18 = 13k2 + 7k1 + 18k1*k2

= 7k1 + 7k2 + 6k2 + 18k1*k2

= 7(k1 + k2) + 6k2*[1 + 3k1]

161 = 7x + 6y whose solution is

x = 11 + 6t

y = 14 - 7t

The t=0 solution is not guaranteed to work but it should always be tested as long as the pair (x,y) has the same sign for t=0.

We require x = 11 + 6t > 0 or t ≥ -1. We also require y = 14 - 7t > 0 or t ≤ 2. That leaves only the values of t=2, 0, -1 which will guarantee a positive sign (or the same sign) for the pair (x,y). We cheat a bit and try the value t=-1 to get x=5 and y=21 and the quadratic equation

3k12 – 11k1 + 16 = 0 with solutions

k1=2, 3 so that

q = 13 + 2*18 = 49

p = 7 + 3*18 = 61

Since N=49*61 is a Sd=1 number, it could also have been the product a 1*1 pair (p,q)

(a 8*8, a 5*2). At this point, we have to look at that possibility before we can rule it out.

N = (19 + 18k1)*(19 + 18k2)

= 192 + 18*19*(k1 + k2) + 182*k1k2

L = (N - 192)/18 = 146 = 19(k1 + k2) + 18*k1k2 or

146 = 19x + 18 y whose solution is

x = 2 + 18t

y = 6 – 19t

We can check that the value of (x,y) for t=0 is not the solution by solving the corresponding quadratic and we will find out that it is not. So we must look for a value of t that may provide us with a solution.

We know that x = 2 + 18t >0 so t≥0 and y = 6 - 19t >0 or t≤0. This basically rules out any possible value of t and we simply conclude that the equation does not admit a solution and consequently that N is not a 1*1 number. We can rule out the other sum of digit possibilities just like we did for the 1*1 before we can safely conclude that we have found the solution. We know that the factorization of a number N into its prime factors is unique so if we run into the factors before we are done testing all the possibilities, we can stop.

We make the remark that determining the correct value of t that may lead to a good pair of (x,y) is the hardest thing of this method. We will now present a way that will help narrow down even further the possible values of t.

Consider the previous example of N=49*61= 2989.

We can write L = 7k1 + 13k2 + 18*k1k2 like we did to get

L = 7(k1 + k2) + 6k1*[1 + 3k2]

L = 7x + 6y

But also

L = 13k1 – 6k1 + 13k2 + 18*k1k2 to get

L = 13(k1 + k2) + 6k1[3k2 - 1]

L = 13x + 6y

So we end up with two linear Diophantine equations for the same number N

161 = 7x + 6y

161 = 13x + 6y

with solutions

x = 11 + 6t

y = 14 - 7t

and

x = 11 + 6t

y = 3 - 13t

It is easy to see that since x = k1 + k2, x must have the same value in both equations, y does not have to and in general it does not. So again, using the conditions on the sign of the pair (x,y) we can determine that the value of t=-1 is a good one to try. We do and find out that it leads to integer values of (x,y) and to integer values of the indices k1 and k2.

In the most general case, x, while it must have the same value since it represents the value of k1 + k2 for the same number N, does not necessarily have the same expression in both solutions like above. In that case, we get one more equation to use as a supplementary condition on t. The example of N = 61*301 is such an example.

We get x = 67 + 6t and x = 85 + 6t, which when equated provide the equation

67 + 6t = 85 + 6t’ or t = t’ +3. This simple fact can be used to redefine the allowed values of t using the allowed values of t’. It could save some computational time if the t interval is reduced.

Could we have used the arithmetic progressions 6k±1 to factor N? The answer is yes. In fact, we can use any arithmetic progression. Some offer advantages over others. We will give a simple example for the 6k+1 and point to one important advantage over the previous method.

In general, N = (6k1 + 1)*(6k2 + 1) = 36k1k2 + 6(k1 + k2) + 1.

Case of N = (6*24+1)*(6*42+1) = 145*253 = 36685

In the notation of the previous method, this number would be a 1*1 which can be represented as:

N = (19 + 7*18)*(19 + 13*18) = 36685

The solution is

x = 56 + 18t

y = 53 – 19t

which admit integer solutions for (x,y) for t=-2. Now, going back to the 6k+1 method,

we can write L = (N-1)/6 = 6k1k2 + k1 + k2 = 6y + x = 6114

We feed this equation to the web solver and get:

x = 168 + 6t

y = 991 - t

We use the condition that (x,y) must have the same sign and get:

x > 0 or t > -168/6 or t > -28

y > 0 or t <>

At this point, we know that -28 <>1*k2 and thus the contribution from it to L (or N) is much larger than that of x = k1+k2. This tells us that 991 is much closer to the real value of y than is 168 to x. Consequently, we always start checking for a solution from the negative values of t. In fact, for composite numbers (large N), a t=tmax can be determined using the above fact in the following way

y >> x or 991 – t >> 168 + 6t giving

tmax = 117 and we can replace the previous condition on t by the new one

-28 <>

but I am not sure if we can declare N prime if we can’t find a solution with integer k1 and k2 after having tested all values of t below tmax.

The value of t that solves the quadratic equation for k1 or k2 is t=-17.

The exact values of x and y are

x = 168 + 6*(-17) = 66 = 24 + 42

y = 991 – (-17) = 1008 = 24*42

If we could find a better approximation to y than y~(N-1)/36, we could get a better value for t since everything in this method hinges on the value of t. The narrower the t interval, the better. It turned out that there is such an approximation to y.

N = 36y + 6x +1

However, setting x = k1 + k2 to 0 to get an approximate y is not good enough.

Remember q=6k1 + 1 and p=6k2 + 1 so x becomes x = (p+q-2)/6 = k1 + k2. But we know that (p+q) > 2*√N and x becomes x ~ (2*√N – 2)/6. N can thus be rewritten as

N ~ 36y + 2*√N – 2 + 1 from which we get y = [(N + 1) – 2*√N]/36.

This is the best possible approximation for y which we will call y(app). For the previous example y(app) is

y(app) = (36685 +1 – 2*191)/36 = 1008 so we can write

y = 991 – t ≤ 1008 or t ≥ -17 which in this case happens to be the exact value of t as we have seen before.

Before we conclude, let’s recap the different approximations to k1, k2 and k1*k2.

For the 6k±1, we have just seen that:

y(app) = k1*k2 = [(N + 1) ± 2*√N]/36 with a + sign for the 6k+1 and a – sign for the 6k-1.

For the i*i sum of digits (the diagonal elements of the sum of digits), we have:

k = (√N – b(i))/18 with b(i)=5,7,11,13,17,19 and y(app) becomes

k^2 ≤ y(app)= k1*k2 ≤ (k+1)^2

For the i*j sum of digits (the cross products in the sum of digits), we will give only one, that of the 4*7 as they are easy to derive. However, remember that y is not equal to k1*k2 but to y = k(1+3k). As before k can be approximated by

k = (√N – b(i))/18 with b(i)=5,7,11,13,17,19 and y(app) becomes

k(1+3k) ≤ y(app) = 6k1*[1 + 3k2] ≤ (k+1)(1+3(k+1))

Finally, there is always the condition that the pair (x,y) must be positive. All these conditions severely restrict the interval of allowed values of t and therefore minimize the computational effort.

It is easy to see that for the i*i sum of digits, the length of the k interval is

(k+1)^2 - k^2 = 2k+1

and that of the 4*7 sum of digits is

(k+1)(1+3(k+1)) – k(1+3k) = 2(3k+2)

However, the number of t iterations is always much smaller than the length of the k interval as we have seen in every example. We recommend to always start the iterations with the lower bound of the parameter t (the most negative or the least positive).

Remember, we do not really need to check the values of t that make x = k1 + k2 ≤ 2*k

Where k is an approximate k calculated by anyone of the above approximations (ie the lower bound value of k used to calculate y(app)). This will cut the number of quadratic equations to check for a given value of (x,y). It is always useful to use the condition x = k1 + k2 ≥ 2*k because it may further shrink the t interval and provide us with a better starting point than the most negative t value.

In conclusion, it was shown that factorization of a number N can be mapped into finding integer solutions to a set of linear Diophantine equations and subsequent quadratic equations. This method seems to be well adapted to handling large numbers that are so important for some well known encryption methods. Going from a simple 3 digit number to a 1000 digit number will not involve more than solving the same number of linear Diophantine equations using the well known extended (or generalized) Euclidean algorithm and few quadratic equations. We have also provided a way to get a good starting guess for the parameter t which makes this method very attractive as a factorization method. However, at this point, more work needs to be done to determine the scaling of the method when it is used for large N. What is left to be done cannot be done by hand, so I am appealing to the community of mathematicians for help in analyzing and programming this method.