Thursday, April 3, 2008

Fermat Factorization Method Revisited

Abstract:

A simple method using unknown properties of the Fermat factorization method will be described below. This method exploits patterns in numbers. It will be shown that Fermat’s method extends well beyond what has been recognized it can do.

The Method

We start with the usual equation

N=p*q

in which p and q are primes with p>q for simplicity. As it stands, this equation is not very useful if we were asked to factorize N. The idea was thus to look for a way to represent N or equivalently p and q in a form that is better suited for the factorization problem.

The starting point for the search for that representation is the well known arithmetic progressions 6*k+1 and 6*k-1. These two progressions represent all the numbers, primes or composites, except the multiples of the first two primes 2 and 3.

6*k+1=7, 13, 19,…

6*k-1=5, 11, 17,…

Within one of these two progressions, the difference between two numbers is always a multiple of 6 since we can write

(6*k1+1) - (6*k2+1) = 6*(k1-k2)

and

(6*k1-1) - (6*k2-1) = 6*(k1-k2)

We can therefore write

P=q+6*k with k an integer which can take any integer value. Using this simple fact, we can rewrite N=p*q as N = (q+6*k)*q=q2 + 6*k*q and we get the usual quadratic equation:

q2+6kq-N=0

whose well known solutions can be written as:

k± =-3 ± √( 9k2+N)

For k± to be an integer, 9*k2+N must be a perfect square. So we set it equal to

9k2+N = m2

This equation is not valid for numbers N of the form

N = (6k1-1)*(6k2+1) since the difference between p and q cannot be written as a multiple of 6. However, It is easy to transform any number of the form N=6*k-1 into a number of the form N=6*k+1 simply by multiplying it by 5 to get a new number N for which the previous equation applies.

Numbers which are multiples of 2 and 3 will not be considered here either because it is easy to extract factors of 2 and 3 and then apply the above method if the result is a number of the form N = 6k+1. This issue will not be revisited.

Since p=q+6k, we can write also k=(p-q)/6 and deduce that m=(p+q)/2 from 9k2+N = m2 with p=m+3k and q=m-3k. The equation 9k2 + N = m2 can be rewritten

N=m2-9k2

This is the well known Fermat’s difference of squares equation. This equation tells us that we could factor N if we knew (p+q) or (p-q) or at least the way these two quantities vary. It turned out that (p+q) and (p-q) variations are not random or erratic but obey a simple law with the precision of a metronome.

Before we describe these variations, we will present another element which is important for the factorization problem. It turned out that it is useful to further split the progressions 6k±1 into 3 different ones, each on the basis of the sum of digits (Sd) property of the integers. It should be noted that the Sd is periodic and has a period of 3.

The 6k+1 can be transformed into:

F(1)=7 + 18k with Sd=7

F(2)=13 + 18k with Sd=4

F(3)=19 + 18k with Sd=1

and the 6k-1 can be transformed into:

F(4)=5 + 18k with Sd=5

F(5)=11 + 18k with Sd=2

F(6)=17 + 18k with Sd=8

We now give the multiplication tables for the sum of digits M1, M2 and M3 representing all the possible multiplications of numbers of the form:

M1=(6k1+1)*(6k2 +1)

M2=(6k1-1)*(6k2-1)

M3=(6k1-1)*(6k2+1)

M1

1

4

7

1

1

4

4

7

7

7

1

4

M2

2

5

8

2

4

5

1

7

8

7

4

1

M3

1

4

7

2

2

8

5

5

5

2

8

8

8

5

2

Looking at these matrices, we see that there are 4 ways to generate a number with a sum of digits of 1, 4 or 7 but only 3 ways for a number with Sd of 2, 5 or 8.

We are now ready to show how the quantities (p+q) and (p-q) vary. We will use tables to illustrate that. These tables are of course incomplete since the process can be continued indefinitely.

Table 1 for Sd=1

p

q

N

m=(p+q)/2

k=(p-q)/6

7

13

91

10

1

7

31

217

19

4

7

49

343

28

7

7

67

469

37

10

Table 2 for Sd=7

p

q

N

m=(p+q)/2

k=(p-q)/6

7

19

133

13

2

7

37

259

22

5

7

55

385

31

8

7

91

637

49

14

Table 3 for Sd=4

p

q

N

m=(p+q)/2

k=(p-q)/6

7

25

175

16

3

7

43

301

25

6

7

61

427

34

9

7

79

553

43

12

7

97

679

52

15

Tables 4, 5 and 6 and 7, 8 and 9 will show the same pattern regardless of the sum of digits of N. The value of (p+q)/2 increases by 9 and that of k increases by 3 within a given table. This simply means that regardless of the number p and q used to produce N, the value of (p+q) and (p-q) can only take a specific value. This property is very important when we are trying to factor a number N since it severely limits the allowed values for (p+q) and (p-q).

We are now ready to give an example of factorization of N=31*373=11563.

Sd(n)=7 so N can only be the product of a 4*4, a 1*7 or a 7*1 p and q values on one hand and of a possible 8*2 or 5*5 number. The last two are usually easily eliminated.

At this point, we need to guess a starting value for m. Since (p+q)> 2√N, it is easy to see that m>√N. √N here is the integer value of √N.

Since √N=107, we get

m=m(07)+9*i>107 with m(07) is the starting value of m for the very first product for Sd=7. The value of m(07) is 13. We know that all subsequent values of m can be calculated by using m=m(07)+9*i with i given by i>(107-13)/9=10.44 but since i must be an integer, we round if off to 11.

We get for m, m=13+9*11=112 and since m=(p+q)/2=112, we get the following for the allowed values of (p+q), p+q=2*112=224.

At this point, the factor base is uniquely determined. It consists of pairs whose sum adds up to 224. For clarity, we list them below.

q(i)

p(i)

1

223

7

217

13

211

19

205

25

199

31

193

37

187

43

181

49

175

55

169

61

163

67

157

73

151

79

145

85

139

91

133

97

127

103

121

we stop when we reach the last q< √N. In practice, few more pairs can be discarded since their product is bigger than N and consequently, q cannot be a factor. In the example above, 85*139=11815 > N=11563 so there is really no need to include pairs with q>85.

At this point, we know that q, the smaller factor of N, is among the q(i) of the factor base.

For clarity, we regroup the pairs according to the sum of digits in three different groups or tables.

Table 1 Sd=1*7=7

q(i)

p(i)

m

k(i)=(p-q)/6

1

223

112

37

19

205

112

31

37

187

112

25

55

169

112

19

73

151

112

13

Table 2 Sd=4*4=7

q(i)

p(i)

m

k(i)=(p-q)/6

13

211

112

33

31

193

112

27

49

175

112

21

67

157

112

15

85

139

112

9

Table 3 Sd=7*1=7

q(i)

p(i)

m

k(i)=(p-q)/6

7

217

112

35

25

199

112

29

43

181

112

23

61

163

112

17

79

145

112

11

Note that for the same value of m, the value of k decreases by 6 when q increases by 18.

We can at this point evaluate the GCD(N, q(i)) to find the factor q but it takes a lot of work to check every q(i).

To decrease the amount of work needed to factor N by evaluating the GCD, we can do some more preparatory work and use the parameter k=(p-q)/6 introduced earlier. Let’s go back to the original equation

N+9k2=m2

We now know that, as a general rule, m increases by 9 and k increases by 3 so we can write:

N = (m+9*n)2-9*(k+3*n)2 for some unknown (m,k,n)

And if we had a pair (m,k) that is in sync, we will be done because this equation is linear in n. We can rewrite the equation as:

N = (m+9n-9n-3k)(m+9n+3k+9n)=(m-3k)(m+3k+18n)

Remember m-3k=q(i) and m+3k=p(i) and m+3k+18n=p(i)+18n=p

Every pair (q(i),p(i)) of the factor base produces an N(i)=q(i)*p(i). For one and only one of these pairs, q=q(i)= the smaller factor. For that pair we can write:

N-N(i)=qp-q(i)p(i)=q(i)p-q(i)p(i)=q(i)[p-p(i)]=q(i)*18n since p-p(i)=18n

So the previous equation can be rewritten as:

N = q(i)*[p(i)+18n]=q*[p(i)+18n]

We need to determine a starting point for the iterations, which in this method, is simply the evaluation of the GCD(N, q(i)).

We already have a good value for m, namely the one we started with and used to calculate the factor base. What we need is a way to calculate the corresponding value of the parameter k that is in sync with m.

We start with calculating k from

9k2=m2-N

For N=31*373=11563 and m=112, we get

9k2=1122-11563=981 and k=10.44. This tells us right away that this k, not being an integer, is not the right k. But it does tell us that:

this k is below that of the pair (73,151) in Table 1,

this k is between the k of the pair (85,139)=9 and that of the pair (67,157)=15 in Table 2.

this k is below that of the pair (79,145) in Table 3.

At this point, we can use the higher k values as a starting point in our GCD calculation.

We know that:

k=13 and m=112 are in sync in Table1,

k=15 and m=112 are in sync in Table 2 and

k=11 and m=112 are in sync in Table 3.

But we do not want to waste any time trying values of n that we know do not fit.

For example, the value of n=0 simply means that (p,q) ( the real factors) are one pair among the factor base and we know that when we calculate k like above and we will get a perfect square for 9k^2 and an integer value for k. The number N=10117 is such a number.

So when we calculate k like above and do not get an integer value, we can use the following equation to calculate k to make sure we discard all pairs of (p(i),q(i)) that do not even give an n=1

9*(k+3*1)2=(m+9*1)2-N

In our case we get k+3= √[1212-11563]/9 =18.49 or k=18.49-3=15.49

We know that this value is not correct but it gives us a starting point for k since its value is between two values that are in sync with m.

13 ≤ k ≤ 19

15 ≤ k ≤ 21

11 ≤ k ≤ 17

So instead of starting at k=13, 15, 11 as calculated previously, we can now start at k=19, 21, 17 within each table. We always round off k to the largest value.

We use the equation

N = (m-3k(i))(m+3k(i)+18n)

We can plug in the values of k=19, 21, 17 which are all a good starting point within every table and calculate n or simply calculate GCD(N, q(i)=m-3k(i)) to check if q(i) is a factor.

In our example,

q(1)=112-3*19=55 but the GCD will tell us 55 is not a factor,

q(2)=112-3*21=49 but the GCD will tell us 49 is not a factor,

q(3)=112-3*17=61 but the GCD will tell us 61 is not a factor.

So we increase k by 6 (as k increases by 6 within a given table) and calculate the next three GCD’s to find that:

GCD(N, 112-3*25)=GCD(N,67) which is not a factor,

GCD(N, 112-3*27)=GCD(N,31)=31 and this tells us that 31 is the small factor q. The GCD(N,112-3*23)=GCD(N,43) which is not a factor.

In fact, we did not even need to calculate n. So we know that the pair (q(i),p(i))=(31,193) is the witness pair for the real factors (q,p)=(31,373).

Once we have the factor q, we can calculate p from N or we determine n and calculate it from p=m+3k + 18n=112+3*27+18n=193+180=373. Remember N-N(i)=q(i)*18n so we can easily calculate n from it.

n=[N-N(i)]/(18*q(i))=[11593-31*193]/(18*31)=10

At this point the number N has been factored into N=31*373.

We conclude by saying that it is not known at this point how well this method will perform with large N (in the hundred of digits) but we think it is interesting to find out. Doing calculations by hand is very tedious and when N is large, it is simply out of the question. We would therefore appreciate any feedback from the professionals.

Thursday, March 20, 2008

Extended Euclidean Algorithm Factorization Method

Extended Euclidean Algorithm Factorization Method

The idea for this method came from the desire to minimize the work involved in factoring a number N using a Fermat difference of squares method which uses a factor base that can become large for large N. In that process, we discovered a method to factorize N that looks very simple and very promising. It is based on the solution of linear Diophantine equations using the extended Euclidean algorithm (EEA) and subsequent quadratic equations.

Method

We start with the usual equation N=p*q in which p and q are primes with p>q for simplicity. As it stands, this equation is not very useful if we were asked to factorize N. The idea was thus to look for a way to represent N or equivalently p and q in a form that is better suited for the factorization problem. It turned out that an arithmetic progression representation of p and q is the best possible representation for the problem at hand, namely, the factorization of N=p*q. The first one that comes to mind is the 6k±1 because any prime number, except the first two (2 and 3), can be represented as an element of this progression.

Before we describe the method, we will present the notation that will be used.

It turned out that it is useful to further split the progressions 6k±1 into 3 different ones, each on the basis of the sum of digits (Sd) property of the integers. It should be noted that the Sd is periodic and has a period of 3 within each of these two 6k±1 arithmetic progressions. This is done to group the elements with the same Sd in the same progression as seen below.

The 6k+1 can be transformed into:

F1 = 7 + 18k with Sd =7

F2 = 13 + 18k with Sd =4

F3 = 19 + 18k with Sd =1

and the 6k-1 can be transformed into:

F4 = 5 + 18k with Sd =5

F5 = 11 + 18k with Sd =2

F6 = 17 + 18k with Sd =8

We now give the multiplication tables for the sum of digits M1, M2 and M3 representing all the possible multiplications of numbers of the form:

M1 = (6k1 + 1)*(6k2 + 1)

M2 = (6k1 - 1)*(6k2 - 1)

M3 = (6k1 - 1)*(6k2 + 1)

M1

1

4

7

1

1



4

4

7


7

7

1

4

M2

2

5

8

2

4



5

1

7


8

7

4

1

M3

1

4

7

2

2

8

5

5

5

2

8

8

8

5

2

Looking at these matrices, we see that there are 4 ways to generate a number with a sum of digits of 1, 4 or 7 but only 3 ways for a number with Sd of 2, 5 or 8.

We will provide few examples to show how the method works. We are looking to factorize N=p*q with a given sum of digits. We will provide an example for each sum of digits Sd.

Example 1 SD=7 N=31*373=11563

We know that N with Sd=7 could have been the product of a 1*7, a 7*1, a 4*4, a 5*5 or a 8*2 pair of (p,q). Let us assume that it is a 4*4 product. N can thus be represented as:

N = (13 + 18*k1)*(13 + 18*k2)

= 132 + 18*13(k1 + k2) + 182*k1*k2

We can write L = (N - 132)/18 and get L = 13*(k1 + k2) + 18*k1*k2.

At this point, we let x = k1 + k2 and y = k1*k2 and we end up with the equation

L = 13x + 18y

It is a simple linear Diophantine equation the solution of which will provide us with a pair of (x,y) which can then be used to find the indices k1 and k2 and build the factors p and q from them. We require of x and y to have the same sign and to be integers.

All the calculations were done on Dario Alejandro Alpern’s web site

http://www.alpertron.com.ar/QUAD.HTM

L = 13x + 18y becomes 633 = 13x + 18y

The solution is:

x = 21 + 18t

y = 20 - 13t

Whenever possible, we always try the pair (x,y) with t=0. In our case we have

x = 21 = k1 + k2

y = 20 = k1* k2

Solving for k1, we get the quadratic equation

k12 – 21*k1 + 20 = 0,

which admits solutions k=1, 20. We use these values to build the factors

q = 13 + 18*k1 = 13 + 18*1 = 31

p = 13 + 18*k2 = 13 + 18*20 = 373

At this point, we have factored N=11563=31*373. In theory, we should also have tried all the possible combinations of sum of digits products that can lead to N. We skip this step at this time.

We will make the important remark that had we used the simple hyperbolic equation

18k1*k2 + 13k1 + 13k2 = 633, we would have gotten k1 and k2 directly. However, this way does not really factorize N but uses instead the factorization of N as a prerequisite to find the integer solutions to the hyperbolic equation.

At this point, we will show how easy it is to factor N when it is of the form 1*1, 4*4 or 7*7, 2*2 or 8*8. But before we will say a word about what this method is about and one on what this method is not about.

What this method is not about:

It is not about setting the quadratic equation with the parameter t by solving

x = 21 + 18t = k1 + k2

y = 20 - 13t = k1* k2

to get the following quadratic equation in (k,t)

k12 – (21 + 18t)*k1 + 20 – 13t = 0

then solving it in terms of t by requiring the discriminant

(21+18t)^2-4*(20-13t) to be a perfect square. This will lead us to a difference of squares equation which will not help us factor N. It brings to mind a picture of a cat chasing its tail. To factor N, we transform the original problem into finding a solution of a linear Diophantine equation in terms of a parameter t. To find the correct value of the parameter t, we setup a quadratic equation that will lead us right back to where we started: solving a Fermat difference of squares equation to factor N.

What this method is about:

Once the linear Diophantine equation has been setup and solved in term of the parameter t, we use approximations to x and y together with the fact that (x,y) must be positive (or at least have the same sign) to find the interval of the allowed values of t. Once we have the interval for t, we use the lowest value of t (very often a negative value) to start the iterative process with the quadratic equation.

In the previous example, the condition of (x,y) positive would have required t to be

-1 ≤ t 1

So, the iteration should have been started with t=-1 and the discriminant of the quadratic equation will not have been found to be a perfect square. We then will have tried the value t=0 which would have been found to make the discriminant a perfect square.

Later on, we will provide few very good approximations to x,y which in general will restrict the t interval to being as small as possible thus making it possible for the iterative process to converge to the correct value of k1 and k2 after trying few values of t.

Example 2 SD=1 N=73*199=14527

Again, we will not consider all the possible cases and instead will assume the number N to be of the form 1*1. It can therefore be represented as

N = (19 + 18k1)*(19 + 18k2) = 192 + 18*19*(k1 + k2) + 182*k1*k2

L = (N - 192)/18 = 787 = 19(k1 + k2) + 18*k1k2 and again

787 = 19x + 18y

The solution is

x = 13 + 18t

y = 30 - 19t

Again, we always try the simplest solution, the one with t=0 and we get

x = k1 + k2 = 13

y = k1*k2 = 30

and we get the quadratic equation k12 – 13*k1 + 30 = 0 with solutions

k1=3, 30 so that we get for the factors p and q

q = 19 + 3*18 = 73

p = 19 + 10*18 = 199

We will make the remark that it looks like if the number is a 1*1, 4*4, 7*7….the t=0 is always the solution. But in fact, it is not always the case.

Example 3 SD=4 N=97*223=21631

We again skip the step that will show that the products 1*1, 2*2 or 8*5 cannot produce the solution. N, being a number with Sd=4 can be represented as a 7*7 product

N = (7 + 18*k1)*(7 + 18*k2)

= 49 + 18*7*(k1 + k2) + 182*k1*k2

again L = (N - 49)/18 = 7x + 18y

1199 = 7x + 18y with solution

x = 17 + 18t

y = 60 – 7t

Again, we try the t=0 solution

x = k1 + k2 = 17

y = k1*k2 = 60 and we get the quadratic

k12 – 17*k1 + 60 = 0 giving

k1=5, 12 and

q = 7 + 5*18 = 97

p = 7 + 12*18 = 223

We see that it seems easy to factor numbers coming from products of pairs of (p,q) with sum of digits of the type i*i , the diagonal elements of M1, M2 and M3. These types of numbers represent 9 elements out of a total of 21 or 3/7.

We will give yet another way to find a lower and upper bound for t when dealing with this type of numbers ( Sd=i*i) thru an example.

Let’s look at N=145*289=41905, a Sd=1 number which can be represented as

N = (19 + 18k1)*(19 + 18k2) with solutions

x = 58 + 18t

y = 67 – 19t

In this case, restricting (x,y) to positive values leads to


x = 58 + 18t > 0 or t > -58/18=-3

y = 67 – 19t > 0 or t < 19="3

-3 ≤ t 3

It is the best possible bound when requiring (x,y) > 0.

However, √N = 204 so (204-19)/18 = 10.2 = k where k is an approximate value of k1 or k2. So we can say that 102 < y =" k1*k2 <>2. Now we need to determine the bounds from these inequalities.

67 – 19t > 100

67 – 19t <>

which gives t=-2 as the only possible value. This t value is the exact value as can be easily checked. Of course, we should have checked that N does not admit any other sum of digits representation.

We should always determine the t interval using both the condition that the pair (x,y) must be positive (or of the same sign) and the one providing an upper and lower bound for y=k1*k2 and use whichever is more convenient.

At this point we need to show that the linear Diophantine equation method works just as well for cross products in the sum of digits, ie numbers of the form 1*7, 1*4, 2*5…

Example 4 SD=1=4*7 N=49*61=2989

N can be represented as

N = (13 + 18*k1)*(7 + 18*k2)

= 13*7 + 18*(13k2 + 7k1) + 182*k1*k2

L = (N-91)/18 = 13k2 + 7k1 + 18k1*k2

= 7k1 + 7k2 + 6k2 + 18k1*k2

= 7(k1 + k2) + 6k2*[1 + 3k1]

161 = 7x + 6y whose solution is

x = 11 + 6t

y = 14 - 7t

The t=0 solution is not guaranteed to work but it should always be tested as long as the pair (x,y) has the same sign for t=0.

We require x = 11 + 6t > 0 or t ≥ -1. We also require y = 14 - 7t > 0 or t ≤ 2. That leaves only the values of t=2, 0, -1 which will guarantee a positive sign (or the same sign) for the pair (x,y). We cheat a bit and try the value t=-1 to get x=5 and y=21 and the quadratic equation

3k12 – 11k1 + 16 = 0 with solutions

k1=2, 3 so that

q = 13 + 2*18 = 49

p = 7 + 3*18 = 61

Since N=49*61 is a Sd=1 number, it could also have been the product a 1*1 pair (p,q)

(a 8*8, a 5*2). At this point, we have to look at that possibility before we can rule it out.

N = (19 + 18k1)*(19 + 18k2)

= 192 + 18*19*(k1 + k2) + 182*k1k2

L = (N - 192)/18 = 146 = 19(k1 + k2) + 18*k1k2 or

146 = 19x + 18 y whose solution is

x = 2 + 18t

y = 6 – 19t

We can check that the value of (x,y) for t=0 is not the solution by solving the corresponding quadratic and we will find out that it is not. So we must look for a value of t that may provide us with a solution.

We know that x = 2 + 18t >0 so t≥0 and y = 6 - 19t >0 or t≤0. This basically rules out any possible value of t and we simply conclude that the equation does not admit a solution and consequently that N is not a 1*1 number. We can rule out the other sum of digit possibilities just like we did for the 1*1 before we can safely conclude that we have found the solution. We know that the factorization of a number N into its prime factors is unique so if we run into the factors before we are done testing all the possibilities, we can stop.

We make the remark that determining the correct value of t that may lead to a good pair of (x,y) is the hardest thing of this method. We will now present a way that will help narrow down even further the possible values of t.

Consider the previous example of N=49*61= 2989.

We can write L = 7k1 + 13k2 + 18*k1k2 like we did to get

L = 7(k1 + k2) + 6k1*[1 + 3k2]

L = 7x + 6y

But also

L = 13k1 – 6k1 + 13k2 + 18*k1k2 to get

L = 13(k1 + k2) + 6k1[3k2 - 1]

L = 13x + 6y

So we end up with two linear Diophantine equations for the same number N

161 = 7x + 6y

161 = 13x + 6y

with solutions

x = 11 + 6t

y = 14 - 7t

and

x = 11 + 6t

y = 3 - 13t

It is easy to see that since x = k1 + k2, x must have the same value in both equations, y does not have to and in general it does not. So again, using the conditions on the sign of the pair (x,y) we can determine that the value of t=-1 is a good one to try. We do and find out that it leads to integer values of (x,y) and to integer values of the indices k1 and k2.

In the most general case, x, while it must have the same value since it represents the value of k1 + k2 for the same number N, does not necessarily have the same expression in both solutions like above. In that case, we get one more equation to use as a supplementary condition on t. The example of N = 61*301 is such an example.

We get x = 67 + 6t and x = 85 + 6t, which when equated provide the equation

67 + 6t = 85 + 6t’ or t = t’ +3. This simple fact can be used to redefine the allowed values of t using the allowed values of t’. It could save some computational time if the t interval is reduced.

Could we have used the arithmetic progressions 6k±1 to factor N? The answer is yes. In fact, we can use any arithmetic progression. Some offer advantages over others. We will give a simple example for the 6k+1 and point to one important advantage over the previous method.

In general, N = (6k1 + 1)*(6k2 + 1) = 36k1k2 + 6(k1 + k2) + 1.

Case of N = (6*24+1)*(6*42+1) = 145*253 = 36685

In the notation of the previous method, this number would be a 1*1 which can be represented as:

N = (19 + 7*18)*(19 + 13*18) = 36685

The solution is

x = 56 + 18t

y = 53 – 19t

which admit integer solutions for (x,y) for t=-2. Now, going back to the 6k+1 method,

we can write L = (N-1)/6 = 6k1k2 + k1 + k2 = 6y + x = 6114

We feed this equation to the web solver and get:

x = 168 + 6t

y = 991 - t

We use the condition that (x,y) must have the same sign and get:

x > 0 or t > -168/6 or t > -28

y > 0 or t <>

At this point, we know that -28 <>1*k2 and thus the contribution from it to L (or N) is much larger than that of x = k1+k2. This tells us that 991 is much closer to the real value of y than is 168 to x. Consequently, we always start checking for a solution from the negative values of t. In fact, for composite numbers (large N), a t=tmax can be determined using the above fact in the following way

y >> x or 991 – t >> 168 + 6t giving

tmax = 117 and we can replace the previous condition on t by the new one

-28 <>

but I am not sure if we can declare N prime if we can’t find a solution with integer k1 and k2 after having tested all values of t below tmax.

The value of t that solves the quadratic equation for k1 or k2 is t=-17.

The exact values of x and y are

x = 168 + 6*(-17) = 66 = 24 + 42

y = 991 – (-17) = 1008 = 24*42

If we could find a better approximation to y than y~(N-1)/36, we could get a better value for t since everything in this method hinges on the value of t. The narrower the t interval, the better. It turned out that there is such an approximation to y.

N = 36y + 6x +1

However, setting x = k1 + k2 to 0 to get an approximate y is not good enough.

Remember q=6k1 + 1 and p=6k2 + 1 so x becomes x = (p+q-2)/6 = k1 + k2. But we know that (p+q) > 2*√N and x becomes x ~ (2*√N – 2)/6. N can thus be rewritten as

N ~ 36y + 2*√N – 2 + 1 from which we get y = [(N + 1) – 2*√N]/36.

This is the best possible approximation for y which we will call y(app). For the previous example y(app) is

y(app) = (36685 +1 – 2*191)/36 = 1008 so we can write

y = 991 – t ≤ 1008 or t ≥ -17 which in this case happens to be the exact value of t as we have seen before.

Before we conclude, let’s recap the different approximations to k1, k2 and k1*k2.

For the 6k±1, we have just seen that:

y(app) = k1*k2 = [(N + 1) ± 2*√N]/36 with a + sign for the 6k+1 and a – sign for the 6k-1.

For the i*i sum of digits (the diagonal elements of the sum of digits), we have:

k = (√N – b(i))/18 with b(i)=5,7,11,13,17,19 and y(app) becomes

k^2 ≤ y(app)= k1*k2 ≤ (k+1)^2

For the i*j sum of digits (the cross products in the sum of digits), we will give only one, that of the 4*7 as they are easy to derive. However, remember that y is not equal to k1*k2 but to y = k(1+3k). As before k can be approximated by

k = (√N – b(i))/18 with b(i)=5,7,11,13,17,19 and y(app) becomes

k(1+3k) ≤ y(app) = 6k1*[1 + 3k2] ≤ (k+1)(1+3(k+1))

Finally, there is always the condition that the pair (x,y) must be positive. All these conditions severely restrict the interval of allowed values of t and therefore minimize the computational effort.

It is easy to see that for the i*i sum of digits, the length of the k interval is

(k+1)^2 - k^2 = 2k+1

and that of the 4*7 sum of digits is

(k+1)(1+3(k+1)) – k(1+3k) = 2(3k+2)

However, the number of t iterations is always much smaller than the length of the k interval as we have seen in every example. We recommend to always start the iterations with the lower bound of the parameter t (the most negative or the least positive).

Remember, we do not really need to check the values of t that make x = k1 + k2 ≤ 2*k

Where k is an approximate k calculated by anyone of the above approximations (ie the lower bound value of k used to calculate y(app)). This will cut the number of quadratic equations to check for a given value of (x,y). It is always useful to use the condition x = k1 + k2 ≥ 2*k because it may further shrink the t interval and provide us with a better starting point than the most negative t value.

In conclusion, it was shown that factorization of a number N can be mapped into finding integer solutions to a set of linear Diophantine equations and subsequent quadratic equations. This method seems to be well adapted to handling large numbers that are so important for some well known encryption methods. Going from a simple 3 digit number to a 1000 digit number will not involve more than solving the same number of linear Diophantine equations using the well known extended (or generalized) Euclidean algorithm and few quadratic equations. We have also provided a way to get a good starting guess for the parameter t which makes this method very attractive as a factorization method. However, at this point, more work needs to be done to determine the scaling of the method when it is used for large N. What is left to be done cannot be done by hand, so I am appealing to the community of mathematicians for help in analyzing and programming this method.