Help - Search - Members - Calendar
Full Version: Verifying Prime Numbers
Hydrogenaudio Forums > Misc. > Off-Topic
Sebastian Mares
Hello!

We had to write a small program in Visual Basic today in our computer class that checks if a given number is a prime number or not. The most easy way to check would be writing a loop that starts from 2 and counts up to n-1 and if n mod i (i being the counter) = 0, the number is not prime. However, I saw on the Internet that it's faster to count until the square root of the given number, but is this really fail-safe? Some pages state that this algorithm works only for small numbers, but what is considered small and what not? Also, can someone tell me in a few words why it's actually enough to go up to the square root only?

Regards,
Sebastian
Sebastian Mares
OK, I think I managed to find out why you only have to go up to the square root: if there isn't a factor smaller or equal to the square root of the number we are checking, then there can't be a factor greater than the square root because you would need a smaller number to multiply it by to reach the number being tested. But why does this only work for small numbers? blink.gif
stephanV
Should work for any number.
Sebastian Mares
Yeah, maybe the sites which stated that the algorithm should be used for small numbers only are referring to the performance (looping though 1,000,000 numbers can take some time tongue.gif).
boojum
QUOTE(Sebastian Mares @ Mar 27 2006, 01:21 PM)
Yeah, maybe the sites which stated that the algorithm should be used for small numbers only are referring to the performance (looping though 1,000,000 numbers can take some time tongue.gif).
*




Doesn't working with primes require huge amounts of computer power for the massive work the cpu needs to do? I mean, there are no cute heuristics for discovering primes are there? I understood it to be pretty much a brute force effort. But I am a LibArts major. You want fries with that? cool.gif
legg
Actually, you're doing the most naive approach. The still naive (but not so naive) approach would be to:

try mod 2
try mod 3,5,7,9,11,13,....,n/2-1

There's no need to try all the numbers above n/2 since they are not integer divisors of n. There's also no need to do the mod on even numbers, since all of them are divisible by 2.
woody_woodward
QUOTE(legg @ Mar 27 2006, 04:22 PM)
Actually, you're doing the most naive approach. The still naive (but not so naive) approach would be to:

try mod 2
try mod 3,5,7,9,11,13,....,n/2-1

There's no need to try all the numbers above n/2 since they are not integer divisors of n. There's also no need to do the mod on even numbers, since all of them are divisible by 2.
*


Disclaimer: I'm an old geezer who thinks in FORTRAN, feel free to ignore me.

Legg is on the right track, but it is not necessary to check with all odd integers. Try checking with only the prime numbers up to the square root of the number being tested.
I could write a code snippet, but it would have to be FORTRAN.... maybe BASIC.
legg
QUOTE(woody_woodward @ Mar 27 2006, 09:41 PM)
Disclaimer: I'm an old geezer who thinks in FORTRAN, feel free to ignore me.

Legg is on the right track, but it is not necessary to check with all odd integers.  Try checking with only the prime numbers up to the square root of the number being tested.
I could write a code snippet, but it would have to be FORTRAN....  maybe BASIC.
*




I thought of suggesting that, but you can't have a table of all the prime numbers up to n/2-1. At least not for big numbers. Perhaps have the first 100 primes and then use the strategy of odd numbers.

It all depends on how much time he's willing to spend to have the table ready for the algorithm.
kjoonlee
There was a previous thread about prime numbers: http://www.hydrogenaudio.org/forums/index....showtopic=41474

See also: http://en.wikipedia.org/wiki/Primality_test
Sebastian Mares
I already stated in my first post that the method I suggested was the most simple one and isn't by far effective. My only problem was why it's enough to go up to the square root only, but I figured it out now. For example, if you want to test 23, you only have to test up to 4. Why? If there isn't any divisor within the range 3 to 4, there can't be any divisor above the square root because even multiplying the first divisor above 4 with itself results in 25 (5*5) which is larger than the number we test.
stephanV
QUOTE(Sebastian Mares @ Mar 27 2006, 10:21 PM)
Yeah, maybe the sites which stated that the algorithm should be used for small numbers only are referring to the performance (looping though 1,000,000 numbers can take some time tongue.gif).
*


Even numbers with an order of size near 10^6, would only need about 500 loops as maximum.

Since there is no simple formula to determine if a number is a prime or not, determining if a large number is a prime is always going to take longer than doing the same thing on a small number. smile.gif
Chu
I have a background in cryptography, and you just stepped into a minefield biggrin.gif

Here is the short answer, which basically the wiki page goes into more detail on.

First, we're assuming your testing a number for primality. Generating a prime number is a completly different problem (although in pratice guess -> check is the most common method since the density of primes is still "high" in the range we are dealing with*.

First, you have to decide how certain you must be a number is prime. Showing a number is prime with complete certanty is a slow*. Showing a number is prime to a certain degree of certainty is extreemly fast.

Let's say you go with the latter. Now you have to live with what type of error you want to expect.

(1) Monte-Carlo : You will always get an answer, however, the is some probability that the answer will be wrong.

(2) Las Vegas : You will always get a correct answer, however, there is some probability that you will not get an answer at all.

Depending on the application and hardware avaliable, you need to pick what degree of certainty you can live with and what type of errors you want to introduce. As for specific algorithms -- the wiki article does a pretty good job. The language is terse but the algorithms themselves are pretty easy to understand.

This has been a really simplified explanation, but mostly true. If you are really intersted in the subject -- the first few sections of Chapter 4 of the handbook of applied cryptography go into some detail about this, and in my opinion is written a hundred times better then the wiki article. HAC is free online link.

*The density of primes, more specifically how primes are distributed is one of the most important problems in mathematics today. The zeta function descibes the distribution of primes and in research has shown up in an amazing amount of related fields where one would not expect it -- however the riemann hypothesis which is used to extract the distribution from the function is still unproven. It's one of the most important problems in mathematics today -- as in only 2 other problems even come close. A very terse introduction here. One that you can read without a significant mathematical background is here.

** But getting faster all the time. The best algorithms are significantly better then exponential time and can be considered "fast" from a computational complexity standpoint however are still too slow for real use when generating a large number of primes.
Chu
I just realized I didn't answer the simple questions crying.gif

(1) Why up to sqrt(n) : I stole this from an article because I tend to give long winded explanations:

QUOTE(www.cut-the-knot.org)

To find all prime numbers below a given number N, write all integers from 1 through N in order. One is a not a prime number and is crossed out right away. The algorithm proceed sequentially in steps. On every step, find the first number not yet crossed, mark it as prime and cross out all of its remaining multiples. Repeat this step while the least available number does not exceed the square root of N. This is so because, for a composite number A = a·b less or equal to N, the factors a and b can't both exceed N. Thus any prime factor of A can't exceed A, let alone N. When the algorithm stops, the prime numbers are either marked or not crossed.


(2) "Slow" / "Small"

In computer science computational efficency is given in "Oh" notation. If we say an algorithm is O(x) for example, we are saying that if we need to do the algorithm on x items, then it will take c*x operations -- where c is some constant.

In general, anything polynomial (i.e. O(x^n) or faster) is considered "Fast" and anything slower (i.e. O(2^x)) or worse is "slow". The reason is obvious if you just think about how the functions grow as x does.

You instructor was simplyfing things a bit. The is technically a "fast" algorithm O((n log n)(log log n))** in terms of operations. Two problems. First, n is so ridicluously large that we're still talking age-of-the-universe or worse in real world computational time. In addition, the memory usage of the algorithm is O(n). Generating primes that are 4096 binary digits is a common problem these days. Half of that is 5.2 * 10^1232, which is going to be the average number of digits in your sieve. I think I've seen estimates of the number of atoms in the universe around 10^80 or so.

From the above it should be obvious that the limitating factor is the "small", not the "slow". Let's say you have 1 gig of memory to work with, how big can your sieve be? Assuming each digit takes 32 bits of memory, our table can store up to 262,144,000.

How long would our sieve take to run? (n log n)(log log n) where n is 262,144,000 is about 2,041,877,032 operations (assuming c=1). Assuming 5M intops/sec, that's about 30 minutes of work.

**A sidenote about "hard"/"easy" problems, which is synonmous with "slow"/"fast" in computer science. Exactly what the border between fast and slow algorithms looks like (i.e. P?NP) -- if it exists at all -- is up there with the riemann hypothesis in importance, and is a really interesting problem that unlike the riemann hypothesis and zeta function doesn't take much mathematics to understand. A simple introduciton here, a "terse" introduction (which if you have any background in computer science in some ways is much easier to understand) is here. This problem is also is incredible praticle importance -- if it can be shown P=NP it will literally be a second computer revolution.
Otto42
The PRIMES problem is actually in P, not NP. Look up the AKS primality test. It's not the fastest, but it proves that it's a problem in P.


Background info, for the rest of the class:

To understand P vs. NP, you need to understand what a Turing Machine is.

Back in the 30's, Alan Turing realized that unlike other professions, mathematicians only needed one "tool". A carpenter needs a hammer, a level, saws, etc. You can't cut wood with a hammer, basically. But a mathematician could use a single tool to do everything that they do. He invented that tool, at least conceptually, and it's called a Turing Machine.

His concept of a Turing Machine was based heavily on the technology of the time, but it's easy to generalize it. Imagine an infinitely long strip of paper. You've got a pencil, an eraser, a set of instructions to follow, and a "state". Your list of instructions is nothing more than a table, with states and symbols. What you do is to look up your state and the symbol you happen to be looking at on the paper in your list of instructions, then you do what the instruction says. It will tell you what to change the symbol to, what your new state is, and whether to move left or right on the paper for your next symbol. You simply do this until you reach an instruction that tells you to stop. This is very abstract, but he was able to prove, more or less, that this sort of machine would do any calculation that it was possible to calculate. It all depends on your symbols and instructions and such.

You can break this sort of machine down into two classes.
-The first is the deterministic Turing machine, which is basically what I just described. For any given state and symbol, there is one and only one instruction to follow.
-The second is the non-deterministic Turing machine. It can have more than one instruction to follow for any given state and symbol. In this case, you and the paper are duplicated, and you both continue in new paths, each doing different instructions. Instead of being a straight line algorithim, it's more like a branching tree.

P is the class of problems that we say can be solved, in polynomial time, on the first sort of machine.
NP is the class of problem that we say can be solved, in polynomial time, on the second sort of machine.

All P problems fit into NP as well, because the "tree" in those cases just happens to be a single stick. The real question is: are P and NP actually the same?

See, we know that the second sort of machine can do everything the first sort does. That's a bit obvious. We also know that the first sort of machine can do everything the second sort does as well. That's not so obvious, but it's true. It's just not as fast at it, because all those branching paths have to be computed individually instead of simultaneously. So far, this has *always* meant that the time to solve the problem on the second sort of machine becomes non-polynomial on the first sort of machine. But we have not yet *proven* that it always will work that way. So it's sort of an open question.

Normally, this might just mean that we haven't figured out a way to do it. But we have figured out lots of other things. In working on it, methods were devised that allowed us to transform one sort of problem into other sorts of problems. To prove, basically, that two given problems were actually the same problem. Doesn't help with solving them much, but it did create a new class of problems, called NP-Complete.

NP-Complete problems are sorta the hardest ones of the NP type problems. If anybody ever figured out a way to make an NP-Complete problem into a P problem, then they could use these conversion methods to make ALL NP problems into P problems, because any NP problem can be converted into an NP-Complete problem.

This is significant because modern computers are basically deterministic Turing machines, without the infinitely long memory. They tend to solve P problems really well. They don't do so good at NP problems, because NP's take a lot more time, since we have to convert them to work on deterministic machines and this makes them take a lot longer. Expontial time or worse, usually.

So if somebody proved P=NP, then they could solve the NP problems that much faster. And a lot of important stuff relies on P not being equal to NP, or rather on NP problems taking a long time to solve on current hardware. Or then again, Quantum Computing proposes to change the picture entirely. Because in theory, a quantum computer could be a non-deterministic Turing machine, well suited to solving NP problems directly.

And so on. Probably more than you really needed to know.
legg
Minesweeper is NP-complete. Solve it in P and you'll earn a million dollars.

http://www.claymath.org/Popular_Lectures/Minesweeper/
This is a "lo-fi" version of our main content. To view the full version with more information, formatting and images, please click here.
Invision Power Board © 2001-2008 Invision Power Services, Inc.