I just realized I didn't answer the simple questions

(1) Why up to sqrt(n) : I stole this from an article because I tend to give long winded explanations:
QUOTE(www.cut-the-knot.org)
To find all prime numbers below a given number N, write all integers from 1 through N in order. One is a not a prime number and is crossed out right away. The algorithm proceed sequentially in steps. On every step, find the first number not yet crossed, mark it as prime and cross out all of its remaining multiples. Repeat this step while the least available number does not exceed the square root of N. This is so because, for a composite number A = a·b less or equal to N, the factors a and b can't both exceed N. Thus any prime factor of A can't exceed A, let alone N. When the algorithm stops, the prime numbers are either marked or not crossed.
(2) "Slow" / "Small"
In computer science computational efficency is given in "Oh" notation. If we say an algorithm is O(x) for example, we are saying that if we need to do the algorithm on x items, then it will take c*x operations -- where c is some constant.
In general, anything polynomial (i.e. O(x^n) or faster) is considered "Fast" and anything slower (i.e. O(2^x)) or worse is "slow". The reason is obvious if you just think about how the functions grow as x does.
You instructor was simplyfing things a bit. The is technically a "fast" algorithm O((n log n)(log log n))** in terms of operations. Two problems. First, n is so ridicluously large that we're still talking age-of-the-universe or worse in real world computational time. In addition, the memory usage of the algorithm is O(n). Generating primes that are 4096 binary digits is a common problem these days. Half of that is 5.2 * 10^1232, which is going to be the average number of digits in your sieve. I think I've seen estimates of the number of atoms in the universe around 10^80 or so.
From the above it should be obvious that the limitating factor is the "small", not the "slow". Let's say you have 1 gig of memory to work with, how big can your sieve be? Assuming each digit takes 32 bits of memory, our table can store up to 262,144,000.
How long would our sieve take to run? (n log n)(log log n) where n is 262,144,000 is about 2,041,877,032 operations (assuming c=1). Assuming 5M intops/sec, that's about 30 minutes of work.
**A sidenote about "hard"/"easy" problems, which is synonmous with "slow"/"fast" in computer science. Exactly what the border between fast and slow algorithms looks like (i.e. P?NP) -- if it exists at all -- is up there with the riemann hypothesis in importance, and is a really interesting problem that unlike the riemann hypothesis and zeta function doesn't take much mathematics to understand. A simple introduciton
here, a "terse" introduction (which if you have any background in computer science in some ways is much easier to understand) is
here. This problem is also is incredible praticle importance -- if it can be shown P=NP it will literally be a second computer revolution.