本文参考C++版本:如何高效判定、筛选素数
给定n
,我们如何输出n以内(不含n
)的所有素数?
使用Python
,完成函数体,要求返回n
以内的素数个数和这些素数。
def countPrimes(n):
"""
itype -> int
rtype -> int
rtype -> list
"""
关于评价算法速度:如何计算时间复杂度
1.直觉算法
按部就班地遍历2到n-1,依次判断素数。
def countPrimes(n):
primes = []
if n < 2:
return 0, primes
for i in range(2,n):
if isPrime(i):
primes.append(i)
return len(primes), primes
def isPrime(k):
"""
itype: int, >=2
rtype: int
"""
r_isPrime = True
for i in range(2,k):
if k%i == 0:
r_isPrime = False
return r_isPrime
from time import process_time
t1 = process_time()
n = int(input("Input a positive integer: "))
count_primes, primes = countPrimes(n)
t2 = process_time()
print("The number of primes within {} is {}:".format(n,count_primes))
print("Elapsed time: {}".format(t2-t1))
Input a positive integer: 10000
The number of primes within 10000 is 1229:
Elapsed time: 2.875
评价
直接地,得到时间复杂度(时间复杂读不考虑系数):
但是这样地算法存在冗余,太耗时了。 对n=20,我们只需要检查
。
- 20=2*10
- 20=4*5
- 20=
- 20=5*4
- 20=10*2 观察到只需要检查20在小于 中的正整数中是否找能被整除的因数,找不到那么后面一半也找不到,找到了就直接判断为素数。
2.平方根算法
检查数k时,遍历因数i
范围为
。
def isPrime(k):
"""
itype: int, >=2
rtype: int
"""
r_isPrime = True
for i in range(2,int(k**0.5)+1):
if k%i == 0:
r_isPrime = False
return r_isPrime
from time import process_time
t1 = process_time()
n = int(input("Input a positive integer: "))
count_primes, primes = countPrimes(n)
t2 = process_time()
print("The number of primes within {} is {}:".format(n,count_primes))
print("Elapsed time: {}".format(t2-t1))
Input a positive integer: 10000
The number of primes within 10000 is 1229:
Elapsed time: 2.5625
评价
发现优化得效果也没好多少,用时减少一丁点。函数isPrime()
时间复杂度为
。 发现其实还有冗余,如果已知2为素数,那么就不应该多余地去检查4,6,8这些2的倍数。按照这样的原理,演化到下一个算法。
素数筛-Sieve of Eratosthenes
筛掉已知素数的倍数。
def countPrimes(n):
if n < 2:
return 0, primes
prime_sieve = [1]*n
prime_sieve[0:2] = [0,0]
for i in range(2, n):
if prime_sieve[i]:
for j in range(i*2, n, i):
prime_sieve[j] = 0
return prime_sieve.count(1), [index for index,value in enumerate(prime_sieve) if value == 1]
from time import process_time
t1 = process_time()
n = int(input("Input a positive integer: "))
count_primes, primes = countPrimes(n)
t2 = process_time()
print("The number of primes within {} is {}:".format(n,count_primes))
print("Elapsed time: {}".format(t2-t1))
Input a positive integer: 1000000
The number of primes within 1000000 is 78498:
Elapsed time: 0.28125
筛掉倍数后,算法表现提升显著,还有可以提升的空间。 - for i in range(2, n):
是在遍历所有小于n大于等于2的正整数,当n=20,我们不需要遍历所有的大于
的正整数,因为不是素数的肯定通过遍历前面一半的时候筛掉了,是素数的素数筛值保持不变。那么可以将遍历范围变为
。
def countPrimes(n):
if n < 2:
return 0, primes
prime_sieve = [1]*n
prime_sieve[0:2] = [0,0]
for i in range(2, int(n**0.5)+1):
if prime_sieve[i]:
for j in range(i*2, n, i):
prime_sieve[j] = 0
return prime_sieve.count(1), [index for index,value in enumerate(prime_sieve) if value == 1]
from time import process_time
t1 = process_time()
n = int(input("Input a positive integer: "))
count_primes, primes = countPrimes(n)
t2 = process_time()
print("The number of primes within {} is {}:".format(n,count_primes))
print("Elapsed time: {}".format(t2-t1))
Input a positive integer: 1000000
The number of primes within 1000000 is 78498:
Elapsed time: 0.1875
OHHHHHHHHHHHHHHHH 外层的循环范围优化了,现在考虑内层循环for j in range(i*2, n, i):
。 当n=20,i=5时会筛掉10,15,20,25...,但是这10,15,20已经被素数2,3筛掉。小于当前素数的因数,要么是更小的素数,要么是更小的素数的倍数,当前素数与这些因数相乘,肯定被筛过了,所以只需要从
开始检查。
def countPrimes(n):
if n < 2:
return 0, primes
prime_sieve = [1]*n
prime_sieve[0:2] = [0,0]
for i in range(2, int(n**0.5)+1):
if prime_sieve[i]:
for j in range(i**2, n, i):
prime_sieve[j] = 0
return prime_sieve.count(1), [index for index,value in enumerate(prime_sieve) if value == 1]
from time import process_time
t1 = process_time()
n = int(input("Input a positive integer: "))
count_primes, primes = countPrimes(n)
t2 = process_time()
print("The number of primes within {} is {}:".format(n,count_primes))
print("Elapsed time: {}".format(t2-t1))
Input a positive integer: 1000000
The number of primes within 1000000 is 78498:
Elapsed time: 0.15625
时间复杂度为:
.
评价
以此来看,本文截至此处目前最好的素数算法为Sieve of Eratosthenes
。代码:
def countPrimes(n):
if n < 2:
return 0, primes
prime_sieve = [1]*n
prime_sieve[0:2] = [0,0]
for i in range(2, int(n**0.5)+1):
if prime_sieve[i]:
for j in range(i**2, n, i):
prime_sieve[j] = 0
return prime_sieve.count(1), [index for index,value in enumerate(prime_sieve) if value == 1]
Euler's Sieve
现在介绍一个有线性时间复杂度$mathcal{O(n)}$的算法:欧拉筛法。
算法:
整个过程为:
def countPrimes(n):
if n < 2:
return 0, primes
cur_list = list(range(2, n))
primes = []
while True:
cur_prime = cur_list[0]
backup_list = cur_list.copy()
for j in backup_list:
if j*cur_prime not in cur_list:
break
cur_list.remove(j*cur_prime)
primes.append(cur_prime)
cur_list.pop(0)
if cur_prime**2 > n:
primes.extend(cur_list)
break
return len(primes), primes
from time import process_time
t1 = process_time()
n = int(input("Input a positive integer: "))
count_primes, primes = countPrimes(n)
t2 = process_time()
print("The number of primes within {} is {}:".format(n,count_primes))
print("Elapsed time: {}".format(t2-t1))
Input a positive integer: 1000 The number of primes within 1000 is 168: Elapsed time: 0.013890345999996612
Input a positive integer: 10000 The number of primes within 10000 is 1229: Elapsed time: 0.36447122299999535
Input a positive integer: 100000 The number of primes within 100000 is 9592: Elapsed time: 49.40181045400001
根据对区区10000
的输入,我写的算法就表现得不太行,实际上输入1000000
就接近崩溃了,算法本身没问题,我实现的代码不够还原,肯定有很多冗余。这两天折腾够了,以后再看看。
Changelog
- 2020/04/03: 感谢评论区 @天空和云 的提醒,修改了两处代码。
- 2020/02/15: 更新了欧拉筛法,思路正确但是代码有缺陷。