Downloading textbook PDFs

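Some textbooks are available on Peking University's electronic teaching-reference platform (the Apabi digital resource platform), but they can only be read online: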

[Screenshot: the online reader]

Opening the browser's developer tools (F12) reveals the URLs of the page images:

[Screenshot: a page-image link in the developer tools]

The links have the following format:

```
http://162.105.138.126/OnLineReader/command/imagepage.ashx?objID=...&metaId=...OrgId=apabi_usp&Ip=undefined&scale=0.5666289254528362&width=381.5&height=483&pageid=2&ServiceType=Imagepage&...... (rest omitted)
```
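To see which parameters the reader sends, the query string can be split with Python's standard `urllib.parse`. The sketch below uses a made-up URL in the format above; `OBJ` and `META` are placeholders for the real `objID`/`metaId` values, which are elided here.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical example URL: OBJ and META stand in for the real (elided) IDs
url = (
    "http://162.105.138.126/OnLineReader/command/imagepage.ashx"
    "?objID=OBJ&metaId=META&OrgId=apabi_usp&Ip=undefined"
    "&scale=0.5666289254528362&width=381.5&height=483"
    "&pageid=2&ServiceType=Imagepage"
)

# parse_qs maps each parameter name to the list of its values
params = parse_qs(urlsplit(url).query)
for name in ("width", "height", "pageid"):
    print(name, params[name][0])
```

For this URL it prints `width 381.5`, `height 483` and `pageid 2`, which are the three parameters the download script varies.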

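Experimenting shows that width and height set the size of the returned image and pageid selects the page. So it is enough to construct a request for every page, download the images, and merge them to get the textbook as a PDF.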
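The script below rewrites the URL by slicing the string around `&width=` and `&ServiceType=`. An alternative sketch replaces the three parameters by name with `urllib.parse`, so it does not depend on where they appear in the query. This assumes the server accepts the query string after it has been re-encoded by `urlencode`. `build_page_url` is a hypothetical helper, and 1500×2100 is simply the size the script requests.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def build_page_url(template_url, page, width=1500, height=2100):
    """Hypothetical helper: return template_url with width/height/pageid set.

    Other parameters keep their original order; any of the three that are
    missing are appended at the end.
    """
    parts = urlsplit(template_url)
    overrides = {"width": str(width), "height": str(height), "pageid": str(page)}
    # keep_blank_values=True so empty parameters are not silently dropped
    query = [(k, overrides.get(k, v))
             for k, v in parse_qsl(parts.query, keep_blank_values=True)]
    seen = {k for k, _ in query}
    query += [(k, v) for k, v in overrides.items() if k not in seen]
    return urlunsplit(parts._replace(query=urlencode(query)))


# Placeholder IDs (OBJ, META); a real URL is copied from the developer tools
template = (
    "http://162.105.138.126/OnLineReader/command/imagepage.ashx"
    "?objID=OBJ&metaId=META&width=381.5&height=483&pageid=2&ServiceType=Imagepage"
)
for page in range(1, 4):
    print(build_page_url(template, page))
```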

The Python 3 code is below (it needs the third-party packages requests and img2pdf):

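```python
import multiprocessing
import os
import shutil

import img2pdf   # third-party: pip install img2pdf
import requests  # third-party: pip install requests


def get_page(args):
    """Download one page image and save it as temp/<page number>.png."""
    page_url, now_page_num = args
    pic = requests.get(page_url, timeout=60)
    pic.raise_for_status()  # stop instead of saving an error page as an image
    with open(os.path.join("temp", str(now_page_num) + ".png"), "wb") as f:
        f.write(pic.content)
    print(str(now_page_num) + " done")


def parallel_download(page_num, a_url):
    """Download pages 1..page_num with 8 worker processes."""
    x1 = a_url.find("&width=")
    x2 = a_url.find("&ServiceType=")
    if x1 == -1 or x2 == -1:
        raise ValueError("URL must contain &width= and &ServiceType=")
    # Keep everything before &width= and from &ServiceType= on, and replace the
    # part in between (width, height, pageid) with a larger size and page i
    tasks = [
        (a_url[:x1] + "&width=1500&height=2100&pageid=" + str(i) + a_url[x2:], i)
        for i in range(1, page_num + 1)
    ]
    with multiprocessing.Pool(processes=8) as pool:
        pool.map(get_page, tasks)  # blocks until every page is downloaded


if __name__ == "__main__":
    print("bookname?")
    bookname = input()

    print("page_num?")
    page_num = int(input())

    print("url?")
    t_url = input().strip()

    os.makedirs("temp", exist_ok=True)  # don't crash if temp/ already exists
    print("downloading pictures")
    parallel_download(page_num, t_url)

    print("converting to PDF")
    # Merge the page images into one PDF, in page order
    pages = [os.path.join("temp", str(i) + ".png") for i in range(1, page_num + 1)]
    with open(bookname + ".pdf", "wb") as f:
        f.write(img2pdf.convert(pages))
    print("done")
    shutil.rmtree("temp")
```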

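When prompted, enter for url the URL of any page copied earlier (the script replaces its width, height and pageid itself), for page_num the number of pages, and for bookname the title of the book, which becomes the name of the output PDF.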
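If a request goes wrong, the server may send back something other than an image (for example an HTML error page), and img2pdf then typically fails only at the very end, after every page has been fetched. A minimal sketch, assuming the pages come back as PNG or JPEG, checks each file's leading "magic" bytes and lists the pages that need to be downloaded again. `find_bad_pages` is a hypothetical helper and is not part of the script above.

```python
import os

# Leading bytes ("magic numbers") of the two formats assumed to be returned
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"
JPEG_MAGIC = b"\xff\xd8\xff"


def find_bad_pages(folder, page_num):
    """Return page numbers whose <folder>/<n>.png is missing or not PNG/JPEG."""
    bad = []
    for i in range(1, page_num + 1):
        path = os.path.join(folder, str(i) + ".png")
        try:
            with open(path, "rb") as f:
                head = f.read(8)
        except FileNotFoundError:
            bad.append(i)
            continue
        # The file extension says nothing about the content, so check the bytes
        if not (head.startswith(PNG_MAGIC) or head.startswith(JPEG_MAGIC)):
            bad.append(i)
    return bad
```

For example, calling `find_bad_pages("temp", page_num)` just before the conversion step returns an empty list when every page was downloaded correctly.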

TODO: download the books in the digital teaching reference materials.
