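I was recently testing a script whose job is to scrape information about a number of cities from Qunar (去哪儿网) with a Python web crawler.
The code is as follows: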
    from bs4 import BeautifulSoup
    import pandas as pd
    import requests

    def get_static_url_content(url):
        headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) Gecko/20100101 Firefox/23.0'}
        req=requests.get(url,headers=headers)
        content=req.text
        bsObj=BeautifulSoup(content,'lxml')
        return bsObj

    def get_city_id():
        url = 'http://travel.qunar.com/place/'
        bsObj=get_static_url_content(url)
        cat_url = []
        cat_name = []
        bs=bsObj.find_all('div',attrs={'class':'sub_list'})
        for i in range(0,len(bs)):
            xxx = bs[i].find_all('a')
            for j in range(0,len(xxx)):
                cat_name.append(xxx[j].text)
                cat_url.append(xxx[j].attrs['href'])

        return cat_name,cat_url

    city_name_list,city_url_list=get_city_id()
    city=pd.DataFrame({'city_name':city_name_list,'city_code':city_url_list})
    city.to_csv('./输出文件/city.csv',encoding='utf_8_sig')
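The last line of the script writes to ./输出文件/city.csv (the folder name means "output files"), which assumes that folder already exists; if it does not, to_csv raises an error. A minimal sketch of one way to guard against that follows. The helper name ensure_parent_dir is hypothetical and not part of the original script.

```python
from pathlib import Path


def ensure_parent_dir(file_path):
    """Create the parent folder of file_path if needed and return it as a Path.

    Hypothetical helper for illustration; the original script does not use it.
    """
    path = Path(file_path)
    # parents=True creates intermediate folders; exist_ok=True makes reruns harmless.
    path.parent.mkdir(parents=True, exist_ok=True)
    return path


if __name__ == "__main__":
    target = ensure_parent_dir("./输出文件/city.csv")
    print("CSV would be written to:", target)
```

With this in place, the final line of the script could become city.to_csv(ensure_parent_dir('./输出文件/city.csv'), encoding='utf_8_sig'), since to_csv also accepts path objects.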
Running it in PyCharm with a Python 3 environment produced an error:

    "C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\python.exe" E:/test8/去哪网城市.py
    Traceback (most recent call last):
      File "E:/test8/去哪网城市.py", line 1, in <module>
        from bs4 import BeautifulSoup
    ModuleNotFoundError: No module named 'bs4'

    Process finished with exit code 1
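So the third-party bs4 library is missing from the current Python environment. Fine, install it. I ran pip from cmd as shown below, but that failed too, this time with an access-denied error ([WinError 5] 拒绝访问, i.e. "Access is denied"):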
    Microsoft Windows [版本 10.0.17134.1006]
    (c) 2018 Microsoft Corporation。保留所有权利。

    C:\Users\admin>cd C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\Scripts

    C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\Scripts>pip install bs4
    Collecting bs4
    Collecting beautifulsoup4 (from bs4)
      Using cached https://files.pythonhosted.org/packages/1a/b7/34eec2fe5a49718944e215fde81288eec1fa04638aa3fb57c1c6cd0f98c3/beautifulsoup4-4.8.0-py3-none-any.whl
    Collecting soupsieve>=1.2 (from beautifulsoup4->bs4)
      Using cached https://files.pythonhosted.org/packages/0b/44/0474f2207fdd601bb25787671c81076333d2c80e6f97e92790f8887cf682/soupsieve-1.9.3-py2.py3-none-any.whl
    Installing collected packages: soupsieve, beautifulsoup4, bs4
    Could not install packages due to an EnvironmentError: [WinError 5] 拒绝访问。: 'C:\\Program Files\\ArcGIS\\Pro\\bin\\Python\\envs\\arcgispro-py3\\Lib\\site-packages\\soupsieve'
    Consider using the `--user` option or check the permissions.
Note: C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\Scripts is the directory where pip.exe lives on my machine.
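Changing into the Scripts directory is one way to make sure the right pip.exe is used. Another common approach is to run pip as a module of the interpreter itself, so the packages land in the same environment the script runs in. The sketch below only builds and prints such a command; the function names pip_install_command and pip_install are hypothetical, and the access-denied error can still occur unless --user is added or the shell has sufficient permissions.

```python
import subprocess
import sys


def pip_install_command(package, user=False):
    """Build a pip command bound to the interpreter running this script."""
    cmd = [sys.executable, "-m", "pip", "install", package]
    if user:
        # Install into the per-user site-packages instead of the environment folder.
        cmd.append("--user")
    return cmd


def pip_install(package, user=False):
    """Run the install and return pip's exit code (0 means success)."""
    return subprocess.call(pip_install_command(package, user=user))


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    print("Would run:", " ".join(pip_install_command("bs4", user=True)))
```

Running this file with the same interpreter that PyCharm uses shows exactly which python.exe would receive the package.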
A search turned up the --user option for pip, so I gave it a try:

    pip install bs4 --user

This time the installation did succeed:
    C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\Scripts>pip install bs4 --user
    Collecting bs4
    Collecting beautifulsoup4 (from bs4)
      Using cached https://files.pythonhosted.org/packages/1a/b7/34eec2fe5a49718944e215fde81288eec1fa04638aa3fb57c1c6cd0f98c3/beautifulsoup4-4.8.0-py3-none-any.whl
    Collecting soupsieve>=1.2 (from beautifulsoup4->bs4)
      Using cached https://files.pythonhosted.org/packages/0b/44/0474f2207fdd601bb25787671c81076333d2c80e6f97e92790f8887cf682/soupsieve-1.9.3-py2.py3-none-any.whl
    Installing collected packages: soupsieve, beautifulsoup4, bs4
    Successfully installed beautifulsoup4-4.8.0 bs4-0.0.1 soupsieve-1.9.3
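The installed packages can now be seen on my machine under "C:\Users\admin\AppData\Roaming\Python\Python36\site-packages", which is the per-user site-packages directory that --user installs into.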
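To confirm where --user installs end up for a given interpreter, the standard-library site module can be queried. The following is a small sketch; the function name user_site_info is hypothetical, and the path it reports depends on the machine and Python version.

```python
import site
import sys


def user_site_info():
    """Report the per-user site-packages directory for the running interpreter."""
    user_site = site.getusersitepackages()
    return {
        "user_site": user_site,
        # ENABLE_USER_SITE is False or None when user installs are disabled,
        # for example inside some virtual environments.
        "enabled": bool(site.ENABLE_USER_SITE),
        # The directory is only put on sys.path if it exists at startup.
        "on_sys_path": user_site in sys.path,
    }


if __name__ == "__main__":
    for key, value in user_site_info().items():
        print(key, "=", value)
```

If "enabled" comes back False, packages installed with --user would not be importable from that interpreter even though pip reports success.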
bs4 also shows up in the package list of the Python 3 interpreter in PyCharm.
With bs4 installed, I ran the script in PyCharm again. It still failed, but with a different error: "bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml."

    "C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\python.exe" E:/test8/去哪网城市.py
    Traceback (most recent call last):
      File "E:/test8/去哪网城市.py", line 26, in <module>
        city_name_list,city_url_list=get_city_id()
      File "E:/test8/去哪网城市.py", line 14, in get_city_id
        bsObj=get_static_url_content(url)
      File "E:/test8/去哪网城市.py", line 9, in get_static_url_content
        bsObj=BeautifulSoup(content,'lxml')
      File "C:\Users\admin\AppData\Roaming\Python\Python36\site-packages\bs4\__init__.py", line 208, in __init__
        % ",".join(features))
    bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?

    Process finished with exit code 1
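Some more searching on Baidu showed that the lxml package needs to be installed as well, because the script asks BeautifulSoup for the 'lxml' parser. As before, run pip from cmd with the --user option: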
    C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\Scripts>pip install lxml --user
    Collecting lxml
      Using cached https://files.pythonhosted.org/packages/6f/6d/d54317403070fcaae973f38b9c298e4b4c101b469ae51afa7c1370e5c35b/lxml-4.4.1-cp36-cp36m-win_amd64.whl
    Installing collected packages: lxml
    Successfully installed lxml-4.4.1
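Installing lxml is the fix used here. As an alternative sketch, a script could fall back to 'html.parser', the name BeautifulSoup uses for Python's built-in parser, when lxml is not importable. The helper name choose_parser is hypothetical, and the two parsers may handle malformed HTML differently, so results are not guaranteed to be identical.

```python
import importlib.util


def choose_parser(preferred="lxml", fallback="html.parser"):
    """Return the preferred parser name if its package is importable, else the fallback.

    Hypothetical helper: it only checks that the package can be found,
    it does not import bs4 or parse anything itself.
    """
    if importlib.util.find_spec(preferred) is not None:
        return preferred
    return fallback


if __name__ == "__main__":
    print("Parser that would be used:", choose_parser())
```

In the script above, the call would then read BeautifulSoup(content, choose_parser()) instead of hard-coding 'lxml'.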
With lxml installed, the script finally ran in PyCharm without errors, and the scraped results were written to the specified output folder.
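Both failures in this walkthrough were missing packages that surfaced one run at a time. A small pre-flight check, sketched below with the standard library only, could list every missing module up front. The function name missing_modules is hypothetical; the module list is taken from the imports and the parser used by the script.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found by the running interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Modules the scraping script depends on, directly or via BeautifulSoup's 'lxml' parser.
    required = ["bs4", "lxml", "pandas", "requests"]
    missing = missing_modules(required)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are available.")
```

Run with the same interpreter PyCharm is configured to use; otherwise the check describes a different environment.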
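Author: gislaozhang
Link: https://blog.csdn.net/gislaozhang/article/details/101274643
Source: CSDN
Copyright belongs to the author. For commercial reprints, contact the author for authorization; for non-commercial reprints, please credit the source.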