Web Scraping: Study Notes

Web scraping | 2023-01-31 00:01:06 | 307 views | 薄情痞子

## Pitfalls I ran into while learning web scraping, written up for fellow learners

If you see errors like the following:

　　　　ModuleNotFoundError: No module named 'js2xml'

　　　　NameError: name 'js2xml' is not defined

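then the library is probably missing. ModuleNotFoundError means the package is not installed in the current environment (for example, pip install js2xml), while NameError means the script never ran import js2xml.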
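A quick check can tell the two cases apart before a scraper runs. The following is a minimal sketch; `is_installed` is a hypothetical helper name, and it relies only on the standard library's `importlib.util.find_spec`.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if the current interpreter can locate the top-level module."""
    return importlib.util.find_spec(module_name) is not None


# Report on a standard-library module and on the package from the error above.
for name in ("json", "js2xml"):
    if is_installed(name):
        print(f"{name}: available, remember to `import {name}` in the script")
    else:
        print(f"{name}: missing, try `pip install {name}`")
```

If the module is reported as available but the NameError persists, the import statement is missing from the script (or from the notebook cell being run).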

When converting a str to JSON:

JSONDecodeError: Extra data: line 1 column 234701 (char 234700)

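then the str is probably not valid JSON.

1. Mark where the JSON begins and ends with start and end indexes and slice it out, e.g. str[start:end];

2. Trim the str with strip('symbol'), which removes any of the characters in symbol from both ends of the str.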
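Both trimming approaches can be sketched as follows. This assumes the payload is a single JSON object wrapped in extra text, such as a JSONP-style callback; the sample string and the helper name `extract_json_object` are made up for illustration.

```python
import json


def extract_json_object(text: str):
    """Slice from the first '{' to the last '}' and parse the result."""
    start = text.find("{")
    end = text.rfind("}") + 1  # +1 because the slice end is exclusive
    if start == -1 or end == 0:
        raise ValueError("no JSON object found in text")
    return json.loads(text[start:end])


raw = 'callback({"name": "js2xml", "ok": true});'

# Approach 1: slice with explicit start/end indexes.
print(extract_json_object(raw))

# Approach 2: strip unwanted characters from both ends.
# strip() removes any of the listed characters, not the literal substring.
print(json.loads('({"name": "js2xml"});'.strip("();")))
```

Slicing is the safer of the two when the wrapper text is long or variable, since `strip` only works when the junk consists of a known set of characters.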

Or the text may contain several JSON documents (for example, one per line), in which case:

　　One-liner for your problem:

import json
data = [json.loads(line) for line in open('tweets.json', 'r')]
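The one-liner assumes exactly one JSON document per line. When documents are simply concatenated, which is another common cause of "Extra data", the standard library's `json.JSONDecoder.raw_decode` can read them one after another. A minimal sketch, with `iter_json_documents` as a hypothetical helper name:

```python
import json


def iter_json_documents(text: str):
    """Yield each JSON document found back to back in text."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so skip it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        # raw_decode returns the parsed object and the index where it ended.
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


sample = '{"id": 1}{"id": 2}\n[3, 4]'
for doc in iter_json_documents(sample):
    print(doc)
```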

... to be continued.

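After some time had passed, I ran jupyter notebook again and got an error.

Error:

'jupyter' is not recognized as an internal or external command, operable program or batch file.

Cause and fix: add D:\Users\23525\Anaconda3\Scripts to the PATH environment variable; that folder contains jupyter-notebook.exe, pip.exe and other commands. Open a new terminal afterwards so the change takes effect.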
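To confirm whether the fix took effect, the PATH can be inspected from Python. This is a minimal sketch; `dir_on_path` is a hypothetical helper, and the directory used in the example is the one from the note above, so substitute your own.

```python
import os
import shutil


def dir_on_path(directory, path_value=None, sep=os.pathsep):
    """Return True if directory is one of the entries in a PATH-style string."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    wanted = os.path.normcase(os.path.normpath(directory))
    entries = [
        os.path.normcase(os.path.normpath(entry))
        for entry in path_value.split(sep)
        if entry
    ]
    return wanted in entries


# shutil.which returns the full path of the command, or None if it is not found.
print("jupyter resolves to:", shutil.which("jupyter"))
print("Scripts on PATH:", dir_on_path(r"D:\Users\23525\Anaconda3\Scripts"))
```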

Then the following error appeared:

Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\Scripts\jupyter-notebook-script.py", line 6, in <module>
    from notebook.notebookapp import main
  File "C:\ProgramData\Anaconda3\lib\site-packages\notebook\notebookapp.py", line 47, in <module>
    from zmq.eventloop import ioloop
  File "C:\ProgramData\Anaconda3\lib\site-packages\zmq\__init__.py", line 47, in <module>
    from zmq import backend
  File "C:\ProgramData\Anaconda3\lib\site-packages\zmq\backend\__init__.py", line 40, in <module>
    reraise(*exc_info)
  File "C:\ProgramData\Anaconda3\lib\site-packages\zmq\utils\sixcerpt.py", line 34, in reraise
    raise value
  File "C:\ProgramData\Anaconda3\lib\site-packages\zmq\backend\__init__.py", line 27, in <module>
    _ns = select_backend(first)
  File "C:\ProgramData\Anaconda3\lib\site-packages\zmq\backend\select.py", line 27, in select_backend
    mod = __import__(name, fromlist=public_api)
  File "C:\ProgramData\Anaconda3\lib\site-packages\zmq\backend\cython\__init__.py", line 6, in <module>
    from . import (constants, error, message, context,
ImportError: DLL load failed: The specified module could not be found.

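Cause: every frame after the first two is inside the zmq folder; the answers I found say zmq needs to be reinstalled (the pip package is named pyzmq).

Fix: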

pip uninstall pyzmq

pip install pyzmq
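When more than one Python is installed, a bare `pip` may belong to a different interpreter than the one Jupyter uses. One way to avoid that, sketched here as an assumption rather than as part of the original fix, is to run pip through the interpreter itself with `python -m pip`. The helper `pip_command` is hypothetical and only builds and prints the command lines; it does not run them.

```python
import sys


def pip_command(action, package, python=sys.executable):
    """Build a pip command line bound to a specific interpreter."""
    cmd = [python, "-m", "pip", action]
    if action == "uninstall":
        cmd.append("-y")  # skip pip's confirmation prompt
    cmd.append(package)
    return cmd


# Print the two commands for the reinstall described above.
for action in ("uninstall", "install"):
    print(" ".join(pip_command(action, "pyzmq")))
```

Each list can be passed to `subprocess.run(cmd, check=True)` to actually perform the step.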

During the install, the following error appeared:

pip is configured with locations that require TLS/SSL, however the ssl module in Python is not available.

Collecting pyzmq

Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError("Can't connect to HTTPS URL because the SSL module is not available.")': /simple/pyzmq/

Retrying (Retry(total=3, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError("Can't connect to HTTPS URL because the SSL module is not available.")': /simple/pyzmq/

Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError("Can't connect to HTTPS URL because the SSL module is not available.")': /simple/pyzmq/

Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError("Can't connect to HTTPS URL because the SSL module is not available.")': /simple/pyzmq/

Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError("Can't connect to HTTPS URL because the SSL module is not available.")': /simple/pyzmq/

Could not fetch URL https://pypi.org/simple/pyzmq/: There was a problem confirming the ssl certificate: HTTPSConnectionPool(host='pypi.org', port=443): Max retries exceeded with url: /simple/pyzmq/ (Caused by SSLError("Can't connect to HTTPS URL because the SSL module is not available.")) - skipping

Could not find a version that satisfies the requirement pyzmq (from versions: )

No matching distribution found for pyzmq

pip is configured with locations that require TLS/SSL, however the ssl module in Python is not available.

Could not fetch URL https://pypi.org/simple/pip/: There was a problem confirming the ssl certificate: HTTPSConnectionPool(host='pypi.org', port=443): Max retries exceeded with url: /simple/pip/ (Caused by SSLError("Can't connect to HTTPS URL because the SSL module is not available.")) - skipping

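Cause:

I got the same "SSL module is not available" error when running the native pip that ships with Anaconda (currently 18.1). In my case it was a system path problem, which I solved by adding the following directories to my Path variable: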

%Miniconda3_DIR%;%Miniconda3_DIR%\Library\mingw-w64\bin;%Miniconda3_DIR%\Library\usr\bin;%Miniconda3_DIR%\Library\bin;%Miniconda3_DIR%\Scripts;%Miniconda3_DIR%\bin;

where %Miniconda3_DIR% should be replaced with your Miniconda (or Anaconda) installation path.

Reference: https://stackoverflow.com/questions/53742171/pip-tls-ssl-however-the-ssl-module-in-python-is-not-available-problem
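The directory list above can be checked against the current PATH with a short script. This is a minimal sketch: the helper names and the example root `C:\Miniconda3` are assumptions for illustration, and `ntpath` is used so the Windows-style paths are handled the same way on any platform.

```python
import ntpath
import os

# Sub-directories from the list above, relative to the install root.
SUBDIRS = ["", r"Library\mingw-w64\bin", r"Library\usr\bin",
           r"Library\bin", "Scripts", "bin"]


def conda_path_entries(root):
    """Expand the install root into the six directories listed above."""
    return [ntpath.join(root, sub) if sub else root for sub in SUBDIRS]


def missing_entries(root, path_value):
    """Return the listed directories that are absent from a ';'-separated PATH."""
    present = {
        ntpath.normcase(ntpath.normpath(entry))
        for entry in path_value.split(";")
        if entry
    }
    return [
        entry for entry in conda_path_entries(root)
        if ntpath.normcase(ntpath.normpath(entry)) not in present
    ]


def ssl_available():
    """Return True if this interpreter can import the ssl module."""
    try:
        import ssl  # noqa: F401
    except ImportError:
        return False
    return True


print("ssl importable:", ssl_available())
# Replace the hypothetical root below with your own installation path.
for entry in missing_entries(r"C:\Miniconda3", os.environ.get("PATH", "")):
    print("not on PATH:", entry)
```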

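When a program stops working after a while, reinstalling is actually the simplest thing to do, but I want to truly solve the problem and gain a little more control over the world. Finding problems step by step, solving them, summarizing, and preventing them: isn't that the constant pattern of human progress? May the road of human inheritance and exploration stay lit.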

Article link: https://www.lsjlt.com/news/182650.html (please cite the source link when reposting)