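MQTT is a messaging protocol developed by IBM and is also used as an IoT transport protocol. It is designed for lightweight publish/subscribe messaging and aims to give IoT devices reliable network service over low-bandwidth, unstable networks. It is specifically optimized for low-bandwidth networks and devices with little computing power, which lets it fit a wide range of IoT scenarios. Many third-party push platforms are also built on MQTT.
Source: https://cloud.tencent.com/developer/news/223471
Setting up the service on CentOS:
Applications:
Other references:
Footprints, left for my future self
MQTT enables real-time communication among multiple clients, for example a platform pushing messages to mobile app clients.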
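To make the publish/subscribe model concrete, here is a minimal in-memory sketch. It is not an MQTT implementation: there is no network, QoS, or session handling, and `ToyBroker` and `topic_matches` are hypothetical names used only for illustration. It only shows how a broker routes a published message to every subscriber whose topic filter matches, using MQTT's `+` (single-level) and `#` (multi-level) wildcards.

```python
from collections import defaultdict


def topic_matches(topic_filter, topic):
    """Return True if an MQTT-style topic filter matches a concrete topic.

    '+' matches exactly one level; '#' matches all remaining levels.
    Simplified: filter validity and '$'-prefixed topics are not handled.
    """
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            return True  # multi-level wildcard: everything from here matches
        if i >= len(topic_levels):
            return False  # filter has more levels than the topic
        if level != "+" and level != topic_levels[i]:
            return False
    return len(filter_levels) == len(topic_levels)


class ToyBroker:
    """In-memory stand-in for a broker (hypothetical, for illustration)."""

    def __init__(self):
        self.subscriptions = defaultdict(list)  # topic filter -> callbacks

    def subscribe(self, topic_filter, callback):
        self.subscriptions[topic_filter].append(callback)

    def publish(self, topic, payload):
        """Deliver payload to every matching subscriber; return the count."""
        delivered = 0
        for topic_filter, callbacks in self.subscriptions.items():
            if topic_matches(topic_filter, topic):
                for callback in callbacks:
                    callback(topic, payload)
                    delivered += 1
        return delivered


# A "mobile client" subscribes to alerts for any user; the platform publishes.
broker = ToyBroker()
phone_inbox = []
broker.subscribe("push/+/alerts", lambda t, p: phone_inbox.append((t, p)))
broker.publish("push/user42/alerts", "order shipped")
broker.publish("push/user42/chat", "not delivered to this subscriber")
```

With a real broker the clients would connect over the network, but the routing idea is the same: publishers and subscribers never address each other directly, only topics.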
To drop duplicate items in Scrapy, use the following pipeline:
from scrapy.exceptions import DropItem


class DuplicatesPipeline(object):

    def __init__(self):
        self.ids_seen = set()

    def process_item(self, item, spider):
        if item['id'] in self.ids_seen:
            raise DropItem("Duplicate item found: %s" % item)
        else:
            self.ids_seen.add(item['id'])
            return item
Source: https://docs.scrapy.org/en/latest/topics/item-pipeline.html#duplicates-filter
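A pipeline class only runs once it is registered in the project's `ITEM_PIPELINES` setting. The fragment below is a sketch of that registration; the module path `myproject.pipelines` is a hypothetical placeholder for wherever the class actually lives.

```python
# settings.py of the Scrapy project
ITEM_PIPELINES = {
    # "myproject" is a hypothetical package name; adjust to the real project.
    # The integer sets the order: pipelines with lower values run first,
    # and values are conventionally chosen in the 0-1000 range.
    "myproject.pipelines.DuplicatesPipeline": 300,
}
```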
import smtplib


def send_email(mail_user, mail_password, mail_recipient, subject, body):
    FROM = mail_user
    TO = mail_recipient if isinstance(mail_recipient, list) else [mail_recipient]
    SUBJECT = subject
    TEXT = body

    # Prepare the actual message: headers, a blank line, then the body
    message = "From: %s\nTo: %s\nSubject: %s\n\n%s" % (FROM, ", ".join(TO), SUBJECT, TEXT)
    try:
        # SMTP_SSL is encrypted from the start, so starttls() is not needed
        server = smtplib.SMTP_SSL("mail.openedoo.org", 465)
        server.ehlo()
        server.login(mail_user, mail_password)
        server.sendmail(FROM, TO, message)
        server.close()
        return 'successfully sent the mail'
    except Exception as e:
        # Include the reason so failures can be diagnosed
        return "failed to send mail: %s" % e
Source: https://www.programcreek.com/python/example/6443/smtplib.SMTP_SSL
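When a plain string is passed to `sendmail`, smtplib encodes it as ASCII, so a non-ASCII subject or body fails. One possible alternative, sketched here as an assumption rather than as part of the quoted example, is to build the message with the standard library's `email.message.EmailMessage` and send it with `server.send_message(msg)`. The helper name `build_message` is hypothetical.

```python
from email.message import EmailMessage


def build_message(sender, recipients, subject, body):
    """Build a message object; recipients may be a single str or a list."""
    if isinstance(recipients, str):
        recipients = [recipients]
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = subject  # header encoding is handled by the library
    msg.set_content(body)     # sets the text body and its charset
    return msg


msg = build_message(
    "me@example.com",
    ["a@example.com", "b@example.com"],
    "Weekly report",
    "The report is attached.",
)
```

The resulting object can then be handed to an already logged-in `smtplib.SMTP_SSL` connection via `send_message`, which takes the sender and recipients from the headers.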
webDriver.Close(): closes the browser window that the driver currently has focus on
webDriver.Quit(): calls Dispose()
webDriver.Dispose(): closes all browser windows and safely ends the session
Source: https://stackoverflow.com/questions/15067107/difference-between-webdriver-dispose-close-and-quit
First check the type of text.
If type(text) is bytes, then:
text.decode('unicode_escape')
If type(text) is str, then:
text.encode('latin-1').decode('unicode_escape')
Author: mailto1587
Link: https://www.zhihu.com/question/26921730/answer/49625649
Source: Zhihu
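The two cases can be wrapped in one small helper. This is a minimal sketch; the function name `decode_unicode_escapes` is made up for illustration. Note that the `str` branch raises `UnicodeEncodeError` if the text already contains characters outside Latin-1, so it is only suitable for text that holds literal escape sequences such as `\u4e2d`.

```python
def decode_unicode_escapes(text):
    """Turn literal escapes like '\\u4e2d' into the characters they denote."""
    if isinstance(text, bytes):
        return text.decode("unicode_escape")
    # str: go through bytes first so the escape decoder can be applied
    return text.encode("latin-1").decode("unicode_escape")


# The input holds a literal backslash followed by 'u4e2d', not the character.
raw = "\\u4e2d\\u6587"
decoded = decode_unicode_escapes(raw)
```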
If installing pycurl fails with an SSL-related error like the following:
pip3 install pycurl
Collecting pycurl
  Using cached https://files.pythonhosted.org/packages/e8/e4/0dbb8735407189f00b33d84122b9be52c790c7c3b25286826f4e1bdb7bde/pycurl-7.43.0.2.tar.gz
    Complete output from command python setup.py egg_info:
    Using curl-config (libcurl 7.54.0)
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/private/var/folders/rt/r5cq9kls5qngd41nrn9bx6w40000gn/T/pip-install-01_xgi2x/pycurl/setup.py", line 913, in <module>
        ext = get_extension(sys.argv, split_extension_source=split_extension_source)
      File "/private/var/folders/rt/r5cq9kls5qngd41nrn9bx6w40000gn/T/pip-install-01_xgi2x/pycurl/setup.py", line 582, in get_extension
        ext_config = ExtensionConfiguration(argv)
      File "/private/var/folders/rt/r5cq9kls5qngd41nrn9bx6w40000gn/T/pip-install-01_xgi2x/pycurl/setup.py", line 99, in __init__
        self.configure()
      File "/private/var/folders/rt/r5cq9kls5qngd41nrn9bx6w40000gn/T/pip-install-01_xgi2x/pycurl/setup.py", line 316, in configure_unix
        specify the SSL backend manually.''')
    __main__.ConfigurationError: Curl is configured to use SSL, but we have not been able to determine which SSL backend it is using. Please see PycURL documentation for how to specify the SSL backend manually.
    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /private/var/folders/rt/r5cq9kls5qngd41nrn9bx6w40000gn/T/pip-install-01_xgi2x/pycurl/
For the fix, see:
On recent versions of macOS:
pip3 uninstall pycurl  # uninstall the package
export PYCURL_SSL_LIBRARY=openssl
export LDFLAGS=-L/usr/local/opt/openssl/lib
export CPPFLAGS=-I/usr/local/opt/openssl/include  # path to the OpenSSL headers
pip3 install pycurl --compile --no-cache-dir  # recompile and reinstall
Reference: https://hk.saowen.com/a/431b34187c78c994d278c1a1a10d480bab83b28085468aca67ce9171ea8307fc
On Windows, consider installing Anaconda and installing pycurl through it:
conda install -c anaconda pycurl
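After reinstalling, it can help to confirm that the module imports and to see which SSL backend it was built against. The sketch below assumes nothing beyond pycurl's `version` string; the helper name `check_pycurl` is hypothetical. A mismatch between the compile-time and link-time SSL backends typically surfaces as an `ImportError`, which is why that exception is caught and reported.

```python
def check_pycurl():
    """Return (ok, detail): the version string on success, the error if not."""
    try:
        import pycurl
    except ImportError as exc:
        # Covers both "not installed" and SSL backend mismatch at import time
        return False, str(exc)
    # pycurl.version lists PycURL, libcurl, and the SSL library in use
    return True, pycurl.version


ok, detail = check_pycurl()
print("pycurl OK:" if ok else "pycurl import failed:", detail)
```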
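Some large companies, such as BAT (Baidu, Alibaba, Tencent), can be very good at detecting crawlers. To mimic a human operator, the script has to run more slowly.
The following Python code logs in to an Alipay account. To look more like a real person, it types one character at a time and pauses for a random interval between actions.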
import random
import time

from selenium import webdriver
from selenium.webdriver import ActionChains
from selenium.webdriver.common.by import By

url = "https://auth.alipay.com/login/index.htm"
name = "your Alipay account"
password = "your Alipay login password"


# Type the text slowly, one character at a time
def send_keys_delay_random(controller, keys, min_delay=0.5, max_delay=1.5):
    for key in keys:
        controller.send_keys(key)
        time.sleep(random.uniform(min_delay, max_delay))


# Pause for a random amount of time
def delay_time(min_delay=0.5, max_delay=1):
    time.sleep(random.uniform(min_delay, max_delay))


print("Starting the browser and opening the page")
driver = webdriver.Chrome()
driver.get(url)
delay_time()

print("Selecting account/password login")
chose_btn = driver.find_element(By.XPATH, '//*[@id="J-loginMethod-tabs"]/li[2]')
ActionChains(driver).move_to_element(chose_btn).click().perform()
delay_time()

print("Entering the account name")
username_input = driver.find_element(By.ID, "J-input-user")
username_input.clear()
send_keys_delay_random(username_input, name)
delay_time(1, 2)

print("Entering the password")
password_input = driver.find_element(By.ID, "password_rsainput")
password_input.clear()
send_keys_delay_random(password_input, password)
delay_time()

print("Clicking the login button")
btn_login = driver.find_element(By.ID, "J-login-btn")
ActionChains(driver).move_to_element(btn_login).click().perform()

driver.quit()
print("Exited normally")
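https://github.com/Python3WebSpider/CookiesPool
Features: serves a random cookie through an HTTP API; extensible to multiple websites
https://github.com/Germey/ProxyPool
Features: serves random proxy IP information through an HTTP API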
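As a rough sketch of what "through an HTTP API" means for a crawler, the helper below asks a locally running pool for one proxy and converts it into the dictionary shape that the `requests` library accepts for its `proxies` argument. The URL, the `/random` path, and the plain-text `host:port` response are assumptions about the deployment, not something stated above; check the pool's own README. The function names are hypothetical.

```python
from urllib.request import urlopen

# Assumption: the pool's API runs locally and returns "host:port" as text.
PROXY_POOL_URL = "http://localhost:5555/random"


def fetch_text(url, timeout=5):
    """Fetch a URL and return the response body as text."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def get_random_proxy(fetch=fetch_text, url=PROXY_POOL_URL):
    """Return a proxies dict for one random proxy, or None if unavailable.

    `fetch` is injectable so the function can be exercised without a pool.
    """
    try:
        address = fetch(url).strip()
    except OSError:
        return None  # pool not reachable (URLError is a subclass of OSError)
    if not address:
        return None
    return {"http": "http://" + address, "https": "http://" + address}


# Example with a stubbed fetcher instead of a live pool:
proxies = get_random_proxy(fetch=lambda _url: "10.0.0.1:8080\n")
```

Fetching a fresh proxy per request (or per batch of requests) spreads traffic across addresses; a cookie pool can be consumed the same way.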
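In most cases a web service returns its data as JSON.
If other types such as HTML or XML are needed, see the following link (mimerender):
When developing Python locally, use virtualenv to isolate the environment and install the libraries the project needs.
When the project is moved to another machine or deployed to a server, it needs the same set of libraries. Export the installed packages on the development machine with the first command below, then install them on the target machine with the second.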
pip freeze > requirements.txt
pip install -r requirements.txt
Reference: "How to package and publish Python code together with its dependencies, and what build tools does Python have?", answer by hunt zhan on Zhihu: https://www.zhihu.com/question/21639330/answer/18847627
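After installing on the target machine, one way to double-check that the environment really matches is to compare the pinned versions in requirements.txt with what is installed. The sketch below uses the standard library's `importlib.metadata` (Python 3.8+). It only understands simple `name==version` lines, which is a simplification; the function names are hypothetical.

```python
from importlib import metadata


def parse_pinned(lines):
    """Yield (name, version) for each 'name==version' line; skip the rest."""
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" in line:
            name, version = line.split("==", 1)
            yield name.strip(), version.strip()


def find_mismatches(lines):
    """Return {name: (wanted, installed_or_None)} for packages that differ."""
    mismatches = {}
    for name, wanted in parse_pinned(lines):
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # not installed in this environment
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


# Typical use: pass the lines of requirements.txt
# with open("requirements.txt") as f:
#     print(find_mismatches(f))
```

An empty result means every pinned package is present at the expected version.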