文档的介绍:
*潜行模式:应用各种技术使无头木偶师的检测更加困难。💯
*###目的
*有几种方法可以很容易地被目标网站检测到木偶师的使用。
*在用户代理中添加“HeadlessChrome”只是最明显的一个。
*这个插件的目标是成为木偶师的明确伴侣,以避免
*检测,在它们浮出水面时应用新技术。
*由于这款猫捉老鼠游戏还处于起步阶段,而且插件节奏很快
*保持尽可能灵活,以支持快速测试和迭代。
*###模块化
*此插件使用“puppeteer extra”的依赖系统仅需要
*为已经启用的规避编写mods代码,以保持模块化和高效。
*“隐身”插件是一个方便的包装器,需要多种[规避技术](./evasions/)
*自动并带有默认值。您也可以绕过主模块,并要求
*特定的规避插件,如果你想这样做(因为它们是独立的“木偶师额外”插件):
*//绕过主模块,直接需要一个特定的隐形插件:
*puppeteer.use(require('puppeteer-extra-plugin-sicanic/evasions/console.debug')())
*###贡献
*欢迎PR,如果你想添加一种新的逃避技巧,我建议你
*看看[template](./evasions/_template)来启动事情。
*###荣誉
*感谢[Evan Sangaline](https://intoli.com/blog/not-possible-to-block-chrome-headless/)和[保罗爱尔兰人](https://github.com/paulirish/headless-cat-n-mouse)开始讨论!
下面是使用方法:
1.下载puppeteer-extra
npm install puppeteer-extra --save
2.下载puppeteer-extra-plugin-stealth
npm install puppeteer-extra-plugin-stealth --save
3.下载puppeteer
npm install puppeteer --save
浏览器的包可能下载失败,加一个参数--ignore-scripts 忽略包的下载,后面在引用本地的chrome目录即可
像这样:
executablePath:
"C:\\Users\\nanfang\\AppData\\Local\\Google\\Chrome\\Application\\chrome.exe",
完整的代码:(这里把浏览器的启动和关闭封装了一下,会return一个page直接用这个page gotoURL 即可)
let puppeteer = require("puppeteer-extra"); let { executablePath } = require("puppeteer"); const pluginStealth = require("puppeteer-extra-plugin-stealth"); puppeteer.use(pluginStealth());
let browser = {};
const Bowser = {
launch: async () => {
const pathToExtension =
"/data/Koa_blog/node_modules/puppeteer/.local-chromium/linux-722234/chrome-linux/chrome";
const config = {
headless: false,
args: ["--disable-images"],
defaultViewport: {
width: 1440,
height: 1500,
deviceScaleFactor: 1,
isMobile: false,
hasTouch: false,
},
ignoreHTTPSErrors: true,
slowMo: 100,
userDataDir: "/path/to/user/data/directory",
executablePath:
"C:\Users\nanfang\AppData\Local\Google\Chrome\Application\chrome.exe",
};
browser = await puppeteer.launch(config);
const page = await browser.newPage();
await page.setUserAgent(
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3617.0 Safari/537.36" // executablePath
);
return page;
},
close: async () => {
try {
await browser.close();
return true;
} catch (err) {
console.log("close browser err::", err);
return true;
}
},
};
module.exports = Bowser;
引用:
const Bowser = require("Bowser");
const gotoUrl = "http://biaoblog.cn";
(async () => {
const page = await Bowser.launch();
await page.goto(gotoUrl);
})();
版本信息:
Node Version:12.18.2
package.json:
{
"name": "crawler",
"version": "1.0.0",
"description": "",
"main": "index.js",
"scripts": {
"start": "node index.js"
},
"author": "",
"license": "ISC",
"dependencies": {
"axios": "^1.1.3",
"jsdom": "16.7.0",
"md5": "^2.3.0",
"node-fetch": "2.6.12",
"node-schedule": "^2.1.0",
"puppeteer": "^8.0.0",
"puppeteer-extra": "^3.3.6",
"puppeteer-extra-plugin-stealth": "^2.11.2"
}
}