Web scraping with Selenium in Node.js

1. Install the browser driver

Download ChromeDriver. Note: the downloaded version must match the version of your installed Chrome browser.

ChromeDriver official download page: https://sites.google.com/chromium.org/driver/downloads
ChromeDriver official download page (latest versions): https://googlechromelabs.github.io/chrome-for-testing/
ChromeDriver China mirror: https://registry.npmmirror.com/binary.html?path=chromedriver/
ChromeDriver China mirror (latest versions): https://registry.npmmirror.com/binary.html?path=chrome-for-testing/
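The version rule above can be checked with a few lines of code. This is a minimal sketch, assuming that agreement on the major version (the first number) is what matters; the helper names and the version strings in the usage note are illustrative, not part of any library.

```javascript
// Extract the major version (the first dot-separated number) from a version string.
function majorVersion(version) {
  const match = /^(\d+)\./.exec(String(version).trim());
  if (!match) {
    throw new Error(`Unrecognized version string: ${version}`);
  }
  return Number(match[1]);
}

// True when the driver and the browser share the same major version.
function driverMatchesBrowser(driverVersion, browserVersion) {
  return majorVersion(driverVersion) === majorVersion(browserVersion);
}
```

For example, driverMatchesBrowser('126.0.6478.126', '126.0.6478.62') is true, while a 125.x driver against a 126.x browser gives false. The browser version can be read from chrome://version and the driver version from running chromedriver.exe --version.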

Copy the downloaded exe file to the project root directory.
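If the driver file is missing, the failure when the browser starts can be hard to read. The sketch below checks for the file up front and fails with a clear message; resolveDriverPath is a hypothetical helper name, and the default file name assumes the Windows build of ChromeDriver.

```javascript
const fs = require('fs');
const path = require('path');

// Return the absolute driver path, or throw a clear error if the file is missing.
function resolveDriverPath(projectRoot, fileName = 'chromedriver.exe') {
  const driverPath = path.join(projectRoot, fileName);
  if (!fs.existsSync(driverPath)) {
    throw new Error(`ChromeDriver not found at ${driverPath}`);
  }
  return driverPath;
}
```

Calling resolveDriverPath(__dirname) from a script in the project root returns the path that can then be handed to the driver service.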

Install the selenium-webdriver module

npm install selenium-webdriver

2. Write the scraper code (using a 1688 product detail page as the example)

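Every time the driver-launched browser opens the page, a slider CAPTCHA appears. This suggests the browser is not persisting any data; a normal browser only triggers the check the first time the page is opened.

To fix this, find C:\Users\<you>\AppData\Local\Google\Chrome\User Data and copy the whole User Data folder into the project root (the script below expects the copy to be named user-data). The script passes that folder to Chrome through the --user-data-dir argument, so the driver-launched browser gets a persistent profile, just like regular Chrome.

Run the project again and the slider CAPTCHA still appears, but it only has to be solved once; later runs skip it. The same applies to other sites: log in once, and the login data is stored for all subsequent runs.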
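The copy step described above can also be scripted. This is a rough sketch, not the author's method: copyProfileOnce is a hypothetical helper, fs.cpSync requires Node.js 16.7 or later, and the source path in the commented call assumes a default Windows Chrome install. Chrome is best closed while copying, since some profile files may be locked while it runs.

```javascript
const fs = require('fs');
const path = require('path');

// Copy a Chrome profile folder into the project once; later runs reuse the copy.
// Returns true if a copy was made, false if the target already existed.
function copyProfileOnce(sourceDir, targetDir) {
  if (fs.existsSync(targetDir)) {
    return false; // keep the existing copy so saved logins are not overwritten
  }
  fs.cpSync(sourceDir, targetDir, { recursive: true });
  return true;
}

// Example call (paths are assumptions for a default Windows install):
// copyProfileOnce(
//   path.join(process.env.LOCALAPPDATA, 'Google', 'Chrome', 'User Data'),
//   path.join(__dirname, 'user-data')
// );
```

Skipping the copy when the target exists matters here: the project-local profile is where the solved CAPTCHA and login state accumulate, so it should not be replaced on every run.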

// 1688_selenium.js
const { Builder, By, until } = require('selenium-webdriver');
const chrome = require('selenium-webdriver/chrome');
const path = require('path');
// Build the absolute path to the profile directory
const userDataDir = path.join(__dirname, 'user-data');
(async function main() {
    const service = new chrome.ServiceBuilder(
        path.join(__dirname, 'chromedriver.exe') // make sure this file exists
    );

    const options = new chrome.Options()
        .addArguments(`--user-data-dir=${userDataDir}`)
        .addArguments('--disable-blink-features=AutomationControlled')
        .addArguments('--disable-dev-shm-usage')
        .addArguments('--no-sandbox')
        .excludeSwitches('enable-automation')
        .setUserPreferences({ 'credentials_enable_service': false });

    const driver = await new Builder()
        .forBrowser('chrome')
        .setChromeService(service)
        .setChromeOptions(options)
        .build();

    try {
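        console.log('[1] Opening the product page...');
        await driver.get('https://detail.1688.com/offer/916585389952.html');
        await driver.sleep(3000); // wait for the initial page load

        console.log('[2] Please solve the slider manually (within 30 seconds)...');
        await driver.sleep(30000); // gives you 30 seconds to drag the slider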

        const html = await driver.getPageSource();

        // Save the HTML to a file
        const fs = require('fs');
        fs.writeFileSync('1688.html', html);
    } catch (e) {
        console.error('❌ Error:', e);
    } finally {
        console.log('[3] Closing the browser...');
        await driver.quit();
    }
})();
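The script above always waits the full 30 seconds, even when the slider is solved sooner or no slider appears at all. One possible refinement is to poll a condition and continue as soon as it holds. The sketch below is an illustration only: pollUntil is a hypothetical helper, and the condition in the commented usage assumes the browser is on the product URL once the slider is cleared, which has not been verified against 1688.

```javascript
// Repeatedly call `check` until it returns a truthy value or the timeout expires.
// Resolves to true if the condition was met, false if the timeout was reached.
async function pollUntil(check, { timeoutMs = 30000, intervalMs = 500 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (await check()) {
      return true;
    }
    // Pause between checks so the loop does not spin
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  return false;
}

// Possible use in place of driver.sleep(30000) (the URL test is an assumption):
// const cleared = await pollUntil(async () =>
//   (await driver.getCurrentUrl()).startsWith('https://detail.1688.com/offer/')
// );
```

The helper returns false on timeout instead of throwing, so the caller can decide whether to save the page anyway or abort.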
