How to set up a Headless Chrome Node.js server in Docker

Reposted

mob60475705f1df 2020-09-22 16:57:00

January 10, 2020 4 min read

Headless browsers have become very popular with the rise of automated UI tests in the application development process. There are also countless use cases for website crawlers and HTML-based content analysis.

In 99 percent of these cases, you don’t actually need a browser GUI because the process is fully automated. Running a GUI is also more expensive than spinning up a Linux-based server or scaling a simple Docker container across a microservices cluster, such as one managed by Kubernetes.

But I digress. Put simply, it has become increasingly critical to have a Docker container-based headless browser to maximize flexibility and scalability. In this tutorial, we’ll demonstrate how to create a Dockerfile to set up a Headless Chrome browser in Node.js.

Headless Chrome with Node.js

Node.js is the main language interface used by the Google Chrome development team, and it has an almost natively integrated library for communicating with Chrome called Puppeteer. The library talks to the browser through the DevTools Protocol, over either a WebSocket or a pipe, and can take screenshots, measure page load metrics, connection speeds, and downloaded content size, and more. You can also use it to test your UI under different device emulations. Most importantly, Puppeteer doesn’t require a running GUI; it can all be done in headless mode.

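const puppeteer = require('puppeteer');
const fs = require('fs');

// Log failures instead of leaving the promise rejection unhandled.
Screenshot('https://google.com').catch(console.error);

async function Screenshot(url) {
    // Launch the browser build that ships with Puppeteer, without a GUI.
    const browser = await puppeteer.launch({
        headless: true,
        args: [
            "--no-sandbox",
            "--disable-gpu",
        ]
    });

    const page = await browser.newPage();
    await page.goto(url, {
        timeout: 0,                // disable the navigation timeout
        waitUntil: 'networkidle0', // wait until the network has been idle for 500 ms
    });
    // Capture a low-quality JPEG and write it to disk.
    const screenData = await page.screenshot({encoding: 'binary', type: 'jpeg', quality: 30});
    fs.writeFileSync('screenshot.jpg', screenData);

    await page.close();
    await browser.close();
}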

The code above is a simple, working example of taking a screenshot with Headless Chrome. Note that we are not specifying Google Chrome’s executable path because Puppeteer’s npm module ships with its own Headless Chrome build, which it downloads at install time. Chrome’s dev team did a great job of keeping the library usage very simple and minimizing the required setup. This also makes our job of embedding this code inside the Docker container much easier.
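One thing the screenshot code does not cover is cleanup when navigation or the screenshot itself fails. The following is a minimal sketch of one way to guarantee that the browser is closed, assuming a hypothetical helper named withBrowser that is not part of Puppeteer. The launch argument is any function that returns a promise for an object with a close() method, such as a wrapper around puppeteer.launch().

```javascript
// Hypothetical helper (not part of Puppeteer): runs `task` with a browser
// and always closes the browser afterwards.
// `launch` is assumed to be a function returning a promise for an object
// with a close() method, e.g. () => puppeteer.launch({ headless: true }).
async function withBrowser(launch, task) {
  const browser = await launch();
  try {
    // Run the caller's work and pass its result through.
    return await task(browser);
  } finally {
    // Runs on success and on failure, so no browser process is left behind.
    await browser.close();
  }
}

// Possible usage with Puppeteer (assumption, shown as a comment only):
// const data = await withBrowser(
//   () => puppeteer.launch({ headless: true }),
//   async (browser) => {
//     const page = await browser.newPage();
//     await page.goto('https://google.com');
//     return page.screenshot({ type: 'jpeg', quality: 30 });
//   }
// );
```

Passing the launch function in, rather than calling Puppeteer directly, keeps the helper easy to exercise with a fake browser object.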

Google Chrome inside a Docker container

Running a browser inside a container seems simple based on the code above, but it’s important not to overlook security. By default, processes inside a container run as the root user, and the browser executes the JavaScript of every page it loads locally.

Of course, Google Chrome is secure, and it doesn’t allow page scripts to access local files, but there are still potential security risks. You can minimize many of these risks by creating a new user for the specific purpose of executing the browser itself. Chrome also has sandbox mode enabled by default, which restricts external scripts from accessing the local environment. Keep in mind that the --no-sandbox flag used in the code above turns this protection off.
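One risk that can be reduced in application code is loading a URL the service was never meant to open, such as a file:// address. The following is a minimal sketch of a check that accepts only absolute http and https URLs. The function name isAllowedUrl is hypothetical, and the check is an illustration under that assumption rather than a complete defense; for example, it does not filter addresses on an internal network.

```javascript
// Hypothetical helper: returns true only for absolute http(s) URLs.
function isAllowedUrl(input) {
  let parsed;
  try {
    // WHATWG URL parser, available as a global in Node.js 10 and later.
    parsed = new URL(input);
  } catch (err) {
    // Relative, empty, or otherwise unparsable input.
    return false;
  }
  // The parser lowercases the scheme, so a plain comparison is enough.
  return parsed.protocol === 'http:' || parsed.protocol === 'https:';
}
```

A caller could run this check on any user-supplied address before handing it to page.goto().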

Below is the Dockerfile sample responsible for the Google Chrome setup. We will choose Alpine Linux as our base image because it has a minimal footprint as a Docker image.

FROM alpine:3.6

RUN apk update && apk add --no-cache nmap && \
    echo @edge http://nl.alpinelinux.org/alpine/edge/community >> /etc/apk/repositories && \
    echo @edge http://nl.alpinelinux.org/alpine/edge/main >> /etc/apk/repositories && \
    apk update && \
    apk add --no-cache \
      chromium \
      harfbuzz \
      "freetype>2.8" \
      ttf-freefont \
      nss

ENV PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=true

....
....

The RUN command adds the Alpine edge repositories, which provide Chromium for Linux, and installs the libraries Chromium needs to run on Alpine. The tricky part is making sure we don’t also download the Chrome build embedded inside Puppeteer. That would waste space in our container image, which is why we set the PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=true environment variable.

After running the Docker build, we get our Chromium executable: /usr/bin/chromium-browser. This should be our main Puppeteer Chrome executable path.
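Because the bundled download is skipped, a wrong or missing executable path only shows up when the browser is launched. The following is a minimal sketch of resolving and checking the path up front. The function name resolveChromiumPath and the CHROMIUM_PATH environment variable are hypothetical names chosen for illustration; they are not Puppeteer settings. The default is the path given above.

```javascript
const fs = require('fs');

// Path of the Chromium executable installed by the Alpine package.
const DEFAULT_CHROMIUM_PATH = '/usr/bin/chromium-browser';

// Hypothetical helper. CHROMIUM_PATH is an assumed override variable,
// not something Puppeteer reads on its own.
// `env` and `exists` are parameters so the function can be tried out
// without a real Chromium installation.
function resolveChromiumPath(env = process.env, exists = fs.existsSync) {
  const candidate = env.CHROMIUM_PATH || DEFAULT_CHROMIUM_PATH;
  if (!exists(candidate)) {
    throw new Error(`Chromium executable not found at ${candidate}`);
  }
  return candidate;
}
```

The returned value could then be passed as the executablePath option of puppeteer.launch().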

Now let’s jump to our JavaScript code and complete the Dockerfile.

Combining Node.js Server and Chromium container

Before we continue, let’s change our code a little so that it works as a microservice for taking screenshots of given websites. For that, we’ll use Express.js to spin up a basic HTTP server.

// server.js
const express = require('express');
const puppeteer = require('puppeteer');

const app = express();

// /?url=https://google.com
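// The handler must be async because it awaits Screenshot() below.
app.get('/', async (req, res) => {
    const {url} = req.query;
    if (!url || url.length === 0) {
        // Report a client error when the url parameter is missing.
        return res.status(400).json({error: 'url query parameter is required'});
    }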

    const imageData = await Screenshot(url);

    res.set('Content-Type', 'image/jpeg');
    res.set('Content-Length', imageData.length);
    res.send(imageData);
});

app.listen(process.env.PORT || 3000);

async function Screenshot(url) {
   const browser = await puppeteer.launch({
       headless: true,
       executablePath: '/usr/bin/chromium-browser',
       args: [
       "--no-sandbox",
       "--disable-gpu",
       ]
   });

    const page = await browser.newPage();
    await page.goto(url, {
      timeout: 0,
      waitUntil:
