如何把软件源代码批量转换成打印 pdf 文件？

2024/6/26

申请软件著作权登记时，我们需要按相关的要求把代码转换成 pdf 文件上传。

但是搜索了一圈，发现基本上各种网页插件、编辑器插件都只能单独地一个文件一个文件处理。

作为一个程序员，当然不能这样傻傻的一个一个处理代码文件。代码文件数量多、行数长短不一、而且分布在不同的文件夹里。一个一个处理太慢了，效率太低。生成的代码 pdf 文件也不好看。

今天我们介绍一个把代码批量转换成打印 pdf 文件的小软件，来完成打印代码的任务，而且可以支持代码行号、语法高亮。

提示

软件著作权登记上传源代码的要求：

页眉建议标注该软件名称、版本号，内容应与申请表中填写的一致；右上角标注页码，源程序从正文第 1 页编到第 60 页，文档从目录开始由第 1 页编到第 60 页。

总体流程

拉取要打印的项目源代码，通过 git ls-files > filelist.txt 取得需要打印的代码文件列表，手工清除不需要的文件，并调整为需要的顺序。
依次读取这些代码文件，通过 highlight.js 生成带有行号、语法高亮的 html 文件。
使用 puppeteer 依次打开上一步生成的单个源代友 html 文件，转换成带有每页固定行数，有行号、语法高亮的 pdf 文件。
使用 pdf-merger-js 把多个 pdf 文件按顺序合并起来生成一份源代码 pdf 文件。
使用 pdf-lib 在合并后的 pdf 文件页眉添加软件名称、版本号，在右上角添加页码。

使用的技术栈

cloc
计算源码行数
highlight.js
生成带有行号、语法高亮的 html 文件
puppeteer
打开 html 文件，另存为 pdf 文件
pdf-merger-js
合并 pdf 文件
pdfl-lib
修改 pdf 文件，添加页眉

具体实现

我们使用 NodeJS 开发这个软件。

获取要打印的源码文件

此步骤常见方法是通过 git 管理的文件中获取。

在源码目录运行

计算代码行数

cloc $(git ls-files)

获取 git 仓库的文件清单

git ls-files > filelist.txt

生成 filelist.txt 文件后，打开它，清除不需要的 zip/xlsx/png/docx/.gitignore/cache 等文件。

调整文件的排列顺序，以备后续进行合并操作。

读取代码文件生成 html

此步骤使用 highlight.js 完成。

读取出代码文本后，发送给 highlight.js 处理。

const html = hljs.highlightAuto(codeStr).value;

在 html 文件 <head> 区域添加

html

<link rel="stylesheet" href="/path/to/styles/default.min.css" />
<script src="/path/to/highlight.min.js"></script>
<script>
  hljs.highlightAll();
</script>

把生成的 html 代码插入到一个 html 文件中，左侧是行号，右侧是代码。

/**
 * 获取html内容
 * @param {string} sourceFilePath 源文件路径
 */
async function buildHtml(sourceFilePath) {
  const codeStr = (await promises.readFile(sourceFilePath)).toString();
  const codeHtml = hljs.highlightAuto(codeStr).value;
  const preCode = `<pre><code>${codeHtml}</code></pre>`;
  const numberrow = codeStr
    .split('\n')
    .map((_, i) => fixWidth(i + 1))
    .join('\n');
  const number = `<pre><code class="language-plaintext" style="border-right: 1px solid #EFEFF5;">${numberrow}</code></pre>`;
  const body = `
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>${sourceFilePath}</title>
    <link rel="stylesheet" href="../head/tomorrow.min.css">
    <script src="../head/highlight.min.js"></script>
    <script>hljs.highlightAll();</script>
    <link rel="stylesheet" href="../head/style.css">
</head>
<body>
<main>
  <div class="container">
    ${number}
    ${preCode}
  </div>
</main>
</body>
</html>
`;
  return body;
}

生成 pdf 文件

使用 puppeteer 读取生成的 html 文件，打印成 pdf 文件。

/**
 * 生成pdf文件
 * @param {string[]} fileList 文件名列表
 */
async function writePdfs(fileList, project) {
  const browser = await launch({
    executablePath: 'C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe'
  });
  for (const fileName of fileList) {
    const page = await browser.newPage();

    // 必须要读取文件才能获得正确样式
    await page.goto(
      `file:\\C:\\Users\\allen\\company\\html-code\\html\\${fileName}.html`
    );

    console.log('生成 pdf', fileName);
    const distPath = join('pdf', `${fileName}.pdf`);
    await page.pdf({
      margin: {
        top: '2cm',
        bottom: '2cm',
        left: '1cm',
        right: '1cm'
      },
      path: distPath,
      format: 'a4'
    });

    await addHeader(distPath, project);
  }

  await browser.close();
}

INFO

为什么不用 wkhtmltopdf？

起初考虑过使用 wkhtmltopdf，但是经过试用，存在 3 个问题：

wkhtmltopdf 无法支持 flex/grid 布局，无法实现带有行号。
wkhtmltopdf 是命令行文件，和 nodejs 组合较麻烦。
wkhtmltopdf 的仓库目前已经不再维护了。

看看源代码文件生成的效果。每页50行代码，页眉标注该软件名称、版本号，右上角有页码。

print-code

合并 pdf 文件

生成了多个 pdf 文件后，使用 pdf-merger-js 合并多个文件生成一个 pdf 文件。

async function mergePdfs(contents, folderName = 'code') {
  const merger = new PDFMerger();

  for (const content of contents) {
    await merger.add(path.join('pdf', `${content}.pdf`));
  }
  const target = `合并_${folderName}.pdf`;
  await merger.save(target);
  return target;
}

添加页眉标识

使用 pdf-lib 库在 pdf 文件每一页的页眉添加标识。

/**
 * 在页眉添加标识
 * @param {string} filePath 文件路径
 * @param {string} project 项目名
 */
async function addPageNumber(filePath, project) {
  // open a font synchronously
  const fontData = await promises.readFile('head\\simhei.ttf');
  // Load a PDFDocument from the existing PDF bytes
  const pdfDoc = await PDFDocument.load(await promises.readFile(filePath));

  pdfDoc.registerFontkit(fontkit);
  const heitiFont = await pdfDoc.embedFont(fontData);
  // Get the first page of the document
  const pages = pdfDoc.getPages();
  let i = 1;
  for (const page of pages) {
    page.drawText(project, {
      x: 260,
      y: 800,
      size: 14,
      font: heitiFont
      // color: rgb(0.95, 0.1, 0.1)
    });

    page.drawText(`页码:${i}`, {
      x: 520,
      y: 800,
      size: 14,
      font: heitiFont
      // color: rgb(0.95, 0.1, 0.1)
    });
    i++;
  }
  // Serialize the PDFDocument to bytes (a Uint8Array)
  const pdfBytes = await pdfDoc.save();
  const target = tryFileName(`${project}.pdf`);
  await promises.writeFile(target, pdfBytes);
}

实现的代码全文

import hljs from 'highlight.js';
import {
  existsSync,
  promises,
  writeFile,
  readdirSync,
  statSync,
  unlinkSync,
  rmdirSync
} from 'fs';
import { launch } from 'puppeteer-core';
import { dirname, basename, join } from 'path';

import PDFMerger from 'pdf-merger-js';
import { mkdir } from 'fs/promises';

import { PDFDocument, rgb } from 'pdf-lib';
import fontkit from '@pdf-lib/fontkit';
import { log } from 'console';

/**
 * 检测目标路径是否存在，存在就加序号
 * @param {string} filePath 目标路径
 */
function tryFileName(filePath) {
  const dir = dirname(filePath);
  const name = basename(filePath);
  let i = 1;
  let newName = name;
  const [filename, ext] = name.split('.');
  let target = join(dir, newName);
  while (existsSync(target)) {
    newName = `${filename}(${i}).${ext}`;
    target = join(dir, newName);
    i += 1;
  }
  return target;
}

/**
 * 添加页眉标记
 * @param {string} filePath 文件路径
 */
async function addHeader(filePath) {
  // open a font synchronously
  const fontData = await promises.readFile('head\\font.ttf');
  // Load a PDFDocument from the existing PDF bytes
  const pdfDoc = await PDFDocument.load(await promises.readFile(filePath));

  pdfDoc.registerFontkit(fontkit);
  const heitiFont = await pdfDoc.embedFont(fontData);
  // Get the first page of the document
  const pages = pdfDoc.getPages();
  for (const page of pages) {
    let fileName = basename(filePath);
    fileName = fileName.substring(0, fileName.lastIndexOf('.'));

    page.drawText(fileName, {
      x: 40,
      y: 800,
      size: 14,
      font: heitiFont
      // color: rgb(0.95, 0.1, 0.1)
    });
  }
  // Serialize the PDFDocument to bytes (a Uint8Array)
  const pdfBytes = await pdfDoc.save();
  // const target = tryFileName(`加页眉_${targetName}.pdf`);
  await promises.writeFile(filePath, pdfBytes);
}

/**
 * 内容写入文件
 * @param {string} filePath 文件路径
 * @param {any} content 文件内容
 * @returns
 */
function writeContent(filePath, content) {
  return new Promise((resolve, reject) => {
    const opt = {
      flag: 'w' // a：追加写入；w：覆盖写入
    };

    writeFile(filePath, content, opt, (err) => {
      if (err) {
        console.error(err);
        reject(err);
      }
      resolve(filePath);
    });
  });
}

/**
 * 修复固定行号
 * @param {number} val 行号固定宽度
 * @returns 固定宽度行号数字
 */
function fixWidth(val) {
  const str = `            ${val}`;
  return str.substring(str.length - 4);
}

/**
 * 获取html内容
 * @param {string} sourceFilePath 源文件路径
 */
async function buildHtml(sourceFilePath) {
  const codeStr = (await promises.readFile(sourceFilePath)).toString();
  const codeHtml = hljs.highlightAuto(codeStr).value;
  const preCode = `<pre><code>${codeHtml}</code></pre>`;
  const numberrow = codeStr
    .split('\n')
    .map((_, i) => fixWidth(i + 1))
    .join('\n');
  const number = `<pre><code class="language-plaintext" style="border-right: 1px solid #EFEFF5;">${numberrow}</code></pre>`;
  const body = `
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>${sourceFilePath}</title>
    <link rel="stylesheet" href="../head/tomorrow.min.css">
    <script src="../head/highlight.min.js"></script>
    <script>hljs.highlightAll();</script>
    <link rel="stylesheet" href="../head/style.css">
</head>
<body>
<main>
  <div class="container">
    ${number}
    ${preCode}
  </div>
</main>
</body>
</html>
`;
  return body;
}

/**
 * 生成pdf文件
 * @param {string[]} fileList 文件名列表
 */
async function writePdfs(fileList, project) {
  const browser = await launch({
    executablePath: 'C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe'
  });
  for (const fileName of fileList) {
    const page = await browser.newPage();

    // 必须要读取文件才能获得正确样式
    await page.goto(
      `file:\\C:\\Users\\allen\\company\\html-code\\html\\${fileName}.html`
    );

    console.log('生成 pdf', fileName);
    const distPath = join('pdf', `${fileName}.pdf`);
    await page.pdf({
      margin: {
        top: '2cm',
        bottom: '2cm',
        left: '1cm',
        right: '1cm'
      },
      path: distPath,
      format: 'a4'
    });

    await addHeader(distPath, project);
  }

  await browser.close();
}

/**
 * 合并pdf文件
 * @param {string[]} fileList 文件名列表
 * @param {string} folderName 文件夹名称
 */
async function mergePdfs(fileList, folderName) {
  const merger = new PDFMerger();

  for (const fileName of fileList) {
    await merger.add(join('pdf', `${fileName}.pdf`));
  }
  const target = tryFileName(`合并_${folderName}.pdf`);
  await merger.save(tryFileName(target));
  return target;
}

/**
 * 删除文件夹及其内容
 * @param {string} url 文件夹路径
 */
function rmDir(url) {
  let files = [];
  if (existsSync(url)) {
    //判断给定的路径是否存在
    files = readdirSync(url); //返回文件和子目录的数组
    files.forEach(function (file, index) {
      const curPath = join(url, file);
      if (statSync(curPath).isDirectory()) {
        //同步读取文件夹文件，如果是文件夹，则函数回调
        rmDir(curPath);
      } else {
        unlinkSync(curPath); //是指定文件，则删除
        console.log('删除文件', file);
      }
    });
    rmdirSync(url); //清除文件夹
  } else {
    console.log('给定的路径不存在！');
  }
}

/**
 * 生成html文件列表
 * @param {string} filePaths 生成html文件
 * @param {string} sourceDir 源文件目录
 * @returns html文件列表
 */
async function writeHtmls(filePaths, sourceDir) {
  const htmlFileList = [];
  for (const file of filePaths) {
    const htmlContent = await buildHtml(join(sourceDir, file));
    const target = file.replace(/\//g, '_');
    await writeContent(join('html', `${target}.html`), htmlContent);
    console.log('html', target);

    htmlFileList.push(target);
  }
  return htmlFileList;
}

/**
 * 在页眉添加标识
 * @param {string} filePath 文件路径
 * @param {string} project 项目名
 */
async function addPageNumber(filePath, project) {
  // open a font synchronously
  const fontData = await promises.readFile('head\\simhei.ttf');
  // Load a PDFDocument from the existing PDF bytes
  const pdfDoc = await PDFDocument.load(await promises.readFile(filePath));

  pdfDoc.registerFontkit(fontkit);
  const heitiFont = await pdfDoc.embedFont(fontData);
  // Get the first page of the document
  const pages = pdfDoc.getPages();
  let i = 1;
  for (const page of pages) {
    page.drawText(project, {
      x: 260,
      y: 800,
      size: 14,
      font: heitiFont
      // color: rgb(0.95, 0.1, 0.1)
    });

    page.drawText(`页码:${i}`, {
      x: 520,
      y: 800,
      size: 14,
      font: heitiFont
      // color: rgb(0.95, 0.1, 0.1)
    });
    i++;
  }
  // Serialize the PDFDocument to bytes (a Uint8Array)
  const pdfBytes = await pdfDoc.save();
  const target = tryFileName(`${project}.pdf`);
  await promises.writeFile(target, pdfBytes);
}

async function main() {
  rmDir('html');
  rmDir('pdf');

  mkdir('html');
  mkdir('pdf');

  if (process.argv.length !== 4) {
    console.log(`批量打印代码

使用方法：
  node index.js <源码文件夹> <项目名>`);
    return;
  }

  console.log('源码文件夹', process.argv[2]);
  const FILE_LIST = 'filelist.txt';

  const sourceDir = process.argv[2];
  const projectName = process.argv[3];

  // 文件列表要手工生成，并去除不需要的文件，比如图片、压缩包、不需要的配置文件等
  const files = (await promises.readFile(FILE_LIST)).toString();
  const filePaths = files.split('\n').filter((v) => v);

  const htmlFileList = await writeHtmls(filePaths, sourceDir);
  await writePdfs(htmlFileList, projectName);

  const targetName = basename(sourceDir);
  const mergeFile = await mergePdfs(htmlFileList, targetName);
  await addPageNumber(mergeFile, projectName);
  console.log('Done');
}

// 检测是否windows系统
const platform = process.platform;
if (platform === 'win32') {
  main();
} else {
  console.log('请在windows系统下运行此脚本');
}

如何把软件源代码批量转换成打印 pdf 文件？ ​

总体流程 ​

使用的技术栈 ​

具体实现 ​

获取要打印的源码文件 ​

读取代码文件生成 html ​

生成 pdf 文件 ​

合并 pdf 文件 ​

添加页眉标识 ​

联系我们