Describe how to implement search keyword highlighting in Grunt

Interpretation

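The interviewer is not asking "which plugin do you use". The question tests whether the candidate can fold search keyword highlighting, normally a purely front-end interaction, into Grunt's build pipeline while also covering the practical concerns of engineering in China (SEO, compliance, performance audits). The core ideas are: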

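Make the highlight a static marker produced at build time rather than a regex run at runtime, avoiding the performance penalty and blank-screen risk of client-side computation;
Let Grunt scan the entry HTML/templates → find the keywords → inject the highlight tags → emit static files that carry the markup;
Keyword sources must be configurable (SEO word list, operations back office, Git branch variables) to keep up with landing pages whose keywords change often;
The whole flow must be cacheable, incremental and easy to roll back, so that it passes the quality gates of CI/CD setups common in China (Jenkins + DingTalk, GitLab Runner + WeCom).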
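The "configurable keyword source" idea above can be sketched as a small resolver. This is only an illustration: the function name resolveKeywords and the "env:NAME" convention for reading a comma-separated list from an environment variable (for example one set per Git branch in CI) are assumptions, not part of Grunt or of the original answer.

```javascript
'use strict';
const fs = require('fs');

// Normalise a keyword source into a de-duplicated array of trimmed strings.
// source may be: an array, a path to a local JSON file, or "env:NAME"
// (hypothetical convention: comma-separated list in an environment variable).
function resolveKeywords(source, env = process.env) {
  let list;
  if (Array.isArray(source)) {
    list = source;
  } else if (typeof source === 'string' && source.startsWith('env:')) {
    list = (env[source.slice(4)] || '').split(',');
  } else if (typeof source === 'string') {
    list = JSON.parse(fs.readFileSync(source, 'utf8'));
  } else {
    throw new TypeError('keywords must be an array, a JSON path, or "env:NAME"');
  }
  if (!Array.isArray(list)) {
    throw new TypeError('keyword list must be an array');
  }
  // Trim, drop empty entries, and remove duplicates while keeping order.
  return [...new Set(list.map(k => String(k).trim()).filter(Boolean))];
}
```

Inside a Grunt task the result would simply replace options.keywords before the matcher is built, so the rest of the pipeline does not care where the words came from.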

Key points

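Grunt task lifecycle (init → config → registerTask → run)
The multi-task mechanism and the dynamic file mapping exposed through this.files
Trade-offs between template engines (lodash.template, swig, art-template) and HTML parsers (cheerio, htmlparser2)
Performance of Node streaming reads/writes compared with Grunt's grunt.file.copy
CDN caching strategy in China: how to timestamp files that carry highlight markup so that overwriting a file does not leave stale cached copies or trigger unnecessary origin fetches
Keyword matching algorithm: Aho-Corasick multi-pattern matching vs. a regex built from the keywords (alternation, lookahead). A regex can backtrack heavily on a large keyword set and needs every keyword's metacharacters escaped; with either approach the text must be HTML-escaped before injection so that it cannot introduce XSS
Compliance filtering: keywords must be checked against an Advertising Law sensitive-word list, so the Grunt pipeline needs a preceding replace-style task (for example grunt-replace) that strips blocked terms
SourceMap sync: if grunt-contrib-uglify is also enabled and the templates are compiled to JavaScript, line numbers after injection must be mapped back to the original template so that errors reported to Sentry can be located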
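For comparison with the Aho-Corasick approach, the regex alternative from the list above might look like the following minimal sketch. The function names are hypothetical, the local escapeHtml stands in for lodash.escape, and matching is assumed to be case-insensitive. Keywords are escaped so they match literally and sorted longest-first so that a longer keyword wins over its own prefix.

```javascript
'use strict';

// Escape regex metacharacters so a keyword like "c++" is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Minimal HTML text escaping (stand-in for lodash.escape).
function escapeHtml(s) {
  const map = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };
  return s.replace(/[&<>"']/g, c => map[c]);
}

// Wrap every keyword occurrence in plain text with a highlight tag and
// return an HTML string; all non-tag text is escaped.
function highlightWithRegex(text, keywords, tagName = 'mark', className = 'kw') {
  const words = keywords.filter(Boolean);
  if (!words.length) return escapeHtml(text);
  const alternation = words
    .slice()
    .sort((a, b) => b.length - a.length) // longest first
    .map(escapeRegExp)
    .join('|');
  const re = new RegExp(alternation, 'gi');
  let out = '';
  let last = 0;
  for (const m of text.matchAll(re)) {
    out += escapeHtml(text.slice(last, m.index)) +
      `<${tagName} class="${className}">${escapeHtml(m[0])}</${tagName}>`;
    last = m.index + m[0].length;
  }
  return out + escapeHtml(text.slice(last));
}
```

This works well for a handful of keywords; with thousands of entries the alternation grows large, which is the case where a single-pass automaton is usually preferred.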

Answer

Install dependencies

npm i -D grunt cheerio lodash.escape grunt-contrib-clean grunt-contrib-cssmin grunt-contrib-uglify

Write the custom task in tasks/grunt-highlight-keywords.js

const cheerio = require('cheerio');
const escape = require('lodash.escape');
// Small self-written Aho-Corasick multi-pattern matcher; avoids regex backtracking.
// search(text) is expected to return sorted, non-overlapping { start, end } hits.
const createTrie = require('./ac-trie');

module.exports = function(grunt) {
  grunt.registerMultiTask('highlightKeywords', 'Statically highlight search keywords', function() {
    const options = this.options({
      keywords: [],          // an array, or a path to a local JSON file
      tagName: 'mark',
      className: 'kw',
      caseSensitive: false,
      minLength: 2
    });
    // grunt.file.readJSON only reads local files: a word list hosted on OSS
    // must be downloaded by an earlier step and referenced by its local path.
    if (typeof options.keywords === 'string') {
      options.keywords = grunt.file.readJSON(options.keywords);
    }
    const trie = createTrie(options.keywords, options.caseSensitive);

    this.files.forEach(f => {
      const src = f.src.filter(filepath => grunt.file.exists(filepath));
      src.forEach(filepath => {
        const html = grunt.file.read(filepath);
        // Entities are decoded on load (the default), so text node data is plain
        // text and is re-escaped below before being parsed back in as HTML.
        const $ = cheerio.load(html);
        // addBack() includes <body> itself so its direct text children are covered.
        $('body').find('*').addBack().contents().filter(function() {
          return this.type === 'text' &&
                 this.data.trim().length >= options.minLength &&
                 // never rewrite the contents of raw-text elements
                 !$(this).parent().is('script, style, textarea');
        }).each(function() {
          const text = this.data;
          const hits = trie.search(text);
          if (!hits.length) return;
          let lastIndex = 0, out = '';
          hits.forEach(h => {
            out += escape(text.slice(lastIndex, h.start)) +
                   `<${options.tagName} class="${options.className}">` +
                   escape(text.slice(h.start, h.end)) +
                   `</${options.tagName}>`;
            lastIndex = h.end;
          });
          out += escape(text.slice(lastIndex));
          $(this).replaceWith(out);
        });
        // Falls back to overwriting the source file when no dest is configured.
        grunt.file.write(f.dest || filepath, $.html());
        grunt.log.ok('Highlighted → ' + (f.dest || filepath));
      });
    });
  });
};
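The task requires a local ./ac-trie module that the answer does not show. A minimal sketch of what it could look like, assuming the interface implied by the task code (createTrie(keywords, caseSensitive) returning an object whose search(text) yields { start, end } offsets), is given here. It is one possible Aho-Corasick implementation, not the original author's file. It records only the longest keyword ending at each position and then keeps the leftmost non-overlapping hits, so a shorter keyword can be dropped when it overlaps an earlier match.

```javascript
'use strict';

// Hypothetical tasks/ac-trie.js: Aho-Corasick matcher over UTF-16 code units.
function createTrie(keywords, caseSensitive) {
  const norm = ch => (caseSensitive ? ch : ch.toLowerCase());
  // Node: children, failure link, length of the longest keyword ending here.
  const root = { next: new Map(), fail: null, len: 0 };

  // 1. Insert every keyword into the trie.
  for (const kw of keywords) {
    if (!kw) continue;
    let node = root;
    for (let i = 0; i < kw.length; i++) {
      const ch = norm(kw[i]);
      if (!node.next.has(ch)) {
        node.next.set(ch, { next: new Map(), fail: root, len: 0 });
      }
      node = node.next.get(ch);
    }
    node.len = kw.length;
  }

  // 2. Breadth-first pass to compute failure links.
  const queue = [...root.next.values()]; // depth-1 nodes already fail to root
  for (let head = 0; head < queue.length; head++) {
    const node = queue[head];
    for (const [ch, child] of node.next) {
      let f = node.fail;
      while (f !== root && !f.next.has(ch)) f = f.fail;
      child.fail = f.next.get(ch) || root;
      // Inherit the longest keyword that is a proper suffix of this path.
      child.len = Math.max(child.len, child.fail.len);
      queue.push(child);
    }
  }

  // 3. Single pass over the text, then resolve overlaps.
  function search(text) {
    const hits = [];
    let node = root;
    for (let i = 0; i < text.length; i++) {
      const ch = norm(text[i]);
      while (node !== root && !node.next.has(ch)) node = node.fail;
      node = node.next.get(ch) || root;
      if (node.len > 0) hits.push({ start: i + 1 - node.len, end: i + 1 });
    }
    // Leftmost first; for equal starts prefer the longer hit.
    hits.sort((a, b) => a.start - b.start || b.end - a.end);
    const out = [];
    let lastEnd = 0;
    for (const h of hits) {
      if (h.start >= lastEnd) {
        out.push(h);
        lastEnd = h.end;
      }
    }
    return out;
  }

  return { search };
}

module.exports = createTrie;
```

Offsets are UTF-16 code unit indices, so they can be passed directly to String.prototype.slice as the task does.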

Wire it up in Gruntfile.js

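module.exports = function(grunt) {
  grunt.initConfig({
    // clean, cssmin and uglify need their own config targets (omitted here).
    highlightKeywords: {
      dist: {
        options: {
          // Local copy of https://your-oss.oss-cn-hangzhou.aliyuncs.com/seo/keywords.json,
          // downloaded by an earlier CI step; the local path is an assumed example.
          keywords: 'config/keywords.json',
          className: 'kw',
          tagName: 'em'
        },
        files: [{
          expand: true,
          cwd: 'src',
          src: '**/*.html',
          dest: 'dist'
        }]
      }
    }
  });
  grunt.loadTasks('tasks');
  grunt.loadNpmTasks('grunt-contrib-clean');
  grunt.loadNpmTasks('grunt-contrib-cssmin');
  grunt.loadNpmTasks('grunt-contrib-uglify');
  grunt.registerTask('default', ['clean:dist', 'highlightKeywords', 'cssmin', 'uglify']);
};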

Compliance hardening for China

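Pre-step: a replace:compliance task removes Advertising Law sensitive words from keywords;
Post-step: version the output files, for example with grunt-hashres or a ?v={gitCommit} query string, so the CDN does not keep serving an older highlighted version;
In .gitlab-ci.yml, gate the job with only: changes (or rules: changes) on keywords.json and the source templates, so the job is skipped when nothing relevant changed and incremental builds stay fast.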
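The pre-step and post-step above can be sketched as two small helpers. Both function names are hypothetical, and the blocked terms used in the usage note are only illustrative examples of superlatives, not an actual compliance list.

```javascript
'use strict';

// Split keywords into those that are safe to highlight and those that
// contain any blocked term (case-insensitive substring match).
function filterKeywords(keywords, blocklist) {
  const blocked = blocklist.map(w => w.toLowerCase()).filter(Boolean);
  const kept = [];
  const removed = [];
  for (const kw of keywords) {
    const lower = kw.toLowerCase();
    (blocked.some(b => lower.includes(b)) ? removed : kept).push(kw);
  }
  return { kept, removed };
}

// Append a version parameter (for example the Git commit hash) to a URL
// so that the CDN treats each build's output as a distinct resource.
function addVersion(url, version) {
  const sep = url.includes('?') ? '&' : '?';
  return url + sep + 'v=' + encodeURIComponent(version);
}
```

For example, filterKeywords(['最佳方案', 'Grunt 构建'], ['最佳']) keeps only 'Grunt 构建'. Returning the removed list as well makes it easy to log what was dropped, which helps when a compliance review asks why a keyword disappeared.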

Further thoughts

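Dynamic vs. static trade-off
If pages are server-side rendered (Next.js, Nuxt), build-time highlighting means every keyword change made by operations requires a full redeploy. Grunt can instead emit only a highlight mapping table (JSON), and the SSR stage injects the tags using the same Aho-Corasick algorithm, which keeps both flexibility and performance.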
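One way the mapping table could be produced is sketched here. The function name and the manifest fields (version, tagName, className, keywords) are illustrative assumptions; the original does not specify a format. The version is derived from the content, so an SSR layer can cache its matcher until the version changes.

```javascript
'use strict';
const crypto = require('crypto');

// Build a JSON manifest that an SSR layer could load at request time.
// Keywords are de-duplicated and sorted so the same set always yields
// the same version string regardless of input order.
function buildHighlightManifest(keywords, { tagName = 'mark', className = 'kw' } = {}) {
  const sorted = [...new Set(keywords)].sort();
  const version = crypto
    .createHash('sha256')
    .update(JSON.stringify([tagName, className, sorted]))
    .digest('hex')
    .slice(0, 12);
  return JSON.stringify({ version, tagName, className, keywords: sorted }, null, 2);
}
```

In a Grunt task the returned string would be written out with grunt.file.write, and the SSR process would rebuild its matcher only when it sees a new version.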
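Micro-frontend scenario
Many teams in China use qiankun for micro-frontends, with each sub-application in its own repository. During the host (base) application's build, Grunt can pull down each sub-application's HTML (for example as zip archives unpacked with grunt-zip), highlight everything in one pass and push the result back to OSS, giving "centralized SEO" without touching sub-application code.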
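Security and performance audits
Highlight tags add DOM nodes, so the Lighthouse performance score may drop by 2 to 3 points. Grunt can run Lighthouse CI (lighthouse-ci) in the pipeline and, if the score falls below 85, automatically roll back to the previous highlight snapshot, so that ad landing pages in China still pass the quality checks of major media DSPs.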
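The score gate described above could be sketched as follows, assuming the pipeline has already produced a Lighthouse JSON report. The function names are hypothetical; the only report detail relied on is that Lighthouse stores category scores in categories.performance.score as a number between 0 and 1, or null when scoring failed.

```javascript
'use strict';

// Read the performance score (0-100) from a parsed Lighthouse JSON report.
// Returns null when the report has no usable performance score.
function performanceScore(report) {
  const category = report && report.categories && report.categories.performance;
  if (!category || typeof category.score !== 'number') return null;
  return Math.round(category.score * 100);
}

// True only when a score exists and reaches the threshold (85 by default).
function passesGate(report, threshold = 85) {
  const score = performanceScore(report);
  return score !== null && score >= threshold;
}
```

A Grunt task could load the report with grunt.file.readJSON and call grunt.fail.warn when passesGate returns false, leaving the actual rollback to the CI job that follows.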
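Web Components approach
If the company's stack has moved to Shadow DOM, style encapsulation affects the <em class="kw"> injected at build time: page-level .kw styles do not reach elements inside a shadow root. The Grunt task needs to recognize the <template shadowrootmode> syntax and inject the highlighting, together with its styles, inside the shadow root so that the styling does not break.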