jezhou

Using JS generators to make interactive auditing tools in REPL

July 9, 2022

I’ve lately been doing a ton of auditing on production data (mainly through a REPL), and when you have enough failing cases for an audit, it can be a real chore to go through all of the entries and examine them one by one.

My usual process for an audit is something like this:

1. Come up with a thesis of what’s right and what’s wrong
2. Write a function that verifies whether a particular piece of data is right or wrong
3. Query for all of the data I need to audit (usually a DB query)
4. Run the queried data through the function
5. Report the results

Steps 4 and 5 are surprisingly annoying. The most rudimentary way I’ve pulled data without any special tools is to console.log the results in a CSV/TSV format, then copy and paste the results into a Google Sheet or something.
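To make that concrete, here’s a rough sketch of the console.log-as-TSV approach. The AuditRow shape, the toTsv helper and the sample rows are all made up for illustration; your own data will look different.

```typescript
// Hypothetical shape of one audit result; adjust to your own data.
interface AuditRow {
    id: string;
    expected: string;
    actual: string;
}

// Turn rows into tab-separated text with a header line,
// ready to paste into a spreadsheet.
function toTsv(rows: AuditRow[]): string {
    const header = ['id', 'expected', 'actual', 'passed'].join('\t');
    const lines = rows.map((row) =>
        [row.id, row.expected, row.actual, String(row.expected === row.actual)].join('\t')
    );
    return [header, ...lines].join('\n');
}

// Made-up sample data, standing in for the queried results.
const sampleRows: AuditRow[] = [
    { id: 'a1', expected: 'active', actual: 'active' },
    { id: 'b2', expected: 'active', actual: 'suspended' },
];

console.log(toTsv(sampleRows));
```

One caveat with this sketch: it does no escaping, so a value that itself contains a tab or a newline will break the columns.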

Note: There are, of course, much better tools than this method out there! But if you had zero tools from the outset and you didn’t have the time/money to invest in better tooling, this is probably the de facto naive way of doing a decent data pull.

This method is totally fine if you have a relatively flat data structure you want to output, but when the data you’re dealing with is nested -or- the results you want are more complex, you can actually create some primitive tooling yourself without having to download any extra dependencies.

A complex output

Let’s say you wanted to do a scientist-esque manual audit, where you compare the output of one function to the output of another one (probably a modified version of the first function). How would you go about auditing the output from the two codepaths?

graph LR
    A[Database] -->|Query for data| B(Input Objects)
    B --> C{Function A/B}
    C --> D[Function A]
    C --> E[Function B]
    D --> F[Output A]
    E --> G[Output B]
    F --> H(((How do I diff the outputs?)))
    G --> H
    style H color:red

Again, there are a million tools out there that can help you do this very easily (including scientist!), but let’s assume you have not set up any of these tools and you don’t have the time to make the upfront investment to add the specific tooling at the moment. You just have your REPL and your knowledge of JavaScript.

The ultra-naive way would be to just console.log the two outputs, and diff them with your eyes.

// Source code
function currentWay(a: string) { ... }
function newWay(a: string) { ... }

// REPL
REPL > const input = 'some_testable_input'
REPL > console.log(currentWay(input))

// ... output of current way

REPL > console.log(newWay(input))

// ... output of new way

It’s very tedious to do this for every input/output you want to test, so most people, facing this problem head on, would write a custom script that runs the comparison for every input and prints out the results all at once.

// Source code
function currentWay(a: string) { ... }
function newWay(a: string) { ... }

const inputs: string[] = await query();

function auditScript() {
    for(const input of inputs) {
        console.log(currentWay(input))
        console.log(newWay(input))
    }
}

// REPL
REPL > auditScript()

// ... first output of current way
// ... first output of new way
// ... second output of current way
// ... second output of new way
// ...

But trying to compare that many outputs all at once can also be a little overwhelming.

Is there something in-between? It would be nice if you could “interactively” go through each example automatically, much like you would do with git add --patch or Jest’s interactive snapshot feature.

Generators to the rescue

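JS generators are a super underappreciated feature of JS that allow you to iteratively yield a fixed or infinitely long set of results, one function call at a time. (Side note: if you’ve used async/await in modern JS/TS, you’ve already been close to generators: async functions pause and resume in much the same way, and transpilers like Babel and TypeScript compile them down to generators when targeting older runtimes.)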
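For a quick feel of what “one function call at a time” means, here’s a minimal sketch. The names fixedResults and countForever are made up for this example.

```typescript
// A finite generator: yields each value in turn, pausing after each one.
function* fixedResults(): Generator<string> {
    yield 'first';
    yield 'second';
}

// An infinite generator: safe, because nothing runs until next() is called.
function* countForever(): Generator<number> {
    let n = 1;
    while (true) {
        yield n++;
    }
}

const fixed = fixedResults();
console.log(fixed.next()); // { value: 'first', done: false }
console.log(fixed.next()); // { value: 'second', done: false }
console.log(fixed.next()); // { value: undefined, done: true }

const forever = countForever();
console.log(forever.next().value); // 1
console.log(forever.next().value); // 2
```

The body of the function only runs up to the next yield each time you call next(), which is exactly the pause-and-resume behavior that makes an interactive audit possible.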

The MDN tutorial on generators is really good already so I won’t dive into a tutorial here, but the basic idea is that with whatever dataset you’re working with (pulled from step 3 above), you can embed that dataset in a generator function, and console.log the output in whatever text-based format you want. For the purposes of auditing, it’s not really important in my example to yield an actual return value from the generator, so you can just yield nothing.

function currentWay(a: string) { ... }
function newWay(a: string) { ... }

const inputs: string[] = await query();

// Vanilla generator function!
function* compareCurrentAndNewInteractively() {
    for(const input of inputs) {
        console.log(currentWay(input))
        console.log(newWay(input))
        yield;
    }
}

Now, when I create the generator and call for the next value, it’ll give me the next output to compare in the fixed but large set of data that I pulled from step 3.

REPL > const generator = compareCurrentAndNewInteractively();
REPL > generator.next();

//... first output of current way
//... first output of new way

REPL > generator.next();

//... second output of current way
//... second output of new way

Voila! You basically just added a -i option to your audit function.
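One thing worth knowing is how to tell when you’ve run out of data: next() returns an object with a done flag. Here’s a runnable sketch of the same generator. The bodies of currentWay and newWay and the two inputs are stand-ins I made up, since the real ones depend on what you’re auditing.

```typescript
// Stand-in implementations (hypothetical); the real ones are elided in this post.
function currentWay(a: string): string {
    return a.toUpperCase();
}
function newWay(a: string): string {
    return a.trim().toUpperCase();
}

// Made-up inputs standing in for the result of query().
const inputs: string[] = ['alpha', ' beta '];

function* compareCurrentAndNewInteractively() {
    for (const input of inputs) {
        console.log(currentWay(input));
        console.log(newWay(input));
        yield;
    }
}

const generator = compareCurrentAndNewInteractively();
console.log(generator.next()); // { value: undefined, done: false }
console.log(generator.next()); // { value: undefined, done: false }
console.log(generator.next()); // { value: undefined, done: true }, nothing left to audit
```

To start over, create a fresh generator by calling compareCurrentAndNewInteractively() again; a finished generator stays finished.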

Bells and whistles

The beauty of this is that because this is just vanilla code, it’s easy to extend your audit function to do whatever you want it to do.

Let’s say I want to keep track of which specific output I’m currently auditing. I can do that by adding a simple log function in my yield loop.

function* compareCurrentAndNewInteractively() {
    let counter = 1;
    for(const input of inputs) {

        console.log(`${counter} of ${inputs.length}\n`); // NEW

        console.log(currentWay(input))
        console.log(newWay(input))
        yield;
        counter++; // Advance the position shown in the progress line.
    }
}

Now let’s say I also want to show the current input I’m working on alongside the output. That’s just another log function. Boom.

function* compareCurrentAndNewInteractively() {
    let counter = 1;
    for(const input of inputs) {
        console.log(`${counter} of ${inputs.length}`);

        console.log(`input: ${input}\n`); // NEW

        console.log(currentWay(input))
        console.log(newWay(input))
        yield; // This just yields `undefined`, which is fine for auditing.
        counter++; // Advance the position shown in the progress line.
    }
}

I could even transform the input into some kind of shim before passing it along to the functions I’m auditing.

import transformInput from '../../someFile';

function* compareCurrentAndNewInteractively() {
    let counter = 1;
    for(const input of inputs) {
        console.log(`${counter} of ${inputs.length}`);
        console.log(`input: ${input}\n`);

        const shim = transformInput(input); // NEW

        console.log(currentWay(shim))
        console.log(newWay(shim))
        yield;
        counter++; // Advance the position shown in the progress line.
    }
}

The world is your oyster! It all depends on how you want to go about auditing your data on REPL, and what will help you debug and audit whatever you’re working on.
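As one more illustration of that kind of extension, here’s a sketch that only pauses on inputs where the two outputs disagree, and yields the details instead of nothing. This is just one possible variation: compareMismatchesOnly, the Mismatch shape and the stand-in function bodies are all hypothetical.

```typescript
// Hypothetical stand-ins for the two code paths being compared.
function currentWay(a: string): string {
    return a.toUpperCase();
}
function newWay(a: string): string {
    return a.trim().toUpperCase();
}

interface Mismatch {
    input: string;
    current: string;
    proposed: string;
}

// Pauses only on inputs where the two outputs differ, and yields the details
// so they can be collected as well as read.
function* compareMismatchesOnly(inputs: string[]): Generator<Mismatch, void, unknown> {
    let counter = 0;
    for (const input of inputs) {
        counter++;
        const current = currentWay(input);
        const proposed = newWay(input);
        if (current === proposed) {
            continue; // Outputs agree, nothing to look at.
        }
        console.log(`${counter} of ${inputs.length}`);
        console.log(`input:   ${JSON.stringify(input)}`);
        console.log(`current: ${JSON.stringify(current)}`);
        console.log(`new:     ${JSON.stringify(proposed)}`);
        yield { input, current, proposed };
    }
}

const mismatches = compareMismatchesOnly(['alpha', ' beta ', 'gamma']);
mismatches.next(); // logs the ' beta ' case (2 of 3) and pauses
mismatches.next(); // no more mismatches: { value: undefined, done: true }
```

Comparing with === only works here because the outputs are strings. For objects you would need some kind of deep comparison, for example comparing their JSON.stringify output.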

Copyable gist for your project

Now if you really want to add bells and whistles, you can cheat a little and add a library that stylizes your output as a git diff (I’m using the npm git-diff package here). I did this for work and it honestly makes debugging anything in REPL much, much easier.
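If you can’t pull in a dependency at all, even a very naive comparison gets you part of the way. To be clear, the sketch below is not the git-diff package and not a real diff algorithm: naiveLineDiff is a made-up helper that compares lines position by position, so a single inserted line will make everything after it show up as changed.

```typescript
// Naive positional line comparison (NOT a real diff algorithm and not the
// git-diff package): line i of one text is compared with line i of the other.
function naiveLineDiff(oldText: string, newText: string): string {
    const oldLines = oldText.split('\n');
    const newLines = newText.split('\n');
    const out: string[] = [];
    const longest = Math.max(oldLines.length, newLines.length);
    for (let i = 0; i < longest; i++) {
        if (oldLines[i] === newLines[i]) {
            out.push(`  ${oldLines[i]}`); // Unchanged line.
            continue;
        }
        if (i < oldLines.length) out.push(`- ${oldLines[i]}`);
        if (i < newLines.length) out.push(`+ ${newLines[i]}`);
    }
    return out.join('\n');
}

// Made-up outputs standing in for currentWay(input) and newWay(input).
const oldJson = JSON.stringify({ id: 1, plan: 'free' }, null, 2);
const newJson = JSON.stringify({ id: 1, plan: 'pro' }, null, 2);
console.log(naiveLineDiff(oldJson, newJson));
```

Pretty-printing both objects with JSON.stringify(value, null, 2) first is what makes a line-based comparison useful, since each property lands on its own line.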

Here’s a replit if you want to try it out. It’s a tiny bit janky, but you can pass different objects using differ.replDiff and it demos the functionality well.

If you like what you see, here’s a quick n’ dirty copyable gist that you can cargo cult into your node project. Place it in the file that loads your REPL, hook it up to your REPL context, and let it rip.
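If you’re wondering what “hook it up to your REPL context” might look like, here’s a rough sketch. It assumes you start your REPL with Node’s built-in repl module, whose server object exposes a context that the session sees as globals. The names auditEntries, attachAuditTools, audit and n are made up for this example and are not taken from the gist.

```typescript
// Hypothetical audit generator; swap in your own.
function* auditEntries(entries: string[]) {
    for (const entry of entries) {
        console.log(entry);
        yield;
    }
}

// Attach two helpers to any REPL-like context object.
function attachAuditTools(context: Record<string, unknown>, entries: string[]): void {
    let session = auditEntries(entries);
    // audit() restarts the walk-through from the first entry.
    context['audit'] = () => {
        session = auditEntries(entries);
    };
    // n() advances one entry and reports whether anything is left.
    context['n'] = () => (session.next().done ? 'done' : 'more');
}

// In the file that starts your REPL (assumes Node's built-in `repl` module):
//   import * as repl from 'node:repl';
//   const server = repl.start({ prompt: 'REPL > ' });
//   attachAuditTools(server.context, ['first entry', 'second entry']);
```

With that in place, typing n() in the REPL steps through the entries, which is a bit less to type than generator.next().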

Have fun! I hope you found this useful 🙂