The Boundaries of TypeScript Testing

Published: Mar 29, 2025

Last updated: Mar 29, 2025

Over my years of software engineering, there is a hill that I stumbled upon that I am willing to die on: test coverage does not equate to test confidence.

Increasing your test confidence vector is a hot topic. Some of my greatest mistakes and, as far as I am concerned, my most useless code have been written in the name of testing.

In this post, we'll focus on principles for static analysis, unit testing and integration testing that can increase your confidence vector as much as possible while limiting some of the pain points. Some of these points are certain to be contentious and are unashamedly biased by the parts of testing that have consumed way-too-many-hours-to-admit of my short tenure on this planet.

Companion code

Unlike my other blog posts, this one won't be building anything from scratch. There is a companion project to this blog post that I will often link to and share code from throughout this post.

The principles I will talk through today are mostly agnostic to language, tool and runtime, although whether it's idiomatic or frowned upon depends on your community and how well-equipped your chosen ecosystem is.

We'll be walking through these principles in TypeScript to satiate some of my recent thoughts.

Issues to acknowledge with test coverage

Software teams use a minimum coverage percentage as a quality gate, but what that number is varies from source to source. Google's testing blog has testing coverage values where 60% is acceptable, 75% is commendable and 90% is exemplary. Atlassian's page on code coverage mentions that 80% coverage is good to aim for, but higher coverage might yield diminishing returns.

I personally have two main gripes around test coverage:

  1. The reported coverage can be misleading.
  2. Strict quality gates can lead to writing tests for the sake of writing tests, which leads to bad tests.

To illustrate some of the problems with code coverage, let's start with the basics. Take the following pure function add:

export function add(a: number, b: number) {
  return a + b;
}

We can write a simple test to validate this:

import { expect, test } from "vitest";

import { add } from "./math";

test("should add 1 + 2", () => {
  expect(add(1, 2)).toEqual(3);
});

With vitest, if I test with coverage e.g. vitest --coverage, I get the following output:

--------------------------|---------|----------|---------|---------|-------------------
File                      | % Stmts | % Branch | % Funcs | % Lines | Uncovered Line #s
--------------------------|---------|----------|---------|---------|-------------------
All files                 |     100 |      100 |     100 |     100 |
 utils                    |     100 |      100 |     100 |     100 |
  math.ts                 |     100 |      100 |     100 |     100 |
--------------------------|---------|----------|---------|---------|-------------------

Great! We have 100% coverage! No need for tests that try to trip up the boundaries, right? Well, writing a test to check 1 + 2 isn't exactly adventurous. We've hardly pushed our function, yet it looks as if we're at the finish line.

Other than pushing the limits of our function, what happens if someone comes in and adds some sweet nothings to our function?

interface Loggable {
  log: (msg: string) => void;
}

function sideEffect(logger: Loggable) {
  logger.log("yo");
}

export function add(a: number, b: number, logger: Loggable) {
  sideEffect(logger);
  return a + b;
}

We've introduced a new side effect, but what happens if we update our test to pass console through as the logger argument?

import { expect, test, vi } from "vitest";

import { add } from "./math";

test("should add 1 + 2", () => {
  const result = add(1, 2, console);
  expect(result).toEqual(3);
});

--------------------------|---------|----------|---------|---------|-------------------
File                      | % Stmts | % Branch | % Funcs | % Lines | Uncovered Line #s
--------------------------|---------|----------|---------|---------|-------------------
All files                 |     100 |      100 |     100 |     100 |
 utils                    |     100 |      100 |     100 |     100 |
  math.ts                 |     100 |      100 |     100 |     100 |
--------------------------|---------|----------|---------|---------|-------------------

Still 100%, what gives!? Even in a world where, instead of console, we inject a logger object that throws an error, e.g. add(1, 2, { log: () => { throw new Error('KABOOM') }}), we're still at 100% coverage.
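To actually exercise that boundary, we'd need a test that covers the misbehaving logger explicitly. Here's a minimal sketch of what that could look like (the explodingLogger name is my own, and it assumes we simply want the failure surfaced to the caller):

import { expect, test } from "vitest";

import { add } from "./math";

test("surfaces a logger that throws", () => {
  const explodingLogger = {
    log: () => {
      throw new Error("KABOOM");
    },
  };

  // With the current implementation, the side effect is not guarded,
  // so the error propagates out of add. If we instead decided that add
  // should swallow logger failures, this assertion would flip.
  expect(() => add(1, 2, explodingLogger)).toThrowError("KABOOM");
});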

Worst of all, what happens if they change our test to validate only the logging?

import { expect, test, vi } from "vitest";

import { add } from "./math";

test("should add 1 + 2", () => {
  const spy = vi.spyOn(console, "log");
  add(1, 2, console);
  expect(spy).toHaveBeenCalledOnce();
});

If we check our test coverage again...

--------------------------|---------|----------|---------|---------|-------------------
File                      | % Stmts | % Branch | % Funcs | % Lines | Uncovered Line #s
--------------------------|---------|----------|---------|---------|-------------------
All files                 |     100 |      100 |     100 |     100 |
 utils                    |     100 |      100 |     100 |     100 |
  math.ts                 |     100 |      100 |     100 |     100 |
--------------------------|---------|----------|---------|---------|-------------------

We didn't even test our return value!

Code coverage, at least in the forms that I've seen it, is a guideline. In reality, it's unlikely for me to come across tests that don't verify something as important as the return value. What has been problematic for me, though, has been high test coverage thresholds that force your hand.

This has happened to me plenty of times: I have ended up writing tests that do nothing other than get me past the numbers required to pass CI:

// code
function someService(id: string): { id: string; title: string } {
  return someRepo(id);
}

// test
test("someService", () => {
  const spy = vi.spyOn(someRepo);
  spy.mockImplementationOnce((id: string) => ({ id: "1", title: "title" }));

  const res = someService("123");
  expect(res).toBe({ id: "1", title: "title" });
});

In testing environments that require you to mock out responses with hazy boundaries, you can end up writing tests like this simply to meet coverage requirements.

If you _are_ a previous employer reading this, I apologize for the sins that I have committed.

To summarize: committing to code coverage can be hazardous. If you're going to do it, pick a safe threshold and embed it into the engineering culture at your company to prioritize high-quality tests that aren't easily gamed.
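If you do commit to a threshold, codify it rather than leaving it to reviewer memory. A rough sketch of what that might look like with Vitest's coverage options (the numbers are yours to pick, and the thresholds shape assumes a recent Vitest version):

import { defineConfig } from "vitest/config";

export default defineConfig({
  test: {
    coverage: {
      provider: "v8",
      // A "safe" floor rather than an aspirational one: CI fails below these.
      thresholds: {
        lines: 75,
        functions: 75,
        branches: 75,
        statements: 75,
      },
    },
  },
});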

As we continue into the age of AI assistance, configure your review agents to help scrutinize your choice of edge cases and make sure that any iffy boundaries are covered by tests.

Issues to acknowledge with mocking, spies and test boundaries

Mocks are commonly used within testing to help with isolation. This can be helpful, but mocks can become a crutch when the setup required to accurately represent your system feels non-trivial to configure.

In the most recent code example, we hinted towards test boundaries that can negatively impact our test confidence.

But what is a "test boundary"? My definition of a test boundary is the scope of what we can accurately test against. Accuracy is the crucial point here.

Let's look at our test again, make some adjustments and add some more context on what the someRepo function does:

// code
async function someRepo(id: string): Promise<{ id: string; title: string }> {
  console.log("Attempting fetch from cache");
  const cacheResult = await checkCache(id);
  if (cacheResult) {
    return cacheResult;
  }

  console.log("Attempting fetch from database");
  const dbResult = await fetchFromDb(id);
  return dbResult;
}

async function someService(id: string): Promise<{ id: string; title: string }> {
  return someRepo(id);
}

// test
test("someService", async () => {
  const spy = vi.spyOn(someRepo);
  spy.mockImplementationOnce((id: string) => ({ id: "1", title: "title" }));

  const res = await someService("123");
  expect(res).toBe({ id: "1", title: "title" });
});

In our code now, the someRepo function checks a remote cache for a result, returns it if there is a hit or returns the result from the database. The test hasn't changed from what it was previously.

Ask yourself the question: what boundaries of this functionality are tested with the current implementation? The entire raison d'ĂȘtre for our someService test is to evaluate its functionality, but here we do not step into our code beyond the someService context due to our mock.

In practice, increasing our test confidence requires us to increase how far we can "step into" our code in order to reliably test as close to our system as possible.

As a first pass at our test refactoring, what happens if we update our mocks to move to the boundary where they mock our checkCache and fetchFromDb functions instead?

describe("someService", () => { test("returns expected value from database", async () => { const consoleSpy = vi.spyOn(console); const checkCacheSpy = vi.spyOn(checkCache); const fetchFromDbSpy = vi.spyOn(fetchFromDb); checkCacheSpy.mockImplementationOnce((id: string) => null); fetchFromDbSpy.mockImplementationOnce((id: string) => ({ id: "1", title: "title", })); const res = await someService("123"); expect(res).toBe({ id: "1", title: "title" }); expect(checkCacheSpy).toHaveBeenCalledOnce(); expect(fetchFromDb).toHaveBeenCalledOnce(); expect(console).toHaveBeenCalledTimes(2); }); test("returns expected value from cache", async () => { const consoleSpy = vi.spyOn(console); const checkCacheSpy = vi.spyOn(checkCache); const fetchFromDbSpy = vi.spyOn(fetchFromDb); checkCacheSpy.mockImplementationOnce((id: string) => ({ id: "1", title: "title", })); const res = await someService("123"); expect(res).toBe({ id: "1", title: "title" }); expect(checkCacheSpy).toHaveBeenCalledOnce(); expect(fetchFromDb).not.toHaveBeenCalledOnce(); expect(console).toHaveBeenCalledOnce(); }); });

In our tests now, we have shifted the test boundaries a little deeper to where checkCache and fetchFromDb begin. We can improve on this more, but first let's take a moment to consider our spies and .toHaveBeenCalledOnce calls.

Consider the following refactor:

async function someRepo(id: string): Promise<{ id: string; title: string }> {
  console.log("Running someRepo");
  const cacheResult = await checkCache(id);
  if (cacheResult) {
    return cacheResult;
  }

  // CONSOLE LOG REMOVED
  const dbResult = await fetchFromDb(id);
  return dbResult;
}

In our someRepo code, we've refactored the logs but the core functionality has not changed. If we run our tests though... they fail.

You can probably guess why: one of our tests expects console to have been called twice. This is where we fall into "false negative" territory. As far as desired output behavior goes for our someRepo function, it still operates the same as before, but our tests were written as a false representation of that expected behavior.

Unless it is absolutely necessary to ensure a function is called as part of another, my ethos is that you should omit that assertion entirely. We'll justify the reasons in an upcoming section, but for now my starting guideline is to test the inputs and outputs.

For now, let's refactor the tests back to the following:

describe("someService", () => { test("returns expected value from database", async () => { const checkCacheSpy = vi.spyOn(checkCache); const fetchFromDbSpy = vi.spyOn(fetchFromDb); checkCacheSpy.mockImplementationOnce((id: string) => null); fetchFromDbSpy.mockImplementationOnce((id: string) => ({ id: "1", title: "title", })); const res = await someService("123"); expect(res).toBe({ id: "1", title: "title" }); }); test("returns expected value from cache", async () => { const checkCacheSpy = vi.spyOn(checkCache); checkCacheSpy.mockImplementationOnce((id: string) => ({ id: "1", title: "title", })); const res = await someService("123"); expect(res).toBe({ id: "1", title: "title" }); }); });

Now we can freely refactor non-critical side-effects like console.log without returning false negatives about the desired function's behavior.

An example of where validating calls makes sense is functions whose entire purpose is to emit expected, measurable side-effects that the system should protect from unintended changes.

In the accompanying repository, see these tests for a Proxy where the entire purpose is to emit logs at the beginning and end of a function call.

import { beforeEach, describe, expect, test, vi } from "vitest";

import { addTrace, addTraceCurried } from "./add-trace";
import { container } from "../config/ioc-test";
import { ILoggerService } from "@/services/logger-service";
import { IocKeys } from "@/config/ioc-keys";

class Target {
  needle() {
    return null;
  }

  async needleAsync() {
    return Promise.resolve(null);
  }
}

describe("trace proxies", () => {
  let logger: ILoggerService;

  beforeEach(() => {
    logger = container.get<ILoggerService>(IocKeys.LoggerService);
  });

  describe("addTrace", () => {
    test("invokes two trace logs at the beginning and end of a function", () => {
      const spy = vi.spyOn(logger, "trace");
      const target = addTrace(new Target(), logger);

      expect(target.needle()).toBeNull();
      expect(spy).toHaveBeenCalledTimes(2);
      expect(spy.mock.calls).toEqual([
        [{ args: [], className: "Target" }, `Target.needle() called`],
        [
          { className: "Target", duration: expect.any(String) },
          `Target.needle() returned`,
        ],
      ]);
    });

    test("invokes two trace logs at the beginning and end of an async function", async () => {
      const spy = vi.spyOn(logger, "trace");
      const target = addTrace(new Target(), logger);

      expect(await target.needleAsync()).toBeNull();
      expect(spy).toHaveBeenCalledTimes(2);
      expect(spy.mock.calls).toEqual([
        [{ args: [], className: "Target" }, `Target.needleAsync() called`],
        [
          { className: "Target", duration: expect.any(String) },
          `Target.needleAsync() returned`,
        ],
      ]);
    });
  });

  describe("addTraceCurried", () => {
    test("full application invokes two trace logs at the beginning and end of a function", () => {
      const spy = vi.spyOn(logger, "trace");
      const partiallyAppliedFn = addTraceCurried(logger);
      const target = partiallyAppliedFn(new Target());

      expect(target.needle()).toBeNull();
      expect(spy).toHaveBeenCalledTimes(2);
      expect(spy.mock.calls).toEqual([
        [{ args: [], className: "Target" }, `Target.needle() called`],
        [
          { className: "Target", duration: expect.any(String) },
          `Target.needle() returned`,
        ],
      ]);
    });

    test("invokes two trace logs at the beginning and end of an async function", async () => {
      const spy = vi.spyOn(logger, "trace");
      const partiallyAppliedFn = addTraceCurried(logger);
      const target = partiallyAppliedFn(new Target());

      expect(await target.needleAsync()).toBeNull();
      expect(spy).toHaveBeenCalledTimes(2);
      expect(spy.mock.calls).toEqual([
        [{ args: [], className: "Target" }, `Target.needleAsync() called`],
        [
          { className: "Target", duration: expect.any(String) },
          `Target.needleAsync() returned`,
        ],
      ]);
    });
  });
});

In these tests, I explicitly spy on the instance of the ILoggerService to ensure that logger.trace is called with the expected arguments as that is the purpose of the proxy.

For this particular test, asserting on the dependency calls is as much confidence as I want from these tests.

For the astute, you will have noticed that spies don't inherently mock out functionality unless explicitly told to. This means that I can assert on the desired behavior while testing against a real implementation.

Just to reiterate: I do not use spies like this for every function call.

Remember what we covered about false negatives. I will only ever do this in scenarios where a side effect is the sole desired outcome for the function existing.

In these scenarios, there is normally no response or the side effect has higher priority. In my example, the proxy return type plays second fiddle to the trace logging.

For impure functions that return values that you care about (from databases etc), assert against output and do not get caught up on the implementation details.

In the accompanying codebase, you can see some examples of testing a repository layer that cares more about the output than the implementation details:

describe("getBlog", () => { let post: Post; beforeEach(async () => { post = postFactory.build(); await prisma.post.create({ data: post, }); }); test("can get a blog", async () => { const result = await blogRepository.getBlog({ param: { blogId: post.id } }); expect(result.isOk()).toBe(true); result.map((value) => { expect(value.data?.title).toBe(post.title); expect(value.data?.content).toBe(post.content); }); }); test("can retrieve a blog from cache", async () => { const noCacheResult = await blogRepository.getBlog({ param: { blogId: post.id }, }); expect(noCacheResult.isOk()).toBe(true); noCacheResult.map((value) => { expect(value._cacheHit).toBe(false); }); const result = await blogRepository.getBlog({ param: { blogId: post.id } }); expect(result.isOk()).toBe(true); result.map((value) => { expect(value._cacheHit).toBe(true); }); }); test("returns BlogNotFoundError when there is no matching blog found", async () => { const result = await blogRepository.getBlog({ param: { blogId: faker.string.uuid() }, }); expect(result.isErr()).toBe(true); result.mapErr((e) => { expect(e._tag).toBe("BlogNotFoundError"); }); }); });

Notice that it does not check that certain wrapper library functionality is called. It purely cares about whether or not the functionality returns the expected data and information about hitting the cache. If I swapped out the cache implementation or ORM but kept the contract, then this test would not need to change and it would be the best indicator that I've maintained the expected functionality.

Finally, I wanted to finish by acknowledging an area where I find mocking appropriate at the cost of shortening the test boundary: error cases that are not easily replicated and are explicitly handled for unhappy paths.

Applications can fail in the most bizarre ways, so if you're working with edge cases that are not easily replicated but are actively handled within your application, then mocking will come to the rescue.

For example (albeit contrived), what happens if you're writing code that specifically handles cases where a disk is full?

// file-service.ts
import { promises as fs } from "fs";
import path from "path";
import { err, ok, Result } from "neverthrow";

/**
 * Saves data to a file, with handling for hard-to-replicate edge cases
 * @param filePath Path to save the file
 * @param data Content to write
 */
export async function saveFile(
  filePath: string,
  data: string
): Promise<Result<boolean, { _tag: string; somethingImportant?: boolean }>> {
  try {
    // Create directory if it doesn't exist
    const directory = path.dirname(filePath);
    await fs.mkdir(directory, { recursive: true });

    // Write the file
    await fs.writeFile(filePath, data, { encoding: "utf8" });

    return ok(true);
  } catch (error) {
    if ((error as NodeJS.ErrnoException).code === "ENOSPC") {
      // No space left on device
      // Very difficult to emulate without actually filling the disk
      return err({
        _tag: "DiskFull",
        somethingImportant: true,
      });
    }

    return err({
      _tag: "ContrivedCatchAll",
    });
  }
}

In this example, replicating the disk full scenario is not easily done. Mocking comes to our rescue here:

// file-service.test.ts
import { describe, beforeEach, it, expect, vi } from "vitest";
import { promises as fs } from "fs";

import { saveFile } from "./file-service";

// Mock the fs module
vi.mock("fs", () => ({
  promises: {
    writeFile: vi.fn(),
    mkdir: vi.fn().mockResolvedValue(undefined),
  },
}));

describe("saveFile", () => {
  beforeEach(() => {
    vi.resetAllMocks();
    vi.mocked(fs.mkdir).mockResolvedValue(undefined);
  });

  it("should handle disk full error correctly", async () => {
    // Mock writeFile to throw an ENOSPC error
    const mockError = new Error("No space left on device");
    // Add the code property to the error
    (mockError as any).code = "ENOSPC";

    // Setup the mock to reject with our custom error
    vi.mocked(fs.writeFile).mockRejectedValue(mockError);

    const result = await saveFile("/path/to/file.txt", "test data");

    // Verify the error was handled correctly
    expect(result.isErr()).toBeTruthy();
    result.mapErr((e) => {
      expect(e._tag).toBe("DiskFull");
      expect(e.somethingImportant).toBe(true);
    });
  });
});

Guidelines for test boundaries

With our test boundaries, how far can and should we push? I believe this strongly depends on preferences and configuration, but here are my guidelines:

  1. For tests that exercise functionality making network requests, test out to the network boundary with request interceptors like MSW where possible (see the sketch after this list).
  2. For tests that exercise functionality interfacing directly with databases, attempt to run them against a local copy of the expected database using tools like testcontainers.
  3. For tests that interface with third party library utilities, test without mocking out the utilities.
  4. For other side-effect types like interfacing with file systems, if it's undesirable to write to the file system and clean up, then attempt to mock in-memory. For something like Node.js, there are utilities like mock-fs.
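For the first guideline, here's roughly what intercepting at the network boundary can look like. This is a hedged sketch assuming MSW v2's http/HttpResponse API and a made-up endpoint, not the companion repo's exact handlers:

import { afterAll, afterEach, beforeAll } from "vitest";
import { http, HttpResponse } from "msw";
import { setupServer } from "msw/node";

// Hypothetical endpoint: the real service URL would come from your own config
const server = setupServer(
  http.get("https://api.example.com/blogs/:blogId", ({ params }) => {
    return HttpResponse.json({ id: params.blogId, title: "title" });
  })
);

// Typically wired up once in a shared Vitest setup file
beforeAll(() => server.listen());
afterEach(() => server.resetHandlers());
afterAll(() => server.close());

The code under test still makes a real HTTP call through its real client; only the wire is intercepted, so the serialization and error-handling paths stay inside the test boundary.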

These guidelines begin to blur the definitions between unit testing and integration testing. If you're familiar with the testing pyramid, then you may favor unit tests as your baseline. Over time, I've personally grown more in favor of a model closer to the testing trophy, which places emphasis on static analysis as your baseline and integration testing as the layer to focus on most.

Most of the time, blurring these lines is inconsequential. For projects where tests require more setup and, most importantly, are becoming an acknowledged problem that is time-consuming to run, consider updating your testing configuration to separate the two.

Finally, one common bottleneck that I haven't called out here: to prevent race conditions around shared state for dependencies like databases, you generally need to forego parallelization for tests running on the same machine. I should reiterate that until this becomes a problem, I wouldn't be too concerned about it. Locally, test runners allow running a subset of tests. On CI, there are also strategies to split tests across runners with isolated databases that might be worth considering first. Seek out alternatives and proactively evaluate any changes to your approach.

An example configuration for enforcing single-threaded, sequential testing with Vitest:

import { defineConfig } from "vitest/config";

export default defineConfig({
  test: {
    // ... some data omitted
    pool: "forks",
    poolOptions: {
      forks: {
        singleFork: true,
      },
    },
    globalSetup: ["./src/tests/vitest-global-setup.ts"],
    setupFiles: ["./src/tests/vitest-setup.ts", "./src/tests/matchers.ts"],
  },
});

In my example code, I'm spinning up both a Valkey and Postgres test container, along with using MSW to intercept my HTTP requests to a make-believe service, so I use the global setup and setup files to prepare this.

My global setup will stand up the databases and tear them down at the end of the tests:

import { execSync } from "child_process";

import { psqlContainer, valkeyContainer } from "../lib/testcontainers";

export async function setup() {
  // Start the PostgreSQL and Valkey containers in parallel
  const [psql, valkey] = await Promise.all([psqlContainer, valkeyContainer]);

  const databaseUrl = psql.getConnectionUri();
  process.env.DATABASE_URL = databaseUrl;

  const valkeyUrl = valkey.getConnectionUrl();
  process.env.VALKEY_URL = valkeyUrl;

  // Run migrations and seed your database
  execSync("npx prisma migrate reset --force --skip-seed", {
    stdio: "inherit",
    env: { ...process.env, DATABASE_URL: databaseUrl },
  });
}

export async function teardown() {
  const psql = await psqlContainer;
  const valkey = await valkeyContainer;
  await Promise.all([psql.stop(), valkey.stop()]);
}
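Those container definitions live in ../lib/testcontainers. As a rough idea of what a file like that could contain, here's my own sketch using the @testcontainers/postgresql and @testcontainers/redis modules rather than the companion repo's exact code; the image tags are assumptions:

// lib/testcontainers.ts (sketch)
import { PostgreSqlContainer } from "@testcontainers/postgresql";
import { RedisContainer } from "@testcontainers/redis";

// Exporting the start() promises lets the global setup await both in parallel.
export const psqlContainer = new PostgreSqlContainer("postgres:16-alpine").start();

// Valkey is wire-compatible with Redis, so the Redis module works against it.
export const valkeyContainer = new RedisContainer("valkey/valkey:7.2").start();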

My setup files apply Vitest hooks in order to ensure that my databases and MSW mocks are cleared between runs to prevent race conditions:

import { server } from "../mocks/server";
import { PrismaClient } from "@prisma/client";
import { container } from "../config/ioc-test";
import { beforeAll, afterEach, afterAll, beforeEach } from "vitest";
import { IocKeys } from "../config/ioc-keys";

const keyv = container.get<any>(IocKeys.KeyvClient);
const prisma = container.get<PrismaClient>(IocKeys.PrismaClient);

/**
 * Clean all tables in the database
 */
async function cleanDatabase() {
  // Get all tables
  const tables = await prisma.$queryRaw<Array<{ tablename: string }>>`
    SELECT tablename FROM pg_tables
    WHERE schemaname='public'
    AND tablename NOT IN ('_prisma_migrations')
  `;

  // Truncate each table
  for (const { tablename } of tables) {
    try {
      await prisma.$executeRawUnsafe(`TRUNCATE "${tablename}" CASCADE;`);
    } catch (error) {
      console.log(`Error truncating ${tablename}`);
    }
  }
}

beforeAll(async () => {
  server.listen({ onUnhandledRequest: "bypass" });
}, 30000);

beforeEach(async () => {
  // Clean the database before each test to start fresh
  await cleanDatabase();
});

afterEach(async () => {
  await keyv.clear();
  server.resetHandlers();
});

afterAll(async () => {
  server.close();
  await prisma.$disconnect();
  await keyv.disconnect();
});

In the example code, I've chosen to use table truncation for clearing the relational database between runs as it feels the simplest for my use case.

The time it takes me to run the tests from a fresh start with cached containers is about 7 seconds for the full suite, which includes starting up the Docker containers (albeit there are only ~30 tests).

Remotely, the CI job takes ~16 seconds.

Using real databases comes with more trade-offs than we've covered in this post. If your setup flakes more often than not, then blurring those lines is going to lead to a bad time. If you aren't time-poor, you should address those configuration issues rather than separating the tests and isolating the problem to one set of them.

Test data guidelines

The previous section covered integrating databases into our testing, so let's recap some guidelines around this:

  1. Serialize tests if they share the same database.
  2. Write global setup and teardown functionality to ensure safe setup and teardown of your database(s).
  3. Enforce a clean slate before each test.
  4. Colocate the test data within the test file. It should be easy to track down where data was initialized from (a factory sketch follows this list).
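On the colocation point, the companion repo uses factories (you saw postFactory.build() in the repository tests earlier). A minimal sketch of what such a factory could look like with fishery and faker, assuming a hypothetical Post shape rather than the repo's real Prisma type:

import { Factory } from "fishery";
import { faker } from "@faker-js/faker";

// Hypothetical shape; the real Post type would come from the Prisma client
interface Post {
  id: string;
  title: string;
  content: string;
}

export const postFactory = Factory.define<Post>(() => ({
  id: faker.string.uuid(),
  title: faker.lorem.sentence(),
  content: faker.lorem.paragraphs(2),
}));

// In a test file: const post = postFactory.build({ title: "Known title" });

Overriding only the fields a test cares about keeps the data visible in the test while the factory handles the noise.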

There may be instances where some of these guidelines don't apply. For example, here are some possible options for database isolation with different cost trade-offs:

  1. Database isolation per test.
  2. Transactions per test that are rolled back.
  3. Truncating tables after each test.

In addition to that, testing in CI may open up avenues to split tests across shards, which can allow you to configure a different database per shard.

The problems with emulation

I'll keep this section short, but as a general rule of thumb: if you are going to test your application beyond the application boundary, then beware emulation. If you can't use the actual technology locally, then question its use at all.

This is more a warning than a "never do it" scenario, because there are times where things like drop-in replacements for providing in-memory APIs can be incredibly useful.

In general though, if you're using a third-party tool to emulate a closed-source product in your environment, know that any breaking changes or upgrades to the original product will cause you grief (on top of any bugs or lack of support in the third-party software). Weigh up the trade-offs before implementing something like this.

Cloud technologies are an example of something I will never bother to emulate for testing or development in my personal work. If something is not officially provided to run locally, I don't bother. I learned this the hard way after spending way too many hours debugging third-party emulators when running on the actual platform worked as expected.

You're better off testing on cloud hardware if you ever end up in a scenario like this. Once again, evaluate whether or not the cost and complexity of doing so is worth it.

Pure functions and bounded domains and ranges

Pure functions have two properties:

  1. No side-effects: Things like logging, interfacing with networks or file systems constitute a side-effect.
  2. Same input == same output: When passing the exact same arguments in, we expect the same result every time.

Let's go back to a previous example of implementing add:

interface Loggable {
  log: (msg: string) => void;
}

function sideEffect(logger: Loggable) {
  logger.log("yo");
}

export function add(a: number, b: number, logger: Loggable) {
  sideEffect(logger);
  return a + b;
}

We know this violates rule (1) of pure functions due to its side-effect, so let's refactor a little more and add some constraints around our add function:

function constrainedAdd(a: number, b: number) {
  if (isNaN(a) || isNaN(b)) {
    return 0;
  }

  a = a < 0 ? 0 : a > 2 ? 2 : a;
  b = b < 0 ? 0 : b > 2 ? 2 : b;

  return Math.floor(a) + Math.floor(b);
}

This completely useless function will add two integers together, with any argument smaller than 0 being set to 0 and any argument larger than 2 being clamped to 2. It also floors each argument, so the numbers added are always integers.

What happens if we were to put it to use?

constrainedAdd(1, 2); // 3
constrainedAdd(-1, -1); // 0
constrainedAdd(-1, 3); // 2
constrainedAdd(3, 3); // 4
constrainedAdd(1.9, 2.1); // 3
constrainedAdd(NaN, 2); // 0

This function may appear useless, but it is a pure function.

  1. There are no side effects.
  2. Each valid input will return the same output.

Here is another useful tidbit about pure functions: functions that are only composed of other pure functions are themselves pure. Assuming Math.floor and isNaN themselves are pure functions, our constrainedAdd does not violate any rules on pure functions. This has some useful properties that make our lives with testing much easier.

The next important lesson we can take away from constrainedAdd is how we define the domain and range of the function.

The domain describes the set of valid inputs, and for a pure function every input in the domain maps directly to a value in the range, the set of all possible outputs.

For our function, our domain for the arguments can be described as any valid number, but with this implementation it's our range that is most interesting. Our output is locked to an integer within the set {0, 1, 2, 3, 4}.

The beauty of pure functions is that they make it easier to test against edge cases once you understand the domain and range.

it.each([
  // [a, b, expected]
  [1, 2, 3], // Normal case within constraints
  [-1, -1, 0], // Both negative values constrained to 0
  [-1, 3, 2], // Negative and above-constraint combination
  [3, 3, 4], // Both values above constraints
  [1.9, 2.1, 3], // Decimal values with flooring and constraints
  [NaN, 2, 0], // NaN constrains the output to 0
])("adds constrained %f and %f to equal %i", (a, b, expected) => {
  expect(constrainedAdd(a, b)).toBe(expected);
});

In a test like the one above, we increase our test confidence vector by increasing our assertion coverage for as many edge cases as we can infer from the domain and range.

It may not always be feasible to calculate the domain and range and test every possible option. It's worth trying to work out these bounds as you write functions though, as it can help you understand the complexity of what you are writing from a practical standpoint.

Although math and numbers are the most relatable when it comes to domain and range (for anyone stuck doing Calculus anyway), this also applies to any argument where the output is consistent for the same input.

The following is also a pure function:

function getSecret(code: "alpha" | "bravo" | "delta") {
  switch (code) {
    case "alpha":
      return "hello";
    case "bravo":
      return "from";
    case "delta":
      return "planet earth";
  }
}

In the above, our domain is one of three values alpha, bravo or delta while the range is hello, from and planet earth.
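Because this domain is tiny and closed, we can cover it exhaustively. A small sketch (assuming getSecret from above is in scope or imported):

import { expect, it } from "vitest";

const cases = [
  // [code, expected]
  ["alpha", "hello"],
  ["bravo", "from"],
  ["delta", "planet earth"],
] as const;

it.each(cases)("maps %s to %s", (code, expected) => {
  expect(getSecret(code)).toBe(expected);
});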

In hindsight after writing this, explaining domain and range with getSecret seems easier to understand than my constrainedAdd function...

Functional core, imperative shell

This pattern is something I first came across at Ruby conferences five or so years ago and it is a pattern that I've written about before.

The heart of this concept lies in what we've learned so far about pure functions and composition. Pure functions are easier to reason about, and so if you start with your "deepest" code being composed of pure functions (a.k.a. a functional core), then testing those functions can yield more consistency.

destroyallsoftware has a great screencast outlining this pattern.

Not to mislead anyone with this analogy, but when I visualize this pattern then I simply think of a planet where the "core" is the deepest part.

In practical terms, if we consider a controller-service-repository application, we already know that we can't avoid side-effects in our repository layer. That's okay; the idea is to push as much of our impure logic upward until we hit our "imperative shell". The boundary between the shell and the core can be thought of as fluid.

For example, take the following:

function functionB() {
  console.log("core started");
  somePureFunction();
  console.log("core ended");
}

function functionA() {
  functionB();
}

functionA();

In its current state, assuming that somePureFunction is in fact pure, our functional core boundary starts at the somePureFunction invocation (and our imperative shell ends there), because functionB is impure thanks to the logs.

Let's refactor again:

function functionB() {
  somePureFunction();
}

function functionA() {
  console.log("core started");
  functionB();
  console.log("core ended");
}

functionA();

If we "push up" the logs to functionA, then functionB through composition has become a pure function itself, so now the functional core starts at the invocation of functionB.

In my mind, the ideology of "functional core, imperative shell" doesn't dictate that certain layers are entirely pure or impure. I think of it as "dipping into" functional cores at those boundaries. You can have methods within your service layer that are entirely pure, and others that are impure because they call into functionality that interfaces with something like our database.

Something worth highlighting is that my "production" Inversify container injects a logging proxy into the layers with addTrace, whereas my Inversify test container does not. This means that, when testing, none of the composed classes are automatically made impure by the proxy. This is a purposeful decision.

In practice, I find that service-layer methods are the point where you normally "dip in and out" of the line between the functional core and imperative shell, but that's not a hard rule. Business logic that operates on concrete values can normally be implemented as pure functions, while calls out to the database can only ever be impure.
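To make that concrete, here's a hypothetical service sketched in that style: the impure repository call sits in the shell, while the business rule is a pure function that's trivial to test on its own. All of the names here are made up for illustration.

// Hypothetical repository contract (impure: it talks to a database)
interface CustomerRepo {
  getLoyaltyYears(customerId: string): Promise<number>;
}

// Functional core: pure business rule, trivially testable with it.each
export function applyDiscount(priceInCents: number, loyaltyYears: number): number {
  const rate = loyaltyYears >= 5 ? 0.2 : loyaltyYears >= 1 ? 0.1 : 0;
  return Math.round(priceInCents * (1 - rate));
}

// Imperative shell: the service method dips into the core after the side effect
export class PricingService {
  constructor(private readonly repo: CustomerRepo) {}

  async priceForCustomer(customerId: string, priceInCents: number): Promise<number> {
    const loyaltyYears = await this.repo.getLoyaltyYears(customerId); // side effect lives here
    return applyDiscount(priceInCents, loyaltyYears);
  }
}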

Dependency injection

Dependency injection has many benefits, but it also enables a level of control that can be second-to-none around testing. Inverting control of what is constructed for a class instance can be a powerful tool.

Take the following code:

interface IUser {
  id: string;
  name: string;
}

interface IUserRepository {
  findById(id: string): Promise<IUser>;
}

class UserRepository implements IUserRepository {
  async findById(id: string) {
    return someWayToGetUser(id);
  }
}

class Service {
  private userRepo: IUserRepository;

  constructor(userRepo: IUserRepository) {
    this.userRepo = userRepo;
  }

  async findUser(id: string) {
    const user = await this.userRepo.findById(id);
    return user;
  }
}

In our contrived code, we've inverted control over the userRepo to be anything that adheres to the IUserRepository contract.

When it comes to testing, we can inject anything that adheres to the IUserRepository contract, be that a real UserRepository instance or a test double.
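Continuing the contrived example, a test can hand Service an in-memory implementation of the contract without reaching for a mocking library at all. A sketch (the in-memory class is my own invention):

import { expect, test } from "vitest";

// In-memory stand-in that satisfies the IUserRepository contract
class InMemoryUserRepository implements IUserRepository {
  constructor(private readonly users: IUser[]) {}

  async findById(id: string) {
    const user = this.users.find((u) => u.id === id);
    if (!user) {
      throw new Error(`No user with id ${id}`);
    }
    return user;
  }
}

test("findUser returns the user from the injected repository", async () => {
  const repo = new InMemoryUserRepository([{ id: "123", name: "Test User" }]);
  const service = new Service(repo);

  expect(await service.findUser("123")).toEqual({ id: "123", name: "Test User" });
});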

That all being said, please don't forget the guidelines. Unless given an explicit reason to implement a mock, stick as close to the real thing as you can.

In cases where you use tools like Effect, tsyringe or Inversify, this can make it trivial to provide test-specific dependencies where required.

Within the accompanying repository, I've configured Inversify with a test-specific container. This testing container can be used to supply the mock implementations or remove unnecessary behaviour (like I have with removing higher-order proxies).

Conclusion

Today's post covered a lot of testing methodology and guidelines to help you reason about what effective testing can look like for you.

A recap of the most important talking points:

  ‱ The problem with code coverage
  ‱ The impact mocking can have on your test confidence vector
  ‱ Guidelines for test boundaries and test data
  ‱ Pure functions, domains and ranges, and the "functional core, imperative shell" pattern
  ‱ Dependency injection as a tool for controlling test boundaries

