Why We Reconsidered AI for a Tax Product
AI Was the Obvious Choice
Once we were clear about the problem we wanted to solve, AI felt like the natural next step.
Tax filing is full of:
- Conditional logic
- Edge cases
- Long explanations hidden behind short questions
Modern language models are very good at this kind of work. They can explain, summarize, and guide users step by step in plain language.
So like most teams, we started with proprietary LLM APIs.
They helped us move fast.
Shipping Fast Came with Assumptions
Early on, the tradeoffs seemed acceptable.
We assumed:
- API costs would remain manageable
- Privacy guarantees would be “good enough”
- We could always revisit these decisions later
Those assumptions held while we were prototyping.
They started breaking down once we thought seriously about production.
Cost Becomes a Product Constraint
Tax workflows are not short conversations.
Users ask follow-up questions. They revisit answers. They need confirmation and reassurance.
That means:
- Long prompts
- Long responses
- Repeated interactions over time
At scale, token-based pricing turns into a variable cost that grows faster than user numbers, because each follow-up turn resends the entire conversation so far.
For a product meant to be affordable and accessible, that matters.
Cost stopped being an infrastructure detail and started becoming a product constraint.
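To make the shape of that cost concrete, here's a rough back-of-envelope sketch. Every number in it (prices, token counts, turn counts) is an illustrative assumption, not our real usage or any provider's actual pricing.

```python
# Illustrative estimate of token cost for a multi-turn tax conversation.
# All numbers below are assumptions for the sake of the sketch.

PRICE_PER_1K_INPUT = 0.01   # assumed USD per 1K input tokens
PRICE_PER_1K_OUTPUT = 0.03  # assumed USD per 1K output tokens

SYSTEM_PROMPT_TOKENS = 1_500  # tax rules and instructions, resent every turn
USER_TURN_TOKENS = 150        # average question length
MODEL_TURN_TOKENS = 400       # average answer length

def conversation_cost(turns: int) -> float:
    """Cost of one conversation where each turn resends the growing history."""
    cost = 0.0
    history = SYSTEM_PROMPT_TOKENS
    for _ in range(turns):
        history += USER_TURN_TOKENS
        cost += history / 1000 * PRICE_PER_1K_INPUT        # full context goes back in
        cost += MODEL_TURN_TOKENS / 1000 * PRICE_PER_1K_OUTPUT
        history += MODEL_TURN_TOKENS                       # the answer joins the context
    return cost

for turns in (5, 15, 30):
    print(f"{turns:>2} turns: ~${conversation_cost(turns):.2f}")
```

The exact figures don't matter; the shape does. Input cost grows roughly quadratically with conversation length, because every follow-up pays again for everything said before.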
Privacy Is Not an Abstraction
Cost was only part of the picture.
Tax data includes:
- Income details
- Employment history
- Residency status
- Family information
Sending that data to third-party models, even with strong contractual terms, introduces risk.
It also creates a trust gap.
If users are expected to share sensitive financial information, they deserve clarity about where that data goes and how it is handled.
For us, privacy could not be an afterthought.
The Shift Toward Open Models
These concerns pushed us toward a different approach.
Instead of treating AI as a remote service, we started thinking of it as part of our own system.
Open-source models offered:
- Predictable infrastructure costs
- Full control over data flow
- The ability to adapt models to our domain
Fine-tuning made it possible to shape model behavior for our domain without user data ever leaving our environment.
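For readers who haven't seen the mechanics, a parameter-efficient fine-tune of an open model can run entirely on your own hardware. Here's a minimal sketch using Hugging Face's transformers and peft libraries; the model name, target modules, and hyperparameters are placeholders, not our actual configuration.

```python
# Minimal LoRA fine-tuning sketch that stays on our own machines.
# Model name and hyperparameters are placeholders, not our shipped config.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

BASE_MODEL = "mistralai/Mistral-7B-v0.1"  # any locally hosted open-weights model

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

# LoRA trains small adapter matrices instead of the full weights,
# which keeps fine-tuning feasible on a single local GPU.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections, a common choice
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of the weights

# From here, training data stays on local disk and gradients stay on our GPUs;
# nothing is sent to a third-party API.
```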
The tradeoff was clear.
We would be taking on more engineering responsibility.
Fine-Tuning Is Only the Beginning
Fine-tuning a model is relatively straightforward.
Running it reliably in production is not.
Once we committed to open models, new questions appeared:
- How do we ensure reproducible training runs?
- How do we track experiments and changes?
- How do we serve models safely?
- How do we roll back when something breaks?
At that point, we were no longer just building a product.
We were building an MLOps stack.
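As one small illustration of what that stack involves: even before any dedicated tooling, pinned seeds plus a per-run manifest go a long way toward making training runs reproducible and auditable. The sketch below is a generic pattern, not our actual tooling; the paths and fields are illustrative.

```python
# Sketch of seed pinning plus a per-run manifest, so a training run can be
# reproduced and audited later. Generic pattern; fields are illustrative.
import hashlib
import json
import platform
import random
import subprocess
import sys
from datetime import datetime, timezone

import torch

def set_seeds(seed: int) -> None:
    """Pin the RNGs we rely on so a rerun follows the same trajectory."""
    random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

def write_manifest(config: dict, path: str = "run_manifest.json") -> None:
    """Record everything needed to answer 'what exactly produced this model?'"""
    manifest = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "config": config,
        "config_hash": hashlib.sha256(
            json.dumps(config, sort_keys=True).encode()
        ).hexdigest(),
        "git_commit": subprocess.check_output(
            ["git", "rev-parse", "HEAD"]
        ).decode().strip(),
        "python": sys.version,
        "platform": platform.platform(),
        "packages": subprocess.check_output(
            [sys.executable, "-m", "pip", "freeze"]
        ).decode().splitlines(),
    }
    with open(path, "w") as f:
        json.dump(manifest, f, indent=2)

config = {"base_model": "placeholder/model", "lr": 2e-4, "epochs": 3, "seed": 42}
set_seeds(config["seed"])
write_manifest(config)
```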
Why We Started Local First
Before thinking about cloud infrastructure or scaling, we focused on fundamentals.
We wanted to be able to:
- Train models on our own machines
- Reproduce results days or weeks later
- Understand failures without guesswork
That decision led us to a local-first, Docker-based MLOps setup.
It wasn’t the fastest path, but it was the most honest one.
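To give a flavor of what "Docker-based" means here: training runs launch inside pinned images, so the environment ships with the code. The snippet below uses the docker Python SDK; the image tag, mount paths, and entrypoint are placeholder assumptions, not our real setup.

```python
# Sketch of launching a training run inside a pinned Docker image via the
# docker SDK (pip install docker). Image tag, mounts, and train.py are
# placeholders; the point is that the CUDA stack and Python dependencies
# are frozen into one versioned artifact alongside the code.
import docker

client = docker.from_env()

logs = client.containers.run(
    image="taxbot-train:0.3.1",  # pinned tag = pinned environment
    command="python train.py --config /configs/run.yaml",
    volumes={
        "/data/train": {"bind": "/data", "mode": "ro"},      # read-only data in
        "/artifacts": {"bind": "/artifacts", "mode": "rw"},  # checkpoints out
    },
    device_requests=[  # expose the local GPUs to the container
        docker.types.DeviceRequest(count=-1, capabilities=[["gpu"]])
    ],
    remove=True,  # the container is disposable; the artifacts are not
)
print(logs.decode())
```

Reusing the same pinned image for serving is one way to close the "works on my machine" gap between training and inference.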
What’s Next
In the next post, I’ll walk through why fine-tuning alone is not enough, and how environment drift, dependency issues, and GPU setup quickly become the real bottlenecks.
This is where MLOps stops being theoretical and starts becoming necessary.
If you’re building AI-powered products in regulated or trust-sensitive domains, these tradeoffs will feel familiar.
We’re still learning. This blog is part of that process.
