Why We Reconsidered AI for a Tax Product
AI Was the Obvious Choice
Once we were clear about the problem we wanted to solve, AI felt like the natural next step.
Tax filing is full of:
- Conditional logic
- Edge cases
- Long explanations hidden behind short questions
Modern language models are very good at this kind of work. They can explain, summarize, and guide users step by step in plain language.
So like most teams, we started with proprietary LLM APIs.
They helped us move fast.
Shipping Fast Came with Assumptions
Early on, the tradeoffs seemed acceptable.
We assumed:
- API costs would remain manageable
- Privacy guarantees would be “good enough”
- We could always revisit these decisions later
Those assumptions held while we were prototyping.
They started breaking down once we thought seriously about production.
Cost Becomes a Product Constraint
Tax workflows are not short conversations.
Users ask follow-up questions. They revisit answers. They need confirmation and reassurance.
That means:
- Long prompts
- Long responses
- Repeated interactions over time
At scale, token-based pricing turns into a variable cost that grows faster than user numbers, because each follow-up turn resends the entire conversation so far.
For a product meant to be affordable and accessible, that matters.
Cost stopped being an infrastructure detail and started becoming a product constraint.
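To make the shape of that cost concrete, here's a rough back-of-envelope sketch. Every number in it (prices, token counts, turn counts) is an illustrative assumption, not our real usage or any provider's actual pricing.

```python
# Illustrative estimate of token cost for a multi-turn tax conversation.
# All numbers below are assumptions for the sake of the sketch.

PRICE_PER_1K_INPUT = 0.01   # assumed USD per 1K input tokens
PRICE_PER_1K_OUTPUT = 0.03  # assumed USD per 1K output tokens

SYSTEM_PROMPT_TOKENS = 1_500  # tax rules and instructions, resent every turn
USER_TURN_TOKENS = 150        # average question length
MODEL_TURN_TOKENS = 400       # average answer length

def conversation_cost(turns: int) -> float:
    """Cost of one conversation where each turn resends the growing history."""
    cost = 0.0
    history = SYSTEM_PROMPT_TOKENS
    for _ in range(turns):
        history += USER_TURN_TOKENS
        cost += history / 1000 * PRICE_PER_1K_INPUT        # full context goes back in
        cost += MODEL_TURN_TOKENS / 1000 * PRICE_PER_1K_OUTPUT
        history += MODEL_TURN_TOKENS                       # the answer joins the context
    return cost

for turns in (5, 15, 30):
    print(f"{turns:>2} turns: ~${conversation_cost(turns):.2f}")
```

The exact figures don't matter; the shape does. Input cost grows roughly quadratically with conversation length, because every follow-up pays again for everything said before.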
Privacy Is Not an Abstraction
Cost was only part of the picture.
Tax data includes:
- Income details
- Employment history
- Residency status
- Family information
Sending that data to third-party models, even with strong contractual terms, introduces risk.
It also creates a trust gap.
If users are expected to share sensitive financial information, they deserve clarity about where that data goes and how it is handled.
For us, privacy could not be an afterthought.
The Shift Toward Open Models
These concerns pushed us toward a different approach.
Instead of treating AI as a remote service, we started thinking of it as part of our own system.
Open-source models offered:
- Predictable infrastructure costs
- Full control over data flow
- The ability to adapt models to our domain
Fine-tuning made it possible to shape model behavior for our domain without user data ever leaving our environment.
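For readers who haven't seen the mechanics, a parameter-efficient fine-tune of an open model can run entirely on your own hardware. Here's a minimal sketch using Hugging Face's transformers and peft libraries; the model name, target modules, and hyperparameters are placeholders, not our actual configuration.

```python
# Minimal LoRA fine-tuning sketch that stays on our own machines.
# Model name and hyperparameters are placeholders, not our shipped config.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

BASE_MODEL = "mistralai/Mistral-7B-v0.1"  # any locally hosted open-weights model

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

# LoRA trains small adapter matrices instead of the full weights,
# which keeps fine-tuning feasible on a single local GPU.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections, a common choice
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of the weights

# From here, training data stays on local disk and gradients stay on our GPUs;
# nothing is sent to a third-party API.
```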
The tradeoff was clear.
We would be taking on more engineering responsibility.
Fine-Tuning Is Only the Beginning
Fine-tuning a model is relatively straightforward.
Running it reliably in production is not.
Once we committed to open models, new questions appeared:
- How do we ensure reproducible training runs?
- How do we track experiments and changes?
- How do we serve models safely?
- How do we roll back when something breaks?
At that point, we were no longer just building a product.
We were building an MLOps stack.
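As one small illustration of what that stack involves: even before any dedicated tooling, pinned seeds plus a per-run manifest go a long way toward making training runs reproducible and auditable. The sketch below is a generic pattern, not our actual tooling; the paths and fields are illustrative.

```python
# Sketch of seed pinning plus a per-run manifest, so a training run can be
# reproduced and audited later. Generic pattern; fields are illustrative.
import hashlib
import json
import platform
import random
import subprocess
import sys
from datetime import datetime, timezone

import torch

def set_seeds(seed: int) -> None:
    """Pin the RNGs we rely on so a rerun follows the same trajectory."""
    random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

def write_manifest(config: dict, path: str = "run_manifest.json") -> None:
    """Record everything needed to answer 'what exactly produced this model?'"""
    manifest = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "config": config,
        "config_hash": hashlib.sha256(
            json.dumps(config, sort_keys=True).encode()
        ).hexdigest(),
        "git_commit": subprocess.check_output(
            ["git", "rev-parse", "HEAD"]
        ).decode().strip(),
        "python": sys.version,
        "platform": platform.platform(),
        "packages": subprocess.check_output(
            [sys.executable, "-m", "pip", "freeze"]
        ).decode().splitlines(),
    }
    with open(path, "w") as f:
        json.dump(manifest, f, indent=2)

config = {"base_model": "placeholder/model", "lr": 2e-4, "epochs": 3, "seed": 42}
set_seeds(config["seed"])
write_manifest(config)
```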
Why We Started Local First
Before thinking about cloud infrastructure or scaling, we focused on fundamentals.
We wanted to be able to:
- Train models on our own machines
- Reproduce results days or weeks later
- Understand failures without guesswork
That decision led us to a local-first, Docker-based MLOps setup.
It wasn’t the fastest path, but it was the most honest one.
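To give a flavor of what "Docker-based" means here: training runs launch inside pinned images, so the environment ships with the code. The snippet below uses the docker Python SDK; the image tag, mount paths, and entrypoint are placeholder assumptions, not our real setup.

```python
# Sketch of launching a training run inside a pinned Docker image via the
# docker SDK (pip install docker). Image tag, mounts, and train.py are
# placeholders; the point is that the CUDA stack and Python dependencies
# are frozen into one versioned artifact alongside the code.
import docker

client = docker.from_env()

logs = client.containers.run(
    image="taxbot-train:0.3.1",  # pinned tag = pinned environment
    command="python train.py --config /configs/run.yaml",
    volumes={
        "/data/train": {"bind": "/data", "mode": "ro"},      # read-only data in
        "/artifacts": {"bind": "/artifacts", "mode": "rw"},  # checkpoints out
    },
    device_requests=[  # expose the local GPUs to the container
        docker.types.DeviceRequest(count=-1, capabilities=[["gpu"]])
    ],
    remove=True,  # the container is disposable; the artifacts are not
)
print(logs.decode())
```

Reusing the same pinned image for serving is one way to close the "works on my machine" gap between training and inference.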
What’s Next
In the next post, I’ll walk through why fine-tuning alone is not enough, and how environment drift, dependency issues, and GPU setup quickly become the real bottlenecks.
This is where MLOps stops being theoretical and starts becoming necessary.
If you’re building AI-powered products in regulated or trust-sensitive domains, these tradeoffs will feel familiar.
We’re still learning. This blog is part of that process.
