Case Study: Local AI Research Infrastructure for a Thai Fintech — Confidential Signals Without Cloud Leakage
Illustrative scenario — how a Thai fintech firm could run AI-assisted market research and internal reasoning on confidential positions without ever sending a single data point to a cloud LLM.
TL;DR: Illustrative scenario — a Thai fintech or asset manager cannot legally or prudently run AI analysis on its proprietary positions via cloud APIs. A Local AI stack (Qwen3-32B plus optional LoRA fine-tuning on a 48 GB dual-GPU workstation; ~320,000-380,000 THB hardware + 80,000-120,000 THB setup) gives the quant/research team a confidential reasoning assistant for internal use. Crucially: this is research infrastructure, not a licensed financial service — it augments human analysts, not replaces them.
Key facts
- This is an illustrative scenario, not a real client engagement. The firm described is hypothetical.
- Not a regulated financial service: we build research infrastructure. Licensed personnel at the client remain responsible for investment decisions.
- Typical hardware: 48 GB dual-GPU workstation (~320,000-380,000 THB); dedicated setup 80,000-120,000 THB to handle extra security requirements for sensitive financial data.
- Stack: Qwen3-32B dense as the reasoning workhorse, Qwen3-Next-80B on standby for the hardest prompts, full Local RAG over the firm’s proprietary research library.
- Optional: LoRA / DPO fine-tuning of smaller models on the firm’s internal style and domain terminology — this is where our prior trading-ML research work is directly relevant.
- Expected output: 2-3x faster turnaround on routine research tasks (regulatory summaries, news digestion, memo drafting); zero data leaves the firm’s network.
Why this case study exists
Transparency disclosure: this is not a real engagement. No Thai fintech has hired us for exactly this pattern. We wrote it for two reasons. First, fintech clients considering AI in 2026 face unique constraints (regulatory, operational, information-security) that generic consulting pages do not address. Second, our editor has documented real research experience in trading-domain ML on this very site — XAUUSD benchmark work, regime detection, LoRA fine-tuning of router models — and prospective fintech clients reasonably want to see how that real research translates to what we would offer as a service.
When we have real fintech engagements with client consent, we will publish them separately. For now, this scenario illustrates how we think about the problem.
The scenario (illustrative)
A Thai fintech firm — 30 people, roughly equal split between tech, research/quant, and compliance. They manage assets and run proprietary strategies; exact AUM and strategy mix are confidential by definition. Their research team (5 people) spends a significant fraction of each week on tasks that are repetitive but non-trivial: summarising 50-page regulatory filings, cross-referencing news flow against current positions, drafting internal decision memos, translating English research into Thai client summaries. They have experimented with ChatGPT and Claude for exactly these tasks, and the output quality is good — but their CISO and compliance officer both put a stop to it the moment they realised “paste this 10-K into ChatGPT” was happening on confidential documents.
They reach out to explore a Local AI stack that gives the research team productivity without the privilege or data-exfiltration risk.
Why this matters for Thai fintech in 2026
Thai fintech firms operate under multiple overlapping regimes: Bank of Thailand and SEC Thailand regulations on investment activities, PDPA on customer data, and fiduciary duties to investors that no licence document explicitly spells out but which any compliance officer understands. The result is that cloud AI — where the vendor’s terms of service are controlled by someone outside your jurisdiction and your data is processed by someone outside your professional-standards framework — is frequently an unacceptable match for confidential workflows, even when the productivity gain is real.
Local AI is increasingly the only answer that satisfies both the operational and compliance layers. The question is execution: most AI consultants do not understand what a quant research workflow actually looks like, and most quant consultants do not understand what running Local AI at scale actually requires. The overlap is rare; fintech firms looking for this skill combination should expect a short list of providers.
The constraints we would work within
- Zero data exfiltration: no prompt, document, position, or metadata leaves the firm’s network. Not to us, not to the model vendor, not to any telemetry endpoint.
- Audit trail: every LLM query logged with user, time, input content (or a secure hash), retrieved documents, and output. Retention aligned with the firm’s regulator-facing data-retention policy. A logging sketch follows this list.
- Role-based access: a junior analyst cannot use the LLM to route around access controls and read senior strategy documents. Enforcement is at the retrieval layer, not in the prompt.
- Separation from execution: the LLM never touches order systems, portfolio-management systems, or risk-limit systems. This is a boundary both for technical safety and for regulatory clarity about what the AI is and is not doing.
- Model version pinning: same reason as in our clinic case. You do not want a reasoning assistant silently changing its behaviour across quarters.
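Concretely, the audit-trail constraint could look like the following. A minimal sketch, assuming JSON-lines records appended to write-once storage; the field names, log path, and pinned-model string are illustrative, not a fixed schema.

```python
# A minimal audit-record sketch, assuming JSON-lines output to
# write-once (WORM) storage. Field names, the log path, and the
# pinned-model string are illustrative placeholders.
import hashlib
import json
import time
import uuid

def audit_record(user_id: str, role: str, prompt: str,
                 retrieved_doc_ids: list[str], output: str) -> dict:
    """Build one audit entry. Prompt and output are stored as SHA-256
    hashes so the log never becomes a second copy of confidential text."""
    return {
        "event_id": str(uuid.uuid4()),
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "user_id": user_id,
        "role": role,
        "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "retrieved_doc_ids": retrieved_doc_ids,
        "output_sha256": hashlib.sha256(output.encode("utf-8")).hexdigest(),
        "model_version": "qwen3-32b@<pinned-revision>",  # placeholder pin
    }

def append_to_worm_log(record: dict, path: str = "/mnt/worm/audit.jsonl") -> None:
    # Append-only by convention here; actual immutability is enforced
    # by the storage layer, not by this function.
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```

Storing hashes rather than raw content keeps the log itself from becoming a second confidential corpus, while still letting an auditor verify exactly which prompt produced which output.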
What we would propose
Hardware: a 48 GB dual-GPU workstation. RTX 5090 (24 GB) + RTX 5080 (16 GB) or equivalent single-box configuration. 128 GB system RAM. 4 TB NVMe vault for the research corpus. Dual UPS. Secure server room with restricted access. Hardware capex 320,000-380,000 THB; elevated setup fee (80,000-120,000 THB) because we treat fintech security as a first-class requirement.
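A back-of-envelope check on why 48 GB is the target, assuming roughly 4-bit quantized weights; the figures are rough planning numbers, not measured footprints.

```python
# Back-of-envelope VRAM budget for the primary model on a 48 GB pool.
# Assumes ~4-bit quantized weights (~0.55 bytes/param including scales);
# KV-cache and overhead figures are rough allowances, not measurements.
params_b = 32            # Qwen3-32B parameters, in billions
bytes_per_param = 0.55   # ~4-bit quantization with metadata overhead
weights_gb = params_b * bytes_per_param   # ~17.6 GB
kv_cache_gb = 8.0        # allowance for long-context research sessions
runtime_overhead_gb = 3.0
total_gb = weights_gb + kv_cache_gb + runtime_overhead_gb
print(f"Estimated footprint: {total_gb:.1f} GB of 48 GB pooled VRAM")
```

At roughly 29 GB for the workhorse, the remaining headroom is what makes staging a heavily quantized escalation model on the same pool plausible at all.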
Software stack:
- Qwen3-32B dense as the primary reasoning model. Dense over MoE here — same reason as the law firm case: legal and financial language is pedantic, and dense models outperformed MoE by 35-40 points on the hardest 10% of prompts in our internal benchmarks.
- Qwen3-Next-80B on standby in the 48 GB VRAM pool, for escalation on the hardest long-form reasoning (complex macro analysis, multi-document synthesis, unusual cross-asset questions).
- A RAG system over the firm’s internal research corpus (10k-100k documents: filings, research notes, internal memos, translated research).
- nomic-embed-text or a similar multilingual embedder for the vector index; Qdrant or Weaviate as the vector DB.
- Role-based access control enforced at the retrieval layer; the LLM sees only documents the user is authorised to see (see the retrieval sketch after this list).
- Optional LoRA fine-tuning on the firm’s internal tone and domain terminology, via the same DPO/LoRA workflow our editor has used on trading-domain ML work (documented in our internal research logs).
- Full audit logging to write-once storage.
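The retrieval-layer enforcement mentioned above could look like this sketch, assuming a Qdrant collection whose chunks carry an "allowed_roles" payload field; embed() is whatever local embedder (e.g. nomic-embed-text) produces the query vector, and the collection and field names are illustrative.

```python
# A minimal sketch of retrieval-layer RBAC, assuming each indexed chunk
# carries an "allowed_roles" payload field. Collection and field names
# are illustrative.
from qdrant_client import QdrantClient
from qdrant_client.models import FieldCondition, Filter, MatchAny

client = QdrantClient(url="http://localhost:6333")  # on-prem instance

def retrieve_for_user(query_vector: list[float], user_role: str, k: int = 5):
    """Return only chunks the caller's role may see. The filter runs
    inside the vector DB, so unauthorized documents never reach the
    LLM context in the first place."""
    return client.search(  # newer clients expose query_points() instead
        collection_name="research_corpus",
        query_vector=query_vector,
        query_filter=Filter(
            must=[FieldCondition(key="allowed_roles",
                                 match=MatchAny(any=[user_role]))]
        ),
        limit=k,
    )
```

Because the filter executes inside the vector database, a document outside the user's clearance never enters the prompt, so there is nothing for a cleverly worded query to leak.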
What the research team actually uses it for (illustrative examples; the firm defines the actual scope):
- Regulatory filing summarisation in 30 seconds instead of 2 hours
- Cross-checking news flow against current position context — “anything in today’s news that contradicts our thesis on position X?” (see the sketch after this list)
- Drafting internal decision memos from bullet-point analyst notes
- Thai translation of English research with domain terminology consistency
- Internal Q&A over the firm’s accumulated research archive
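For illustration, the news-versus-thesis cross-check could be a single call to the on-prem model, assuming it is served behind an OpenAI-compatible endpoint (e.g. vLLM or llama.cpp) on localhost; the model name and prompt wording are placeholders, and no request leaves the firm's network.

```python
# Illustrative news-vs-thesis cross-check against the on-prem model,
# served behind an OpenAI-compatible endpoint on localhost.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

def cross_check(thesis: str, news_chunks: list[str]) -> str:
    context = "\n\n".join(news_chunks)  # retrieved via the RBAC filter
    resp = client.chat.completions.create(
        model="qwen3-32b",
        temperature=0.2,
        messages=[
            {"role": "system",
             "content": ("You are a research assistant. Flag only direct "
                         "contradictions between the news and the thesis, "
                         "citing the source passage for each flag.")},
            {"role": "user",
             "content": f"Thesis:\n{thesis}\n\nToday's news:\n{context}"},
        ],
    )
    return resp.choices[0].message.content
```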
What it explicitly does NOT do:
- Generate trading signals for production execution
- Make or approve investment decisions
- Touch portfolio, order management, or risk systems
- Interact with clients or counterparties on the firm’s behalf
- Replace any human function that the firm’s regulatory licences require to be human-performed
Expected outcomes
Honest framing — describing typical impact, not a specific number we would promise:
- Research velocity: 2-3x faster on routine summarisation and cross-referencing tasks. Analysts spend less time on mechanical work and more on judgement.
- Consistency: internal memos and translations acquire a stable house style once the LLM is tuned to it, reducing back-and-forth editing (a tuning sketch follows this list).
- Knowledge retention: historical research becomes queryable. “What did we think about this issuer 18 months ago?” goes from 45 minutes to 30 seconds.
- Compliance posture: a clean, auditable boundary between AI-assisted research and human-driven decisions. The firm’s compliance officer can show regulators exactly what the AI does, what data it touches, and where the human accountability line sits.
- What we would not claim: alpha generation, trading-signal uplift, or any outcome tied to market performance. Those depend on the firm’s analysts and strategies, not the tool.
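On the consistency point above: the optional house-style tuning could follow a standard DPO-plus-LoRA recipe, sketched here assuming a recent TRL and PEFT stack; the base model, dataset path, and hyperparameters are illustrative.

```python
# A minimal sketch of the optional house-style tuning step, assuming a
# recent TRL + PEFT stack. Base model, dataset path, and hyperparameters
# are illustrative; the JSONL needs "prompt"/"chosen"/"rejected" columns.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "Qwen/Qwen3-8B"  # a smaller model is enough for style tuning
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# chosen = analyst-approved house-style memo; rejected = generic draft
pairs = load_dataset("json", data_files="style_pairs.jsonl", split="train")

trainer = DPOTrainer(
    model=model,
    args=DPOConfig(output_dir="style-lora", beta=0.1,
                   per_device_train_batch_size=1),
    train_dataset=pairs,
    processing_class=tokenizer,
    peft_config=LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                           task_type="CAUSAL_LM"),
)
trainer.train()
```

The preference pairs would come from the firm's own archive: the analyst-approved final memo as "chosen", the model's first draft as "rejected".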
Common objections
“We already have a quant team — why do we need this?” The quant team produces signals and runs models. The research team produces the context around those signals: regulatory read-throughs, news digestion, memo drafting. These are distinct workflows. The LLM serves the research workflow, not the quant workflow.
“Isn’t this just an expensive ChatGPT Enterprise?” ChatGPT Enterprise does not run on your hardware, does not submit to your compliance regime, and cannot be audited by your internal security team. Enterprise contracts manage legal risk; Local AI removes the source of risk.
“What about model bias on financial topics?” All LLMs have biases and training-data gaps. Our approach treats the LLM as a drafter, not an oracle: every output passes through a human analyst who has the context to catch errors. We also recommend pinning a specific model version and re-evaluating quarterly rather than accepting silent updates that might shift tone or output quality.
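Pinning in practice can be as simple as resolving one exact model revision at deploy time; a sketch using huggingface_hub, with a placeholder commit hash. In a zero-exfiltration setup this download runs on a staging host before the weights move inside the network.

```python
# Pinning a model to one exact revision, sketched with huggingface_hub;
# the commit hash is a placeholder.
from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="Qwen/Qwen3-32B",
    revision="<commit-hash-from-eval-signoff>",  # exact commit, never a moving tag
    local_dir="/opt/models/qwen3-32b-pinned",
)
# The serving layer loads only from local_path; the quarterly
# re-evaluation is the only event that moves the pinned revision.
```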
“Can we trust one solo researcher with something this sensitive?” Fair question, and worth asking of anyone. Our answer: we bring a documented public track record (KoishiAI itself, trading-domain ML research, benchmark methodology), we install everything open-source and document it handover-ready, and the firm owns every piece of the stack from day one. If the engagement ends, nothing breaks — the firm’s team, or a different vendor, takes over. Solo is lower risk than a big firm here because there are no sub-contractors, no offshore developers, and no agency handover games.
Who this pattern fits
- Thai fintech firms, asset managers, and boutique prop shops with 10-50 people and serious data-confidentiality requirements
- In-house research desks at banks and insurers needing AI productivity without regulator-visible risk
- Hedge fund technology teams looking to augment analyst workflows without touching execution systems
- Any firm that has rejected ChatGPT Enterprise on compliance grounds and needs the on-premise equivalent
Does not fit: retail-facing robo-advisors (different licensing regime entirely), firms that want AI to replace analyst judgment (we explicitly do not sell that), and firms unwilling to commit to the hardware ownership and internal ops discipline required.
How to engage
Given the sensitivity, we prefer to start with a short call under NDA before we see any data. Email the editor with a one-paragraph description of what you’re trying to solve — no specifics needed at this stage. We reply within one business day to arrange an NDA and schedule a scoping call.
See the Local AI Setup package on our services page, and our editorial standards for the boundaries of what we will and will not take on.