Case Study: Local AI Research Infrastructure for a Thai Fintech — Confidential Signals Without Cloud Leakage
Illustrative scenario — how a Thai fintech firm could run AI-assisted market research and internal reasoning on confidential positions without ever sending a single data point to a cloud LLM.
TL;DR: Illustrative scenario — a Thai fintech or asset manager cannot legally or prudently run AI analysis on its proprietary positions via cloud APIs. A Local AI stack (Qwen3-32B plus optional LoRA fine-tuning on a 48 GB dual-GPU workstation; ~320,000-380,000 THB hardware + 80,000-120,000 THB setup) gives the quant/research team a confidential reasoning assistant for internal use. Crucially: this is research infrastructure, not a licensed financial service — it augments human analysts, not replaces them.
Key facts
- This is an illustrative scenario, not a real client engagement. The firm described is hypothetical.
- Not a regulated financial service: we build research infrastructure. Licensed personnel at the client remain responsible for investment decisions.
- Typical hardware: 48 GB dual-GPU workstation (~320,000-380,000 THB); dedicated setup 80,000-120,000 THB to handle extra security requirements for sensitive financial data.
- Stack: Qwen3-32B dense as the reasoning workhorse, Qwen3-Next-80B on standby for the hardest prompts, full Local RAG over the firm’s proprietary research library.
- Optional: LoRA / DPO fine-tuning of smaller models on the firm’s internal style and domain terminology — this is where our prior trading-ML research work is directly relevant.
- Expected output: 2-3x faster turnaround on routine research tasks (regulatory summaries, news digestion, memo drafting); zero data leaves the firm’s network.
Why this case study exists
Transparency disclosure: this is not a real engagement. No Thai fintech has hired us for exactly this pattern. We wrote it for two reasons. First, fintech clients considering AI in 2026 face unique constraints (regulatory, operational, information-security) that generic consulting pages do not address. Second, our editor has documented real research experience in trading-domain ML on this very site — XAUUSD benchmark work, regime detection, LoRA fine-tuning of router models — and prospective fintech clients reasonably want to see how that real research translates to what we would offer as a service.
When we have real fintech engagements with client consent, we will publish them separately. For now, this scenario illustrates how we think about the problem.
The scenario (illustrative)
A Thai fintech firm — 30 people, roughly equal split between tech, research/quant, and compliance. They manage assets and run proprietary strategies; exact AUM and strategy mix are confidential by definition. Their research team (5 people) spends a significant fraction of each week on tasks that are repetitive but non-trivial: summarising 50-page regulatory filings, cross-referencing news flow against current positions, drafting internal decision memos, translating English research into Thai client summaries. They have experimented with ChatGPT and Claude for exactly these tasks, and the output quality is good — but their CISO and compliance officer both put a stop to it the moment they realised “paste this 10-K into ChatGPT” was happening on confidential documents.
They reach out to explore a Local AI stack that gives the research team productivity without the privilege or data-exfiltration risk.
Why this matters for Thai fintech in 2026
Thai fintech firms operate under multiple overlapping regimes: Bank of Thailand and SEC Thailand regulations on investment activities, PDPA on customer data, and fiduciary duties to investors that no licence document explicitly spells out but which any compliance officer understands. The result is that cloud AI — where the vendor’s terms of service are controlled by someone outside your jurisdiction and your data is processed by someone outside your professional-standards framework — is frequently an unacceptable match for confidential workflows, even when the productivity gain is real.
Local AI is increasingly the only answer that satisfies both the operational and compliance layers. The question is execution: most AI consultants do not understand what a quant research workflow actually looks like, and most quant consultants do not understand what running Local AI at scale actually requires. The overlap is rare; fintech firms looking for this skill combination should expect a short list of providers.
The constraints we would work within
- Zero data exfiltration: no prompt, document, position, or metadata leaves the firm’s network. Not to us, not to the model vendor, not to any telemetry endpoint.
- Audit trail: every LLM query logged with user, time, input content (or a secure hash), retrieved documents, and output. Retention aligned with the firm’s regulator-facing data-retention policy. A logging sketch follows this list.
- Role-based access: a junior analyst cannot use the LLM to route around access controls and read senior strategy documents. Enforcement is at the retrieval layer, not in the prompt.
- Separation from execution: the LLM never touches order systems, portfolio-management systems, or risk-limit systems. This is a boundary both for technical safety and for regulatory clarity about what the AI is and is not doing.
- Model version pinning: same reason as in our clinic case. You do not want a reasoning assistant silently changing its behaviour across quarters.
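Concretely, the audit-trail constraint could look like the following. A minimal sketch, assuming JSON-lines records appended to write-once storage; the field names, log path, and pinned-model string are illustrative, not a fixed schema.

```python
# A minimal audit-record sketch, assuming JSON-lines output to
# write-once (WORM) storage. Field names, the log path, and the
# pinned-model string are illustrative placeholders.
import hashlib
import json
import time
import uuid

def audit_record(user_id: str, role: str, prompt: str,
                 retrieved_doc_ids: list[str], output: str) -> dict:
    """Build one audit entry. Prompt and output are stored as SHA-256
    hashes so the log never becomes a second copy of confidential text."""
    return {
        "event_id": str(uuid.uuid4()),
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "user_id": user_id,
        "role": role,
        "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "retrieved_doc_ids": retrieved_doc_ids,
        "output_sha256": hashlib.sha256(output.encode("utf-8")).hexdigest(),
        "model_version": "qwen3-32b@<pinned-revision>",  # placeholder pin
    }

def append_to_worm_log(record: dict, path: str = "/mnt/worm/audit.jsonl") -> None:
    # Append-only by convention here; actual immutability is enforced
    # by the storage layer, not by this function.
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```

Storing hashes rather than raw content keeps the log itself from becoming a second confidential corpus, while still letting an auditor verify exactly which prompt produced which output.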
What we would propose
Hardware: a 48 GB dual-GPU workstation. RTX 5090 (24 GB) + RTX 5080 (16 GB) or equivalent single-box configuration. 128 GB system RAM. 4 TB NVMe vault for the research corpus. Dual UPS. Secure server room with restricted access. Hardware capex 320,000-380,000 THB; elevated setup fee (80,000-120,000 THB) because we treat fintech security as a first-class requirement.
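A back-of-envelope check on why 48 GB is the target, assuming roughly 4-bit quantized weights; the figures are rough planning numbers, not measured footprints.

```python
# Back-of-envelope VRAM budget for the primary model on a 48 GB pool.
# Assumes ~4-bit quantized weights (~0.55 bytes/param including scales);
# KV-cache and overhead figures are rough allowances, not measurements.
params_b = 32            # Qwen3-32B parameters, in billions
bytes_per_param = 0.55   # ~4-bit quantization with metadata overhead
weights_gb = params_b * bytes_per_param   # ~17.6 GB
kv_cache_gb = 8.0        # allowance for long-context research sessions
runtime_overhead_gb = 3.0
total_gb = weights_gb + kv_cache_gb + runtime_overhead_gb
print(f"Estimated footprint: {total_gb:.1f} GB of 48 GB pooled VRAM")
```

At roughly 29 GB for the workhorse, the remaining headroom is what makes staging a heavily quantized escalation model on the same pool plausible at all.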
Software stack:
- Qwen3-32B dense as the primary reasoning model. Dense over MoE here — same reason as the law firm case: legal and financial language is pedantic, and dense models outperformed MoE by 35-40 points on the hardest 10% of prompts in our internal benchmarks.
- Qwen3-Next-80B on standby in the 48 GB VRAM pool, for escalation on the hardest long-form reasoning (complex macro analysis, multi-document synthesis, unusual cross-asset questions).
- A RAG system over the firm’s internal research corpus (10k-100k documents: filings, research notes, internal memos, translated research).
- nomic-embed-text or a similar multilingual embedder for the vector index; Qdrant or Weaviate as the vector DB.
- Role-based access control enforced at the retrieval layer; the LLM sees only documents the user is authorised to see (see the retrieval sketch after this list).
- Optional LoRA fine-tuning on the firm’s internal tone and domain terminology, via the same DPO/LoRA workflow our editor has used on trading-domain ML work (documented in our internal research logs).
- Full audit logging to write-once storage.
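The retrieval-layer enforcement mentioned above could look like this sketch, assuming a Qdrant collection whose chunks carry an "allowed_roles" payload field; embed() is whatever local embedder (e.g. nomic-embed-text) produces the query vector, and the collection and field names are illustrative.

```python
# A minimal sketch of retrieval-layer RBAC, assuming each indexed chunk
# carries an "allowed_roles" payload field. Collection and field names
# are illustrative.
from qdrant_client import QdrantClient
from qdrant_client.models import FieldCondition, Filter, MatchAny

client = QdrantClient(url="http://localhost:6333")  # on-prem instance

def retrieve_for_user(query_vector: list[float], user_role: str, k: int = 5):
    """Return only chunks the caller's role may see. The filter runs
    inside the vector DB, so unauthorized documents never reach the
    LLM context in the first place."""
    return client.search(  # newer clients expose query_points() instead
        collection_name="research_corpus",
        query_vector=query_vector,
        query_filter=Filter(
            must=[FieldCondition(key="allowed_roles",
                                 match=MatchAny(any=[user_role]))]
        ),
        limit=k,
    )
```

Because the filter executes inside the vector database, a document outside the user's clearance never enters the prompt, so there is nothing for a cleverly worded query to leak.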
What the research team actually uses it for (illustrative examples; the firm defines the actual scope):
- Regulatory filing summarisation in 30 seconds instead of 2 hours
- Cross-checking news flow against current position context — “anything in today’s news that contradicts our thesis on position X?” (see the sketch after this list)
- Drafting internal decision memos from bullet-point analyst notes
- Thai translation of English research with domain terminology consistency
- Internal Q&A over the firm’s accumulated research archive
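For illustration, the news-versus-thesis cross-check could be a single call to the on-prem model, assuming it is served behind an OpenAI-compatible endpoint (e.g. vLLM or llama.cpp) on localhost; the model name and prompt wording are placeholders, and no request leaves the firm's network.

```python
# Illustrative news-vs-thesis cross-check against the on-prem model,
# served behind an OpenAI-compatible endpoint on localhost.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

def cross_check(thesis: str, news_chunks: list[str]) -> str:
    context = "\n\n".join(news_chunks)  # retrieved via the RBAC filter
    resp = client.chat.completions.create(
        model="qwen3-32b",
        temperature=0.2,
        messages=[
            {"role": "system",
             "content": ("You are a research assistant. Flag only direct "
                         "contradictions between the news and the thesis, "
                         "citing the source passage for each flag.")},
            {"role": "user",
             "content": f"Thesis:\n{thesis}\n\nToday's news:\n{context}"},
        ],
    )
    return resp.choices[0].message.content
```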
What it explicitly does NOT do:
- Generate trading signals for production execution
- Make or approve investment decisions
- Touch portfolio, order management, or risk systems
- Interact with clients or counterparties on the firm’s behalf
- Replace any human function that the firm’s regulatory licences require to be human-performed
Expected outcomes
Honest framing — describing typical impact, not a specific number we would promise:
- Research velocity: 2-3x faster on routine summarisation and cross-referencing tasks. Analysts spend less time on mechanical work and more on judgement.
- Consistency: internal memos and translations acquire a stable house style once the LLM is tuned to it, reducing back-and-forth editing (a tuning sketch follows this list).
- Knowledge retention: historical research becomes queryable. “What did we think about this issuer 18 months ago?” goes from 45 minutes to 30 seconds.
- Compliance posture: a clean, auditable boundary between AI-assisted research and human-driven decisions. The firm’s compliance officer can show regulators exactly what the AI does, what data it touches, and where the human accountability line sits.
- What we would not claim: alpha generation, trading-signal uplift, or any outcome tied to market performance. Those depend on the firm’s analysts and strategies, not the tool.
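On the consistency point above: the optional house-style tuning could follow a standard DPO-plus-LoRA recipe, sketched here assuming a recent TRL and PEFT stack; the base model, dataset path, and hyperparameters are illustrative.

```python
# A minimal sketch of the optional house-style tuning step, assuming a
# recent TRL + PEFT stack. Base model, dataset path, and hyperparameters
# are illustrative; the JSONL needs "prompt"/"chosen"/"rejected" columns.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "Qwen/Qwen3-8B"  # a smaller model is enough for style tuning
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# chosen = analyst-approved house-style memo; rejected = generic draft
pairs = load_dataset("json", data_files="style_pairs.jsonl", split="train")

trainer = DPOTrainer(
    model=model,
    args=DPOConfig(output_dir="style-lora", beta=0.1,
                   per_device_train_batch_size=1),
    train_dataset=pairs,
    processing_class=tokenizer,
    peft_config=LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                           task_type="CAUSAL_LM"),
)
trainer.train()
```

The preference pairs would come from the firm's own archive: the analyst-approved final memo as "chosen", the model's first draft as "rejected".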
Common objections
“We already have a quant team — why do we need this?” The quant team produces signals and runs models. The research team produces the context around those signals: regulatory read-throughs, news digestion, memo drafting. These are distinct workflows. The LLM serves the research workflow, not the quant workflow.
“Isn’t this just an expensive ChatGPT Enterprise?” ChatGPT Enterprise does not run on your hardware, does not submit to your compliance regime, and cannot be audited by your internal security team. Enterprise contracts manage legal risk; Local AI removes the source of risk.
“What about model bias on financial topics?” All LLMs have biases and training-data gaps. Our approach treats the LLM as a drafter, not an oracle: every output passes through a human analyst who has the context to catch errors. We also recommend pinning a specific model version and re-evaluating quarterly rather than accepting silent updates that might shift tone or output quality.
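Pinning in practice can be as simple as resolving one exact model revision at deploy time; a sketch using huggingface_hub, with a placeholder commit hash. In a zero-exfiltration setup this download runs on a staging host before the weights move inside the network.

```python
# Pinning a model to one exact revision, sketched with huggingface_hub;
# the commit hash is a placeholder.
from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="Qwen/Qwen3-32B",
    revision="<commit-hash-from-eval-signoff>",  # exact commit, never a moving tag
    local_dir="/opt/models/qwen3-32b-pinned",
)
# The serving layer loads only from local_path; the quarterly
# re-evaluation is the only event that moves the pinned revision.
```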
“Can we trust one solo researcher with something this sensitive?” Fair question, and worth asking of anyone. Our answer: we bring a documented public track record (KoishiAI itself, trading-domain ML research, benchmark methodology), we install everything open-source and document it handover-ready, and the firm owns every piece of the stack from day one. If the engagement ends, nothing breaks — the firm’s team, or a different vendor, takes over. Solo is lower risk than a big firm here because there are no sub-contractors, no offshore developers, and no agency handover games.
Who this pattern fits
- Thai fintech firms, asset managers, and boutique prop shops with 10-50 people and serious data-confidentiality requirements
- In-house research desks at banks and insurers needing AI productivity without regulator-visible risk
- Hedge fund technology teams looking to augment analyst workflows without touching execution systems
- Any firm that has rejected ChatGPT Enterprise on compliance grounds and needs the on-premise equivalent
Does not fit: retail-facing robo-advisors (different licensing regime entirely), firms that want AI to replace analyst judgment (we explicitly do not sell that), and firms unwilling to commit to the hardware ownership and internal ops discipline required.
How to engage
Given the sensitivity, we prefer to start with a short call under NDA before we see any data. Email the editor with a one-paragraph description of what you’re trying to solve — no specifics needed at this stage. We reply within one business day to arrange an NDA and schedule a scoping call.
See the Local AI Setup package on our services page, and our editorial standards for the boundaries of what we will and will not take on.