
Case Study: Local RAG for a Thai Law Firm — Confidential Contract Review with Attorney-Client Privilege Intact

Illustrative scenario — how a Thai mid-size law firm could run AI-assisted contract review on confidential documents without hitting cloud APIs that would break privilege and PDPA.

AI-drafted from cited sources, fact-checked and reviewed by a human editor.
Illustration of a local RAG system for Thai law firm contract review

TL;DR: Illustrative scenario — a 20-lawyer Thai firm needs AI to review contracts faster but cannot legally use ChatGPT or Claude for privileged client documents. A local Qwen3-32B + vector-DB RAG on a 48 GB dual-GPU workstation (~320,000 THB hardware + ~60,000 THB setup) gives lawyers an ‘ask my contract library’ assistant that never exposes a single clause to a third party.

Key facts

  • This is an illustrative scenario, not a real client engagement. The firm described is hypothetical.
  • Attorney-client privilege + PDPA make cloud-API AI a non-starter for confidential legal documents in Thailand.
  • Dense Qwen3-32B outperforms MoE variants on legal-language edge cases — worth the throughput trade-off for this use case.
  • Typical hardware: dual-GPU 48 GB workstation (RTX 5090 + 5080 = ~320,000 THB); setup 60,000-100,000 THB; 3-year TCO about 500,000-700,000 THB.
  • Matter-level access control is non-negotiable: retrieval filters by case/client ID before results reach the model.
  • Expected impact: contract review acceleration of roughly 2-3x on routine clause lookups; the model does not replace legal judgement.

Why this case study exists

Same transparency disclosure as our other case studies: this is not a real engagement. We wrote it because potential clients — law firms considering AI in 2026 — deserve to see how we would approach the unusually strict privacy requirements of legal work, before we have a roster of real law-firm engagements to point to. When we do have real cases with client consent we will mark them separately.

The legal-tech space is littered with cloud AI startups promising “contract review in seconds”. For a Thai firm bound by both the Thai Bar’s privilege rules and PDPA’s sensitive-data protections, those products are mostly unusable without a level of data-handling diligence that defeats the productivity gain.

The scenario (illustrative)

A mid-sized Bangkok law firm, twenty lawyers across corporate, M&A, and employment practice groups. The managing partner is under pressure from two sides: junior lawyers want AI tools (they use ChatGPT at home and want to use it at work); clients in regulated industries have started asking how the firm handles confidential documents when AI gets involved. The firm trialled two cloud-based legal AI platforms and rejected both — not because the tech was bad, but because their compliance lead could not get a straight answer on where document text was stored, how long, and by whom it could be accessed.

They reach out for an on-premise alternative that gives the productivity lift without the privilege risk.

Attorney-client privilege in Thailand is primarily contractual and ethics-based rather than codified the way it is in the US, but the practical effect is the same: documents shared under retainer are confidential, and transmitting them to a third party without client consent is a professional-ethics issue. PDPA adds another layer: much of the data inside legal documents (employment records, M&A due-diligence personal data, litigation filings) is personal or sensitive personal data, with the same cross-border-transfer restrictions we described in our clinic case study.

A cloud AI API call that includes contract text is, in the worst case, both a privilege breach and a PDPA violation. Local deployment removes both risks in one move: the text never leaves the firm’s network.

The constraints we would work within

  • Matter-level data isolation: a lawyer on Matter A must not be able to query across to Matter B. Retrieval filters enforce this, not prompts.
  • No data leaves the firm network — no cloud inference, no analytics telemetry, no third-party embedding model that phones home.
  • Thai + English contracts: the firm mixes both constantly. The stack must handle bilingual retrieval without requiring separate indexes.
  • Audit trail: every query logged with user, timestamp, matter, retrieved documents, and generated response — subject to a retention policy aligned with the firm’s own data-retention rules (a logging sketch follows this list).
  • Performance: a 5-10 second response time for a routine clause-lookup query is acceptable; 30 seconds is not.
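
To make the audit-trail constraint concrete, here is a minimal logging sketch in Python. The query_audit table, its columns, and the psycopg driver are our assumptions for illustration, not a prescribed schema.

```python
# Minimal audit-trail sketch. Assumes a hypothetical Postgres table
# `query_audit` and the psycopg 3 driver; adapt columns and retention
# to the firm's own data-retention rules.
import json
from datetime import datetime, timezone

import psycopg  # pip install "psycopg[binary]"

AUDIT_SQL = """
INSERT INTO query_audit
    (user_id, matter_id, queried_at, query_text, retrieved_doc_ids, response_text)
VALUES (%s, %s, %s, %s, %s, %s)
"""

def log_query(conn: psycopg.Connection, user_id: str, matter_id: str,
              query: str, retrieved_doc_ids: list[str], response: str) -> None:
    """Record one RAG interaction; purging per retention policy is a separate job."""
    with conn.cursor() as cur:
        cur.execute(AUDIT_SQL, (
            user_id,
            matter_id,
            datetime.now(timezone.utc),
            query,
            json.dumps(retrieved_doc_ids),
            response,
        ))
    conn.commit()
```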

What we would propose

Hardware: dual-GPU workstation. RTX 5090 (32 GB) for the primary model, RTX 5080 (16 GB) hosting the embedding model and vector DB in GPU memory for fast retrieval. Total pooled VRAM 48 GB. Hardware capex around 280,000-360,000 THB including chassis, 128 GB RAM, 4 TB NVMe for the document vault, dual UPS, and offsite backup drive.

Software stack:

  • Ollama serving Qwen3-32B dense as the primary inference model. We recommend dense over MoE here because legal language is pedantic and we saw on our internal brutal tier that dense 32B scores 90% vs. 50% for the original 30B-A3B MoE. A 40-point accuracy gap matters enormously when the downstream user is a lawyer making a judgement call.
  • nomic-embed-text (or equivalent multilingual embedder) for document indexing, loaded on the second GPU.
  • Qdrant or Weaviate as the vector database, with a Postgres metadata layer for case/client/privilege tagging.
  • A thin FastAPI middleware that enforces matter-level access control: every retrieval call must include the user’s authorized matter list, and a database-side filter strips disallowed results before any content reaches the LLM (see the sketch after this list).
  • Web UI via Open WebUI or a custom React front-end with firm branding.
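
To show how that enforcement could look, here is a minimal sketch of the retrieval endpoint, assuming Qdrant as the vector store. The collection name ("contracts"), payload key ("matter_id"), and the get_authorized_matters() helper backed by the Postgres metadata layer are all hypothetical names for illustration.

```python
# Matter-level access control sketch: the allowed-matter filter is applied
# inside the vector-store query, not in the prompt.
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from qdrant_client import QdrantClient
from qdrant_client.models import FieldCondition, Filter, MatchAny

app = FastAPI()
qdrant = QdrantClient(url="http://localhost:6333")

class SearchRequest(BaseModel):
    user_id: str
    query_vector: list[float]  # embedding of the lawyer's question

def get_authorized_matters(user_id: str) -> list[str]:
    """Stub: look up the user's matter list in the Postgres metadata layer."""
    raise NotImplementedError

@app.post("/search")
def search(req: SearchRequest):
    allowed = get_authorized_matters(req.user_id)
    if not allowed:
        # Deny before retrieval ever runs; never rely on the prompt to filter.
        raise HTTPException(status_code=403, detail="No authorized matters")
    hits = qdrant.search(
        collection_name="contracts",
        query_vector=req.query_vector,
        query_filter=Filter(must=[
            FieldCondition(key="matter_id", match=MatchAny(any=allowed)),
        ]),
        limit=20,  # top candidates handed to the LLM step
    )
    return [{"doc_id": h.payload["doc_id"], "score": h.score} for h in hits]
```

Because the filter runs inside the vector-store query itself, a passage from an unauthorized matter can never appear in a result set, no matter what the user types into the prompt.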

Workflow:

  1. Contracts ingested into the vault (PDF / DOCX / scanned → OCR as needed), chunked with overlap, embedded, indexed with matter/client/date metadata (see the ingestion sketch after this workflow).
  2. A lawyer asks “find me indemnification clauses across our 2024 acquisition contracts that cap liability below 2x purchase price.”
  3. Access control checks the lawyer’s authorized matters.
  4. Retrieval ranks candidate clauses; top 15-20 passed to the LLM.
  5. LLM synthesises an answer with inline citations (contract name + page + clause number).
  6. Lawyer clicks any citation to see the original source in situ.
  7. Every step logged.
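
Step 1 of that workflow could look roughly like this, assuming Ollama serving nomic-embed-text locally and the same hypothetical "contracts" collection as above. Chunk size, overlap, and payload fields are illustrative defaults, not tuned values.

```python
# Ingestion sketch: chunk with overlap, embed locally via Ollama,
# index in Qdrant with the matter/client metadata that drives access control.
import uuid

import ollama  # pip install ollama
from qdrant_client import QdrantClient
from qdrant_client.models import PointStruct

qdrant = QdrantClient(url="http://localhost:6333")

def chunk(text: str, size: int = 1200, overlap: int = 200) -> list[str]:
    """Fixed-size character chunks with overlap so clauses are not cut mid-thought."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def ingest(doc_id: str, matter_id: str, client_id: str, text: str) -> None:
    points = []
    for idx, passage in enumerate(chunk(text)):
        emb = ollama.embeddings(model="nomic-embed-text", prompt=passage)
        points.append(PointStruct(
            id=str(uuid.uuid4()),
            vector=emb["embedding"],
            payload={
                "doc_id": doc_id,
                "matter_id": matter_id,   # drives the retrieval filter
                "client_id": client_id,
                "chunk_index": idx,
                "text": passage,
            },
        ))
    qdrant.upsert(collection_name="contracts", points=points)
```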

Why dense Qwen3-32B over Qwen3-30B-A3B: MoE’s routing is good at general chat, worse at sustained reasoning over technical text. Our probe 3 bench showed dense models outperforming MoE by 35-40 points on the hardest 10% of prompts. Legal contract review lives squarely in that 10%. We trade ~70% of the throughput for that accuracy gain — a 5-second response is still fast enough.

Expected outcomes

Honest framing again — describing what this pattern typically achieves, not a promise:

  • Clause lookup: “find me X kind of clause across Y matters” queries, which currently take a lawyer or paralegal 20-40 minutes of manual skimming, complete in under 30 seconds with inline citations.
  • Contract comparison: “how does this indemnification clause differ from our standard form?” produces a structured diff that a lawyer reviews in 2-3 minutes instead of 15.
  • Precedent search: “has the firm taken the position X in any prior matter?” — critical for consistency, often impossible with pre-AI tools.
  • What it does NOT do: replace legal judgement, write new contracts from scratch without human review, or serve as authority on jurisdictional law without primary-source verification. Every output is presented as input for a lawyer, not a substitute for one.
  • Productivity uplift: we have seen published ranges of 2-3x on routine contract-review tasks in similar deployments elsewhere. We would not promise a specific number to a client.

Common objections

“What if the model hallucinates a clause?” The RAG retrieval step is the safeguard. Every factual statement in the model’s answer must be grounded in a retrieved passage, and every citation links back to the source document. A lawyer clicking the citation sees whether the passage actually says what the model claimed. Hallucinations become traceable in seconds.
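
As a toy illustration of that traceability, the passage IDs handed to the model can be checked back against its answer. The bracketed-ID prompt format and the regex below are our own assumptions, not a fixed product feature.

```python
# Citation-grounding sketch: prompt the model to cite retrieved passages by
# ID, then flag any cited ID that was never actually retrieved.
import re

def build_prompt(question: str, passages: dict[str, str]) -> str:
    """passages maps a passage ID (e.g. contract + page + clause) to its text."""
    context = "\n\n".join(f"[{pid}] {text}" for pid, text in passages.items())
    return (
        "Answer using ONLY the passages below. Cite each claim with its "
        "passage ID in square brackets.\n\n"
        f"{context}\n\nQuestion: {question}"
    )

def verify_citations(answer: str, passages: dict[str, str]) -> list[str]:
    """Return any cited IDs that do not exist in the retrieved set."""
    cited = set(re.findall(r"\[([\w-]+)\]", answer))
    return sorted(cited - passages.keys())
```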

“Our IT team doesn’t speak machine-learning.” They do not need to. We install, document, and train on maintenance basics (storage, backups, user management). For model-level questions or tuning, a quarterly support retainer covers it remotely.

“The firm partnership won’t approve capital spend like this.” Compared to one junior lawyer’s salary at a mid-sized Bangkok firm (1-2M THB/year including benefits and overhead), the total 3-year TCO (500-700k THB) pays for itself if it saves even 20% of one junior’s billable hours: annualised, the TCO is roughly 170-230k THB against 200-400k THB of recovered junior time per year. The calculation is usually more compelling than partners initially expect.

Who this pattern fits

  • Law firms with 10-50 lawyers and document volumes in the low tens of thousands
  • Boutique firms with regulated-industry clients (financial services, healthcare) where cloud AI is a non-starter
  • In-house legal teams at larger companies with similar privilege/IP-sensitivity concerns

Does not fit: very small firms (cost does not pencil out below 5-8 lawyers), pure litigation practices where the corpus is too varied to benefit from systematic indexing, or firms unwilling to commit to owning the infrastructure.

How to engage

Like our other case studies, the next step is a free 30-minute scoping call. Email editor name with a brief description of your firm’s setup and what you’d want the system to help with. We respond within one business day.

See our services page for standard packages, and editorial standards for what we will and will not take on.

Frequently asked questions

Is this a real client engagement?
No. Like our other case studies at this stage, this is an illustrative scenario that shows our thinking on a common legal-tech problem. The firm described does not exist. When we publish real engagements with client consent, we will mark them clearly.
What is RAG and why does a law firm need it over plain chat?
RAG (retrieval-augmented generation) lets the model answer questions by first searching a private document library, then generating an answer grounded in the retrieved passages. For a law firm this means 'ask a question about any of our 10,000 contracts and get an answer with citations to specific clauses' — which plain chat cannot do and cloud AI cannot do on confidential material without breaking privilege.
Does 48 GB of VRAM really matter over 24 GB?
For a law firm, yes. You want to run the dense Qwen3-32B at high quality for contract analysis (legal language is pedantic and the MoE variant makes too many errors on edge cases), plus hold the vector DB and a reasonable context window. 48 GB comfortably fits both with room to grow. 24 GB works as a starting point but you will outgrow it within a year.
What about non-English/non-Thai contracts?
Qwen3 handles Chinese, Japanese, and Southeast Asian languages reasonably well. For contracts in languages the model is less strong at (older continental European variants, specialised Arabic), we add a human-verified translation step before indexing rather than trusting the model with an un-audited translation.
How do you prevent the model from leaking across cases?
Matter-level access control. Each document is indexed with a case/client ID; the retrieval step filters by the user's authorized cases before passing anything to the model. A paralegal on case A cannot query case B, and the model cannot accidentally pull in documents from other matters. This is enforced at the DB layer, not at the prompt layer.