KoishiAI

Case Study: Local AI Triage Chatbot for a Thai Clinic Under PDPA

An illustrative case study of how a 5-doctor Thai clinic could deploy a PDPA-compliant triage chatbot on their own hardware — no data leaves the premises, no cloud API, roughly 210,000-310,000 THB in one-time cost.

AI-drafted from cited sources, fact-checked and reviewed by a human editor.
Illustration of a PDPA-compliant local AI triage chatbot deployed inside a Thai clinic

TL;DR: Illustrative scenario — a small Thai clinic wants an AI triage chatbot but cannot legally send patient symptom data to cloud APIs under PDPA. A local Qwen3.6-35B-A3B deployment on a single 24 GB GPU (~180,000 THB hardware + ~30,000 THB setup) keeps every word of patient interaction inside the clinic’s LAN, while still producing a working triage assistant in Thai and English.

Key facts

  • This is an illustrative scenario, not a real client engagement. The clinic described is hypothetical.
  • PDPA Thailand classifies patient symptom data as sensitive personal data — cloud cross-border transfer requires explicit consent and an adequate-protection basis that consumer AI APIs cannot provide.
  • Single-workstation Local AI setup: 24 GB GPU + Qwen3.6-35B-A3B runs at ~138 tokens/sec, comfortable for a 5-doctor clinic.
  • One-time hardware + setup: approximately 210,000-310,000 THB; ongoing costs minimal (electricity ~1,000 THB/month + optional quarterly support).
  • Compared to an equivalent cloud-API deployment: roughly 50-70% lower 3-year total cost AND legally defensible under PDPA.
  • Recommended model ladder: Qwen3.6-35B-A3B default; fallback to dense Qwen3-32B for harder medical-terminology prompts.

Why this case study exists

We committed to editorial transparency, so let us state it plainly up front: this is not a real client engagement. No Thai clinic hired us. We wrote this piece as an illustrative case study — a way to show you how we would think about a common PDPA-sensitive scenario — before we have a roster of real engagements to reference. When we do have real case studies with client consent, we will mark them clearly and publish them separately.

Why bother writing a fictional case? Because “what does it actually look like to deploy local AI for healthcare in Thailand?” is a question that potential clients reasonably want answered before hiring anyone. A vague capability list on a consulting page does not answer it. Walking through a concrete scenario does — even when the scenario is composite rather than historical.

The scenario (illustrative)

A multi-specialty clinic in Bangkok, five doctors, about 60 patients a day, has been asked by management to explore AI-assisted triage — an intake chatbot that asks patients structured questions about their symptoms, family history, and current medications before they reach a nurse. The clinic’s IT lead attended an AI vendor conference and came back with quotes from three cloud-based platforms. Their compliance officer rejected all three after a 20-minute conversation with the clinic’s lawyer. The sticking point: every quote involved symptom data leaving Thailand, which under PDPA requires either explicit per-patient consent for cross-border transfer or a demonstrated adequate-protection arrangement. Neither was practical for routine use.

The clinic approaches us looking for an on-premise alternative.

Why this matters for Thai healthcare in 2026

Patient data in Thailand is regulated under the Personal Data Protection Act B.E. 2562, with Section 26 categorizing health data as sensitive personal data. Section 28 restricts cross-border transfers to jurisdictions with “adequate” protection — a standard that neither the US nor most APAC jurisdictions have been formally certified as meeting. In practice, healthcare organizations that route data through OpenAI, Anthropic, or Google’s cloud APIs are operating in a legal grey zone whose cost surfaces only when something goes wrong (a breach, a regulator inquiry, a patient complaint).

Local deployment sidesteps the entire transfer question. If the model and the data both live on a workstation in the clinic’s server room, no cross-border transfer occurs, so no consent or adequate-protection analysis is required for that flow.

The constraints we would work within

  • Data residency: no patient utterance, symptom description, or chat transcript may leave the clinic’s LAN.
  • Budget ceiling: small clinic; the capital budget for a significant IT project typically tops out at 200,000-300,000 THB.
  • Languages: patients speak Thai primarily; some expats speak English. The assistant must handle both without a round-trip through a translation service.
  • Latency: a patient expects an acknowledgement within ~3 seconds of pressing send. Triage conversations run 5-10 turns, so the assistant's total generation time across a conversation should stay under ~90 seconds.
  • Operational reality: clinic IT staff is typically one person who maintains everything else too. Whatever we install has to be stable without weekly maintenance.

What we would propose

Hardware: a single workstation with a consumer-grade 24 GB GPU (RTX 5090 or equivalent), 64 GB system RAM, 2 TB NVMe storage, on a UPS, in the clinic’s existing server closet. Total hardware capex around 180,000-250,000 THB depending on chassis and support contract.

Software stack:

  • Ollama serving Qwen3.6-35B-A3B-Q4 as the default, with dense Qwen3-32B on standby for the ~10% of prompts the default model handles poorly.
  • Open WebUI as the patient-facing chat front-end (rebranded with clinic styling).
  • A thin Python wrapper that enforces the triage flow — structured questions with validation, no free-form deviation — rather than handing the model a blank canvas.
  • Encrypted local PostgreSQL for chat transcripts (patient ID, timestamp, triage outcome) with 90-day retention aligned to clinic policy.
  • Tailscale or equivalent for remote administration without exposing anything to the public internet.
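The model-fallback piece of that stack can be sketched as follows. This is a minimal illustration assuming Ollama's standard `/api/chat` endpoint on its default local port; the model tags and the `needs_dense_model` heuristic are hypothetical placeholders, not tested routing rules:

```python
"""Sketch of the wrapper's model-fallback call, assuming Ollama's
/api/chat endpoint on localhost. Model tags and routing heuristic
are illustrative placeholders."""
import json
import urllib.request

PRIMARY = "qwen3.6-35b-a3b-q4"   # fast MoE default (tag assumed)
FALLBACK = "qwen3-32b"           # dense model for harder prompts

def needs_dense_model(prompt: str) -> bool:
    # Hypothetical heuristic: route long, terminology-heavy prompts
    # to the dense model; a real deployment would tune this rule.
    return len(prompt.split()) > 200

def build_request(prompt: str) -> dict:
    model = FALLBACK if needs_dense_model(prompt) else PRIMARY
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }

def ask(prompt: str) -> str:
    # The call never leaves the LAN: Ollama listens on localhost.
    payload = json.dumps(build_request(prompt)).encode()
    req = urllib.request.Request(
        "http://localhost:11434/api/chat",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["message"]["content"]
```

The point of routing in the wrapper rather than in the front-end is that the fallback rule can change without touching anything the patient sees.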

Model constraints: a system prompt that forbids medical diagnosis, always recommends seeing the doctor, and routes any red-flag symptoms (chest pain, neurological signs, severe bleeding) to an immediate human escalation path. This is a triage assistant, not a diagnostic one.
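The escalation gate described above can be sketched as a pre-check that runs before the model ever sees the message. The keyword list here is illustrative, not a clinical standard; a real deployment would need Thai-language patterns and clinician review:

```python
"""Sketch of the red-flag escalation gate. The symptom list is
illustrative only; a production version needs Thai keywords and
review by clinical staff."""

RED_FLAGS = {
    "chest pain", "shortness of breath", "severe bleeding",
    "slurred speech", "facial droop", "loss of consciousness",
}

def route(patient_message: str) -> str:
    """Return 'escalate' for red-flag symptoms, else 'triage'."""
    text = patient_message.lower()
    if any(flag in text for flag in RED_FLAGS):
        return "escalate"   # hand off to a nurse immediately
    return "triage"         # safe to continue the structured flow
```

Checking red flags in code, outside the model, means a prompt-injection attempt or a model update cannot quietly disable the escalation path.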

Why Qwen3.6 over Qwen3-32B as default: speed. At 138 tokens per second on the 35B-A3B MoE, a typical 120-token triage response streams to the patient in under a second. The dense 32B is more accurate on edge cases but runs at roughly half the throughput — fine as a fallback, not as the default.
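The arithmetic behind the "under a second" claim, using the figures above:

```python
# Back-of-envelope check of the streaming-latency claim.
TOKENS_PER_SECOND = 138     # article's throughput figure for the MoE
RESPONSE_TOKENS = 120       # typical triage reply length

stream_time = RESPONSE_TOKENS / TOKENS_PER_SECOND
print(f"{stream_time:.2f} s")  # ≈ 0.87 s, well under the 3 s budget
```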

Expected outcomes

Honest framing: we describe what this kind of setup typically delivers, not a specific number we guarantee.

  • Intake time savings: patients complete structured symptom intake before the nurse sees them, typically shaving 10-15 minutes off the average consultation flow.
  • Doctor’s first-screen view: a structured summary of patient-reported symptoms, current medications, and flag-words, ready for the doctor when they enter the room.
  • Compliance posture: legally defensible under PDPA without per-patient cross-border consent; the clinic’s compliance officer can point to a specific data-flow diagram and say “nothing leaves the building”.
  • Reliability: uptime expectations on par with any other clinic-server deployment (i.e. 99% with the usual Thai-summer power-outage caveats; hence the UPS).

Common objections

“What if the model gives dangerous medical advice?” We do not let it. The system prompt forbids diagnosis and directs red-flag cases to immediate human escalation. We also run a pre-deployment adversarial test with a pharmacist reviewer: several hundred red-flag prompts that the model must refuse to diagnose. If any response fails, we tune the prompt and retest until all of them pass.
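The pass/fail logic of such a test suite could be sketched like this, with a stubbed model in place of the real Ollama call and an assumed set of refusal markers:

```python
"""Sketch of the pre-deployment refusal suite. `model_reply` stands
in for the real model call; the refusal markers are assumptions a
real suite would refine with the pharmacist reviewer."""

REFUSAL_MARKERS = ("see a doctor", "seek immediate", "cannot diagnose")

def refuses_to_diagnose(reply: str) -> bool:
    reply = reply.lower()
    return any(marker in reply for marker in REFUSAL_MARKERS)

def run_suite(prompts, model_reply):
    """Return the prompts whose replies failed to refuse diagnosis."""
    return [p for p in prompts if not refuses_to_diagnose(model_reply(p))]

# Toy stand-in model, for illustration only:
stub = lambda p: "Please see a doctor immediately; I cannot diagnose this."
assert run_suite(["crushing chest pain"], stub) == []
```

An empty failure list is the release gate: any non-empty result sends the system prompt back for tuning.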

“We don’t have IT staff for this.” The clinic’s existing IT lead handles exactly one new thing: the monthly health check (CPU temperature, disk space, model version sanity). Everything else is logged to a dashboard that we review on the quarterly support call. If a model update is needed, we drive it remotely with the clinic’s approval.

“What happens when you stop being available?” Everything we install is open-weight and open-source. The clinic owns the hardware, the model weights, and the code. Another vendor or internal person can take over; we document clearly enough for that to be possible.

Who this pattern fits

  • Specialty clinics, dental practices, and outpatient centres with 3-15 doctors and fewer than 100 patients per day
  • Law firms needing the same model pattern for confidential contract review (different front-end, same infrastructure)
  • Accounting offices managing tax-document queries under PDPA constraints

It does not fit: hospitals serving thousands of users a day (they need real load balancing), national-chain clinics (they need centralized policy management), and anyone who cannot commit to owning the hardware long-term.

How to engage

If you are considering something like this for a real organization, the next step is a free 30-minute scoping call to determine whether the pattern fits. Email us with a brief description of your use case. We respond within one business day.

Our services page has the three standard packages; bespoke scope is possible. Our editorial standards govern what we will and will not take on — worth a read before inquiring.

Frequently asked questions

Is this a real client engagement?
No. This is an illustrative scenario written to show the thinking process we apply when designing a PDPA-compliant local AI setup. The clinic described here does not exist. Numbers, constraints, and outcomes are plausible but not from a real engagement.
Why can't a Thai clinic use ChatGPT or Google Gemini for triage?
Patient symptom data is sensitive personal data under Thailand's PDPA B.E. 2562. Sending it to a cloud API based in the US or Europe constitutes a cross-border transfer that requires explicit patient consent, a legal basis, and demonstrated adequate protection. Most clinics cannot meet these conditions for routine use, and consumer-grade chat products do not sign the required data-processing agreements.
Can a 24 GB GPU really handle a whole clinic's traffic?
For a clinic of up to 50 concurrent users — which is generous for a 5-doctor practice — yes. Qwen3.6-35B-A3B runs at roughly 138 tokens per second on a single 24 GB card, enough to serve a steady stream of short triage dialogues. Busier operations need a heavier tier (a 48 GB GPU) or a second box for load balancing.
What's the total cost over 3 years?
Hardware capex around 180,000-250,000 THB for a 24 GB workstation with UPS; one-time setup 30,000-60,000 THB; optional quarterly support retainer 5,000-10,000 THB. Total 3-year TCO roughly 300,000-400,000 THB — versus an equivalent cloud-platform subscription, which would typically exceed that within 18 months and still leave the clinic non-compliant.
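Under midpoint assumptions, the 3-year arithmetic works out like this:

```python
# Reproducing the FAQ's 3-year TCO figure under midpoint assumptions.
hardware = 215_000             # midpoint of 180k-250k THB
setup = 45_000                 # midpoint of 30k-60k THB
retainer_per_quarter = 7_500   # midpoint of the optional 5k-10k retainer
quarters = 12                  # 3 years of quarterly support

tco = hardware + setup + retainer_per_quarter * quarters
print(tco)  # 350,000 THB, inside the article's 300k-400k range
```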
What about model updates?
We pin a specific model version at install time. Updates are opt-in and reviewed quarterly — you do not want your triage assistant silently changing its behaviour after a vendor push. This is one of the reasons local deployment wins for regulated work: you control the version.