Newsletter

Stop Wasting Money on Proprietary Models

Open-weight models beat proprietary models on cost. And they run on your GPU. June 2026.

                          Pragmatismo — Practical AI. Real results.  •  June 2026
                        

pragmatismo.com.br

Stop Paying
for Proprietary Models.

Open-weight models beat proprietary models on cost. And they run on your GPU.

Qwen 3.6 (Apache 2.0) runs on an RTX 3060 with 12GB and scores 73.4% on SWE-bench Verified — versus 80.8% for proprietary frontier models at US$ 75/million output tokens. DeepSeek V4 Flash (MIT) surpasses 79% and runs on 2× RTX 3060s. GPT-OSS-120B (Apache 2.0) scores 62.4% and runs in the cloud for pennies. The math doesn't work for proprietary models.

Real benchmark — SWE-bench Verified 2026

Code that works. Or it doesn't.

500 real GitHub issues. The model must understand the bug, write the patch, and pass the tests. No shortcuts. Source: swebench.com.

DeepSeek V4 Flash

79%

Claude Opus 4.6

80.8%

Qwen 3.6-27B (dense)

77.2%

Qwen 3.6-35B-A3B ⚡️ RTX 3060

73.4%

GPT-OSS-120B

62.4%*

Data: swebench.com. Qwen 3.6-35B-A3B runs on RTX 3060 12GB with quantization (Apache 2.0). DeepSeek V4 Flash (MIT) requires 2× RTX 3060. Proprietary frontier models cost US$ 75/1M output tokens. *GPT-OSS-120B: 62.4% on SWE-bench Verified (marc0.dev), Apache 2.0, 5.1B active of 120B MoE, runs in cloud. June 2026.

Real cost

Frontier performance. Pocket change.

The cost gap between proprietary and open models is abysmal. And with your own GPU, token cost is zero.

Model

Input/1M

Output/1M

SWE-Ver.

Runs on

Proprietary Frontier

US$ 15

US$ 75

80.8%

API only

DeepSeek V4 Flash (API)

US$ 0.14

US$ 0.28

79%

2× RTX 3060

Qwen 3.6-27B (API)

~US$ 0.50

~US$ 2

77.2%

RTX 4090

Qwen 3.6-35B-A3B

US$ 0

73.4%

RTX 3060

DeepSeek V4 Flash

US$ 0

79%

2× RTX 3060

Sources: deepseek.com, qwen.alibaba.com, swebench.com, openai.com/gpt-oss. Official API prices June/2026. Qwen 3.6-35B-A3B on RTX 3060, DeepSeek V4 Flash on 2× RTX 3060, GPT-OSS-120B on 1× H100.

“While you pay US$ 75 per million output tokens for proprietary APIs, your competitor runs Qwen 3.6 on a 12GB RTX 3060 — for free, without sending any data to anyone.”

— The math that doesn't add up. June 2026.

Your scenario

Four paths. One is yours.

From the simplest to the most sovereign. ALL models below are open weight (Apache 2.0 or MIT).

EXIT 01 — API SWAP

DeepSeek V4 Flash or Qwen 3.6 via API

Swap endpoints, zero code changes. OpenAI-compatible API. Cost 50-250x lower than proprietary models, equivalent coding performance. Results in days. DeepSeek V4 Flash: US$ 0.28/1M output. Qwen 3.6: ~US$ 2/1M output.

EXIT 02 — CLOUD GPU

GPT-OSS-120B, Qwen 3.6 or DeepSeek V4 Flash in the cloud

Rent a GPU (H100, A100) on AWS, Azure, RunPod or Spheron. Run vLLM with OpenAI-compatible API. GPT-OSS-120B (5.1B active, 120B MoE) fits on 1× H100. DeepSeek V4 Flash 2× H100. Full data control. Predictable cost. Used in production by General Bots.

EXIT 03 — ON-PREMISE (YOUR GPU)

Qwen 3.6-35B-A3B on RTX 3060 (12GB)

Qwen 3.6-35B-A3B: activates only 3B parameters per token (MoE). With 4-bit quantization, it fits in 4-6GB of VRAM. Runs on your RTX 3060 with 12GB. 73.4% on SWE-bench Verified. DeepSeek V4 Flash (284B MoE, 13B active): 2× RTX 3060. 79% on SWE-bench Verified, 1M context. Inference cost: ZERO. LGPD/GDPR compliance automatic — data never leaves your machine.

EXIT 04 — LAST RESORT

Legacy API — if you really have no alternative

If you don't have a GPU, can't use the cloud, and don't want to switch APIs, legacy API access is still cheaper than frontier. But it's plan Z. Start with any of the three exits above.

REAL READING (NO BULLSHIT)

pragmatismo.com.br

Carl vs Wilson — Two Teenagers, Two AI Philosophies

A deep analysis of how your AI stack choice can define the future of your company. Which side do you choose?

pragmatismo.com.br

The LLM Boom is Over: Enter the Era of Industrial Orchestration

Why the hype cycle is settling and what matters now for sustainable AI — cost, control, and real results.

pragmatismo.com.br

Escape from BigTech

TCO comparison: open source saves up to 87.5% over 5 years vs proprietary stacks. The numbers of freedom.

Full blog: Open source, LLMs & real strategy

Visit → pragmatismo.com.br/blog. No paywall, no empty promises.