Alibaba's Best Model Is Now Closed, and the Open Weights Are Still Good

Qwen just split its model family in two: a closed API-only frontier tier and an open-weight workhorse tier. For teams that self-host, the practical question is whether the open tier is still worth it - and it is, for now.

For the past year, Qwen has been one of the clearest arguments that open-weight models could track the frontier. The Apache-licensed weights showed up on Hugging Face, you pulled them into Ollama, and you got something competitive enough to take seriously in production. That deal has quietly changed.

In May, Alibaba announced Qwen 3.7 Max - a 1M-context, reasoning-native flagship priced as a closed DashScope API with no open weights. The family now cleanly splits into open-weight workhorses (3.5 and 3.6, under Apache 2.0) and a closed-weight frontier model. As of this writing, no Qwen 3.7-27B or Qwen 3.7-35B-A3B repository exists under the official Qwen organization on Hugging Face. The 3.7 open weights have been announced as forthcoming. They haven't shipped.

That's the news. Here's what it means.


The practical tier that exists right now

If you're running a local coding setup today, the answer is still Qwen 3.6. Qwen 3.6-27B is a fully-dense 27B model - every parameter fires on every token - and it beats the 397B-A17B 3.5 flagship on key coding benchmarks: SWE-bench Verified 77.2 vs 76.2, Terminal-Bench 2.0 59.3 vs 52.5. It fits in 16.8 GB at Q4_K_M on a single consumer GPU, and it's the current "best local model for agentic coding" recommendation.

That's a striking result for teams who care about self-hosted inference: a 27B model beating a 397B model on agentic coding tasks, because the architecture is dense and the training was targeted. The lesson is that parameter count is a poor proxy for task performance.

The number on the model card matters less than what the model was trained to do.

For context on cost: API usage at $10-30 per million tokens for a team doing 100K queries per month runs $1,000-3,000 monthly. Break-even on local hardware is typically one to three months. That math gets sharper when you're running repetitive agent tasks - code review, diff summarization, ticket triage - where the prompts are short and the volume is high.


What the split reveals about the business

Alibaba's open-weight track record has been genuinely consistent. Qwen's cadence through 3.5 and 3.6 was extremely consistent - Alibaba ships open weights, the 3.7 open variants were officially announced, and that's a real commitment from a team that has historically kept this kind of promise.

But Alibaba is building out the closed tier aggressively. Three closed releases in under a month, each with its own price tier and revenue logic, is a serious commercial push, and there is a real incentive against releasing open weights that would substitute for the API.

This is the same tension every lab that goes open eventually faces. Meta felt it with Llama. Mistral felt it. The open tier stays open right up until the closed tier becomes a meaningful revenue line.

None of this means Qwen 3.7 open weights won't ship. They probably will. But the signal is worth reading plainly: the best model in the family is now behind an API, and the open tier is one product decision away from freezing.


What teams should actually do

For teams running local inference for privacy-sensitive work - internal knowledge search, code on proprietary repos, draft docs that shouldn't leave the building - healthcare teams under HIPAA, legal teams handling client matter data, and developers working with proprietary source code face a structural problem that contractual data processing agreements only partially address. Self-hosting eliminates the data transfer entirely - the model weights run on hardware the organization controls, with no outbound connections after the initial model download.

Qwen 3.6-27B fits that use case well today. It's capable enough for agentic coding tasks, it runs on a single GPU, and the Apache 2.0 license means you can modify and deploy it without restriction.

For the harder tasks - long-context reasoning, complex multi-step agent chains - a hybrid approach works well: run a local model for routine tasks like code review, summarization, and drafting, and call cloud APIs only for complex reasoning tasks. A teammate like Beagle can help route those requests automatically - sending the privacy-sensitive payload to the local endpoint and the hard reasoning problem to the frontier API - without the team having to manage that decision manually.

The broader point is this: the decision to self-host used to be mostly about cost and control. It's increasingly about not being surprised when a lab reprices, restricts, or retires the model you built around. Most startups should review their model stack monthly, or immediately after major pricing, context-window, or capability changes. Fast-moving vendors can shift the best choice quickly.

Qwen just shifted. The open tier is still worth running. But teams that treat it as a permanent free lunch are reading the situation wrong.