When Meta released LLaMA as “open source” in February 2023, the AI community celebrated. Finally, the democratization of AI we’d been promised. No more gatekeeping by OpenAI and Google. Anyone could now build, modify, and deploy state-of-the-art language models.
Except that’s not what happened. A year and a half later, the concentration of AI power hasn’t decreased—it’s just shifted. The models are “open,” but the ability to actually use them remains locked behind the same economic barriers that closed models had. We traded one form of gatekeeping for another, more insidious one.

The Promise vs. The Reality
The open source AI narrative goes something like this: releasing model weights levels the playing field. Small startups can compete with tech giants. Researchers in developing countries can access cutting-edge technology. Independent developers can build without permission. Power gets distributed.
But look at who’s actually deploying these “open” models at scale. It’s the same handful of well-funded companies and research institutions that dominated before. The illusion of access masks the reality of a new kind of concentration—one that’s harder to see and therefore harder to challenge.
The Compute Barrier
Running Base Models
LLaMA-2 70B requires approximately 140GB of VRAM just to load its weights in 16-bit precision. A single NVIDIA A100 GPU (80GB) costs around $10,000, and you need at least two just for inference. That's $20,000 in hardware before you serve a single request.
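The sizing math behind that figure is straightforward: parameter count times bytes per parameter, plus headroom for the KV cache and activations. A back-of-the-envelope sketch (the 20% overhead factor is an assumption, not a measurement):

```python
# Back-of-the-envelope VRAM estimate for serving a dense LLM.
# The 1.2x overhead factor for KV cache / activations is an assumed rough figure.

def vram_gb(params_billions: float, bytes_per_param: float, overhead: float = 1.2) -> float:
    return params_billions * 1e9 * bytes_per_param * overhead / 1e9

for label, bytes_per_param in [("fp16", 2.0), ("int8", 1.0), ("4-bit", 0.5)]:
    print(f"LLaMA-2 70B @ {label}: ~{vram_gb(70, bytes_per_param):.0f} GB")

# fp16: ~168 GB with overhead (~140 GB for the weights alone)
# int8: ~84 GB; 4-bit: ~42 GB
```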
Most developers can't afford this. So they turn to cloud providers. AWS charges roughly $4-5 per GPU per hour, which works out to $32-40 per hour for an instance with 8x A100 GPUs. Running 24/7 costs $25,000-$30,000 per month. For a single model. Before any users.
Compare this to GPT-4's API: $0.03 per 1,000 input tokens and $0.06 per 1,000 output tokens. You can build an application serving thousands of users for hundreds of dollars a month. The "closed" model is more economically accessible than the "open" one for anyone without serious capital.
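A rough break-even sketch makes the gap concrete. The traffic volume and per-request token counts below are illustrative assumptions; the prices are the ones quoted above:

```python
# Rough monthly cost comparison: self-hosted 8x A100 instance vs. a pay-per-token API.
# Traffic figures are illustrative assumptions, not benchmarks.

HOURS_PER_MONTH = 730

# Self-hosted: 8x A100 instance at ~$35/hour, running 24/7.
self_hosted = 35 * HOURS_PER_MONTH

# API: $0.03 per 1K input tokens, $0.06 per 1K output tokens.
requests_per_month = 100_000          # assumed traffic
tokens_in, tokens_out = 500, 300      # assumed tokens per request
api = requests_per_month * (tokens_in / 1000 * 0.03 + tokens_out / 1000 * 0.06)

print(f"Self-hosted: ~${self_hosted:,.0f}/month")   # ~$25,550
print(f"API:         ~${api:,.0f}/month")           # ~$3,300
```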
The Quantization Trap
"Just quantize it," they say. Run it on consumer hardware. And yes, you can compress LLaMA-2 70B down to 4-bit precision and squeeze it onto a high-end gaming PC with 48GB of RAM, with most of the model running on the CPU rather than the GPU. But now your inference speed is 2-3 tokens per second. GPT-4 through the API serves 40-60 tokens per second.
You’ve traded capability for access. The model runs, but it’s unusable for real applications. Your users won’t wait 30 seconds for a response. So you either scale up to expensive infrastructure or accept that your “open source” model is a toy.
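If you want to try the quantized route anyway, the common path is 4-bit loading via bitsandbytes (or llama.cpp with GGUF weights for pure consumer hardware). A minimal sketch with Hugging Face transformers, assuming you have been granted access to the gated meta-llama checkpoint and have enough combined GPU and CPU memory for the roughly 40GB of quantized weights:

```python
# Minimal 4-bit loading sketch using Hugging Face transformers + bitsandbytes.
# Assumes access to the gated meta-llama checkpoint and enough memory for ~40GB of weights.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Llama-2-70b-hf"

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # NF4 quantization
    bnb_4bit_compute_dtype=torch.bfloat16,  # do the matmuls in bf16
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",   # spread layers across GPUs and offload the rest to CPU
)

inputs = tokenizer("The bottleneck in open source AI is", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```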
The Fine-Tuning Fortress
Training Costs
Base models are rarely production-ready. They need fine-tuning for specific tasks. Full fine-tuning of LLaMA-2 70B for a specialized domain costs $50,000-$100,000 in compute: roughly a week on 32-64 GPUs at cloud rates, plus the failed runs and hyperparameter sweeps that any real project accumulates.
LoRA and other parameter-efficient methods reduce this, but you still need $5,000-$10,000 for serious fine-tuning. OpenAI's fine-tuning API? $8 per million training tokens (for GPT-3.5 Turbo), then per-token inference pricing. For most use cases, it's an order of magnitude cheaper than self-hosting an open model.
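For comparison, the parameter-efficient route freezes the base model and trains small adapter matrices on top of it. A minimal sketch with the peft library; the rank, target modules, and other hyperparameters are typical starting points rather than a recipe, and in practice you would combine this with 4-bit loading (QLoRA) to fit the 70B base model at all:

```python
# Minimal LoRA fine-tuning setup with Hugging Face peft.
# Hyperparameters are typical starting points, not a tuned recipe.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# In practice you'd pass a 4-bit quantization_config here (QLoRA) to fit the base model.
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-70b-hf")

lora_config = LoraConfig(
    r=16,                                   # rank of the adapter matrices
    lora_alpha=32,                          # scaling factor
    target_modules=["q_proj", "v_proj"],    # attach adapters to attention projections
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()
# Typically well under 1% of the 70B parameters end up trainable;
# the compute bill shrinks accordingly, but it is still GPU-days, not GPU-minutes.
```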
Data Moats
But money is only part of the barrier. Fine-tuning requires high-quality training data. Thousands of examples, carefully curated, often hand-labeled. Building this dataset costs more than the compute—you need domain experts, data labelers, quality control infrastructure.
Large companies already have this data from their existing products. Startups don’t. The open weights are theoretically available to everyone, but the data needed to make them useful is concentrated in the same hands that controlled closed models.
Who Actually Benefits
Cloud Providers
Amazon, Microsoft, and Google are the real winners of open source AI. Every developer who can’t afford hardware becomes a cloud customer. AWS now offers “SageMaker JumpStart” with pre-configured LLaMA deployments. Microsoft has “Azure ML” with one-click open model hosting. They’ve turned the open source movement into a customer acquisition funnel.
The more compute-intensive open models become, the more revenue flows to cloud providers. They don’t need to own the models—they own the infrastructure required to run them. It’s a better business model than building proprietary AI because they capture value from everyone’s models.
Well-Funded Startups
Companies that raised $10M+ can afford to fine-tune and deploy open models. They get the benefits of customization without routing sensitive data through someone else's API. Your fine-tuned LLaMA runs on infrastructure you control, and your users' prompts never leave it. This is valuable.
But this creates a new divide. Funded startups can compete using open models. Bootstrapped founders can’t. The barrier isn’t access to weights anymore—it’s access to capital. We’ve replaced technical gatekeeping with economic gatekeeping.
Research Institutions
Universities with GPU clusters benefit enormously. They can experiment, publish papers, train students. This is genuinely valuable for advancing the field. But it doesn’t democratize AI deployment—it democratizes AI research. Those are different things.
A researcher at Stanford can fine-tune LLaMA and publish results. A developer in Lagos trying to build a business cannot. The knowledge diffuses, but the economic power doesn’t.
The Developer Experience Gap
OpenAI's API takes 10 minutes to integrate. Three lines of code and you're generating text. LLaMA requires setting up infrastructure, managing deployments, monitoring GPU utilization, handling model updates, implementing rate limiting, and building evaluation pipelines. It's weeks of engineering work before you write the first line of your actual application.
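The "three lines of code" claim is barely an exaggeration. With the current OpenAI Python SDK, the hosted path looks roughly like this (the prompt is a placeholder, and OPENAI_API_KEY is assumed to be set in the environment):

```python
# The entire hosted-API integration, assuming OPENAI_API_KEY is set.
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Summarize this support ticket..."}],
)
print(response.choices[0].message.content)
```

There is no equivalent snippet for the self-hosted path; its "three lines" are an inference server, a deployment pipeline, and an on-call rotation.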
Yes, there are platforms like Hugging Face Inference Endpoints and Replicate that simplify this. But now you’re paying them instead of OpenAI, often at comparable prices. The “open” model stopped being open the moment you need it to actually work.
The Regulatory Capture
Here’s where it gets really interesting. As governments start regulating AI, compute requirements become a regulatory moat. The EU AI Act, for instance, has different tiers based on model capabilities and risk. High-risk models face stringent requirements.
Who can afford compliance infrastructure? Companies with capital. Who benefits from regulations that require extensive testing, monitoring, and safety measures? Companies that can amortize these costs across large user bases. Open source was supposed to prevent regulatory capture, but compute requirements ensure it anyway.
We might end up with a future where model weights are technically open, but only licensed entities can legally deploy them at scale. Same outcome as closed models, just with extra steps.
The Geographic Divide
NVIDIA GPUs are concentrated in North America, Europe, and parts of Asia. A developer in San Francisco can buy or rent A100s easily. A developer in Nairobi faces import restrictions, limited cloud availability, and 3-5x markup on hardware.
Open source was supposed to help developers in emerging markets. Instead, it created a new form of digital colonialism: we’ll give you the recipe, but the kitchen costs $100,000. The weights are free, but the compute isn’t. Same power concentration, new mechanism.
The Environmental Cost
Every startup running its own LLaMA instance is replicating infrastructure that could be shared. If a thousand companies each deploy their own 70B model, that’s thousands of GPUs running 24/7 instead of one shared cluster serving everyone through an API.
Ironically, centralized APIs are more energy-efficient. OpenAI’s shared infrastructure has better utilization than thousands of individually deployed models. We’re burning extra carbon for the ideology of openness without achieving actual decentralization.
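The utilization argument is easy to put numbers on. The figures below are illustrative assumptions, not measurements, but they show the shape of the waste:

```python
# Illustrative comparison of GPU-hours at low vs. high utilization.
# Utilization percentages are assumptions for the sake of the argument.

companies = 1000
gpus_per_company = 2
hours_per_month = 730

dedicated_util = 0.10   # each company's GPUs busy ~10% of the time
shared_util = 0.60      # a pooled cluster kept ~60% busy

useful_gpu_hours = companies * gpus_per_company * hours_per_month * dedicated_util
shared_gpus_needed = useful_gpu_hours / (hours_per_month * shared_util)

print(f"Dedicated: {companies * gpus_per_company} GPUs powered on all month")
print(f"Shared:    ~{shared_gpus_needed:.0f} GPUs would do the same useful work")
# 2,000 always-on GPUs vs. roughly 330 in a well-utilized shared pool.
```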
What Real Democratization Would Look Like
If we’re serious about democratizing AI, we need to address the compute bottleneck directly.
Public compute infrastructure. Government-funded GPU clusters accessible to researchers and small businesses. Like public libraries for AI. The EU could build this for a fraction of what they’re spending on AI regulation.
Efficient model architectures. Research into models that actually run on consumer hardware without quality degradation. We’ve been scaling up compute instead of optimizing efficiency. The incentives are wrong—bigger models generate more cloud revenue.
Federated fine-tuning. Techniques that let multiple parties contribute to fine-tuning without centralizing compute or data. This is technically possible but underdeveloped, because it doesn't serve cloud providers' interests; a toy sketch of the idea follows this list.
Compute co-ops. Developer collectives that pool resources to share inference clusters. Like how small farmers form cooperatives to share expensive equipment. This exists in limited forms but needs better tooling and organization.
Transparent pricing. If you’re charging for “open source” model hosting, you’re not democratizing—you’re arbitraging. True democratization means commodity pricing on inference, not vendor lock-in disguised as open source.
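The simplest version of the federated idea is federated averaging applied to LoRA adapters: each party trains a small adapter on its own data and hardware, and only the adapter tensors are pooled. A toy sketch of the aggregation step, with placeholder tensors standing in for real adapter weights:

```python
# Toy federated averaging of LoRA adapter weights.
# Each party trains locally; only small adapter tensors are shared and averaged.
import torch

def federated_average(adapters: list[dict[str, torch.Tensor]],
                      num_examples: list[int]) -> dict[str, torch.Tensor]:
    """Weight each party's adapter by how much data it trained on."""
    total = sum(num_examples)
    merged = {}
    for name in adapters[0]:
        merged[name] = sum(
            adapter[name] * (n / total)
            for adapter, n in zip(adapters, num_examples)
        )
    return merged

# Three parties with toy tensors standing in for real LoRA A/B matrices.
parties = [{"lora_A": torch.randn(16, 4096), "lora_B": torch.randn(4096, 16)}
           for _ in range(3)]
merged = federated_average(parties, num_examples=[1200, 800, 5000])
print({name: tuple(t.shape) for name, t in merged.items()})
```

A production system would need secure aggregation, and averaging the A and B matrices separately is only an approximation, but the point stands: the heavy compute and the raw data stay distributed.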
The Uncomfortable Truth
Open source AI benefits the same people that closed AI benefits, just through different mechanisms. It’s better for researchers and well-funded companies. It’s not better for individual developers, small businesses in emerging markets, or people without access to capital.
We convinced ourselves that releasing weights was democratization. It’s not. It’s shifting the bottleneck from model access to compute access. For most developers, that’s a distinction without a difference.
The original sin isn’t releasing open models—that’s genuinely valuable. The sin is calling it democratization while ignoring the economic barriers that matter more than technical ones. We’re building cathedrals and wondering why only the wealthy enter, forgetting that doors without ramps aren’t really open.
Real democratization would mean a developer in any country can fine-tune and deploy a state-of-the-art model for $100 and an afternoon of work. We’re nowhere close. Until we address that, open source AI remains an aspiration, not a reality.
The weights are open. The power isn’t.