Why Distributed AI Matters, for Everyone
If you are old enough to remember the early 2000s, the current AI gold rush should feel quite familiar. Back then I was a teenage computer enthusiast, skeptical of the cloud takeover of processing and storage. Swap "GPU clusters" for "server farms" and "AI infrastructure" for "cross-continent fiber optics," and we are essentially in the same dot-com boom. That said, the AI boom has learned its lessons and comes with much better marketing.
In the late 1990s, companies were putting billions into infrastructure, and anyone following history knows how it ended: the bubble burst in 2000. Of those infrastructure investments, initially around 90% of the fiber laid remained unused. The infrastructure became a commodity, and the value of many companies dropped to a fraction of what it had been just a few months prior.
I spent my whole career inside this evolution of computer processing and the move to the cloud. In the mid-2000s I was a strong believer in native apps, writing mainly in C++ and early Qt releases. After 2010 I moved to server-side development with .NET on on-prem servers, and eventually to cloud providers like AWS, with far more sophisticated bells and whistles for powerful, high-load, multi-tier solutions. Things came full circle for me working in the IoT field and on edge devices, where you need an internal NAT network to tolerate internet outages when they happen. My lesson from the whole journey: a hybrid solution usually brings the best of both worlds.
Now we are in an age where hundreds of billions are being poured into infrastructure again, with the promise that everything will shift to this new world. Here I want to see how we could put our lessons from the dot-com era to use in the age of AI, and maybe stay ahead of the curve.
The Centralization Problem
As of now, we broadly have only two options for using AI:
Fully on-device models: These are tiny models doing small parts of tasks on the device. Examples are the voice-to-text in phone voice assistants (like Siri), your keyboard's autocomplete and word suggestions, face detection in the camera app, and so on. The upside is privacy: your data stays on the device. The downside is that they are quite limited in functionality. With no internet connection, if you ask Siri almost anything you are out of luck, even though the system is processing your voice and parts of the command on the device.
Fully on-cloud models: These are much larger models, LLMs like GPT, Gemini and so on, and even the intelligence behind Siri. These models are powerful, capable, resource hungry and always improving, but your data is out of your hands. You have to trust an external party, and that raises a few issues:
Bad actors: A company may have an employee with bad intent who decides to misuse the resources exposed to them. It is surprising, in this day and age, with all the standards and sophisticated tooling we have access to, and the best practices for granting only on-demand access to data flows, how large a group of employees still has access to personal data, even when it is "anonymised." This isn't a hypothetical concern: in 2022, images from Roomba test units, including a photo of a woman on the toilet, ended up on social media after being shared by contractors training the AI.
Cybersecurity attacks: Just in the last year we have heard from the largest players, OpenAI, Anthropic and others, that their data has been breached. It wouldn't take a Cassandra to predict more incidents in the future.
Government involvement: Depending on where you live, governments and judicial systems can request access to your data, and the company hosting the solution has no choice but to comply with the order. Look closely at most zero-data-retention (ZDR) policies and you will find a clause covering exactly this: the provider still keeps a trace of the data to police misuse of the platform, so that an authority can establish who is at fault in case of an incident. And in the ever-evolving world we live in, with the largest powers trying to get ahead of one another at any cost, it does not take much imagination to see one government demanding sensitive data to find leverage.
As of now there is little middle ground: no hybrid solution, no IoT-like multi-tier setup where you can take the benefits of the cloud while keeping the advantages of on-prem.
The Cloud Computing Playbook
In the late 1990s and early 2000s, most computation happened on privately owned infrastructure. Then AWS entered the market in the mid-2000s, revolutionizing cloud computing with S3 (storage), EC2 (compute) and a growing catalog of other services. The benefits were very real: instant scalability, no upfront capital, and high reliability when executed correctly. For smaller companies and startups especially, this was a superpower.
But after a while an interesting thing happened: companies started to do the math. Dropbox, once one of the heaviest users of AWS S3, moved off AWS and saved about $75M over two years. Basecamp's CTO famously made the case that the cloud is not the only solution when they scaled back to privately owned hardware, saving roughly $3M in annual spending.
With the expansion of the Internet of Things, at both industrial scale and in the home, the need was clear: a multi-tier network to ensure service continuity. This is where I got back into the hybrid mindset. In 2016 I was working with IoT protocols like Z-Wave and Zigbee. They do an amazing job managing a fleet of devices on-prem, but you then need a connector to bridge their NAT network to the internet. That connector essentially acts as a router, deciding what gets processed locally and what is pushed upstream to the servers. I was at a startup with Z-Wave as its weapon of choice, and I put together a gateway that did exactly that. Users seamlessly got both the reliability and the expandability.
I do believe AI needs to make the same journey.
What Distributed AI Could Look Like
Now let's imagine AI working this way.
You buy a new vacuum cleaner. It has a camera for object detection and mapping. Instead of forcing you to send the data to the manufacturer's cloud, it lets you configure your own AI powerhouse: maybe a box you keep next to your router. Let's call this the "AI Gateway." When the vacuum needs to send a picture for labeling, this box can decide to process it locally if it contains sensitive information, pass it to your trusted AI provider, or forward it to the manufacturer.
So let's go through a few more examples:
Home security camera footage? That should all be processed locally, or in a cloud you really trust, since it is some of the most private data you have.
A generic question to an LLM? Maybe this is fine to process with a cloud AI provider, either anonymously or with memory attached.
Medical and financial queries? Either fully anonymised before being sent to the AI labs, or processed locally as much as possible.
Kids' homework? Heavy anonymization filters to protect their identity and prevent profiling before sending to labs.
So essentially we need a strong routing protocol that can handle three tiers of trust (sketched in code after the list):
Tier One: On-Prem (Maximum Trust). Your data never leaves your premises and is processed locally. This can be achieved with a NAS and a mid-range GPU, since models keep getting smaller. You don't need stellar performance for these requests, or at least you can tolerate slower token generation. You won't be running full-on GPT-5.2 locally, but then again, an open-source 20B-parameter model may well be enough.
Tier Two: High-Trust Provider with True ZDR Governance. Here your data leaves your premises, so you still need to trust the provider, but the provider contractually guarantees zero data retention: no logs, no training, no retention beyond the milliseconds needed to process the request, and no asterisks (I am talking to you, OpenAI). This tier carries the inherent risk of trusting a third party, but you retain legal recourse when and if things go south. It should be the sweet spot for many tasks and for larger models. Providers here should support a wide array of standardized models with different capabilities for different use cases. Ollama Cloud seems to be moving in this direction at the moment.
Tier Three: Standard Cloud AI (Baseline Trust). This is every AI tool we use today, the wild west of AI hosting. Companies get creative with their ZDR policies, if they have one at all. Your data might be used for further training, logging and reporting, and it might be shared with third parties. I strongly believe data in this tier should be heavily redacted and anonymised before sending, to minimize the risks, and queries should stay generic and limited where they can. This is the tier we need to migrate away from, toward the first two.
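To make the tiering concrete, here is a minimal sketch in Python of how a gateway might encode it. The tier values and the category-to-tier mapping are illustrative assumptions mirroring the examples above, not any existing standard:

```python
from enum import IntEnum

class TrustTier(IntEnum):
    """The three trust tiers; a lower value means more trusted."""
    ON_PREM = 1        # Tier One: data never leaves the local network
    ZDR_PROVIDER = 2   # Tier Two: contractual zero-data-retention provider
    PUBLIC_CLOUD = 3   # Tier Three: standard cloud AI; redact before sending

# Hypothetical default policy, mirroring the earlier examples.
DEFAULT_POLICY: dict[str, TrustTier] = {
    "camera_footage": TrustTier.ON_PREM,
    "medical":        TrustTier.ON_PREM,
    "financial":      TrustTier.ON_PREM,
    "kids_homework":  TrustTier.ZDR_PROVIDER,  # after heavy anonymization
    "generic_query":  TrustTier.PUBLIC_CLOUD,
}
```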
The Gateway
This is not really magic. We already have all the tooling and protocols required for a push toward this secure solution, and the good news is that it requires only a little modification and standardization from the community and the larger players. As it stands, at least on the major LLM provider side, we have a rough consensus on the technology: all large-scale providers expose a RESTful API with very similar schemas for sending a request and either streaming or returning the response synchronously. Despite the massive similarities (I assume due to a closed loop of talent and product evolution, with everyone closely observing everyone else), there are smaller differences that can simply be encapsulated, and libraries already exist that, for the most part, convert one API to another. Image payloads are also handled very similarly across the big players: OpenAI, Anthropic, Google, Ollama, etc. That said, there is little to no normalization for smaller providers or for devices like voice assistants, security cameras and smart devices. Maybe manufacturers could take a lesson or two here from the LLM labs.
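To illustrate how thin that encapsulation layer can be for plain text requests, here is a minimal sketch of translating an OpenAI-style chat body into Anthropic's Messages shape. It ignores streaming, tool calls and image parts, which real adapter libraries also handle:

```python
def openai_to_anthropic(payload: dict) -> dict:
    """Translate an OpenAI-style chat request into Anthropic's shape.

    Two notable differences: Anthropic takes the system prompt as a
    top-level `system` field rather than a message role, and it
    requires `max_tokens` to be set explicitly.
    """
    system = [m["content"] for m in payload["messages"] if m["role"] == "system"]
    out = {
        "model": payload["model"],  # model-name mapping is a separate concern
        "max_tokens": payload.get("max_tokens", 1024),
        "messages": [m for m in payload["messages"] if m["role"] != "system"],
    }
    if system:
        out["system"] = "\n".join(system)
    return out
```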
As for the Gateway router itself, it needs a few basic functionalities (a sketch of how they compose follows the list):
- Request Classification: A small on-device model determines the severity of the request, the policy to apply, and the sensitive parts of the query and their categories. Which device did this originate from? What information does it contain? Are there parts of the request that may need to be redacted?
- Rule Engine: The rule-matching logic. Ideally the user can modify the ruleset to match their comfort level and what fits them best. It sets the tier of the request based on the classification from the previous step, and can redact information in the request and rehydrate it in the response.
- Provider Selection: The gateway maintains the list of providers a request can be routed to, based on the rulesets.
- Fallback: What if the internal GPU has a long queue of requests? What if a provider is down? The gateway should be able to fall back according to a configured policy.
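Here is a minimal sketch of how those four pieces might compose, reusing the TrustTier and DEFAULT_POLICY definitions from the earlier sketch. The endpoints, the health probe and the device-supplied category hint are all illustrative assumptions:

```python
import urllib.request

# Hypothetical endpoints per tier; every name here is illustrative.
PROVIDERS = {
    TrustTier.ON_PREM:      ["http://nas.local:11434"],  # local Ollama-style server
    TrustTier.ZDR_PROVIDER: ["https://zdr-provider.example/v1"],
    TrustTier.PUBLIC_CLOUD: ["https://cloud-ai.example/v1"],
}

def classify(request: dict) -> str:
    """Request Classification: a small on-device model would go here.
    This toy version just trusts a category hint set by the device."""
    return request.get("category", "generic_query")

def is_healthy(endpoint: str, timeout: float = 0.5) -> bool:
    """Fallback probe; a real gateway would also watch GPU queue depth."""
    try:
        urllib.request.urlopen(endpoint, timeout=timeout)
        return True
    except OSError:
        return False

def route(request: dict) -> tuple[TrustTier, str]:
    """Rule Engine + Provider Selection + Fallback in one pass."""
    tier = DEFAULT_POLICY.get(classify(request), TrustTier.ON_PREM)
    # Try the assigned tier first, then fall back only toward MORE
    # trusted tiers, so an outage never weakens the privacy policy.
    for candidate in sorted((t for t in TrustTier if t <= tier), reverse=True):
        for endpoint in PROVIDERS.get(candidate, []):
            if is_healthy(endpoint):
                return candidate, endpoint
    raise RuntimeError("No provider reachable; queue the request locally.")
```

Note the direction of the fallback: a degraded gateway should become more private, never less.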
The Home Setup
Let me paint the picture of how this would work.
You've got a small box next to your router. It has a decent GPU, runs open-source models, routes requests based on rules, and keeps an internal audit trail of its configuration and decisions.
Your robot vacuum cleaner, your smart fridge, security camera and smart speaker all let you set the upstream server for their model usage. During setup, the gateway fetches the models they request from a trusted public repository of open models. The next time your teenager asks the smart speaker about a sensitive health topic, the request goes through your gateway, which identifies its severity from the origin and nature of the query, runs the local model, and returns the answer to the speaker. Your teenager gets an answer with no compromise of their privacy or security. When your vacuum wakes up at 3 AM and starts sending pictures for object classification, the gateway notices there is no human presence in the frames and lets the data go untouched to the external provider.
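Those two scenarios map naturally onto a user-editable ruleset. A minimal sketch, again reusing the TrustTier type from the earlier sketches; the device names, categories and the human_present flag are hypothetical:

```python
# First matching rule wins; every key except "tier" is a condition.
RULES = [
    # Sensitive voice queries from the smart speaker never leave home.
    {"device": "smart_speaker", "category": "health",
     "tier": TrustTier.ON_PREM},
    # Vacuum photos may go upstream, but only with no person in frame.
    {"device": "vacuum", "category": "object_photo", "human_present": False,
     "tier": TrustTier.PUBLIC_CLOUD},
]

def match(request: dict) -> TrustTier:
    """Return the tier of the first rule whose conditions all hold;
    anything unmatched defaults to the most trusted tier."""
    for rule in RULES:
        if all(request.get(key) == value
               for key, value in rule.items() if key != "tier"):
            return rule["tier"]
    return TrustTier.ON_PREM

# The 3 AM photo from the scenario above routes to the public cloud:
match({"device": "vacuum", "category": "object_photo",
       "human_present": False})  # -> TrustTier.PUBLIC_CLOUD
```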
This isn't a paranoid AI fantasy. It's the same evolution we have already made in cloud processing and general computing. Hybrid cloud won because it made sense and delivered the best of both worlds.
Why This Matters for Everyone
"I have nothing to hide."
That's the most common response to concerns about the lack of privacy. The argument falls apart as soon as AI becomes more than a novelty and enters common use.
AI Sees Everything
We saw a glimpse of this when cookie tracking became the norm in the 2010s, and the industry eventually had to move to restrict third-party cookies, restore some anonymity to users and curb the building of user profiles. But if you thought that was bad, you are in for a surprise. With AI it is not only the content but the context that enables far more detailed profiling and tracking. Your robot vacuum knows the floor plan of your apartment, what items you own and the pattern of your daily behavior. Your smart speakers, your fitness trackers, your security cameras, and most importantly your LLM provider together have a better understanding of you than even your partner or your therapist. This information is very valuable and could easily be used not only for targeted advertising but to evaluate you for insurance, for jobs, for every little aspect of your life. That is truly a world with no privacy.
The Geopolitical Dimension
Information is power. Let's zoom out further. In the last few years we have seen patterns of aggression between nations trying to establish their territory beyond modern-era borders, and no means are overlooked in that effort. As of now, most AI buildup is happening in very few countries, and the power to run the largest models is concentrated in basically two: the USA and China. If history is any lesson, there is no benefit to such a duopoly, and we all collectively lose in that situation.
The power imbalance is staggering. We hand over our most intimate and sensitive information, our questions, confusions, concerns and creative ideas, to companies operating essentially as a black box.
The goal isn't zero trust, but appropriate trust. Trust that YOU choose, for reasons suited to you, with alternatives if and when that trust is broken.
The Path Forward
I have already pointed my personal effort toward such a solution, bootstrapping a prototype of such a gateway: https://gatewise.ai. It is very much early stage and, for the time being, aimed more toward B2B, but the vision is the same: to enable diversity, freedom of choice and anonymity in LLM usage in particular. If you have made it this far, I invite you to have a look and share your opinion on the problem or the solution at ali.khoramshahi@hotmail.com.
As exciting as the AI revolution is, let's try to avoid the mistakes of the dot-com era and stay a step ahead of the bad actors.
