Currently, there’s a divide between those who back open-source AI and those who support closed AI. OpenAI stands for closed AI, pointing to safety worries while keeping quiet about its business goals. Meta backs open-source AI, saying closed systems slow down progress, but it isn’t doing this just out of kindness.
AI will become part of us in the years to come. We’ll move from phones to smart glasses that talk to local language models and agents that adjust to what we need. These agents will talk to each other using shared knowledge and even up-to-the-minute information such as the current Ethereum price. This setup will bring people together into a shared brain, which no single group should own. Ethereum’s infrastructure could offer a secure and transparent way to swap data and run smart contracts.
Ethereum, being open-source and backed by a strong group of developers, could play a key part in pushing forward genuinely open-source AI projects. Its smart contracts could help create AI models, and ways to govern them, that aren’t controlled by one group. This might help bring together these two different ways of thinking about AI.
People often get the wrong idea about “open-source” AI. AI models are just complex math equations with settings (parameters) that get fine-tuned during training. GPT-3 has 175 billion of these settings, and GPT-4 is rumored to have over a trillion. When big companies say they’re providing “open-source” AI, they’re only giving out these settings, which falls short of what open source is all about.
In the world of AI, training is kind of like compiling software. The model settings are like the final program file, but the blueprint and training data stay under wraps. When companies like Meta or X “open-source” their models, they’re just sharing the end result. You’re stuck with just tweaking the existing model, not creating something new from scratch.
Big tech companies have a monopoly on AI development – mainly because of the huge resources needed to train large-scale models. Blockchain tech might change this. Just as it shook up finance with Bitcoin, it could pave the way for open-source, community-driven AI that’s not owned by corporations. This article puts forward an idea for this kind of decentralized AI system using blockchain tech.
The Complexities of AI Training Datasets
AI training datasets, often several terabytes in size, mold a model’s ‘character’. Choosing these datasets involves cultural subtleties, which makes it necessary to create multiple versions for different needs. Decentralized, content-addressed storage systems like IPFS or Ethereum Swarm are well-suited for these versioned, multi-fork datasets. They store only the changes between versions, similar to how Git works.
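To make the content-addressing idea concrete, here is a minimal sketch in Python. It is not the IPFS or Swarm API. `ContentStore`, `publish_dataset`, and `load_dataset` are hypothetical names, and the store is just an in-memory dictionary keyed by SHA-256 hashes. It only illustrates why a fork of a dataset can reuse every unchanged chunk.

```python
import hashlib
import json


class ContentStore:
    """Toy in-memory store (hypothetical): the key is the SHA-256 of the value."""

    def __init__(self):
        self.blobs = {}

    def put(self, data: bytes) -> str:
        address = hashlib.sha256(data).hexdigest()
        self.blobs[address] = data  # identical content is stored only once
        return address

    def get(self, address: str) -> bytes:
        return self.blobs[address]


def publish_dataset(store, chunks):
    """Store each chunk, then store a manifest listing the chunk addresses."""
    chunk_addresses = [store.put(chunk) for chunk in chunks]
    manifest = json.dumps(chunk_addresses).encode()
    return store.put(manifest)


def load_dataset(store, manifest_address):
    """Resolve a manifest address back into the list of chunks."""
    addresses = json.loads(store.get(manifest_address))
    return [store.get(address) for address in addresses]


store = ContentStore()
base = publish_dataset(store, [b"shard-a", b"shard-b", b"shard-c"])
# A fork replaces one shard; the two unchanged shards are shared, not copied.
fork = publish_dataset(store, [b"shard-a", b"shard-b", b"shard-c-localized"])
```

Because the address of a version is derived from its content, two communities that disagree about one shard still share everything else, and anyone holding a manifest address can check that the data they received is the data that was published.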
Training involves complex math. Models, often described by their parameter count (for example, llama-2-7b’s 7 billion), go through optimization. Backpropagation uses partial derivatives to adjust the parameters, one batch of the dataset at a time. This process repeats until the model reaches the desired accuracy, which we check against a test dataset.
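The loop described above can be sketched at toy scale. The example below is an illustration, not how a large language model is trained: it fits a two-parameter linear model with mini-batch gradient descent, and the partial derivatives are written out by hand rather than backpropagated through layers. The function names, learning rate, and synthetic data are all assumptions made for the example.

```python
import random


def train_linear(data, lr=0.1, batch_size=4, epochs=300, seed=0):
    """Fit y ≈ w*x + b by mini-batch gradient descent on mean squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # Partial derivatives of the batch loss with respect to w and b.
            grad_w = sum(2 * (w * x + b - y) * x for x, y in batch) / len(batch)
            grad_b = sum(2 * (w * x + b - y) for x, y in batch) / len(batch)
            # Move each parameter a small step against its gradient.
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b


def mean_squared_error(w, b, data):
    """Average squared error of the model on a dataset."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


# Synthetic data from y = 3x + 1, split into training and test sets.
points = [(x / 10, 3 * (x / 10) + 1) for x in range(20)]
train_set, test_set = points[:16], points[16:]
w, b = train_linear(train_set)
test_error = mean_squared_error(w, b, test_set)
```

A real model repeats the same pattern with billions of parameters instead of two, which is where the computing cost comes from.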
The Challenge of Decentralized AI Training
Training AI on a large scale requires huge computing power, which big companies largely control. That concentration is one reason analysts expect demand for alternatives: they predict the blockchain AI market will grow from $184.6 million in 2021 to $973.6 million by 2027, a yearly growth rate of 31.8%.
But decentralized systems run into reliability problems, because no single node can be trusted by default. Bitcoin handles this with energy-guzzling Proof of Work. Ethereum’s Proof of Stake gives us a more eco-friendly option, using staked money as a way to measure reliability. Decentralized training would likewise need new ways to build trust. One idea involves random checks by validator nodes. Another assumes the computation is correct but allows time to challenge mistakes. Both aim to stay true to open-source principles.
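The random-check idea can be sketched as follows, under strong simplifying assumptions: every training step is deterministic, the worker publishes its claimed state after each step, and a validator recomputes only a random sample of steps. `honest_step` is a hypothetical stand-in for real training work, and all the other names are illustrative too.

```python
import math
import random


def honest_step(state, batch):
    """Stand-in for one deterministic training step (hypothetical)."""
    return state + sum(batch)


def run_worker(initial_state, batches, cheat_at=()):
    """Return the claimed state after every step; cheat_at lists corrupted steps."""
    claimed, state = [], initial_state
    for i, batch in enumerate(batches):
        state = honest_step(state, batch)
        if i in cheat_at:
            state += 1  # models a skipped or falsified computation
        claimed.append(state)
    return claimed


def spot_check(initial_state, batches, claimed, sample_size, rng):
    """Recompute a random sample of steps, each from the previous claimed state."""
    for i in rng.sample(range(len(batches)), sample_size):
        previous = initial_state if i == 0 else claimed[i - 1]
        if honest_step(previous, batches[i]) != claimed[i]:
            return False
    return True


def detection_probability(total_steps, bad_steps, sample_size):
    """Chance that a uniform random sample contains at least one bad step."""
    clean_samples = math.comb(total_steps - bad_steps, sample_size)
    return 1 - clean_samples / math.comb(total_steps, sample_size)


batches = [[1, 2], [3, 4], [5, 6], [7, 8]]
claims = run_worker(0, batches, cheat_at={2})
caught = not spot_check(0, batches, claims, sample_size=4, rng=random.Random(1))
```

The trade-off is visible in `detection_probability`: checking a small sample is cheap, but a single falsified step can slip through, so spot checks only deter cheating when a caught node has something to lose.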
Zero-knowledge proofs, such as zkSNARKs, offer another way to verify. They’re cheap to check, but right now they take too many resources to generate for AI training. Research on zkML keeps going, which might let smart contracts do the verifying in the future. All of these methods try to recreate in decentralized systems the trust that a central operator would otherwise provide. The main challenge is making sure the computations are honest without going against the principles of decentralization.
Our proposed blockchain system uses datasets owned by the community and managed by DAOs. Members who disagree can fork the DAO and its dataset, using content-addressed storage to copy the data. A DAO also controls training, and nodes put up collateral to join in. People requesting work offer rewards for specific dataset-model pairs. Nodes record every step of the training so that the run can be repeated.
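One way the step-by-step training record could work is a hash chain, where each entry commits to the one before it. This is a sketch under assumptions, not a specification: the record fields (`batch`, `seed`, `params`) and the function names are hypothetical, and the hashes stand in for content addresses of real batches and parameter snapshots.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash that precedes the first record


def hash_record(record):
    """Hash a record in a canonical form so every node gets the same digest."""
    encoded = json.dumps(record, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()


def log_step(previous_hash, step, batch_address, seed, params_hash):
    """Create one log entry that commits to the entry before it."""
    record = {
        "prev": previous_hash,
        "step": step,
        "batch": batch_address,
        "seed": seed,
        "params": params_hash,
    }
    return record, hash_record(record)


def verify_log(records, final_hash):
    """Recompute the chain and compare it with the published final hash."""
    current = GENESIS
    for record in records:
        if record["prev"] != current:
            return False
        current = hash_record(record)
    return current == final_hash


records, head = [], GENESIS
for step in range(3):
    record, head = log_step(head, step, f"batch-{step}", 42, f"params-{step}")
    records.append(record)
```

Publishing only the final hash is enough to commit a node to the whole run. Anyone who later replays the training from the logged batches and seeds can compare their own chain against it.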
To check work, we use an optimistic approach with a challenge period. Nodes that submit wrong calculations lose their stakes, which gives other nodes a reason to validate. This new system brings up key questions, such as who will pay for training and for storing datasets.
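The optimistic approach can be simulated off-chain in a few lines. The sketch below is an assumption-heavy illustration rather than a contract design: `OptimisticLedger` and its methods are hypothetical, time is a plain integer, and the ledger simply trusts the challenger’s recomputed hash, whereas a real system would need an on-chain way to settle the dispute. Handing the slashed stake to the challenger is also an assumption, chosen to show why nodes would bother to validate.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    node: str
    result_hash: str
    submitted_at: int
    status: str = "pending"  # becomes "finalized" or "slashed"


@dataclass
class OptimisticLedger:
    challenge_window: int
    stakes: dict = field(default_factory=dict)
    claims: list = field(default_factory=list)

    def submit(self, node, result_hash, now):
        """Accept a result optimistically, provided the node has collateral."""
        if self.stakes.get(node, 0) <= 0:
            raise ValueError("node must stake collateral before submitting")
        self.claims.append(Claim(node, result_hash, now))
        return len(self.claims) - 1

    def challenge(self, claim_id, challenger, recomputed_hash, now):
        """Slash the submitter if a timely recomputation disagrees."""
        claim = self.claims[claim_id]
        deadline = claim.submitted_at + self.challenge_window
        if claim.status != "pending" or now > deadline:
            return False
        if recomputed_hash == claim.result_hash:
            return False  # the claim was correct, nothing to slash
        slashed = self.stakes[claim.node]
        self.stakes[claim.node] = 0
        # Assumption: the challenger receives the slashed stake as a reward.
        self.stakes[challenger] = self.stakes.get(challenger, 0) + slashed
        claim.status = "slashed"
        return True

    def finalize(self, claim_id, now):
        """Mark an unchallenged claim as final once the window has passed."""
        claim = self.claims[claim_id]
        deadline = claim.submitted_at + self.challenge_window
        if claim.status == "pending" and now > deadline:
            claim.status = "finalized"
        return claim.status
```

The length of `challenge_window` is the main design choice: a longer window gives validators more time to recompute expensive training steps, but delays the moment a result can be treated as final and rewards paid out.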
LLM OS and Collective Intelligence
LLM OS, an idea from Andrej Karpathy, puts a large language model at the core of an operating system. In our setup, it’s an agent on a user’s node that talks to other agents and tools. Swarm acts as the file system, making it easy to access shared knowledge.
The future points to a blend of humans and AI, with ongoing conversations through devices we wear. Custom-made digital helpers will chat with us and with each other, tapping into shared wisdom. This might link people into one big smart network.
This big smart network needs to stay spread out, not owned by just one group. The systems we talked about are key to keeping this AI setup widespread.