Collective Servers

Store private data off-chain while permissioning it on-chain

To enable community-owned models and datasets, we use a trusted compute node with asymmetric encryption. Users contribute their data by encrypting it client-side with the node's public key; it can be decrypted only with the corresponding private key, which is held securely on the node.
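The client-side contribution flow above can be sketched as follows. This is a minimal illustration assuming RSA-OAEP via the `cryptography` package; Vana's actual cipher suite may differ, and real payloads would use hybrid encryption (a symmetric data key wrapped with the node's public key) since raw RSA only encrypts small messages.

```python
# Sketch of the contribution flow: encrypt client-side with the node's
# public key; only the node, holding the private key, can decrypt.
# Assumption: RSA-OAEP with SHA-256 stands in for the real scheme.
from cryptography.hazmat.primitives.asymmetric import rsa, padding
from cryptography.hazmat.primitives import hashes

OAEP = padding.OAEP(
    mgf=padding.MGF1(algorithm=hashes.SHA256()),
    algorithm=hashes.SHA256(),
    label=None,
)

# The keypair lives on the trusted compute node; clients only ever see
# the public half.
node_private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
node_public_key = node_private_key.public_key()

def encrypt_contribution(data: bytes) -> bytes:
    """Runs client-side: encrypt with the node's public key."""
    return node_public_key.encrypt(data, OAEP)

def decrypt_on_node(ciphertext: bytes) -> bytes:
    """Runs only on the trusted node, which holds the private key."""
    return node_private_key.decrypt(ciphertext, OAEP)

ciphertext = encrypt_contribution(b"user data sample")
assert decrypt_on_node(ciphertext) == b"user data sample"
```

Anyone can encrypt a contribution, but decryption is possible only where the private key lives, which is what lets access be permissioned separately from storage.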

The trusted compute node can only run code that the collective has approved. For example, with a community-owned dataset, a company that wants to pay for access to train on the data (or someone acting on its behalf) opens a pull request with the code it needs to run on the secure node. This could include code to transmit the data to the company, or code to train a model. Note that the trusted compute node does not have access to a GPU, so to train larger models, the data requester must set up a heavy compute node with its own private key and request that an encrypted copy of the data be sent to that heavy compute node. The data requester then submits a proposal to the DAO describing what they would like to do.
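The hand-off to a heavy compute node can be sketched as a re-encryption step: the trusted node decrypts the data with its own key (as above) and re-encrypts it under the heavy node's public key before shipping it. This sketch uses hybrid encryption (AES-GCM for the data, RSA-OAEP to wrap the data key); the names `reencrypt_for_heavy_node` and `decrypt_on_heavy_node` are illustrative assumptions, not Vana's actual API.

```python
# Sketch of forwarding an encrypted dataset copy to a heavy compute node
# that holds its own private key. Hybrid scheme assumed: AES-GCM encrypts
# the data, RSA-OAEP wraps the one-time data key.
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM
from cryptography.hazmat.primitives.asymmetric import rsa, padding
from cryptography.hazmat.primitives import hashes

OAEP = padding.OAEP(
    mgf=padding.MGF1(algorithm=hashes.SHA256()),
    algorithm=hashes.SHA256(),
    label=None,
)

def reencrypt_for_heavy_node(plaintext: bytes, heavy_node_public_key) -> dict:
    """Run on the trusted node after decrypting the contribution."""
    data_key = AESGCM.generate_key(bit_length=256)
    nonce = os.urandom(12)
    ciphertext = AESGCM(data_key).encrypt(nonce, plaintext, None)
    wrapped_key = heavy_node_public_key.encrypt(data_key, OAEP)
    return {"wrapped_key": wrapped_key, "nonce": nonce, "ciphertext": ciphertext}

def decrypt_on_heavy_node(package: dict, heavy_node_private_key) -> bytes:
    """Run on the heavy (GPU) node, which holds its own private key."""
    data_key = heavy_node_private_key.decrypt(package["wrapped_key"], OAEP)
    return AESGCM(data_key).decrypt(package["nonce"], package["ciphertext"], None)

# Round trip: only the heavy node's private key can unwrap the data key.
heavy_sk = rsa.generate_private_key(public_exponent=65537, key_size=2048)
pkg = reencrypt_for_heavy_node(b"training batch", heavy_sk.public_key())
assert decrypt_on_heavy_node(pkg, heavy_sk) == b"training batch"
```

The point of the design is that data is never sent in the clear: it travels between nodes only under the destination node's public key.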

If the DAO approves the proposal (and code), the pull request is merged and deployed to the node. Here is the code running on the node. However, this still requires trusting that the node operator(s) will adhere to the approved code and not introduce any vulnerabilities.
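The approval gate can be sketched as a hash allowlist: the node refuses to execute anything whose digest does not match code merged through a DAO-approved pull request. The function names here are hypothetical, for illustration only.

```python
# Illustrative gate on the trusted node: only DAO-approved code may run.
# `approve` and `run_on_node` are hypothetical names, not Vana's API.
import hashlib

approved_hashes: set[str] = set()

def approve(code: str) -> None:
    """Called when the DAO approves a proposal and its pull request is merged."""
    approved_hashes.add(hashlib.sha256(code.encode()).hexdigest())

def run_on_node(code: str) -> None:
    """Execute code only if its hash matches an approved pull request."""
    digest = hashlib.sha256(code.encode()).hexdigest()
    if digest not in approved_hashes:
        raise PermissionError("code was not approved by the DAO")
    # Deployed code runs with access to the node's private key, which is
    # exactly why the allowlist check matters.
    exec(code, {})

approve("result = 1 + 1")
run_on_node("result = 1 + 1")   # permitted: hash matches an approved PR
```

Note that this check constrains what the node *should* run; as the next section discusses, it cannot by itself stop a malicious operator from bypassing it, which is where the residual trust lies.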

Ongoing developments

Our current approach is a step toward decentralization, but it still relies on trust in the compute node and the code-approval process. If the node operator (at this point, only Vana) were to deploy malicious code to the server, or otherwise compromise the private key that decrypts the data, it could access the underlying data.

We are eager to add more secure and decentralized options as fully homomorphic encryption, distributed training, and other privacy-preserving technologies mature. We have explored ways to reduce the required trust by splitting model weights or splitting the dataset across many machines in a privacy-preserving way, but we have not yet found a satisfactory solution, so we rely on a trusted compute node today. For example, there is promising work on letting many users each load a small part of a model for distributed inference.
