To quote one of our favorite sci-fi authors:
The future is already here – it's just not evenly distributed.
We'd like to take some artistic license and re-quote it as:
The future is already here, and it is definitely distributed
In the world of AI/ML, collaboration is usually hard: datasets are not made public, and improving a model requires recreating an enormous dataset from scratch. As a result, most collaboration happens via Jupyter notebooks, at the level of algorithms rather than data.
With tinyML, the world is changing: you no longer need an enormous amount of data, and, more importantly, relevant data can be captured on readily available devices. For the first time, this presents an opportunity to improve models with Real World Data.
Real World Data is unimaginably big, and leveraging it will require more sophisticated ways of extracting signal from noise.
Building good baseline models in this environment requires certain foundational pieces, such as the ability to reproducibly deploy models across devices and device platforms (by the way, Rune does this!). Reproducible execution of ML environments enables better model workflows.
Now we can collaboratively improve a model by exposing it to data from many tinyverses, creating a Cambrian explosion of models (much like npm modules). Developers can improve a model through forks or pull requests, or decide to use model A over model B based on its performance characteristics.
To facilitate this new world of tinyML collaboration, we need a new set of tools that support both open-source and private-source workflows. We are building a product stack to help solve these challenges, including:
Accelerate development of TinyML apps with Rune
Rune is a tiny container specifically designed to help you containerize TinyML applications across several platforms and devices. It is like Docker, but tinier.
$ touch ./Runefile        # create a Runefile, then describe your ML pipeline in it
$ rune build ./Runefile   # build it into a portable .rune container
$ rune run my.rune        # run the container on your device
Rise of private datasets
We have strongly asserted in the past that the future is tiny for several reasons:
Sensors generate more data than we capture or use
5G will bring low latency and high bandwidth, leading to a higher density of edge devices
Computations will move to the edge, and federate back intelligence
Billions of edge devices will be launched over the next few years
The proliferation of distributed edge devices means that centralizing data is much harder, if not futile. In fact, in certain domains such as healthcare, the data is better off not centralized, for privacy reasons. The data will stay on the edge and stay private; this unlocks an age of private datasets.
This is why we are building our TinyML Ops tool, hmrd, which you can run locally to collect private data.
Reliably deploy TinyML models in your apps while keeping your data private
Reliable and repeatable builds are the hallmark of production-grade systems. Hammer lets you deploy and manage your containers across devices and platforms securely and privately.
import hmr  # TinyML Ops client used below

rune = hmr.Rune(org, name, runefile, version)
rune.get_data()        # get data - only on your system
rune.set_model(m_loc)  # add new models
rune.deploy()          # rebuild and deploy the rune, all locally!
Additionally, we will be ramping up our security products by launching our enterprise suite later this year, which will allow you to protect your Runes using zero-trust runes and more. Stay tuned!
If everything is private, how do we learn?
This is a very good question, and a very important one, especially in healthcare. Understanding the longitudinal health of a person requires consent to access private data. Only with secure access to data can we build good baseline models.
This is how we see the path to the future:
Default opt-out, with explicit opt-in required: every data point will have to be explicitly opted in before it is shared back to centralized locations for training. Control rests firmly with the user or entity generating the data.
Start with baseline models and improve: one good thing about the tinyverse is that you can start by collecting limited data for a very narrow problem and build a simple model. This model can then be improved on the device over time, using feedback from the user. We have seen voice recognition models improve drastically using this technique.
Technologies such as federated learning, differential privacy, and other cryptographic techniques (such as homomorphic encryption, secure multi-party computation, and zero-knowledge proofs) are tools in our toolbox to keep data private while still improving the model, as sketched below. Today, these techniques cannot run on embedded devices due to resource constraints, yet mobile devices already support them. Runes will support this class of devices, allowing us to take advantage of their strengths and mitigate their weaknesses!
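To make this concrete, here is a minimal sketch of federated averaging in Python, using only numpy. It is purely illustrative and not part of the hmr API: each simulated device runs a local training step on its own private data, and only the resulting weights are averaged centrally. A differentially private variant would add calibrated noise to those updates before averaging.

import numpy as np

rng = np.random.default_rng(0)

def local_step(weights, x, y, lr=0.1):
    # One gradient-descent step of linear regression on a device's private data.
    grad = 2 * x.T @ (x @ weights - y) / len(y)
    return weights - lr * grad

def fed_avg(global_weights, device_datasets, lr=0.1):
    # Each device trains locally; only the updated weights leave the device.
    local = [local_step(global_weights.copy(), x, y, lr) for x, y in device_datasets]
    return np.mean(local, axis=0)  # the server averages the weight updates

# Simulate three devices, each holding its own private slice of data.
true_w = np.array([2.0, -1.0])
devices = []
for _ in range(3):
    x = rng.normal(size=(50, 2))
    y = x @ true_w + rng.normal(scale=0.1, size=50)
    devices.append((x, y))

w = np.zeros(2)
for _ in range(100):
    w = fed_avg(w, devices)
print("learned weights:", w)  # close to true_w, without pooling any raw data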
We will enter an age where there will be an abundance of data on the edge, and the ones who can use it will move ahead of the curve.
Ushering in a new era of research
We want to push the envelope of what is possible on the edge by taking these ideas to decentralized clinical trials.
We are excited to announce that we are a founding participatory member of the IEEE Technology and Data Harmonization for Enabling Decentralized Clinical Trials, where we are driving the Novel digital biomarkers and continuous composite endpoints working group.
The world of clinical and observational trials is mostly centralized, slow to complete, lacking in diversity, and not using health signals that can be captured today with modern devices.
Novel digital biomarkers are often used in studies taking place in settings other than healthcare facilities, and their collection can be facilitated by edge computing and federated learning. Smartphones and tiny sensors will be able to capture health signals that are continuous and have predictive qualities.
Runes can help define a “standardization” and “normalization” framework for how these biomarkers can be used in such a setup.
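As a hypothetical illustration (the names and schema below are ours, not a published standard), a shared biomarker spec plus a normalization step might look like this in Python:

from dataclasses import dataclass
import numpy as np

@dataclass
class BiomarkerSpec:
    # A hypothetical shared description of a digital biomarker, so every
    # device and study site processes the signal the same way.
    name: str
    unit: str
    sampling_hz: float
    expected_mean: float
    expected_std: float

def normalize(samples, spec):
    # Standardize raw samples against the spec's reference distribution
    # before they are fed to a model inside a rune.
    x = np.asarray(samples, dtype=float)
    return (x - spec.expected_mean) / spec.expected_std

resting_hr = BiomarkerSpec("resting_heart_rate", "bpm", 1.0, 65.0, 8.0)
print(normalize([58, 72, 90], resting_hr))  # z-scores relative to the spec

Agreeing on specs like this is what lets models trained at different sites, or on different devices, consume the same biomarker consistently.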
We will be sharing more about our work over time on how we intend to make an impact on decentralized clinical trials.
So what’s next? True democratization of ML.
TinyML and embedded machine learning will create a new type of AI that is much more accessible to the masses. Because the tinyverse is a constrained universe, models don’t require gigantic datasets to improve. This means there is potential to bring the power of the community to bear, building and improving amazing models collaboratively.
If you love what we are building and/or writing, please share!
Subscribe & follow us to check out more upcoming posts!