Librunecoral: Runes on coral devices - part 2

On the road to enabling more devices, more models, and better continuous integration

One of the first dreams we had for Rune was to run any computation anywhere. The work we have done on librunecoral helps get us closer to that goal! Following up on our previous post on librunecoral, we go deeper into new improvements that enable more model types, broader device support, and cleaner interfaces for building.

Supporting more and more devices from Rune without sacrificing performance is a tough challenge, and librunecoral lets us get the ball rolling.

Device Support: Arm v7

Ever since our announcement of librunecoral, we have been very busy improving it. The main improvements we have made so far are:

  • Added support for Arm v7 architecture

  • Usability improvements to the build system

  • Improvements to Continuous Integration

We now use cross to build and test the whole library against the following targets:

 - Linux: armv7, aarch64, x86_64

 - Android 6.0+: aarch64, x86_64 

 - Windows: x86_64

 - macOS: x86_64

 - iOS: aarch64

These improvements helped us integrate librunecoral into rune and provide a smoother developer experience. All you need to compile rune with librunecoral now is this one command:

cross build --bin rune --target aarch64-unknown-linux-gnu

Our precompiled rune releases also include librunecoral now.
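To build for more than one target, the same command can be generated per target triple. Here is a minimal sketch in Rust. Only `aarch64-unknown-linux-gnu` appears in this post; the x86_64 triple is the standard Rust name for that architecture and is an assumption about how you would invoke the build. The function names are made up for illustration. The 32-bit armv7 target is left out because the rune binary itself does not yet run on 32-bit Arm (see the roadblocks further down).

```rust
/// 64-bit Linux targets for the `rune` binary.
/// Only the aarch64 triple is taken from the post; the x86_64 triple is the
/// standard Rust target name and is an assumption.
const LINUX_TARGETS: [&str; 2] = [
    "aarch64-unknown-linux-gnu",
    "x86_64-unknown-linux-gnu",
];

/// Format the `cross build` invocation for a single target triple.
fn cross_build_command(target: &str) -> String {
    format!("cross build --bin rune --target {}", target)
}

/// One command line per target, in the order listed in `LINUX_TARGETS`.
fn linux_build_commands() -> Vec<String> {
    LINUX_TARGETS
        .iter()
        .map(|t| cross_build_command(t))
        .collect()
}
```

Printing each entry of `linux_build_commands()` gives a list of commands you can paste into a shell or a CI script.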

Improved Model Architecture Support

TensorFlow 2.6 TFLite Ops supported

Before integrating librunecoral with rune, we kept running into unsupported TFLite op errors (e.g. “Didn't find op for builtin opcode 'RESIZE_BILINEAR' version 3”) because of the older version of TensorFlow provided by the tflite-rs crate.
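This kind of error comes from how a TFLite runtime resolves operators: each op in a model file carries a builtin opcode and a version, and the runtime can only run versions it has a kernel registered for. The sketch below is a simplified model of that lookup, not TFLite's actual resolver. The `OpResolver` type and the version range in it are hypothetical and chosen only to reproduce the message quoted above.

```rust
use std::collections::HashMap;
use std::ops::RangeInclusive;

/// Toy model of an op resolver: opcode name -> versions the runtime supports.
/// Hypothetical type; the real resolver lives in TFLite's C++ code.
struct OpResolver {
    supported: HashMap<&'static str, RangeInclusive<u32>>,
}

impl OpResolver {
    /// An "old" runtime. The 1..=2 range is illustrative and not taken from
    /// any specific TensorFlow release.
    fn old_runtime() -> Self {
        let mut supported = HashMap::new();
        supported.insert("RESIZE_BILINEAR", 1..=2);
        OpResolver { supported }
    }

    /// Look up an op by opcode and version, as a model loader would.
    fn find_op(&self, opcode: &str, version: u32) -> Result<(), String> {
        match self.supported.get(opcode) {
            Some(range) if range.contains(&version) => Ok(()),
            _ => Err(format!(
                "Didn't find op for builtin opcode '{}' version {}",
                opcode, version
            )),
        }
    }
}
```

With this toy resolver, a model that asks for `RESIZE_BILINEAR` version 2 loads, while one that asks for version 3 fails with the message above. Moving to a newer runtime widens the supported ranges, which is why the errors went away.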

Now that we have the latest TensorFlow up and running, here is a glimpse of the new Runes that are coming:

Single-pose: A CNN model that runs on RGB images and predicts the joint locations of a single person. It has many applications, such as building an app that measures energy consumption during exercise or controlling a drone with your hands.

Yolo_v5: It provides real-time object detection. It has been used in various applications to detect traffic signals, people, parking meters, and animals.

DeepLab_v3: A deep learning model for semantic image segmentation, whose goal is to assign a semantic label (e.g., person, dog, cat and so on) to every pixel in the input image.

You can find all of our CV runes here!

Enabling Hardware Acceleration across devices

One of the main objectives of creating librunecoral is to leverage the hardware acceleration available on various hardware platforms. This means we don’t have to sacrifice performance for virtualization.

To make this a reality, we extended rune to support various hardware acceleration feature flags. For example:

cross build --bin rune --features runecoral_edgetpu_acceleration --features runecoral_gpu_acceleration --target aarch64-unknown-linux-gnu
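For readers new to Cargo features: a `--features` flag turns on conditional compilation, so code guarded by a feature is only built when that flag is passed. The sketch below shows the general mechanism using the two feature names from the command above. How rune uses these features internally is not shown in this post, so the function and the backend labels ("cpu", "edgetpu", "gpu") are hypothetical.

```rust
/// Report which acceleration backends were compiled into this build.
/// The feature names come from the command above; the returned labels and
/// the always-present "cpu" entry are assumptions for illustration.
fn compiled_backends() -> Vec<&'static str> {
    // Assume a plain CPU path exists regardless of feature flags.
    let mut backends = vec!["cpu"];

    // `cfg!` is evaluated at compile time and becomes `true` or `false`.
    if cfg!(feature = "runecoral_edgetpu_acceleration") {
        backends.push("edgetpu");
    }
    if cfg!(feature = "runecoral_gpu_acceleration") {
        backends.push("gpu");
    }
    backends
}
```

Built without any `--features` flag, this returns only the CPU entry. Built with both flags from the command above, it would also list the Edge TPU and GPU entries.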

Next steps and Benchmarks

Before we can share some numbers and more good news, we have encountered a few roadblocks that we need to solve first:

Extend Runefile to specify the hardware acceleration preference

Not all TensorFlow backends support all the operations required by your models, and not all platforms support all acceleration backends. So we must extend the Runefile to let you specify your preferred acceleration backends for a given model.
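One way to think about such a preference is as an ordered list with a fallback: try each preferred backend in turn and use the first one that both exists on the platform and can run the model's ops. The sketch below illustrates that idea only. It is not the Runefile syntax or rune's implementation (neither exists yet), and the `Backend` enum, the `pick_backend` function, and the CPU fallback are all hypothetical.

```rust
/// Acceleration backends mentioned in this post (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Backend {
    Cpu,
    Gpu,
    EdgeTpu,
}

/// Pick the first preferred backend that is available on this platform and
/// supports every op in the model. Falls back to the CPU when none match
/// (the fallback choice is an assumption).
fn pick_backend(
    preferred: &[Backend],
    on_platform: &[Backend],
    runs_model: &[Backend],
) -> Backend {
    preferred
        .iter()
        .copied()
        .find(|b| on_platform.contains(b) && runs_model.contains(b))
        .unwrap_or(Backend::Cpu)
}
```

For example, if a model prefers the Edge TPU and then the GPU, but the device only has a GPU, `pick_backend` returns `Backend::Gpu`. If neither preferred backend qualifies, it returns `Backend::Cpu`.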

Enable the TPU Acceleration on the coral devices

We lost support for TPU acceleration on Google Coral devices once we made a static library out of rune. We use libedgetpu to register and use the TPU devices, and libedgetpu isn't designed to be statically linked. This is a tricky issue to solve, but we intend to bring support back.

Enable 32 bit Arm support

While the future is 64-bit, there are a lot of perfectly good 32-bit Arm devices out there that can still be used for plenty of machine learning tasks. In fact, the default Raspberry Pi OS is also 32-bit. However, the Wasmer library we use in rune by default doesn't support Arm32.

Add support for Apple M1 CPU:

Currently you can run the Linux-based Docker images on Apple Silicon, and that works, but we would also like to support native builds.

Fix librunenative build for macOS / iOS when building with cross:

This will allow us to retire the old rune_vm backend entirely and bring a lot of improvements to the rune ecosystem as a whole, including mobile.

Fix Windows builds:

Windows builds are tricky for two reasons:

  1. We use symlinks in the Rune codebase, which causes problems on Windows.

  2. We run into Windows path length issues when compiling librunecoral inside rune on Windows.

Do you have a preference for which tasks we should prioritize first? If so, our team would love to hear from you in the comments below :)

Looking ahead

Our support for multiple hardware platforms is growing, and we are making embedded machine learning on CPU-, GPU-, and TPU-based devices easy. By continuing to invest in developer experience, tools, and AI, we want to make edge computing easy for anyone in the world.

Along with Rune support for Coral devices, we also intend to roll out the following key enterprise features by the end of the year:

  • Monitoring

  • Security

  • Observability

We will explain these platform features and how they come together to help you run edge computing in production.

Resources

Follow us on our socials:

Twitter @hotg_ai and @hammer_otg | LinkedIn | Discord | Discourse

A guest post by Mohit