During our previous exploration in Rune-ifying Audio Models, we discovered that many audio models take an input of f32 values. Our audio capability, which is used to read audio input streams, parses them as i16 data.
For audio runes to work, we need a pre-processing step that converts our input from the i16 data type to floating-point values. Let's explore how we can go about creating such a processing block!
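Before wiring anything into a Rune, the conversion itself can be sketched as plain, dependency-free Rust. The helper name i16_to_f32 here is hypothetical; it only illustrates the scaling and clamping we are about to implement: divide each sample by i16::MAX and clamp the result to the [-1.0, 1.0] range most audio models expect.

```rust
// Hypothetical standalone sketch of the conversion: scale each i16
// sample by 1/i16::MAX, then clamp to [-1.0, 1.0]. Note that i16::MIN
// lands slightly below -1.0 before clamping, which is why the clamp
// is needed at all.
fn i16_to_f32(samples: &[i16]) -> Vec<f32> {
    samples
        .iter()
        .map(|&s| (s as f32 / i16::MAX as f32).clamp(-1.0, 1.0))
        .collect()
}

fn main() {
    let converted = i16_to_f32(&[0, i16::MAX, i16::MIN]);
    println!("{:?}", converted); // [0.0, 1.0, -1.0]
}
```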
See the code for this post on GitHub!
Generating a Processing Block
Processing block projects are essentially just Rust projects with specific dependencies.
A barebones file structure for a processing block can be generated with the cargo command:
$ cargo new audio_float_conversion --lib
The following files will be generated. Each processing block has two key components:
- lib.rs, where the logic of the processing block is written
- Cargo.toml, which contains package information and dependencies
Dependencies
Processing blocks depend on the rune-core and rune-proc-blocks crates, which define the interfaces needed to make your code pluggable into a Rune.
Let's add the dependencies to your Cargo.toml.
[dependencies]
rune-core = { git = "https://github.com/hotg-ai/rune", version = "0.4.0-dev" }
rune-proc-blocks = { git = "https://github.com/hotg-ai/rune", version = "0.4.0-dev" }
In your lib.rs, set up the basic crates and modules we require.
#![no_std]
extern crate alloc;
use rune_core::Tensor;
use rune_proc_blocks::{HasOutputs, ProcBlock, Transform};
The alloc crate provides smart pointers and collections for heap-allocated values, which we need because the crate is built with #![no_std]. The rune_core and rune_proc_blocks crates are necessary for the processing block to function.
Our next step would be to set some attributes and set up a struct.
#[derive(Debug, Clone, PartialEq, ProcBlock)]
#[transform(input = [i16; _], output = [f32; _])]
pub struct AudioFloatConversion {
i16_max_as_float: f32,
}
const I16_MAX_AS_FLOAT: f32 = i16::MAX as f32;
impl AudioFloatConversion {
pub const fn new() -> Self {
AudioFloatConversion {
i16_max_as_float: I16_MAX_AS_FLOAT,
}
}
}
The AudioFloatConversion struct and its impl store the constant I16_MAX_AS_FLOAT (the value of i16::MAX as an f32), which is used to convert i16 audio input to floating-point values.
We also need to set a default value for our AudioFloatConversion struct.
impl Default for AudioFloatConversion {
fn default() -> Self { AudioFloatConversion::new() }
}
The core of our operation occurs within an implementation of the Transform trait.
impl Transform<Tensor<i16>> for AudioFloatConversion {
type Output = Tensor<f32>;
fn transform(&mut self, input: Tensor<i16>) -> Self::Output {
input.map(|_dims, &value| (value as f32 / i16::MAX as f32).clamp(-1.0, 1.0))
}
}
The input tensor is in i16 format, which is specified in the first line with Transform<Tensor<i16>>.
We also need to specify the Output type. Since we are converting our data to floating-point values, our Output type is Tensor<f32>.
The transform function contains the main component of our processing block. We use the map function to apply our conversion and normalization algorithm to every element, turning the i16 tensor into an f32 tensor.
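To see the shape of this pattern without pulling in the rune crates, here is a dependency-free sketch that substitutes a plain Vec for Tensor and redeclares a minimal Transform trait. This is only an illustration of the pattern; the real trait lives in the rune crates and operates on tensors:

```rust
// Minimal stand-in for the Transform pattern: the trait is generic
// over its input type, and an associated Output type names the result.
trait Transform<Input> {
    type Output;
    fn transform(&mut self, input: Input) -> Self::Output;
}

struct AudioFloatConversion;

// Vec<i16> stands in for Tensor<i16>, Vec<f32> for Tensor<f32>.
impl Transform<Vec<i16>> for AudioFloatConversion {
    type Output = Vec<f32>;

    fn transform(&mut self, input: Vec<i16>) -> Self::Output {
        input
            .iter()
            .map(|&v| (v as f32 / i16::MAX as f32).clamp(-1.0, 1.0))
            .collect()
    }
}

fn main() {
    let mut pb = AudioFloatConversion;
    let out = pb.transform(vec![0, i16::MAX]);
    println!("{:?}", out); // [0.0, 1.0]
}
```

The associated Output type is what lets the Rune runtime chain processing blocks together: the output type of one block must line up with the input type of the next.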
Finally, we have an implementation of HasOutputs, which is used to validate the output dimensions requested by the instructions in the Runefile.yml.
impl HasOutputs for AudioFloatConversion {
fn set_output_dimensions(&mut self, dimensions: &[usize]) {
assert_eq!(dimensions.len(), 1, "This proc block only supports 1D outputs (requested output: {:?})", dimensions);
}
}
This lets us check that data of the appropriate shape is being passed through the processing block. In our current scenario, we only support 1-dimensional arrays.
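As a standalone illustration (the helper name check_output_dimensions is hypothetical), the 1-D check boils down to a single assertion on the length of the dimensions slice:

```rust
// Hypothetical standalone version of the check performed in
// set_output_dimensions: anything other than exactly one dimension
// causes a panic with a descriptive message.
fn check_output_dimensions(dimensions: &[usize]) {
    assert_eq!(
        dimensions.len(),
        1,
        "This proc block only supports 1D outputs (requested output: {:?})",
        dimensions
    );
}

fn main() {
    check_output_dimensions(&[1960]); // a 1-D buffer is accepted
    // check_output_dimensions(&[16, 1960]); // a 2-D shape would panic
}
```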
Test Cases
Our final step is to ensure that the processing block works as intended. We'll have to set up some tests for this.
Let's begin by adding a test case to check that our processing block can handle a tensor of zeroed data.
#[cfg(test)]
mod tests {
use super::*;
use alloc::vec;
#[test]
fn handle_empty() {
let mut pb = AudioFloatConversion::new();
let input = Tensor::new_vector(vec![0; 15]);
let got = pb.transform(input);
assert_eq!(got.dimensions(), &[15]);
}
}
Here we instantiate our processing block, create a tensor containing 15 zeroed elements, and apply our processing step with the transform function.
The test confirms that an output tensor of 15 elements has been created.
Similarly, we can create two more test cases: one to check that our converted data matches the expected output, and one to ensure that any data beyond the limits set in the clamp function is normalized appropriately.
// These tests go inside the same tests module as handle_empty:
#[test]
fn does_it_match() {
let max = i16::MAX;
let min = i16::MIN;
let mut pb = AudioFloatConversion::new();
let input = Tensor::new_vector(vec![0, max, min+1]);
let got = pb.transform(input);
assert_eq!(got.elements()[0..3], [0.0, 1.0, -1.0]);
}
#[test]
fn does_clamp_work() {
let max = i16::MAX;
let min = i16::MIN;
let mut pb = AudioFloatConversion::new();
let input = Tensor::new_vector(vec![max, min, min+1]);
let got = pb.transform(input);
assert_eq!(got.elements()[0..3], [1.0, -1.0, -1.0]);
}
}
To ensure that all our tests are working as intended we can run
$ cargo test
All three tests should pass.
Creating a processing block is a straightforward process; however, we have barely scratched the surface of the power and versatility processing blocks bring us. Normally, when TensorFlow Lite models are deployed to devices, running inference on the model requires code written in the native language of each device. Data-processing code written in multiple programming languages can have different nuances and outcomes. Runes solve this problem by allowing data to be processed without requiring custom code for each device the rune is deployed to.
An extremely valuable resource in building your own runes is our API documentation. Be sure to check out our growing repository of processing blocks as well!
To facilitate the usage of our technologies, we are working on creating a series of easy-to-understand tutorials.
Building a Processing Block for An Audio Rune will be the first of many such posts in our endeavor to educate our growing community.
Join our Community:
Twitter @hotg_ai and @hammer_otg | LinkedIn | Discord | Discourse