Sound Manager SDK
Immersitech Sound Manager SDK to process and mix 3D spatial audio.
Table of Contents
- Pseudo Example
- API Reference
The Immersitech Sound Manager SDK is a C/C++ library that functions as an audio mixer and audio processor featuring 3D spatial audio processing, noise reduction, and speech enhancement.
The Sound Manager SDK is currently designed for developers who have direct access to raw audio data. If you have that access, the SDK can take in each participant's raw audio and return a processed raw audio output buffer to you.
Additionally, the Immersitech Sound Manager SDK allows you to edit or query the audio settings in real time for any listener or speaker.
Before diving into how to use the library, we first establish some concepts.
Listeners & Sound Sources
In a 3D audio space, we can designate one person as the listener, and every sound they hear comes from a source. In the Immersitech Sound Manager SDK, each participant is a listener, and every other participant is a source for that listener.
Devices and Half-span Angle
To achieve optimal 3D spatial audio quality, the SDK needs to understand what type of physical audio equipment the user has. To this end, some settings in the audio control parameters allow this information to be set for each conference participant.
IMM_CONTROL_HALF_SPAN_ANGLE tells the SDK the angle between the user's forward center line and each of the physical speakers they are listening to. This parameter is not used when a participant's device is set to headphones; it applies only to participants using speakers. The angle can be any integer from 1 to 90 degrees. In most cases, the values will be something like the following:
- 7 degrees for small speakers
- 15 degrees for laptop speakers
- 25 degrees for larger television sound bars
3D Coordinate System
The Immersitech libraries can place each participant at a 3D location, so it is important to know which directions the x, y, and z axes refer to. Moving along the x axis moves a participant left or right, moving along the y axis moves a participant up or down, and moving along the z axis moves a participant forward or backward, all relative to the center point (0,0,0). Note also that the unit for these x, y, z coordinates is centimeters. Therefore, a participant at (15,-10,50) is 15 centimeters to the right of the center, 10 centimeters below the center, and 50 centimeters in front of the center.
Another important feature of the Immersitech 3D grid is that all participants turn to face the center point (0,0,0). Keep this in mind when listening for a position change: what you hear depends on both where the source is located AND which direction the listener is facing.
Audio Parameter Notation
In the Immersitech libraries, we use the notation where one sample is a single value, one frame contains one sample per channel, and one buffer contains one processing period's worth of frames (for the conference, the frames per buffer you choose when initializing the library). To learn more about this notation, visit this web resource.
Whisper and Sidebar Rooms
This concept is not critical for getting started with the library; feel free to return to it when you need side conversations apart from the main conference. The idea of this feature is that a participant can say something to another participant in the same conference without everyone in the conference hearing it.
A participant who is not in any whisper or sidebar room will be considered to be in the main room. A participant in the main room will only hear sources that are also not in any whisper or sidebar room. A participant in the main room will only act as a source for listeners in the main room or listeners in any whisper room.
A participant in a whisper room will hear all sources from the main room and also all sources in the whisper room. A participant in a whisper room will act as a source only for listeners in the same whisper room.
A participant in a sidebar room will hear only sources in the same sidebar room. A participant in a sidebar room will act as a source only for participants in the same sidebar room as them.
A participant in both a whisper room and a sidebar room will hear all sources from the sidebar room and also all sources in the whisper room. A participant in both a whisper room and a sidebar room will act as a source only for listeners in the same whisper room.
Let's take a high-level look at the few simple steps needed to use the library:
The first step in your code will be to initialize the Immersitech library. This step will allow Immersitech to set everything up internally for audio processing.
imm_initialize_library(license_file_path, SAMPLE_RATE, FRAMES_PER_BUFFER, IMM_OUTPUT_FORMAT_STEREO_INTERLEAVED, IMPULSE_LENGTH);
Optionally, check the version of your library to ensure you are up to date. The character array that receives the version string must be at least 11 characters long.
printf("The Immersitech library version is %s\n", version_string);
Now that the library is initialized, we can begin to create conferences. Note that if you haven't initialized the Immersitech library, the returned conference handle will be NULL.
Immersitech_Conference my_conference = imm_create_conference();
Let's add two participants to this conference, both with 1-channel input.
We can store the returned IDs to access them later and edit their settings.
ID_1 = imm_add_participant(my_conference, SAMPLE_RATE, NUM_CHANNELS, IMM_PARTICIPANT_REGULAR);
ID_2 = imm_add_participant(my_conference, SAMPLE_RATE, NUM_CHANNELS, IMM_PARTICIPANT_REGULAR);
Now that we have some participants in our conference, let's start processing audio.
This happens in two steps: first, input every participant's audio; then, process and generate the output for each participant.
The first of the two steps is to add each participant's audio into the engine when you receive it. Do this once for each participant: this establishes the participant's input audio, which is the audio used whenever this participant is considered as a source.
This means you add one buffer's worth of data: the same duration as the buffer size you initialized the library with, scaled to that participant's sampling rate and number of channels. A participant with 2-channel input still supplies one buffer of data, but it naturally includes twice as many samples. The samples should be in block or interleaved form depending on how you initialized the library. The table below shows some common examples:
|Who||Sampling Rate||Number of Frames||Number of Channels||Number of Samples|
|Conference Output||48 kHz||480||2||960|
|Participant 1 Input||48 kHz||480||2||960|
|Participant 2 Input||48 kHz||480||1||480|
|Participant 3 Input||16 kHz||160||2||320|
|Participant 4 Input||16 kHz||160||1||160|
|Participant 5 Input||8 kHz||80||2||160|
|Participant 6 Input||8 kHz||80||1||80|
Please ensure that your input buffer has the correct number of samples and that you enter the number of FRAMES into the function call and not the number of samples.
imm_input_audio( my_conference, my_audio_data_1, num_frames, ID_1);
imm_input_audio( my_conference, my_audio_data_2, num_frames, ID_2);
The second step of audio processing is to generate the output for each participant as a listener. This means calling imm_output_audio once for each participant to generate the stereo output of what that participant should hear.
To do so, simply provide an output buffer in which to store the results. The output buffer data will be formatted the way you specified upon initializing the library. Find more information about the different output formats under Output Formats. Once again, you will want to ensure that the output buffer you provide has enough memory allocated for the number of frames and number of output channels you selected.
imm_output_audio( my_conference, ID_1, participant_1_output);
imm_output_audio( my_conference, ID_2, participant_2_output);
And that's it! You can then adjust the features of the audio processing for each participant whenever you want as follows:
imm_edit_participant_state( my_conference, ID_1, IMM_CONTROL_ANC_ENABLE, 1);
imm_edit_participant_state( my_conference, ID_2, IMM_CONTROL_MASTER_GAIN, 70);
imm_edit_participant_state( my_conference, ID_2, IMM_CONTROL_DEVICE, IMM_DEVICE_SPEAKER);
To move a participant in 3D space, simply adjust their location:
imm_edit_participant_location( my_conference, ID_2, 10, 30, 20);
If at any point a participant chooses to leave the call, remove them from the conference:
imm_remove_participant( my_conference, ID_1);
imm_remove_participant( my_conference, ID_2);
When a conference is finished, free all the memory for that conference:
imm_destroy_conference( my_conference );
When you are finished using the Immersitech library, be sure to destroy the library to free the memory allocated during initialization. Do not call this function before you are completely finished using the library:
imm_destroy_library();
In order to use the Immersitech Sound Manager libraries, you will need these files:
- immersitech_sound_manager.h
- libimmersitech-sound-manager.dll (.dylib for macOS or .so for Linux)
- libimmersitech-sound-manager.lib (only necessary for Windows)
To use the Immersitech Sound Manager library, include immersitech_sound_manager.h in your project and call its functions from your code. You will also need to link the dynamic library to your project and make sure it is in the location you linked it to. Finally, make sure the license file path you pass to imm_initialize_library matches the actual location of your license file.
Immersitech_Conference: this handle is the structure you should use to store a conference.
|Function to initialize the loaded Immersitech libraries. You must call this function before you can start creating conferences, as it initializes the common_handle. You may not call this function a second time while conferences are running; wait for all conferences to be destroyed before calling it again.|
|int||imm_destroy_library ()||Function to free all memory used by the Immersitech Library. Note that you must be done using all your conferences before you call this function.|
|Function to retrieve the version of the Immersitech libraries|
|Immersitech_Conference||imm_create_conference ()||Function to allocate memory and initialize an Immersitech Conference Instance. Returns your instance handle.|
|Function to free all memory used by an Immersitech Conference Instance|
|Function to add a participant into an Immersitech Conference Instance|
|Function to remove a participant from an Immersitech Conference Instance|
|Function to add input (source) audio for a participant in an Immersitech Conference Instance|
|Function to generate the output a given participant would hear, given all other participants' source audio|
|Function to clear out all data related to a particular participant|
|Function to change the spatial location of a participant with Cartesian coordinates|
|Function to change the Listener audio effects for a given participant's Listener output|
|Function to change a single source-listener pair's spherical coordinates (Not Recommended for Use)|
|Function to get a participant's Cartesian location in a conference|
|Function to view a given participant's state for a particular Listener audio effect|
|Function to view a given source-listener pair's spherical coordinates (Not Recommended for Use)|
|Function to print a more detailed message about an error you received. Returns 1 if the code you provided is a valid error code or 0 if the code you provided is not an immersitech error code.|
Audio Control Parameters
The following is a list of effects you can apply to a given participant in a conference, along with a description of what each effect does to the audio stream.
|Name||Default value||Possible Values||Description|
|IMM_CONTROL_STEREO_BYPASS_ENABLE||0||0 or 1||If enabled, this state will cause all of the following effects to be bypassed for this participant, regardless of whether or not they are enabled.|
|IMM_CONTROL_MUTE_ENABLE||0||0 or 1||If enabled, this state will prevent this participant’s input audio from entering the conference.|
|IMM_CONTROL_MIXING_3D_ENABLE||0||0 or 1||Enables 3D mixing for a participant.|
|IMM_CONTROL_ANC_ENABLE||0||0 or 1||If this state is enabled, noise cancellation will be activated and the listener will hear the other speakers more clearly|
|IMM_CONTROL_AGC_ENABLE||0||0 or 1||If this state is enabled, automatic gain control will be activated and the listener will hear the other speakers at a consistent volume level|
|IMM_CONTROL_DEVICE||IMM_DEVICE_HEADPHONE||a value from immersitech_device_types||This state allows you to optimize the output audio for a participant depending on what type of device they are listening on. This currently supports headphones and stereo loudspeakers. See the section describing immersitech_device_types for more details.|
|IMM_CONTROL_HALF_SPAN_ANGLE||15||1 to 90||Integer value from 1 to 90 representing the half-span angle from the center-line.|
|IMM_CONTROL_MASTER_GAIN||100||0 to 100||Integer value representing the percentage of unity gain applied to this listener's output.|
|IMM_CONTROL_WHISPER_ROOM||0||0 to 100||Integer value representing the whisper room a participant is currently in. If the value is zero, this indicates the participant is not in a whisper room.|
|IMM_CONTROL_SIDEBAR_ROOM||0||0 to 100||Integer value representing the sidebar room a participant is currently in. If the value is zero, this indicates the participant is not in a sidebar room.|
The following list describes the supported types of devices that you can supply for the IMM_CONTROL_DEVICE control.
|IMM_DEVICE_HEADPHONE||1||If you are wearing headphones (in-ear, on-ear, or over-ear), select this device type. If you are using a device that is neither headphones nor stereo loudspeakers, you should default to this choice.|
|IMM_DEVICE_SPEAKER||2||If you are using stereo loudspeakers, such as bookshelf speakers, select this device type. You will also want to adjust IMM_CONTROL_HALF_SPAN_ANGLE based on how your speakers are set up.|
The following list describes the supported participant types that you can supply to the imm_add_participant function.
|IMM_PARTICIPANT_REGULAR||1||A regular participant will input audio into the conference as well as listen to the output from a conference.|
|IMM_PARTICIPANT_SOURCE_ONLY||2||A source-only participant will input audio into the conference but WILL NOT listen to or generate the output from a conference.|
|IMM_PARTICIPANT_LISTENER_ONLY||3||A listener-only participant WILL NOT input audio into the conference but will listen to or generate the output from a conference.|
You can select the way in which your output data will be formatted using one of the enum values below in the initialization function.
|IMM_OUTPUT_FORMAT_MONO||1||One buffer of single-channel data. Note that all 3D effects are lost with this output format.|
|IMM_OUTPUT_FORMAT_STEREO_INTERLEAVED||2||Two channels of audio data, interleaved frame by frame. Ensure your buffer has two samples allocated per frame (twice the frames per buffer).|
|IMM_OUTPUT_FORMAT_STEREO_BLOCK||3||One buffer's worth of left-channel samples followed by one buffer's worth of right-channel samples.|
The following is a list of error codes that can be returned by any of the functions in the Immersitech API.
|Error number||Internal Key||Description|
|-10000||IMM_ERROR_NONE||No errors. Everyone is good.|
|-9999||IMM_ERROR_ENGINE_NULL||The handle to your Immersitech_Conference_Handle was NULL. This could mean you didn't initialize it or you sent the wrong handle.|
|-9998||IMM_ERROR_DATA_NULL||The handle to your data was NULL. This could mean you didn't initialize it or you sent the wrong handle.|
|-9997||IMM_ERROR_DATA_LENGTH||You supplied a data array of the incorrect length. Character array for version string: at least 11 characters long. Input audio array: cannot be longer than the frames per buffer size of the conference. Ideally holds frames per buffer of the conference times the number of channels for that input. Output audio array: Exactly 2 times the number of frames per buffer of the conference.|
|-9996||IMM_ERROR_NUM_CHANNELS||For a participant's input, you must use either 1 or 2 channels. For a participant's output, you must use 2 channels. The value you supplied was outside this range.|
|-9995||IMM_ERROR_SAMPLE_RATE||The Immersitech Library currently only supports matching input and output sampling rates. The value you supplied for the new participant did not match the one you supplied for creating the conference.|
|-9994||IMM_ERROR_INVALID_ID||The Participant_ID you supplied did not match any of the Participant IDs in the conference. The request was not fulfilled.|
|-9993||IMM_ERROR_INVALID_CONTROL||The Flag you supplied did not match any of the immersitech_audio_controls available. The request was not fulfilled.|
|-9992||IMM_ERROR_INVALID_VALUE||The value you requested for the edit was not valid. See the valid values below:
|-9991||IMM_ERROR_CONFERENCE_EMPTY||You attempted to make a change without adding any participants into the conference first. The request was not fulfilled.|
|-9990||IMM_ERROR_PARTICIPANT_TYPE||You attempted to use a Participant in a way that does not work. The possibilities are:
|-9989||IMM_ERROR_LIBRARY_ALREADY_INITIALIZED||You attempted to initialize the library. However, the library has already been initialized. The request was not fulfilled.|
|-9988||IMM_ERROR_LIBRARY_NOT_YET_INITIALIZED||You attempted to free the library but you haven't initialized it yet. The request was not fulfilled.|
|-9987||IMM_ERROR_LICENSE_DATE_EXPIRED||The license file you supplied has expired. Your library will run in bypass mode only. Please contact Immersitech for a new license.|
|-9986||IMM_ERROR_LICENSE_VERSION_MISMATCH||The license file you supplied is not valid for the version of the library file you are trying to use. Your library will run in bypass mode only. Please contact Immersitech for a new license.|
|-9985||IMM_ERROR_LICENSE_TAMPERED||The license file has been corrupted. Your library will run in bypass mode only. Please contact Immersitech for a new license.|
|-9984||IMM_ERROR_LICENSE_MISSING||The path to the license file you supplied does not exist. Your library will run in bypass mode only. Please make sure you have the correct path and the file is in the specified location.|
What are the default location and effect settings for a new participant?
A new participant will always be placed at location (0,0,0) and have all effects turned off by default. Additionally, by default all participants will have their device set to IMM_DEVICE_HEADPHONE and will not be in any whisper or sidebar rooms.
What is the benefit of changing the value for impulse length?
A shorter impulse length (minimum 512) will improve the CPU and RAM performance of the library but reduce the quality of the 3D rendering. A longer impulse length (maximum 8192) will improve the quality of the 3D rendering but use more CPU and RAM. You can pick a value based on which is most important to your application.
Is this library thread safe?
The library is thread safe for all general and expected use cases.
Unexpected or unsupported usage, such as calling imm_initialize_library multiple times from different threads, will likely crash the library. If you have an irregular use case that crashes with threading, let us know and we can investigate a solution for you.
What sampling rates and buffer sizes are supported, and why?
The Immersitech libraries currently support only output sampling rates of 48 kHz and 44.1 kHz. We do not plan to support lower output sampling rates, because they would defeat the goal of immersive three-dimensional sound. Higher output sampling rates are not currently supported, but we could implement them on request.
The Immersitech libraries currently support only 10 millisecond buffers (480 frames at 48 kHz), 20 millisecond buffers (960 frames at 48 kHz), 512-frame buffers, and 1024-frame buffers. To work with a different buffer size in your application, you can call imm_input_audio and imm_output_audio multiple consecutive times per application buffer (for example, four times with 10 millisecond buffers to cover a 40 millisecond application buffer). You can also reach out to us to request a different buffer size.