Documentation – ultraspeech.com

System requirements

Hardware requirements

One of the following ultrasound imaging systems (with a compatible probe): the Terason T3000 (laptop-based or OEM module), which is the system Ultraspeech was initially designed for, or the Telemed Echoblaster 128, which has been supported since release 1.3.
An ASIO/DirectX-compatible soundcard for audio recordings (ASIO drivers strongly recommended).
A WDM-compliant camera for video recordings (Ultraspeech has been tested with an Imaging Source camera).

To obtain accurate and consistent data, the use of a probe stabilization system such as the helmet developed by Alan Wrench (Articulate Instruments) is also recommended.

Software requirements

When using Ultraspeech with the Terason T3000 ultrasound system

Operating system: Windows XP or Windows 7, 32-bit ONLY, since the Terason T3000 ultrasound system is NOT compatible with 64-bit operating systems.
Drivers for the Terason T3000 + Terason Ultrasound software version 4.6.2 (this release seems to be the only one compatible with the Terason SDK which is used by Ultraspeech – contact Terason to get this release).

When using Ultraspeech with the Telemed Echoblaster ultrasound system

Operating system: Windows XP or Windows 7, 32-bit or 64-bit (for 64-bit, use the 64-bit installer). Note, however, that Ultraspeech is developed and tested mainly on 32-bit platforms. Not all functionality has been validated on 64-bit operating systems.
Drivers for the Telemed EchoBlaster and Telemed Echo software (version 3.4 or later).

Installation instructions

First of all, make sure that your ultrasound system (Terason or Telemed), and optionally your camera and your audio system, work correctly “outside” Ultraspeech, and that your user account has administrator privileges (note that the Terason system cannot be used without administrator privileges).

The automatic installer software provided for Ultraspeech 1.2 and 1.3 should simplify the installation process (compared to previous releases).

Download the 32-bit or 64-bit installer depending on your operating system (note that the Terason T3000 ultrasound system is NOT compatible with 64-bit systems, as mentioned above). Launch the installer with administrator privileges (in Windows 7, right-click it and select “Run as administrator”), and follow the installation instructions. The whole installation process, including the registration of third-party components (ActiveX controls), should be done automatically.
Start Ultraspeech by double-clicking on the Ultraspeech icon located on your desktop. Ultraspeech should start by saying that your license number is incorrect. This is normal. Please send the identification code displayed in the pop-up window to support@ultraspeech.com, and I will send your license number back. Edit the license file named “license.dat” located in your “Ultraspeech 1.X” directory (typically “C:\Program Files\Ultraspeech 1.X\” or “C:\Program Files (x86)\Ultraspeech 1.X\”) and update the default license code. If license.dat is read-only, first modify its permissions: right-click on license.dat, choose “Properties”, open the “Security” tab, click “Edit”, select your user name (upper part of the window), and grant full control using the checkboxes in the lower part.
Start Ultraspeech again by double-clicking on the Ultraspeech icon on your desktop. Ultraspeech should now start.

In some cases, the automatic registering of ActiveX components fails during the installation process. If this happens, you will have to register these components manually:

Open the MS command prompt by clicking on the Windows Start menu and selecting “Run”. Type “cmd.exe” (without quotes) into the Run box and click “OK” (on Windows 7, the command prompt must be run as administrator).
Navigate to the “C:\Program Files\Ultraspeech 1.X\com\terason\” directory (using the “cd” command).
Then type “regsvr32.exe TTAutomate.ocx” (without quotes) and press return. A pop-up window saying that an ActiveX control has been registered correctly should be displayed. Repeat the registration process with the other .ocx files in the same directory and with the icimagingcontrol.ocx file, which is located in “C:\Program Files\Ultraspeech 1.X\com\imagingsource\”.
Try to start Ultraspeech 1.X again by double-clicking on the Ultraspeech icon on your desktop.

If it still does not work, send an e-mail to support@ultraspeech.com.
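The manual registration steps above can also be scripted. The following is a minimal Python sketch, not part of Ultraspeech: the helper names `build_regsvr32_commands` and `register_all` are hypothetical, and the sketch assumes that every .ocx file found in the given directories should be registered (the instructions above only name icimagingcontrol.ocx for the imagingsource directory).

```python
from pathlib import Path
import subprocess


def build_regsvr32_commands(directories):
    """Return one regsvr32 command (as an argument list) per .ocx file found."""
    commands = []
    for directory in directories:
        # sorted() keeps the registration order stable between runs
        for ocx_file in sorted(Path(directory).glob("*.ocx")):
            commands.append(["regsvr32.exe", str(ocx_file)])
    return commands


def register_all(directories):
    """Run every command; call this from an administrator prompt on Windows."""
    for command in build_regsvr32_commands(directories):
        # check=True raises CalledProcessError if regsvr32 reports a failure
        subprocess.run(command, check=True)
```

To use it, call `register_all` with the two directories mentioned above (the `com\terason` and `com\imagingsource` folders of your “Ultraspeech 1.X” installation), after replacing 1.X with your release number.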

Understanding Ultraspeech

The main feature of the Ultraspeech software is the synchronous recording of ultrasound and video image streams with the audio signal. Data recording is triggered by simply clicking on a start/stop button. Internally, Ultraspeech uses multiple FIFO buffers to access image data. Thanks to multithreading, all streams are processed in parallel. The streams share the same multimedia timer, so that each frame and each audio buffer can be tagged with the timer value during the recording. Any initial asynchrony between streams is captured during the acquisition, and synchrony is restored automatically in a post-processing stage (software-based synchronization mechanism).
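To illustrate why a shared timer is enough to relate the streams to each other, here is a small Python sketch. It is not Ultraspeech's actual post-processing code: the function name and parameters are hypothetical, and it simply assumes that every frame and the first audio buffer carry a timestamp (in ms) read from the same clock.

```python
def audio_sample_for_frame(frame_ms, audio_start_ms, sampling_rate_hz):
    """Index of the audio sample recorded at the instant a frame was grabbed.

    Both timestamps are assumed to come from the same timer, so their
    difference is the time elapsed since the audio stream started.
    """
    elapsed_ms = frame_ms - audio_start_ms
    return round(elapsed_ms * sampling_rate_hz / 1000)


# A frame tagged 432 ms, with audio that started 8 ms after the trigger
print(audio_sample_for_frame(432, 8, 44100))
```

Because the initial delay of each stream is measured on the same clock, it can be subtracted afterwards, which is the idea behind a software-based synchronization.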

Ultraspeech provides a convenient tool for recording large databases, called the “Database recording assistant”. This tool provides an automatic file naming system and automatically displays the text stimulus for each item to record (i.e. the word or sentence to pronounce).

Ultraspeech User Interface

The Ultraspeech graphical user interface consists of a menu bar, a main window, and a set of “child” windows.

Main Ultraspeech Menu

General

Project Manager: Open the project manager.
Calibration toolbox: Open the calibration toolbox.
Database recording assistant: Open the database recording assistant.
Expand Archive files: Uncompress the archive files (extension .arc) that are created when recording with the “Store images in archive files” option checked (see the Project manager section).
Quit: Exit Ultraspeech.

Ultrasound

Start ultrasound: Initialize the Terason T3000 system and start transmitting the ultrasound images (this can take several seconds).
Stop ultrasound: Stop transmitting the ultrasound images and shut down the Terason T3000 system.
Ultrasound settings: Open Ultrasound Settings window.

Camera

Start/stop camera: Enables/disables live display in the control window.
Camera source: Open camera source window in which you can select a particular imaging device, the video format and the frame rate.
Camera settings: Open camera settings window in which you can adjust image settings such as brightness, contrast, saturation, etc.

Audio

Audio settings: Open audio settings window.

Main Ultraspeech Window

The main components of the main Ultraspeech window are:

the ultrasound image display window, in which the ultrasound images are displayed.
the video image display window, in which the video images are displayed.
different boxes below each image display window which indicate the name of the imaging device, the frame rate and the image size.
the “stimuli display zone”, in which text material can be displayed (i.e. the sentence to pronounce).
the “console” in which all the messages given by Ultraspeech are displayed.
three checkboxes (“Record Ultrasound”/“Record Video”/“Record Audio”). If a box is checked, the corresponding stream is recorded (if unchecked, the stream is only displayed).
the main record button (named “REC”), which triggers and stops the recording of the different image/audio streams.

Project manager

Capture directory: Specify here the directory in which the recorded data will be stored. When Ultraspeech starts, a directory named “Ultraspeech Capture” is created automatically in the “My Documents” directory of the system user. This directory is used by default. Choose another directory by clicking on the “Browse” button. To empty the capture directory (i.e. permanently erase all the data in it), use the “Clean” button.
Basename: Indicate here the prefix used for naming the data (the default value is Thomas).
Use database recording assistant: Use the database recording assistant tool for file naming (see below).
RAM-to-disk vs Direct-to-disk: Ultraspeech provides two modes for data recording: RAM-to-disk (default) and Direct-to-disk. In the RAM-to-disk mode, an internal buffer is pre-allocated in RAM (Random Access Memory) when the recording starts. During the recording, available data are stored in this buffer. The content of the internal buffer is transferred onto the hard drive when the recording is stopped. In the Direct-to-disk mode, data are transferred onto the hard drive as soon as they become available (i.e. a bitmap is written on the hard drive as soon as an ultrasound and/or video frame is available). The Direct-to-disk mode is designed for long recordings (>20 seconds). However, since the average access time of a hard drive is much longer than that of RAM, many frames can be skipped at acquisition in this mode. Thus, in its current implementation, this mode is not very efficient and should be used with caution. The RAM-to-disk mode is strongly recommended.
Store images in archive file: This option is available only when the RAM-to-disk recording mode is active. If checked, all recorded images are transferred onto the hard drive in one “archive” file (“.arc”). If unchecked, all recorded images are directly converted into a collection of bitmaps. Using archive files is convenient when recording a large set of utterances. Since writing one big file requires less time than writing many small files, the time needed to transfer the data between two acquisitions is reduced. To uncompress all the archive files contained in the capture directory, click on “General>Expand Archive files” in the Ultraspeech main menu.
Stream post-sync: This option is available only when the RAM-to-disk recording mode is active. If checked, streams are post-synchronised before the data are transferred onto the hard drive (see the Understanding Ultraspeech section). Keeping this option checked (the default) is recommended.
Max recording length: This option is available only when the RAM-to-disk recording mode is active. Specify here the maximum length (in seconds) of your acquisition. This value is required to calculate the size of the internal buffer used in the RAM-to-disk recording mode. The maximum value depends on the amount of RAM available on your computer.
Ultrasound / Camera FIFO size: Both the ultrasound and the video frame streams are queued in FIFO (first-in, first-out) buffers. Specify here the size of these buffers (the default value, 5, should be sufficient).
Live display while recording: If checked, the ultrasound and video streams (if active) are displayed during the recording. If unchecked, the image streams are frozen during the recording. Note that displaying the image streams during recording increases the CPU load.

After changing a setting, press the “Apply” button to validate it.
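To get a feeling for how the “Max recording length” setting translates into memory in the RAM-to-disk mode, the following Python sketch estimates the size of an uncompressed frame buffer for one image stream. This is a rough estimate under stated assumptions, not the formula used internally by Ultraspeech: the function name is hypothetical, and the 640x480 image size in the example is an assumed value (Y800 is an 8-bit monochrome format, hence 1 byte per pixel).

```python
import math


def ram_buffer_bytes(width, height, bytes_per_pixel, fps, max_seconds):
    """Bytes needed to hold every uncompressed frame of one image stream."""
    frames = math.ceil(fps * max_seconds)
    return frames * width * height * bytes_per_pixel


# Assumed example: 640x480 Y800 video at 60 fps, 20 s maximum length
size = ram_buffer_bytes(640, 480, 1, 60, 20)
print(f"{size / 2**20:.1f} MiB")
```

The estimate has to be repeated for each recorded image stream, which is why the maximum recording length is bounded by the amount of RAM available.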

Ultrasound settings (available for Terason T3000 only)

Preset: Name of the preset used to display the ultrasound stream. The preset should be created in the Terason Ultrasound software.
“Terason software” button: Display/hide Terason Ultrasound software (The Terason Ultrasound software application is used as a server of live ultrasound image frames by Ultraspeech). Displaying Terason Ultrasound software could be useful to adjust image settings which are not available directly in Ultraspeech (such as the gain, the depth, etc.).
Image Size: Display the current image size (which cannot be modified in the current release).
Image Orientation: Flip ultrasound image right/left or up/down.
Requested FPS: Set the B-mode image frame rate (in frames per second).
Transmit-receive additional delay: The requested duration of the transmit and receive sequence for each frame, in milliseconds (ms). The transmit-receive sequence duration is extended beyond the minimum required time by inserting waiting periods at the end of each scan line. The amount of delay inserted at the end of each scan line versus the end of the frame can be regulated with the txRxDuration parameter. Delay at the end of each scan line is useful for reducing secondary reflection. However, the trade-off is that the timing of each scan line is skewed over a longer period of time within the frame, potentially resulting in greater motion artifacts. The txRxDuration parameter allows the client to optimize the delay for a particular application.
Minimum latency: Boolean selection of the default minimum latency buffering scheme or a FIFO buffering scheme for newly acquired, unprocessed frames. This parameter allows Ultraspeech to turn off minimum latency buffering to protect against data loss (frames skipped at acquisition). When minimum latency is disabled, up to 30 raw frames can be internally buffered in a FIFO. At 20 fps, this means that 1.5 seconds of data can be buffered to absorb transient reductions in CPU processing power. However, this mechanism is only useful when used in conjunction with regulation of the frame repetition rate, to prevent the data processing from being chronically slower than the acquisition rate. Note that turning off minimum latency buffering causes the system to use up to approximately 15 MB more RAM.
Pixel Size/Direction/Origin: Get the size of each pixel, the axis direction, and the image origin along the x- and y-axes. The spatial position (PX, PY) of any pixel in the image relative to the transducer head can be calculated as PX = OX + NX * SX * DX and PY = OY + NY * SY * DY, where O is the origin, N the pixel index, S the pixel size, and D the direction along the corresponding axis.
Mechanical index and thermal indices: For a safe use of ultrasound, MI should be 0, and TI-soft tissue, TI-bone, and TI-cranial should be -1. If not, stop immediately!

After changing a setting, press the “Apply” button to validate it.
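The pixel position formula given for the Pixel Size/Direction/Origin setting can be written as a short Python function. This is only a sketch: the function name is hypothetical, and it assumes that O, N, S and D stand for the origin, the pixel index, the pixel size and the axis direction (+1 or -1), respectively. The numerical values in the example are made up.

```python
def pixel_position(origin, index, size, direction):
    """Apply PX = OX + NX * SX * DX and PY = OY + NY * SY * DY.

    Each argument is an (x, y) pair: origin O, pixel index N,
    pixel size S and axis direction D.
    """
    ox, oy = origin
    nx, ny = index
    sx, sy = size
    dx, dy = direction
    return (ox + nx * sx * dx, oy + ny * sy * dy)


# Made-up values: 0.5 mm pixels, y axis pointing away from the origin
print(pixel_position((0.0, 0.0), (10, 20), (0.5, 0.5), (1, -1)))
```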

Camera source/settings

Camera source

This dialog box can be used to select and set up a device and to adjust the frame rate and the video format. Ultraspeech 1.1 has been tested with the Imaging Source DFK 21BU04, using Y800 as the video format (black-and-white) and 60 fps as the frame rate.

Camera settings

This dialog box can be used to set up classic video camera properties, such as hue, saturation, white balance, brightness, gain, and exposure. Note that Ultraspeech 1.1 does not support external trigger mode.

Audio settings

When starting, Ultraspeech queries the available audio device capabilities. If an ASIO driver is available, Ultraspeech will use it. If possible, Ultraspeech will display only the format/resolution/sampling rate which are natively supported by the device.

Device: Select the audio device (Ultraspeech has been tested with the RME Fireface 800 and some internal soundcards).
Sampling rate: Select the sampling rate (e.g. 44,100 Hz).
Resolution: Select the resolution (e.g. 32 bits).
First channel index/Nb channels: Select the index of the first channel and the number of channels. For example, if the soundcard has 8 audio inputs, choose 4/3 to record channels 4, 5, and 6.
Buffer size: Indicate the desired internal buffer size in sample frames.
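The channel selection and buffer size settings above come down to simple arithmetic, sketched here in Python. The function names are hypothetical and are not part of Ultraspeech; the 512-frame buffer in the example is an assumed value.

```python
def selected_channels(first_channel, nb_channels):
    """Channels recorded for a given 'First channel index/Nb channels' pair."""
    return list(range(first_channel, first_channel + nb_channels))


def buffer_duration_ms(buffer_frames, sampling_rate_hz):
    """Duration of one audio buffer, given its size in sample frames."""
    return 1000.0 * buffer_frames / sampling_rate_hz


print(selected_channels(4, 3))  # the 4/3 example above
print(round(buffer_duration_ms(512, 44100), 1), "ms per buffer")
```

A larger buffer means fewer interruptions for the CPU but a coarser time resolution for the audio buffers, so the value is a compromise that depends on the soundcard.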

Calibration toolbox

Ultraspeech provides an inter-session re-calibration mechanism to position the probe and the camera (relative to the speaker’s head) at a reference position. This tool is useful when recording data in multiple acquisition sessions spaced in time. The procedure is based on real-time averaging of a live image with a target reference image. During this interactive re-calibration procedure, the subject adjusts the position of his/her head in order to fit the target reference position. A similar procedure is used for ultrasound, where the live tongue image is superimposed on a target reference.

First, capture or load a reference ultrasound/video image. Then, press Calibrate Ultrasound/Video to start the interactive calibration procedure. Adjust the probe/camera position to make the averaged image as sharp as possible.
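The idea of averaging a live image with a reference image can be sketched in a few lines of Python. This is an illustration only, not the code used by the calibration toolbox: images are represented as plain lists of rows of grayscale values, and both function names are hypothetical. The second function is an additional, assumed way of quantifying how far the live image is from the reference.

```python
def blend_with_reference(live, reference):
    """Pixel-wise average of two equally sized grayscale images."""
    if len(live) != len(reference) or any(
        len(a) != len(b) for a, b in zip(live, reference)
    ):
        raise ValueError("images must have the same size")
    return [
        [(a + b) // 2 for a, b in zip(row_live, row_ref)]
        for row_live, row_ref in zip(live, reference)
    ]


def mean_abs_difference(live, reference):
    """Average absolute pixel difference (0 when both images coincide)."""
    diffs = [
        abs(a - b)
        for row_live, row_ref in zip(live, reference)
        for a, b in zip(row_live, row_ref)
    ]
    return sum(diffs) / len(diffs)
```

When the probe or camera is back at the reference position, the two images coincide and the averaged image looks sharp; any misalignment shows up as a blurred or doubled image.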

Database recording assistant

Ultraspeech provides a tool for recording large databases, called the “database recording assistant”. This tool provides an automatic file naming system and automatically displays the word or sentence to pronounce (this is useful when the main computer screen is cloned on a second screen placed in front of the subject). To use this assistant, format your text corpus in a text file so that each line contains one item (a VCV, a word, a sentence, etc.). Load this text file in Ultraspeech by clicking on the “Load Text File” button.

The database recording assistant can parse a corpus composed of the same set of items repeated several times. Example: if you want to record the same set of 3 VCVs {apa, ada, aka} twice, format your corpus like this:
apa
ada
aka
apa
ada
aka
The assistant will automatically recognize that the corpus is made of “2 lists” of “3 sentences” (if the automatic text parsing procedure fails, you can specify the correct number of sentences per list). You can then navigate among the lists and sentences. Recorded images and audio corresponding to list X and sentence Y will be stored in the “Capture Directory”/*_capture/Basename_listX_sentY/ directory (where * is either “us”, “video”, or “audio”). The audio wave file will be named “Basename_listX_sentY.wav”. A text file (named “Basename_listX_sentY.txt”) containing the corresponding sentence will be created next to the audio file. Image archive files will be named “Basename_listX_sentY.arc” (bitmaps will be named “Basename_listX_sentY_timestamp_number.bmp”). To be able to move backward during the data acquisition and re-record a specific utterance, the use of archive files (for image data storage) is highly recommended.
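The list detection and file naming described above can be sketched as follows in Python. This is one possible way to detect a repeated set of items, not necessarily the parsing procedure implemented in Ultraspeech; the function names are hypothetical, and numbering lists and sentences from 1 is an assumption.

```python
def parse_corpus(text):
    """One item per line; blank lines are ignored."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def split_into_lists(items):
    """Return (number_of_lists, items_per_list) for a repeated item set.

    The shortest block that, repeated, reproduces the whole corpus is
    taken as one list. A corpus without repetition is a single list.
    """
    n = len(items)
    for period in range(1, n + 1):
        if n % period == 0 and all(
            items[i] == items[i % period] for i in range(n)
        ):
            return n // period, period
    return 0, 0  # empty corpus


def item_name(basename, list_index, sentence_index):
    """Base name shared by the .wav, .txt and .arc files of one item."""
    return f"{basename}_list{list_index}_sent{sentence_index}"


items = parse_corpus("apa\nada\naka\napa\nada\naka\n")
print(split_into_lists(items), item_name("Thomas", 2, 3))
```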

Getting started with Ultraspeech

Configure your ultrasound system for tongue imaging

First, create your preset (called an Exam in the Terason software) outside Ultraspeech, using the Terason Ultrasound software or the Telemed Echo software. It is difficult to tell which configuration is optimal, since it depends on many things (the subjects, your application, etc.). Besides, in ultrasound imaging, there is always a trade-off between image quality, size, depth, and frame rate.

As an example, here are the settings I use for my own research (for midsagittal images):

B-mode visualization (of course)
Depth (5 cm for children, 7 cm for adults)
Frequency (VL for Terason, 5 MHz for Telemed, since I often want to get a high frame rate)
1 focal point (around 5 cm to focus on the upper surface of the tongue).
Disable image averaging (for optimal time sync, set 0)
Disable smoothing (since I prefer to do it myself in Matlab, using Ultramat)
Map “D” for Terason
Scan width: as wide as possible (it depends on your probe).

Typical use

This section describes how to use Ultraspeech to record ultrasound, video and audio, using the default options.

Start Ultraspeech.
Start ultrasound (general menu: Ultrasound>Start Ultrasound).
Set the camera source (general menu: Camera>Camera Source) and adjust the video format and frame rate (this has to be done only once).
Start the camera (general menu: Camera>Start Camera).

Ultraspeech is now ready

Press the REC button (in the main window) to start the recording.
Press the REC button again to stop the recording.
Expand the recorded image archive files (general menu: General>Expand Archive files).

Ultrasound data should be available in My Documents/Ultraspeech Capture/us_capture/Thomas_X_Y.bmp (X and Y are respectively the timestamp and an arbitrary image index).
Video data should be available in My Documents/Ultraspeech Capture/video_capture/Thomas_X_Y.bmp.
The audio wave file should be available in My Documents/Ultraspeech Capture/audio_capture/Thomas.wav.

If the “stream post-sync” option is checked (see the Project manager section), the position in time of an ultrasound (or video) frame is contained in its filename. Example: if the ultrasound stream is sampled at 60 fps, the frame named Thomas_0432_22.bmp was grabbed at time 432 ms and should correspond to the audio signal between approximately 415 ms and 432 ms (432 ms minus one frame period, 1000/60 ≈ 16.7 ms).
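The relation between a frame filename and the matching audio segment can be sketched in Python as follows. The function name is hypothetical; the sketch only assumes the naming scheme described in this documentation, in which the last two underscore-separated fields of the filename are the timestamp (in ms) and the image index.

```python
from pathlib import Path


def frame_time_window(filename, fps):
    """Return (start_ms, end_ms) of the audio matching one image frame."""
    stem = Path(filename).stem  # e.g. "Thomas_0432_22"
    # The last two fields are the timestamp and the image index;
    # everything before them is the basename (it may contain underscores).
    *_, timestamp, _index = stem.split("_")
    end_ms = int(timestamp)
    start_ms = end_ms - 1000.0 / fps  # one frame period earlier
    return start_ms, end_ms


start, end = frame_time_window("Thomas_0432_22.bmp", 60)
print(f"audio from {start:.1f} ms to {end} ms")
```

The same function works for files created with the database recording assistant, such as Thomas_list5_sent3_0432_32.bmp, since only the last two fields are used.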

Understanding the recording report

Ultraspeech generates a “recording report” for each acquisition. This report is displayed in the console as soon as the recording is stopped and is stored in the Capture directory. The report provides a lot of information. Two items have to be checked carefully:

The hardware latency, i.e. the delay between the instant the recording is triggered and the instant the first ultrasound/video image or the first audio buffer becomes available. Example: “Audio initial delay 8 ms”. A typical value should be between 0 and 1 frame for the visual streams and between 0 and 20 ms for the audio stream (it depends on your soundcard).
The number of “dropped frames”, i.e. the number of frames that may have been skipped at acquisition.
Example: “Warning – Gap detected in the timeline 79→80 (25 ms>18 ms) – Synchrony may have been lost”
Recording two image streams simultaneously and synchronously at a high frame rate, in addition to a multi-channel audio signal, is a challenging task (in terms of programming complexity, CPU load, etc.). Besides, MS Windows XP, which is the only operating system compatible with the Terason T3000, is not designed for “real-time” constraints. For these reasons, Ultraspeech is sometimes unable to deal correctly with too many interruptions occurring at the same time, and a frame is skipped. To detect such events, Ultraspeech looks for “gaps” in the timeline of the recorded data, i.e. delays between consecutive images that are longer than the theoretical delay (which is calculated from the frame rate). A warning message is displayed when a gap is detected. If a gap is detected, it is highly recommended to redo the acquisition, since the stream synchrony may have been lost.
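The gap detection principle described above can be reproduced on your own data with a few lines of Python, for example on the timestamps extracted from the image filenames. This is a sketch of the principle only: the function name and the `tolerance_ms` parameter are assumptions, since the exact threshold used by Ultraspeech is not documented here.

```python
def find_gaps(timestamps_ms, fps, tolerance_ms=1.0):
    """List (previous_index, index, delay_ms) for every suspicious delay.

    A delay is suspicious when it exceeds the theoretical inter-frame
    delay (1000 / fps) by more than tolerance_ms.
    """
    expected_ms = 1000.0 / fps
    gaps = []
    for i in range(1, len(timestamps_ms)):
        delay = timestamps_ms[i] - timestamps_ms[i - 1]
        if delay > expected_ms + tolerance_ms:
            gaps.append((i - 1, i, delay))
    return gaps


# Made-up timeline at 60 fps with one skipped frame between index 2 and 3
print(find_gaps([0, 17, 33, 58, 75], 60))
```

An empty result means that no delay exceeded the threshold, which corresponds to a recording report without gap warnings.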

Checking stream synchronisation

Whenever you use Ultraspeech with a new hardware configuration (PC, soundcard, camera, etc.) or update software components of your system (service pack, drivers, or firmware), it is extremely important to check the stream synchrony. So far, Ultraspeech has only been tested on a very limited set of PCs, soundcards, cameras, etc., so it may well behave strangely when used with an untested hardware configuration (especially soundcards).

Ultraspeech synchronisation checking procedure

Here is a simple protocol that you can use to test the synchronization of the different streams. A hammer is used to tap a bottle of ultrasound gel (fig 4, a and b), ejecting a droplet onto the probe (c). The droplet shows up immediately in the ultrasound image (e) and should be synchronized with the video of the hammer hitting the bottle (d) and the sound of this contact (f). Stream synchrony can be observed in the figure on the left, where a 71 fps ultrasound stream is displayed with a 60 fps video stream and the audio signal on the same time scale.

When checking the synchrony, take into account only the timestamp contained in the image filename (example: in Thomas_list5_sent3_0432_32.bmp, the timestamp is 432 ms). Do not take into account the sequence number. Also, the first N ultrasound frames of a sequence, where N is the size of the ultrasound FIFO buffer, can have the same timestamp. Just ignore these frames.

This synchronization check procedure has been tested with several ultrasound probes (Terason 5MC2, 8MC3, 8MC4). The ultrasound and video streams were always found to be synchronized. The residual delay occasionally observed between the visual streams (ultrasound and video) and the audio is approximately the length of the inter-frame gap (i.e. about 17 ms at 60 fps).