Automatic online spike sorting with singular value decomposition and fuzzy C-mean clustering

Background Understanding how neurons contribute to perception, motor functions and cognition requires the reliable detection of spiking activity of individual neurons during a number of different experimental conditions. An important problem in computational neuroscience is thus to develop algorithms to automatically detect and sort the spiking activity of individual neurons from extracellular recordings. While many algorithms for spike sorting exist, the problem of accurate and fast online sorting still remains a challenging issue. Results Here we present a novel software tool, called FSPS (Fuzzy SPike Sorting), which is designed to optimize: (i) fast and accurate detection, (ii) offline sorting and (iii) online classification of neuronal spikes with very limited or null human intervention. The method is based on a combination of Singular Value Decomposition for fast and highly accurate pre-processing of spike shapes, unsupervised Fuzzy C-mean, high-resolution alignment of extracted spike waveforms, optimal selection of the number of features to retain, automatic identification the number of clusters, and quantitative quality assessment of resulting clusters independent on their size. After being trained on a short testing data stream, the method can reliably perform supervised online classification and monitoring of single neuron activity. The generalized procedure has been implemented in our FSPS spike sorting software (available free for non-commercial academic applications at the address: http://www.spikesorting.com) using LabVIEW (National Instruments, USA). We evaluated the performance of our algorithm both on benchmark simulated datasets with different levels of background noise and on real extracellular recordings from premotor cortex of Macaque monkeys. The results of these tests showed an excellent accuracy in discriminating low-amplitude and overlapping spikes under strong background noise. The performance of our method is competitive with respect to other robust spike sorting algorithms. Conclusions This new software provides neuroscience laboratories with a new tool for fast and robust online classification of single neuron activity. This feature could become crucial in situations when online spike detection from multiple electrodes is paramount, such as in human clinical recordings or in brain-computer interfaces.


General Information
A challenging issue in computational neurosciences is the development of algorithms to detect and sort the spiking activity of individual neurons from extracellular recordings. While many algorithms for spike sorting exist, the problem of accurate and fast online sorting of spikes still remains to be solved.
In answer to this problem, this novel software tool, FSPS TM (Fuzzy SPike Sorting), has been designed to optimize fast and accurate detection, off-line sorting and on-line classification of neuronal spikes with little or no human intervention.
The FSPS TM algorithm is based on a combination of the following main methods:

Terms & Conditions Agreement
FSPS TM is a computer programme for the fast and accurate automatic isolation of neurons during electrophysiological recordings; it can be used for signal acquisition, visualization, classification and analysis of neuronal spikes. For more information regarding the current version of FSPS TM please e-mail: info@spikesorting.com.
Prior to download and use, the user acknowledges that: 1. FSPS™ is distributed under Creative Commons Public License (CCPL BY-NC-ND) and can be used for academic applications providing they properly reference our work in any publication that uses results generated by FSPS™ software. Under this CCPL you are free: to Share -to copy, distribute and transmit the work -Permissions beyond the scope of this public license are available at www.spikesorting.com.
Under the following conditions: -You must attribute this work referring to the authors (Oliynyk A., Bonifazzi C., Montani F. and Fadiga L.) and citing the source URL.
-You may not use this work for commercial purposes*.
-You may not alter, transform, or build upon this work.  TO THE FULL EXTENT PERMITTED BY APPLICABLE  LAW, THE LICENSEE RECOGNIZES AND ACCEPTS THAT: A) THE SOFTWARE IS  SUPPLIED "AS IS," AND THE AUTHOR DOES NOT GUARANTEE THAT THE  SOFTWARE IS FREE OF BUGS, FLAWS OR DEFECTS; B) THE AUTHOR DOES NOT  GUARANTEE THAT THE SOFTWARE MEETS THE SPECIFIC NEEDS OF THE USER; AND C) THE AUTHOR EXPRESSLY DISCLAIMS ANY RESPONSIBILITY FOR ANY

Installation
The installation process is performed as follows: 1.

2.
3. To use FSPS TM with a NI DAQ, connect the latter to the computer and configure your NI DAQ device as specified in the relevant manual before running the FSPS TM programme.
As soon as the programme starts, configure your NI DAQ board. To do this, activate the DAQ control button at the DAQ tab (Control Tab 1) and set up I/O parameters against your particular board configuration (Fig. 1a).

/Figure 1a/
Press Preview button to view the signal at the selected channel in separate window (Fig. 1b). / Figure 1b (Fig. 1c). Thus, further operations and analysis will be performed on existing recordings selected in the Source tab. Signal THR -the voltage threshold level for the spike detection algorithm; Batch № -the number of the data chunk that is currently being processed; SPK found -the total number of spikes detected in the data set being processed; Hist THR -the level to threshold norm histogram (needed to determine the number of classes); Digitalization rate -is an important parameter indicating the number of samples per second of your acquired data. Since this parameter influences many algorithms of data analysis and visualization, we recommend paying particular attention to it. Peak width -specifies the number of consecutive data points used in the least-squares fit for finding peaks in the norm histogram; Sensitivity -an important parameter for adjusting the construction of the norm histogram, thereby influencing the number of clusters determined. Tolerated range is 1.8-3.3 (Default is 2.6, the most common for various types of signals and acquisitions); SPK show -specifies the number of spike shapes to be visualized in each window of the Shapes Tab indicator; Noise detect -indicates if the detection of noisy spike shapes based on Mahalanobis distance evaluation is On or Off. 7 -Control Tab 2 (FCM status and performance information); see Chapter 2.2 for details. 8 -Information panel for number of clusters and features used. 9 -Control button TEST to RUN clustering/classification algorithm.
10 -Control button Online to RUN online classifier.

-Control button Redraw to update several on-screen parameters .
12 -Save button performs screenshots capture of spike-sorting results (currently visualized area of Control Tab 3 is captured and automatically saved to the same disk space where data files are stored). 13 -Control button M-dist applies Mahalanobis Distance measurement to evaluate noisy spikes (actually, their PC scores) located in the multidimensional feature space, away from any class centre, using probability density (see p.14). 14 -Confidence interval for the Mahalanobis distance measurement. 15 -Group indicator of data processing. 16 -Control button to stop the program.

-Control Tab 3 (visualization and statistics); See chapter 2.3 for details
18 -Information about detected and accepted spikes. 19 -Squares at the top left corner of each graph are controls to change the colour of the class. 20 -Sliding bar to navigate "Raw signal graph" ("2"). 21 -The indicator of the cluster's quality (colour scale).

Chapter 2.1 (Control Tab 0, Continuous or Triggered acquisition)
Allows you to choose between continuous (this chapter) and triggered (Chapter 3) acquisition, depending on your needs and experimental design. Filtering Tab options: Filter (On/Off) is activating/deactivating software bandpass Butterworth filter. The default is Off.
-"high cutoff freq: Hz" is the high cut-off frequency in Hz. It must be greater than low cut-off frequency and fulfil the Nyquist criterion.
-"low cutoff freq: Hz" is the low cut-off frequency in Hz and must fulfil the Nyquist criterion. If low cut-off frequency is less than or equal to 0, or greater than half the sampling frequency value, the algorithm is not working. The low cut-off frequency must be less than the high cut-off frequency.
-"order" specifies the filter order and must be greater than 0. The default is 5. If the order is less than or equal to 0, the algorithm signals an error. -"Inverse signal" inverts the signal (ON); OFF, unchanged, is default.
Save Tab options: -Save clustering results control button allows the results of clustering/classification to be saved automatically (when ON). At the end of each run of clustering/classification algorithm (by pressing the TEST button) it saves a text file (*.tcs) in the same folder. The file contains: 1) Peak occurrence time (1 st value in each row); 2) Class label (2 nd value in each row) 3) 3 rd -the end of each row is an interpolated waveform of the spike Therefore, the number of rows corresponds to a number of extracted spikes, while the number of columns depends on the length of the waveform (consult the table above).
-Save PCs control button allows Principal Components to be saved automatically (when ON). At the end of each run of clustering/classification algorithm (by pressing the TEST button) it saves a text file (*.psvd) in the same folder. The file contains: 1) First column -Class label (2 nd value in each row) 2) Second column till the last one -Principal components (like PC1, PC2, PC3 … PCk).
-Export raw data control button allows to save your current channel (visualized in "Raw signal graph", p.3 at the Figure 2) in separate .txt file.

Source Tab options:
Folder path (use dialog) is a path control that allows you to select a file/folder with data. Before selecting the folder you have to select the file type from drop down menu. At the moment there are 6 different formats available: a) Binary (GE), a stream of unsigned 16-bit integers (U16) having little-endian byte order. There is no header information. b) Text (UD, TXT) for text files as follows: X,Y.YY, where X -is a progressive number of samples; ,-to delimit fields in the text file Y.YY -is a value in which the dot is used as the decimal separator (see example below): 1,-5.20 2,-7.40 3,-6.97 4,-4.57 5,-1.67 6,-1.05 etc.
c) Triggered (FE) reads data from the 16-bit Integer file, which contains 12 rows corresponding to the triggered acquisition trials. d) Binary (MAP) reads .map files generated by the AlphaMap software from Alpha Omega (Nazareth, Israel). This converter is experimental. e) Simulated (Qui) -another kind of TEXT file to import data saved in engineering format as follows: 1. The number of channels (depends on how many input physical channels are selected in the "I channel" control).
2. The number of samples in one data chunk for each channel (1000 is a default).
3. The digitalization rate as samples/s (10000 is a default). 4. The number of reserved U8 lines (the default is 0). At the moment, the rest 1000 I64 values reserved for multi-electrode array and brain-computer interface (in progress).
After the header comes a series of chunks. A chunk consists of four parts: -length (integer I32); -eight U8 values (can be "0" or "1") for digital events (if any); -I32 integer of millisecond timer value that has been split into its component bytes (four U8 values).
-an array of single-precision (SGL) elements of measured data (multiple of a number of samples in one data chunk for each channel, see above).
* -if you need just export the data recorded with FSPS™ software into your software you may use "Export raw data" at the Save Tab. It saves your current channel in the .txt file without the need to build your own file converter.
-READ DATA -there are two additional buttons that allow a specific part of the binary file (designed primarily for long GE files) to be read, allowing it to be processed in batches. The number of such batches is counted automatically when this option is selected. For triggered (FE) data the same buttons allow solely a specific trial to be read.
Note that for batch processing of long files it is necessary to switch OFF all automatic options at the Main Tab to maintain the same parameters for all parts of the file. The report (if you want generate and save it using Save Tab) will show information from these batches as one continuous file.

Chapter 2.3 (Control Tab 2, FCM status and performance information)
Norm Tab

Chapter 2.4 (Control Tab 3, Visualization and Statistics)
Shapes Tab contents: The number of spikes shown for each cluster can be adjusted using the tool SPK show, found on control window "6" (see Figure 2); the default is 50. The white spike profile is an average of that class. AUTO button "a" auto-scales (X and Y axes) the first window (shapes marked in blue in our specific case). MAX button "b" performs the same operation as "a", but takes into account the highest amplitude found in any of the classes shown. Clicking ALL button "c" applies the scales and ranges selected for the first window to windows 2-12. You can also adjust Y scale parameters manually by entering its numerical values in the first window (click ALL button "c" to apply it to all of the other windows). Command group "d" allows you to change the order of the classes manually (by clicking on button Change) or merge one class with another (by clicking on the button Merge clusters). To return to the initial cluster and order, switch OFF green LED located to the right of the Merge clusters switch. Tab contents: PSTH histograms are particularly useful in triggered trials but can also be used in continuous acquisitions. Histogram length and height can be set up automatically, when "a" and "e" controls are "ON" (default settings), or manually (controls "a" and "e").
Use Switch 2 (see Figure 2) to change X scale between samples or seconds. Both X and Y scales can be set manually.
To change the duration of the bin, select the appropriate "# bin" parameter (control "b").
ALL button "c" extends the first graph's settings to the other graphs displayed, as described above for the Shapes Tab.
Detailed information about the contents of each bin for each histogram can be obtained by pressing the Excel button ("d"). This exports the information to Excel tables, one for each class. Note: Microsoft Excel must be installed for this to function.

a c d e b
Rasters contents: Displays Raster plots for each detected class and trail. Button "b" switches ON and OFF vertical grid. Grid colour can be chosen by clicking on "a". ALL button "c" function is the same as described above for the Shapes Tab.

Quality Measures Tab contents:
Any kind of clustering or classification needs an objective measure of its quality the most conventional indices associated with FCM, including partition coefficient, partition entropy and proportion exponent, are incorporated in the Toolbox, although our tests show that the index L ratio is superior and so is the default quality measure. In the vast majority of cases, values below 5 (threshold) means acceptable cluster quality. Values greater than 5 are usually associated with poor clusters (with reference to their compactness and distance from other clusters in the dataset).
Exact L ratio values for each cluster are shown in the graph legend. The button Smart update is experimental. It allows tracking of clusters in the feature space when parameters of spike waveform is changing due to some deterministic (non-noise) influences (e.g. small periodic displacements of the location of electrical activity relative to the microelectrode, systematic reduction of the amplitude of an action potential when the cell fires in a sustained high-frequency burst, systematic slow drift of electrode etc). The same as for Continuous Acquisition

Chapter 3.1 (Control Tab 0, Continuous or Triggered acquisition)
Allows you to choose between continuous (Chapter 2) and triggered (this chapter) acquisition depending on your needs and experimental design.

Chapter 3.2 (Control Tab 1, hardware info & settings)
PSTH internal tab (hardware-based) Commonly, electrophysiological recording systems contain an option to perform spike isolation using hardware voltage discriminator. This makes it possible to roughly characterize neuronal activity and its specificity instantly and during recording process. This function is also offered by this software, in parallel with the Fuzzy Spike Sorting procedure. Please note, that to acquire the signal from your window discriminator hardware (in our case it is channel 2), you need to reserve a separate I/O channel at your DAQ . "A" can be used to switch auto scale for PSTH graph ON (green) or OFF (dark).

Events internal tab (supplemental TTLs)
The average duration (MEAN ± SEM) of each external event occurring during execution of the task is given at this Tab.
Indeed, during your electrophysiological study (single unit recordings, or whatever), you may need to control some other external parameters, such as the duration of button or sensor holding, precise touch timing and/or kind of kinematic data etc., This additional information could help the experimenter to control task execution or to align recording neuronal activity to one of these external events.
The current version of the software supports up to different 6 TTL channels (thus 6 events); however their number depends mainly on the type of DAQ board utilized.
One example of their application in an example experimental task is shown below. The task in question was the execution of goal-directed movement by the left hand of the monkey during recording from premotor cortex (area F5) of the right cerebral hemisphere.
The duration of the trial was 3s. In this time the monkey reached out his hand to the handle of the small door of the box, opened it and took the food as a reward. The Figure shows, as well as the neuronal activity recorded by the electrode, the external events characterizing this behavioural task: "1" (green line) -the hand of the monkey is still in its initial position; "2" (pink) -the hand crosses the infra-red barrier in front of the door; "3" (upper yellow) -touching the upper sensor of the handle; "4" (lower yellow) -touching the lower sensor of the handle; "5" (red) -main trigger, contact with both the upper and lower sensors of the handle; "6" (blue) -taking the food from the box.

Trigger internal tab (hardware settings)
Before starting acquisition, use this Tab to define desired digitalization settings and trigger parameters.

1
-Control group for Input/Output parameters 2 -Classification control button to switch classification algorithm "ON" and "OFF" . "ON" is default value.

3
-Waveform control button to switch visualization of waveforms in indicators 12 and 13 "ON" and "OFF". Switching "OFF" may slightly improve overall classification performance on some slow computers. "ON" is default value. 4 -Sound control button to switch "ON" and "OFF" sound generation and reproduction (both clicks or original signal, see "6") through line-out of the computer board. "OFF" is default value.

5
-Actual threshold level for spike detection algorithm.* 6 -Original/Classified control button consents reproduction of "Original" signal picked up by electrode, or "Clicks" pertaining to classified waveforms though lineout of computer sound board. "Classified" is default value. 7 -Signal Inversion switches the inversion of the input signal "ON" and "OFF" .* 8 -HDD streaming switches "ON" and "OFF" data storage to HDD of the computer with a filename.dat, indicated above, as a Flow Chart indicating single unit activity.

9
-Pull down menu for selection of the isolated unit to be acoustically reproduced and visualized in "12".

10
-Information panel displaying number of "Classes" and "Features" used.

11
STOP control button stops on-line monitoring and closes current window.

12
-Graph indicating spike shapes that successfully passed classification procedure and satisfied the parameters of class selected in "9".

13
-Graph indicating all spike shapes extracted at current time instant. 14 -Moving chart indicator displaying raw signal.

15
-Flow chart displaying current firing rate of isolated and classified single units.
Note: * -settings are determined during "TEST" procedure (offline) and transferred from interface window shown in Figure 2.

Chapter 4.1 (Input/Output parameters)
"I Channel" -The name and the number of DAQ Analogue Input channel connected to recording electrode. Its activity is visualized in the "Moving Chart" indicator (see Figure 4). By default, it is the same channel that has been used in "TEST" clustering procedure (Fig. 1a). "Rate" -digitalization rate of the analogue input signal. It must be the same as that used during the "TEST" clustering procedure. This parameter is set up automatically, but can be manually adjusted here if required (not recommended).
"sample/ch"-the number of samples to be acquired during each measurement session. For instance, at a digitalization rate of 10,000 samples/s, the acquisition of 500 samples will take 20 ms (default value). This can be adjusted to your specifications.

Firing rate Tab
This Flow Chart shows the firing rate of isolated single units. At the top part of this Tab there is an indicator showing the file name and the path for data storage when the HDD streaming button is "ON". Pressing the button New (top right) creates a new empty file. The filenames will feature progressively increasing numbers (\1.dat, \2.dat, \3.dat, etc.).

FSPS User Manual
Page: 32 of 35

UNIFE, Section of Human Physiology, 2012
STAT Tab The STAT Tab shows some additional information about the on-line classification module performance .
-"Base path" shows the path for the V.dat file with the cluster prototype for your current recording. It is updated each time when you press the TEST button (see Chapter 2, Figure 2). -"Spikes found" shows the number of spikes detected in the current time-frame and subjected to the automatic classification algorithm.
-"Spikes cancelled" shows the number of spikes in the current time-frame that have been cancelled by our false-positive-removing algorithm. -"Classification performance" -group indicators showing basic Fuzzy C-Mean parameters and quality measurements. -"Latency, ms" is a duration of each critical step of spikes classification in milliseconds.

Chapter 4.3 (HDD streaming)
While monitoring neuronal activity during experiments or clinical tests you may need to record some important fragments for future in-depth analysis. Pressing the button HDD streaming will start this process, saving the data in indicated file. To start a new file click the New button.
The number of your *.dat file will automatically increase.
You can select the source of your data to be saved depending using the Original/Classified (Figure 4) switch; "Original" saves raw signal data while "Classified" saves timestamps for isolated spikes only.