An experimental comparison of MSVMpack with other implementations of M-SVMs is provided here for illustrative purposes. The aim is not to draw conclusions about which type of M-SVM model is best, but rather to give an idea of the efficiency of the different implementations. The following implementations are considered: Spider, BSVM, MCSVM, SMSVM, and MSVMpack.
The characteristics of the data sets used in the experiments are given in Table 4. The ImageSeg data set is taken from the UCI repository. The USPS_500 data set is a subset of the USPS data provided with the MCSVM software. The Bio data set is provided with MSVMpack. The CB513_01 data set corresponds to the first split of the 5-fold cross validation procedure on the entire CB513 data set [3]. MSVMpack uses the original MNIST data with integer values in the range [0, 255], while other implementations use scaled data downloaded from the LIBSVM homepage. The BSVM scaling tool is applied to the Letter data set downloaded from the UCI repository to obtain a normalized data set in floating-point format.
All methods use the same kernel functions and hyperparameters, as summarized in Table 4. When unspecified, the kernel hyperparameters are set to the MSVMpack defaults. SMSVM requires a larger value of its regularization constant to reach a similar training error, so a larger value is used for it. Other low-level parameters (such as the size of the working set) are kept at each software's defaults. These experiments are performed on a computer with 2 Xeon processors (with 4 cores each) at 2.53 GHz and 32 GB of memory running Linux (64-bit).
The results are shown in Tables 5 and 6. Though no definitive conclusion can be drawn from this small set of experiments, some trends are often observed in practice.
Table 4: Characteristics of the data sets and training parameters.

| Data set | #classes | #training examples | #test examples | data dim. | data format | Training parameters |
|---|---|---|---|---|---|---|
| ImageSeg | 7 | 210 | 2100 | 19 | double | RBF kernel, data normalized |
| USPS_500 | 10 | 500 | 500 | 256 | float | RBF kernel |
| Bio | 4 | 12471 | 12471 | 68 | bit | linear kernel |
| CB513_01 | 3 | 65210 | 18909 | 260 | short | RBF kernel |
| CB513 | 3 | 84119 | 5-fold CV | 260 | short | RBF kernel |
| MNIST | 10 | 60000 | 10000 | 784 | short, double | RBF kernel, data normalized |
| Letter | 26 | 16000 | 4000 | 16 | float | RBF kernel, data normalized |
Table 5: Results of the different implementations on the test sets.

| Data set | M-SVM model | Software | #SV | Training error (%) | Test error (%) | Training time | Testing time |
|---|---|---|---|---|---|---|---|
| ImageSeg | WW | Spider | 113 | 5.24 | 11.05 | 11s | 3.3s |
| | | BSVM | 98 | 2.38 | 9.81 | 0.1s | 0.1s |
| | | MSVMpack | 192 | 0 | 10.33 | 1.2s | 0.1s |
| | CS | MCSVM | 97 | 2.86 | 8.86 | 0.1s | 0.1s |
| | | MSVMpack | 141 | 0 | 9.24 | 3.3s | 0.1s |
| | LLW | SMSVM | 210 | 0.48 | 7.67 | 15s | 0.1s |
| | | MSVMpack | 182 | 0 | 10.57 | 23s | 0.1s |
| | MSVM² | MSVMpack | 199 | 0 | 10.48 | 4.5s | 0.1s |
| USPS_500 | WW | Spider | 303 | 0.40 | 10.20 | 4m 19s | 0.2s |
| | | BSVM | 170 | 0 | 10.00 | 0.2s | 0.1s |
| | | MSVMpack | 385 | 0 | 10.40 | 2.5s | 0.1s |
| | CS | MCSVM | 284 | 0 | 9.80 | 0.5s | 0.3s |
| | | MSVMpack | 292 | 0 | 9.80 | 30s | 0.1s |
| | LLW | SMSVM | 500 | 0 | 12.00 | 5m 58s | 0.1s |
| | | MSVMpack | 494 | 0 | 11.40 | 1m 22s | 0.1s |
| | MSVM² | MSVMpack | 500 | 0 | 12.00 | 22s | 0.1s |
| Bio | WW | Spider | out of memory | | | | |
| | | BSVM | 2200 | 6.23 | 6.23 | 11s | 4.6s |
| | | MSVMpack | 4301 | 6.23 | 6.23 | 4m 00s | 0.5s |
| | CS | MCSVM | 1727 | 6.23 | 6.23 | 9s | 4.2s |
| | | MSVMpack | 3647 | 6.23 | 6.23 | 8s | 0.5s |
| | LLW | SMSVM | out of memory | | | | |
| | | MSVMpack | 12467 | 6.23 | 6.23 | 2m 05s | 1.1s |
| | MSVM² | MSVMpack | 12471 | 6.23 | 6.23 | 11m 44s | 0.9s |
| CB513_01 | WW | Spider | out of memory | | | | |
| | | BSVM | 39183 | 19.46 | 25.70 | 1h 41m 52s | 9m 02s |
| | | MSVMpack | 42277 | 16.94 | 25.55 | 10m 31s | 26s |
| | CS | MCSVM | 41401 | 17.07 | 25.45 | 4h 12m 35s | 27m 11s |
| | | MSVMpack | 40198 | 16.93 | 25.41 | 4m 04s | 32s |
| | LLW | SMSVM | out of memory | | | | |
| | | MSVMpack | 54313 | 22.14 | 27.65 | 11m 46s | 32s |
| | MSVM² | MSVMpack | 62027 | 14.31 | 25.32 | 1h 08m 54s | 47s |
| MNIST | WW | Spider | out of memory | | | | |
| | | BSVM | 13572 | 0.078 | 1.46 | 4h 03m 29s | 2m 39s |
| | | MSVMpack | 14771 | 0.015 | 1.41 | 2h 50m 29s | 15s |
| | CS | MCSVM | 13805 | 0.038 | 2.76 | 1h 54m 26s | 4m 09s |
| | | MSVMpack | 14408 | 0.032 | 1.44 | 49m 04s | 15s |
| | LLW | SMSVM | out of memory | | | | |
| | | MSVMpack | 47906 | 0.282 | 1.57 | 4h 36m 45s | 45s |
| | MSVM² | MSVMpack | 53773 | 0.027 | 1.51 | 10h 55m 10s | 55s |
| Letter | WW | Spider | out of memory | | | | |
| | | BSVM | 7460 | 4.30 | 5.85 | 6m 20s | 5s |
| | | MSVMpack | 7725 | 3.14 | 4.75 | 24m 38s | 3s |
| | CS | MCSVM | 6310 | 2.03 | 3.90 | 2m 38s | 4s |
| | | MSVMpack | 7566 | 4.72 | 6.92 | 6m 54s | 3s |
| | LLW | SMSVM | out of memory | | | | |
| | | MSVMpack | 16000 | 16.90 | 18.80 | 3h 56m 08s | 4s |
| | MSVM² | MSVMpack | 16000 | 5.08 | 7.28 | (S) 48h 00m 00s | 3s |
Table 6: Results of the 5-fold cross validation on the CB513 data set.

| M-SVM model | Software | Cross validation error | Training time | Testing time | Total time |
|---|---|---|---|---|---|
| WW | BSVM | 23.96 % | 9h 48m 40s | 46m 29s | 10h 34m 50s |
| | MSVMpack 1.0 | 23.72 % | 1h 05m 11s | 1m 51s | 1h 07m 02s |
| | MSVMpack 1.4 | 23.63 % | N/A | N/A | 21m 07s |
| CS | MCSVM | 23.55 % | 22h 52m 10s | 2h 08m 33s | 25h 00m 43s |
| | MSVMpack 1.0 | 23.63 % | 1h 00m 36s | 2m 06s | 1h 02m 42s |
| | MSVMpack 1.4 | 23.55 % | N/A | N/A | 12m 05s |
| LLW | MSVMpack 1.0 | 25.65 % | 1h 14m 21s | 2m 33s | 1h 16m 54s |
| | MSVMpack 1.4 | 25.52 % | N/A | N/A | 30m 29s |
| MSVM² | MSVMpack 1.0 | 23.47 % | 6h 44m 50s | 2m 49s | 6h 47m 39s |
| | MSVMpack 1.4 | 23.36 % | N/A | N/A | 2h 55m 05s |
The ability of MSVMpack to compensate for a lack of cache memory with additional parallel computation is illustrated in Table 7 for the WW model (which uses random working set selection and thus typically needs to compute the entire kernel matrix). In particular, for the Bio data set, changing the size of the cache memory has little influence on the training time. Note that this is also due to the nature of the data, which can be processed very efficiently by the linear kernel function for the bit data format (see the dot_bit() function in kernel.c). For the CB513_01 data set, very large cache sizes divide the training time by a factor of 2 to 3; for smaller cache sizes (below 8 GB), however, the actual cache size has little influence on the training time.
Table 7: Influence of the cache memory size on the training time (WW model).

Bio

| Cache memory size (MB) | 10 | 60 | 120 | 300 | 600 | 1200 |
|---|---|---|---|---|---|---|
| Training time | 5m 55s | 6m 00s | 6m 13s | 4m 29s | 3m 56s | 4m 00s |

CB513_01

| Cache memory size (MB) | 1750 | 3500 | 8200 | 16500 | 24500 | 30000 |
|---|---|---|---|---|---|---|
| Training time | 31m 49s | 30m 40s | 26m 06s | 23m 00s | 15m 52s | 10m 31s |