Description of steps to reproduce the results of the Brown corpus evaluation.
Note: The current open-source version of TextSegFault does not implement all extensions described in the paper (no dispersion-based stop word filtering, no dynamic block size, and no top segment selection). The results are therefore slightly worse than reported:
Default | 0.11 | 0.10 | 0.10 | 0.13 |
Number of segments known | 0.11 | 0.09 | 0.08 | 0.06 |
Download and extract version 1.2 of the C99 algorithm from the author's homepage.
Create a shell script named TextSegFault in the bin directory. It should contain the commands:
#!/bin/bash
java -cp $PATH_TO_TEXTSEGFAULT_JAR net.sourceforge.textsegfault.TextSegFault $@
Add the following lines to ebin/public.batch to run TextSegFault:
public.testAlgorithm $DATASET "43" "TextSegFault"
public.testAlgorithm $DATASET "44" "TextSegFault -n 10"
Add the bin directory to both the PATH and the CLASSPATH. Remove all old results with find ../data -name 'TestLog*.txt' -exec rm {} \;. If you get errors in the file private.testOneCase, try adding /usr/bin/ in front of the time command.
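The cleanup step can be wrapped in a small helper so that it is easy to repeat before each run. This is only a sketch: the function name clean_old_results is hypothetical and not part of C99 or TextSegFault, and it assumes that old results are regular files matching TestLog*.txt somewhere below the data directory.

```shell
# Hypothetical helper (not part of C99 or TextSegFault):
# delete stale TestLog*.txt files below a data directory.
clean_old_results() {
    # Default to ../data, the location used in the steps above.
    data_dir="${1:-../data}"
    # Quote the pattern so that find, not the shell, expands the wildcard.
    # "{} +" passes the matched files to rm in batches.
    find "$data_dir" -type f -name 'TestLog*.txt' -exec rm -f {} +
}
```

Calling clean_old_results ../data from the bin directory removes the old logs and leaves all other files in place.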
Run the tests via public.BatchAll and create the summary with public.SummaryAll.