java.lang.Object
  edu.cmu.taghelpertools.middle_layer.util.SimpleFeatureSpaceBuilder
public class SimpleFeatureSpaceBuilder
SimpleFeatureSpaceBuilder is a simple implementation that covers the common needs of programmers who want to build a feature space from either a list of texts or a .dic file. Example:
...
SimpleFeatureSpaceBuilder simpleBuilder = new SimpleFeatureSpaceBuilder();
simpleBuilder.build(texts); // texts is an ArrayList<String>
Iterator itr = simpleBuilder.getBinaryFeatureIterator(text); // text is a single String
...
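To make the idea of "binary attributes which are true for a given text" concrete, here is a minimal, self-contained sketch. It is not the TagHelper implementation: the class `BinaryFeatureSketch`, its tokenization (lowercasing, splitting on non-alphanumeric characters), and the `word_word` naming of bigrams are all assumptions made for illustration. It only mimics the build-then-query flow shown in the example above, using unigrams and bigrams as the binary features.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for illustration only; NOT SimpleFeatureSpaceBuilder itself.
final class BinaryFeatureSketch {
    // The feature space: every unigram and bigram seen while building.
    private final Set<String> featureSpace = new LinkedHashSet<>();

    // Assumed tokenization: lowercase, split on anything that is not a letter or digit.
    static List<String> extract(String text) {
        List<String> tokens = new ArrayList<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!w.isEmpty()) {
                tokens.add(w);
            }
        }
        List<String> features = new ArrayList<>(tokens);      // unigrams
        for (int i = 0; i + 1 < tokens.size(); i++) {          // bigrams, joined with "_"
            features.add(tokens.get(i) + "_" + tokens.get(i + 1));
        }
        return features;
    }

    // Collects the features of all texts and returns the resulting feature names.
    List<String> build(List<String> texts) {
        for (String t : texts) {
            featureSpace.addAll(extract(t));
        }
        return new ArrayList<>(featureSpace);
    }

    // Iterates over the features of the built space that occur in the given text.
    Iterator<String> getBinaryFeatureIterator(String text) {
        Set<String> present = new LinkedHashSet<>(extract(text));
        present.retainAll(featureSpace);
        return present.iterator();
    }

    public static void main(String[] args) {
        BinaryFeatureSketch builder = new BinaryFeatureSketch();
        builder.build(Arrays.asList("the cat sat", "a dog"));
        Iterator<String> itr = builder.getBinaryFeatureIterator("The dog sat");
        while (itr.hasNext()) {
            System.out.println(itr.next());
        }
    }
}
```

In this sketch, features of the query text that never appeared during `build` are dropped, so the iterator only ever yields dimensions of the built space. Whether the real class behaves the same way for unseen features is not stated in this documentation.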
Constructor Summary | |
---|---|
SimpleFeatureSpaceBuilder() | Instantiates a SimpleFeatureSpaceBuilder object with the default feature selection options: punctuation, unigrams, bigrams, POS bigrams, line length, contains non-stopwords, remove rare features (threshold=5), remove stopwords=true, stemming=true |
SimpleFeatureSpaceBuilder(java.lang.String options) | Instantiates a SimpleFeatureSpaceBuilder object with customized feature selection options |
Method Summary | | |
---|---|---|
java.util.ArrayList | build(java.util.ArrayList<java.lang.String> texts) | Builds an attribute space from the list of texts from which attributes are extracted |
java.util.ArrayList | build(java.lang.String dicFilePath) | Builds an attribute space from a .dic file |
java.util.Iterator | getBinaryFeatureIterator() | Retrieves the binary dimensions of the attribute space |
java.util.Iterator | getBinaryFeatureIterator(java.lang.String text) | Retrieves the binary attributes that are true for the given text |
java.util.ArrayList | getBinaryFeatures() | Retrieves the binary dimensions of the attribute space |
java.util.Iterator | getNumericFeatureIterator() | Retrieves the names of all existing numeric attributes |
java.util.ArrayList | getNumericFeatures() | Retrieves a list of the names of all existing numeric attributes |
java.util.ArrayList<java.lang.Double> | getNumericFeatureValues(java.lang.String text, java.util.ArrayList featureNames) | Retrieves the values of the numeric features for the given text |
java.lang.String | printoutOptions() | Prints the feature selection options and returns the printed string |
void | writeDicFile(java.lang.String dicFilePath) | Writes the built attribute space to a dictionary file, which can be reloaded and reused later |
Methods inherited from class java.lang.Object |
---|
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public SimpleFeatureSpaceBuilder(java.lang.String options)
options
- a list of feature selection options separated by spaces:
-stop -- remove stopwords
-stem -- do stemming
-rr -- remove rare words
-rt (integer value) -- set the threshold of rare word removal to an integer value; specifying this option automatically turns on -rr
-f {uni, bi, posbi, punc, ll, cns} -- include the given feature: unigrams, bigrams, POS bigrams, punctuation, line length, or contains non-stopwords
-lang {eng, ger, chi} -- set the language of the dataset: English, German, or Chinese (Chinese requires an extra module, which is licensed separately); the default language setting can be specified in default_lang_setting.txt
Example: -stop -stem -lang ger -f uni -f bi -f posbi
Also see edu.cmu.taghelpertools.middle_layer.Tester for a concrete example
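As a rough illustration of how an options string of this shape can be read, the sketch below tokenizes it on whitespace and interprets the documented flags. It is not the parser used by SimpleFeatureSpaceBuilder: the class `OptionStringSketch`, its field names, the `eng` fallback language, and the choice to reject unknown flags are assumptions made for this example only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; NOT the parser used by SimpleFeatureSpaceBuilder.
final class OptionStringSketch {
    boolean removeStopwords;
    boolean stemming;
    boolean removeRare;
    int rareThreshold = 5;      // assumed default, mirroring the documented threshold=5
    String language = "eng";    // assumed fallback; the real default comes from default_lang_setting.txt
    final List<String> features = new ArrayList<>();

    static OptionStringSketch parse(String options) {
        OptionStringSketch parsed = new OptionStringSketch();
        String trimmed = options.trim();
        if (trimmed.isEmpty()) {
            return parsed;
        }
        String[] tokens = trimmed.split("\\s+");
        for (int i = 0; i < tokens.length; i++) {
            switch (tokens[i]) {
                case "-stop":
                    parsed.removeStopwords = true;
                    break;
                case "-stem":
                    parsed.stemming = true;
                    break;
                case "-rr":
                    parsed.removeRare = true;
                    break;
                case "-rt":
                    // Per the documentation, -rt implies -rr.
                    parsed.rareThreshold = Integer.parseInt(valueAfter(tokens, i++));
                    parsed.removeRare = true;
                    break;
                case "-f":
                    parsed.features.add(valueAfter(tokens, i++));
                    break;
                case "-lang":
                    parsed.language = valueAfter(tokens, i++);
                    break;
                default:
                    throw new IllegalArgumentException("Unknown option: " + tokens[i]);
            }
        }
        return parsed;
    }

    // Returns the token following position i, or fails if the flag has no value.
    private static String valueAfter(String[] tokens, int i) {
        if (i + 1 >= tokens.length) {
            throw new IllegalArgumentException("Missing value for " + tokens[i]);
        }
        return tokens[i + 1];
    }

    public static void main(String[] args) {
        OptionStringSketch o = parse("-stop -stem -lang ger -f uni -f bi -f posbi");
        System.out.println("stop=" + o.removeStopwords + " stem=" + o.stemming
                + " lang=" + o.language + " features=" + o.features);
    }
}
```

Note that `-f` is repeated once per feature in the documented example, which is why the sketch accumulates features in a list rather than overwriting a single value.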
public SimpleFeatureSpaceBuilder()
Default feature selection options: punctuation, unigrams, bigrams, POS bigrams, line length, contains non-stopwords, remove rare features (threshold=5), remove stopwords=true, stemming=true
Method Detail |
---|
public java.util.ArrayList build(java.lang.String dicFilePath)
dicFilePath
- the path of the input dictionary file
public void writeDicFile(java.lang.String dicFilePath)
dicFilePath
- the path of the output dictionary file
public java.lang.String printoutOptions()
public java.util.ArrayList build(java.util.ArrayList<java.lang.String> texts)
texts
- a list of strings
public java.util.Iterator getBinaryFeatureIterator()
public java.util.ArrayList getBinaryFeatures()
public java.util.Iterator getBinaryFeatureIterator(java.lang.String text)
public java.util.Iterator getNumericFeatureIterator()
public java.util.ArrayList getNumericFeatures()
public java.util.ArrayList<java.lang.Double> getNumericFeatureValues(java.lang.String text, java.util.ArrayList featureNames)