Type Parameters:
K - the type of the keys
V - the type of the values
public class KMeansSampler<K,V>
extends java.lang.Object
| Constructor | Description |
|---|---|
| `KMeansSampler(java.lang.Class<? extends org.apache.hadoop.mapred.InputFormat<K,V>> inputFormatClass, java.lang.Class<? extends RecordInfoProvider<K,V>> riProviderClass, int dimensions)` | Creates a new instance. |
| Modifier and Type | Method | Description |
|---|---|---|
| `double[]` | `sampleClusterCenters(int k, int maxSampleRecords, int maxSamplePartitions, org.apache.hadoop.mapred.JobConf conf)` | Samples the input data and returns the k initial cluster center points. |
| `double[]` | `sampleClusterCenters(int k, org.apache.hadoop.mapred.JobConf conf)` | Samples the input data and returns the k initial cluster center points. |
| `void` | `writeClusterCenters(org.apache.hadoop.fs.Path outFile, int k, int maxSampleRecords, int maxSamplePartitions, org.apache.hadoop.mapred.JobConf conf)` | Samples the input data to find the initial cluster centers and writes them to the given file. |
public KMeansSampler(java.lang.Class<? extends org.apache.hadoop.mapred.InputFormat<K,V>> inputFormatClass, java.lang.Class<? extends RecordInfoProvider<K,V>> riProviderClass, int dimensions)
Parameters:
inputFormatClass - the InputFormat used to read the data
riProviderClass - the RecordInfoProvider implementation used to extract spatial information from the records
dimensions -
public double[] sampleClusterCenters(int k,
org.apache.hadoop.mapred.JobConf conf)
throws java.lang.Exception
Parameters:
k - the number of initial clusters
conf - the job configuration
Throws:
java.lang.Exception
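To make the idea of "k initial cluster center points returned as one `double[]`" concrete, here is a minimal sketch of one common strategy, Forgy initialization: pick k distinct sampled records at random and use them as the centers. It assumes the sample is already an in-memory list of points and that the k centers are packed row-major (k × dimensions values). Both the strategy and the layout are assumptions for illustration; `InitialCenters` and `pickCenters` are hypothetical names, not part of this API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/**
 * Hypothetical sketch (not the KMeansSampler implementation): chooses k distinct
 * sampled records as initial centers (Forgy initialization).
 */
final class InitialCenters {
    private InitialCenters() {}

    /**
     * @return centers packed row-major (assumed layout): center i occupies
     *         indices [i * dimensions, (i + 1) * dimensions)
     */
    static double[] pickCenters(List<double[]> points, int k, int dimensions, Random rnd) {
        if (k <= 0 || k > points.size()) {
            throw new IllegalArgumentException("k must be in [1, number of points]");
        }
        // Shuffle a copy so the caller's list is left untouched;
        // the first k entries are then k distinct records chosen at random.
        List<double[]> shuffled = new ArrayList<>(points);
        Collections.shuffle(shuffled, rnd);

        double[] centers = new double[k * dimensions];
        for (int i = 0; i < k; i++) {
            double[] p = shuffled.get(i);
            if (p.length != dimensions) {
                throw new IllegalArgumentException("point has wrong dimensionality");
            }
            System.arraycopy(p, 0, centers, i * dimensions, dimensions);
        }
        return centers;
    }
}
```

Forgy initialization is only one option; k-means++ seeding, for example, spreads the initial centers out more deliberately, and nothing on this page says which one `KMeansSampler` uses.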
public double[] sampleClusterCenters(int k,
int maxSampleRecords,
int maxSamplePartitions,
org.apache.hadoop.mapred.JobConf conf)
throws java.lang.Exception
Parameters:
k - the number of initial clusters
maxSampleRecords - the maximum number of records that should be sampled
maxSamplePartitions - the maximum number of partitions to be sampled
conf - the job configuration
Throws:
java.lang.Exception
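The `maxSampleRecords` and `maxSamplePartitions` limits bound how much input is read before the centers are chosen. A minimal sketch of how two such limits can be combined, assuming each partition is available as an in-memory list of points: select at most `maxSamplePartitions` partitions at random, then keep a uniform sample of at most `maxSampleRecords` records using reservoir sampling (Algorithm R). `BoundedSampler` is a hypothetical name, and this page does not say whether `KMeansSampler` samples this way.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch of a bounded sample over partitioned input (not the library's code). */
final class BoundedSampler {
    private BoundedSampler() {}

    static List<double[]> sample(List<List<double[]>> partitions,
                                 int maxSampleRecords, int maxSamplePartitions, Random rnd) {
        if (maxSampleRecords < 0 || maxSamplePartitions < 0) {
            throw new IllegalArgumentException("limits must be non-negative");
        }
        // Read only a random subset of at most maxSamplePartitions partitions.
        List<List<double[]>> chosen = new ArrayList<>(partitions);
        Collections.shuffle(chosen, rnd);
        chosen = chosen.subList(0, Math.min(maxSamplePartitions, chosen.size()));

        // Reservoir sampling: each record in the chosen partitions is equally
        // likely to be kept, and the sample never exceeds maxSampleRecords.
        List<double[]> reservoir = new ArrayList<>();
        int seen = 0;
        for (List<double[]> partition : chosen) {
            for (double[] record : partition) {
                seen++;
                if (reservoir.size() < maxSampleRecords) {
                    reservoir.add(record);
                } else {
                    int j = rnd.nextInt(seen); // uniform in [0, seen)
                    if (j < maxSampleRecords) {
                        reservoir.set(j, record);
                    }
                }
            }
        }
        return reservoir;
    }
}
```

Reservoir sampling keeps memory proportional to `maxSampleRecords` no matter how large the chosen partitions are, which is why it is a natural fit for a record cap like this one.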
public void writeClusterCenters(org.apache.hadoop.fs.Path outFile,
int k,
int maxSampleRecords,
int maxSamplePartitions,
org.apache.hadoop.mapred.JobConf conf)
throws java.lang.Exception
Parameters:
outFile - the path where the initial clusters will be written
k - the number of initial clusters
maxSampleRecords - the maximum number of records that should be sampled
maxSamplePartitions - the maximum number of partitions to be sampled
conf - the job configuration
Throws:
java.lang.Exception
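The on-disk format used by `writeClusterCenters` is not described here. As an illustration only, the sketch below assumes a plain-text file with one center per line and comma-separated coordinates, reading the centers from a row-major `double[]`. It uses a local `java.nio.file.Path` in place of Hadoop's `org.apache.hadoop.fs.Path`; `CenterWriter` and the file format are hypothetical.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.StringJoiner;

/** Hypothetical sketch: writes row-major centers as CSV lines (assumed format, local file system). */
final class CenterWriter {
    private CenterWriter() {}

    static void writeCenters(Path outFile, double[] centers, int dimensions) throws IOException {
        if (dimensions <= 0 || centers.length % dimensions != 0) {
            throw new IllegalArgumentException("centers length must be a multiple of dimensions");
        }
        try (BufferedWriter w = Files.newBufferedWriter(outFile, StandardCharsets.UTF_8)) {
            // One center per line: coordinates [i, i + dimensions) joined by commas.
            for (int i = 0; i < centers.length; i += dimensions) {
                StringJoiner line = new StringJoiner(",");
                for (int d = 0; d < dimensions; d++) {
                    line.add(Double.toString(centers[i + d]));
                }
                w.write(line.toString());
                w.newLine();
            }
        }
    }
}
```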