public abstract class FileInputFormat<K,V> extends InputFormat<K,V>
FileInputFormat is the base class for all file-based
InputFormats. This provides a generic implementation of
getSplits(JobContext).
Subclasses of FileInputFormat can also override the
isSplitable(JobContext, Path) method to ensure that input files are
not split up and are processed as a whole by Mappers.
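
As a rough illustration of the kind of decision an isSplitable(JobContext, Path) override makes, the following plain-Java sketch answers the question from the file name alone. It deliberately avoids the Hadoop classes so that it runs on its own; the class name `SplitabilitySketch`, the String-based signature, and the suffix list are assumptions made for this example and are not part of FileInputFormat.

```java
import java.util.Locale;

// Stand-alone sketch: decides from the file name whether a file may be split.
// Hypothetical helper, not a Hadoop class; a real subclass would put equivalent
// logic in its isSplitable(JobContext, Path) override.
class SplitabilitySketch {

    // Assumed, illustrative list of suffixes treated as stream compressed.
    private static final String[] STREAM_COMPRESSED_SUFFIXES = {".gz", ".deflate"};

    // Returns false for names with a stream-compressed suffix, true otherwise.
    static boolean isSplitable(String filename) {
        String lower = filename.toLowerCase(Locale.ROOT);
        for (String suffix : STREAM_COMPRESSED_SUFFIXES) {
            if (lower.endsWith(suffix)) {
                return false; // must be read as a whole by a single Mapper
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("part-00000.txt splitable: " + isSplitable("part-00000.txt"));
        System.out.println("part-00000.gz splitable: " + isSplitable("part-00000.gz"));
    }
}
```

A subclass that must always process whole files can skip the check and simply return false.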
| Constructor and Description |
|---|
| FileInputFormat() |
| Modifier and Type | Method and Description |
|---|---|
| static void | addInputPath(Job job, org.apache.hadoop.fs.Path path): Add a Path to the list of inputs for the map-reduce job. |
| static void | addInputPaths(Job job, java.lang.String commaSeparatedPaths): Add the given comma separated paths to the list of inputs for the map-reduce job. |
| protected long | computeSplitSize(long blockSize, long minSize, long maxSize) |
| protected int | getBlockIndex(org.apache.hadoop.fs.BlockLocation[] blkLocations, long offset) |
| protected long | getFormatMinSplitSize(): Get the lower bound on split size imposed by the format. |
| static org.apache.hadoop.fs.PathFilter | getInputPathFilter(JobContext context): Get a PathFilter instance of the filter set for the input paths. |
| static org.apache.hadoop.fs.Path[] | getInputPaths(JobContext context): Get the list of input Paths for the map-reduce job. |
| static long | getMaxSplitSize(JobContext context): Get the maximum split size. |
| static long | getMinSplitSize(JobContext job): Get the minimum split size. |
| java.util.List<InputSplit> | getSplits(JobContext job): Generate the list of files and make them into FileSplits. |
| protected boolean | isSplitable(JobContext context, org.apache.hadoop.fs.Path filename): Is the given filename splitable? Usually, true, but if the file is stream compressed, it will not be. |
| protected java.util.List<org.apache.hadoop.fs.FileStatus> | listStatus(JobContext job): List input directories. |
| static void | setInputPathFilter(Job job, java.lang.Class<? extends org.apache.hadoop.fs.PathFilter> filter): Set a PathFilter to be applied to the input paths for the map-reduce job. |
| static void | setInputPaths(Job job, org.apache.hadoop.fs.Path... inputPaths): Set the array of Paths as the list of inputs for the map-reduce job. |
| static void | setInputPaths(Job job, java.lang.String commaSeparatedPaths): Sets the given comma separated paths as the list of inputs for the map-reduce job. |
| static void | setMaxInputSplitSize(Job job, long size): Set the maximum split size. |
| static void | setMinInputSplitSize(Job job, long size): Set the minimum input split size. |

Methods inherited from class InputFormat: createRecordReader

protected long getFormatMinSplitSize()
Get the lower bound on split size imposed by the format.

protected boolean isSplitable(JobContext context, org.apache.hadoop.fs.Path filename)
Is the given filename splitable? Usually, true, but if the file is stream compressed, it will not be. FileInputFormat implementations can override this and return false to ensure that individual input files are never split up, so that Mappers process entire files.
Parameters:
context - the job context
filename - the file name to check

public static void setInputPathFilter(Job job, java.lang.Class<? extends org.apache.hadoop.fs.PathFilter> filter)
Set a PathFilter to be applied to the input paths for the map-reduce job.
Parameters:
job - the job to modify
filter - the PathFilter class to use for filtering the input paths

public static void setMinInputSplitSize(Job job, long size)
Set the minimum input split size.
Parameters:
job - the job to modify
size - the minimum size

public static long getMinSplitSize(JobContext job)
Get the minimum split size.
Parameters:
job - the job

public static void setMaxInputSplitSize(Job job, long size)
Set the maximum split size.
Parameters:
job - the job to modify
size - the maximum split size

public static long getMaxSplitSize(JobContext context)
Get the maximum split size.
Parameters:
context - the job to look at

public static org.apache.hadoop.fs.PathFilter getInputPathFilter(JobContext context)
Get a PathFilter instance of the filter set for the input paths.

protected java.util.List<org.apache.hadoop.fs.FileStatus> listStatus(JobContext job) throws java.io.IOException
List input directories.
Parameters:
job - the job to list input paths for
Throws:
java.io.IOException - if zero items

public java.util.List<InputSplit> getSplits(JobContext job) throws java.io.IOException
Generate the list of files and make them into FileSplits.
Specified by:
getSplits in class InputFormat<K,V>
Parameters:
job - job configuration
Returns:
InputSplits for the job
Throws:
java.io.IOException

protected long computeSplitSize(long blockSize, long minSize, long maxSize)
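
No description is given for computeSplitSize. One plausible reading of its three parameters, stated here as an assumption rather than as documented behavior, is that the block size is clamped between the minimum and maximum split sizes, with the minimum taking precedence if the two bounds conflict. The class name `SplitSizeSketch` is hypothetical, and the sketch uses no Hadoop classes.

```java
// Stand-alone sketch of a clamping rule for split sizes.
// Assumption: split size = max(minSize, min(maxSize, blockSize)).
class SplitSizeSketch {

    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        // Cap the block size at maxSize first, then raise it to at least minSize.
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Bounds are wide open: the block size is used as is.
        System.out.println(computeSplitSize(128 * mb, 1L, Long.MAX_VALUE));
        // The maximum is smaller than the block size: the maximum is used.
        System.out.println(computeSplitSize(128 * mb, 1L, 64 * mb));
        // The minimum is larger than the block size: the minimum is used.
        System.out.println(computeSplitSize(128 * mb, 256 * mb, Long.MAX_VALUE));
    }
}
```

Under this rule, lowering the maximum split size yields more, smaller splits, while raising the minimum yields fewer, larger ones.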

protected int getBlockIndex(org.apache.hadoop.fs.BlockLocation[] blkLocations, long offset)
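
getBlockIndex is also listed without a description. Judging from its name and parameters, it maps a byte offset to the index of the block that contains it. The sketch below shows that lookup under stated assumptions: blocks are modeled as parallel arrays of start offsets and lengths instead of BlockLocation objects, an offset outside every block is reported with an IllegalArgumentException, and the class name `BlockIndexSketch` is hypothetical.

```java
// Stand-alone sketch: find which block contains a given byte offset.
// Blocks are modeled as parallel arrays (an assumption made to avoid Hadoop types).
class BlockIndexSketch {

    static int getBlockIndex(long[] blockOffsets, long[] blockLengths, long offset) {
        for (int i = 0; i < blockOffsets.length; i++) {
            long start = blockOffsets[i];
            long end = start + blockLengths[i]; // exclusive upper bound
            if (offset >= start && offset < end) {
                return i;
            }
        }
        // Assumed error handling for an offset that no block covers.
        throw new IllegalArgumentException("Offset " + offset + " is outside of all blocks");
    }

    public static void main(String[] args) {
        long[] offsets = {0L, 100L, 200L};
        long[] lengths = {100L, 100L, 50L};
        System.out.println(getBlockIndex(offsets, lengths, 0L));
        System.out.println(getBlockIndex(offsets, lengths, 150L));
        System.out.println(getBlockIndex(offsets, lengths, 249L));
    }
}
```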

public static void setInputPaths(Job job, java.lang.String commaSeparatedPaths) throws java.io.IOException
Sets the given comma separated paths as the list of inputs for the map-reduce job.
Parameters:
job - the job
commaSeparatedPaths - Comma separated paths to be set as the list of inputs for the map-reduce job.
Throws:
java.io.IOException

public static void addInputPaths(Job job, java.lang.String commaSeparatedPaths) throws java.io.IOException
Add the given comma separated paths to the list of inputs for the map-reduce job.
Parameters:
job - The job to modify
commaSeparatedPaths - Comma separated paths to be added to the list of inputs for the map-reduce job.
Throws:
java.io.IOException

public static void setInputPaths(Job job, org.apache.hadoop.fs.Path... inputPaths) throws java.io.IOException
Set the array of Paths as the list of inputs for the map-reduce job.
Parameters:
job - The job to modify
inputPaths - the Paths of the input directories/files for the map-reduce job.
Throws:
java.io.IOException

public static void addInputPath(Job job, org.apache.hadoop.fs.Path path) throws java.io.IOException
Add a Path to the list of inputs for the map-reduce job.
Parameters:
job - The Job to modify
path - Path to be added to the list of inputs for the map-reduce job.
Throws:
java.io.IOException

public static org.apache.hadoop.fs.Path[] getInputPaths(JobContext context)
Get the list of input Paths for the map-reduce job.
Parameters:
context - The job
Returns:
the list of input Paths for the map-reduce job.

Copyright © 2009 The Apache Software Foundation