public class DBInputFormat<T extends DBWritable> extends InputFormat<org.apache.hadoop.io.LongWritable,T> implements org.apache.hadoop.conf.Configurable
DBInputFormat emits LongWritables containing the record number as key and DBWritables as value. The SQL query and input class can be set using one of the two setInput methods.
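As an illustration of the value side of this contract, the sketch below shows the general shape of a row class that fills its fields from a java.sql.ResultSet. It is a minimal sketch, not Hadoop code: `RowValue` is a hypothetical local stand-in that mirrors DBWritable's `readFields(ResultSet)` and `write(PreparedStatement)` methods so the example compiles without Hadoop on the classpath, and `EmployeeRow` with its two columns is invented for the example.

```java
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical stand-in mirroring DBWritable's two methods, so that this
// sketch compiles without Hadoop. Real jobs would implement DBWritable.
interface RowValue {
    void readFields(ResultSet resultSet) throws SQLException;
    void write(PreparedStatement statement) throws SQLException;
}

// Invented example row type holding two columns: (id, name).
final class EmployeeRow implements RowValue {
    long id;
    String name;

    // Populate the fields from the current row, by 1-based column position.
    @Override
    public void readFields(ResultSet resultSet) throws SQLException {
        id = resultSet.getLong(1);
        name = resultSet.getString(2);
    }

    // Bind the fields to statement parameters in the same column order.
    @Override
    public void write(PreparedStatement statement) throws SQLException {
        statement.setLong(1, id);
        statement.setString(2, name);
    }
}
```

The column positions used in `readFields` have to match the order of the fields selected by the input query, which is why the sketch reads and writes the columns in the same order.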
| Modifier and Type | Class and Description |
|---|---|
| static class | DBInputFormat.DBInputSplit: An InputSplit that spans a set of rows. |
| static class | DBInputFormat.NullDBWritable: A class that does nothing, implementing DBWritable. |
| Constructor and Description |
|---|
| DBInputFormat() |
| Modifier and Type | Method and Description |
|---|---|
| protected void | closeConnection() |
| protected RecordReader<org.apache.hadoop.io.LongWritable,T> | createDBRecordReader(DBInputFormat.DBInputSplit split, org.apache.hadoop.conf.Configuration conf) |
| RecordReader<org.apache.hadoop.io.LongWritable,T> | createRecordReader(InputSplit split, TaskAttemptContext context): Create a record reader for a given split. |
| org.apache.hadoop.conf.Configuration | getConf() |
| java.sql.Connection | getConnection() |
| protected java.lang.String | getCountQuery(): Returns the query for getting the total number of rows; subclasses can override this for custom behaviour. |
| DBConfiguration | getDBConf() |
| java.lang.String | getDBProductName() |
| java.util.List<InputSplit> | getSplits(JobContext job): Logically split the set of input files for the job. |
| void | setConf(org.apache.hadoop.conf.Configuration conf) |
| static void | setInput(Job job, java.lang.Class<? extends DBWritable> inputClass, java.lang.String inputQuery, java.lang.String inputCountQuery): Initializes the map-part of the job with the appropriate input settings. |
| static void | setInput(Job job, java.lang.Class<? extends DBWritable> inputClass, java.lang.String tableName, java.lang.String conditions, java.lang.String orderBy, java.lang.String... fieldNames): Initializes the map-part of the job with the appropriate input settings. |
public void setConf(org.apache.hadoop.conf.Configuration conf)
Specified by: setConf in interface org.apache.hadoop.conf.Configurable

public org.apache.hadoop.conf.Configuration getConf()
Specified by: getConf in interface org.apache.hadoop.conf.Configurable

public DBConfiguration getDBConf()

public java.sql.Connection getConnection()

public java.lang.String getDBProductName()

protected RecordReader<org.apache.hadoop.io.LongWritable,T> createDBRecordReader(DBInputFormat.DBInputSplit split, org.apache.hadoop.conf.Configuration conf) throws java.io.IOException
Throws: java.io.IOException

public RecordReader<org.apache.hadoop.io.LongWritable,T> createRecordReader(InputSplit split, TaskAttemptContext context) throws java.io.IOException, java.lang.InterruptedException
Create a record reader for a given split. The framework will call RecordReader.initialize(InputSplit, TaskAttemptContext) before the split is used.
Specified by: createRecordReader in class InputFormat<org.apache.hadoop.io.LongWritable,T extends DBWritable>
Parameters:
split - the split to be read
context - the information about the task
Throws: java.io.IOException, java.lang.InterruptedException

public java.util.List<InputSplit> getSplits(JobContext job) throws java.io.IOException
Logically split the set of input files for the job. Each InputSplit is then assigned to an individual Mapper for processing.
Note: The split is a logical split of the inputs, and the input files are not physically split into chunks. For example, a split could be an <input-file-path, start, offset> tuple. The InputFormat also creates the RecordReader to read the InputSplit.
Specified by: getSplits in class InputFormat<org.apache.hadoop.io.LongWritable,T extends DBWritable>
Parameters:
job - job configuration.
Returns: the InputSplits for the job.
Throws: java.io.IOException

protected java.lang.String getCountQuery()
Returns the query for getting the total number of rows; subclasses can override this for custom behaviour.

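For a database input, the logical split described for getSplits can be pictured as contiguous row ranges derived from the total returned by the count query. The following is a minimal sketch of that idea, assuming even chunking with any remainder assigned to the last range. The names `RowRangeSplits` and `Range`, and the LIMIT/OFFSET rendering in `main`, are hypothetical choices for illustration and are not taken from DBInputFormat itself.

```java
import java.util.ArrayList;
import java.util.List;

final class RowRangeSplits {

    // Half-open row range [start, end); a hypothetical stand-in for a split
    // that spans a set of rows.
    static final class Range {
        final long start;
        final long end;

        Range(long start, long end) {
            this.start = start;
            this.end = end;
        }

        long length() {
            return end - start;
        }
    }

    // Divide rowCount rows into `chunks` contiguous ranges. Every range gets
    // rowCount / chunks rows, and the last range also absorbs the remainder.
    static List<Range> split(long rowCount, int chunks) {
        if (rowCount < 0 || chunks < 1) {
            throw new IllegalArgumentException("rowCount must be >= 0 and chunks >= 1");
        }
        List<Range> ranges = new ArrayList<>();
        long chunkSize = rowCount / chunks;
        for (int i = 0; i < chunks; i++) {
            long start = i * chunkSize;
            long end = (i == chunks - 1) ? rowCount : start + chunkSize;
            ranges.add(new Range(start, end));
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Prints: LIMIT 3 OFFSET 0, LIMIT 3 OFFSET 3, LIMIT 4 OFFSET 6
        for (Range r : split(10, 3)) {
            System.out.println("LIMIT " + r.length() + " OFFSET " + r.start);
        }
    }
}
```

Nothing is physically partitioned here: each range is only a pair of offsets, and a reader for a range would fetch just its own rows. This is also why a stable ordering in the input query matters, since offsets are only meaningful against a consistent row order.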
public static void setInput(Job job, java.lang.Class<? extends DBWritable> inputClass, java.lang.String tableName, java.lang.String conditions, java.lang.String orderBy, java.lang.String... fieldNames)
Initializes the map-part of the job with the appropriate input settings.
Parameters:
job - The map-reduce job
inputClass - the class object implementing DBWritable, which is the Java object holding tuple fields.
tableName - The table to read data from
conditions - The condition with which to select data, e.g. '(updated > 20070101 AND length > 0)'
orderBy - the fieldNames in the orderBy clause.
fieldNames - The field names in the table
See Also: setInput(Job, Class, String, String)

public static void setInput(Job job, java.lang.Class<? extends DBWritable> inputClass, java.lang.String inputQuery, java.lang.String inputCountQuery)
Initializes the map-part of the job with the appropriate input settings.
Parameters:
job - The map-reduce job
inputClass - the class object implementing DBWritable, which is the Java object holding tuple fields.
inputQuery - the input query to select fields. Example: "SELECT f1, f2, f3 FROM Mytable ORDER BY f1"
inputCountQuery - the input query that returns the number of records in the table. Example: "SELECT COUNT(f1) FROM Mytable"
See Also: setInput(Job, Class, String, String, String, String...)

protected void closeConnection()
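
To relate the two setInput variants, the sketch below shows one way the table-based parameters (tableName, conditions, orderBy, fieldNames) could be assembled into query strings of the kind the query-based variant accepts. It is a hedged illustration only: `InputQuerySketch`, `selectQuery`, and `countQuery` are hypothetical helpers, and the exact SQL that DBInputFormat generates internally may differ.

```java
final class InputQuerySketch {

    // Build a SELECT statement from the table-based parameters.
    // An empty fieldNames list falls back to "*" in this sketch.
    static String selectQuery(String tableName, String conditions,
                              String orderBy, String... fieldNames) {
        StringBuilder sql = new StringBuilder("SELECT ");
        sql.append(fieldNames.length == 0 ? "*" : String.join(", ", fieldNames));
        sql.append(" FROM ").append(tableName);
        if (conditions != null && !conditions.isEmpty()) {
            sql.append(" WHERE ").append(conditions);
        }
        if (orderBy != null && !orderBy.isEmpty()) {
            sql.append(" ORDER BY ").append(orderBy);
        }
        return sql.toString();
    }

    // Build the matching row-count statement for the same table and conditions.
    static String countQuery(String tableName, String conditions) {
        StringBuilder sql = new StringBuilder("SELECT COUNT(*) FROM ").append(tableName);
        if (conditions != null && !conditions.isEmpty()) {
            sql.append(" WHERE ").append(conditions);
        }
        return sql.toString();
    }

    public static void main(String[] args) {
        // Prints: SELECT f1, f2, f3 FROM Mytable ORDER BY f1
        System.out.println(selectQuery("Mytable", null, "f1", "f1", "f2", "f3"));
        // Prints: SELECT COUNT(*) FROM Mytable WHERE (updated > 20070101 AND length > 0)
        System.out.println(countQuery("Mytable", "(updated > 20070101 AND length > 0)"));
    }
}
```

The helpers concatenate their arguments verbatim, so in this sketch the table name, conditions, and field names are assumed to come from trusted job configuration rather than from user input.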
Copyright © 2009 The Apache Software Foundation