ChunkRecordReader

java.lang.Object
- org.apache.hawq.pxf.plugins.hdfs.ChunkRecordReader

All Implemented Interfaces:

org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.LongWritable,ChunkWritable>
```
public class ChunkRecordReader
extends java.lang.Object
implements org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.LongWritable,ChunkWritable>
```
ChunkRecordReader is designed for fast reading of a file split. Instead of fetching single records, it fetches chunks of data. Each chunk contains many records, and the end of a chunk is not aligned on a record boundary. The chunk size is a hardcoded class parameter, CHUNK_SIZE. This behaviour sets this reader apart from other readers, which fetch one record at a time and stop when they reach a record delimiter.
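To make the difference from record-at-a-time reading concrete, here is a minimal standalone sketch of fixed-size chunk reading over a plain `InputStream`. It does not use Hadoop or PXF classes and is not the ChunkRecordReader implementation: the class name `ChunkReadSketch`, the method `readChunks`, and the tiny chunk size of 8 bytes are all assumptions chosen for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ChunkReadSketch {
    // Hypothetical value for illustration only; the real CHUNK_SIZE is
    // hardcoded inside ChunkRecordReader and is not stated on this page.
    static final int CHUNK_SIZE = 8;

    /** Reads the stream in fixed-size chunks, ignoring record delimiters. */
    static List<byte[]> readChunks(InputStream in, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<byte[]> chunks = new ArrayList<>();
        byte[] buf = new byte[chunkSize];
        boolean eof = false;
        try {
            while (!eof) {
                int filled = 0;
                // Keep reading until the buffer is full or the stream ends.
                while (filled < chunkSize) {
                    int n = in.read(buf, filled, chunkSize - filled);
                    if (n < 0) {
                        eof = true;
                        break;
                    }
                    filled += n;
                }
                if (filled > 0) {
                    chunks.add(Arrays.copyOf(buf, filled));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Three newline-delimited records of 4, 6 and 8 bytes.
        byte[] data = "a,1\nbb,22\nccc,333\n".getBytes(StandardCharsets.US_ASCII);
        for (byte[] chunk : readChunks(new ByteArrayInputStream(data), CHUNK_SIZE)) {
            String text = new String(chunk, StandardCharsets.US_ASCII).replace("\n", "\\n");
            System.out.println(chunk.length + " bytes: " + text);
        }
    }
}
```

With this input the first chunk ends in the middle of the second record, which is the property described above: a consumer of chunks has to handle records that straddle chunk boundaries.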

Constructor Summary

Constructors
Constructor and Description
`ChunkRecordReader(org.apache.hadoop.conf.Configuration job, org.apache.hadoop.mapred.FileSplit split)` Constructs a ChunkRecordReader instance.

Method Summary

Modifier and Type	Method and Description
`void`	`close()` Closes the input stream.
`org.apache.hadoop.io.LongWritable`	`createKey()` Used by the client of this class to create the 'key' output parameter for the next() method.
`ChunkWritable`	`createValue()` Used by the client of this class to create the 'value' output parameter for the next() method.
`long`	`getPos()` Returns the position of the unread tail of the file.
`float`	`getProgress()` Gets the progress within the split.
`org.apache.hadoop.hdfs.DFSInputStream.ReadStatistics`	`getReadStatistics()` Returns statistics of the input stream's read operation: total bytes read, bytes read locally, bytes read in short-circuit (directly from file descriptor).
`boolean`	`next(org.apache.hadoop.io.LongWritable key, ChunkWritable value)` Fetches the next data chunk from the file split.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Constructor Detail
  - ChunkRecordReader
```
public ChunkRecordReader(org.apache.hadoop.conf.Configuration job,
                         org.apache.hadoop.mapred.FileSplit split)
                  throws java.io.IOException
```
    Constructs a ChunkRecordReader instance.
    
    Parameters:
    
    job - the job configuration
    
    split - contains the file name, the start byte of the split, and the split length in bytes
    
    Throws:
    
    java.io.IOException - if an I/O error occurs when accessing the file or creating the input stream to read from it
- Method Detail
  - getReadStatistics
```
public org.apache.hadoop.hdfs.DFSInputStream.ReadStatistics getReadStatistics()
```
    Returns statistics of the input stream's read operation: total bytes read, bytes read locally, bytes read in short-circuit (directly from file descriptor).
    
    Returns:
    
    an instance of ReadStatistics class
  - createKey
```
public org.apache.hadoop.io.LongWritable createKey()
```
    Used by the client of this class to create the 'key' output parameter for the next() method.
    
    Specified by:
    
    createKey in interface org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.LongWritable,ChunkWritable>
    
    Returns:
    
    an instance of LongWritable
  - createValue
```
public ChunkWritable createValue()
```
    Used by the client of this class to create the 'value' output parameter for the next() method.
    
    Specified by:
    
    createValue in interface org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.LongWritable,ChunkWritable>
    
    Returns:
    
    an instance of ChunkWritable
  - next
```
public boolean next(org.apache.hadoop.io.LongWritable key,
                    ChunkWritable value)
             throws java.io.IOException
```
    Fetches the next data chunk from the file split. The chunk size is a hardcoded class parameter, CHUNK_SIZE. This behaviour sets this reader apart from other readers, which fetch one record at a time and stop when they reach a record delimiter.
    
    Specified by:
    
    next in interface org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.LongWritable,ChunkWritable>
    
    Parameters:
    
    key - output parameter. When the method returns, it contains the number of the start byte of the chunk
    
    value - output parameter. When the method returns, it contains the chunk, a byte array inside the ChunkWritable instance
    
    Returns:
    
    false - when the end of the split has been reached
    
    Throws:
    
    java.io.IOException - if an I/O error occurred while reading the next chunk or line
  - getProgress
```
public float getProgress()
                  throws java.io.IOException
```
    Gets the progress within the split.
    
    Specified by:
    
    getProgress in interface org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.LongWritable,ChunkWritable>
    
    Throws:
    
    java.io.IOException
  - getPos
```
public long getPos()
            throws java.io.IOException
```
    Returns the position of the unread tail of the file.
    
    Specified by:
    
    getPos in interface org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.LongWritable,ChunkWritable>
    
    Returns:
    
    pos - start byte of the unread tail of the file
    
    Throws:
    
    java.io.IOException
  - close
```
public void close()
           throws java.io.IOException
```
    Closes the input stream.
    
    Specified by:
    
    close in interface org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.LongWritable,ChunkWritable>
    
    Throws:
    
    java.io.IOException
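The methods above follow the usual `org.apache.hadoop.mapred.RecordReader` calling pattern: create the key and value once, pass them to `next()` repeatedly until it returns false, then call `close()`. The sketch below illustrates that loop without depending on Hadoop. `LongBox`, `ChunkBox`, and `InMemoryChunkReader` are hypothetical stand-ins for `LongWritable`, `ChunkWritable`, and `ChunkRecordReader`. The progress formula (bytes consumed divided by split length, capped at 1.0) is an assumption modeled on common Hadoop readers, not taken from this class's source.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ReaderLoopSketch {
    /** Hypothetical stand-in for LongWritable: a mutable long holder. */
    static final class LongBox { long value; }

    /** Hypothetical stand-in for ChunkWritable: a mutable byte-array holder. */
    static final class ChunkBox { byte[] bytes = new byte[0]; }

    /** Hypothetical in-memory reader mirroring the documented method names. */
    static final class InMemoryChunkReader {
        private final byte[] data;
        private final long start;
        private final long end;
        private final int chunkSize;
        private long pos;

        InMemoryChunkReader(byte[] data, int start, int length, int chunkSize) {
            if (chunkSize <= 0) {
                throw new IllegalArgumentException("chunkSize must be positive");
            }
            this.data = data;
            this.start = start;
            this.end = Math.min((long) start + length, data.length);
            this.chunkSize = chunkSize;
            this.pos = start;
        }

        LongBox createKey() { return new LongBox(); }

        ChunkBox createValue() { return new ChunkBox(); }

        /** Fills key with the chunk's start byte and value with its bytes. */
        boolean next(LongBox key, ChunkBox value) {
            if (pos >= end) {
                return false; // end of split reached
            }
            int len = (int) Math.min(chunkSize, end - pos);
            key.value = pos;
            value.bytes = Arrays.copyOfRange(data, (int) pos, (int) pos + len);
            pos += len;
            return true;
        }

        /** Start byte of the unread tail. */
        long getPos() { return pos; }

        /** Assumed formula: fraction of the split consumed, capped at 1.0. */
        float getProgress() {
            if (end <= start) {
                return 0.0f;
            }
            return Math.min(1.0f, (pos - start) / (float) (end - start));
        }

        void close() { /* nothing to release in this in-memory sketch */ }
    }

    /** Drains the reader using the create/next/close pattern; returns bytes read. */
    static long consume(InMemoryChunkReader reader) {
        LongBox key = reader.createKey();
        ChunkBox value = reader.createValue();
        long total = 0;
        try {
            while (reader.next(key, value)) {
                total += value.bytes.length;
                System.out.printf("chunk at byte %d, %d bytes, progress %.2f%n",
                        key.value, value.bytes.length, reader.getProgress());
            }
        } finally {
            reader.close();
        }
        return total;
    }

    public static void main(String[] args) {
        byte[] data = "a,1\nbb,22\nccc,333\n".getBytes(StandardCharsets.US_ASCII);
        long total = consume(new InMemoryChunkReader(data, 0, data.length, 8));
        System.out.println("total bytes: " + total);
    }
}
```

Reusing the same key and value objects across calls keeps allocation out of the read loop, which is why the interface exposes `createKey()` and `createValue()` separately from `next()`.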
