Class TabularInputFile

  • All Implemented Interfaces:
    java.lang.Iterable<RowGetterIfc>

    public class TabularInputFile
    extends TabularFile
    implements java.lang.Iterable<RowGetterIfc>
    An abstraction for reading rows of tabular data. Columns of the tabular data can be numeric or text. Using this subclass of TabularFile, users can read rows of data. The user is responsible for iterating over the rows, reading each column's data with the appropriate type, and bringing the row into their program. Use the static methods of TabularFile to create and define the columns of the file. Use the methods of this class to read rows.
    See Also:
    TabularFile, For example code
    • Field Detail

      • DEFAULT_ROW_BUFFER_SIZE

        public static final int DEFAULT_ROW_BUFFER_SIZE
        See Also:
        Constant Field Values
    • Constructor Detail

      • TabularInputFile

        public TabularInputFile​(java.nio.file.Path pathToFile)
        Parameters:
        pathToFile - the path to a valid file that was written using TabularOutputFile
      • TabularInputFile

        protected TabularInputFile​(java.util.LinkedHashMap<java.lang.String,​DataType> columnTypes,
                                   java.nio.file.Path pathToFile)
    • Method Detail

      • getColumnTypes

        public static java.util.LinkedHashMap<java.lang.String,​DataType> getColumnTypes​(java.nio.file.Path pathToFile)
        Gets the metadata for an existing TabularInputFile. The path must lead to a file that has the correct internal representation for tabular data files. Such a file can be created via TabularOutputFile.
        Parameters:
        pathToFile - the path to the input file, must not be null
        Returns:
        the metadata for the file: the column names mapped to their data types
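        The returned map is a LinkedHashMap, so iterating it visits the columns in insertion order. The following is a minimal sketch of working with such a map. ColType is a hypothetical stand-in for the library's DataType (whose constants are not listed on this page), and the column names are invented for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ColumnTypesSketch {

    // Hypothetical stand-in for the library's DataType; the real constants may differ.
    enum ColType { NUMERIC, TEXT }

    // Collects the names of the columns having the requested type, in map (column) order.
    static List<String> namesOfType(LinkedHashMap<String, ColType> columnTypes, ColType wanted) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, ColType> entry : columnTypes.entrySet()) {
            if (entry.getValue() == wanted) {
                names.add(entry.getKey());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Invented metadata, shaped like the map returned by getColumnTypes.
        LinkedHashMap<String, ColType> columnTypes = new LinkedHashMap<>();
        columnTypes.put("id", ColType.NUMERIC);
        columnTypes.put("label", ColType.TEXT);
        columnTypes.put("value", ColType.NUMERIC);

        System.out.println(namesOfType(columnTypes, ColType.NUMERIC)); // [id, value]
        System.out.println(namesOfType(columnTypes, ColType.TEXT));    // [label]
    }
}
```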
      • getRowBufferSize

        public final int getRowBufferSize()
        Returns:
        the current row buffer size
      • setRowBufferSize

        public void setRowBufferSize​(int rowBufferSize)
        Parameters:
        rowBufferSize - must be at least 1; a larger buffer uses more memory
      • iterator

        public TabularInputFile.RowIterator iterator​(long startingRow)
        Parameters:
        startingRow - the starting row for the iteration
        Returns:
        an iterator for moving through the rows
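        The row buffer size and the starting row interact in a way that is easy to picture with a small model. The sketch below illustrates the general technique of buffered iteration over 1-based rows; it is not the library's implementation, which is not described on this page. The class name, the use of a List of strings as the backing store, and the refillCount field are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

final class BufferedRowIteratorSketch implements Iterator<String> {

    private final List<String> source;            // stand-in for the rows stored in the file
    private final int bufferSize;                 // rows loaded per refill
    private final List<String> buffer = new ArrayList<>();
    private int posInBuffer = 0;
    private long nextRowToLoad;                   // 1-based number of the next row to load
    int refillCount = 0;                          // counts non-empty refills, for illustration

    BufferedRowIteratorSketch(List<String> source, long startingRow, int bufferSize) {
        if (startingRow < 1) {
            throw new IllegalArgumentException("startingRow must be 1 or more");
        }
        if (bufferSize < 1) {
            throw new IllegalArgumentException("bufferSize must be at least 1");
        }
        this.source = source;
        this.bufferSize = bufferSize;
        this.nextRowToLoad = startingRow;
    }

    // Loads up to bufferSize rows, starting at nextRowToLoad, into the buffer.
    private void refill() {
        buffer.clear();
        posInBuffer = 0;
        long last = Math.min(nextRowToLoad + bufferSize - 1, source.size());
        for (long r = nextRowToLoad; r <= last; r++) {
            buffer.add(source.get((int) (r - 1)));
        }
        if (!buffer.isEmpty()) {
            nextRowToLoad = last + 1;
            refillCount++;
        }
    }

    @Override
    public boolean hasNext() {
        if (posInBuffer >= buffer.size()) {
            refill();
        }
        return posInBuffer < buffer.size();
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return buffer.get(posInBuffer++);
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("r1", "r2", "r3", "r4", "r5");
        BufferedRowIteratorSketch it = new BufferedRowIteratorSketch(rows, 2, 2);
        while (it.hasNext()) {
            System.out.println(it.next());
        }
        System.out.println("refills: " + it.refillCount);
    }
}
```

        In this model a larger buffer means fewer refills but more rows held in memory at once, which mirrors the trade-off noted for setRowBufferSize.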
      • getTotalNumberRows

        public final long getTotalNumberRows()
        Returns:
        the total number of rows in the tabular file
      • fetchRows

        public final java.util.List<RowGetterIfc> fetchRows​(long minRowNum,
                                                            long maxRowNum)
        Returns the rows between minRowNum and maxRowNum, inclusive. All requested rows are returned in a single list, so large ranges can consume significant memory; prefer the provided iterator when moving through many rows.
        Parameters:
        minRowNum - the minimum row number, must be less than maxRowNum, and 1 or bigger
        maxRowNum - the maximum row number, must be greater than minRowNum, and 2 or bigger
        Returns:
        the list of rows; the list may be empty if there are no rows in the row number range
      • fetchOneRow

        public final RowGetterIfc fetchOneRow​(long rowNum)
        Returns the row. If the provided row number is larger than the number of rows in the file, then an exception is thrown. Use fetchRow() if you have not checked the number of rows.
        Parameters:
        rowNum - the row number, must be 1 or more and no larger than getTotalNumberRows()
        Returns:
        the row
      • fetchRow

        public final java.util.Optional<RowGetterIfc> fetchRow​(long rowNum)
        Returns an Optional wrapping the row. The Optional is empty only if the provided row number is larger than the number of rows in the file.
        Parameters:
        rowNum - the row number, must be 1 or more
        Returns:
        the row wrapped in an Optional
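        The difference between fetchOneRow and fetchRow can be modeled with a small stand-in class. This is a sketch of the documented contract only: rows are plain strings rather than RowGetterIfc, and the exception type thrown for an out-of-range row is an assumption, since this page does not name it.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

final class RowFetchSketch {

    private final List<String> rows; // stand-in for the rows of a tabular file

    RowFetchSketch(List<String> rows) {
        this.rows = rows;
    }

    long getTotalNumberRows() {
        return rows.size();
    }

    // Returns an empty Optional when rowNum is past the last row (row numbers are 1-based).
    Optional<String> fetchRow(long rowNum) {
        if (rowNum < 1) {
            throw new IllegalArgumentException("rowNum must be 1 or more");
        }
        if (rowNum > rows.size()) {
            return Optional.empty();
        }
        return Optional.of(rows.get((int) (rowNum - 1)));
    }

    // Throws when rowNum is past the last row; the exception type here is an assumption.
    String fetchOneRow(long rowNum) {
        return fetchRow(rowNum).orElseThrow(
                () -> new IllegalArgumentException("row " + rowNum + " is beyond the end of the file"));
    }

    public static void main(String[] args) {
        RowFetchSketch file = new RowFetchSketch(Arrays.asList("a", "b", "c"));
        System.out.println(file.fetchRow(2).isPresent());  // true
        System.out.println(file.fetchRow(4).isPresent());  // false
        if (4 <= file.getTotalNumberRows()) {
            System.out.println(file.fetchOneRow(4));        // skipped: only 3 rows
        }
    }
}
```

        Callers that have already compared the row number against getTotalNumberRows() can use the throwing form; otherwise the Optional form avoids the need for a try/catch.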
      • convertRecordsToRows

        protected java.util.List<RowGetterIfc> convertRecordsToRows​(org.jooq.Result<org.jooq.Record> records,
                                                                    long startingRowNum)
      • convertRecordToRow

        protected RowGetterIfc convertRecordToRow​(org.jooq.Record record,
                                                  long rowNum)
      • getNumericColumns

        public final java.util.LinkedHashMap<java.lang.String,​java.lang.Double[]> getNumericColumns​(int maxRows)
        Parameters:
        maxRows - the total number of rows to extract starting at row 1
        Returns:
        a map of all of the data keyed by column name
      • getNumericColumns

        public final java.util.LinkedHashMap<java.lang.String,​java.lang.Double[]> getNumericColumns​(int maxRows,
                                                                                                          boolean removeMissing)
        Parameters:
        maxRows - the total number of rows to extract starting at row 1
        removeMissing - if true, then missing values (NaN) are removed
        Returns:
        a map of all of the data keyed by column name
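        The shape of the returned map, one array per column keyed by column name and limited to the first maxRows rows, can be sketched as follows. This is an illustration using plain arrays as rows, not the library's code; the method name toColumns and the sample data are hypothetical, and a null entry is assumed to mark a missing value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;

final class NumericColumnsSketch {

    // Transposes row-oriented data into a map of column arrays, keeping at most maxRows rows.
    // Each Double[] in rows is one row; a null entry marks a missing value (an assumption).
    static LinkedHashMap<String, Double[]> toColumns(List<String> names, List<Double[]> rows, int maxRows) {
        if (maxRows < 0) {
            throw new IllegalArgumentException("maxRows must not be negative");
        }
        int n = Math.min(maxRows, rows.size());
        LinkedHashMap<String, Double[]> columns = new LinkedHashMap<>();
        for (int c = 0; c < names.size(); c++) {
            Double[] column = new Double[n];
            for (int r = 0; r < n; r++) {
                column[r] = rows.get(r)[c];
            }
            columns.put(names.get(c), column);
        }
        return columns;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("x", "y");
        List<Double[]> rows = new ArrayList<>();
        rows.add(new Double[] {1.0, 10.0});
        rows.add(new Double[] {null, 20.0});
        rows.add(new Double[] {3.0, 30.0});

        LinkedHashMap<String, Double[]> columns = toColumns(names, rows, 2);
        System.out.println(Arrays.toString(columns.get("x"))); // [1.0, null]
        System.out.println(Arrays.toString(columns.get("y"))); // [10.0, 20.0]
    }
}
```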
      • getTextColumns

        public final java.util.LinkedHashMap<java.lang.String,​java.lang.String[]> getTextColumns​(int maxRows)
        Parameters:
        maxRows - the total number of rows to extract starting at row 1
        Returns:
        a map of all of the data keyed by column name
      • getTextColumns

        public final java.util.LinkedHashMap<java.lang.String,​java.lang.String[]> getTextColumns​(int maxRows,
                                                                                                       boolean removeMissing)
        Parameters:
        maxRows - the total number of rows to extract starting at row 1
        removeMissing - if true, then missing values (NaN) are removed
        Returns:
        a map of all of the data keyed by column name
      • getNumericColumn

        public final java.lang.Double[] getNumericColumn​(int colNum,
                                                         int maxRows)
        There may be memory issues if there are a lot of rows.
        Parameters:
        colNum - the column number to retrieve, must be in the range [0, getNumberColumns())
        maxRows - the total number of rows to extract starting at row 1
        Returns:
        the array of values, including any missing values marked as null
      • getNumericColumn

        public final java.lang.Double[] getNumericColumn​(java.lang.String columnName,
                                                         int maxRows)
        There may be memory issues if there are a lot of rows.
        Parameters:
        columnName - the name of the column to retrieve
        maxRows - the total number of rows to extract starting at row 1
        Returns:
        the array of values, including any missing values marked as null
      • getNumericColumn

        public final java.lang.Double[] getNumericColumn​(java.lang.String columnName,
                                                         int maxRows,
                                                         boolean removeMissing)
        There may be memory issues if there are a lot of rows.
        Parameters:
        columnName - the name of the column to retrieve
        maxRows - the total number of rows to extract starting at row 1
        removeMissing - if true, then missing values (NaN) are removed
        Returns:
        the array of values
      • getNumericColumn

        public final java.lang.Double[] getNumericColumn​(int colNum,
                                                         int maxRows,
                                                         boolean removeMissing)
        There may be memory issues if there are a lot of rows.
        Parameters:
        colNum - the column number to retrieve, must be in the range [0, getNumberColumns())
        maxRows - the total number of rows to extract starting at row 1
        removeMissing - if true, then missing values (NaN) are removed
        Returns:
        the array of values
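        The effect of removeMissing on a single column can be sketched with a small filter. The documentation above describes missing values both as null and as NaN, so this sketch drops both; whether the library treats them identically is an assumption. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class RemoveMissingSketch {

    // Returns a new array without missing entries; both null and NaN are treated as missing here.
    static Double[] removeMissing(Double[] values) {
        List<Double> kept = new ArrayList<>();
        for (Double v : values) {
            if (v != null && !v.isNaN()) {
                kept.add(v);
            }
        }
        return kept.toArray(new Double[0]);
    }

    public static void main(String[] args) {
        Double[] column = {1.0, null, Double.NaN, 4.0};
        System.out.println(Arrays.toString(removeMissing(column))); // [1.0, 4.0]
    }
}
```

        Note that removing missing entries shortens the array, so positions in the result no longer correspond to row numbers in the file.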
      • getTextColumn

        public final java.lang.String[] getTextColumn​(int colNum,
                                                      int maxRows)
        There may be memory issues if there are a lot of rows.
        Parameters:
        colNum - the column number to retrieve, must be in the range [0, getNumberColumns())
        maxRows - the total number of rows to extract starting at row 1
        Returns:
        the array of values, including any missing values marked as null
      • getTextColumn

        public final java.lang.String[] getTextColumn​(java.lang.String columnName,
                                                      int maxRows)
        There may be memory issues if there are a lot of rows.
        Parameters:
        columnName - the name of the column to retrieve
        maxRows - the total number of rows to extract starting at row 1
        Returns:
        the array of values, including any missing values marked as null
      • getTextColumn

        public final java.lang.String[] getTextColumn​(java.lang.String columnName,
                                                      int maxRows,
                                                      boolean removeMissing)
        There may be memory issues if there are a lot of rows.
        Parameters:
        columnName - the name of the column to retrieve
        maxRows - the total number of rows to extract starting at row 1
        removeMissing - if true, then missing values (NaN) are removed
        Returns:
        the array of values
      • getTextColumn

        public final java.lang.String[] getTextColumn​(int colNum,
                                                      int maxRows,
                                                      boolean removeMissing)
        There may be memory issues if there are a lot of rows.
        Parameters:
        colNum - the column number to retrieve, must be in the range [0, getNumberColumns())
        maxRows - the total number of rows to extract starting at row 1
        removeMissing - if true, then missing values (NaN) are removed
        Returns:
        the array of values
      • writeAsCSV

        public final void writeAsCSV​(java.io.PrintWriter out,
                                     boolean header)
        A very simple write to CSV. If you need something more complex, then iterate the rows yourself. This CSV does not apply quote characters to any elements.
        Parameters:
        out - the writer to write the data to; the writer is NOT closed
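        A minimal sketch of what an unquoted CSV write looks like is given below. It is not the library's implementation: the method name writeAsCsv, the row representation, and the assumption that the boolean flag controls whether a header line of column names is written are all illustrative. As in the method above, the writer is flushed but left open for the caller to close.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;

final class SimpleCsvSketch {

    // Writes rows as comma-separated lines with no quoting; the writer is flushed but NOT closed.
    static void writeAsCsv(PrintWriter out, List<String> columnNames,
                           List<List<Object>> rows, boolean writeHeader) {
        if (writeHeader) {
            out.println(String.join(",", columnNames));
        }
        for (List<Object> row : rows) {
            StringBuilder line = new StringBuilder();
            for (int i = 0; i < row.size(); i++) {
                if (i > 0) {
                    line.append(',');
                }
                line.append(row.get(i)); // no quote characters are applied
            }
            out.println(line);
        }
        out.flush();
    }

    public static void main(String[] args) {
        StringWriter buffer = new StringWriter();
        PrintWriter out = new PrintWriter(buffer);
        List<List<Object>> rows = Arrays.asList(
                Arrays.<Object>asList("a", 1.5),
                Arrays.<Object>asList("b", 2.0));
        writeAsCsv(out, Arrays.asList("name", "value"), rows, true);
        System.out.print(buffer);
        out.close(); // closing remains the caller's responsibility
    }
}
```

        Because nothing is quoted, a text value that itself contains a comma would shift the columns of its line; data of that kind calls for iterating the rows and applying a proper CSV writer.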
      • writeAsText

        public final void writeAsText​(java.io.PrintWriter out)
        Writes all of the rows. This is not optimized for large files and may have memory and performance issues.
      • writeAsText

        public final void writeAsText​(long minRow,
                                      java.io.PrintWriter out)
        Writes from the given row to the end of the file. This is not optimized for large files and may have memory and performance issues.
        Parameters:
        minRow - the row to start the printing
      • writeAsText

        public final void writeAsText​(long minRow,
                                      long maxRow,
                                      java.io.PrintWriter out)
        This is not optimized for large files and may have memory and performance issues.
        Parameters:
        minRow - the row to start the printing
        maxRow - the row to end the printing
      • printAsText

        public final void printAsText()
        Prints all of the rows. This is not optimized for large files and may have memory and performance issues.
      • printAsText

        public final void printAsText​(long minRow)
        Prints from the given row to the end of the file. This is not optimized for large files and may have memory and performance issues.
        Parameters:
        minRow - the row to start the printing
      • printAsText

        public final void printAsText​(long minRow,
                                      long maxRow)
        This is not optimized for large files and may have memory and performance issues.
        Parameters:
        minRow - the row to start the printing
        maxRow - the row to end the printing
      • writeToExcelWorkbook

        public final void writeToExcelWorkbook​(java.lang.String wbName,
                                               java.nio.file.Path wbDirectory)
                                        throws java.io.IOException
        This is not optimized for large files and may have memory and performance issues.
        Parameters:
        wbName - the name of the workbook, must not be null
        wbDirectory - the path to the directory to contain the workbook, must not be null
        Throws:
        java.io.IOException - if something goes wrong with the writing
      • asDatabase

        public final DatabaseIfc asDatabase()
                                     throws java.io.IOException
        Transforms the file into an SQLite database file.
        Returns:
        a reference to the database
        Throws:
        java.io.IOException - if something goes wrong