Package jslx.tabularfiles
Class TabularInputFile
java.lang.Object
    jslx.tabularfiles.TabularFile
        jslx.tabularfiles.TabularInputFile
All Implemented Interfaces:
java.lang.Iterable<RowGetterIfc>
public class TabularInputFile extends TabularFile implements java.lang.Iterable<RowGetterIfc>
An abstraction for reading rows of tabular data. Columns of the tabular data can be numeric or text. Using this subclass of TabularFile, users can read rows of data. The user is responsible for iterating over the rows, reading each column's data as the appropriate type, and bringing the row into their program. Use the static methods of TabularFile to create and define the columns of the file. Use the methods of this class to read rows.
- See Also:
TabularFile, For example code
-
-
Nested Class Summary
Modifier and Type Class Description
protected class TabularInputFile.BufferedRecordsIterator
A class that makes iterating over JOOQ records buffered and easier.
class TabularInputFile.RowIterator
-
Field Summary
Modifier and Type Field Description
static int DEFAULT_ROW_BUFFER_SIZE
-
Fields inherited from class jslx.tabularfiles.TabularFile
JOOQ_TYPE, myColumnNames, myColumnTypes, myDataTypes, myNameAndIndex, myNumericIndices, myPath, myTextIndices
-
-
Constructor Summary
Modifier Constructor Description
TabularInputFile(java.nio.file.Path pathToFile)
protected TabularInputFile(java.util.LinkedHashMap<java.lang.String,DataType> columnTypes, java.nio.file.Path pathToFile)
-
Method Summary
Modifier and Type Method Description
DatabaseIfc asDatabase()
Transforms the file into an SQLite database file.
protected java.util.List<RowGetterIfc> convertRecordsToRows(org.jooq.Result<org.jooq.Record> records, long startingRowNum)
protected RowGetterIfc convertRecordToRow(org.jooq.Record record, long rowNum)
RowGetterIfc fetchOneRow(long rowNum)
Returns the row.
java.util.Optional<RowGetterIfc> fetchRow(long rowNum)
Returns an optional wrapping the row.
java.util.List<RowGetterIfc> fetchRows(long minRowNum, long maxRowNum)
Returns the rows between minRowNum and maxRowNum, inclusive.
static java.util.LinkedHashMap<java.lang.String,DataType> getColumnTypes(java.nio.file.Path pathToFile)
Gets the metadata for an existing TabularInputFile.
java.lang.Double[] getNumericColumn(int colNum, int maxRows)
Memory issues are possible if there are a lot of rows.
java.lang.Double[] getNumericColumn(int colNum, int maxRows, boolean removeMissing)
Memory issues are possible if there are a lot of rows.
java.lang.Double[] getNumericColumn(java.lang.String columnName, int maxRows)
Memory issues are possible if there are a lot of rows.
java.lang.Double[] getNumericColumn(java.lang.String columnName, int maxRows, boolean removeMissing)
Memory issues are possible if there are a lot of rows.
java.util.LinkedHashMap<java.lang.String,java.lang.Double[]> getNumericColumns(int maxRows)
java.util.LinkedHashMap<java.lang.String,java.lang.Double[]> getNumericColumns(int maxRows, boolean removeMissing)
int getRowBufferSize()
java.lang.String[] getTextColumn(int colNum, int maxRows)
Memory issues are possible if there are a lot of rows.
java.lang.String[] getTextColumn(int colNum, int maxRows, boolean removeMissing)
Memory issues are possible if there are a lot of rows.
java.lang.String[] getTextColumn(java.lang.String columnName, int maxRows)
Memory issues are possible if there are a lot of rows.
java.lang.String[] getTextColumn(java.lang.String columnName, int maxRows, boolean removeMissing)
Memory issues are possible if there are a lot of rows.
java.util.LinkedHashMap<java.lang.String,java.lang.String[]> getTextColumns(int maxRows)
java.util.LinkedHashMap<java.lang.String,java.lang.String[]> getTextColumns(int maxRows, boolean removeMissing)
long getTotalNumberRows()
TabularInputFile.RowIterator iterator()
TabularInputFile.RowIterator iterator(long startingRow)
void printAsText()
Prints all of the rows.
void printAsText(long minRow)
Prints from the given row to the end of the file. This is not optimized for large files and may have memory and performance issues.
void printAsText(long minRow, long maxRow)
This is not optimized for large files and may have memory and performance issues.
void setRowBufferSize(int rowBufferSize)
void writeAsCSV(java.io.PrintWriter out, boolean header)
A very simple write to CSV.
void writeAsText(long minRow, long maxRow, java.io.PrintWriter out)
This is not optimized for large files and may have memory and performance issues.
void writeAsText(long minRow, java.io.PrintWriter out)
Writes from the given row to the end of the file.
void writeAsText(java.io.PrintWriter out)
Writes all of the rows.
void writeToExcelWorkbook(java.lang.String wbName, java.nio.file.Path wbDirectory)
This is not optimized for large files and may have memory and performance issues.
-
Methods inherited from class jslx.tabularfiles.TabularFile
asDouble, checkTypes, column, columnNames, columnNames, columns, columns, createAllNumeric, createAllNumeric, createAllNumeric, getColumn, getColumnName, getColumnNames, getColumnTypes, getDataType, getDataTypes, getNumberColumns, getNumericColumnNames, getNumNumericColumns, getNumTextColumns, getPath, getTextColumnNames, isAllNumeric, isAllText, isNumeric, isNumeric, isText, numericColumn, textColumn, toString
-
-
-
-
Field Detail
-
DEFAULT_ROW_BUFFER_SIZE
public static final int DEFAULT_ROW_BUFFER_SIZE
- See Also:
- Constant Field Values
-
-
Constructor Detail
-
TabularInputFile
public TabularInputFile(java.nio.file.Path pathToFile)
- Parameters:
pathToFile
- the path to a valid file that was written using TabularOutputFile
-
TabularInputFile
protected TabularInputFile(java.util.LinkedHashMap<java.lang.String,DataType> columnTypes, java.nio.file.Path pathToFile)
-
-
Method Detail
-
getColumnTypes
public static java.util.LinkedHashMap<java.lang.String,DataType> getColumnTypes(java.nio.file.Path pathToFile)
Gets the metadata for an existing TabularInputFile. The path must lead to a file that has the correct internal representation for tabular data files. Such a file can be created via TabularOutputFile.
- Parameters:
pathToFile
- the path to the input file, must not be null
- Returns:
- the metadata for the file: the column names and their data types
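Because the metadata comes back as a LinkedHashMap, the column names keep their insertion order, which presumably matches the column order of the file. The sketch below shows how such a map might be filtered by type. ColumnTypesSketch and its ColType enum are hypothetical stand-ins for illustration; the library's own DataType is not used here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ColumnTypesSketch {
    // Hypothetical stand-in for the library's DataType; not the real type.
    enum ColType { NUMERIC, TEXT }

    /** Returns the names of the columns with the wanted type, in map (column) order. */
    static List<String> namesOfType(LinkedHashMap<String, ColType> columnTypes, ColType wanted) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, ColType> entry : columnTypes.entrySet()) {
            if (entry.getValue() == wanted) {
                names.add(entry.getKey());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        LinkedHashMap<String, ColType> types = new LinkedHashMap<>();
        types.put("id", ColType.NUMERIC);
        types.put("name", ColType.TEXT);
        types.put("weight", ColType.NUMERIC);
        System.out.println(namesOfType(types, ColType.NUMERIC));
    }
}
```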
-
getRowBufferSize
public final int getRowBufferSize()
- Returns:
- the current row buffer size
-
setRowBufferSize
public void setRowBufferSize(int rowBufferSize)
- Parameters:
rowBufferSize
- must be at least 1; a bigger buffer uses more memory
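The trade-off behind the row buffer size can be illustrated with a small self-contained sketch: a larger buffer means fewer loads from the backing store but more rows held in memory at once. BufferedRowSketch is a hypothetical stand-in that iterates plain row numbers; it is not the library's RowIterator or BufferedRecordsIterator, whose internals are not described here.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical sketch: iterates row numbers startingRow..totalRows, loading them in chunks. */
final class BufferedRowSketch implements Iterator<Long> {
    private final long totalRows;
    private final int bufferSize;
    private final List<Long> buffer = new ArrayList<>();
    private int posInBuffer = 0;
    private long nextRowToLoad;
    private int loads = 0; // how many chunk loads were needed

    BufferedRowSketch(long totalRows, int bufferSize, long startingRow) {
        if (bufferSize < 1) throw new IllegalArgumentException("buffer size must be at least 1");
        if (startingRow < 1) throw new IllegalArgumentException("starting row must be 1 or more");
        this.totalRows = totalRows;
        this.bufferSize = bufferSize;
        this.nextRowToLoad = startingRow;
    }

    // Replaces the buffer contents with the next chunk of at most bufferSize rows.
    private void fill() {
        buffer.clear();
        posInBuffer = 0;
        long last = Math.min(totalRows, nextRowToLoad + bufferSize - 1);
        for (long r = nextRowToLoad; r <= last; r++) buffer.add(r);
        nextRowToLoad = last + 1;
        loads++;
    }

    @Override
    public boolean hasNext() {
        return posInBuffer < buffer.size() || nextRowToLoad <= totalRows;
    }

    @Override
    public Long next() {
        if (!hasNext()) throw new NoSuchElementException();
        if (posInBuffer >= buffer.size()) fill();
        return buffer.get(posInBuffer++);
    }

    int loadCount() { return loads; }

    /** Reads every row from row 1 and reports how many chunk loads that took. */
    static int loadsToReadAll(long totalRows, int bufferSize) {
        BufferedRowSketch it = new BufferedRowSketch(totalRows, bufferSize, 1);
        while (it.hasNext()) it.next();
        return it.loadCount();
    }
}
```

In this sketch, reading 10 rows takes 10 loads with a buffer of 1 and a single load with a buffer of 100, at the cost of holding more rows in memory per load.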
-
iterator
public TabularInputFile.RowIterator iterator()
- Specified by:
iterator
in interface java.lang.Iterable<RowGetterIfc>
-
iterator
public TabularInputFile.RowIterator iterator(long startingRow)
- Parameters:
startingRow
- the starting row for the iteration
- Returns:
- an iterator for moving through the rows
-
getTotalNumberRows
public final long getTotalNumberRows()
- Returns:
- the total number of rows in the tabular file
-
fetchRows
public final java.util.List<RowGetterIfc> fetchRows(long minRowNum, long maxRowNum)
Returns the rows between minRowNum and maxRowNum, inclusive. This method may use a lot of memory for large ranges, so use it with care; prefer the provided iterator where possible.
- Parameters:
minRowNum
- the minimum row number, must be less than maxRowNum, and 1 or bigger
maxRowNum
- the maximum row number, must be greater than minRowNum, and 2 or bigger
- Returns:
- the list of rows; the list may be empty if there are no rows in the row number range
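The inclusive, 1-based range semantics described above can be sketched over an in-memory list. RowRangeSketch is a hypothetical stand-in with rows modelled as strings. The exception type and the clamping of maxRowNum to the available rows are assumptions of this sketch, not documented behavior of the library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class RowRangeSketch {
    /** Returns rows minRowNum..maxRowNum (1-based, inclusive); empty if the range has no rows. */
    static List<String> fetchRows(List<String> rows, long minRowNum, long maxRowNum) {
        if (minRowNum < 1) {
            throw new IllegalArgumentException("minRowNum must be 1 or bigger");
        }
        if (maxRowNum <= minRowNum) {
            throw new IllegalArgumentException("maxRowNum must be greater than minRowNum");
        }
        if (minRowNum > rows.size()) {
            return new ArrayList<>(); // no rows in the requested range
        }
        int from = (int) (minRowNum - 1);                // convert 1-based row to 0-based index
        int to = (int) Math.min(maxRowNum, rows.size()); // exclusive end index, clamped (assumption)
        return new ArrayList<>(rows.subList(from, to));
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("r1", "r2", "r3", "r4", "r5");
        System.out.println(fetchRows(rows, 2, 4));
    }
}
```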
-
fetchOneRow
public final RowGetterIfc fetchOneRow(long rowNum)
Returns the row. If the provided row number is larger than the number of rows in the file, then an exception is thrown. Use fetchRow() if you do not check the number of rows.
- Parameters:
rowNum
- the row number, must be 1 or more and no greater than getTotalNumberRows()
- Returns:
- the row
-
fetchRow
public final java.util.Optional<RowGetterIfc> fetchRow(long rowNum)
Returns an optional wrapping the row. The optional will only be empty if the provided row number is larger than the number of rows in the file.
- Parameters:
rowNum
- the row number, must be 1 or more
- Returns:
- the row wrapped in an Optional
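The difference between the throwing lookup and the Optional-returning lookup can be sketched with a small stand-in. FetchRowSketch is hypothetical, its rows are plain strings rather than RowGetterIfc instances, and the exception type it throws is an assumption, since the documentation above does not name one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

final class FetchRowSketch {
    private final List<String> rows; // hypothetical stand-in for the stored rows

    FetchRowSketch(List<String> rows) {
        this.rows = new ArrayList<>(rows);
    }

    /** Optional-style lookup: empty only when rowNum is past the last row. */
    Optional<String> fetchRow(long rowNum) {
        if (rowNum < 1) {
            throw new IllegalArgumentException("row number must be 1 or more");
        }
        if (rowNum > rows.size()) {
            return Optional.empty();
        }
        return Optional.of(rows.get((int) (rowNum - 1)));
    }

    /** Throwing lookup: the caller is expected to have checked the row count first. */
    String fetchOneRow(long rowNum) {
        return fetchRow(rowNum).orElseThrow(
                () -> new IllegalArgumentException("no row with number " + rowNum));
    }

    public static void main(String[] args) {
        FetchRowSketch file = new FetchRowSketch(Arrays.asList("a", "b", "c"));
        System.out.println(file.fetchRow(3).isPresent());
        System.out.println(file.fetchRow(4).isPresent());
    }
}
```

The Optional form suits callers that do not know the row count in advance; the throwing form keeps call sites shorter when the count has already been checked.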
-
convertRecordsToRows
protected java.util.List<RowGetterIfc> convertRecordsToRows(org.jooq.Result<org.jooq.Record> records, long startingRowNum)
-
convertRecordToRow
protected RowGetterIfc convertRecordToRow(org.jooq.Record record, long rowNum)
-
getNumericColumns
public final java.util.LinkedHashMap<java.lang.String,java.lang.Double[]> getNumericColumns(int maxRows)
- Parameters:
maxRows
- the total number of rows to extract starting at row 1
- Returns:
- a map of all of the data keyed by column name
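The shape of the returned map, one array per column keyed by column name, can be sketched by transposing row-oriented data. ColumnExtractSketch is a hypothetical stand-in. Capping maxRows at the number of available rows is an assumption of this sketch, not documented behavior.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;

final class ColumnExtractSketch {
    /** Builds a column-name-to-values map from the first maxRows rows. */
    static LinkedHashMap<String, Double[]> toColumns(List<String> names, List<double[]> rows, int maxRows) {
        int n = Math.min(maxRows, rows.size()); // assumption: never read past the last row
        LinkedHashMap<String, Double[]> columns = new LinkedHashMap<>();
        for (int c = 0; c < names.size(); c++) {
            Double[] values = new Double[n];
            for (int r = 0; r < n; r++) {
                values[r] = rows.get(r)[c];
            }
            columns.put(names.get(c), values); // insertion order keeps column order
        }
        return columns;
    }

    public static void main(String[] args) {
        List<double[]> rows = Arrays.asList(new double[]{1, 2}, new double[]{3, 4}, new double[]{5, 6});
        LinkedHashMap<String, Double[]> cols = toColumns(Arrays.asList("x", "y"), rows, 2);
        System.out.println(Arrays.toString(cols.get("x")));
    }
}
```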
-
getNumericColumns
public final java.util.LinkedHashMap<java.lang.String,java.lang.Double[]> getNumericColumns(int maxRows, boolean removeMissing)
- Parameters:
maxRows
- the total number of rows to extract starting at row 1
removeMissing
- if true, then missing (NaN values) are removed
- Returns:
- a map of all of the data keyed by column name
-
getTextColumns
public final java.util.LinkedHashMap<java.lang.String,java.lang.String[]> getTextColumns(int maxRows)
- Parameters:
maxRows
- the total number of rows to extract starting at row 1
- Returns:
- a map of all of the data keyed by column name
-
getTextColumns
public final java.util.LinkedHashMap<java.lang.String,java.lang.String[]> getTextColumns(int maxRows, boolean removeMissing)
- Parameters:
maxRows
- the total number of rows to extract starting at row 1
removeMissing
- if true, then missing (NaN values) are removed
- Returns:
- a map of all of the data keyed by column name
-
getNumericColumn
public final java.lang.Double[] getNumericColumn(int colNum, int maxRows)
Memory issues are possible if there are a lot of rows.
- Parameters:
colNum
- the column number to retrieve, must be in the range [0, getNumberColumns())
maxRows
- the total number of rows to extract starting at row 1
- Returns:
- the array of values, including any missing values marked as null
-
getNumericColumn
public final java.lang.Double[] getNumericColumn(java.lang.String columnName, int maxRows)
Memory issues are possible if there are a lot of rows.
- Parameters:
columnName
- the column name to retrieve, must be the name of a column in the file
maxRows
- the total number of rows to extract starting at row 1
- Returns:
- the array of values, including any missing values marked as null
-
getNumericColumn
public final java.lang.Double[] getNumericColumn(java.lang.String columnName, int maxRows, boolean removeMissing)
Memory issues are possible if there are a lot of rows.
- Parameters:
columnName
- the column name to retrieve, must be the name of a column in the file
maxRows
- the total number of rows to extract starting at row 1
removeMissing
- if true, then missing (NaN values) are removed
- Returns:
- the array of values
-
getNumericColumn
public final java.lang.Double[] getNumericColumn(int colNum, int maxRows, boolean removeMissing)
Memory issues are possible if there are a lot of rows.
- Parameters:
colNum
- the column number to retrieve, must be in the range [0, getNumberColumns())
maxRows
- the total number of rows to extract starting at row 1
removeMissing
- if true, then missing (NaN values) are removed
- Returns:
- the array of values
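What removing missing values from a Double[] might look like is sketched below. MissingValueSketch is a hypothetical helper, not the library's implementation. It treats both null and NaN entries as missing, since the documentation above mentions both markers; whether the library drops both is an assumption.

```java
import java.util.Arrays;

final class MissingValueSketch {
    /** Returns a new array without null or NaN entries; the input is left untouched. */
    static Double[] removeMissing(Double[] column) {
        return Arrays.stream(column)
                .filter(v -> v != null && !v.isNaN())
                .toArray(Double[]::new);
    }

    public static void main(String[] args) {
        Double[] raw = {1.0, null, Double.NaN, 4.0};
        System.out.println(Arrays.toString(removeMissing(raw)));
    }
}
```

Note that the filtered array is shorter than the input, so positions in it no longer correspond to row numbers.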
-
getTextColumn
public final java.lang.String[] getTextColumn(int colNum, int maxRows)
Memory issues are possible if there are a lot of rows.
- Parameters:
colNum
- the column number to retrieve, must be in the range [0, getNumberColumns())
maxRows
- the total number of rows to extract starting at row 1
- Returns:
- the array of values, including any missing values marked as null
-
getTextColumn
public final java.lang.String[] getTextColumn(java.lang.String columnName, int maxRows)
Memory issues are possible if there are a lot of rows.
- Parameters:
columnName
- the column name to retrieve, must be the name of a column in the file
maxRows
- the total number of rows to extract starting at row 1
- Returns:
- the array of values, including any missing values marked as null
-
getTextColumn
public final java.lang.String[] getTextColumn(java.lang.String columnName, int maxRows, boolean removeMissing)
Memory issues are possible if there are a lot of rows.
- Parameters:
columnName
- the column name to retrieve, must be the name of a column in the file
maxRows
- the total number of rows to extract starting at row 1
removeMissing
- if true, then missing (NaN values) are removed
- Returns:
- the array of values
-
getTextColumn
public final java.lang.String[] getTextColumn(int colNum, int maxRows, boolean removeMissing)
Memory issues are possible if there are a lot of rows.
- Parameters:
colNum
- the column number to retrieve, must be in the range [0, getNumberColumns())
maxRows
- the total number of rows to extract starting at row 1
removeMissing
- if true, then missing (NaN values) are removed
- Returns:
- the array of values
-
writeAsCSV
public final void writeAsCSV(java.io.PrintWriter out, boolean header)
A very simple write to CSV. If you need something more complex, then iterate the rows yourself. This CSV does not apply quote characters to any elements.
- Parameters:
out
- the writer to write the data to; the writer is NOT closed
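The consequence of writing CSV without quote characters can be seen in a short sketch: an element that itself contains a comma ends up looking like two fields. SimpleCsvSketch is a hypothetical stand-in with rows modelled as lists of strings. Flushing the writer is a choice made in this sketch; leaving it open matches the note above.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;

final class SimpleCsvSketch {
    /** Writes rows as comma-separated lines with no quoting; the writer is flushed, not closed. */
    static void writeCsv(PrintWriter out, List<String> header, List<List<String>> rows, boolean includeHeader) {
        if (includeHeader) {
            out.println(String.join(",", header));
        }
        for (List<String> row : rows) {
            out.println(String.join(",", row));
        }
        out.flush(); // the caller remains responsible for closing the writer
    }

    /** Writes a tiny table into a string so the unquoted output can be inspected. */
    static String demo() {
        StringWriter buffer = new StringWriter();
        PrintWriter out = new PrintWriter(buffer);
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("1", "Ada"),
                Arrays.asList("2", "Smith, J")); // embedded comma is written as-is
        writeCsv(out, Arrays.asList("id", "name"), rows, true);
        return buffer.toString();
    }

    public static void main(String[] args) {
        System.out.print(demo());
    }
}
```

If the data may contain commas, quotes, or line breaks, iterating the rows and writing them with a CSV library that handles quoting is the safer route.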
-
writeAsText
public final void writeAsText(java.io.PrintWriter out)
Writes all of the rows. This is not optimized for large files and may have memory and performance issues.
-
writeAsText
public final void writeAsText(long minRow, java.io.PrintWriter out)
Writes from the given row to the end of the file. This is not optimized for large files and may have memory and performance issues.
- Parameters:
minRow
- the row at which to start writing
-
writeAsText
public final void writeAsText(long minRow, long maxRow, java.io.PrintWriter out)
This is not optimized for large files and may have memory and performance issues.
- Parameters:
minRow
- the row at which to start writing
maxRow
- the row at which to end writing
-
printAsText
public final void printAsText()
Prints all of the rows. This is not optimized for large files and may have memory and performance issues.
-
printAsText
public final void printAsText(long minRow)
Prints from the given row to the end of the file. This is not optimized for large files and may have memory and performance issues.
- Parameters:
minRow
- the row at which to start printing
-
printAsText
public final void printAsText(long minRow, long maxRow)
This is not optimized for large files and may have memory and performance issues.
- Parameters:
minRow
- the row at which to start printing
maxRow
- the row at which to end printing
-
writeToExcelWorkbook
public final void writeToExcelWorkbook(java.lang.String wbName, java.nio.file.Path wbDirectory) throws java.io.IOException
This is not optimized for large files and may have memory and performance issues.
- Parameters:
wbName
- the name of the workbook, must not be null
wbDirectory
- the path to the directory to contain the workbook, must not be null
- Throws:
java.io.IOException
- if something goes wrong with the writing
-
asDatabase
public final DatabaseIfc asDatabase() throws java.io.IOException
Transforms the file into an SQLite database file.
- Returns:
- a reference to the database
- Throws:
java.io.IOException
- if something goes wrong
-
-