public class TabularFileNormalizer extends Object
| Modifier and Type | Field and Description |
|---|---|
| static String | NORMALIZED_END_OF_LINE |
| Constructor and Description |
|---|
| TabularFileNormalizer() |
| Modifier and Type | Method and Description |
|---|---|
| static int | normalizeFile(Path source, Path destination, Charset sourceCharset, char delimiterChar, String endOfLineSymbols, Character quoteChar) Normalizes the provided tabular "file" (provided as Reader to let the caller deal with charset). |
public static final String NORMALIZED_END_OF_LINE
public TabularFileNormalizer()
public static int normalizeFile(Path source, Path destination, Charset sourceCharset, char delimiterChar, String endOfLineSymbols, Character quoteChar) throws IOException
Normalizes the provided tabular "file" (provided as Reader to let the caller deal with charset).
Normalization includes: stripping of control characters (see CONTROL_CHAR_REGEX),
usage of \n as the end-of-line character, ensuring there is an end-of-line character on the last line, and
removing completely empty lines.
The normalized content will have unnecessary quotes removed.
Parameters:
source - Path representing the source
destination - Path representing the destination. If the file already exists it will be overwritten.
sourceCharset - optionally, the Charset of the source. If null UTF-8 will be used.
delimiterChar -
endOfLineSymbols -
quoteChar - optional
Throws:
IOException
Copyright © 2024 Global Biodiversity Information Facility (GBIF). All rights reserved.
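The line-level normalization rules described for normalizeFile (control characters stripped, \n as the only end-of-line character, a trailing end-of-line on the last line, completely empty lines removed) can be illustrated with a small standalone sketch. This is not the library's implementation: the class name NormalizationSketch, the method normalize, and the control-character pattern are assumptions made for illustration (the real CONTROL_CHAR_REGEX may differ), and delimiter and quote handling are left out entirely.

```java
import java.util.regex.Pattern;

// Hypothetical illustration of the documented normalization rules.
// Not part of the library; quote and delimiter handling are omitted.
final class NormalizationSketch {

    // Assumption: every control character except tab, CR and LF is stripped.
    // The library's actual CONTROL_CHAR_REGEX may be defined differently.
    private static final Pattern CONTROL_CHARS =
            Pattern.compile("[\\p{Cntrl}&&[^\\t\\r\\n]]");

    // Assumption: \r\n, \r and \n are all treated as line breaks.
    private static final Pattern LINE_BREAK = Pattern.compile("\\r\\n|\\r|\\n");

    static String normalize(String content) {
        String cleaned = CONTROL_CHARS.matcher(content).replaceAll("");
        StringBuilder out = new StringBuilder();
        for (String line : LINE_BREAK.split(cleaned)) {
            if (!line.isEmpty()) {
                // Keep the line and terminate it with \n, so the last
                // line always ends with an end-of-line character.
                out.append(line).append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String raw = "id\tname\r\n\r\n1\tAbies\u0000 alba";
        System.out.print(normalize(raw));
    }
}
```

With the real class, the equivalent call would follow the documented signature, for example TabularFileNormalizer.normalizeFile(source, destination, StandardCharsets.UTF_8, ',', "\n", '"'), where source and destination are Path values supplied by the caller.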