public class TabularFileNormalizer extends Object
Modifier and Type | Field and Description |
---|---|
static String | NORMALIZED_END_OF_LINE |
Constructor and Description |
---|
TabularFileNormalizer() |
Modifier and Type | Method and Description |
---|---|
static int | normalizeFile(Path source, Path destination, Charset sourceCharset, char delimiterChar, String endOfLineSymbols, Character quoteChar) Normalizes the provided tabular "file", read from source using sourceCharset and written to destination. |
public static final String NORMALIZED_END_OF_LINE
public TabularFileNormalizer()
public static int normalizeFile(Path source, Path destination, Charset sourceCharset, char delimiterChar, String endOfLineSymbols, Character quoteChar) throws IOException
Normalizes the provided tabular "file", read from source using sourceCharset and written to destination.
Normalization includes: stripping control characters (see CONTROL_CHAR_REGEX), using \n as the end-of-line character, ensuring the last line ends with an end-of-line character, and removing completely empty lines.
The normalized content will have unnecessary quotes removed.
Parameters:
source - Path representing the source
destination - Path representing the destination. If the file already exists it will be overwritten.
sourceCharset - optionally, the Charset of the source. If null, UTF-8 will be used.
delimiterChar -
endOfLineSymbols -
quoteChar - optional
Throws:
IOException
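The line-level rules listed for normalizeFile can be illustrated with a small, self-contained sketch. This is not the library's implementation: the class NormalizationSketch and its normalize method are hypothetical, the control-character pattern is an assumed stand-in for CONTROL_CHAR_REGEX (whose actual value is not shown on this page), and quote removal is left out entirely.

```java
import java.util.regex.Pattern;

// Hypothetical illustration of the documented normalization rules; not GBIF code.
class NormalizationSketch {

    // Assumption: stand-in for CONTROL_CHAR_REGEX. Matches control characters
    // except tab, line feed and carriage return, so tab delimiters survive.
    private static final Pattern CONTROL_CHARS =
            Pattern.compile("[\\p{Cntrl}&&[^\\t\\n\\r]]");

    // Assumption: value taken from the "\n as end-of-line character" rule.
    static final String NORMALIZED_END_OF_LINE = "\n";

    // Splits raw on the source's end-of-line sequence and rebuilds the content
    // with one "\n" after every non-empty line, including the last one.
    static String normalize(String raw, String endOfLineSymbols) {
        StringBuilder out = new StringBuilder();
        // Pattern.quote treats the end-of-line sequence literally; the -1 limit
        // keeps trailing empty strings so they are filtered explicitly below.
        for (String line : raw.split(Pattern.quote(endOfLineSymbols), -1)) {
            String cleaned = CONTROL_CHARS.matcher(line).replaceAll("");
            // In this sketch a line is dropped if nothing remains after stripping.
            if (cleaned.isEmpty()) {
                continue;
            }
            out.append(cleaned).append(NORMALIZED_END_OF_LINE);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String raw = "id\tname\r\n\r\n1\tPuma\u0000 concolor";
        System.out.print(normalize(raw, "\r\n"));
    }
}
```

Going by the signature above, a call to the documented method itself would take the form TabularFileNormalizer.normalizeFile(source, destination, StandardCharsets.UTF_8, '\t', "\n", null), assuming that a null quoteChar is what "optional" permits. The meaning of the returned int is not documented on this page.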
Copyright © 2024 Global Biodiversity Information Facility (GBIF). All rights reserved.