The Watch a Folder for New CSV Files Input Connector can be used to read and adapt event data, formatted as delimited text, from a system file. The text delimiter is usually a comma, so this type of input file is sometimes referred to as a comma-separated values (CSV) data file, but ArcGIS GeoEvent Server can use any normal ASCII character as a delimiter to separate data attribute values.
Oftentimes, data values are simple. Commas are used to separate, or delimit, individual attribute values and literal string values are enclosed in double-quotes as illustrated below.
Sometimes a data file includes, for example, JSON string representations of geometry values. In that case, using a delimiter other than a comma avoids ambiguity when double quotes or commas are embedded within an attribute value. The use of semicolon characters as delimiters is illustrated below.
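The difference between the two delimiter choices can be sketched with Python's standard `csv` module. This is only an illustration of the parsing problem, not how GeoEvent Server reads files; the sample records, field order, and the `split_record` helper are hypothetical.

```python
import csv
import io
import json

# Hypothetical sample records; the field layout is an assumption for illustration.
comma_line = 'FL-1234,"2024-05-01T12:00:00Z",37.5,-122.1\n'
semicolon_line = (
    'FL-1234;2024-05-01T12:00:00Z;'
    '{"x":-122.1,"y":37.5,"spatialReference":{"wkid":4326}}\n'
)


def split_record(line, delimiter, quoting=csv.QUOTE_MINIMAL):
    """Split one delimited line of text into its attribute values."""
    reader = csv.reader(io.StringIO(line), delimiter=delimiter, quoting=quoting)
    return next(reader)


# Simple values: commas delimit attributes, double quotes enclose string literals.
simple = split_record(comma_line, ",")

# JSON geometry: a semicolon delimiter keeps the embedded commas and quotes intact.
with_geometry = split_record(semicolon_line, ";", quoting=csv.QUOTE_NONE)
geometry = json.loads(with_geometry[2])

# Splitting the same line on commas cuts the JSON string into unusable fragments.
naive = split_record(semicolon_line, ",", quoting=csv.QUOTE_NONE)

print(simple)
print(geometry)
print(naive)
```

With the semicolon delimiter the geometry arrives as one intact attribute value that can be parsed as JSON; with a comma delimiter the same line is cut in the middle of the JSON string.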
Usage notes
- Use this input connector to read data, formatted as delimited text, from a system file and adapt it to create event data records for processing.
- This input connector pairs the Text Inbound Adapter with the File Inbound Transport.
- The input connector watches the specified system folder and will read an entire file as soon as the file appears in the folder.
- The entire file’s content will be reread if changes are made to the file and are saved.
- All files in a watched folder will be reread, from the beginning of the files, if:
  - The input connector’s properties are updated and saved.
  - The input connector is stopped and restarted (or the ArcGIS GeoEvent Server service is restarted).
- Delimited text does not have to contain data which represents a geometry.
- The adapter supports the ability to construct a point geometry from X, Y, and Z attribute values.
- The registered server folder Input Folder Data Store can be specified using either an absolute path or UNC path. If a UNC path is used, the Windows service account running GeoEvent Server needs read/write permission to the folder.
- As a best practice, use absolute paths, for example C:\GeoEvent\input, for the Input Folder Data Store property.
- The Input Directory allows a subfolder relative to the registered server folder to be specified.
- Include Subfolders allows you to specify whether folders beneath the Input Folder Data Store should be searched recursively. Oftentimes, organizing data with different schemas into different folders, and changing Include Subfolders from its default to disable recursive search, allows a more direct and simpler configuration of this input connector.
- When a data file has one or more header lines (for example, field names or attribute data types) which are not data values, specify the Number of Lines to Skip from Start of File. When a data file is particularly large, reduce the Max Number of Lines per Batch to help manage data retrieval by limiting how many lines are read from the file at a time. You can also set the Batch Flush Interval to specify how many milliseconds to wait before the next batch of lines is retrieved from the file.
- A Message Separator and an Attribute Separator are required to parse delimited text. The Message Separator indicates the character which identifies the end of a data record; the default is \n (newline). The Attribute Separator specifies the character used to separate one attribute value from another in a single line of text. The illustrations above show data that uses different characters as attribute separators. Each illustration, however, assumes that a newline is the natural message separator.
A single data file can contain different types of data, for example, Light Truck vs. Tractor Trailer. If different lines of text represent event data from different types of sensors or assets, then the first attribute value of each line of text must identify the type of event record. The property Incoming Data Contains GeoEvent Definition specifies whether the connector should use the first attribute value as the name of the GeoEvent Definition, which specifies the data type and number of attribute values that follow. This is often a source of confusion: when this property is set to Yes (default), the dependent property Create Unrecognized Event Definitions is set to No (default), and event data like that illustrated above is provided, no event records are created for processing. The reason is that the first attribute of the illustrated event data is not the name of a GeoEvent Definition; it is an asset's unique name or identifier, and it is unlikely that GeoEvent Definitions exist whose names match the unique identifiers of every asset being monitored.
Consider the expected behavior if an input was configured with the default Incoming Data Contains GeoEvent Definition set to Yes and the Create Unrecognized Event Definitions property was changed to Yes. A new GeoEvent Definition would be created for every named asset or sensor. This is not likely the result you would want, especially if the data contains hundreds, or thousands, of unique asset names. To prevent this from happening, review the data, and if each line does not start with the name of a GeoEvent Definition, change the Incoming Data Contains GeoEvent Definition property value to No.
- Network latency can adversely impact the ability of GeoEvent Server to retrieve high volumes of event data.
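
The interaction between Incoming Data Contains GeoEvent Definition and Create Unrecognized Event Definitions, described in the usage notes above, can be modeled with a short sketch. This is a simplified illustration of the documented behavior, not GeoEvent Server's implementation; the function `resolve_definition`, its parameters, and the sample lines are all hypothetical.

```python
def resolve_definition(line, known_definitions, *, contains_definition=True,
                       create_unrecognized=False, fallback_definition=None,
                       separator=","):
    """Return the GeoEvent Definition name to use for a line of delimited text.

    Returns None when the line would produce no event record.
    """
    if not contains_definition:
        # Every line is adapted using one definition chosen in the configuration.
        return fallback_definition

    # The first attribute value is treated as the name of a GeoEvent Definition.
    name = line.split(separator, 1)[0].strip()
    if name in known_definitions:
        return name
    if create_unrecognized:
        # One new definition is registered per unrecognized first value.
        known_definitions.add(name)
        return name
    return None  # No matching definition: no event record is created.


# Hypothetical data: the first attribute is an asset identifier, not a definition name.
lines = ["FL-1234,37.5,-122.1", "FL-5678,38.0,-121.9"]

defaults = [resolve_definition(l, {"LightTruck"}) for l in lines]

definitions = {"LightTruck"}
created = [resolve_definition(l, definitions, create_unrecognized=True) for l in lines]

fixed = [resolve_definition(l, {"LightTruck"}, contains_definition=False,
                            fallback_definition="LightTruck") for l in lines]

print(defaults)     # both lines dropped under the default settings
print(definitions)  # one definition per asset identifier has been added
print(fixed)        # every line uses the single configured definition
```

Under the defaults both lines are dropped, and enabling creation adds one definition per asset identifier. Treating the first value as ordinary data, the equivalent of setting Incoming Data Contains GeoEvent Definition to No, adapts every line with the single configured definition.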
Parameters
Parameter | Description |
---|---|
Name | A descriptive name for the input connector used for reference in GeoEvent Manager. |
Input Folder Data Store | The registered system folder beneath which files will be found. |
Input Directory | A subfolder directly beneath the registered system folder. Input Directory should be left blank if a subfolder beneath the registered system folder does not exist. |
Input File Filter | A regular expression pattern used to identify files appropriate for this input to ingest and adapt to create event data records for processing. The default is .*\.csv which matches any filename (.*) ending with the literal suffix (.csv). While this parameter is not required and can be left blank, it is recommended you specify a pattern which matches the file name of any file whose schema matches the GeoEvent Definition this input has been configured to use and exclude files (by name) which you do not want the input to ingest. |
Is the File Text | Specifies whether the file is text based or in a binary format. The default is Yes. |
Max Number of Lines per Batch (Conditional) | The maximum number of lines to read from the file in each batch or interval. The default is 1000 lines. Reduce this value if each event record contains many attributes to limit the amount of data sent to the Text Adapter as a batch. Parameter is shown when Is the File Text is set to Yes and is hidden when set to No. |
Batch Flush Interval (milliseconds) (Conditional) | The number of milliseconds to wait before reading another batch of lines from the file. The default is 500. Reduce this value if file size is expected to be very large and/or additional time is necessary to process each batch of lines retrieved from a file. Parameter is shown when Is the File Text is set to Yes and is hidden when set to No. |
Number of Lines to Skip from Start of File (Conditional) | The number of lines to skip from the start of the file. The default is 0. Increase this value if you want to skip a specific number of lines, for example header lines specifying attribute field names or data types, because they do not contain actual data for processing. Parameter is shown when Is the File Text is set to Yes and is hidden when set to No. |
Default Spatial Reference | The well-known ID (WKID) of a spatial reference to be used when a geometry is constructed from attribute field values whose coordinates are not latitude and longitude values for an assumed WGS84 geographic coordinate system, or geometry strings are received that do not include a spatial reference. A well-known text (WKT) value or the name of an attribute field containing the WKID or WKT may also be specified. |
Message Separator | A single literal character which indicates the end of an event data record. Unicode values may be used to specify a character delimiter. The character should not be enclosed in quotes. A newline (\n) is a common end-of-record delimiter. |
Attribute Separator | A single literal character used to separate one attribute value from another in a message. Unicode values may be used to specify a character delimiter. The character should not be enclosed in quotes. A comma (,) is a common attribute delimiter. |
Incoming Data Contains GeoEvent Definition | Specifies whether the first attribute value of each delimited line of text should be used as the name of a GeoEvent Definition. For more information, see the usage notes above. |
Create Unrecognized Event Definitions (Conditional) | Specifies whether a new GeoEvent Definition should be created when one with the specified name does not exist. When a delimited text file includes event records from different types of sensors, the first attribute value is used to specify the type of event and this attribute value is used as the GeoEvent Definition name. Parameter is shown when Incoming Data Contains GeoEvent Definition is set to Yes and is hidden when set to No. |
Create GeoEvent Definition (Conditional) | Specifies whether a new or existing GeoEvent Definition should be used for the inbound event data. A GeoEvent Definition is required for GeoEvent Server to understand the inbound event data attribute fields and data types. Parameter is shown when Incoming Data Contains GeoEvent Definition is set to No and is hidden when set to Yes. |
GeoEvent Definition Name (New) (Conditional) | The name assigned to a new GeoEvent Definition. If a GeoEvent Definition with the specified name already exists, the existing GeoEvent Definition will be used. Otherwise, the first data record received will be used to determine the expected schema of subsequent data records, and a new GeoEvent Definition will be created based on that first data record's schema. Parameter is shown when Create GeoEvent Definition is set to Yes and is hidden when set to No. |
GeoEvent Definition Name (Existing) (Conditional) | The name of an existing GeoEvent Definition to use when adapting received data to create event data for processing by a GeoEvent Service. Parameter is shown when Create GeoEvent Definition is set to No and is hidden when set to Yes. |
Construct Geometry from Fields | Specifies whether the input connector should construct a point geometry using coordinate values received as attributes. The default is No. |
X Geometry Field (Conditional) | The attribute field in the inbound event data containing the X coordinate part (for example horizontal or longitude) of a point location. Parameter is shown when Construct Geometry from Fields is set to Yes and is hidden when set to No. |
Y Geometry Field (Conditional) | The attribute field in the inbound event data containing the Y coordinate part (for example vertical or latitude) of a point location. Parameter is shown when Construct Geometry from Fields is set to Yes and is hidden when set to No. |
Z Geometry Field (Conditional) | The name of the field in the inbound event data containing the Z coordinate part (for example depth or altitude) of a point location. If left blank, the Z value will be omitted and a 2D point geometry will be constructed. Parameter is shown when Construct Geometry from Fields is set to Yes and is hidden when set to No. |
Expected Date Format | The pattern used to match expected string representations of date/time values and convert them to Java Date values. The pattern's format follows the Java SimpleDateFormat class convention. This parameter has no default value. While GeoEvent Server prefers date/time values to be expressed in the ISO 8601 standard, several string representations of date/time values commonly recognized as date values can be converted to Java Date values without specifying an Expected Date Format pattern. These include: If the date/time values received are expressed using a convention other than one of the five shown above, you will have to specify an Expected Date Format so GeoEvent Server knows how the date/time values should be adapted. |
Language for Number Formatting | The locale identifier (ID) used for locale-sensitive behavior when formatting numbers from data values. The default is the locale of the machine GeoEvent Server is installed on. For more information, see Java Supported Locales. |
Include Subfolders | Specifies whether subfolders beneath the Input Folder Data Store and Input Directory (optional) are searched for files. The default is Yes; however, organizing data with different schemas into different folders and changing this parameter to No, to disable recursive search, allows a simpler configuration. |
Delete Files After Processing | Specifies whether the files in the registered system folder will be deleted after their content has been processed. Note that even if a file's content cannot be adapted, no event records are created, and no real-time event processing occurs, the inbound transport will still delete a file whose contents were successfully read. The default is No. Files not deleted will be reread, from the beginning of the file, if the input connector's properties are changed and saved or if the input is stopped and restarted, for example, if the ArcGIS GeoEvent Server service is restarted. |
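
An Input File Filter pattern such as the default `.*\.csv` can be tried against sample file names before the connector is configured. The sketch below uses Python's `re` module and assumes the pattern must match the whole file name, which is an assumption about the connector's behavior rather than something stated above. The helper `matching_files` and the file names are hypothetical. GeoEvent Server evaluates the pattern with Java's regular expression engine, which treats this simple pattern the same way.

```python
import re

DEFAULT_FILTER = r".*\.csv"  # any file name ending with the literal suffix .csv


def matching_files(filenames, pattern=DEFAULT_FILTER):
    """Return the file names that the pattern matches in full."""
    regex = re.compile(pattern)
    return [name for name in filenames if regex.fullmatch(name)]


# Hypothetical folder contents.
candidates = ["trucks_2024.csv", "trucks_2024.csv.bak", "readme.txt", "flights.csv"]

all_csv = matching_files(candidates)
trucks_only = matching_files(candidates, r"trucks_.*\.csv")

print(all_csv)
print(trucks_only)
```

A narrower pattern like the second one is a way to keep files with a different schema, here the hypothetical `flights.csv`, out of an input configured for truck data.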