CSV Format
As explained in a previous section, the input expected by the Draco system consists of
two tables which need to be passed as pandas.DataFrame objects:
- The target_times table, which contains the specification of the problem that we are solving in the form of training examples with a turbine_id, a cutoff_time and a target value.
- The readings table, which contains the signal readings from the different sensors, with turbine_id, signal_id, timestamp and value fields.
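As a rough illustration of the two tables described above, the following sketch builds them as pandas.DataFrame objects. Only the column names come from the text; every value is made up for the example, and the dtypes chosen here (datetime columns, integer target) are an assumption.

```python
import pandas as pd

# Hypothetical training examples: one row per (turbine, cutoff time) pair.
target_times = pd.DataFrame({
    "turbine_id": ["T001", "T001", "T002"],
    "cutoff_time": pd.to_datetime(["2010-01-15", "2010-02-15", "2010-01-20"]),
    "target": [0, 1, 0],
})

# Hypothetical sensor readings: one row per (turbine, signal, timestamp).
readings = pd.DataFrame({
    "turbine_id": ["T001", "T001", "T002"],
    "signal_id": ["S1", "S2", "S1"],
    "timestamp": pd.to_datetime(
        ["2010-01-01 00:00:00", "2010-01-01 00:00:00", "2010-01-01 12:00:00"]
    ),
    "value": [1.0, 7.0, 3.5],
})
```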
However, in most scenarios the size of the available data will far exceed the memory limitations
of the system on which Draco is being run, so loading all the data in a single
pandas.DataFrame will not be possible.
To handle this situation, Draco provides a CSVLoader class which can be used to load data from what we call the Raw Data Format.
Raw Data Format
The Raw Data Format consists of a collection of CSV files stored in a single folder with the following structure:
- All the data from all the turbines is inside a single folder, which here we will call readings.
- Inside the readings folder, one folder exists for each turbine, named exactly like the turbine:
  - readings/T001
  - readings/T002
  - …
- Inside each turbine folder one CSV file exists for each month, named %Y-%m.csv:
  - readings/T001/2010-01.csv
  - readings/T001/2010-02.csv
  - readings/T001/2010-03.csv
  - …
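To make the folder layout concrete, here is a minimal sketch of how the monthly file paths for one turbine could be derived from a date range. The helper name monthly_csv_paths and its arguments are hypothetical and are not part of Draco; the sketch only follows the naming rules listed above.

```python
from datetime import date


def monthly_csv_paths(readings_path, turbine_id, start, end):
    """List the monthly CSV paths of one turbine between two dates, inclusive."""
    paths = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        # File names follow the %Y-%m.csv pattern, e.g. 2010-01.csv
        filename = date(year, month, 1).strftime("%Y-%m") + ".csv"
        paths.append(f"{readings_path}/{turbine_id}/{filename}")
        # Move to the next month, rolling over to January when needed.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return paths


paths = monthly_csv_paths("readings", "T001", date(2010, 1, 15), date(2010, 3, 2))
```

With the dates above, paths contains the three files for January, February and March 2010. Listing files this way makes it possible to open only the months that are relevant for a given time window instead of reading the whole folder.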
Each CSV file contains three columns:
- signal_id: name or id of the signal.
- timestamp: timestamp of the reading formatted as %m/%d/%y %H:%M:%S.
- value: value of the reading.
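The timestamp format puts the month first, then the day, then a two-digit year. A short sketch with the Python standard library shows how such a string is parsed and written back; the constant name TIMESTAMP_FORMAT is only illustrative.

```python
from datetime import datetime

# Format stated above: month/day/two-digit year, then 24-hour time.
TIMESTAMP_FORMAT = "%m/%d/%y %H:%M:%S"

# "01/02/01" is read as January 2nd, 2001 (not February 1st).
parsed = datetime.strptime("01/02/01 12:00:00", TIMESTAMP_FORMAT)

# Formatting with the same pattern gives back the original string.
formatted = parsed.strftime(TIMESTAMP_FORMAT)
```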
This is an example of what the contents of a CSV file look like:
|    | signal_id | timestamp | value |
|---|---|---|---|
| 0 | S1 | 01/01/01 00:00:00 | 1 |
| 1 | S1 | 01/01/01 12:00:00 | 2 |
| 2 | S1 | 01/02/01 00:00:00 | 3 |
| 3 | S1 | 01/02/01 12:00:00 | 4 |
| 4 | S1 | 01/03/01 00:00:00 | 5 |
| 5 | S1 | 01/03/01 12:00:00 | 6 |
| 6 | S2 | 01/01/01 00:00:00 | 7 |
| 7 | S2 | 01/01/01 12:00:00 | 8 |
| 8 | S2 | 01/02/01 00:00:00 | 9 |
| 9 | S2 | 01/02/01 12:00:00 | 10 |
| 10 | S2 | 01/03/01 00:00:00 | 11 |
| 11 | S2 | 01/03/01 12:00:00 | 12 |
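The following sketch reads data shaped like the example above with plain pandas and adds the turbine_id column, so that the result has the same fields as the readings table. It is not how CSVLoader is implemented; the function load_month is hypothetical, and the sketch assumes that the file is comma-separated, has a header row and does not store the index column.

```python
import io

import pandas as pd

# A few rows in the assumed on-disk layout (header row, comma-separated).
csv_text = """signal_id,timestamp,value
S1,01/01/01 00:00:00,1
S1,01/01/01 12:00:00,2
S2,01/01/01 00:00:00,7
"""


def load_month(csv_file, turbine_id):
    """Read one monthly CSV and return it with the readings table columns."""
    data = pd.read_csv(csv_file)
    # Parse the timestamps using the format documented for the CSV files.
    data["timestamp"] = pd.to_datetime(data["timestamp"], format="%m/%d/%y %H:%M:%S")
    # The turbine is encoded in the folder name, so it is added as a column.
    data.insert(0, "turbine_id", turbine_id)
    return data


# io.StringIO stands in for a path such as readings/T001/2010-01.csv
readings = load_month(io.StringIO(csv_text), "T001")
```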