Generating time series for time-series forecasting with an LSTM

By | July 11, 2018

I have a .db file with the columns described below. The data was collected by software that monitors file usage in a filesystem; in other words, it generates metadata about all the files in the system.

fid | opcode | count | formatdate (timestamp, YYYY/MM/DD HH:mm)  

124 |   2    |   1   | 2018/06/08 09:00  
454 |   1    |   7   | 2018/06/08 09:01  
433 |   1    |   2   | 2018/06/08 09:01

The columns are as follows:
1. fid: unique file ID assigned to every file.
2. opcode: one of two discrete values set by the software. 1 stands for a read of the file, 2 for a write.
3. count: number of times the read/write happens within a minute.
4. formatdate (timestamp): when the activity takes place, at 1-minute resolution. For example, if a read operation happens on the file at 2018/06/08 9:01:21 and another user reads it at 2018/06/08 9:01:34, the count is incremented, so the row for opcode 1 has count 2 and timestamp 2018/06/08 9:01.
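
To make the per-minute counting in item 4 concrete, here is a minimal Python sketch. The raw event tuples and the function name `minute_counts` are hypothetical, since I don't know how the monitoring software works internally; it only shows how two reads within the same minute end up as one row with count 2.

```python
from collections import Counter
from datetime import datetime

# Hypothetical raw events: (fid, opcode, exact time of the operation).
events = [
    (454, 1, "2018/06/08 09:01:21"),
    (454, 1, "2018/06/08 09:01:34"),
    (454, 2, "2018/06/08 09:01:50"),
]

def minute_counts(events):
    """Count operations per (fid, opcode, minute)."""
    counts = Counter()
    for fid, opcode, stamp in events:
        ts = datetime.strptime(stamp, "%Y/%m/%d %H:%M:%S")
        minute = ts.strftime("%Y/%m/%d %H:%M")  # drop the seconds
        counts[(fid, opcode, minute)] += 1
    return counts

counts = minute_counts(events)
print(counts[(454, 1, "2018/06/08 09:01")])  # prints 2
```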

Now I need to generate a time series for each file, aggregated into windows of 8 hours, e.g. fid = 123 | time series: 54,64,67,0,53,31,10,...
I have 6 months of data, so with 3 windows per day over 180 days, each file's time series will have 3*180 = 540 values. I need two types of time series:
1. An activity time series for each file, which doesn't distinguish reads from writes and adds them together. E.g. if a file was read 56 times and written 32 times within the first 8-hour window, the activity for that window is 88, so the time series will be 88,... (540 terms).
2. Two separate time series for each file, one for reads and one for writes.
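
A rough sketch of the windowing I have in mind, using only the Python standard library. The function name `build_series`, the choice of midnight on 2018/06/08 as the start of window 0, and the last two sample rows are my own assumptions for illustration. The rows could come from anywhere, for example an `sqlite3` cursor if the .db file turns out to be SQLite.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(hours=8)
OPCODE_NAMES = {1: "read", 2: "write"}  # opcode values from the column description

def build_series(rows, start, n_windows):
    """Return {fid: {"read": [...], "write": [...], "activity": [...]}}.

    rows: iterable of (fid, opcode, count, formatdate) tuples.
    start: datetime at which window 0 begins (an assumption).
    """
    series = {}
    for fid, opcode, count, stamp in rows:
        name = OPCODE_NAMES.get(opcode)
        if name is None:
            continue  # ignore opcodes other than read/write
        ts = datetime.strptime(stamp, "%Y/%m/%d %H:%M")
        idx = (ts - start) // WINDOW  # integer index of the 8-hour window
        if not 0 <= idx < n_windows:
            continue  # skip rows outside the requested range
        per_file = series.setdefault(
            fid, {"read": [0] * n_windows, "write": [0] * n_windows}
        )
        per_file[name][idx] += count
    # Activity is simply reads plus writes in each window.
    for per_file in series.values():
        per_file["activity"] = [
            r + w for r, w in zip(per_file["read"], per_file["write"])
        ]
    return series

rows = [
    (124, 2, 1, "2018/06/08 09:00"),
    (454, 1, 7, "2018/06/08 09:01"),
    (433, 1, 2, "2018/06/08 09:01"),
    (454, 1, 5, "2018/06/08 10:15"),  # hypothetical extra row
    (454, 2, 3, "2018/06/08 17:30"),  # hypothetical extra row
]
series = build_series(rows, start=datetime(2018, 6, 8), n_windows=3 * 180)
print(series[454]["activity"][:3])  # prints [0, 12, 3]
```

Windows with no rows stay at 0, so every file ends up with a series of the same length (540 here).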

I need the output time series in a suitable format that I can copy and load as a numpy array for training an LSTM model for time-series forecasting.
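
One format that might work is plain CSV with one row per file: the fid followed by that file's window values. Below is a minimal sketch with the standard `csv` module; the function name `write_series_csv` and the values for fid 124 are made up for illustration, and the series are shortened (real ones would have 540 values).

```python
import csv
import io

def write_series_csv(handle, series_by_fid):
    """Write one row per file: fid followed by its window values."""
    writer = csv.writer(handle)
    for fid in sorted(series_by_fid):
        writer.writerow([fid, *series_by_fid[fid]])

# Shortened activity series; fid 124's values are hypothetical.
activity = {
    123: [54, 64, 67, 0, 53, 31, 10],
    124: [0, 1, 0, 0, 0, 0, 0],
}

buffer = io.StringIO()  # stands in for a real file in this sketch
write_series_csv(buffer, activity)
print(buffer.getvalue())
```

With a real file opened via `open(path, "w", newline="")`, the result should be loadable with `numpy.loadtxt(path, delimiter=",")`, giving a 2-D array whose first column is the fid and whose remaining columns are the time series.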