Transferring a very large datasets to fiware
backend: SourceDB ->Orion -> Cygnus -> Postgres
. For this I write a python script fetching rows and for each row fetched, send a payload to the Orion.
The script is firing at 150ms(0.15sec) intervals, but to my surprise after about 10 iterations, only two values (the first & last payloads sent) was persisted to the Postgres
sink. Meaning 80% of the datasets aren't persisted to the sink.
Script:
import psycopg2
from time import sleep
from config import config
from tqdm import tqdm
import requests
import json
def val_json():
db = "select to_json(d) from ( select \
n.noise_data as measurand, \
n.factor as \"sonometerClass\", \
to_timestamp(n.seconds) as \"dateObserved\", \
l.description as name, \
json_build_object( \
'coordinates', \
json_build_array(l.node_lon, l.node_lat) \
) as location \
from noise as n \
inner join deployment as d on \
d.deployment_id = n.deployment_id \
inner join location as l on \
l.location_id = d.location_id \
order by n.seconds asc \
) as d"
return db
def main():
url = 'http://localhost:1026/v2/entities/002/attrs?options=keyValues'
headers = {"Content-Type": "application/json", \
"fiware-service": "urbansense", \
"fiware-servicepath": "/basic"}
conn = None
try:
params = config()
with psycopg2.connect(**params) as conn:
with conn.cursor(name='my_cursor') as cur:
cur.itersize = 5000
cur.execute(val_json())
for row in tqdm(cur):
jsonData = json.dumps(row)
if jsonData.startswith('[') and jsonData.endswith(']'):
jsonData = jsonData[1:-1]
print()
print(jsonData)
requests.post(url, data= jsonData, headers=headers)
sleep(0.15)
cur.close()
except (Exception, psycopg2.DatabaseError) as error:
print(error)
finally:
if conn is not None:
conn.close()
if __name__ == '__main__':
main()
First ten iterations (payloads):
$python3 noiselevelObserved.py
0it [00:00, ?it/s]
{"measurand": 64.8, "sonometerClass": 1, "dateObserved": "1970-01-01T01:00:15+01:00", "name": "Trindade", "location": {"coordinates": [-8.609973, 41.151943]}}
1it [00:00, 1.75it/s]
{"measurand": 58.8, "sonometerClass": 0, "dateObserved": "1970-01-01T01:00:16+01:00", "name": "Trindade", "location": {"coordinates": [-8.609973, 41.151943]}}
2it [00:00, 2.23it/s]
{"measurand": 56.5, "sonometerClass": 0, "dateObserved": "1970-01-01T01:00:17+01:00", "name": "Trindade", "location": {"coordinates": [-8.609973, 41.151943]}}
3it [00:00, 2.76it/s]
{"measurand": 61.1, "sonometerClass": 1, "dateObserved": "1970-01-01T01:00:18+01:00", "name": "Casa da Musica", "location": {"coordinates": [-8.63041, 41.158091]}}
4it [00:01, 3.31it/s]
{"measurand": 108.5, "sonometerClass": 2, "dateObserved": "1970-01-01T01:00:18+01:00", "name": "Pr. Liberdade Cardosas", "location": {"coordinates": [-8.611119, 41.146023]}}
5it [00:01, 3.86it/s]
{"measurand": 56.5, "sonometerClass": 0, "dateObserved": "1970-01-01T01:00:18+01:00", "name": "Trindade", "location": {"coordinates": [-8.609973, 41.151943]}}
6it [00:01, 4.35it/s]
{"measurand": 59.9, "sonometerClass": 1, "dateObserved": "1970-01-01T01:00:19+01:00", "name": "Casa da Musica", "location": {"coordinates": [-8.63041, 41.158091]}}
7it [00:01, 4.78it/s]
{"measurand": 97.2, "sonometerClass": 2, "dateObserved": "1970-01-01T01:00:19+01:00", "name": "D. Manuel II", "location": {"coordinates": [-8.625192, 41.148558]}}
8it [00:01, 5.12it/s]
{"measurand": 108.6, "sonometerClass": 2, "dateObserved": "1970-01-01T01:00:19+01:00", "name": "Pr. Liberdade Cardosas", "location": {"coordinates": [-8.611119, 41.146023]}}
9it [00:01, 5.41it/s]
{"measurand": 57.1, "sonometerClass": 0, "dateObserved": "1970-01-01T01:00:19+01:00", "name": "Trindade", "location": {"coordinates": [-8.609973, 41.151943]}}
10it [00:02, 5.63it/s]
{"measurand": 53.9, "sonometerClass": 0, "dateObserved": "1970-01-01T01:00:20+01:00", "name": "Casa da Musica", "location": {"coordinates": [-8.63041, 41.158091]}}
Reading attributes' values persisted into postgres:
postgres=# select * from urbansense.basic_002_noiselevelobserved ;
recvtimets | recvtime | fiwareservicepath | entityid | entitytype | attrname | attrtype | attrvalue | attrmd
---------------+--------------------------+-------------------+----------+--------------------+----------------+-----------------+---------------------------------------+--------
1559045918129 | 2019-05-28T12:18:38.129Z | /basic | 002 | NoiseLevelObserved | dateObserved | Text | 1970-01-01T01:00:15+01:00 | []
1559045918129 | 2019-05-28T12:18:38.129Z | /basic | 002 | NoiseLevelObserved | latitude | Number | 41.1591 | []
1559045918129 | 2019-05-28T12:18:38.129Z | /basic | 002 | NoiseLevelObserved | location | StructuredValue | {"coordinates":[-8.609973,41.151943]} | []
1559045918129 | 2019-05-28T12:18:38.129Z | /basic | 002 | NoiseLevelObserved | longitude | Number | -8.65915 | []
1559045918129 | 2019-05-28T12:18:38.129Z | /basic | 002 | NoiseLevelObserved | measurand | Number | 64.8 | []
1559045918129 | 2019-05-28T12:18:38.129Z | /basic | 002 | NoiseLevelObserved | name | Text | Trindade | []
1559045918129 | 2019-05-28T12:18:38.129Z | /basic | 002 | NoiseLevelObserved | sonometerClass | Number | 1 | []
1559045919723 | 2019-05-28T12:18:39.723Z | /basic | 002 | NoiseLevelObserved | dateObserved | Text | 1970-01-01T01:00:20+01:00 | []
1559045919723 | 2019-05-28T12:18:39.723Z | /basic | 002 | NoiseLevelObserved | latitude | Number | 41.1591 | []
1559045919723 | 2019-05-28T12:18:39.723Z | /basic | 002 | NoiseLevelObserved | location | StructuredValue | {"coordinates":[-8.63041,41.158091]} | []
1559045919723 | 2019-05-28T12:18:39.723Z | /basic | 002 | NoiseLevelObserved | longitude | Number | -8.65915 | []
1559045919723 | 2019-05-28T12:18:39.723Z | /basic | 002 | NoiseLevelObserved | measurand | Number | 53.9 | []
1559045919723 | 2019-05-28T12:18:39.723Z | /basic | 002 | NoiseLevelObserved | name | Text | Casa da Musica | []
1559045919723 | 2019-05-28T12:18:39.723Z | /basic | 002 | NoiseLevelObserved | sonometerClass | Number | 0 | []
(14 rows)
Changing the firing rate to 1-second interval doesn't give much improvement, only the 3 result sets (payloads) were persisted (70% lost):
postgres=# select * from urbansense.basic_002_noiselevelobserved ;
recvtimets | recvtime | fiwareservicepath | entityid | entitytype | attrname | attrtype | attrvalue | attrmd
---------------+--------------------------+-------------------+----------+--------------------+----------------+-----------------+---------------------------------------+--------
1559046840569 | 2019-05-28T12:34:00.569Z | /basic | 002 | NoiseLevelObserved | dateObserved | Text | 1970-01-01T01:00:15+01:00 | []
1559046840569 | 2019-05-28T12:34:00.569Z | /basic | 002 | NoiseLevelObserved | latitude | Number | 41.1591 | []
1559046840569 | 2019-05-28T12:34:00.569Z | /basic | 002 | NoiseLevelObserved | location | StructuredValue | {"coordinates":[-8.609973,41.151943]} | []
1559046840569 | 2019-05-28T12:34:00.569Z | /basic | 002 | NoiseLevelObserved | longitude | Number | -8.65915 | []
1559046840569 | 2019-05-28T12:34:00.569Z | /basic | 002 | NoiseLevelObserved | measurand | Number | 64.8 | []
1559046840569 | 2019-05-28T12:34:00.569Z | /basic | 002 | NoiseLevelObserved | name | Text | Trindade | []
1559046840569 | 2019-05-28T12:34:00.569Z | /basic | 002 | NoiseLevelObserved | sonometerClass | Number | 1 | []
1559046845620 | 2019-05-28T12:34:05.620Z | /basic | 002 | NoiseLevelObserved | dateObserved | Text | 1970-01-01T01:00:18+01:00 | []
1559046845620 | 2019-05-28T12:34:05.620Z | /basic | 002 | NoiseLevelObserved | latitude | Number | 41.1591 | []
1559046845620 | 2019-05-28T12:34:05.620Z | /basic | 002 | NoiseLevelObserved | location | StructuredValue | {"coordinates":[-8.609973,41.151943]} | []
1559046845620 | 2019-05-28T12:34:05.620Z | /basic | 002 | NoiseLevelObserved | longitude | Number | -8.65915 | []
1559046845620 | 2019-05-28T12:34:05.620Z | /basic | 002 | NoiseLevelObserved | measurand | Number | 56.5 | []
1559046845620 | 2019-05-28T12:34:05.620Z | /basic | 002 | NoiseLevelObserved | name | Text | Trindade | []
1559046845620 | 2019-05-28T12:34:05.620Z | /basic | 002 | NoiseLevelObserved | sonometerClass | Number | 0 | []
1559046850679 | 2019-05-28T12:34:10.679Z | /basic | 002 | NoiseLevelObserved | dateObserved | Text | 1970-01-01T01:00:20+01:00 | []
1559046850679 | 2019-05-28T12:34:10.679Z | /basic | 002 | NoiseLevelObserved | latitude | Number | 41.1591 | []
1559046850679 | 2019-05-28T12:34:10.679Z | /basic | 002 | NoiseLevelObserved | location | StructuredValue | {"coordinates":[-8.63041,41.158091]} | []
1559046850679 | 2019-05-28T12:34:10.679Z | /basic | 002 | NoiseLevelObserved | longitude | Number | -8.65915 | []
1559046850679 | 2019-05-28T12:34:10.679Z | /basic | 002 | NoiseLevelObserved | measurand | Number | 53.9 | []
1559046850679 | 2019-05-28T12:34:10.679Z | /basic | 002 | NoiseLevelObserved | name | Text | Casa da Musica | []
1559046850679 | 2019-05-28T12:34:10.679Z | /basic | 002 | NoiseLevelObserved | sonometerClass | Number | 0 | []
(21 rows)
I honestly wouldn't like the firing rate to be up to a second because of the data size to be migrated to continue my research. When I first tried it at 1-second interval, I realised is going to take months to complete (maybe 4).
Question: Is Orion CB
(or maybe Cygnus
) not built to have such power of receiving entity/attributes values at such a rate (150ms/0.15s), Or maybe Cygnus
isn't smart enough to receive notification from Orion
at such a rate?.
I will appreciate any suggestion to ensure all values are persisted in the shortest time.
Based on the discussion in the question comments, the solution to this was to remove the throttling
parameter in the subscription. It makes sense: the throttling makes some notification not being sent (in this particular case, 80% of all the notifications).