motamman: The system has three components. A command builder creates PUT-enabled bool paths, such as commands.capturePassage or commands.captureAnchor. (I use KIP to toggle these on and off.)
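For anyone curious, the command-builder side is roughly this shape. A minimal sketch, not the actual plugin code; the plugin id and path name are just examples:

```typescript
// Minimal sketch of a command-builder plugin: one PUT-able boolean path
// that KIP (or anything else) can toggle. Plugin id and path are examples.
export default function commandBuilderSketch(app: any) {
  const plugin = {
    id: 'command-builder-sketch',
    name: 'Command builder (sketch)',
    schema: {},

    start() {
      // Signal K server plugin API: register a PUT handler for the path
      app.registerPutHandler(
        'vessels.self',
        'commands.capturePassage',
        (_context: string, path: string, value: unknown) => {
          // Echo the new value back into the data model as a delta
          app.handleMessage(plugin.id, {
            updates: [{ values: [{ path, value: Boolean(value) }] }]
          });
          return { state: 'COMPLETED', statusCode: 200 };
        }
      );
    },

    stop() {}
  };

  return plugin;
}
```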
An archive/data manager listens for changes to the capture commands. When a command is true, it saves the subscribed paths' data to disk, using the Signalk path as a file system path. So vessels.urn:mrn:imo:mmsi:338043907.navigation.position is stored as ~/.signalk/data/vessels/urn_mrn_imo_mmsi_338043907/navigation/position/signalk_data_2025-07-08T1025.parquet. (The data is buffered and written to disk episodically.)
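The path-to-folder mapping is roughly this; a sketch of the idea, and the exact sanitising rules in my plugin may differ slightly:

```typescript
// Sketch of the Signal K path -> file system path mapping described above.
// The sanitising rules here are an approximation.
import * as path from 'node:path';

function dataDirFor(root: string, context: string, skPath: string): string {
  // 'vessels.urn:mrn:imo:mmsi:338043907' -> 'vessels/urn_mrn_imo_mmsi_338043907'
  const contextDir = context.replace(/\./g, '/').replace(/:/g, '_');
  // 'navigation.position' -> 'navigation/position'
  const pathDir = skPath.replace(/\./g, '/');
  return path.join(root, contextDir, pathDir);
}

// dataDirFor('/home/pi/.signalk/data',
//            'vessels.urn:mrn:imo:mmsi:338043907',
//            'navigation.position')
// -> '/home/pi/.signalk/data/vessels/urn_mrn_imo_mmsi_338043907/navigation/position'
```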
Finally, a Signalk web app/plugin manages a Python-based API server. Any path that exists on the file system automatically gets an endpoint and can be filtered by dates and values or date/value ranges. Custom queries are created through the plugin management by assigning custom SQL to an ID, which in turn receives its own unique endpoint.
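Usage-wise it ends up looking something like this; the host, port and query parameter names below are made up, just to illustrate the idea of per-path endpoints plus date filtering:

```typescript
// Hypothetical request against the API server; URL scheme, port and query
// parameter names are assumptions, not the real interface.
async function fetchPositions() {
  const base = 'http://localhost:8095'; // assumed port
  const url =
    `${base}/vessels/urn_mrn_imo_mmsi_338043907/navigation/position` +
    '?start=2025-07-01T00:00:00Z&end=2025-07-08T00:00:00Z';

  const res = await fetch(url);
  const rows = await res.json();
  console.log(`got ${rows.length} position records`);
}

fetchPositions().catch(console.error);
```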
I'm not sure if what I've built here is helpful to anyone but me. However, I do believe this has significant advantages over InfluxDB, under certain circumstances, and it could easily integrate with the playback and history API and even with Grafana.
I am happy to share it more broadly, and I am also content to keep it to myself.
this is extremely interesting! storing sk data for analysis has a special place in my heart - please share
how does your system integrate with sk server - how does it get the data updates?
how does parquet do with larger amounts of data, like if i want to graph my speed over a two week trip?
i have been working on the Signal K history API for quite some time. the latest is the Grafana Signal K datasource that I recently published in Grafana plugins https://grafana.com/grafana/plugins/tkurki-signalk-datasource/
as the readme says, "For access to history data you will need a database where the data is stored and access to the data" - if you were to implement the history API on your system you could use the SK history datasource to bring the parquet data to Grafana
this is one major goal for the datasource: to bring SK data to Grafana, independent of the db system you use
if i wanted to give your system a go what would i need to do - how would the users install it?
How does your system integrate with sk server - how does it get the data updates?
The plugin listens for updates to specified Signalk paths and, when they change, converts the entire delta (is that the correct term?) to Parquet data, saving it to a buffer that is written to disk episodically. The plugin allows users to set the save interval, the max buffer size, the root data folder name and the file name prefix. The context and path determine the folder structure.
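In outline it is something like the sketch below, assuming a parquet writer along the lines of @dsnp/parquetjs; the schema, flush policy and file naming are simplified, not my actual code:

```typescript
// Sketch of the buffer-and-flush idea, assuming a writer like @dsnp/parquetjs.
import * as parquet from '@dsnp/parquetjs';

const schema = new parquet.ParquetSchema({
  received: { type: 'TIMESTAMP_MILLIS' },
  context:  { type: 'UTF8' },
  path:     { type: 'UTF8' },
  value:    { type: 'UTF8' } // serialized here; real code would type per path
});

const buffer: Array<Record<string, unknown>> = [];
const MAX_BUFFER = 1000;          // configurable max buffer size
const SAVE_INTERVAL_MS = 60_000;  // configurable save interval

// Called for every subscribed delta value
function onDelta(context: string, path: string, value: unknown) {
  buffer.push({ received: new Date(), context, path, value: JSON.stringify(value) });
  if (buffer.length >= MAX_BUFFER) void flush();
}

// Write the buffered rows to a new parquet file and clear the buffer
async function flush() {
  if (buffer.length === 0) return;
  const rows = buffer.splice(0, buffer.length);
  const file = `signalk_data_${new Date().toISOString().replace(/:/g, '')}.parquet`;
  const writer = await parquet.ParquetWriter.openFile(schema, file);
  for (const row of rows) await writer.appendRow(row);
  await writer.close();
}

setInterval(() => void flush(), SAVE_INTERVAL_MS);
```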
How does parquet do with larger amounts of data, like if i want to graph my speed over a two week trip?
It will be fine. Currently, at work, I am working with a dataset that spans 60 years, and some folders contain a million-plus records for each day. Queries take a few seconds. A few tens of thousands of rows is nothing. I have been running this off and on for a couple of weeks, and 20,000 records come back in a second or so. Parquet files are organized as columnar data, not row data. The result is a massive speed and storage advantage over row-based data, even before compression. There are drawbacks, but in our context they are immaterial. (https://en.wikipedia.org/wiki/Data_orientation)
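Just to make it concrete: one way to pull, say, two weeks of speed straight off the files is duckdb (not necessarily how my API server does it). Roughly, with folder layout and column names as assumptions:

```typescript
// Sketch: query two weeks of speed directly from the parquet files with the
// duckdb npm package. Folder layout and column names are assumptions.
import * as duckdb from 'duckdb';

const db = new duckdb.Database(':memory:');
const glob = `${process.env.HOME}/.signalk/data/vessels/*/navigation/speedOverGround/*.parquet`;
const sql = `
  SELECT received, value
  FROM read_parquet('${glob}')
  WHERE received BETWEEN TIMESTAMP '2025-06-24' AND TIMESTAMP '2025-07-08'
  ORDER BY received
`;

db.all(sql, (err, rows) => {
  if (err) throw err;
  console.log(`${rows.length} rows for the two-week trip`);
});
```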
I took a cursory peek at the plugin and realized it wasn't immediately compatible with what I built. (I did create a Grafana endpoint system that should be compatible with Grafana generically.) I have played around with Grafana but put it aside because I find InfluxDB to be byzantine and sometimes baffling.
I will take a close look at the code to see how the history API is implemented. Perhaps my API is redundant? Hooking into the history API makes more sense. I found it a bit of a pain in the arse to set up a Python app that is managed by a Signalk plugin/web app.
And right now, I have one plugin/web app that manages the archiving and another that manages the API server. They could be combined, but they seem distinct enough to keep separate.
I am using HTTP requests to deliver the JSON from the stored data. It works great with Node-RED dashboard charts.
For a user, the command builder and the archiver would be simple npm installs, followed by configuring each according to the user's preferences. Right now, I am installing everything manually by copying the folders into the node_modules folder, running npm install to grab the dependencies, and then adding each to package.json so they don't get overwritten when something else is installed or updated.
The Python API server works on my RPI, and I have an installer that should work on other RPIs. However, switching from an Ubuntu/AWS instance (where I initially developed it) to Debian/RPI was a manual and tedious process.
If you'd like to take a look-see, I can add you as a collaborator to the GitHub repositories Monday evening. I could also create npm packages, if you prefer. Regardless, it might be helpful for you to see it in action.
Also, I recently adapted several Node-RED flows to plugins, including a Mosquitto MQTT broker manager (I send all my local data to a local broker, which bridges to a remote broker. This allows me to get around setting up a proxy.)
Building on that thought, I adapted an MQTT Signalk exporter (I know there are others, but I prefer my approach).
Sorry for the length of these messages. I didn't have time to write short ones. The crew is awakening.
Pax
ah, i must have skipped the plugin part and only read the Python reference - so the archiving part is a plugin. does it use https://www.npmjs.com/package/@dsnp/parquetjs ?
instead of installing everything manually, are you aware that npm can install modules directly from github or from tar files?
the way i deploy wip plugins to my own server: i run npm pack on the dev machine, copy the resulting .tar.gz over to the target machine and then npm install from the tar file
so no need to publish to npm to let others access it

instead of a "command" to direct the archiving you could manipulate navigation.state and use its value to control the archiving. this would be more universal and interoperable, even if we have yet to document well-known values for the state
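roughly what i mean, as a sketch - the value-to-behaviour mapping is just an example, since well-known state values aren't documented yet, and the subscription wiring is left out:

```typescript
// Sketch of driving the archiver from navigation.state instead of a custom
// command path. The set of state values below is only an example.
const ARCHIVE_WHEN = new Set(['sailing', 'motoring', 'anchored']);

let archiving = false;

// Wire this to however the plugin subscribes to vessels.self navigation.state
function onNavigationState(state: string) {
  const shouldArchive = ARCHIVE_WHEN.has(state);
  if (shouldArchive !== archiving) {
    archiving = shouldArchive;
    console.log(`archiving ${archiving ? 'started' : 'stopped'} (state=${state})`);
  }
}
```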
there would definitely be value in going forward with standardising on a single history API. it is not set in stone, and we can change it as we see fit and as needs arise
now would probably be a great time to finish the wip OpenAPI description that i have lying around somewhere.....
you clearly don't like InfluxDB - it has served me and others pretty well, but i have completely lost faith in it: they first went all in with their in-house Flux query language, only to completely drop it a few years later. and then i found out that really simple queries are way more performant using InfluxQL (the v1 query language) than Flux. and as a long term storage format InfluxDB sucks, being monolithic & proprietary. so far i have just exported data in Influx Line Protocol format, which is like CSV, but i really like the idea of exporting the data as parquet, which would solve both long term archival and queryable db needs
there's also one specific area that Influx does not solve: geo queries, in practice being able to retrieve data based on a bounding box and/or within a distance from a point. is that somehow doable with parquet?
are you using MQTT just for access to realtime data or do you have an archival process off the cloud broker?
i could also take a stab at implementing the History API on top of your parquet format. never worked with parquet before. my preliminary plan for moving from Influx was to use TimescaleDB, with the added benefit of getting a relational database for other uses
but InfluxDB and TimescaleDB both have the drawback of requiring a separate service that you need to install, update and to an extent manage. it would be so much easier to have just a sk server plugin and files