motamman: The system has three components. A command builder creates PUT-enabled bool paths, such as commands.capturePassage or commands.captureAnchor. (I use KIP to toggle these on and off.)
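For anyone curious, the command-builder side is roughly this shape. A minimal sketch, not the actual plugin code; the plugin id and path name are just examples:

```typescript
// Minimal sketch of a command-builder plugin: one PUT-able boolean path
// that KIP (or anything else) can toggle. Plugin id and path are examples.
export default function commandBuilderSketch(app: any) {
  const plugin = {
    id: 'command-builder-sketch',
    name: 'Command builder (sketch)',
    schema: {},

    start() {
      // Signal K server plugin API: register a PUT handler for the path
      app.registerPutHandler(
        'vessels.self',
        'commands.capturePassage',
        (_context: string, path: string, value: unknown) => {
          // Echo the new value back into the data model as a delta
          app.handleMessage(plugin.id, {
            updates: [{ values: [{ path, value: Boolean(value) }] }]
          });
          return { state: 'COMPLETED', statusCode: 200 };
        }
      );
    },

    stop() {}
  };

  return plugin;
}
```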
An archive/data manager listens for changes to the capture commands. When a command is true, it saves the subscribed paths' data to disk, using the Signalk path as a file system path. So vessels.urn:mrn:imo:mmsi:338043907.navigation.position is stored as ~/.signalk/data/vessels/urn_mrn_imo_mmsi_338043907/navigation/position/signalk_data_2025-07-08T1025.parquet. (The data is buffered and written to disk episodically.)
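The path-to-folder mapping is roughly this; a sketch of the idea, and the exact sanitising rules in my plugin may differ slightly:

```typescript
// Sketch of the Signal K path -> file system path mapping described above.
// The sanitising rules here are an approximation.
import * as path from 'node:path';

function dataDirFor(root: string, context: string, skPath: string): string {
  // 'vessels.urn:mrn:imo:mmsi:338043907' -> 'vessels/urn_mrn_imo_mmsi_338043907'
  const contextDir = context.replace(/\./g, '/').replace(/:/g, '_');
  // 'navigation.position' -> 'navigation/position'
  const pathDir = skPath.replace(/\./g, '/');
  return path.join(root, contextDir, pathDir);
}

// dataDirFor('/home/pi/.signalk/data',
//            'vessels.urn:mrn:imo:mmsi:338043907',
//            'navigation.position')
// -> '/home/pi/.signalk/data/vessels/urn_mrn_imo_mmsi_338043907/navigation/position'
```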
Finally, a Signalk web app/plugin manages a Python-based API server. Any path that exists on the file system automatically gets an endpoint and can be filtered by dates and values or date/value ranges. Custom queries are created through the plugin management by assigning custom SQL to an ID, which in turn receives its own unique endpoint.
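Usage-wise it ends up looking something like this; the host, port and query parameter names below are made up, just to illustrate the idea of per-path endpoints plus date filtering:

```typescript
// Hypothetical request against the API server; URL scheme, port and query
// parameter names are assumptions, not the real interface.
async function fetchPositions() {
  const base = 'http://localhost:8095'; // assumed port
  const url =
    `${base}/vessels/urn_mrn_imo_mmsi_338043907/navigation/position` +
    '?start=2025-07-01T00:00:00Z&end=2025-07-08T00:00:00Z';

  const res = await fetch(url);
  const rows = await res.json();
  console.log(`got ${rows.length} position records`);
}

fetchPositions().catch(console.error);
```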
I'm not sure if what I've built here is helpful to anyone but me. However, I do believe this has significant advantages over InfluxDB, under certain circumstances, and it could easily integrate with the playback and history API and even with Grafana.
I am happy to share it more broadly, and I am also content to keep it to myself.
this is extremely interesting! storing sk data for analysis has a special place in my heart - please share
how does your system integrate with sk server - how does it get the data updates?
how does parquet do with larger amounts of data, like if i want to graph my speed over a two week trip?
i have been working on the Signal K history API for quite some time. the latest is the Grafana Signal K datasource that I recently published in Grafana plugins https://grafana.com/grafana/plugins/tkurki-signalk-datasource/
as the readme says, "For access to history data you will need a database where the data is stored and access to the data" - if you were to implement the history API on your system you could use the SK history datasource to bring the parquet data to Grafana
this is one major goal for the datasource: to bring SK data to Grafana, independent of the db system you use
if i wanted to give your system a go what would i need to do - how would the users install it?
How does your system integrate with sk server - how does it get the data updates?
The plugin listens for updates to specified Signalk paths and, when they change, converts the entire delta (is that the correct term?) to Parquet data, saving it to a buffer that is written to disk episodically. The plugin allows users to set the save interval, the max buffer size, the root data folder name and the file name prefix. The context and path determine the folder structure.
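In outline it is something like the sketch below, assuming a parquet writer along the lines of @dsnp/parquetjs; the schema, flush policy and file naming are simplified, not my actual code:

```typescript
// Sketch of the buffer-and-flush idea, assuming a writer like @dsnp/parquetjs.
import * as parquet from '@dsnp/parquetjs';

const schema = new parquet.ParquetSchema({
  received: { type: 'TIMESTAMP_MILLIS' },
  context:  { type: 'UTF8' },
  path:     { type: 'UTF8' },
  value:    { type: 'UTF8' } // serialized here; real code would type per path
});

const buffer: Array<Record<string, unknown>> = [];
const MAX_BUFFER = 1000;          // configurable max buffer size
const SAVE_INTERVAL_MS = 60_000;  // configurable save interval

// Called for every subscribed delta value
function onDelta(context: string, path: string, value: unknown) {
  buffer.push({ received: new Date(), context, path, value: JSON.stringify(value) });
  if (buffer.length >= MAX_BUFFER) void flush();
}

// Write the buffered rows to a new parquet file and clear the buffer
async function flush() {
  if (buffer.length === 0) return;
  const rows = buffer.splice(0, buffer.length);
  const file = `signalk_data_${new Date().toISOString().replace(/:/g, '')}.parquet`;
  const writer = await parquet.ParquetWriter.openFile(schema, file);
  for (const row of rows) await writer.appendRow(row);
  await writer.close();
}

setInterval(() => void flush(), SAVE_INTERVAL_MS);
```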
How does parquet do with larger amounts of data, like if i want to graph my speed over a two week trip?
It will be fine. Currently, at work, I am working with a dataset that spans 60 years, and some folders contain a million-plus records for each day. Queries take a few seconds. A few tens of thousands of rows is nothing. I have been running this off and on for a couple of weeks, and 20,000 records come back in a second or so. Parquet files are organized as columnar data, not row data. The result is a massive speed and storage advantage over row-based data, even before compression. There are drawbacks, but in our context they are immaterial. (https://en.wikipedia.org/wiki/Data_orientation)
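Just to make it concrete: one way to pull, say, two weeks of speed straight off the files is duckdb (not necessarily how my API server does it). Roughly, with folder layout and column names as assumptions:

```typescript
// Sketch: query two weeks of speed directly from the parquet files with the
// duckdb npm package. Folder layout and column names are assumptions.
import * as duckdb from 'duckdb';

const db = new duckdb.Database(':memory:');
const glob = `${process.env.HOME}/.signalk/data/vessels/*/navigation/speedOverGround/*.parquet`;
const sql = `
  SELECT received, value
  FROM read_parquet('${glob}')
  WHERE received BETWEEN TIMESTAMP '2025-06-24' AND TIMESTAMP '2025-07-08'
  ORDER BY received
`;

db.all(sql, (err, rows) => {
  if (err) throw err;
  console.log(`${rows.length} rows for the two-week trip`);
});
```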
I took a cursory peek at the plugin and realized it wasn't immediately compatible with what I built. (I did create a Grafana endpoint system that should be compatible with Grafana generically.) I have played around with Grafana but put it aside because I find InfluxDB to be byzantine and sometimes baffling.
I will take a close look at the code to see how the history API is implemented. Perhaps my API is redundant? Hooking into the history API makes more sense. I found it a bit of a pain in the arse to set up a Python app that is managed by a Signalk plugin/web app.
And right now, I have one plugin/web app that manages the archiving and another that manages the API server. They could be combined, but they seem distinct enough to keep separate.
I am using HTTP requests to deliver the JSON from the stored data. It works great with Node-RED dashboard charts.
For a user, the command builder and the archiver would be simple npm installs, followed by configuring each according to the user's preferences. Right now, I am installing everything manually by copying the folders into the node_modules folder, running npm install to grab the dependencies, and then adding each to package.json so they don't get overwritten when something else is installed or updated.
The Python API server works on my RPI, and I have an installer that should work on other RPIs. However, switching from an Ubuntu/AWS instance (where I initially developed it) to Debian/RPI was a manual and tedious process.
If you'd like to take a look-see, I can add you as a collaborator to the GitHub repositories Monday evening. I could also create npm packages, if you prefer. Regardless, it might be helpful for you to see it in action.
Also, I recently adapted several Node-RED flows to plugins, including a Mosquitto MQTT broker manager (I send all my local data to a local broker, which bridges to a remote broker. This allows me to get around setting up a proxy.)
Building on that thought, I adapted an MQTT Signalk exporter (I know there are others, but I prefer my approach).
Sorry for the length of these messages. I didn't have time to write short ones. The crew is awakening.
Pax
ah, i must have skipped the plugin part and only read the Python reference - so the archiving part is a plugin. does it use https://www.npmjs.com/package/@dsnp/parquetjs ?
instead of installing everything manually, are you aware that npm can install modules directly from github or from tar files?
the way i deploy wip plugins to my own server: i run npm pack on the dev machine, copy the resulting .tar.gz over to the target machine and then npm install from the tar file
so no need to publish to npm to let others access it

instead of a "command" to direct the archiving you could manipulate navigation.state and use its value to control the archiving. this would be more universal and interoperable, even if we have yet to document well-known values for the state
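roughly what i mean, as a sketch - the value-to-behaviour mapping is just an example, since well-known state values aren't documented yet, and the subscription wiring is left out:

```typescript
// Sketch of driving the archiver from navigation.state instead of a custom
// command path. The set of state values below is only an example.
const ARCHIVE_WHEN = new Set(['sailing', 'motoring', 'anchored']);

let archiving = false;

// Wire this to however the plugin subscribes to vessels.self navigation.state
function onNavigationState(state: string) {
  const shouldArchive = ARCHIVE_WHEN.has(state);
  if (shouldArchive !== archiving) {
    archiving = shouldArchive;
    console.log(`archiving ${archiving ? 'started' : 'stopped'} (state=${state})`);
  }
}
```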
there would definitely be value in going forward with standardising on a single history API. it is not set in stone, and we can change it as we see fit and as needs arise
now would probably be a great time to finish the wip OpenAPI description that i have lying around somewhere.....
you clearly don't like InfluxDB - it has served me and others pretty well, but i have completely lost faith in it: they first went all in with their in-house Flux query language, only to completely drop it a few years later. and then i found out that really simple queries are way more performant using InfluxQL (the v1 query language) than Flux. and as a long term storage format InfluxDB sucks, being monolithic & proprietary. so far i have just exported data in Influx Line Protocol format, which is like CSV, but i really like the idea of exporting the data as parquet, which would solve both long term archival and queryable db needs
there's also one specific area that Influx does not solve: geo queries, in practice being able to retrieve data based on a bounding box and/or within a distance from a point. is that somehow doable with parquet?
are you using MQTT just for access to realtime data or do you have an archival process off the cloud broker?
i could also take a stab at implementing the History API on top of your parquet format. never worked with parquet before. my preliminary plan for moving from Influx was to use TimescaleDB, with the added benefit of getting a relational database for other uses
but InfluxDB and TimescaleDB both have the drawback of requiring a separate service that you need to install, update and to an extent manage. it would be so much easier to have just a sk server plugin and files