Interface

The sMAP archiver is a streaming storage manager that provides tools for storing time-series data from sMAP sources and for accessing both historical and real-time data. It may be used as an interface for developing applications that access sMAP data, or to retrieve data for offline analysis.

The archiver API is available over HTTP as well as through several alternative interfaces (described below).

API

The Giles HTTP API offers the following endpoints:

The default port for this interface is 8079, though this is configurable.

Giles offers non-HTTP interfaces to make it easier to use the archiver from embedded devices, web services and other sources. These interfaces currently include

These interfaces, while different from the usual HTTP interface (there is no notion of a "URL" at layer 4), do their best to provide analogous functionality. Detailed documentation is forthcoming; currently, the easiest way to adapt the Giles interface to non-HTTP clients is to write a small piece of middleware that performs the protocol translation.
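As a concrete illustration of such middleware, here is a minimal sketch (not part of Giles; the port, URL, and "apikey" placeholder are assumptions) that accepts newline-delimited sMAP JSON over a raw TCP socket and forwards each message to the archiver's HTTP add endpoint:

```python
import socketserver
import urllib.request

# Assumed setup: a Giles archiver on localhost; "apikey" is a placeholder API key.
ARCHIVER_ADD = "http://localhost:8079/add/apikey"

class TranslatingHandler(socketserver.StreamRequestHandler):
    """Accept newline-delimited sMAP JSON over raw TCP, forward it over HTTP."""
    def handle(self):
        for line in self.rfile:  # one JSON sMAP object per line
            req = urllib.request.Request(
                ARCHIVER_ADD,
                data=line.strip(),
                headers={"Content-Type": "application/json"},
            )
            urllib.request.urlopen(req)

# To run the translator (blocks forever):
# with socketserver.TCPServer(("", 9000), TranslatingHandler) as srv:
#     srv.serve_forever()
```

The same pattern applies in reverse for query traffic: read a query string from the socket, POST it to the archiver, and write the JSON response back.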

Query Language

The sMAP query language (the original formulation can be found here) is a simple, SQL-like language that allows the user to treat Metadata tags like SQL column names. Giles implements a modern reimplementation with an eye towards extensibility. The full YACC implementation of the sMAP query language is here. Aside from sMAP operators, which have not yet been implemented, the Giles-flavored sMAP query language aims to support the full range of old sMAP queries, as well as some new features.

To execute queries, query strings can be sent as the body of a POST request to the query-endpoint on an archiver instance. Over the HTTP interface, this might look something like (for a local archiver)

$ curl -XPOST -d "select data before now where Metadata/XYZ=123" http://localhost:8079/api/query
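The same request can be made from any HTTP client. A small sketch in Python using the requests library (the URL assumes a local archiver on the default port):

```python
import requests

# Hypothetical local archiver; adjust host/port for your deployment.
QUERY_URL = "http://localhost:8079/api/query"

def run_query(query):
    """POST a sMAP query string and return the decoded JSON result."""
    resp = requests.post(QUERY_URL, data=query)
    resp.raise_for_status()
    return resp.json()

# Example (requires a running archiver):
# docs = run_query('select data before now where Metadata/XYZ = "123"')
```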

In the following snippets of documentation, bolded words indicate keywords that are meant to be typed as-is (e.g. if a query definition starts with select, the actual query string will start with the word select). Non-bolded words will be defined elsewhere.

Select Query

select selector where where-clause

The basic select query retrieves a JSON list of documents that match the provided where-clause. Each JSON document will correspond to a single timeseries stream, and will contain the tags contained in the selector. Omitting where where-clause from this query will evaluate the selector against all timeseries streams in the database.
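For example, assuming a deployment whose streams carry a Metadata/Location/Building tag (a hypothetical tag name for illustration), a select query returning two tags for every matching stream might look like:

```
smap> select uuid, Metadata/Location/Building where has Metadata/Location/Building;
```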

Selector

A selector can be

Where

The where-clause describes how to filter the result set. There are several operators you can use, listed in the table below. Tag values should be quoted strings, and tag names should not be quoted. Statements can be grouped using parentheses. The where-clause construction is used in nearly all sMAP queries, not just select-based ones.

| Operator | Description | Usage | Example |
| --- | --- | --- | --- |
| = | Compare tag values | tagname = "tagval" | Metadata/Location/Building = "Soda Hall" |
| like | String matching, using Perl-style regex | tagname like "pattern" | Metadata/Instrument/Manufacturer like "Dent.*" |
| has | Filters streams that have the provided tag | has tagname | has Metadata/System |
| and | Logical AND of the two clauses on either side | where-clause and where-clause | has Metadata/System and Properties/UnitofTime = "s" |
| or | Logical OR of the two clauses on either side | where-clause or where-clause | has Metadata/System or Properties/UnitofTime = "s" |
| not | Inverts a where clause | not where-clause | not Properties/UnitofMeasure = "volts" |
| in | Matches set intersection on lists of tags | [list,of,tags] in tagname | ["zone","temp"] in Metadata/HaystackTags |

Data Query

select data in (start-reference, end-reference) limit as where where-clause
select data before reference limit as where where-clause
select data after reference limit as where where-clause

You can access stored data from multiple streams by using a data query. Data matching the indicated ranges will be returned for each of the streams that match the provided where-clause.

As

The as component lets a query specify the units of time in which data should be returned. The default is milliseconds, but the user can specify others (ns, us, ms, s), following the Unix-compatible notation in the Time Reference table below.

For a sample source, here's the same data point in 4 different units of time. The resolution is only as good as the underlying source: the archiver does not add additional time resolution, so if a source published in milliseconds, querying for data as micro- or nanoseconds would not return more detailed information. The sample source here reported in nanoseconds.

Also note that the nanosecond representation is returned in scientific notation. This is a known issue and will be fixed in an upcoming release.

smap> select data before now as s where uuid = "50e4113d-f58e-468f-b197-8b90a49d42e9";
{
  "Readings": [
    [
      1431290271.0,
      577
    ]
  ],
  "uuid": "50e4113d-f58e-468f-b197-8b90a49d42e9"
}

smap> select data before now as ms where uuid = "50e4113d-f58e-468f-b197-8b90a49d42e9";
{
  "Readings": [
    [
      1431290271944.0,
      577
    ]
  ],
  "uuid": "50e4113d-f58e-468f-b197-8b90a49d42e9"
}

smap> select data before now as us where uuid = "50e4113d-f58e-468f-b197-8b90a49d42e9";
{
  "Readings": [
    [
      1431290271944557.0,
      577
    ]
  ],
  "uuid": "50e4113d-f58e-468f-b197-8b90a49d42e9"
}

smap> select data before now as ns where uuid = "50e4113d-f58e-468f-b197-8b90a49d42e9";
{
  "Readings": [
    [
      1.431290271944557e+18,
      577
    ]
  ],
  "uuid": "50e4113d-f58e-468f-b197-8b90a49d42e9"
}

Limit

The limit is optional, and has two components: limit and streamlimit. limit controls the number of points returned per stream, and streamlimit controls the number of streams returned. For the before and after queries, limit will always be 1, so it only makes sense to use streamlimit in those cases. The exact syntax looks like

limit number streamlimit number

where number is some positive integer. Both the limit and streamlimit components are optional and can be specified independently, together, or not at all.
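For instance (the tag names here are illustrative), the first query below retrieves the latest reading from at most 10 matching streams, and the second retrieves at most 100 points from each of at most 5 streams over the last hour:

```
smap> select data before now streamlimit 10 where Metadata/Type = "Sensor";
smap> select data in (now -1h, now) limit 100 streamlimit 5 where has Metadata/System;
```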

Time Reference

Data can be retrieved for some time region using a range query (in) or relative to some point in time (before, after). These reference times must be a UNIX-style timestamp, the now keyword, or a quoted time string.

Time references use the following abbreviations:

| Unit | Abbreviation | Unix support | Conversion to seconds |
| --- | --- | --- | --- |
| nanoseconds | ns | yes | 1 second = 1e9 nanoseconds |
| microseconds | us | yes | 1 second = 1e6 microseconds |
| milliseconds | ms | yes | 1 second = 1000 milliseconds |
| seconds | s | yes | 1 second = 1 second |
| minutes | m | no | 1 minute = 60 seconds |
| hours | h | no | 1 hour = 60 minutes |
| days | d | no | 1 day = 24 hours |

Time reference options:

Examples

Retrieve the last 15 minutes of data for streams 26955ca2-e87b-11e4-af77-0cc47a0f7eea and 344783b6-e87b-11e4-af77-0cc47a0f7eea

smap> select data in (now -15m, now) where uuid = "344783b6-e87b-11e4-af77-0cc47a0f7eea" or uuid = "26955ca2-e87b-11e4-af77-0cc47a0f7eea";

Retrieve a week of data for all streams from Soda Hall

smap> select data in ("1/1/2015", "1/7/2015") where Metadata/Location/Building = "Soda Hall";

Retrieve the most recent data point for all temperature sensors

smap> select data before now where Metadata/Type = "Sensor" and Metadata/Sensor = "Temperature";

Set Query

set set-list where where-clause

The set command applies tags to a set of streams identified by a where-clause. set-list is a comma-separated list of tag names and values, e.g.

smap> set Metadata/NewTag = "New Value" where not has Metadata/NewTag

Unless Giles is configured to ignore API keys, a set command will only apply tags to streams that match the where clause AND have the same API key as the query invoker.

Delete Query

delete tag-list where where-clause
delete where where-clause

Currently, Giles only supports delete queries on metadata, not timeseries data. A delete query is applied to all documents that match the provided where-clause. tag-list is a comma-separated list of tag names. If provided, the delete query will remove those tags from all matched documents. If tag-list is omitted, the delete query will remove every document that matches the where clause.

Example of removing tags from a set of documents

smap> delete Metadata/System, Metadata/OtherTag where Metadata/System = "Botched Value";

Example of removing set of documents

smap> delete where Path like "/oldsensordeployment/.*"

Republish

Giles provides near real-time access to data as it arrives at the archiver. This is called republish in sMAP parlance, and is a variation of content-based pub-sub. A client registers a subscription with the archiver using a where clause. Following subscription, the archiver forwards to the client all data on streams that match the provided where clause. If the metadata for a stream changes, the set of matching streams is updated for each affected subscription.

HTTP-based republish is initiated by a client sending a POST request containing a where clause to the /republish resource on the archiver. This connection is kept open by the archiver, and real-time data from the subscription is forwarded to the client for as long as the connection is kept open.

Here is an example of republish using cURL, subscribing to all temperature sensors (for a local archiver)

$ curl -XPOST -d "Metadata/Type = 'Sensor' and Metadata/Sensor = 'Temperature'" http://localhost:8079/republish
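The same subscription can be consumed from plain Python using the requests library's streaming support (a sketch; the URL assumes a local archiver, and the exact framing of forwarded messages may vary between Giles versions):

```python
import requests

# Assumed local archiver URL; the where clause matches the curl example above.
REPUBLISH_URL = "http://localhost:8079/republish"
WHERE = "Metadata/Type = 'Sensor' and Metadata/Sensor = 'Temperature'"

def subscribe(url=REPUBLISH_URL, where=WHERE):
    """Hold the POST connection open and print forwarded data as it arrives."""
    with requests.post(url, data=where, stream=True) as resp:
        for line in resp.iter_lines():
            if line:
                print(line.decode())

# subscribe()  # blocks for as long as the connection stays open
```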

The Python sMAP library provides a nice helper class for doing republish from Python. It uses the Python Twisted library for asynchronous networking support:

from twisted.internet import reactor
from smap.archiver.client import RepublishClient

archiverurl = 'http://localhost:8079'

# called every time we receive a new data point
# (Python 2 print statements: the smap client library targets Python 2)
def callback(uuids, data):
    print 'uuids', uuids
    print 'data', data

query = "Metadata/Type = 'Sensor' and Metadata/Sensor = 'Temperature'"
r = RepublishClient(archiverurl, callback, restrict=query)
r.connect()

reactor.run()

Republish is also available over WebSockets. If the WebSocket interface is enabled on Giles, then a client can open up a WebSocket-based subscription by opening a WebSocket to ws://localhost:8078/republish (for a local archiver), and then sending the where clause as a message. Here is an example in Python

from ws4py.client.threadedclient import WebSocketClient

class DummyClient(WebSocketClient):
    def opened(self):

        self.send("Metadata/Type = 'Sensor' and Metadata/Sensor = 'Temperature'")

    def closed(self, code, reason=None):
        print("Closed down", code, reason)

    def received_message(self, m):
        print(m)

try:
    ws = DummyClient('ws://localhost:8078/republish')
    ws.connect()
    ws.run_forever()
except KeyboardInterrupt:
    ws.close()

These are just Python examples. Since republish is built on standard web technologies, any language or library that correctly implements HTTP or WebSockets can interface with republish (or any other feature of the archiver).

Data Publication

The majority of data is published to the sMAP archiver through the instantiation and execution of a sMAP driver, but this is not the only way to send data to the archiver. Indeed, for some of the newer features supported by Giles but not yet by the Python sMAP client library distribution, alternative methods of data publication are the only way to use that functionality.

Illustrated here are the JSON versions of sMAP objects; translations of these exist for the other non-JSON/HTTP interfaces.

{
    "/sensor0": {                           // At the top level of a sMAP object is the Path
        "Metadata": {                       // Metadata describes attributes of the data source, and is specified
            "Location": {                   //  as nested dictionaries. Here, "Berkeley" is under the key
                "City": "Berkeley"          //  Metadata/Location/City
            },
            "SourceName": "Test Source"     // Metadata/SourceName is how the plotter identifies a stream of data
        },
        "Properties": {                         // Properties describe attributes of the stream. These MUST be
            "Timezone": "America/Los_Angeles",  //  kept consistent, because they affect how the stream is stored.
            "ReadingType": "double",        // If a "numeric" stream, designates the class of number permitted
            "UnitofMeasure": "Watt",        // Units of measure for the stream
            "UnitofTime": "ms",             // Units of time used in the timestamp
            "StreamType": "numeric"         // Describes type of data in Readings: "numeric" or "object"
        },
        "Readings": [                       // This is an array of (timestamp, value) tuples. Timestamps should be
            [                               //  consistent with Properties/UnitofTime
                1351043674000,              // A timestamp
                0                           // A numeric value
            ],
            [                               // Readings can contain more than one tuple
                1351043675000,
                1
            ]
        ],
        "uuid": "d24325e6-1d7d-11e2-ad69-a7c2fa8dba61" // The globally unique identifier for this stream
    }
}

Each sMAP object sent to the archiver MUST contain at least the top-level Path, which contains a dictionary with the Readings and uuid keys, e.g.

{
    "/sensor0": {
        "Readings": [
            [
                1351043674000,
                0
            ]
        ],
        "uuid": "d24325e6-1d7d-11e2-ad69-a7c2fa8dba61"
    }
}

Typically, the first object sent to the archiver for a new stream "initializes" the stream by sending all of the Metadata and Properties at once; subsequent updates to the Readings use only the minimal object above. Metadata and Properties can be changed at any time by including the updated keys/values in a sent object, much like the "initial" object. For example, to update the Metadata for the above stream and change the city from Berkeley to Mendocino, we would send the following object

{
    "/sensor0": {
        "Metadata": {
            "Location": {
                "City": "Mendocino"
            }
        },
        "Readings": [
            [
                1351043674000,
                0
            ]
        ],
        "uuid": "d24325e6-1d7d-11e2-ad69-a7c2fa8dba61"
    }
}

For the HTTP interface, each of these JSON objects would be sent as the body of an HTTP POST request sent to the /add/<key> resource of a running archiver.

In Python, using the requests library, this would look like

import requests
import json

archiverurl = "http://localhost:8079/add/apikey"
smapMsg = {
    "/sensor0": {
        "Readings": [
            [
                1351043674000,
                0
            ]
        ],
        "uuid": "d24325e6-1d7d-11e2-ad69-a7c2fa8dba61"
    }
}
resp = requests.post(archiverurl, data=json.dumps(smapMsg),
                     headers={"Content-Type": "application/json"})
resp.raise_for_status()  # raises if the archiver rejected the message

It is good practice to include the Content-Type: application/json HTTP header, though many libraries will add this automatically.

Publishing Objects

Recently introduced to Giles is the ability to archive and subscribe to non-numeric data. This should be considered an extremely alpha feature, and potentially buggy.

Usage is very similar to the normal numeric interface, with two exceptions.

Firstly, all object-based streams (rather than numeric streams) must have Properties/StreamType = "object" set. A stream cannot publish both objects and numeric data (unless that numeric data is transmitted as an object) because the archiver needs to track which data store to query data from. Streams should be made numeric-only wherever possible, because this enables a much richer set of operations and queries upon the data.

Secondly, the Readings portion of a sMAP message, instead of only having numbers as the second element of each 2-tuple reading, can now contain any JSON-serializable data. This means:

For object storage, Giles encodes each object as a MsgPack-encoded binary string. Giles places no restrictions on the consistency of objects, so an individual stream is not limited to pushing only arrays or only strings, but can vary the data type.

Here is an example of a stream that pushes arrays

{
    "/sensor0": {
        "Metadata": {
            "Location": {
                "City": "Berkeley"
            },
            "SourceName": "Test Source"
        },
        "Properties": {
            "Timezone": "America/Los_Angeles",
            "ReadingType": "double",
            "UnitofMeasure": "Watt",
            "UnitofTime": "ms",
            "StreamType": "object"          // This denotes this timeseries as an object-stream
        },
        "Readings": [
            [
                1351043674000,              // A timestamp
                [1,2,3]                     // A vector value (JSON serializable)
            ],
            [                               // Readings can still contain more than one object
                1351043675000,
                ["a","b","c"]               // Object types do not have to be consistent
            ]
        ],
        "uuid": "d24325e6-1d7d-11e2-ad69-a7c2fa8dba61" // The globally unique identifier for this stream
    }
}