Title: Efficient time-series storage for enabling IoT device intelligence
Subject: Computer science
Level: Basic
Description: Background:
During the last few years, cloud-enabled services have developed at a staggering pace, attracting growing attention and reaching increasing maturity. Even the popular press publishes articles every week on how Artificial Intelligence will change many aspects of our everyday life. Intelligent devices, the so-called Internet-of-Things (IoT) devices, are often mentioned in the same context. However, developing intelligent systems and devices often requires huge amounts of data from many different sources. For example, Facebook has billions of users whose daily actions are an important driving force in steering the future development of the services Facebook provides. Industrial systems do not have such easy access to vast amounts of data, nor the same ability to transfer this data to the cloud for processing and storage. They are often installed at remote sites with less-than-ideal connectivity options. Still, providers of industrial systems also want to develop the same kind of intelligence for their systems and devices. One big difference, though, is that industrial systems typically need to collect data at millisecond or even nanosecond sampling intervals, while Facebook and many others work with one second as the finest resolution.

One important source of data for developing machine intelligence is time-series data, and there are already many open-source time-series databases (TSDBs) that focus on this particular problem. One reason for having dedicated databases for time-series data is the need to compress the data in order to increase the amount of data stored and/or the frequency at which it is stored; traditional databases are simply too inefficient in this respect. Typically, each sample in a time series consists of a timestamp (8 bytes) and a value (typically 8 bytes), i.e., 16 bytes per sample. In a typical case, current open-source TSDBs can reduce the storage need from 16 bytes per sample to 1.3 bytes per sample, i.e., more than a tenfold reduction. However, there is one caveat. Many of the open-source TSDBs focus on time-series data generated from machine logs in large datacenters, where the sampling interval is at least 1 second. Consequently, their implementations are optimized for that particular use case and cannot be used directly in an industrial context. (Facebook has an open-source TSDB here: https://github.com/facebookincubator/beringei; an extensive list of existing TSDBs can be found here: https://misfra.me/2016/04/09/tsdb-list/ )
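To make the storage arithmetic concrete: going from 16 to 1.3 bytes per sample is roughly a 12x reduction (16 / 1.3 ≈ 12.3), and how close a TSDB gets to that depends heavily on how regular the timestamps and values are. The C sketch below estimates compressed size using the delta-of-delta timestamp encoding and XOR value encoding described for Facebook's Gorilla TSDB, the design behind Beringei. It is a simplified model under stated assumptions: it only counts bits instead of writing a bitstream, always pays the full leading/trailing-zero header when a value changes, and the 64-bit fallback bucket for very large timestamp jumps is a hypothetical extension, not part of Gorilla. All function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bits for one timestamp delta-of-delta, using Gorilla-style
   variable-length buckets (control prefix + payload). */
static unsigned dod_bits(int64_t dod) {
    if (dod == 0) return 1;                          /* '0'               */
    if (dod >= -63 && dod <= 64) return 2 + 7;       /* '10'   + 7 bits   */
    if (dod >= -255 && dod <= 256) return 3 + 9;     /* '110'  + 9 bits   */
    if (dod >= -2047 && dod <= 2048) return 4 + 12;  /* '1110' + 12 bits  */
    if (dod >= INT32_MIN && dod <= INT32_MAX) return 4 + 32; /* '1111' + 32 bits */
    return 4 + 64;  /* hypothetical fallback for very large jumps */
}

static unsigned leading_zeros(uint64_t x) {
    unsigned n = 0;
    for (uint64_t mask = UINT64_C(1) << 63; mask != 0 && (x & mask) == 0; mask >>= 1) n++;
    return n;
}

static unsigned trailing_zeros(uint64_t x) {
    unsigned n = 0;
    while (n < 64 && (x & (UINT64_C(1) << n)) == 0) n++;
    return n;
}

/* Bits for one value: XOR with the previous value and keep only the
   "meaningful" bits between the leading and trailing zeros. Simplified:
   always pays the full 5-bit + 6-bit header when the value changes. */
static unsigned value_bits(double prev, double cur) {
    uint64_t a, b;
    memcpy(&a, &prev, sizeof a);  /* reinterpret the IEEE 754 bit patterns */
    memcpy(&b, &cur, sizeof b);
    uint64_t x = a ^ b;
    if (x == 0) return 1;                        /* '0': value unchanged    */
    unsigned lead = leading_zeros(x);
    if (lead > 31) lead = 31;                    /* count must fit in 5 bits */
    unsigned meaningful = 64 - lead - trailing_zeros(x);
    return 2 + 5 + 6 + meaningful;               /* '11' + lead + len + bits */
}

/* Estimated compressed size in bits of n >= 1 samples. */
static uint64_t estimate_bits(const int64_t *ts, const double *vals, size_t n) {
    uint64_t bits = 64 + 64;  /* first timestamp and value stored uncompressed */
    int64_t prev_delta = 0;
    for (size_t i = 1; i < n; i++) {
        int64_t delta = ts[i] - ts[i - 1];
        bits += dod_bits(delta - prev_delta);
        bits += value_bits(vals[i - 1], vals[i]);
        prev_delta = delta;
    }
    return bits;
}
```

For 1,000 samples taken exactly every millisecond with a constant value, this model gives 2,134 bits, or about 0.27 bytes per sample: after the second sample, each sample costs only 2 bits. Jitter in the sampling interval or noisy values move the cost per sample up quickly.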

This thesis project will be conducted at ABB Force Measurement in Västerås, Sweden. At Force Measurement, we develop, produce, and sell advanced measurement equipment and control systems for use in the process industry. Several of our products are used in cold-rolling mills, hot-rolling mills, paper production lines, and even cruise ships. This thesis project could become an important piece of the puzzle in enabling further intelligence in our products.

Goals, problems:
The first phase of the project is to benchmark a selected set of existing open-source time-series databases to determine their suitability for use in an industrial context. Based on the benchmarking and a study of the features of these TSDB solutions, we expect to find a gap between what they offer and what industry needs. The second phase of the project is concerned with proposing changes to one of the open-source TSDBs, implementing the proposed changes, and finally benchmarking the implementation to see whether the goals have been met.

The proposed solution also needs to consider how data is gathered at an industrial site and transferred to the cloud.
More specifically, there will most likely be a need for an in-memory TSDB with the ability to persist data to disk.
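In its simplest form, an in-memory store with persistence to disk could keep recent samples in a fixed-size memory buffer and append them to a file when the buffer fills. The C sketch below illustrates only that idea, assuming uncompressed 16-byte samples; the type and function names, the buffer size, and the raw binary file layout are all hypothetical and not taken from any existing TSDB.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef struct {
    int64_t ts;    /* timestamp, e.g. nanoseconds since epoch (assumed) */
    double value;  /* sampled measurement */
} sample_t;

#define BUF_CAP 4096  /* hypothetical in-memory capacity, in samples */

typedef struct {
    sample_t buf[BUF_CAP];  /* most recent samples, kept in memory */
    size_t len;             /* number of buffered samples */
    FILE *out;              /* append-only file on disk */
} mem_series_t;

/* Write all buffered samples to disk and clear the buffer.
   Returns 0 on success, -1 on a write error. */
static int series_flush(mem_series_t *s) {
    if (s->len == 0) return 0;
    if (fwrite(s->buf, sizeof(sample_t), s->len, s->out) != s->len) return -1;
    if (fflush(s->out) != 0) return -1;
    s->len = 0;
    return 0;
}

/* Store a sample in memory, persisting the buffer first if it is full. */
static int series_append(mem_series_t *s, int64_t ts, double value) {
    if (s->len == BUF_CAP && series_flush(s) != 0) return -1;
    s->buf[s->len++] = (sample_t){ ts, value };
    return 0;
}
```

A caller would set `out` to a file opened in binary append mode (for example with `fopen(path, "ab")`), call `series_append` for each sample, and call `series_flush` before shutting down. Writing whole structs ties the file format to the platform's struct layout, which is acceptable for a sketch but not for data that is later transferred to the cloud.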

Expected outcomes:
Thesis report documenting:
- Results of the benchmarking, and the selection criteria used to evaluate the set of open-source TSDBs
- Proposed design changes to at least one existing TSDB
Prototype implementation:
- Extend one open-source TSDB so that it becomes useful in an industrial context
- If time permits, also look into how this data can be transferred to the cloud for further processing and storage
Start date:
End date:
Prerequisites: Knowledge of embedded systems and programming in C/C++. Understanding and knowledge of cloud offerings, such as those from Microsoft Azure, is a merit.
IDT supervisors: Dag Nyström
Examiner: Dag Nyström
Comments: Preferably two students at Bachelor level, but open for discussion.

Applications for the thesis must be submitted through the ABB Job Offerings portal. Link to be provided later.
Company contact: ABB Force Measurement, Markus Lindgren, markus.r.lindgren@se.abb.com