Features and Roadmap

High-performance Database

  • Columnar in-memory engine for extremely high throughput and low latency

  • Columnar hybrid (in-memory and disk-based) engine delivers fast performance for data warehouses with vast amounts of data

  • Flexible partition schemes: value, range, list, hash and composite partitions

  • Support millions of partitions for a table

  • In-database analytics: complex computations run inside the database, which significantly reduces the time spent on data transfer.

  • Native support for processing time series data with up to nanosecond precision

  • Standard SQL with enhancements, such as panel data processing, bi-temporal joins (asof join, window join), window functions, pivoting, composite columns, etc.

  • Table co-location for fast joins

  • Support data compression

  • Support adding table columns dynamically
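
The asof join named in the SQL enhancements above can be illustrated with a short sketch. This is plain Python, not the database's own syntax or implementation, and the names asof_join, trades, and quotes are made up for the example. It assumes the common definition of an asof join: each left row is matched with the most recent right row whose timestamp is at or before the left row's timestamp.

```python
from bisect import bisect_right


def asof_join(left, right):
    """Join two time-sorted lists of (time, value) pairs.

    For each left row, attach the value of the latest right row whose
    time is <= the left row's time, or None when no such row exists.
    Illustrative only: this is an assumption about asof-join semantics,
    not the database's implementation.
    """
    right_times = [t for t, _ in right]
    joined = []
    for t, left_value in left:
        # Number of right rows with time <= t.
        i = bisect_right(right_times, t)
        right_value = right[i - 1][1] if i > 0 else None
        joined.append((t, left_value, right_value))
    return joined


# Hypothetical sample data: trade times and quote times do not line up.
trades = [(1, "T0"), (5, "T1"), (7, "T2"), (12, "T3")]
quotes = [(2, 100.0), (5, 100.5), (10, 101.0)]
result = asof_join(trades, quotes)
```

Here the trade at time 1 has no earlier quote and gets None, while the trade at time 7 picks up the quote from time 5. A window join generalizes this by matching every right row inside a time window around each left row instead of only the latest one.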

Programming Language

  • Highly expressive. Support imperative programming, functional programming, vector programming, SQL programming, and RPC (remote procedure call) programming.

  • Easy to learn. The syntax is very similar to SQL and Python.

  • About 600 built-in functions for various data types (number, temporal, string), data structures (vector, matrix, set, dictionary, table), and system calls (file, database, distributed computing).

  • Extensible through user-defined functions and plugins

Distributed Computing

  • High-speed distributed computing through an in-memory engine, data localization, fine-grained data partitioning, and parallel computing.

  • Offer various built-in computing models such as pipeline, map-reduce, and iterative computing.

  • Provide snapshot isolation for computations over distributed, dynamically changing data

  • Boost system throughput by sharing data copies in memory among multiple jobs

  • Efficient programming for distributed computing. A script written on one node can run on the entire cluster immediately, with no compilation or deployment step.

  • Automatic data replica management for load balancing and fault tolerance, backed by an embedded distributed file system

  • Convenient horizontal scaling of both storage capacity and computing capacity
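
The map-reduce model combined with data partitioning can be sketched as follows. This is a single-process Python illustration under stated assumptions, not the system's API: threads stand in for cluster nodes, and map_partition, combine, and distributed_mean are hypothetical names. Each partition is reduced locally to a small partial result, so only those partials, not the raw rows, need to move between workers.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_partition(rows):
    """Map step: summarize one partition as (sum, count)."""
    return (sum(rows), len(rows))


def combine(a, b):
    """Reduce step: merge two partial (sum, count) results."""
    return (a[0] + b[0], a[1] + b[1])


def distributed_mean(partitions):
    """Compute a global mean from per-partition partial results.

    Threads stand in for cluster nodes here (an assumption made for
    the sake of a runnable example).
    """
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(map_partition, partitions))
    total, count = reduce(combine, partials, (0, 0))
    return total / count if count else None


# Hypothetical data split into three partitions of unequal size.
partitions = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
mean = distributed_mean(partitions)
```

Carrying (sum, count) pairs rather than per-partition means keeps the result exact when partitions differ in size.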

Real-time Data Streaming

  • Adopt a publish/subscribe framework. Support chained subscriptions.

  • First-class support for stream-table duality. Publishing a message is equivalent to inserting a row into a table. SQL queries can be used on local or distributed streaming data.

  • Deliver messages with sub-millisecond latency

  • Update the historical data warehouse with live data at sub-second delay.

  • Replay historical messages from an arbitrary offset.

  • Provide configurable building blocks (e.g. partition, worker, queue) for traffic control and performance tuning
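
Stream-table duality, chained subscription, and replay from an offset can be pictured with a toy in-process model. The StreamTable class and its methods below are hypothetical names invented for this sketch and do not reflect the product's API; the sketch also ignores networking, persistence, and concurrency.

```python
class StreamTable:
    """Toy model: a stream is an append-only table of rows."""

    def __init__(self):
        self.rows = []       # the table view of the stream
        self.handlers = []   # live subscribers

    def append(self, row):
        """Publishing a message is the same as inserting a row."""
        self.rows.append(row)
        for handler in self.handlers:
            handler(row)

    def subscribe(self, handler, offset=None):
        """Register a subscriber.

        offset=None delivers only new rows; offset=k first replays
        rows[k:] and then continues with live rows.
        """
        if offset is not None:
            for row in self.rows[offset:]:
                handler(row)
        self.handlers.append(handler)

    def select(self, predicate):
        """Query the stream as if it were a table."""
        return [row for row in self.rows if predicate(row)]


ticks = StreamTable()
big_ticks = StreamTable()

ticks.append({"sym": "A", "qty": 50})
ticks.append({"sym": "B", "qty": 500})


def forward_large(row):
    """Chained subscription: filter ticks into a second stream."""
    if row["qty"] >= 100:
        big_ticks.append(row)


ticks.subscribe(forward_large, offset=0)   # replay from the beginning
ticks.append({"sym": "A", "qty": 300})     # delivered live

seen = []
big_ticks.subscribe(seen.append, offset=1)  # replay from the second row
ticks.append({"sym": "C", "qty": 150})      # flows through both streams

a_rows = big_ticks.select(lambda r: r["sym"] == "A")
```

Because the stream keeps its rows, a late subscriber can start from any offset, and the same object answers table-style queries.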

System Management and APIs/Plugins

  • Embedded web interface for cluster management, performance monitoring and data access.

  • System monitoring via built-in functions, web interface, or Prometheus.

  • Portable IDE for data analysis.

  • Programming APIs for C++, C#, Java, Python, R, JavaScript and Excel.

  • User access control on tables and functions

  • Schedule user-defined functions to run as recurring tasks

We are working on the following features:

  • Use just-in-time compilation to improve the performance of iterative computing.

  • Offer more built-in machine learning packages.

  • Provide support for other distributed file systems.