HomeCategoriesAll Tags

Introduction to Cube.js

Quick guide on Cube.js concepts

Introduction

  • Cube is the Semantic Layer for building data apps.
  • It helps data engineers and application developers access data from modern data stores, organize it into consistent definitions, and deliver it to every application.
  • Cube was designed to work with all SQL-enabled data sources.
  • Expose your data to every application via REST, GraphQL, and SQL APIs.
  • Cube is headless, API-first, and decoupled from visualizations.
  • Cube fully manages query orchestration, database connections, as well as caching and access control layers. The result is then sent back to the client.
  • You can get started with Cube locally or self-host it with Docker or use Cube Cloud.
  • Cube acts as a data access layer, translating API requests into SQL, managing caching, queuing, and database connection.
  • Cube accepts queries via REST, GraphQL or SQL interfaces. Based on the data model and an incoming query, Cube generates a SQL query and executes it in your database.
  • Cube fully manages query orchestration, database connections, as well as caching and access control layers. The result is then sent back to the client.

Getting Started

  • Think of Cube as a Data Schema that is used to model raw data into meaningful business definitions and pre-aggregate data for optimal results.
  • So instead of database tables think of designing Cubes in such a way that you can query them in simple language and get meaningful analytics from them.
  • It is like putting a question which Cube.js will turn into query and fetch that data.
  • Cube is conceptually similar to a view in SQL.

My Takeaway

  • Its a handy tool for cases where you need to make dynamic queries.
  • You may almost replace your BE server by cube.js for all your GET requests. It will automatically make the joins and return the requested data.
  • For security, we can simply put a auth server or serverless auth function in between client and cube.js server.
  • Make sure to normalize the SQL DB schema as much as possible. Then define the relationships between the cubes.
  • Sometimes, offering this flexibility to client to query the database for any information dynamically may lead to complex queries which can spike the load on DB, resulting in performance issues.

Technical cubejs concepts and terms:

1. Measures

  • Quantitive data, such as number of users, total amount, average price, and so on.
  • SELECT count(id) from users;
  • Measures can also be filtered and we can perform calculations like percentage or anything as pwer our need. Then the SQL query will be like
cube(`Users`, {
  measures: {
    count: {
      sql: `id`,
      type: `count`,
    },

    paying_count: {
      sql: `id`,
      type: `count`,
      filters: [{ sql: `${CUBE}.paying = 'true'` }],
    },

    paying_percentage: {
      sql: `100.0 * ${paying_count} / ${count}`,
      type: `number`,
      format: `percent`,
    },
  },
});
SELECT
  100.0 * count(
    CASE WHEN (users.paying = 'true') THEN users.id END
  ) / count(users.id) "users.paying_percentage"
FROM users

2. Dimensions

  • Categorical data like columns of table such as product name, product category, order status.

You may use Parse_Timestamp to convert time stored in string format to timestamps. Eg:

  dimensions: {
    date: {
      sql: `PARSE_TIMESTAMP('%Y-%m-%d', date)`,
      type: `time`,
    },
  },

3. Joins

  • Joins define the relationships between cubes, which then allows accessing and comparing properties from two or more cubes at the same time. In Cube, all joins are LEFT JOINs.

relationship: belongsTo, relationship: hasOne, relationship: hasMany,

  • Be careful about the direction of joins, that can have an impact on final result as explained here.

4. Segments

Segments are filters that are predefined in the schema instead of a Cube query. They allow simplifying Cube queries and make it easy to re-use common filters across a variety of queries.

cube('Orders', {
  segments: {
    only_completed: {
      sql: `${CUBE}.status = 'completed'`},
  },
});

5. Pre-Aggregations

Pre-aggregations are a powerful way of caching frequently-used, expensive queries and keeping the cache up-to-date on a periodic basis.

cube('Orders', {
  preAggregations: {
    main: {
      measures: [CUBE.count],
      dimensions: [CUBE.status],
      timeDimension: CUBE.created_at,
      granularity: 'day',
    },
  },
});

6. Data Blending

Here we create a cube by using sql’s UNION All query to get a combined data of both the tables when those tables are not joined by any relationship.

7. Dynamic Schema Creation

Create schemas on the fly using asyncModule() function.

Utility functions like convertStringPropToFunction, transformDimensions, transformMeasures as described here.

8. Tools Involved

Cube.js offers a developer playground that is a web based tool to see it in action from generating the data schema, executing queries, building dashboards. Its useful at times for dev testing and local development.

Resources to learn more

Hope this was helpful. Thanks!!

- Ayush 🙂