The Data Scientist

Data architecture and its significance

In a world driven by data and digital transformation, it is vital that businesses set up the right infrastructure, data strategy and solutions early on. This is where data architecture comes into play.

What is Data Architecture?

Data architecture is a key component of gathering and analysing your business data. Once in place, it allows you to manage the volume of data you handle effectively. It is defined as “a set of rules, policies, and models that determine what kind of data gets collected, and how it gets used, processed and stored within a database system.” (Keith D. Foote)

The Data Management Body of Knowledge (DMBOK) describes data architecture as “specifications used to describe existing state, define data requirements, guide data integration, and control data assets as put forth in a data strategy.”
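To make these definitions a little more concrete, here is a minimal sketch in Python of what "rules that determine what kind of data gets collected and how it gets stored" could look like when written down as code. Everything in it is hypothetical and for illustration only: the `DataPolicy` class, the `apply_policy` function, the field names and the `events_table` label are assumptions, not part of any definition or tool mentioned in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataPolicy:
    """Hypothetical rule set: which fields are collected and where they go."""
    allowed_fields: frozenset   # fields the business has agreed to collect
    required_fields: frozenset  # fields every record must contain
    storage_target: str         # illustrative label, not a real system


def apply_policy(record: dict, policy: DataPolicy) -> dict:
    """Reject records missing required fields, then drop disallowed fields."""
    missing = policy.required_fields - set(record)
    if missing:
        raise ValueError(f"missing required fields: {sorted(missing)}")
    kept = {k: v for k, v in record.items() if k in policy.allowed_fields}
    return {"target": policy.storage_target, "data": kept}


# Example: the "ip" field is not in the policy, so it is not collected.
policy = DataPolicy(
    allowed_fields=frozenset({"user_id", "event", "timestamp"}),
    required_fields=frozenset({"user_id", "event"}),
    storage_target="events_table",
)
result = apply_policy({"user_id": 1, "event": "login", "ip": "10.0.0.1"}, policy)
```

A real data architecture covers far more than this (models, integration, governance), but the sketch shows the basic idea: the rules are decided once and applied consistently, rather than left to each system that touches the data.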

In the early stages of a company, choosing a data architecture platform, or even changing from one to another, may not seem like a big deal. However, if the change is made at a later stage, the ramifications can be huge, costing the business both time and money.

Slack's Data Architecture Migration

One example of a business migrating its data architecture is the world-leading communications company Slack. It spent three years (from 2017 to 2020) migrating its data architecture from active-active clusters to Vitess, a database clustering system for MySQL (Slack Engineering).

Slack highlights that setting up the new infrastructure was not an easy process and required a lot of time and effort. This included altering their “operational processes for provisioning deployments, service discovery, backup/restore, topology management, credentials, and more.” (Slack Engineering) Ultimately, they were completely satisfied with the end result.

This is why it is key to explore your options and think carefully, as early as possible, about which data infrastructure will work best for your business model, so that you avoid obstacles later on. Additionally, hiring a data science architect can be of great assistance.

The graph below shows Slack's migration progress and a few of its milestones over the last few years:

Source: https://slack.engineering/scaling-datastores-at-slack-with-vitess/

Data Architecture Components

Some components are vital to a truly modern data architecture. Redpoint Global sums up the five critical components as:

  1. Flexibility at scale
  2. Support for parallel and distributed processing
  3. Democratized data access
  4. Easy to use without specialized training
  5. Ability to handle all data types

Ultimately, it is important not to spend all your time overwhelming yourself trying to find the perfect data architecture. You can only act on the knowledge you have at the time, so moving at a steady pace and adjusting as you learn is the best way forward.

You can also read more about the data science architect role and how to build a data-driven culture.

Want to know more about data science? Make sure to check out The Tesseract Academy’s upcoming events and our free “What it’s like to be a data scientist” and “What’s the best way to become a data scientist”!