
The Data Scientist

Data-driven insights

Using AI Chatbots to Query Databases



Data has always been valuable to businesses, with executives, business leaders, and other stakeholders constantly seeking data-based insights. For a long time, these insights were difficult to access and required a lot of time to prepare.

But over the past few decades, the barriers to data-driven insights have fallen, one after another. Businesses have steadily become able to analyze more data, apply deeper analytics, speed up time to insights, and add more visualizations.

The latest change has been the rise of artificial intelligence (AI) chatbots, which allow users to query data using natural language. ChatGPT and other conversational AI tools use large language models (LLMs), which are trained on massive datasets, making them able to understand and respond in natural language.

Life in the data dark ages

Once upon a time, business leaders received new data insights only sporadically, perhaps as rarely as once a quarter. Data exploration required extensive training and was only possible for experts.

Eventually, faster business intelligence (BI) data analytics tools appeared, which could run queries in minutes rather than weeks. However, stakeholders mostly still needed data scientists to identify which datasets to analyze, prepare data, and determine which charts, graphs, or visualizations to set up. Waiting for the data science team to address your query could still take days.

Even experts took some time to set up and prepare a database, and to convert their query into clean SQL code. While data scientists surely appreciated the new tools that were available to them, they were often overwhelmed with requests for insights. Each request was “urgent,” leaving them constantly reprioritizing requests as more came in, and delaying their own work to respond.

Finally, in the last year or so, AI-powered data analytics tools have begun to arrive.

The AI data exploration advantage

The new wave of data exploration platforms leverages AI to operate as chatbots. Any business user can ask a data-based question in natural language, without needing to know how the data is structured or how to write SQL or any other code. This democratizes access, eases the burden on data science teams, and speeds up time to insights.

Chatbots can help the user formulate their query more effectively, sometimes even suggesting the best visualization, chart, or graph for a given situation. This guidance can improve the results, point the user to new and better lines of inquiry, and optimize queries for performance, bringing faster responses as well as more useful answers. According to a BARC survey, faster time to insight and a reduced workload are the top anticipated benefits of bringing GenAI into BI, cited by 49% and 48% of respondents respectively.

AI chatbot data exploration delivers answers that are explained more clearly and are easier to understand. For data analysts, this means help with communicating results in ways that non-tech experts can grasp, while non-technical users receive responses that they can understand quickly, along with contextual explanations that simplify complex data concepts.

Additionally, AI-powered data platforms assist with data prep tasks, automating repetitive data entry and cleansing to reduce the risk of errors and free data scientists for more strategic tasks. They can map data from different sources, open up new data possibilities, and identify and rectify data inconsistencies to deliver higher data quality.
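To give a rough idea of what "identifying and rectifying inconsistencies" can mean in practice, here is a minimal Python sketch using only the standard library. The sample records, the alias table, and the function names are all made up for illustration; real platforms rely on far more sophisticated matching than a lookup table.

```python
# Hypothetical alias table mapping inconsistent spellings to one canonical value.
COUNTRY_ALIASES = {
    "usa": "United States",
    "u.s.": "United States",
    "united states": "United States",
    "uk": "United Kingdom",
    "united kingdom": "United Kingdom",
}


def normalize_country(raw):
    """Return the canonical country name, or None if the value is unknown."""
    key = raw.strip().lower()
    return COUNTRY_ALIASES.get(key)


def clean_records(records):
    """Split records into cleaned rows and rows that need human review."""
    cleaned, needs_review = [], []
    for record in records:
        country = normalize_country(record["country"])
        if country is None:
            needs_review.append(record)
        else:
            cleaned.append({**record, "country": country})
    return cleaned, needs_review


# Illustrative sample data, not taken from any real dataset.
records = [
    {"customer": "Acme", "country": " USA "},
    {"customer": "Globex", "country": "u.s."},
    {"customer": "Initech", "country": "Untied Kingdom"},  # typo, flagged for review
]
cleaned, needs_review = clean_records(records)
```

The two recognizable spellings are unified, while the unrecognized value is set aside for a person to check rather than being silently guessed.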

AI can generate the code snippets needed for data discovery and database setup, operations, and management. This saves time for data scientists, but also enables non-techies to carry out these tasks independently for the first time, whereas previously they might have struggled to manage a dashboard that was already set up for them. All these innovations translate into real-time insights, and fewer, or even zero, delays in gathering, uploading, and preparing data for querying.

Different approaches to AI database queries

As AI data exploration tools become available, it’s not surprising to see a range of different approaches.

Databricks Assistant

Databricks recently launched the Databricks Assistant, described as a context-aware AI assistant that translates your natural-language data questions into clean SQL queries. Once you describe your task in the text window, the Assistant creates SQL code that you can run on your own database.

Databricks Assistant doesn’t access your data itself. Instead, it leverages metadata to understand your data assets so that it can help you generate the best code for your needs.
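The general idea of working from metadata rather than from the data itself can be sketched in a few lines of Python. This is not Databricks' implementation: it uses SQLite from the standard library as a stand-in database, and the function names `describe_schema` and `build_prompt`, the prompt wording, and the sample table are all assumptions made for illustration. The call to an actual LLM is deliberately left out.

```python
import sqlite3


def describe_schema(conn):
    """Collect table and column names (metadata only, no row values)."""
    lines = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        # Each PRAGMA row is (cid, name, type, notnull, dflt_value, pk).
        cols = ", ".join(f"{col[1]} {col[2]}" for col in columns)
        lines.append(f"{table}({cols})")
    return "\n".join(lines)


def build_prompt(question, schema):
    """Assemble a hypothetical prompt; sending it to an LLM is out of scope here."""
    return (
        "Write one SQL query that answers the question.\n"
        f"Schema:\n{schema}\n"
        f"Question: {question}"
    )


# Illustrative in-memory database with one made-up table and row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, total REAL)")
conn.execute("INSERT INTO orders VALUES (1, 'Acme Corp', 199.0)")

schema = describe_schema(conn)
prompt = build_prompt("What is the total revenue per customer?", schema)
```

The prompt contains the table and column names, but none of the stored values such as the customer name, which is the point of the metadata-only approach.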

Databricks Assistant is aimed at more experienced users, who may not be data scientists themselves, but are already familiar with using code to query data. Explanations, visualizations and reports need to be generated elsewhere.

Pyramid Analytics

Pyramid Analytics is a GenAI-native platform that scans your data to create contextual metadata that it shares with the LLM of your choosing. As a non-SaaS tool, it doesn’t import or cache your data to any Pyramid servers or apps. It just sits as a layer on top of your data repository, so your data is never exposed to any LLM, which limits the risk of introducing new vulnerabilities.

The Pyramid platform uses LLMs to produce the most appropriate query for your purposes, and then runs the query for you on your linked data sources. It creates a full range of dynamic, manipulatable reports, charts, and dashboards, automatically selecting the best visuals for your needs, and then allows people to refine charts, segment data, and even ask follow-up questions.

Pyramid stands out for its accessibility to every data user, including those who have no experience with databases or code, as long as they have some knowledge of the information they wish to explore. It responds to both voice and text prompts, even those that are worded vaguely.

Julius

Julius offers another GenAI-native data exploration platform, although on a smaller scale.

Unlike Pyramid and Databricks, the Julius AI data assistant uploads your data to the LLM, and then uses its GenAI models to run your queries based on natural-language text prompts. It delivers plenty of different visualizations, including fancy animations, and it automates data prep for your datasets.

Julius promises strong security policies to protect your data and keep it private. It’s more suitable for users with smaller and more evergreen datasets, since you’ll need to upload the data before you can query it, which can take a long time for large enterprise datasets.

Ana by TextQL

TextQL’s Ana is billed as a personal data assistant that sits over and navigates your full semantic layer. It integrates with your existing data sources and tools to scan, understand, and retrieve data from every location, platform, and dashboard, including forums like Slack and Teams.

Users can write natural language queries, and receive suitable reports, graphs, and visualizations in response.

Since Ana connects directly with your data, it comes with strong security guarantees. The tool anonymizes your data before using it, to preserve data privacy, but it does apply the data to fine-tune LLMs according to your needs. Ana is intended for the most basic and non-technical data user, but some find the interface to be a bit confusing.

The challenges of AI-powered data exploration

There remain a number of challenges and drawbacks to the use of AI chatbots for data queries. For a start, limited context awareness could result in inaccurate responses, and the model might also hold hidden biases.

Natural language queries bring their own potential for confusion. The LLM might interpret ambiguous or unclear language incorrectly. Conversational queries could be translated into complex SQL statements which are very heavy to process, while a data scientist faced with the same question might have formulated a far simpler SQL statement that would have produced the same answers with fewer compute needs.
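To make the point about heavy versus simple SQL concrete, here is a small Python sketch using SQLite from the standard library. The table, the sample rows, and both queries are invented for illustration; they are not output from any particular chatbot. Both statements answer "what is the total per region?", but the first takes a more roundabout route.

```python
import sqlite3

# Illustrative in-memory table with made-up sales figures.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("North", 100.0), ("North", 50.0), ("South", 70.0)],
)

# A roundabout formulation: the correlated subquery is conceptually
# re-evaluated for each row, and DISTINCT then removes the duplicates.
verbose_sql = """
    SELECT DISTINCT s.region,
           (SELECT SUM(t.amount) FROM sales t WHERE t.region = s.region) AS total
    FROM sales s
    ORDER BY s.region
"""

# The simpler formulation an analyst might write: one grouped pass.
simple_sql = """
    SELECT region, SUM(amount) AS total
    FROM sales
    GROUP BY region
    ORDER BY region
"""

verbose_rows = conn.execute(verbose_sql).fetchall()
simple_rows = conn.execute(simple_sql).fetchall()
```

On three rows the difference is invisible, but on a large table the first form can cost considerably more compute while returning exactly the same rows.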

Security risks remain a serious concern. A Metomic survey reported that 72% of CISOs worry about security breaches connected to GenAI, and for good reason. Natural language interfaces are vulnerable to XSS and other types of attacks that aren’t usually a threat to databases. A crafted or careless natural language prompt could lead the model to generate SQL that carries malicious code, in effect a SQL injection, or trigger a response that inadvertently exposes sensitive data.
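One common mitigation is to treat model-generated SQL as untrusted input and check it before it runs. The Python sketch below, again using SQLite from the standard library, is only a rough illustration under stated assumptions: the function names `is_single_select` and `run_generated_sql` are hypothetical, the check is deliberately crude (it would also reject a harmless query with a semicolon inside a string literal), and it is no substitute for proper database permissions.

```python
import sqlite3


def is_single_select(sql):
    """Rough check: exactly one statement, and it starts with SELECT."""
    statement = sql.strip().rstrip(";").strip()
    if ";" in statement:
        return False  # more than one statement, e.g. "SELECT 1; DROP TABLE x"
    return statement.lower().startswith("select")


def run_generated_sql(conn, sql):
    """Run model-generated SQL only if it passes the check."""
    if not is_single_select(sql):
        raise ValueError("Only single SELECT statements are allowed")
    return conn.execute(sql).fetchall()


# Illustrative in-memory database with one made-up row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('Ada')")
conn.commit()
# Second line of defense: SQLite's query_only pragma rejects writes on this connection.
conn.execute("PRAGMA query_only = ON")

rows = run_generated_sql(conn, "SELECT name FROM users")

# A generated statement that smuggles in a second command is refused.
try:
    run_generated_sql(conn, "SELECT name FROM users; DROP TABLE users")
    blocked = False
except ValueError:
    blocked = True
```

Layering a text-level check on top of a read-only connection means a single missed case in one layer does not automatically become a destructive query.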

AI can bring a new era of data insights

AI-powered data exploration platforms have definitively arrived at the desks of enterprises everywhere. Business users who aren’t data science experts are finally able to independently process data and run queries, and receive responses in minutes, without waiting for their data science teams. While there remain issues that need to be addressed, the use of AI for data investigation promises a new equality for data-based decision-making.

