Leveraging OpenAI's Turbo 3.5 Model for
Natural Language Querying of
Sustainable Development Goals Data

Training OpenAI foundational model with custom SQL schema
Introduction
In the context of United Nations Sustainable Development Goals (SDGs), monitoring and analyzing data is crucial for assessing progress and identifying areas that require intervention. However, traditional data analysis methods often involve complex SQL queries and technical expertise, limiting accessibility for non-technical users.

This case study demonstrates the application of OpenAI's Turbo 3.5 model, coupled with custom database schema and Llama indexing, to enable natural language querying of SDG data, empowering non-technical users to easily access and analyze relevant information.

Problem Statement
Assigning the task of data analysis to non-technical users can present challenges, particularly when dealing with complex data sets and intricate SQL queries. Traditional methods often require users to possess technical expertise, limiting accessibility and hindering effective data exploration.

Solution
To address this challenge, we employed OpenAI's Turbo 3.5 model, a powerful language model, to translate natural language queries into executable SQL statements. This enables users to pose questions in plain English, eliminating the need for SQL proficiency.

Technical Implementation

  1. Data Preparation: Data from various sources was integrated into a unified database with a customized schema tailored to the specific SDG indicators under consideration.
  2. Model Training: OpenAI's Turbo 3.5 model was trained using the Retrieval Augmented Generation (RAG) approach, incorporating our custom database schema to enhance its understanding of the data structure and relationships.
  3. Indexing: Llama indexing, specifically TreeIndex, was utilized to efficiently index the custom database schema, and Llama Querying was used for retrieval of relevant data during query execution.
  4. Front-end Development: Streamlit, a Python-based framework, was used to develop a user-friendly front-end application. This interface allows users to input natural language queries. The user query along with contextual data are then processed by the Turbo 3.5 model, translated into SQL queries, and executed against the database.
Technical Architecture
Results
The implementation of this solution has significantly improved data accessibility and usability for non-technical users. Users can now easily formulate queries in plain English, receive meaningful insights, and monitor trends related to the SDG indicators without the need for technical expertise.

Benefits
  1. Enhanced Data Accessibility: Natural language querying empowers non-technical users to access and analyze SDG data, fostering greater involvement in data-driven decision-making.
  2. Improved Data Exploration: The solution facilitates seamless data exploration, enabling users to uncover patterns and trends with ease.
  3. Simplified Data Analysis: The translation of natural language queries into SQL eliminates the need for technical expertise, simplifying data analysis for a wider audience.

Conclusion
By leveraging OpenAI's Turbo 3.5 model, custom database schema, and Llama indexing, we have successfully developed a solution that enables natural language querying of SDG data. This approach has significantly enhanced data accessibility and usability for non-technical users, empowering them to actively participate in data-driven decision-making and contribute to the achievement of Sustainable Development Goals.

Have a project in your mind?