Introduction
In the context of the United Nations Sustainable Development Goals (SDGs), monitoring and analyzing data is crucial for assessing progress and identifying areas that require intervention. However, traditional data analysis methods often involve complex SQL queries and technical expertise, limiting accessibility for non-technical users.
This case study demonstrates how OpenAI's GPT-3.5 Turbo model, combined with a custom database schema and LlamaIndex indexing, enables natural language querying of SDG data, so that non-technical users can access and analyze relevant information without writing SQL.
Problem Statement
Assigning the task of data analysis to non-technical users can present challenges, particularly when dealing with complex data sets and intricate SQL queries. Traditional methods often require users to possess technical expertise, limiting accessibility and hindering effective data exploration.
Solution
To address this challenge, we employed OpenAI's GPT-3.5 Turbo, a large language model, to translate natural language queries into executable SQL statements. Users can pose questions in plain English, with no SQL proficiency required.
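The translation step can be sketched as follows. This is a minimal illustration, not the case study's actual code: the `sdg_indicators` table, the prompt wording, and the `complete` callable (a stand-in for whatever function sends a prompt to the language model and returns its text) are all assumptions introduced here. The sketch also rejects anything other than a single SELECT statement, a precaution the original text does not mention.

```python
import sqlite3
from typing import Callable

# Hypothetical schema for illustration; the case study's real schema is not shown.
SCHEMA = """CREATE TABLE sdg_indicators (
    country TEXT,
    goal INTEGER,
    indicator TEXT,
    year INTEGER,
    value REAL
);"""


def build_prompt(schema: str, question: str) -> str:
    """Combine the schema and the user's question into a text-to-SQL prompt."""
    return (
        "Translate the question into a single SQLite SELECT statement.\n"
        f"Schema:\n{schema}\n"
        f"Question: {question}\n"
        "SQL:"
    )


def run_question(
    conn: sqlite3.Connection,
    question: str,
    complete: Callable[[str], str],
) -> list:
    """Ask the model for SQL, refuse anything that is not one SELECT, then run it."""
    # `complete` is an assumed wrapper around the model call (prompt in, text out).
    sql = complete(build_prompt(SCHEMA, question)).strip().rstrip(";")
    if not sql.lower().startswith("select") or ";" in sql:
        raise ValueError(f"refusing to run statement: {sql!r}")
    return conn.execute(sql).fetchall()
```

Passing the model call in as a plain callable keeps the sketch independent of any particular client library and makes it easy to exercise with a canned response before connecting a real model.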
Technical Implementation
- Data Preparation: Data from various sources was integrated into a unified database with a customized schema tailored to the specific SDG indicators under consideration.
- Model Training: OpenAI's GPT-3.5 Turbo model was paired with the Retrieval-Augmented Generation (RAG) approach. RAG does not change the model's weights; instead, our custom database schema is retrieved and passed to the model as context, which enhances its understanding of the data structure and relationships.
- Indexing: LlamaIndex, specifically its TreeIndex, was used to index the custom database schema efficiently, and LlamaIndex querying was used to retrieve relevant data during query execution.
- Front-end Development: Streamlit, a Python-based framework, was used to develop a user-friendly front-end application. This interface allows users to enter natural language queries. The user query, along with contextual data, is then processed by the GPT-3.5 Turbo model, translated into an SQL query, and executed against the database.
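The retrieval idea behind the indexing step can be illustrated without the library itself. The sketch below is not LlamaIndex's TreeIndex; it swaps in plain keyword overlap as a stand-in, simply to show how a question might be matched to the relevant parts of a schema before the model is prompted. The table names and descriptions in `SCHEMA_DOCS` are hypothetical, as is the `retrieve_tables` helper.

```python
import re

# Hypothetical table descriptions standing in for the indexed schema documents.
SCHEMA_DOCS = {
    "sdg_indicators": "indicator values per country, goal and year",
    "countries": "country names, regions and income groups",
    "goals": "goal numbers, titles and target descriptions",
}


def tokens(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve_tables(question: str, k: int = 2) -> list[str]:
    """Return up to k table names whose name and description share words with the question."""
    question_words = tokens(question)
    scored = [
        (len(question_words & tokens(name.replace("_", " ") + " " + desc)), name)
        for name, desc in SCHEMA_DOCS.items()
    ]
    # Highest overlap first; ties are broken alphabetically for stable output.
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [name for score, name in ranked if score > 0][:k]
```

Only the tables returned by the retrieval step would then be included in the prompt, which keeps the context short when the schema is large. A real index would likely match on meaning rather than exact words, so this stand-in misses simple variants such as "region" versus "regions".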