Data science through natural language with ChatGPT's Code Interpreter.

Published in: Translational and clinical pharmacology (2024)

Large language models (LLMs) have emerged as a powerful tool for biomedical researchers, demonstrating remarkable capabilities in understanding and generating human-like text. ChatGPT with its Code Interpreter functionality, an LLM connected with the ability to write and execute code, streamlines data analysis workflows by enabling natural language interactions. Using materials from a previously published tutorial, similar analyses can be performed through conversational interactions with the chatbot, covering data loading and exploration, model development and comparison, permutation feature importance, partial dependence plots, and additional analyses and recommendations. The findings highlight the significant potential of LLMs in assisting researchers with data analysis tasks, allowing them to focus on higher-level aspects of their work. However, there are limitations and potential concerns associated with the use of LLMs, such as the importance of critical thinking, privacy, security, and equitable access to these tools. As LLMs continue to improve and integrate with available tools, data science may experience a transformation similar to the shift from manual to automatic transmission in driving. The advancements in LLMs call for considering the future directions of data science and its education, ensuring that the benefits of these powerful tools are utilized with proper human supervision and responsibility.

Keyphrases