Introduction
The Gene Expression Omnibus (GEO) database, hosted by the National Center for Biotechnology Information (NCBI), has emerged as one of the most important repositories for transcriptomics and other omics data, owing to its vast data collection and free availability.
Today, with the rise of large language models like ChatGPT, integrating artificial intelligence tools into research workflows has become increasingly common. ChatGPT serves as a powerful coding assistant, facilitating rapid code generation, optimization, and debugging, thereby reducing technical barriers and enhancing research efficiency.
This blog post explains in detail how to use ChatGPT to download omics datasets from GEO conveniently and efficiently. It is intended for biomedical researchers, bioinformatics beginners, and scientists interested in bioinformatics data mining.

Before starting, it helps to know three GEO record types:
GEO Platform (GPL): The microarray chip or sequencing platform used to generate the data
GEO Sample (GSM): An individual sample record submitted to GEO by a user
GEO Series (GSE): A complete study that groups related samples, with a brief description of the study

An article can have one or more GSE datasets, and a GSE dataset can contain one or more GSM samples. Each sample was measured on a specific platform (GPL), so every dataset is linked to at least one GPL record.
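Because each record type has its own accession prefix, the type can be read directly from an accession number. A minimal sketch in R, assuming a helper named geo_record_type() that is purely illustrative and not part of GEOquery; the GSM and GPL numbers below are placeholders, not real records tied to any study:

```r
# Hypothetical helper: map a GEO accession to its record type by prefix.
geo_record_type <- function(accession) {
  prefix <- toupper(substr(accession, 1, 3))
  types <- c(GPL = "platform", GSM = "sample", GSE = "series")
  # Unknown prefixes yield NA
  unname(types[prefix])
}

# GSE75037 is the series used later in this post;
# "GSM123" and "GPL456" are made-up placeholders.
geo_record_type(c("GSE75037", "GSM123", "GPL456"))
```

An accession with any other prefix returns NA, which makes it easy to catch typos before requesting data.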
Preparation
Before you start downloading GEO data using ChatGPT, some essential tools and setups must be prepared. This section will guide you through quickly installing the required software and understanding ChatGPT’s role in bioinformatics analysis.
(1) Install R and required packages
First, you need to install R and standard bioinformatics packages on your computer. Follow the steps below:
Installing R
Visit the official R website to download and install the latest version of R.
Installing RStudio
It’s recommended to use RStudio, an integrated development environment (IDE) for R, for easier code writing and running. Download it freely from the RStudio website.
Installing the GEOquery Package
In RStudio, quickly install GEOquery using the following code:
# Install BiocManager if not already installed
if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
# Install GEOquery
BiocManager::install("GEOquery")
Checking the GEOquery Installation
After installation, verify that the package loads correctly:
library(GEOquery)
If no error occurs, GEOquery is successfully installed.
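If you would rather check for the package without attaching it, something like the following could work. This is a minimal sketch; check_package() is a hypothetical helper name, and it relies only on base R's requireNamespace() and utils::packageVersion():

```r
# Hypothetical helper: report whether a package is installed, without attaching it.
check_package <- function(pkg) {
  installed <- requireNamespace(pkg, quietly = TRUE)
  if (installed) {
    message(pkg, " ", as.character(utils::packageVersion(pkg)), " is installed")
  } else {
    message(pkg, " is not installed")
  }
  # Return the result invisibly so it can be used in scripts
  invisible(installed)
}

check_package("GEOquery")
```

The function returns TRUE or FALSE invisibly, so a script can stop early with a clear message when GEOquery is missing.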
(2) Role of AI in Bioinformatics Analysis
After setting up your R environment, it’s important to understand how ChatGPT contributes to your workflow. Although ChatGPT itself doesn’t directly download data, it significantly helps in the following ways:
Rapid generation of R code: Simply describe your required tasks, such as downloading specific GEO datasets, and ChatGPT can automatically generate corresponding R code, saving considerable time.
Code explanation and commenting: ChatGPT can explain the generated code step by step, which helps beginners understand it quickly.
Debugging and optimization suggestions: When your code throws errors or behaves unexpectedly, ChatGPT can often help pinpoint the problem and suggest practical fixes.
Overall, ChatGPT serves as your “personal coach,” helping you perform data mining tasks efficiently and smoothly.
(3) How to clearly communicate your needs to ChatGPT
To obtain suitable R code efficiently, it is essential to describe your question to the AI clearly and specifically. Here, we use ChatGPT as a demonstration. The following tips can improve the accuracy of the generated code:
Create a new project: Set up a dedicated project so that the AI can remember your progress and background

Clearly describe the task: Use a well-formulated prompt
e.g., “Please help me write an R script using the GEOquery package to download and process data from the GEO database. The dataset number is GSE75037 (you can change the number), and I would like to obtain and preview a gene expression matrix.”
Provide the paper that used this dataset: Share it with the AI as background
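For reference, a prompt like the GSE75037 example above might yield a script along these lines. This is only a sketch of one plausible answer, not ChatGPT's actual output: fetch_expression_matrix() is a hypothetical helper name, and taking the first element of the returned list assumes the series has a single platform.

```r
# Hypothetical helper: download a GEO series and return its expression matrix.
fetch_expression_matrix <- function(gse_id, destdir = tempdir()) {
  # Basic validation: exactly one accession of the form "GSE" + digits
  stopifnot(is.character(gse_id), length(gse_id) == 1,
            grepl("^GSE[0-9]+$", gse_id))
  if (!requireNamespace("GEOquery", quietly = TRUE)) {
    stop("GEOquery is required; install it with BiocManager::install(\"GEOquery\")")
  }
  # With GSEMatrix = TRUE, getGEO() returns a list of ExpressionSet objects,
  # one per platform in the series.
  gse_list <- GEOquery::getGEO(gse_id, GSEMatrix = TRUE, destdir = destdir)
  # Assumption: a single platform, so the first element is the one we want.
  eset <- gse_list[[1]]
  # Rows are probes/features, columns are GSM samples
  Biobase::exprs(eset)
}

# Usage (requires an internet connection):
# expr <- fetch_expression_matrix("GSE75037")
# dim(expr)        # number of features and samples
# expr[1:5, 1:5]   # preview the top-left corner of the matrix
```

If the series spans several platforms, inspect the length and names of the list returned by getGEO() and choose the element you need instead of always taking the first.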
