Following recent reports of shared ChatGPT chats being exposed, in which people had left sensitive information within reach of any attacker or ill-intentioned person, we built an automation tool that scans all the shared chats of a specific ChatGPT account.
Some context about this:
Here is a preview of our work with this sensitive information scanner:
We start by setting up undetected_chromedriver inside our scanner:
import json
import re
import undetected_chromedriver as uc
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
from selenium.common.exceptions import TimeoutException, ElementClickInterceptedException
from selenium.webdriver import ActionChains
from bs4 import BeautifulSoup
# Replace with your credentials
EMAIL = ""
PASSWORD = ""
# Setup Chrome options
chrome_options = uc.ChromeOptions()
#chrome_options.add_argument("--headless=new")
chrome_options.add_argument("--start-maximized")
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument("--disable-dev-shm-usage")
chrome_options.add_argument("--disable-blink-features=AutomationControlled")
chrome_options.add_argument("--ignore-certificate-errors")
chrome_options.add_argument("--user-data-dir=/tmp/chatgpt-profile")
#chrome_options.add_argument("--window-size=1920,1080")
# chrome_options.headless = True
# chrome_options.add_argument('--headless')
# chrome_options.add_argument('--no-sandbox')
# chrome_options.add_argument("--disable-dev-shm-usage")
# chrome_options.add_argument('--disable-gpu')
# chrome_options.add_argument('--window-size=1280x1696')
# chrome_options.add_argument('--user-data-dir=/tmp/user-data')
# chrome_options.add_argument('--hide-scrollbars')
# chrome_options.add_argument('--enable-logging')
# chrome_options.add_argument('--log-level=0')
# chrome_options.add_argument('--v=99')
# chrome_options.add_argument('--single-process')
# chrome_options.add_argument('--data-path=/tmp/data-path')
# chrome_options.add_argument('--ignore-certificate-errors')
# chrome_options.add_argument('--homedir=/tmp')
# chrome_options.add_argument('--disk-cache-dir=/tmp/cache-dir')
# chrome_options.add_argument("--incognito")
# Custom user agent; the --user-agent= prefix is needed for Chrome to treat the string as a user agent
chrome_options.add_argument(
    '--user-agent=Mozilla/5.0 (Linux; arm; Android 6.0; VOX S507 4G VS5022PL) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.136 YaBrowser/20.2.4.153.00 Mobile Safari/537.36')
driver = uc.Chrome(options=chrome_options)
wait = WebDriverWait(driver, 5) # Faster base wait
The commented-out arguments are attempts at running the browser headless, together with some helper flags for that, but at the moment headless mode does not work for us.
Next comes the login mechanism, which only attempts to log in when we are not already logged in:
def login_if_needed():
    driver.get("https://chatgpt.com")
    # No login button means we are already logged in, so there is nothing to do
    if not safe_click("button[data-testid='login-button']"):
        return
    # Continue with the Google sign-in option
    if not safe_click("button[value='google']"):
        return
    try:
        email_input = WebDriverWait(driver, 4).until(EC.presence_of_element_located((By.ID, "identifierId")))
        email_input.send_keys(EMAIL)
        driver.find_element(By.ID, "identifierNext").click()
        password_input = WebDriverWait(driver, 4).until(EC.presence_of_element_located((By.NAME, "Passwd")))
        time.sleep(5)  # Give the password field time to become interactive
        password_input.send_keys(PASSWORD)
        driver.find_element(By.ID, "passwordNext").click()
        time.sleep(5)  # Wait for the redirect back to ChatGPT
    except Exception:
        print("[*] Google login skipped (already logged in?).")
EMAIL and PASSWORD are only used locally on your machine, so we considered it acceptable to keep them in the script. We see this login as a fallback: the best-case scenario is that we are already logged in to our account and the login step is skipped.
The cookies and the session persist for quite a long time in the browser opened by undetected_chromedriver, most likely because the script points Chrome at a fixed profile directory (--user-data-dir=/tmp/chatgpt-profile).
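The safe_click helper called by login_if_needed is not shown in this post. The following is a minimal sketch of what such a helper could look like; it is an assumption, not the original implementation. Unlike the calls above, it takes the driver as an explicit argument, and the timeout and poll parameters are hypothetical. It prints the same "Element not found" message that appears in the scanner output.

```python
import time


def safe_click(driver, css_selector, timeout=4.0, poll=0.25):
    """Try to click the first element matching css_selector.

    Returns True on success and False if nothing could be clicked in time.
    Hypothetical helper: the post does not show the real one.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # "css selector" is the string value behind Selenium's By.CSS_SELECTOR
            driver.find_element("css selector", css_selector).click()
            return True
        except Exception:
            # Element missing, not yet clickable, or covered by an overlay
            if time.monotonic() >= deadline:
                break
            time.sleep(poll)
    print(f"[✗] Element not found: {css_selector}")
    return False
```

Returning a boolean instead of raising is what lets login_if_needed treat a missing login button as "already logged in" and return early.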
Next comes the navigation step, which in our case goes directly to the Data Controls settings, where the shared links are listed:
def go_to_data_controls():
    driver.get("https://chatgpt.com/#settings/DataControls")
    print("[*] Navigated to settings page.")
    try:
        # The labels match a Romanian UI: 'Linkuri partajate' = Shared links, 'Gestionează' = Manage
        shared_section = WebDriverWait(driver, 4).until(EC.element_to_be_clickable(
            (By.XPATH, "//div[div/div/div[text()='Linkuri partajate']]//button[div[text()='Gestionează']]")
        ))
        driver.execute_script("arguments[0].scrollIntoView({block: 'center'});", shared_section)
        time.sleep(0.3)
        shared_section.click()
        print("[✓] Clicked 'Manage' in Shared Links.")
    except TimeoutException:
        print("[✗] Could not find Shared links Manage button.")
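Because the XPath above matches on visible Romanian labels, it only works when the ChatGPT interface is set to Romanian. A minimal sketch of one way to cover more than one interface language follows. The helper names are hypothetical, and the English labels are an assumption taken from the wording of the log messages, so they should be checked against the live UI.

```python
def manage_button_xpath(section_label, button_label):
    """Build the XPath for the Manage button in the shared-links row.

    Labels containing a single quote would need escaping; this sketch
    does not handle that case.
    """
    return (
        f"//div[div/div/div[text()='{section_label}']]"
        f"//button[div[text()='{button_label}']]"
    )


# Labels per UI language. The Romanian pair comes from the scanner code;
# the English pair is an assumption.
LABELS = {
    "ro": ("Linkuri partajate", "Gestionează"),
    "en": ("Shared links", "Manage"),
}


def any_language_xpath(labels=LABELS):
    """Join the per-language XPaths with the XPath union operator."""
    return " | ".join(manage_button_xpath(s, b) for s, b in labels.values())


print(any_language_xpath())
```

The combined expression can be passed to the same WebDriverWait call in place of the hard-coded string, since the union matches whichever language variant is present.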
The extraction of the shared links:
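# Module-level set that collects the shared links (defined here so the snippet is self-contained)
shared_links = set()

def extract_shared_links():
    try:
        time.sleep(2)  # Let the shared-links dialog render
        links = driver.find_elements(By.CSS_SELECTOR, 'a[href^="/share/"]')
        for l in links:
            href = l.get_attribute("href")
            if not href:
                continue  # Skip anchors without a usable href
            if not href.startswith("http"):
                href = "https://chatgpt.com" + href
            shared_links.add(href)
        print(f"\n[*] Collected {len(shared_links)} shared links.")
    except Exception as e:
        print(f"[!] Could not extract shared links: {e}")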
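The link normalization in the loop above can also be written with the standard library. The sketch below is an alternative, not the code the scanner uses. The function name normalize_share_link and the BASE_URL constant are hypothetical. As far as we know, Selenium usually returns an already resolved absolute URL for href, so the relative branch is mainly a safeguard.

```python
from urllib.parse import urljoin, urlparse

BASE_URL = "https://chatgpt.com"


def normalize_share_link(href, base=BASE_URL):
    """Return an absolute share URL, or None if href is not a share link."""
    if not href:
        return None
    # urljoin leaves absolute URLs untouched and resolves relative ones
    absolute = urljoin(base, href)
    if not urlparse(absolute).path.startswith("/share/"):
        return None
    return absolute


print(normalize_share_link("/share/688ca4ac-bd0c-8012-9cd4-5b0030fbfabf"))
```

Using urljoin avoids manual string concatenation, and the path check drops any non-share link that might slip through the CSS selector.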
Finally, each shared conversation is opened and its messages are scanned for sensitive information:
def scrape_shared_conversation(url):
    driver.get(url)
    print(f"\n[*] Scraping: {url}")
    time.sleep(2)
    found_secret = False
    try:
        messages = driver.find_elements(By.CSS_SELECTOR, 'div[data-message-id]')
        print(f"[*] Found {len(messages)} messages")
        for m in messages:
            try:
                txt = m.text.strip()
                if not txt:
                    continue
                print("───")
                print(txt)
                # Match secrets
                pattern = r"(?i)\b(API_KEY|APIKEY|API_TOKEN|TOKEN|ACCESS_TOKEN|SECRET|PASSWORD)[\s:=\"']+([A-Za-z0-9\-_.]{8,})"
                matches = re.findall(pattern, txt)
                if matches:
                    found_secret = True
                    print("[POSSIBLE SECRET FOUND!]")
                    for key, value in matches:
                        print(f"{key}: {value}")
            except Exception as inner:
                print(f"[!] Error reading a message: {inner}")
    except Exception as e:
        print(f"[✗] Could not scrape content: {e}")
    if found_secret:
        print("[✔] Secrets found in this chat.")
    else:
        print("[✓] No secrets detected in this chat.")
This version looks for keywords such as API_KEY, API_TOKEN, and the other names listed in the pattern below, each followed by a separator and a value of at least 8 characters:
pattern = r"(?i)\b(API_KEY|APIKEY|API_TOKEN|TOKEN|ACCESS_TOKEN|SECRET|PASSWORD)[\s:=\"']+([A-Za-z0-9\-_.]{8,})"
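To see how the pattern behaves outside the browser, it can be tried on a few strings. The sketch below uses the same regular expression; the find_secrets wrapper and the sample strings are made up for illustration.

```python
import re

# Same pattern as in the scanner, split over two lines for readability
SECRET_PATTERN = re.compile(
    r"(?i)\b(API_KEY|APIKEY|API_TOKEN|TOKEN|ACCESS_TOKEN|SECRET|PASSWORD)"
    r"[\s:=\"']+([A-Za-z0-9\-_.]{8,})"
)


def find_secrets(text):
    """Return a list of (keyword, value) tuples found in text."""
    return SECRET_PATTERN.findall(text)


samples = [
    "api_key: A1B2C3D4E5F6",            # keyword, separator, 12-character value
    "password = hunter2",               # value shorter than 8 characters
    "the password protection feature",  # ordinary prose
]
for sample in samples:
    print(repr(sample), "->", find_secrets(sample))
```

The first sample matches and the second does not, because its value is too short. The third sample matches as well ("password" followed by "protection"), which shows that the pattern can produce false positives on ordinary prose. This is why the scanner labels its hits as possible secrets.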
A run of the scanner produces output like the following:
[✗] Element not found: button[data-testid='login-button']
[*] Navigated to settings page.
[✓] Clicked 'Manage' in Shared Links.
[*] Collected 2 shared links.
[*] Scraping: https://chatgpt.com/share/688ca4ac-bd0c-8012-9cd4-5b0030fbfabf
[*] Found 6 messages
───
test
───
Test received — I’m here and ready. How can I help you?
───
coco
───
Hi! Can you tell me more precisely what you mean by "coco"? Do you want to talk about:
Coco, the animated film from Pixar?
Coco Chanel?
An acronym (COCO – Common Objects in Context dataset)?
Something else?
Tell me and I will help you right away.
───
api_key: A1B2C3D4E5F6
[POSSIBLE SECRET FOUND!]
api_key: A1B2C3D4E5F6
───
For your safety, I recommend that you do not post API keys or other sensitive information here or in other public spaces.
If that key is real, I suggest you revoke it immediately from the console of the respective service and generate a new one. Do you want help identifying where you used this key or how you can replace it in a particular piece of code?
[*] Secrets found in this chat.
[*] Scraping: https://chatgpt.com/share/688bb63d-9b48-8012-bdbe-6251a54b1f8f
[*] Found 2 messages
───
test
───
Test received — I’m here and ready. How can I help you, Vlad?
[✓] No secrets detected in this chat.
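The post does not show how the individual functions are called. Judging by the order of the messages in the output, the flow is login, navigation, link collection, then one scrape per link. A minimal sketch of such a driver loop follows. The run_scanner name and its parameters are hypothetical, and the steps are passed in as callables so the flow can be tried without a browser.

```python
from typing import Callable, Iterable


def run_scanner(
    login: Callable[[], None],
    open_shared_links: Callable[[], None],
    collect_links: Callable[[], Iterable[str]],
    scrape: Callable[[str], None],
    close: Callable[[], None],
) -> int:
    """Run the scan steps in order and return the number of links scanned."""
    try:
        login()
        open_shared_links()
        # De-duplicate and sort so that repeated runs visit links in the same order
        links = sorted(set(collect_links()))
        for url in links:
            scrape(url)
        return len(links)
    finally:
        # Always release the browser, even if a step fails
        close()


# Dry run with stand-in steps instead of the Selenium-backed functions
count = run_scanner(
    login=lambda: print("[*] login"),
    open_shared_links=lambda: print("[*] open shared links"),
    collect_links=lambda: ["https://chatgpt.com/share/example"],
    scrape=lambda url: print(f"[*] Scraping: {url}"),
    close=lambda: print("[*] closed"),
)
print(count)
```

With the functions from this post, the arguments would be login_if_needed, go_to_data_controls, a small wrapper that calls extract_shared_links and returns shared_links, scrape_shared_conversation, and driver.quit.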
With this scanner we can quickly review all the content of the shared links and identify possible sensitive leaks present in those chats.