🤖 Python · Gemini API · Conversational AI

Python Chatbot

A context-aware conversational AI chatbot built from scratch in Python using the Gemini API. It keeps the full multi-turn conversation history, so the bot can refer back to anything said earlier in a session, uses a custom system prompt to define its persona, and presents a clean, formatted terminal interface with graceful error handling.

Multi-Turn Memory
1 System Prompt
Gemini LLM Backend
~60 Lines of Code
0 Frameworks

The Conversation Loop

Multi-turn terminal chatbot: the history sent to the model grows with every message, so the bot can reference anything said earlier in the session.

python chatbot.py
✦ Gemini Chatbot — type 'quit' to exit
───────────────────────────────────────────

You > hey, what can you help me with?
Bot > Hey! I'm your AI assistant powered by Gemini. I can help you brainstorm ideas, answer questions, explain concepts, debug code, draft text, or just have a conversation. What's on your mind?

You > can you explain what an API is in simple terms?
Bot > Sure! An API (Application Programming Interface) is basically a messenger that lets two programs talk to each other. Think of it like a waiter at a restaurant: you (the app) tell the waiter (the API) what you want, the waiter goes to the kitchen (another service), and brings back your order. In this chatbot, we're using the Gemini API: our Python script sends your message, Gemini's servers process it, and the API brings back the response.

You > nice analogy, did you come up with that yourself?
Bot > I did! I noticed you wanted a simple explanation, so I went with something concrete. I also remember you asked what I can help with at the start, so I'm keeping the tone conversational since that seems to be working for you.

You >

How Multi-Turn Memory Works

The key insight: the Gemini API is stateless — it doesn't remember anything between calls. So you have to build memory yourself by appending every message to a list and sending the entire conversation history with each new request.
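
To make the statelessness concrete, here is a minimal sketch that uses a stand-in function instead of the real API. `fake_generate` is hypothetical and not part of the Gemini SDK; it only "knows" what appears in the list it receives on each call, which is the property that matters here.

```python
def fake_generate(messages):
    """Hypothetical stand-in for a stateless model call.

    It can only "see" the messages passed in on this call.
    """
    seen = " ".join(part for m in messages for part in m["parts"])
    if "my name is Ada" in seen:
        return "Your name is Ada."
    return "I don't know your name."


history = [
    {"role": "user", "parts": ["Hi, my name is Ada."]},
    {"role": "model", "parts": ["Nice to meet you!"]},
    {"role": "user", "parts": ["What is my name?"]},
]

# Sending only the latest message: the earlier turn is invisible.
without_memory = fake_generate(history[-1:])

# Sending the full history: the earlier turn is part of the request.
with_memory = fake_generate(history)

print(without_memory)
print(with_memory)
```

The only difference between the two calls is how much of the list is sent, which is why the chatbot resends the whole history on every turn.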

⌨️ Step 1: User Types Input
The terminal captures the message with input(). The loop checks for an exit command ("quit", "exit", or "bye") before processing.

📋 Step 2: Append to History
The message is appended to the conversation_history list as a {"role": "user", ...} dict.

🚀 Step 3: Send Full Context
The entire history list, including every prior turn, is sent to model.generate_content() as context.

💬 Step 4: Bot Replies + Saves
The response is printed, then appended back to the history with the "model" role, ready for the next turn.

The Full Code

Every part of the chatbot, in about 60 lines of Python.

API Setup & Initialization

The chatbot uses google-generativeai, Google's official Python SDK for Gemini. The API key is loaded from an environment variable — never hardcoded — using os.getenv().

The GenerativeModel is initialized once with the system instruction baked in — this sets the bot's persona before the first message is ever sent.

import google.generativeai as genai
import os

# Load API key from environment variable
# Never hardcode keys — use .env or system env
genai.configure(
    api_key=os.getenv("GEMINI_API_KEY")
)

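# Initialize the model with a system prompt
# The system_instruction shapes ALL responses
# (get_system_prompt() is defined in the System Prompt section
# and must appear above this call in chatbot.py)
model = genai.GenerativeModel(
    model_name="gemini-2.0-flash",
    system_instruction=get_system_prompt()
)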

# Start with an empty conversation history
conversation_history = []

print("✦ Gemini Chatbot — type 'quit' to exit")
print("─" * 42)
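
One gap in the setup above is that os.getenv() returns None when the variable is missing, so a forgotten key would typically only surface later as an API error. The sketch below shows one way to fail early with a readable message. `load_api_key` is a hypothetical helper, not part of the original chatbot.py or the SDK. Note also that os.getenv() does not read a .env file by itself; that would need something like python-dotenv, which the original code does not show.

```python
import os


def load_api_key(env=None, name="GEMINI_API_KEY"):
    """Return the API key from the environment or raise a clear error.

    `env` defaults to os.environ; passing a dict makes this easy to test.
    """
    env = os.environ if env is None else env
    key = (env.get(name) or "").strip()
    if not key:
        raise RuntimeError(
            f"{name} is not set. Export it in your shell before running the chatbot."
        )
    return key


# Example with an explicit mapping, so no real key is needed to try it
print(load_api_key({"GEMINI_API_KEY": "demo-key"}))
```

In the script, the result would be passed to genai.configure(api_key=load_api_key()) in place of the bare os.getenv() call.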

The Conversation Loop

A simple while True loop handles the entire conversation. It captures user input, checks for the exit command, appends to history, calls the API, prints the response, and appends the reply — then repeats.

The try/except block handles API errors gracefully — if the network drops or the API returns an error, the loop continues rather than crashing the program.

while True:
    # Get user input
    user_input = input("\nYou > ").strip()

    # Exit condition
    if user_input.lower() in ['quit', 'exit', 'bye']:
        print("\n✦ Conversation ended. Goodbye!\n")
        break

    if not user_input:
        continue

    # Append user message to history
    conversation_history.append({
        "role":  "user",
        "parts": [user_input]
    })

    try:
        # Send full history to the model
        response = model.generate_content(
            conversation_history
        )
        bot_reply = response.text

        # Print formatted response
        print(f"\nBot > {bot_reply}\n")

        # Append bot reply to history
        conversation_history.append({
            "role":  "model",
            "parts": [bot_reply]
        })
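    except Exception as e:
        # Report the failure and keep the loop alive
        print(f"\n⚠ API error: {e}\n")
        # Drop the unanswered user message so the history
        # keeps alternating user/model turns
        if conversation_history and conversation_history[-1]["role"] == "user":
            conversation_history.pop()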
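
Because the user message is appended before the API call, a failed call needs to undo that append, or the history ends up with a user turn that never got a reply. The sketch below isolates one turn as a function with that rollback. `chat_turn` and the injected `generate` callable are hypothetical names chosen for illustration, so the logic can be exercised without a network connection.

```python
def chat_turn(history, user_input, generate):
    """Run one turn; leave history unchanged if the call fails.

    `generate` is any callable that takes the history list and returns
    the reply text (a hypothetical wrapper, injected so it can be stubbed).
    """
    history.append({"role": "user", "parts": [user_input]})
    try:
        reply = generate(history)
    except Exception:
        history.pop()  # drop the unanswered user message
        raise
    history.append({"role": "model", "parts": [reply]})
    return reply


def flaky(_history):
    """Stub that simulates a dropped connection."""
    raise ConnectionError("network down")


history = []
try:
    chat_turn(history, "hello", flaky)
except ConnectionError as e:
    print(f"API error: {e}")

print(len(history))  # the failed turn left nothing behind

reply = chat_turn(history, "hello again", lambda h: "hi there")
print(reply, len(history))
```

With the real SDK, the injected callable could be as simple as `lambda h: model.generate_content(h).text`, assuming `model` is configured as in the setup section.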

Conversation Memory

The conversation_history list grows with every turn. Each entry is a dict with a role ("user" or "model") and parts (a list with the message text).

The SDK accepts this role/parts structure for multi-turn requests. Sending the full list gives the model the complete context of the current session, so follow-up questions and references to earlier messages work naturally.

# What conversation_history looks like once the user
# has sent a second message (one full turn plus a new question):
conversation_history = [
    {
        "role":  "user",
        "parts": ["hey, what can you help with?"]
    },
    {
        "role":  "model",
        "parts": ["I can help you brainstorm, answer... (full reply)"]
    },
    {
        "role":  "user",
        "parts": ["explain APIs in simple terms"]
    },
    # ↑ This entire list is sent on every call
    # Gemini sees the full conversation context
]

# Without this, the bot would have no memory:
# "What did I just say?" → "I'm not sure..."
# With this, it tracks the whole thread.
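
Since the list grows with every turn, a very long session will eventually run into the model's token limit. The original chatbot does not handle this; the following is a minimal sketch of one possible approach, capping the history at the most recent messages. `trim_history` and `max_messages` are hypothetical, and a message count is only a rough proxy for tokens.

```python
def trim_history(history, max_messages=20):
    """Return a copy holding the most recent messages, starting on a user turn."""
    if len(history) <= max_messages:
        return list(history)
    recent = history[-max_messages:]
    # Drop leading model replies so the window opens with a user message
    while recent and recent[0]["role"] != "user":
        recent = recent[1:]
    return recent


history = []
for i in range(6):
    history.append({"role": "user", "parts": [f"question {i}"]})
    history.append({"role": "model", "parts": [f"answer {i}"]})

window = trim_history(history, max_messages=5)
print(len(window))
print(window[0]["parts"][0])
```

The trimmed copy would be what gets sent to the model, while the full list can still be kept locally. The trade-off is that the bot loses access to anything that falls outside the window.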

System Prompt / Persona

The system instruction is passed directly to GenerativeModel at initialization. It shapes every single response — tone, format, expertise, and constraints — without the user ever seeing it.

This is where prompt engineering matters most: a well-written system prompt turns a generic LLM into a focused assistant with consistent behavior.

def get_system_prompt():
    return """
You are a helpful AI assistant built in Python
using the Gemini API as a class project.

Your personality:
- Clear and friendly, never condescending
- Use plain language unless asked for technical depth
- Give concise answers unless asked to elaborate
- Use analogies when explaining complex concepts

Capabilities:
- Answer general knowledge questions
- Explain programming and tech concepts
- Help brainstorm ideas or outlines
- Review or discuss short pieces of writing

Constraints:
- You do not have internet access mid-session
- Acknowledge when a question is outside your scope
- Never fabricate citations or specific statistics

At the start of each session you are fresh —
you have no memory of past sessions,
only the current conversation history.
"""

By the Numbers

Business Problem
Most people don't know how to interact with a language model API directly — they rely on polished consumer apps. This project strips away the UI layer to show exactly how a conversational AI works at the code level: API calls, stateless responses, and the developer's responsibility to manage context and memory manually.
One-Sentence Summary
A multi-turn Python terminal chatbot that connects to the Gemini API, maintains a conversation history list across turns for persistent memory, and uses a custom system prompt to define the bot's persona and response behavior.
Tools Used
Python 3.11, Gemini API, google-generativeai, os (env vars), VS Code, GitHub Copilot
Key Features
Multi-turn conversation memory via a growing history list sent with every API call. Custom system prompt for persona definition. Graceful error handling with try/except so API failures don't crash the session. Clean terminal formatting with turn labels (You / Bot). Exit command detection ("quit", "exit", "bye"). API key loaded from environment variable — never hardcoded.
My Role / Contribution
Sole developer. Wrote the full chatbot from scratch — API configuration, conversation loop architecture, history management, system prompt design, error handling, and terminal formatting. Also researched the Gemini SDK documentation to understand the correct message format ({"role", "parts"} dicts) required for multi-turn context.
Biggest Challenge
Understanding why the API is stateless and what that means for memory. Initially the bot had no memory between messages — each response treated every question as the first. The fix was recognizing that you have to manually send the entire conversation history with every single API call. Once that clicked, multi-turn memory became straightforward to implement.
What I Learned
A call like generate_content is stateless: it keeps nothing between requests, so conversation memory is the developer's responsibility. System prompts are not just instructions; they're the primary lever for shaping model behavior and should be written carefully. I also learned the difference between using an AI tool and building with one: the latter requires understanding context windows, token limits, and message structure at the API level.
Screenshot / Visual
See the interactive terminal preview at the top of this page — it shows a full 3-turn conversation demonstrating how the bot references earlier messages, the formatting of user vs. bot turns, and the blinking cursor on the live input prompt.
GitHub & Demo
Built in VS Code with GitHub Copilot assistance. Full source (chatbot.py, .env.example, and requirements.txt) is available on request.