Technical Details

Developer and contributor reference for WalKEY-TalKEY. For the product overview, see the main README. For end-user setup and JSON config authoring, see the User Guide.

Hardware

| Feature | Spec |
| --- | --- |
| MCU | ESP32-S3 @ 240 MHz |
| Display | 1.75" AMOLED (CO5300 controller) |
| Touch | CST9217 capacitive |
| Flash | 16 MB QIO |
| PSRAM | Octal SPI, 80 MHz |

Board: Waveshare ESP32-S3 Touch AMOLED 1.75"

SD Card Notes

  • The board hardware can use large microSD cards, including 128 GB cards, as long as the card can be mounted by the ESP-IDF FAT filesystem stack.
  • For best compatibility, format the card as FAT32.
  • Avoid exFAT unless you have separately added and tested support for it in the firmware.
  • The BSP SD card path is wired for SDMMC in 1-bit mode.

Controls And UI

  • The main card shows the active mode as a large centered heading
  • There is no separate Touch Controller title row
  • Status/hint text lives inside the main card and is blank by default until a mode/action sets it
  • Gesture debug text is shown as a small gray label inside the main card and stays visible until a newer gesture replaces it
  • A large circular BOOT-position marker is drawn near the physical BOOT button for alignment/debugging
  • The BOOT marker uses the same green accent as the heading by default and turns red while BOOT is held
  • Holding BOOT opens the simplified mode selector with top instruction text, a bottom confirm hint, and the centered active mode label still visible
  • Pressing the touchscreen briefly shifts the touch-feedback palette to a dark red pressed state
  • Normal touch gestures show text labels such as PRESS, TAP, DOUBLE TAP, LONG PRESS, HOLD END, and swipe directions instead of arrow glyphs
  • Cursor-mode touch hold keeps the dictation workflow: 400+ ms hold enables mic gate and sends F13, release disables mic gate and releases F13
  • Cursor-mode tap sends F14
  • Cursor-mode double tap sends Enter
  • Cursor-mode swipe up sends Ctrl+A then Backspace to clear the field
  • Cursor-mode swipe down sends Ctrl+. to toggle Cursor text mode
  • Cursor-mode swipe left sends Ctrl+N
  • Cursor-mode swipe right sends Enter
  • BOOT long press resets the selection to Cursor
  • Swipe actions are edge-triggered and should fire once per gesture, not repeat while the finger is still moving
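
The edge-triggered swipe rule above can be illustrated with a small host-side sketch. This is not the firmware's gesture code (which lives in ui_status.c); the `SwipeLatch` class, the 40-pixel threshold, and the label strings are assumptions chosen for illustration, and screen coordinates are assumed to grow rightward and downward.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical threshold; the firmware's real value is not documented here.
SWIPE_THRESHOLD_PX = 40


@dataclass
class SwipeLatch:
    """Reports a swipe direction at most once per touch-down/touch-up cycle."""
    start_x: int = 0
    start_y: int = 0
    fired: bool = False

    def touch_down(self, x: int, y: int) -> None:
        # A new touch re-arms the latch.
        self.start_x, self.start_y, self.fired = x, y, False

    def touch_move(self, x: int, y: int) -> Optional[str]:
        if self.fired:
            return None  # already fired for this gesture; ignore further motion
        dx, dy = x - self.start_x, y - self.start_y
        if max(abs(dx), abs(dy)) < SWIPE_THRESHOLD_PX:
            return None  # not far enough to count as a swipe yet
        self.fired = True
        if abs(dx) >= abs(dy):
            return "SWIPE RIGHT" if dx > 0 else "SWIPE LEFT"
        return "SWIPE DOWN" if dy > 0 else "SWIPE UP"
```

The latch is what makes the action fire once: after the first threshold crossing, every later move event in the same touch returns nothing until the next touch-down.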

Prerequisites

USB Ports

This board has a single USB-C connector that is shared between two USB peripherals:

| Port | Controller | Purpose | When available |
| --- | --- | --- | --- |
| COM4 (typical) | USB-Serial-JTAG | Flashing and early boot console | Download mode only (BOOT + RESET) |
| COM6 (typical) | USB-OTG via TinyUSB CDC | Runtime log output | After firmware boots |

The COM numbers are assigned dynamically by Windows and may differ on your machine -- check Device Manager under Ports (COM & LPT).

At runtime, TinyUSB owns the USB-OTG peripheral and the USB-Serial-JTAG port disappears. To flash new firmware, you must enter download mode first (hold BOOT, press RESET, release BOOT).

Build & Flash

Quick (using flash.ps1)

    .\flash.ps1                # build + flash (COM4 default)
    .\flash.ps1 -Port COM5     # different port
    .\flash.ps1 -BuildOnly     # build without flashing
    .\flash.ps1 -FlashOnly     # flash without rebuilding
    .\flash.ps1 -Clean         # full clean build + flash

Manual

    idf.py set-target esp32s3
    idf.py build
    idf.py -p COM4 flash       # replace COM4 with your flash port

Dependencies (waveshare/esp32_s3_touch_amoled_1_75, lvgl/lvgl 9.4.*, espressif/esp_tinyusb 2.1.1) are fetched automatically by the IDF Component Manager on first build.

Expected USB Behavior

  • Windows should enumerate the board as a USB microphone input device, a USB HID keyboard, a USB mass-storage drive, and a virtual COM port (CDC ACM serial)
  • The refreshed Windows-facing identity is VID_303A / PID_4214
  • The recording endpoint should appear as Microphone / PTT Smart Mic Microphone in Windows
  • The CDC serial port appears under Ports (COM & LPT) in Device Manager as WalKEY-TalKEY Serial
  • The USB microphone remains present even while idle
  • BOOT gates microphone audio content instead of connect/disconnect behavior

USB Serial Command Interface

The CDC ACM serial port serves two purposes:

  1. ESP-IDF log output -- usb_cdc_log.c redirects all ESP_LOGx output to the CDC port
  2. JSON command channel -- usb_jtag_cmd.c accepts newline-delimited JSON commands on the same port

The command handler runs in a separate FreeRTOS task that polls tud_cdc_available() every 5 ms. Commands and log output share the same CDC endpoint; the web client filters JSON responses by looking for lines starting with {.

Protocol: Send a JSON object terminated by \n. The device responds with a JSON object (also \n-terminated) containing "ok": true or "ok": false.

Supported commands:

| Command | Description | Example |
| --- | --- | --- |
| ping | Check connectivity, returns device name and config version | {"cmd":"ping"} |
| get_wifi | Read current WiFi config (passwords masked) | {"cmd":"get_wifi"} |
| set_wifi | Update WiFi STA/AP/hostname settings and reload | {"cmd":"set_wifi","sta":{"ssid":"MyNet","password":"pw"}} |
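
Because commands and log output share one endpoint, a host client has to frame its own requests and pick responses out of the log stream. The following is a minimal host-side sketch of that framing and filtering, written against plain strings so it runs without hardware. The helper names `encode_command` and `iter_responses` are hypothetical, and the sketch assumes only what is stated above: newline-terminated JSON objects and an "ok" field in every response.

```python
import json
from typing import Iterable, Iterator


def encode_command(cmd: str, **fields) -> bytes:
    """Serialize one command as a newline-terminated JSON object."""
    payload = {"cmd": cmd, **fields}
    return (json.dumps(payload, separators=(",", ":")) + "\n").encode("utf-8")


def iter_responses(lines: Iterable[str]) -> Iterator[dict]:
    """Yield parsed JSON responses, skipping interleaved log lines."""
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # ESP_LOGx output shares the same port
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            continue  # a log line that merely begins with "{"
        if isinstance(obj, dict) and "ok" in obj:
            yield obj
```

To talk to a real device, the bytes from `encode_command` would be written to the CDC COM port and the incoming text split on newlines before being passed to `iter_responses`; the serial library used for that step is left to the reader.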

Web Serial integration: The Next.js dashboard at /usb uses the browser Web Serial API to connect to the CDC port (VID_303A / PID_4214). This allows WiFi configuration when the device is not yet on a network, solving the bootstrap problem where the web portal requires WiFi but WiFi credentials haven't been set yet.

Monitoring Logs

Firmware logs are redirected to the CDC ACM virtual serial port (see the USB Ports table above).

  1. Plug in the USB-C cable and wait for boot to complete
  2. Find the CDC COM port in Device Manager under Ports (COM & LPT) -- it shows as USB Serial Device (COMx)
  3. Connect with any serial terminal at 115200 baud:
     $port = New-Object System.IO.Ports.SerialPort COM6,115200
     $port.DtrEnable = $true; $port.Open()
     while($true) { if($port.BytesToRead) { Write-Host $port.ReadExisting() -NoNewline }; Start-Sleep -Milliseconds 100 }

Or use PuTTY, Tera Term, or the VS Code Serial Monitor extension.

  4. Watch for ESP_LOGx output, which appears whenever device activity (touch, button press, voice, etc.) generates log messages

Early boot logs (before TinyUSB initializes) are not captured.

Expected HID Behavior

  • BOOT press enters temporary mode-selection state and should not send F13
  • BOOT release confirms the current mode and exits mode-selection state
  • In Cursor mode, a 400+ ms touchscreen hold sends F13 down and release sends F13 up
  • In Cursor mode, tap sends F14
  • In Cursor mode, double tap sends Enter
  • In Cursor mode, swipe up sends Ctrl+A then Backspace, swipe down sends Ctrl+., swipe left sends Ctrl+N, and swipe right sends Enter
  • Swipe gestures should execute their mapped action once per gesture
  • In swipe-driven modes, left/right swipes map to mode-specific keyboard-safe actions from main/mode_config.c
  • If USB is not mounted or not ready, the UI still updates and the serial log explains why HID was skipped
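
The Cursor-mode hold behavior can be modeled as a small transition machine, in the spirit of the host-testable design described for ptt_state.c. The sketch below is an illustration only and is not a port of that file: the class name, event strings, and the periodic `tick` call are assumptions. Only the 400 ms threshold and the F13 / mic-gate pairing come from this document.

```python
from dataclasses import dataclass, field
from typing import List

HOLD_THRESHOLD_MS = 400  # from the documented 400+ ms hold


@dataclass
class PttSketch:
    pressed_at_ms: int = -1  # -1 means no touch in progress
    active: bool = False     # True while F13 is down and the mic gate is open
    events: List[str] = field(default_factory=list)

    def touch_down(self, now_ms: int) -> None:
        self.pressed_at_ms = now_ms

    def tick(self, now_ms: int) -> None:
        """Assumed to be called periodically while the finger is down."""
        held = (self.pressed_at_ms >= 0
                and now_ms - self.pressed_at_ms >= HOLD_THRESHOLD_MS)
        if held and not self.active:
            self.active = True
            self.events += ["MIC_GATE_ON", "F13_DOWN"]

    def touch_up(self, now_ms: int) -> None:
        if self.active:
            self.active = False
            self.events += ["MIC_GATE_OFF", "F13_UP"]
        self.pressed_at_ms = -1
```

A touch released before the threshold produces no F13 events in this model, which leaves room for the separate tap handling (F14) described above.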

Expected Microphone Behavior

  • The USB microphone enumerates continuously as a normal Windows input device
  • Microphone transport follows a TinyUSB-style 48 kHz / 16-bit / mono full-speed profile
  • When BOOT is held, live mic frames are sent over USB Audio Class
  • When BOOT is released, the firmware still services the audio stream but sends silence
  • Serial logs should show USB attach/detach and microphone streaming start/stop events
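
As a quick sanity check on the audio profile, the payload per USB frame follows directly from the stated format. The sketch below assumes the usual full-speed schedule of one frame per millisecond and signed PCM samples, for which silence is all zero bytes; the function name is hypothetical and actual endpoint sizing in the firmware may differ.

```python
def usb_frame_bytes(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Audio payload bytes carried in one 1 ms full-speed USB frame."""
    frames_per_second = 1000  # full-speed USB: one frame per millisecond
    return sample_rate_hz * (bits_per_sample // 8) * channels // frames_per_second


# 48 kHz / 16-bit / mono, as listed above.
MIC_FRAME_BYTES = usb_frame_bytes(48_000, 16, 1)

# While the mic gate is closed the stream keeps running, so each frame
# would carry this many zero bytes instead of captured samples.
SILENCE_FRAME = bytes(MIC_FRAME_BYTES)
```

For the documented profile this works out to 48 samples, or 96 bytes, per millisecond.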

Partition Table

Custom layout in partitions.csv -- 8 MB factory app, 5 MB model SPIFFS, 2 MB config/docs SPIFFS:

| Name | Type | Size |
| --- | --- | --- |
| nvs | data | 24 KB |
| phy_init | data | 4 KB |
| factory | app | 8 MB |
| model | data (spiffs) | 5 MB |
| storage | data (spiffs) | 2 MB |
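
A short calculation confirms that this layout fits the 16 MB flash. The sketch assumes binary units (1 KB = 1024 bytes) and sums only the listed partitions; it ignores the bootloader and partition-table region at the start of flash and any alignment padding, so the real headroom is somewhat smaller.

```python
KB = 1024
MB = 1024 * KB

# Sizes copied from the table above.
PARTITIONS = {
    "nvs": 24 * KB,
    "phy_init": 4 * KB,
    "factory": 8 * MB,
    "model": 5 * MB,
    "storage": 2 * MB,
}
FLASH_SIZE = 16 * MB

used = sum(PARTITIONS.values())
headroom = FLASH_SIZE - used  # upper bound on unallocated flash
```

With these numbers the partitions occupy 15 MB plus 28 KB, leaving at most 996 KB unallocated.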

The runtime mode JSON file lives at /spiffs/mode-config.json. A repo copy is provided at config/mode-config.json.

Wi-Fi Config Portal

The firmware exposes a local config portal over Wi-Fi:

  • It first tries the router credentials stored in the JSON config and advertises http://walkey-talkey.local/
  • If router join succeeds, browse to walkey-talkey.local or the IP shown on the BOOT overlay
  • If router join fails, it falls back to a device-hosted access point
  • Fallback SSID: walkey-talkey
  • Fallback password: secretKEY
  • Fallback URL: http://192.168.4.1/
  • The portal serves a small web UI and REST endpoints for GET /config, POST /config/validate, PUT /config, and POST /config/reset
  • The portal also offers direct documentation downloads for mode-config.schema.json, AI_GUIDE.md, and USER_GUIDE.md
  • The portal intentionally comes up after a short startup delay of about 8 seconds
  • The BOOT overlay shows Connecting... immediately during that startup delay, then switches to the active hostname, IP, or AP label when Wi-Fi is ready
  • Save and Reset both reapply the Wi-Fi config immediately, so a manual reboot is no longer required after changing network settings
  • Reset writes the built-in firmware JSON back to the external config file, then reloads the runtime from that restored config
  • The hardcoded failsafe config remains an internal safety net if both external and built-in JSON loading fail
  • If Save or Reset fails, the portal returns a detailed STORAGE_FAILED payload that explains whether the failure happened while mounting SPIFFS or writing /spiffs/mode-config.json, including stage, formatAttempted, path, partition, espError, errnoValue, errnoMessage, and suggested recovery steps
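
The REST endpoints listed above can be exercised from any HTTP client. The sketch below only builds the requests, so it runs without a device on the network. The helper name, the action keys, and the `Content-Type: application/json` header are assumptions; sending the config JSON as the request body for validate and save is also assumed rather than confirmed here.

```python
import json
import urllib.request
from typing import Optional

# Method and path pairs taken from the endpoint list above.
ENDPOINTS = {
    "get": ("GET", "/config"),
    "validate": ("POST", "/config/validate"),
    "save": ("PUT", "/config"),
    "reset": ("POST", "/config/reset"),
}


def build_portal_request(base_url: str, action: str,
                         config: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a request for one portal endpoint."""
    method, path = ENDPOINTS[action]
    data = None
    headers = {}
    if config is not None:
        data = json.dumps(config).encode("utf-8")
        headers["Content-Type"] = "application/json"  # assumption
    return urllib.request.Request(base_url.rstrip("/") + path,
                                  data=data, headers=headers, method=method)
```

A cautious workflow would send the `validate` request first and only follow with `save` if validation succeeds, for example by passing each request to `urllib.request.urlopen(req, timeout=5)` against http://192.168.4.1/ or http://walkey-talkey.local/.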

Portal/SR Coexistence Notes

  • The main limiter is internal runtime RAM, in particular the largest free internal block (largest_internal), not the 16 MB flash size
  • Large portal responses such as GET /config and GET /portal were restored by keeping chunked/streamed sends, preferring PSRAM for temporary buffers, and enabling CONFIG_SPIRAM_ALLOW_BSS_SEG_EXTERNAL_MEMORY
  • Avoid reverting the PSRAM/BSS settings without re-testing GET /config and GET /portal, especially if SR or USB audio is active

JSON Macro Model

The JSON macro model is intentionally declarative:

  • Each binding is input + trigger + ordered actions
  • The actions array is the macro
  • Tap actions already include the firmware's built-in tap gap
  • Use sleep_ms only for extra delay between macro steps
  • Prefer hid_shortcut_tap with modifiers plus key for keyboard chords and hid_usage_* for media/system HID
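
The idea that "the actions array is the macro" can be sketched as a list walked strictly in order. The binding layout below is hypothetical: the "type" field and the value spellings ("touch", "swipe_up", "CTRL", "BACKSPACE") are invented for illustration, and only the names input, trigger, actions, hid_shortcut_tap, modifiers, and key come from this document. Consult the User Guide for the real schema.

```python
from typing import Callable, Dict, List

# Hypothetical binding mirroring the Cursor-mode "swipe up" behavior
# (Ctrl+A, then Backspace). Field layout is an assumption.
binding = {
    "input": "touch",
    "trigger": "swipe_up",
    "actions": [
        {"type": "hid_shortcut_tap", "modifiers": ["CTRL"], "key": "A"},
        {"type": "hid_shortcut_tap", "modifiers": [], "key": "BACKSPACE"},
    ],
}


def run_macro(actions: List[dict],
              handlers: Dict[str, Callable[[dict], None]]) -> None:
    """Execute actions strictly in array order; the array is the macro."""
    for action in actions:
        handlers[action["type"]](action)


def describe_macro(actions: List[dict]) -> List[str]:
    """Render each shortcut as text instead of sending HID reports."""
    trace: List[str] = []
    handlers = {
        "hid_shortcut_tap":
            lambda a: trace.append("+".join(a["modifiers"] + [a["key"]])),
    }
    run_macro(actions, handlers)
    return trace
```

Keeping the executor a plain ordered loop is what makes the model declarative: the JSON carries all sequencing, and the engine adds no ordering of its own.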

For the full JSON authoring reference, see the User Guide.

Project Structure

├── CMakeLists.txt          # Top-level project CMake
├── flash.ps1               # Build & flash helper script (PowerShell)
├── partitions.csv          # Custom partition table
├── sdkconfig.defaults      # Default Kconfig (PSRAM, LVGL core settings)
└── main/
    ├── CMakeLists.txt      # Component CMake
    ├── README.md           # Notes for the main app modules
    ├── action_engine.c     # Executes declarative mode actions
    ├── action_engine.h
    ├── audio_input.c       # ES7210 + I2S microphone capture wrapper
    ├── audio_input.h
    ├── boot_button.c       # GPIO0 polling and debounce
    ├── boot_button.h
    ├── input_router.c      # Normalizes raw BOOT/touch events into triggers
    ├── input_router.h
    ├── mode_config.c       # Hybrid JSON/fallback mode config entry point
    ├── mode_config.h
    ├── mode_json_loader.c  # JSON-to-runtime config compiler
    ├── mode_json_loader.h
    ├── mode_controller.c   # Active mode and temporary boot-mode control
    ├── mode_controller.h
    ├── mode_system.Readme.md
    ├── mode_types.h
    ├── ptt_state.c         # Small host-testable PTT transition state machine
    ├── ptt_state.h
    ├── usb_cdc_log.c       # CDC ACM virtual serial port log redirect
    ├── usb_cdc_log.h
    ├── usb_composite.c     # Composite USB HID + microphone + MSC + CDC transport
    ├── usb_composite.h
    ├── usb_jtag_cmd.c      # CDC serial JSON command handler (ping, get/set WiFi)
    ├── usb_jtag_cmd.h
    ├── idf_component.yml   # IDF Component Manager dependencies
    ├── component.mk        # Legacy Make support
    ├── main.c              # App orchestration and queued mode/input/event handling
    ├── ui_status.c         # Current mode UI, BOOT overlay, and touch/swipe feedback
    └── ui_status.h

Key Configuration (sdkconfig.defaults)

  • Octal PSRAM with XIP enabled
  • 32 KB instruction cache / 64 KB data cache (64-byte lines)
  • LVGL refresh period: 15 ms
  • 2 SW draw units for parallel rendering
  • IRAM-placed fast-mem attributes for LVGL
  • FreeRTOS tick rate: 1000 Hz
  • LVGL demo features disabled for this custom UI app
  • TinyUSB HID interface count set to 1
  • TinyUSB Audio Class is enabled through project-wide compile definitions so the project does not rely on edited managed_components
  • TinyUSB CDC ACM is enabled (CFG_TUD_CDC=1) for a virtual serial port that carries ESP-IDF log output
  • USB audio sizing matches a Windows-friendlier TinyUSB microphone profile
  • Console output is set to none (CONFIG_ESP_CONSOLE_NONE=y) because USB-Serial-JTAG is unavailable while TinyUSB owns the USB peripheral

Manual Validation

Firmware-Side

  • Build with idf.py build or .\flash.ps1 -BuildOnly
  • Flash and confirm the default screen shows the centered active mode heading with no fallback Cursor mode placeholder text
  • Confirm the in-card hint/status area is blank until populated by mode activity
  • Verify there is no separate Touch Controller title row
  • Press and hold the touchscreen briefly without swiping and confirm the touch-down palette shifts to dark red while pressed
  • Perform touch gestures and confirm the in-card debug label shows text like PRESS, TAP, DOUBLE TAP, LONG PRESS, HOLD END, and swipe directions
  • Confirm the BOOT-position marker is visible near the physical BOOT button, uses the green accent at idle, and turns red while BOOT is held
  • Press and hold BOOT to confirm the BOOT selector appears with Swipe to switch mode, the active network address on the next line once Wi-Fi is ready, Release BOOT = Confirm, and the centered active mode label still visible
  • While holding BOOT, swipe left or right and confirm the selected mode changes
  • Release BOOT and confirm the newly selected mode remains active
  • Watch serial logs for USB init, BOOT press/release, touch events, HID send messages, and microphone streaming start/stop

Host-Side

  • Connect the board to the USB-OTG-capable port used for device mode
  • Confirm Device Manager shows a USB keyboard / HID entry and Windows Sound settings show PTT Smart Mic Microphone
  • Verify BOOT mode changes do not emit an F13 key event by themselves
  • Verify in Cursor mode that a 400+ ms stationary press sends F13 down and release sends F13 up
  • Verify in Cursor mode that tap sends F14
  • Verify in Cursor mode that double tap sends Enter
  • Verify in Cursor mode that swipe up sends Ctrl+A then Backspace
  • Verify in Cursor mode that swipe down sends Ctrl+.
  • Verify in Cursor mode that swipe left sends Ctrl+N
  • Verify in Cursor mode that swipe right sends Enter
  • Verify a short tap only performs the active mode's mapped tap behavior
  • Open Windows Sound settings or mmsys.cpl recording devices and confirm the mic meter stays quiet when idle
  • Hold touch in Cursor mode and speak into the board to confirm the recording meter reacts only while dictation is active

AI Context

  • Board BSP provided by waveshare/esp32_s3_touch_amoled_1_75 component (display init, touch, backlight).
  • The onboard microphone path uses the BSP audio layer plus esp_codec_dev.
  • The app uses the BSP default display/touch orientation.
  • sdkconfig is git-ignored; sdkconfig.defaults is the source of truth for configuration.
  • Mode-system behavior is split across main/main.c, main/mode_config.c, main/mode_json_loader.c, main/mode_controller.c, main/input_router.c, main/action_engine.c, and main/ui_status.c.
  • Dictation-specific behavior is still supported through Cursor mode plus main/ptt_state.c, main/audio_input.c, and main/usb_composite.c.
  • ptt_state.c keeps PTT transitions deterministic and host-testable without BSP, LVGL, or TinyUSB dependencies.
  • usb_composite.c owns the composite TinyUSB descriptors and callbacks for keyboard HID, USB microphone streaming, MSC storage, and CDC ACM serial, including key report state for F13 and any future extra keys.
  • usb_cdc_log.c redirects ESP_LOGx output to the CDC ACM virtual serial port via esp_log_set_vprintf().
  • usb_jtag_cmd.c runs a FreeRTOS task that polls the TinyUSB CDC FIFO for newline-delimited JSON commands (ping, get_wifi, set_wifi) and writes JSON responses back over the same port. The tud_cdc_rx_cb callback in usb_composite.c is intentionally empty because the command task handles all CDC reads.
  • ui_status.c owns touch gesture detection, the BOOT overlay, the in-card gesture debug label, the BOOT-position marker, and reports high-level touch events back to main.c.