Technical Details
Developer and contributor reference for WalKEY-TalKEY. For the product overview, see the main README. For end-user setup and JSON config authoring, see the User Guide.
Hardware
| Feature | Spec |
|---|---|
| MCU | ESP32-S3 @ 240 MHz |
| Display | 1.75" AMOLED (CO5300 controller) |
| Touch | CST9217 capacitive |
| Flash | 16 MB QIO |
| PSRAM | Octal SPI, 80 MHz |
Board: Waveshare ESP32-S3 Touch AMOLED 1.75"
SD Card Notes
- The board hardware can use large microSD cards, including 128 GB cards, as long as the card can be mounted by the ESP-IDF FAT filesystem stack.
- For best compatibility, format the card as FAT32.
- Avoid exFAT unless you have separately added and tested support for it in the firmware.
- The BSP SD card path is wired for SDMMC in 1-bit mode.
Controls And UI
- The main card shows the active mode as a large centered heading
- There is no separate Touch Controller title row
- Status/hint text lives inside the main card and is blank by default until a mode/action sets it
- Gesture debug text is shown as a small gray label inside the main card and stays visible until a newer gesture replaces it
- A large circular BOOT-position marker is drawn near the physical BOOT button for alignment/debugging
- The BOOT marker uses the same green accent as the heading by default and turns red while BOOT is held
- Holding BOOT opens the simplified mode selector with top instruction text, a bottom confirm hint, and the centered active mode label still visible
- Pressing the touchscreen briefly shifts the touch-feedback palette to a dark red pressed state
- Normal touch gestures show text labels such as PRESS, TAP, DOUBLE TAP, LONG PRESS, HOLD END, and swipe directions instead of arrow glyphs
- Cursor-mode touch hold keeps the dictation workflow: 400+ ms hold enables mic gate and sends F13, release disables mic gate and releases F13
- Cursor-mode tap sends F14
- Cursor-mode double tap sends Enter
- Cursor-mode swipe up sends Ctrl+A then Backspace to clear the field
- Cursor-mode swipe down sends Ctrl+. to toggle Cursor text mode
- Cursor-mode swipe left sends Ctrl+N
- Cursor-mode swipe right sends Enter
- BOOT long press resets the selection to Cursor
Cursor - Swipe actions are edge-triggered and should fire once per gesture, not repeat while the finger is still moving
Prerequisites
- ESP-IDF v5.5 -- installation guide
- Target: esp32s3
USB Ports
This board has a single USB-C connector that is shared between two USB peripherals:
| Port | Controller | Purpose | When available |
|---|---|---|---|
| COM4 (typical) | USB-Serial-JTAG | Flashing and early boot console | Download mode only (BOOT + RESET) |
| COM6 (typical) | USB-OTG via TinyUSB CDC | Runtime log output | After firmware boots |
The COM numbers are assigned dynamically by Windows and may differ on your machine -- check Device Manager under Ports (COM & LPT).
At runtime, TinyUSB owns the USB-OTG peripheral and the USB-Serial-JTAG port disappears. To flash new firmware, you must enter download mode first (hold BOOT, press RESET, release BOOT).
Build & Flash
Quick (using flash.ps1)
    .\flash.ps1              # build + flash (COM4 default)
    .\flash.ps1 -Port COM5   # different port
    .\flash.ps1 -BuildOnly   # build without flashing
    .\flash.ps1 -FlashOnly   # flash without rebuilding
    .\flash.ps1 -Clean       # full clean build + flash
Manual
    idf.py set-target esp32s3
    idf.py build
    idf.py -p COM4 flash   # replace COM4 with your flash port
Dependencies (waveshare/esp32_s3_touch_amoled_1_75, lvgl/lvgl 9.4.*, espressif/esp_tinyusb 2.1.1) are fetched automatically by the IDF Component Manager on first build.
Expected USB Behavior
- Windows should enumerate the board as a USB microphone input device, a USB HID keyboard, a USB mass-storage drive, and a virtual COM port (CDC ACM serial)
- The refreshed Windows-facing identity is VID_303A / PID_4214
- The recording endpoint should appear as Microphone / PTT Smart Mic Microphone in Windows
- The CDC serial port appears under Ports (COM & LPT) in Device Manager as WalKEY-TalKEY Serial
- The USB microphone remains present even while idle
- BOOT gates microphone audio content instead of connect/disconnect behavior
USB Serial Command Interface
The CDC ACM serial port serves dual purposes:
- ESP-IDF log output -- usb_cdc_log.c redirects all ESP_LOGx output to the CDC port
- JSON command channel -- usb_jtag_cmd.c accepts newline-delimited JSON commands on the same port
The command handler runs in a separate FreeRTOS task that polls tud_cdc_available() every 5 ms. Commands and log output share the same CDC endpoint; the web client filters JSON responses by looking for lines starting with {.
Protocol: Send a JSON object terminated by \n. The device responds with a JSON object (also \n-terminated) containing "ok": true or "ok": false.
Supported commands:
| Command | Description | Example |
|---|---|---|
| ping | Check connectivity, returns device name and config version | {"cmd":"ping"} |
| get_wifi | Read current WiFi config (passwords masked) | {"cmd":"get_wifi"} |
| set_wifi | Update WiFi STA/AP/hostname settings and reload | {"cmd":"set_wifi","sta":{"ssid":"MyNet","password":"pw"}} |
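Since log output and JSON replies share one port, a client has to split the byte stream into lines and pick out the JSON ones. The following is a minimal sketch of that framing in plain C, assuming nothing about the firmware or the web client: line_reader_t, the 256-byte limit, and the function names are all hypothetical, and the "ok" check is a deliberately crude substring match rather than a JSON parse.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define LINE_MAX_LEN 256   /* assumed limit for this sketch */

typedef struct {
    char buf[LINE_MAX_LEN];
    size_t len;
} line_reader_t;

/* Feed one received byte. Returns true when buf holds a complete line
 * (newline stripped, NUL-terminated). Overlong lines are truncated. */
bool line_reader_feed(line_reader_t *r, char c)
{
    if (c == '\n') {
        r->buf[r->len] = '\0';
        r->len = 0;
        return true;
    }
    if (c != '\r' && r->len < LINE_MAX_LEN - 1) {
        r->buf[r->len++] = c;
    }
    return false;
}

/* Log lines and JSON replies share the port; lines starting with '{' are JSON. */
bool line_is_json(const char *line)
{
    return line[0] == '{';
}

/* Crude check for the "ok": true field; a real client would parse the JSON. */
bool reply_is_ok(const char *line)
{
    return line_is_json(line) &&
           (strstr(line, "\"ok\":true") != NULL ||
            strstr(line, "\"ok\": true") != NULL);
}
```

A caller zero-initializes a line_reader_t, feeds it every byte read from the port, and inspects buf whenever line_reader_feed returns true. The buffer stays valid until the next byte is fed.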
Web Serial integration: The Next.js dashboard at /usb uses the browser Web Serial API to connect to the CDC port (VID_303A / PID_4214). This allows WiFi configuration when the device is not yet on a network, solving the bootstrap problem where the web portal requires WiFi but WiFi credentials haven't been set yet.
Monitoring Logs
Firmware logs are redirected to the CDC ACM virtual serial port (see the USB Ports table above).
- Plug in the USB-C cable and wait for boot to complete
- Find the CDC COM port in Device Manager under Ports (COM & LPT) -- it shows as USB Serial Device (COMx)
- Connect with any serial terminal at 115200 baud:

    $port = New-Object System.IO.Ports.SerialPort COM6,115200
    $port.DtrEnable = $true; $port.Open()
    while($true) { if($port.BytesToRead) { Write-Host $port.ReadExisting() -NoNewline }; Start-Sleep -Milliseconds 100 }

Or use PuTTY, Tera Term, or the VS Code Serial Monitor extension.
- All ESP_LOGx output appears when device activity generates log messages (touch, button press, voice, etc.)
Early boot logs (before TinyUSB initializes) are not captured.
Expected HID Behavior
- BOOT press enters temporary mode-selection state and should not send F13
- BOOT release confirms the current mode and exits mode-selection state
- In Cursor mode, a 400+ ms touchscreen hold sends F13 down and release sends F13 up
- In Cursor mode, tap sends F14
- In Cursor mode, double tap sends Enter
- In Cursor mode, swipe up sends Ctrl+A then Backspace, swipe down sends Ctrl+., swipe left sends Ctrl+N, and swipe right sends Enter
- Swipe gestures should execute their mapped action once per gesture
- In swipe-driven modes, left/right swipes map to mode-specific keyboard-safe actions from main/mode_config.c
- If USB is not mounted or not ready, the UI still updates and the serial log explains why HID was skipped
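The Cursor-mode expectations above amount to a small lookup table, which can be handy as a checklist when writing host-side tests. The sketch below restates them in C. The gesture identifiers (such as "swipe_up") and the struct are made up for illustration and are not the names used in mode_config.c or the JSON schema; only the key assignments come from this document.

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *gesture;   /* hypothetical identifier for this sketch */
    const char *keys;      /* human-readable description of what is sent */
} cursor_binding_t;

static const cursor_binding_t CURSOR_BINDINGS[] = {
    { "hold",        "F13 down (F13 up on release)" },
    { "tap",         "F14" },
    { "double_tap",  "Enter" },
    { "swipe_up",    "Ctrl+A, Backspace" },
    { "swipe_down",  "Ctrl+." },
    { "swipe_left",  "Ctrl+N" },
    { "swipe_right", "Enter" },
};

/* Look up the expected key output for a Cursor-mode gesture, or NULL if unmapped. */
const char *cursor_keys_for(const char *gesture)
{
    size_t count = sizeof(CURSOR_BINDINGS) / sizeof(CURSOR_BINDINGS[0]);
    for (size_t i = 0; i < count; ++i) {
        if (strcmp(CURSOR_BINDINGS[i].gesture, gesture) == 0) {
            return CURSOR_BINDINGS[i].keys;
        }
    }
    return NULL;
}
```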
Expected Microphone Behavior
- The USB microphone enumerates continuously as a normal Windows input device
- Microphone transport follows a TinyUSB-style 48 kHz / 16-bit / mono full-speed profile
- When BOOT is held, live mic frames are sent over USB Audio Class
- When BOOT is released, the firmware still services the audio stream but sends silence
- Serial logs should show USB attach/detach and microphone streaming start/stop events
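The gating behavior can be sketched as a per-frame decision: the stream keeps running, and only the payload changes. At 48 kHz, 16-bit mono, one full-speed USB frame (1 ms) carries 48 samples, or 96 bytes. The function name and signature below are hypothetical and do not mirror usb_composite.c or audio_input.c; the gate_open flag stands for whatever condition opens the mic gate.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SAMPLE_RATE_HZ    48000
#define FRAMES_PER_SECOND 1000   /* full-speed USB: one frame per millisecond */
#define SAMPLES_PER_FRAME (SAMPLE_RATE_HZ / FRAMES_PER_SECOND)    /* 48 */
#define BYTES_PER_FRAME   (SAMPLES_PER_FRAME * sizeof(int16_t))   /* 96 */

/* Fill one outgoing USB audio frame: live samples when the gate is open,
 * silence otherwise. Both buffers hold SAMPLES_PER_FRAME samples. */
void mic_fill_frame(bool gate_open, const int16_t *mic, int16_t *out)
{
    if (gate_open) {
        memcpy(out, mic, BYTES_PER_FRAME);
    } else {
        memset(out, 0, BYTES_PER_FRAME);
    }
}
```

Sending zeros instead of stopping the stream is what keeps the microphone enumerated and idle-quiet on the host rather than connecting and disconnecting.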
Partition Table
Custom layout in partitions.csv -- 8 MB factory app, 5 MB model SPIFFS, 2 MB config/docs SPIFFS:
| Name | Type | Size |
|---|---|---|
| nvs | data | 24 KB |
| phy_init | data | 4 KB |
| factory | app | 8 MB |
| model | data (spiffs) | 5 MB |
| storage | data (spiffs) | 2 MB |
The runtime mode JSON file lives at /spiffs/mode-config.json. A repo copy is provided at config/mode-config.json.
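As a quick sanity check, the partition sizes in the table can be summed and compared against the 16 MB flash. The sketch below does only that arithmetic. It does not read partitions.csv, and it makes no claim about offsets, which this document does not list.

```c
#include <stdint.h>

#define KB(x) ((uint32_t)(x) * 1024u)
#define MB(x) ((uint32_t)(x) * 1024u * 1024u)

/* Sizes copied from the partition table in this section. */
static const uint32_t PARTITION_SIZES[] = {
    KB(24),   /* nvs      */
    KB(4),    /* phy_init */
    MB(8),    /* factory  */
    MB(5),    /* model    */
    MB(2),    /* storage  */
};

/* Sum of all listed partitions, in bytes. */
uint32_t partitions_total_bytes(void)
{
    uint32_t total = 0;
    for (unsigned i = 0; i < sizeof(PARTITION_SIZES) / sizeof(PARTITION_SIZES[0]); ++i) {
        total += PARTITION_SIZES[i];
    }
    return total;
}

/* Flash left over for the bootloader, the partition table itself, and padding. */
uint32_t flash_headroom_bytes(void)
{
    return MB(16) - partitions_total_bytes();
}
```

The listed partitions add up to 15 MB plus 28 KB, which leaves 996 KB of the 16 MB flash for everything outside the table.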
Wi-Fi Config Portal
The firmware exposes a local config portal over Wi-Fi:
- It first tries the router credentials stored in the JSON config and advertises http://walkey-talkey.local/
- If router join succeeds, browse to walkey-talkey.local or the IP shown on the BOOT overlay
- If router join fails, it falls back to a device-hosted access point
- Fallback SSID: walkey-talkey
- Fallback password: secretKEY
- Fallback URL: http://192.168.4.1/
- The portal serves a small web UI and REST endpoints for GET /config, POST /config/validate, PUT /config, and POST /config/reset
- The portal also offers direct documentation downloads for mode-config.schema.json, AI_GUIDE.md, and USER_GUIDE.md
- The portal intentionally comes up after a short startup delay of about 8 seconds
- The BOOT overlay shows Connecting... immediately during that startup delay, then switches to the active hostname, IP, or AP label when Wi-Fi is ready
- Save and Reset both reapply the Wi-Fi config immediately, so a manual reboot is no longer required after changing network settings
- Reset writes the built-in firmware JSON back to the external config file, then reloads the runtime from that restored config
- The hardcoded failsafe config remains an internal safety net if both external and built-in JSON loading fail
- If Save or Reset fails, the portal returns a detailed STORAGE_FAILED payload that explains whether the failure happened while mounting SPIFFS or writing /spiffs/mode-config.json, including stage, formatAttempted, path, partition, espError, errnoValue, errnoMessage, and suggested recovery steps
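The router-first, access-point-fallback rule reduces to one decision once the join attempt has finished. The sketch below captures just that decision, using the URLs from the list above. The struct, the function name, and the "STA"/"AP" labels are hypothetical and are not taken from the firmware.

```c
#include <stdbool.h>

typedef struct {
    const char *mode;   /* "STA" when joined to a router, "AP" for the fallback */
    const char *url;    /* address a user would browse to */
} portal_endpoint_t;

/* Pick the portal address once the router join attempt has finished. */
portal_endpoint_t portal_endpoint(bool router_join_ok)
{
    portal_endpoint_t ep;
    if (router_join_ok) {
        ep.mode = "STA";
        ep.url = "http://walkey-talkey.local/";
    } else {
        ep.mode = "AP";
        ep.url = "http://192.168.4.1/";
    }
    return ep;
}
```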
Portal/SR Coexistence Notes
- The main limiter is internal runtime RAM and largest_internal, not the 16 MB flash size
- Large portal responses such as GET /config and GET /portal were restored by keeping chunked/streamed sends, preferring PSRAM for temporary buffers, and enabling CONFIG_SPIRAM_ALLOW_BSS_SEG_EXTERNAL_MEMORY
- Avoid reverting the PSRAM/BSS settings without re-testing GET /config and GET /portal, especially if SR or USB audio is active
JSON Macro Model
The JSON macro model is intentionally declarative:
- Each binding is input + trigger + ordered actions
- The actions array is the macro
- Tap actions already include the firmware's built-in tap gap
- Use sleep_ms only for extra delay between macro steps
- Prefer hid_shortcut_tap with modifiers plus key for keyboard chords and hid_usage_* for media/system HID
For the full JSON authoring reference, see the User Guide.
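To make "the actions array is the macro" concrete, here is a rough C model of a binding as an input, a trigger, and an ordered action list that is walked front to back. The types and function below are hypothetical: they are not the structures in mode_types.h or action_engine.c, and they model only two action kinds (a shortcut tap and an explicit sleep).

```c
#include <stddef.h>

typedef enum { ACT_SHORTCUT_TAP, ACT_SLEEP_MS } action_kind_t;

typedef struct {
    action_kind_t kind;
    const char *chord;    /* for ACT_SHORTCUT_TAP, e.g. "Ctrl+A" */
    unsigned sleep_ms;    /* for ACT_SLEEP_MS */
} action_t;

typedef struct {
    const char *input;
    const char *trigger;
    const action_t *actions;   /* the macro: executed strictly in order */
    size_t action_count;
} binding_t;

/* Walk the macro in order. Returns the number of taps sent and reports the
 * total explicit delay; send_tap may be NULL for a dry run. */
size_t binding_run(const binding_t *b,
                   void (*send_tap)(const char *chord),
                   unsigned *extra_delay_ms)
{
    size_t taps = 0;
    *extra_delay_ms = 0;
    for (size_t i = 0; i < b->action_count; ++i) {
        const action_t *a = &b->actions[i];
        if (a->kind == ACT_SHORTCUT_TAP) {
            if (send_tap) {
                send_tap(a->chord);
            }
            taps++;
        } else {
            *extra_delay_ms += a->sleep_ms;   /* a real engine would delay here */
        }
    }
    return taps;
}
```

The explicit sleep is counted separately because, per the list above, the built-in tap gap is already part of each tap and sleep_ms is only for additional delay.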
Project Structure
├── CMakeLists.txt # Top-level project CMake
├── flash.ps1 # Build & flash helper script (PowerShell)
├── partitions.csv # Custom partition table
├── sdkconfig.defaults # Default Kconfig (PSRAM, LVGL core settings)
└── main/
    ├── CMakeLists.txt # Component CMake
    ├── README.md # Notes for the main app modules
    ├── action_engine.c # Executes declarative mode actions
    ├── action_engine.h
    ├── audio_input.c # ES7210 + I2S microphone capture wrapper
    ├── audio_input.h
    ├── boot_button.c # GPIO0 polling and debounce
    ├── boot_button.h
    ├── input_router.c # Normalizes raw BOOT/touch events into triggers
    ├── input_router.h
    ├── mode_config.c # Hybrid JSON/fallback mode config entry point
    ├── mode_config.h
    ├── mode_json_loader.c # JSON-to-runtime config compiler
    ├── mode_json_loader.h
    ├── mode_controller.c # Active mode and temporary boot-mode control
    ├── mode_controller.h
    ├── mode_system.Readme.md
    ├── mode_types.h
    ├── ptt_state.c # Small host-testable PTT transition state machine
    ├── ptt_state.h
    ├── usb_cdc_log.c # CDC ACM virtual serial port log redirect
    ├── usb_cdc_log.h
    ├── usb_composite.c # Composite USB HID + microphone + MSC + CDC transport
    ├── usb_composite.h
    ├── usb_jtag_cmd.c # CDC serial JSON command handler (ping, get/set WiFi)
    ├── usb_jtag_cmd.h
    ├── idf_component.yml # IDF Component Manager dependencies
    ├── component.mk # Legacy Make support
    ├── main.c # App orchestration and queued mode/input/event handling
    ├── ui_status.c # Current mode UI, BOOT overlay, and touch/swipe feedback
    └── ui_status.h
Key Configuration (sdkconfig.defaults)
- Octal PSRAM with XIP enabled
- 32 KB instruction cache / 64 KB data cache (64-byte lines)
- LVGL refresh period: 15 ms
- 2 SW draw units for parallel rendering
- IRAM-placed fast-mem attributes for LVGL
- FreeRTOS tick rate: 1000 Hz
- LVGL demo features disabled for this custom UI app
- TinyUSB HID interface count set to 1
- TinyUSB Audio Class is enabled through project-wide compile definitions so the project does not rely on edited managed_components
- TinyUSB CDC ACM is enabled (CFG_TUD_CDC=1) for a virtual serial port that carries ESP-IDF log output
- USB audio sizing matches a Windows-friendlier TinyUSB microphone profile
- Console output is set to none (CONFIG_ESP_CONSOLE_NONE=y) because USB-Serial-JTAG is unavailable while TinyUSB owns the USB peripheral
Manual Validation
Firmware-Side
- Build with idf.py build or .\flash.ps1 -BuildOnly
- Flash and confirm the default screen shows the centered active mode heading with no fallback Cursor mode placeholder text
- Confirm the in-card hint/status area is blank until populated by mode activity
- Verify there is no separate Touch Controller title row
- Press and hold the touchscreen briefly without swiping and confirm the touch-down palette shifts to dark red while pressed
- Perform touch gestures and confirm the in-card debug label shows text like PRESS, TAP, DOUBLE TAP, LONG PRESS, HOLD END, and swipe directions
- Confirm the BOOT-position marker is visible near the physical BOOT button, uses the green accent at idle, and turns red while BOOT is held
- Press and hold BOOT to confirm the BOOT selector appears with Swipe to switch mode, the active network address on the next line once Wi-Fi is ready, Release BOOT = Confirm, and the centered active mode label still visible
- While holding BOOT, swipe left or right and confirm the selected mode changes
- Release BOOT and confirm the newly selected mode remains active
- Watch serial logs for USB init, BOOT press/release, touch events, HID send messages, and microphone streaming start/stop
Host-Side
- Connect the board to the USB-OTG-capable port used for device mode
- Confirm Device Manager shows a USB keyboard / HID entry and Windows Sound settings show PTT Smart Mic Microphone
- Verify BOOT mode changes do not emit an F13 key event by themselves
- Verify in Cursor mode that a 400+ ms stationary press sends F13 down and release sends F13 up
- Verify in Cursor mode that tap sends F14
- Verify in Cursor mode that double tap sends Enter
- Verify in Cursor mode that swipe up sends Ctrl+A then Backspace
- Verify in Cursor mode that swipe down sends Ctrl+.
- Verify in Cursor mode that swipe left sends Ctrl+N
- Verify in Cursor mode that swipe right sends Enter
- Verify a short tap only performs the active mode's mapped tap behavior
- Open Windows Sound settings or mmsys.cpl recording devices and confirm the mic meter stays quiet when idle
- Hold touch in Cursor mode and speak into the board to confirm the recording meter reacts only while dictation is active
AI Context
- Board BSP provided by waveshare/esp32_s3_touch_amoled_1_75 component (display init, touch, backlight).
- The onboard microphone path uses the BSP audio layer plus esp_codec_dev.
- The app uses the BSP default display/touch orientation.
- sdkconfig is git-ignored; sdkconfig.defaults is the source of truth for configuration.
- Mode-system behavior is split across main/main.c, main/mode_config.c, main/mode_json_loader.c, main/mode_controller.c, main/input_router.c, main/action_engine.c, and main/ui_status.c.
- Dictation-specific behavior is still supported through Cursor mode plus main/ptt_state.c, main/audio_input.c, and main/usb_composite.c.
- ptt_state.c keeps PTT transitions deterministic and host-testable without BSP, LVGL, or TinyUSB dependencies.
- usb_composite.c owns the composite TinyUSB descriptors and callbacks for keyboard HID, USB microphone streaming, MSC storage, and CDC ACM serial, including key report state for F13 and any future extra keys.
- usb_cdc_log.c redirects ESP_LOGx output to the CDC ACM virtual serial port via esp_log_set_vprintf().
- usb_jtag_cmd.c runs a FreeRTOS task that polls the TinyUSB CDC FIFO for newline-delimited JSON commands (ping, get_wifi, set_wifi) and writes JSON responses back over the same port. The tud_cdc_rx_cb callback in usb_composite.c is intentionally empty because the command task handles all CDC reads.
- ui_status.c owns touch gesture detection, the BOOT overlay, the in-card gesture debug label, the BOOT-position marker, and reports high-level touch events back to main.c.
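For orientation, a deterministic, host-testable PTT transition function in the spirit described for ptt_state.c might look like the sketch below. It is an illustration only and not the contents of that file: every name is hypothetical, and the only value taken from this document is the 400 ms hold threshold. The function takes the touch state and a caller-supplied millisecond clock, so it needs no BSP, LVGL, or TinyUSB headers.

```c
#include <stdbool.h>
#include <stdint.h>

#define PTT_HOLD_MS 400u   /* hold threshold stated in this document */

typedef enum { PTT_IDLE, PTT_PRESSED, PTT_ACTIVE } ptt_phase_t;
typedef enum { PTT_EV_NONE, PTT_EV_START, PTT_EV_STOP } ptt_event_t;

typedef struct {
    ptt_phase_t phase;
    uint32_t pressed_at_ms;
} ptt_t;

/* Advance the state machine with the current touch state and a millisecond clock.
 * START means "open mic gate, F13 down"; STOP means "close mic gate, F13 up". */
ptt_event_t ptt_step(ptt_t *p, bool touching, uint32_t now_ms)
{
    switch (p->phase) {
    case PTT_IDLE:
        if (touching) {
            p->phase = PTT_PRESSED;
            p->pressed_at_ms = now_ms;
        }
        return PTT_EV_NONE;
    case PTT_PRESSED:
        if (!touching) {
            p->phase = PTT_IDLE;   /* released early: a tap, not a hold */
            return PTT_EV_NONE;
        }
        if (now_ms - p->pressed_at_ms >= PTT_HOLD_MS) {
            p->phase = PTT_ACTIVE;
            return PTT_EV_START;
        }
        return PTT_EV_NONE;
    case PTT_ACTIVE:
        if (!touching) {
            p->phase = PTT_IDLE;
            return PTT_EV_STOP;
        }
        return PTT_EV_NONE;
    }
    return PTT_EV_NONE;
}
```

Passing the clock in as an argument is what makes the transitions reproducible in a unit test: the same sequence of (touching, now_ms) pairs always yields the same events.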