Factory (Droid) recently started pushing the openai provider for GPT models instead of the old generic-chat-completion-api. Even though most guides still say to use generic for everything, I tried switching. It broke instantly.
Here’s why: Factory has three provider types. openai hits the Responses API (/v1/responses), which is what you need for GPT-5+ and Codex. anthropic is for Claude. generic-chat-completion-api is the standard Chat Completions API (/v1/chat/completions) for things like OpenRouter or Fireworks. The difference is the API format, not the models themselves.
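The format difference is easiest to see side by side. A minimal sketch of the two request shapes (field names from the public OpenAI docs; the model name is just the one used later in this post):

```python
# Chat Completions format (generic-chat-completion-api):
# conversation goes in a "messages" array of role/content pairs.
chat_completions_body = {
    "model": "gpt-5.4",
    "messages": [{"role": "user", "content": "Say hello in one word"}],
}

# Responses format (openai provider): a flat "input" field,
# with reasoning settings nested under "reasoning".
responses_body = {
    "model": "gpt-5.4",
    "input": "Say hello in one word",
    "reasoning": {"effort": "low"},
}
```

Same model, different envelope, which is why the provider setting has to match the endpoint the proxy actually speaks.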
When I flipped to provider: "openai", I started seeing 400 errors in my proxy logs: {"detail":"Unsupported parameter: prompt_cache_retention"}.
The issue is that CLIProxyAPIPlus uses your ChatGPT subscription tokens, so it routes requests through chatgpt.com/backend-api/codex/responses instead of the official api.openai.com endpoint. These backends are different. prompt_cache_retention is an OpenAI parameter that controls how long prompt prefixes stay in memory. The subscription backend doesn’t support it, but Factory sends it anyway because it assumes you’re hitting the official API.
The fix
You need to do two things.
First, strip that parameter in your proxy config. I’m using VibeProxy, but this works for any CLIProxyAPIPlus setup. Edit your config.yaml (on VibeProxy, use the one in /Applications/VibeProxy.app/Contents/Resources/config.yaml so your changes actually persist):
payload:
  filter:
    - models:
        - name: "gpt-*"
      params:
        - "prompt_cache_retention"
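To convince myself the rule does what I thought, here's roughly what that filter amounts to. This is my own sketch of the behavior, not CLIProxyAPIPlus code: match the request's model against each glob, and drop the listed params from the body.

```python
import fnmatch

def strip_params(body: dict, rules: list[dict]) -> dict:
    """Drop filtered params when the request's model matches a glob.

    Sketch of the payload.filter behavior above; not the proxy's
    actual implementation.
    """
    for rule in rules:
        patterns = [m["name"] for m in rule["models"]]
        if any(fnmatch.fnmatch(body.get("model", ""), p) for p in patterns):
            for param in rule["params"]:
                body.pop(param, None)
    return body

# Mirrors the YAML config above
rules = [{"models": [{"name": "gpt-*"}], "params": ["prompt_cache_retention"]}]

body = {"model": "gpt-5.4", "input": "hi", "prompt_cache_retention": "24h"}
strip_params(body, rules)  # prompt_cache_retention removed, rest untouched
```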
Then update your ~/.factory/settings.json. Change provider to "openai" and move your reasoning settings to the new nested format:
{
  "model": "gpt-5.4",
  "baseUrl": "http://localhost:8317/v1",
  "apiKey": "dummy-not-used",
  "displayName": "GPT 5.4 (Low)",
  "extraArgs": {
    "reasoning": {"effort": "low"}
  },
  "provider": "openai"
}
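Before restarting Factory, it's worth confirming the file still parses and the reasoning block is nested rather than flat. A throwaway check using only the stdlib (paste your own file contents in place of the blob):

```python
import json

# The settings blob from above as a string; json.loads rejects
# trailing commas, smart quotes, and similar easy-to-make slips.
raw = '''
{
  "model": "gpt-5.4",
  "baseUrl": "http://localhost:8317/v1",
  "apiKey": "dummy-not-used",
  "displayName": "GPT 5.4 (Low)",
  "extraArgs": {
    "reasoning": {"effort": "low"}
  },
  "provider": "openai"
}
'''
settings = json.loads(raw)
```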
The baseUrl stays the same. The proxy handles both endpoints on the same port.
I checked it with curl first to make sure the proxy was happy with the Responses API format:
curl -s -X POST http://localhost:8317/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer dummy-not-used" \
  -d '{
    "model": "gpt-5.4",
    "input": "Say hello in one word",
    "reasoning": {"effort": "low"},
    "stream": true
  }'
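With stream: true, the reply comes back as server-sent events rather than one JSON body, so what curl prints is a series of event:/data: line pairs. A tiny parser sketch for eyeballing the text deltas (the sample events are illustrative and trimmed down; real streams carry more fields):

```python
import json

def sse_data(lines):
    """Pull the JSON payloads out of an SSE stream's `data:` lines."""
    for line in lines:
        line = line.strip()
        if line.startswith("data:"):
            yield json.loads(line[len("data:"):])

# Lines shaped like Responses-API streaming events
sample = [
    "event: response.output_text.delta",
    'data: {"type": "response.output_text.delta", "delta": "Hello"}',
    "",
    "event: response.completed",
    'data: {"type": "response.completed"}',
]

chunks = [e["delta"] for e in sse_data(sample)
          if e["type"] == "response.output_text.delta"]
```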
Everything worked fine once prompt_cache_retention was out of the way.