Using Claude Code through Venice.ai

I have heard good things about the Claude Code CLI with Opus 4.5 for coding. Since I’ve also heard about its usage limits, and I noticed that Venice.ai now supports commercial models (including Opus 4.5) on a pay-per-credit basis, I decided to give it a try.

You can pay for Venice.ai inference with DIEM tokens. One DIEM token gives you $1 of inference per day: you don’t burn them, so the next day you get another $1 of inference as long as you hold the token (hold 30 DIEM and you get $30 of inference every day, roughly $900 a month). You can either buy DIEM tokens or mint them against staked VVV tokens (which pay an interest rate). I do not recommend investing in either: my position is hedged, and VVV was losing its value pretty fast. But if you know what you are doing (i.e. hedging, or just buying the tokens if they seem a good value), it might be a good idea.

Let’s get technical though.

Install litellm-proxy:

pip install 'litellm[proxy]'
mkdir ~/.litellm
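
To check that the install worked, you can ask the CLI for its options; this should print the proxy’s flags, including the --config one we use below:

litellm --help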

Make a config file called ~/.litellm/config.yaml (here is a gist):

model_list:
  # Claude Code asks for Sonnet by default; route those requests to Opus on Venice
  - model_name: claude-sonnet-4-5-20250929
    litellm_params:
      model: openai/claude-opus-45
      api_base: https://api.venice.ai/api/v1
      api_key: "os.environ/VENICE_LITELLM_API_KEY"
      additional_drop_params: ["context_management"]  # Claude Code sends this; Venice's API doesn't take it

  # Opus stays Opus
  - model_name: claude-opus-4-5-20251101
    litellm_params:
      model: openai/claude-opus-45
      api_base: https://api.venice.ai/api/v1
      api_key: "os.environ/VENICE_LITELLM_API_KEY"
      additional_drop_params: ["context_management"]

  # Haiku gets mapped to Gemini 3 Pro on Venice (cheaper than Opus)
  - model_name: claude-haiku-4-5-20251001
    litellm_params:
      model: openai/gemini-3-pro-preview
      api_base: https://api.venice.ai/api/v1
      api_key: "os.environ/VENICE_LITELLM_API_KEY"
      additional_drop_params: ["context_management"]

litellm_settings:
  drop_params: True              # silently drop params the target API does not support
  set_verbose: False
  return_mapped_model_name: True

(You might need to change the model names; I will not update this post as model names change.)

This maps Haiku to Gemini 3 Pro on Venice, which is cheaper than Opus and might work better for many use cases; if Opus can’t fix a bug, give Gemini a try, or vice versa. Opus stays Opus. Find the current Venice model names here. Feel free to try Grok or other models as well, as in the sketch below.
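
For example, to point Haiku at some other Venice model, you only change the model field in its entry. The model ID here is a placeholder, so substitute a real one from Venice’s model list:

  - model_name: claude-haiku-4-5-20251001
    litellm_params:
      model: openai/venice-model-id-goes-here  # e.g. a Grok model ID from Venice's list
      api_base: https://api.venice.ai/api/v1
      api_key: "os.environ/VENICE_LITELLM_API_KEY"
      additional_drop_params: ["context_management"]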

Now install Claude Code; I installed it through Homebrew:

brew install claude-code
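
If you don’t use Homebrew, Anthropic’s npm package should work just as well (assuming you have Node.js installed):

npm install -g @anthropic-ai/claude-code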

Add two configs that make Claude Code believe it is already authenticated and onboarded. First is ~/.claude/claude.json:

{
  "auth": {
    "accessToken": "local-proxy-session",
    "email": "local@proxy.internal"
  },
  "currentProject": {
    "onboarded": true
  }
}

Second is ~/.claude/settings.json:

{
  "model": "opus",
  "hasTrustDialogAccepted": true,
  "hasCompletedProjectOnboarding": true,
  "primaryApiKey": "sk-ant-litellm-proxy-123"
}

Set your API keys:

export VENICE_LITELLM_API_KEY="api-key-you-created-on-venice-ai"  # used by the proxy to call Venice
export ANTHROPIC_BASE_URL="http://0.0.0.0:4000"                   # point Claude Code at the local proxy
export ANTHROPIC_API_KEY="sk-litellm-any-key"                     # any value; the proxy holds the real key
export CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS=1                   # skip beta features the proxy may not translate

I put these into my shell’s config (since I use my dotfiles everywhere, for me that is ~/.extra, but for you it might be ~/.bashrc).

Now run litellm-proxy:

litellm --config ~/.litellm/config.yaml
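
With the proxy up, a quick sanity check (assuming the default port 4000) is to list the mapped models and fire one test completion; the key in the header is just the placeholder from above:

curl http://0.0.0.0:4000/v1/models \
  -H "Authorization: Bearer sk-litellm-any-key"

curl http://0.0.0.0:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-litellm-any-key" \
  -d '{"model": "claude-opus-4-5-20251101", "messages": [{"role": "user", "content": "say hi"}]}'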

And run Claude Code:

claude
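
If you want a quick non-interactive smoke test of the whole chain, Claude Code’s print mode should do it (assuming your version has the -p flag):

claude -p "reply with the word pong"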

I made a nice cat farting game to test it out, in three prompts: the first made the game, and the second made cats close to a fart fall unconscious for a while. For the third I switched to Haiku with /model, which (given the mapping above) meant Gemini, and it added a soundtrack. And then I fixed one small bug with Opus again.

Note: Claude Code eats tokens really fast, which is why I recommend creating a dedicated API key just for it and setting consumption limits on it, so it does not eat your balance (especially if you are paying in USD rather than DIEM). Also, compared to upstream Claude, this translation from the Anthropic API to Venice’s OpenAI-compatible API does not cache tokens, so tokens get consumed much faster than they would through Anthropic’s endpoint. On the other hand, you can easily get much higher limits than even the $200 Max plan, for a better price.
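
As a belt-and-suspenders measure on the proxy side, litellm’s config has also supported a global spend cap. This is a sketch assuming the max_budget and budget_duration settings still exist in your litellm version, so check their docs before relying on it:

litellm_settings:
  drop_params: True
  set_verbose: False
  return_mapped_model_name: True
  max_budget: 10        # hard cap, in USD, across all calls through this proxy
  budget_duration: 30d  # reset the cap every 30 days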

YMMV, do your own research, this is not investment advice, this is punk.