Returns the tokenized representation of a text.
This function doesn’t call the API; encoding is performed locally.
encode(encoding, text, specialTokens?)
encoding
The encoding to use. Currently available encodings:
cl100k_base
p50k_base
p50k_edit
r50k_base
gpt2
You can get the encoding for a model using openai.encodingForModel.
For example, gpt-3.5-turbo and gpt-4 models use "cl100k_base".
text
The string to encode.
specialTokens optional
An object that maps additional special tokens to their values.
For example: { "<|endoftext|>": 50256 }
An array of numbers representing the encoded text.
Since OpenAI tokenizers don’t have a formal specification, the current encoder may differ from the one used by OpenAI in some cases.
openai.encode("cl100k_base", "hello world")
[15339, 1917]
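Because the result is a plain array of numbers, its length is the token count. The sketch below shows one way to use that to check whether a text fits a token budget. The helpers countTokens and fitsBudget are hypothetical names, not part of this API, and the encoder is passed in as an argument so the snippet makes no assumptions beyond the encode(encoding, text, specialTokens?) signature documented above.

```javascript
// Sketch only: `countTokens` and `fitsBudget` are hypothetical helpers,
// not part of the documented API. `encodeFn` is any function with the
// documented shape (encoding, text, specialTokens?) -> array of numbers.
function countTokens(encodeFn, encoding, text, specialTokens) {
  // Forward specialTokens only when the caller supplied it.
  const tokens = specialTokens === undefined
    ? encodeFn(encoding, text)
    : encodeFn(encoding, text, specialTokens);
  return tokens.length;
}

// True when the text encodes to at most `maxTokens` tokens.
function fitsBudget(encodeFn, encoding, text, maxTokens) {
  return countTokens(encodeFn, encoding, text) <= maxTokens;
}

// With the real encoder this could be called as, for example:
//   countTokens((enc, txt) => openai.encode(enc, txt), "cl100k_base", "hello world")
```

Given the example above, where "hello world" encodes to [15339, 1917], such a call would yield a count of 2. Since the local encoder may differ from the one used by OpenAI in some cases, it is safer to treat the count as an estimate and leave some headroom in the budget.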