Table of Contents
How to Leverage OpenAI with Xano
What is OpenAI?
What does OpenAI do?
How can I utilize OpenAI with Xano?
API Pricing
Authentication / Headers
Requesting organization
Making requests
Models
Chat Models
Vector Representations and Embeddings
What are Embeddings?
What are Tokens?
Token Limits
Audio Models
Transcribe
Translate
Image Models
Generate
Edit
Variation
Completions
Edits
What is OpenAI?
OpenAI is an AI company based in San Francisco. They are best known for developing GPT-3, a powerful LLM trained on billions of words from the Internet. GPT-3 shows how LLMs can be applied in many creative ways without needing to code. With a bit of guidance, these models can power simple apps, generate content, converse, and more.
OpenAI Overview Slides:
What does OpenAI do?
OpenAI has developed several powerful AI models that are available to the public, including GPT, a natural language processing model. GPT can write coherent and convincing text, answer questions, and even generate code. Other models developed by OpenAI include DALL-E, which can generate images from textual descriptions, and Whisper, which can transcribe audio files to text. They also have an embeddings API for converting text and documents into vector representations, which are useful for adding context to prompts when working with larger datasets.
How can I utilize OpenAI with Xano?
OpenAI has made their models accessible via API, meaning we can interact with them directly from Xano. These models are mostly driven by plain-language text inputs, which makes them extremely accessible and easy to work with. With a small amount of effort, huge results can be achieved.
Getting Started
To get started, you can familiarize yourself with OpenAI's API reference docs here: API Reference
If you don't know what you're looking at, that's OK; we'll explain how to use the reference and the models throughout this article.
OpenAI’s API Pricing
Unlike ChatGPT's web app, using OpenAI's API is a paid service. You can find a full breakdown of the API pricing here: OpenAI Pricing
API Cost Warning
Models such as GPT-4 can be expensive. Providing gpt-4-32k with its largest possible input prompt (32K tokens) can cost $1.92 USD for the input processing. If the output is also at its maximum size (32K tokens), it will cost another $3.84 for the response, totaling $5.76 USD for a single API call.
Be sure to understand the costs of the model prior to using it.
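As a sanity check, the arithmetic behind that warning is simple per-1K-token multiplication. A small Python sketch (the rates shown are the published gpt-4-32k prices at the time of writing; treat them as assumptions and check the pricing page for current values):

```python
def estimate_cost_usd(input_tokens, output_tokens,
                      input_rate_per_1k=0.06, output_rate_per_1k=0.12):
    """Rough cost of a single call at per-1K-token rates (gpt-4-32k shown)."""
    input_cost = round(input_tokens / 1000 * input_rate_per_1k, 2)
    output_cost = round(output_tokens / 1000 * output_rate_per_1k, 2)
    return input_cost, output_cost

# A maximal 32K-token prompt plus a maximal 32K-token response:
inp, out = estimate_cost_usd(32_000, 32_000)
print(inp, out, round(inp + out, 2))  # 1.92 3.84 5.76
```

Running the same arithmetic with gpt-3.5-turbo's much lower rates shows why it is the usual starting point for experimentation.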
Authentication / Headers
The OpenAI API uses API keys for authentication. Visit your API Keys page to retrieve the API key you'll use in your requests. You’ll need to register for an OpenAI account if you haven’t already.
Create an environment variable via the Settings section in Xano and add your OpenAI API key.
All OpenAI API requests should include your API key in an Authorization HTTP header as follows:
Authorization: Bearer OPENAI_API_KEY
How can you do this in Xano?
When adding an external API request to your function stack you will notice there is an input field for headers.
Headers are an array, so to add something to the headers we can use the PUSH filter, which adds an item to the end of an array.
We need to push our Authorization string through in order to authenticate our requests. You can paste in the example: Authorization: Bearer $OPENAI_API_KEY
However, we want to update the string to dynamically include the OpenAI API key we stored as an environment variable. We can do this using the REPLACE filter.
It will look like this:
Don't forget to save your changes!
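Outside of Xano, the PUSH-then-REPLACE pattern is just appending a header string and substituting a placeholder. A minimal Python sketch of the same logic (the key value and variable names here are illustrative placeholders, not Xano internals):

```python
# Illustrative equivalent of Xano's PUSH + REPLACE filters.
openai_api_key = "sk-example-key"  # in Xano this comes from an environment variable

headers = ["Content-Type: application/json"]
headers.append("Authorization: Bearer $OPENAI_API_KEY")                    # PUSH
headers = [h.replace("$OPENAI_API_KEY", openai_api_key) for h in headers]  # REPLACE

print(headers[-1])  # Authorization: Bearer sk-example-key
```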
Requesting organization
For users who belong to multiple organizations, you can pass a header to specify which organization is used for an API request. Usage from these API requests will count against the specified organization's subscription quota. (Skip this if you are only part of one organization).
Example curl command:
curl https://api.openai.com/v1/models \
-H 'Authorization: Bearer $OPENAI_API_KEY' \
-H 'OpenAI-Organization: org-c0vZYfhzt6L7XJqSl9ZysuSL'
Making requests
You can make requests to OpenAI's API endpoints using Xano. For example:
curl https://api.openai.com/v1/chat/completions \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer $OPENAI_API_KEY' \
-d '{
"model": "gpt-3.5-turbo",
"messages": [
{"role": "user", "content": "Say this is a test!"}
],
"temperature": 0.7
}'
These example curl requests can be found throughout the API reference docs and can be copied and imported into Xano, saving a heap of time. Start by copying the example curl request for the model you would like to use:
When adding an external API request to your function stack you'll see the IMPORT CURL button in the top right-hand corner. Pasting in the curl will populate the required input fields as per the API docs' specifications. You will just need to update the prompt input and add your API key via the steps shown above.
Try it out for yourself using the example curl provided above. 💪🔥🤖
Models
List models
GET https://api.openai.com/v1/models
Lists the currently available models, and provides basic information about each one such as the owner and availability.
Example curl:
curl https://api.openai.com/v1/models \
-H 'Authorization: Bearer $OPENAI_API_KEY'
Chat Models
ChatGPT 3.5 & 4
The OpenAI Chat API is a powerful tool that allows developers to integrate AI-powered conversational capabilities into their applications. The API uses models like gpt-3.5-turbo to generate responses in a chat-like format.
Here's a breakdown of how to use the API:
Endpoint: The endpoint to create a chat completion is POST https://api.openai.com/v1/chat/completions.
Body Parameters
Required:
model: The ID of the model to use; 'gpt-3.5-turbo' is a recommended model.
messages: An array of message objects that describe the conversation so far. Each message object should have a role (either 'system', 'user', or 'assistant') and content (the content of the message).
Optional:
temperature and top_p: These parameters control the randomness of the model's output. You should generally alter one or the other, but not both.
n: The number of chat completion choices to generate for each input message.
stream: If set to true, partial message deltas will be sent as they become available. (Not currently supported with Xano.)
stop: Sequences where the API will stop generating further tokens.
max_tokens: The maximum number of tokens to generate in the chat completion.
presence_penalty and frequency_penalty: These parameters can be used to control the model's tendency to introduce new topics or repeat itself.
logit_bias: This allows you to modify the likelihood of specified tokens appearing in the completion.
user: A unique identifier representing your end-user.
Response: The response from the API will include the ID of the chat completion, the created timestamp, the generated message from the assistant, and usage information.
Here's an example curl to try yourself:
curl https://api.openai.com/v1/chat/completions \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer $OPENAI_API_KEY' \
-d '{
"model": "gpt-3.5-turbo",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hello!"}
]
}'
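In Xano you would read the assistant's reply out of the external API request's result at choices[0].message.content. The same extraction in Python, against an abridged sample of the documented response shape (the content and usage values below are made up for illustration):

```python
# Abridged shape of a chat completion response and how to extract the reply.
sample_response = {
    "id": "chatcmpl-123",
    "created": 1700000000,
    "choices": [
        {"index": 0,
         "message": {"role": "assistant", "content": "Hello! How can I help you today?"},
         "finish_reason": "stop"},
    ],
    "usage": {"prompt_tokens": 19, "completion_tokens": 9, "total_tokens": 28},
}

reply = sample_response["choices"][0]["message"]["content"]
tokens_used = sample_response["usage"]["total_tokens"]
print(reply)        # Hello! How can I help you today?
print(tokens_used)  # 28
```

The usage object is worth logging in your function stack, since it is what the API pricing is based on.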
Here are four sample applications that could be built using this API:
Customer Support Chatbot: You can create a chatbot that can handle customer inquiries, provide information about products or services, and help resolve common issues.
Virtual Assistant: You can build a virtual assistant that can help users with tasks like setting reminders, sending emails, or finding information online.
Interactive Storytelling: You can create an interactive storytelling application where the user can have a back-and-forth conversation with characters in the story.
Language Learning App: You can build an app where users can practice conversing in a new language with an AI assistant.
Tutorial Video:
Vector Representations (Embeddings)
Vector representations, or embeddings, are a way to represent words or pieces of text as mathematical vectors that capture their semantic meaning. OpenAI offers an embeddings API that allows you to generate vector representations for input texts. These embeddings can then be used to add context to prompts for OpenAI's models or stored in a vector database for later use.
For example, you could generate embeddings for your product documentation or knowledge base using OpenAI's API. Then you could store those embeddings in a vector database like Pinecone and use Xano to query the database to create a chatbot trained on your own data. The chatbot could handle questions about your products and documentation using the semantic information captured in the embeddings.
Why are Embeddings needed?
Embeddings are important because of limits on the amount of text that can be given to a model, the "prompt size". Each model has a token limit which defines how large the prompt can be.
What are tokens?
Tokens can be thought of as pieces of words. Before the API processes the prompts, the input is broken down into tokens. These tokens are not cut up exactly where the words start or end - tokens can include trailing spaces and even sub-words. Here are some helpful rules of thumb for understanding tokens in terms of lengths:
1 token ~= 4 chars in English
1 token ~= ¾ words
100 tokens ~= 75 words
Or
1-2 sentences ~= 30 tokens
1 paragraph ~= 100 tokens
1,500 words ~= 2048 tokens
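The rules of thumb above can be turned into a rough estimator. This is only an approximation (real counts come from the model's tokenizer, e.g. OpenAI's tiktoken library), but it is handy for quick budget checks before sending a prompt:

```python
def rough_token_estimate(text: str) -> int:
    """Approximate token count via the ~4-characters-per-token rule of thumb.
    Real counts depend on the model's tokenizer; use this only as a budget check."""
    return max(1, round(len(text) / 4))

print(rough_token_estimate("a" * 400))  # 100
```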
Model Token Limits
gpt-3.5-turbo - 4,096 tokens (approx. 3,072 words)
gpt-4 - 8,192 tokens (approx. 6,144 words)
gpt-4-32k - 32,768 tokens (approx. 24,576 words)
Embeddings can extend the memory capabilities of models
Embeddings, and the semantic search capabilities they provide, are for when you need to work with data that exceeds a model's token limit. Instead of sending everything in the prompt, you search a vector database for related content (context) and return only the relevant components of your dataset, effectively extending the memory capabilities of the model.
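The "search for related content" step is typically a nearest-neighbour lookup over embedding vectors, usually by cosine similarity. A self-contained sketch with toy 3-dimensional vectors (real text-embedding-ada-002 vectors have 1,536 dimensions, and a vector database like Pinecone performs this lookup for you; the document names and numbers below are invented):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" for two knowledge-base snippets.
docs = {
    "refund policy": [0.9, 0.1, 0.0],
    "shipping times": [0.1, 0.9, 0.1],
}
query = [0.8, 0.2, 0.0]  # pretend embedding of "how do I get my money back?"

best = max(docs, key=lambda name: cosine_similarity(query, docs[name]))
print(best)  # refund policy
```

The winning snippet(s) would then be pasted into the chat prompt as context, keeping the total well under the model's token limit.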
Create embeddings
POST https://api.openai.com/v1/embeddings
Creates an embedding vector representing the input text.
Request body
model: ID of the model to use.
input: Input text to embed, encoded as a string or array of tokens.
user (optional): A unique identifier for the end-user.
Example Curl:
curl https://api.openai.com/v1/embeddings \
-H 'Authorization: Bearer $OPENAI_API_KEY' \
-H 'Content-Type: application/json' \
-d '{
"input": "The food was delicious and the waiter...",
"model": "text-embedding-ada-002"
}'
Response
The response will include the embedding vector for the input text.
You can store this vector in a vector database such as Pinecone.
Example App Workflow - Knowledge Base FAQ Bot
You could generate embeddings for your product documentation or knowledge base using OpenAI's API. Then you could store those embeddings in a vector database like Pinecone and use Xano to query the database to create a chatbot trained on your own data.
Step one: generate embeddings and store them in Pinecone
You would then be able to create a chatbot workflow that leverages OpenAI's Embeddings and Chat Completions APIs by querying Pinecone for related information.
Audio Models
Whisper Audio Translation Model by OpenAI
What is Whisper?
Whisper is a neural net developed by OpenAI that provides high-level accuracy in English speech recognition.
Whisper is trained on 680,000 hours of multilingual and multitask supervised data collected from the web, allowing it to be robust against various accents, background noises, and technical language.
The architecture of Whisper is a simple end-to-end approach, implemented as an encoder-decoder Transformer. It can handle tasks such as language identification, phrase-level timestamps, multilingual speech transcription, and to-English speech translation.
Example Applications:
Whisper could be used to develop robust voice interfaces for applications in various industries.
It can be utilized in transcribing multilingual speeches, making it useful in global conferences, online classes, and more.
Create transcription
POST https://api.openai.com/v1/audio/transcriptions
Transcribes audio into the input language.
Request body
file: The audio file object to transcribe.
model: ID of the model to use. Only whisper-1 is currently available.
language (optional): The language of the input audio.
Other parameters control the model's output.
Response
The response will include the transcription of the audio file.
Example Request:
curl https://api.openai.com/v1/audio/transcriptions \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-H "Content-Type: multipart/form-data" \
-F file="@/path/to/file/audio.mp3" \
-F model="whisper-1"
An example application using the transcription endpoint
You could build an audio transcription service.
Workflow Overview:
Now, let's go over the step-by-step instructions on how to build this application:
Set up your Xano account: Sign up for a Xano account and create a new project.
Create the API endpoint: Use Xano's visual API builder to create a new endpoint, /transcribe. This endpoint should accept POST requests, and the request body (input) should be an audio file. You can import the following curl via your external API request, being sure to update your API key and to change the file path to a file resource input:
curl https://api.openai.com/v1/audio/transcriptions \
-H 'Authorization: Bearer $OPENAI_API_KEY' \
-H 'Content-Type: multipart/form-data' \
-F file='@/path/to/file/audio.mp3' \
-F model='whisper-1'
Transcribe the audio: When a request is received at the /transcribe endpoint, pass the audio file to the Whisper model for transcription. The Whisper model will convert the speech in the audio file to text.
Return the transcribed text: The text output from the Whisper model should be returned in the response from the /transcribe endpoint.
Set up the database: Create a database table in Xano to store the audio files and their corresponding transcriptions. You will need a table with columns for the audio file and the transcribed text.
Store the audio and transcriptions: After the audio has been transcribed, store both the audio file and the transcribed text in the database.
Test the application: Finally, test the application by sending an audio file to the /transcribe endpoint and checking that the transcribed text is returned in the response and stored in the database.
Create translation
POST https://api.openai.com/v1/audio/translations
Translates audio into English.
Request body
file: The audio file object to translate.
model: ID of the model to use. Only whisper-1 is currently available.
Response
The response will include the English translation of the audio file.
You can use Whisper to develop voice interfaces or transcribe multilingual speech.
Image Generation
What is DALL-E?
DALL-E is a neural network developed by OpenAI that generates images from text descriptions.
It is a 12-billion parameter version of GPT-3 trained to generate images from text descriptions, using a dataset of text–image pairs.
DALL-E has capabilities like creating anthropomorphized versions of animals and objects, combining unrelated concepts in plausible ways, rendering text, and applying transformations to existing images.
Create image
POST https://api.openai.com/v1/images/generations
Creates an image given a prompt.
Request body
prompt: A text description of the desired image(s).
n: The number of images to generate. Must be between 1 and 10.
size: The size of the generated images. Must be '256x256', '512x512', or '1024x1024'.
user (optional): A unique identifier for the end-user.
Example Curl - (Generate 2 images)
curl https://api.openai.com/v1/images/generations \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer $OPENAI_API_KEY' \
-d '{
"prompt": "A cute baby sea otter",
"n": 2,
"size": "1024x1024"
}'
Be sure to update your input prompt and API key.
Response
The response will include URLs to the generated images (two in this example).
You can use DALL-E to generate images for advertising, digital art, education, and more.
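The response body for a generation request with "n": 2 contains a data array of URL objects. Extracting them looks like this (the sample response is abridged, and the URLs are placeholders, not real OpenAI-hosted links):

```python
# Abridged shape of an image generation response and URL extraction.
sample_response = {
    "created": 1700000000,
    "data": [
        {"url": "https://example.com/image-1.png"},
        {"url": "https://example.com/image-2.png"},
    ],
}

urls = [item["url"] for item in sample_response["data"]]
print(len(urls))  # 2
```

Note that the real URLs returned by the API expire after a short time, so you would typically download and store the images (for example in Xano's file storage) rather than saving the links.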
Example Workflow:
Image Edit
Image Edit endpoint: POST https://api.openai.com/v1/images/edits
This endpoint accepts an original image and a prompt and generates an edited version of the image based on the prompt. For example, you could provide a picture of a red ball and a prompt of 'change the ball to blue' and get back an edited image with a blue ball.
Example Application: An ecommerce product customizer.
A user could upload a product image like a t-shirt or phone case and enter a text prompt to customize the design, color, or look of the product. The image edit endpoint would generate an edited version of the product image with the customizations, allowing the user to preview the changes before purchasing.
Example Request:
curl https://api.openai.com/v1/images/edits \
-H 'Authorization: Bearer $OPENAI_API_KEY' \
-F image='@otter.png' \
-F mask='@mask.png' \
-F prompt='A cute baby sea otter wearing a beret' \
-F n=2 \
-F size='1024x1024'
Image Variation
Image Variation endpoint: POST https://api.openai.com/v1/images/variations
This endpoint takes an input image and generates stylistic variations of that image. For example, you could provide a landscape photo and get back multiple variations that adjust the brightness, color palette, cropping, etc. The output images are creatively adapted versions of the original photo.
Example Application: A social media content generator.
With an image variation API, you could build an app to generate curated social media content for influencers or brands. The user would provide a photo they want to post, and the image variation endpoint would return multiple variations of that photo with different stylings. The user could then select the variation they like best to auto-post to their social media profiles, saving time and ensuring high quality, unique content. This type of application could work with photos of products, lifestyle shots, portraits, food, etc. The image variation endpoint is able to creatively adapt images in many domains.
Example Request
curl https://api.openai.com/v1/images/variations \
-H 'Authorization: Bearer $OPENAI_API_KEY' \
-F image='@otter.png' \
-F n=2 \
-F size='1024x1024'
Completions
OpenAI's Completions endpoint generates natural language completions of prompts. It uses models like GPT-3 to continue and complete partial sentences or generate long-form text based on a prompt.
Some main uses of the Completions endpoint include:
Generating long-form text: You can provide a prompt like 'Here is a draft blog post: ' and GPT-3 will generate the rest of the blog post for you.
Story or creative writing: Give GPT-3 a prompt with some starter sentences or characters and it can generate full short stories, creative fiction pieces, or screenplays.
Conversation or question answering: Provide some initial messages or questions to simulate a conversation and GPT-3 will generate responses to continue the conversation.
Idea expansion: Give GPT-3 a one or two sentence prompt describing an idea and it can expand on points, examples, and details to build out the concept.
Paraphrasing or summarizing: Provide text you want to paraphrase or summarize and GPT-3 can rephrase it in different words with the same meaning or condense longer text into a shorter summary.
Create completion
POST https://api.openai.com/v1/completions
Creates a completion for a prompt and parameters.
Request body
model: ID of the model to use.
prompt: The prompt(s) to generate completions for.
max_tokens: The maximum number of tokens to generate.
user (optional): A unique identifier for the end-user.
Other parameters control the model's output.
Example Curl
curl https://api.openai.com/v1/completions \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer $OPENAI_API_KEY' \
-d '{
"model": "text-davinci-003",
"prompt": "Say this is a test",
"max_tokens": 7,
"temperature": 0
}'
Response
The response will include the completion for the prompt.
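Unlike chat completions, the generated text comes back at choices[0].text rather than under a message object. A minimal extraction sketch (the sample response is abridged and its values are illustrative):

```python
# Abridged shape of a completions response; note choices[0].text, not .message.
sample_response = {
    "id": "cmpl-123",
    "model": "text-davinci-003",
    "choices": [
        {"text": "\n\nThis is indeed a test", "index": 0, "finish_reason": "length"},
    ],
    "usage": {"prompt_tokens": 5, "completion_tokens": 7, "total_tokens": 12},
}

completion = sample_response["choices"][0]["text"].strip()
print(completion)  # This is indeed a test
```

Stripping leading whitespace is worthwhile because completion models often begin their output with newlines.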
The completions endpoint can return answers to questions and can be used in a multi-step process to provide useful results.
Example App: SEO Optimized Blog Post Generator
The user logs into the application.
The user navigates to the blog article creation page.
The user inputs the topic or initial idea for the blog article.
The application sends this initial input to Xano.
Xano sends this topic to OpenAI's Davinci Model.
OpenAI's model returns suggested keywords to Xano.
Xano returns these keywords to the blog article creation page.
The user reviews and selects the desired keywords.
The user lists the topics they'd like to cover in the blog in short sentences.
The application sends these topic sentences to Xano.
Xano sends these topics to OpenAI's Davinci Model.
OpenAI's model returns SEO-optimized suggestions to Xano.
Xano returns these suggestions to the blog article creation page.
The user reviews and incorporates these suggestions into the blog article.
The user saves the blog article, which is stored in the Xano database.
The user can then publish the blog article when ready.
This demonstrates how you can use the completions endpoint in a multi-step process to create guided user experiences.
Video Example - Create a Restaurant Review Generator:
Edits
The Edits endpoint allows you to provide an input text and instructions for how to edit that text, and it will return the edited text. For example, you could provide the input 'What day of the wek is it?' and the instruction 'Fix the spelling mistakes' and get back 'What day of the week is it?'.
This endpoint uses models like the Text Davinci Edit model which has been trained on a dataset of (input, instruction, output) examples to learn how to apply edits.
Example Application: An automated proofreading tool.
You could build an app that allows users to submit any text content like blog posts, articles, or short stories and get AI-powered edits and proofreads.
The workflow would be:
User submits their text content.
The app sends the text to the Edits endpoint with the instruction 'proofread and correct any errors'.
The Edits endpoint returns the text with all spelling, grammar, and punctuation edits.
The edited text is shown to the user. They can then choose to accept all edits, pick and choose edits, or undo any edits they don't want.
Once the user approves the final edits, the text is saved as the proofread version.
This type of automated proofreading tool could save content creators a lot of time and ensure high-quality writing. The Edits API allows you to easily build a proofreading experience with OpenAI models that have been trained on massive datasets to properly apply edits for grammar, spelling, and style.
Create edit
POST https://api.openai.com/v1/edits
Creates an edit for the provided input and instruction.
Request body
model: ID of the model to use. You can use 'text-davinci-edit-001' or 'code-davinci-edit-001'.
input: The input text to edit.
instruction: The instruction telling the model how to edit the input.
Other parameters.
Response
The response will include the edited input text.
Example Request:
curl https://api.openai.com/v1/edits \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer $OPENAI_API_KEY' \
-d '{
"model": "text-davinci-edit-001",
"input": "What day of the wek is it?",
"instruction": "Fix the spelling mistakes"
}'