Datasets play a central role in data cleaning and processing. For users of Supametas.AI Cloud Service, a dataset is both the space where data is stored and the foundation for managing and calling multimodal large model APIs.
1. Dataset Overview
On the Supametas.AI platform, each dataset is an independent storage space where users can store cleaned data. Creating a dataset is the first step to using the cloud service. You can view the datasets you have created on the platform’s main page and click the “+ Click to create a dataset” button to enter the dataset creation wizard.
2. Preparations Before Creating a Dataset
Before you create a dataset, make sure you have access to a multimodal large model API. The platform currently supports OpenAI GPT-4 multimodal models. You can get the required API key from the OpenAI API Keys page, or you can use a third-party OpenAI service provider.
3. Detailed Steps in the Dataset Creation Wizard
During the dataset creation process, you will go through the following configuration steps:
3.1 Enter Dataset Name
- Requirement: Set a name for the new dataset, with a maximum length of 20 characters.
- Tip: The name should be concise and clear for easier management and identification.
3.2 Add Dataset Description (Optional)
- Requirement: You can add a brief description of the dataset, with a maximum of 50 characters.
- Suggestion: The description can include the dataset’s purpose, data source, or other relevant information.
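The two length limits above can be checked before you open the wizard. The following is a minimal sketch, not part of the platform: the function name `validate_dataset_fields` is hypothetical, and it assumes that the limits count Unicode characters (as Python's `len` does) and that a blank name is rejected. The platform may count or validate differently.

```python
MAX_NAME_LENGTH = 20         # limit stated in step 3.1
MAX_DESCRIPTION_LENGTH = 50  # limit stated in step 3.2


def validate_dataset_fields(name: str, description: str = "") -> list[str]:
    """Return a list of problems; an empty list means both fields fit the limits."""
    problems = []
    if not name.strip():
        # Assumption: the wizard requires a non-blank name.
        problems.append("name is required")
    elif len(name) > MAX_NAME_LENGTH:
        problems.append(
            f"name has {len(name)} characters; the limit is {MAX_NAME_LENGTH}"
        )
    # The description is optional, so an empty string is always accepted.
    if len(description) > MAX_DESCRIPTION_LENGTH:
        problems.append(
            f"description has {len(description)} characters; "
            f"the limit is {MAX_DESCRIPTION_LENGTH}"
        )
    return problems


if __name__ == "__main__":
    print(validate_dataset_fields("Product manuals", "Cleaned PDF manuals, 2024"))
    print(validate_dataset_fields("A name that is clearly too long"))
```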
3.3 Choose Model Type
In the “Model Settings” area, you need to decide which type of model to use:
- Built-in System Model: Use the default model configuration provided by the platform. Its quota is limited, and the model stops working once the quota is used up.
- Configure External Model: Recommended. Configuring an external model (e.g., OpenAI/OneAPI) gives you more flexibility in controlling usage and costs.
3.4 API Configuration (For External Models)
If you choose to use an external model, you need to configure the following information:
- API Key: Enter the API key you obtained.
- BaseUrl: Fill in the base URL of the API.
- Channel Selection: Choose the corresponding API channel from the dropdown list, such as OpenAI.
- Model Selection: Choose the specific model version you need, such as gpt-4-turbo.
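It can help to collect the four values in one place before typing them into the wizard. The sketch below is only an illustration: the class `ExternalModelConfig` and its field names are hypothetical and do not come from the platform, the key shown is a placeholder, and whether the BaseUrl should end in `/v1` is an assumption you should confirm with your provider.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class ExternalModelConfig:
    """Hypothetical container for the four fields in step 3.4."""

    api_key: str
    base_url: str
    channel: str
    model: str

    def problems(self) -> list[str]:
        """Return obvious local mistakes; an empty list means none were found."""
        issues = []
        if not self.api_key.strip():
            issues.append("API Key is empty")
        parsed = urlparse(self.base_url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            issues.append("BaseUrl must be an absolute http(s) URL")
        if not self.channel.strip():
            issues.append("Channel is empty")
        if not self.model.strip():
            issues.append("Model is empty")
        return issues


if __name__ == "__main__":
    config = ExternalModelConfig(
        api_key="sk-your-key-here",            # placeholder, not a real key
        base_url="https://api.openai.com/v1",  # assumption: includes /v1
        channel="OpenAI",
        model="gpt-4-turbo",
    )
    print(config.problems())
```

These checks only catch local mistakes such as a blank field or a URL without a scheme. They cannot tell whether the key is valid or whether the model is available to your account.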
3.5 Save Configuration and Create Dataset
- Action: After completing the above settings, click the “Next” button at the bottom of the page to save.
- Verification: The system will automatically check if the API configuration you entered is correct. If there are errors, it will prompt you to re-enter; once correct, the dataset will be created automatically.
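This article does not describe how the platform performs its check. If you want to test your key and model name yourself before clicking "Next", one option is to call the model-listing endpoint that OpenAI and many OpenAI-compatible providers expose. The sketch below rests on assumptions: the helper names are hypothetical, `base_url` is assumed to already include the version prefix (for example `https://api.openai.com/v1`), and your provider is assumed to return the usual `{"data": [{"id": ...}]}` response shape.

```python
import json
import urllib.error
import urllib.request


def build_models_request(base_url: str, api_key: str) -> urllib.request.Request:
    """Build a GET request for the provider's model-listing endpoint."""
    url = base_url.rstrip("/") + "/models"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def model_is_available(
    base_url: str, api_key: str, model: str, timeout: float = 10.0
) -> bool:
    """Return True if the key is accepted and the model appears in the list.

    HTTP error responses (for example 401 for a rejected key) return False.
    Network failures such as DNS errors are not caught and will raise.
    """
    request = build_models_request(base_url, api_key)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            payload = json.load(response)
    except urllib.error.HTTPError:
        return False
    # Assumption: the response follows the OpenAI list format.
    return any(item.get("id") == model for item in payload.get("data", []))
```

A call such as `model_is_available("https://api.openai.com/v1", your_key, "gpt-4-turbo")` sends one real request to the provider, so run it only with a key you are prepared to use.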
Tip: The dataset will use the API model you configured. Each time you create or modify a dataset, you can specify a different API model to better control usage and costs. For the built-in system models, since the quota is limited, it is recommended to use your own API whenever possible. If you have special requirements, you can also contact Supametas.AI by email to customize your quota (Email: [email protected]).
Creating a dataset is the first step in using Supametas.AI Cloud Service and the foundation for subsequent operations such as data cleaning and metadata importing. This article has walked through the creation wizard and key steps such as API configuration so that you can create a dataset successfully. Whether you are a beginner or an experienced user, you can get started quickly and make full use of the multimodal large model services provided by Supametas.AI.