
4. How to Import Local Text Data into Supametas.AI Platform

This article provides a detailed guide on the complete process of importing local text data into the Supametas.AI cloud service platform, including task creation, file upload, task settings, parameter retrieval, and output configuration, helping you efficiently manage text data.

Supametas · 2025-02-22

Text data is one of the most common data types in cleaning and processing workflows. The Supametas.AI platform offers a convenient local text import feature that lets you quickly upload and process local text files. This article walks you through each step of importing local text data so you can manage and utilize it efficiently.

[Screenshot: Create a new task to import from local text for the dataset]

1. Create a New Task

On the dataset detail page, select the "Local Text Import" option from the "Import Data Source" menu and click the "New Task" button. You will be asked to provide a task name (up to 20 characters); this name helps you quickly identify and manage the task in the task list.

2. Upload Local Text Files

After naming the task, proceed to the file upload stage:

  • Upload Methods:
    • You can drag and drop files into the upload area or click the upload button to select local files.
  • Supported File Formats:
    • Common text file formats such as .docx, .pdf, .txt, .md, .json, etc., are supported.
  • Upload Quantity and File Size Limits:
    • A maximum of 50 files can be uploaded per task;
    • Each file must not exceed 200MB (in some cases, the CDN restriction may be around 100MB); a quick pre-upload check against these limits is sketched after this list.
  • Helpful Tip:
    • Ensure that files uploaded within the same task have similar content, as this helps improve the accuracy of parameter retrieval and output processing.
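
As a quick sanity check before uploading, you can verify a batch of local files against the limits above. The sketch below is a minimal, standalone helper and is not part of the Supametas.AI platform; the file names at the bottom are placeholders.

```python
import os

# Limits described above for the local text import feature
SUPPORTED_EXTENSIONS = {".docx", ".pdf", ".txt", ".md", ".json"}
MAX_FILES_PER_TASK = 50
MAX_FILE_SIZE_BYTES = 200 * 1024 * 1024  # 200 MB per file (CDN may cap closer to 100 MB)

def check_files_for_upload(paths):
    """Collect any problems that would block uploading this batch of files."""
    problems = []
    if len(paths) > MAX_FILES_PER_TASK:
        problems.append(f"{len(paths)} files selected; the limit is {MAX_FILES_PER_TASK} per task")
    for path in paths:
        if not os.path.exists(path):
            problems.append(f"{path}: file not found")
            continue
        ext = os.path.splitext(path)[1].lower()
        if ext not in SUPPORTED_EXTENSIONS:
            problems.append(f"{path}: unsupported format '{ext}'")
        elif os.path.getsize(path) > MAX_FILE_SIZE_BYTES:
            problems.append(f"{path}: larger than 200 MB")
    return problems

# Placeholder file names for illustration
problems = check_files_for_upload(["notes.md", "report.pdf"])
print("\n".join(problems) if problems else "Batch looks fine to upload.")
```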

3. Task Settings

Task settings for local text import are similar to those for web import tasks, primarily ensuring that the system can correctly parse and extract data from the text files:

  • Choose the appropriate parsing method based on the file type.
  • Configure related fields to ensure the system can extract titles and main content from the files (a conceptual sketch of these settings follows below).
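
Conceptually, these two choices amount to a small configuration: a parsing method matched to the file type, plus descriptions of the fields to extract. The structure below is purely illustrative; Supametas.AI configures this through its UI, and the key names here are assumptions rather than the platform's actual schema.

```python
# Purely illustrative: the platform configures this through its UI,
# and these key names are assumptions, not Supametas.AI's actual schema.
task_settings = {
    "parsing_method": "pdf",  # chosen to match the uploaded file type
    "fields": {
        "title": "the document's heading or first-level title",
        "content": "the main body text of the document",
    },
}
```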

4. Retrieve Parameters

The parameter retrieval step is crucial, as it tells the system which data to identify and capture from the text files:

  • Default Fields:
    • Title: The system will attempt to automatically extract the title from the file;
    • Content Details: The system will capture and save the main text content from the file.
  • Custom Fields:
    • If you need to categorize specific data within the text, you can enable the custom field feature.
    • For example, if you need to capture nickname information, you can add a custom field (field names should be in English, and a detailed description is recommended to improve accuracy); a hypothetical example of the resulting record is shown after this list.
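
To make the default and custom fields concrete, the sketch below shows a hypothetical custom-field definition and what an extracted record might conceptually look like. The key names and values are illustrative assumptions, not the platform's exact output schema.

```python
# Hypothetical custom field definition: "nickname" from the example above.
# A detailed English description helps the system capture the value accurately.
custom_fields = [
    {
        "name": "nickname",
        "description": "The nickname of the person mentioned in the text",
    }
]

# One extracted record might conceptually look like this
# (key names are illustrative, not the platform's exact schema):
extracted_record = {
    "title": "Meeting notes, 2025-02-20",                    # default field: Title
    "content": "Full body text captured from the file ...",  # default field: Content Details
    "nickname": "Sam",                                        # value captured for the custom field
}
```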

5. Output Settings

After retrieving parameters, you will need to configure the output settings to determine the format in which the captured data will be saved:

  • Output Format Selection:
    • You can choose to save the data in JSON format, which is convenient for subsequent API calls and processing;
    • Or choose Markdown format, which is useful for building knowledge bases and displaying documents (a sketch of consuming either format follows this list).
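
The sketch below shows how either output might be used downstream. It assumes the task's JSON output has been downloaded to a local file named export.json and that each record carries title and content keys, as in the earlier hypothetical record; both the file name and the schema are illustrative assumptions.

```python
import json

# Assumption: the JSON output was downloaded locally as "export.json",
# and each record has "title" and "content" keys (illustrative schema).
with open("export.json", encoding="utf-8") as f:
    records = json.load(f)

# JSON output: easy to feed into downstream code or API calls.
for record in records:
    print(record["title"], "-", len(record["content"]), "characters")

# Markdown output: one document per record suits a knowledge base.
def record_to_markdown(record):
    return f"# {record['title']}\n\n{record['content']}\n"

with open("first_entry.md", "w", encoding="utf-8") as f:
    f.write(record_to_markdown(records[0]))
```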

6. Save or Execute the Task Immediately

Finally, you have two options:

  • Save and Execute Later:
    • Save the task configuration to the task list for future manual execution.
  • Execute Task Immediately:
    • If everything is configured correctly and you are ready, click the "Execute Task Now" button, and the system will start processing the uploaded text files and import the extracted data into the specified dataset.

With intuitive task creation, file upload, parameter retrieval, and output configuration processes, handling text data becomes simple and efficient. Whether managing batch documents or processing individual text files, this feature allows you to quickly build and optimize datasets, laying a solid foundation for subsequent data cleaning and multimodal model processing.
