Text Vectorizer (DP_VEC)

Transform textual data into numerical representations that are compatible with machine learning models, enabling efficient processing of text-based tasks.

Text Vectorizers are tools used to convert textual data into numerical representations suitable for machine learning models. They process text inputs and transform them into feature vectors, enabling the API to perform natural language processing and text-based tasks efficiently.

Supported Vectorizers:

  1. TF-IDF Vectorizer: Weights each word by how often it appears in a document (term frequency) and how rare it is across the dataset (inverse document frequency), so words that occur in nearly every document receive low weights.

  2. Count Vectorizer: Counts the occurrences of each word in each document and represents the corpus as a sparse matrix of raw word counts.

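  3. Hashing Vectorizer: Maps words to feature indices with a hash function (the hashing trick), so no vocabulary has to be stored. This keeps memory use low, at the cost of occasional collisions between words that hash to the same index.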
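The three techniques listed above can be illustrated in plain Python. This is a minimal sketch for intuition only and is not how DP_VEC is implemented (its internals are not documented here). Assumptions: a naive whitespace tokenizer, the unsmoothed textbook formula idf = log(N / df) (libraries such as scikit-learn use smoothed variants, so exact values differ), and zlib.crc32 as a stable hash with an arbitrary 8 output columns for the hashing sketch.

```python
import math
import zlib
from collections import Counter


def tokenize(text: str):
    # Naive tokenizer (assumption): lowercase and split on whitespace.
    return text.lower().split()


def count_vectorize(docs):
    # Build a sorted vocabulary, then count each vocabulary term per document.
    vocab = sorted({tok for doc in docs for tok in tokenize(doc)})
    rows = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        rows.append([counts[term] for term in vocab])
    return vocab, rows


def tfidf_vectorize(docs):
    # Start from raw counts (term frequency per document).
    vocab, counts = count_vectorize(docs)
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    # Unsmoothed inverse document frequency: log(N / df).
    idf = [math.log(n_docs / df_j) for df_j in df]
    rows = [[tf * idf[j] for j, tf in enumerate(row)] for row in counts]
    return vocab, rows


def hashing_vectorize(docs, n_features=8):
    # Hashing trick: each token goes to column hash(token) % n_features,
    # so no vocabulary is stored; different tokens may collide.
    rows = []
    for doc in docs:
        row = [0] * n_features
        for tok in tokenize(doc):
            row[zlib.crc32(tok.encode("utf-8")) % n_features] += 1
        rows.append(row)
    return rows


docs = ["the cat sat", "the dog sat", "the dog barked"]
vocab, counts = count_vectorize(docs)
_, weights = tfidf_vectorize(docs)
hashed = hashing_vectorize(docs)
```

With this corpus, "the" occurs in every document, so its raw count in the first document is 1 while its TF-IDF weight is 0; "cat" occurs in only one document and gets the largest weight, log(3).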

These vectorizers are essential for handling text data in the API, supporting text classification, sentiment analysis, and other natural language processing tasks.

Sample Request

The request below applies the TF-IDF vectorizer to the xtrain, xtest, and x variables, using the boundaries ":, :" to select the data to vectorize.

{
    "project_id": 1,
    "parent_id": 5,
    "block_id": 6,
    "function_code": "DP_VEC",
    "args": {
        "vectorizer": "tfidf",
        "boundariestoscale": ":, :",
        "dataset": false,
        "xtrain": true,
        "xtest": true,
        "x": true,
        "ytrain": false,
        "ytest": false,
        "y": false
    }
}

Parameter Details

Text Vectorizer

POST https://autogon.ai/api/v1/engine/start
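A minimal sketch of preparing the sample request with Python's standard library. The helper name build_vectorizer_request and its defaults are hypothetical and simply mirror the sample request above. Authentication is not described in this section, so any headers or tokens the API requires would have to be added separately.

```python
import json
import urllib.request

ENGINE_URL = "https://autogon.ai/api/v1/engine/start"
VARIABLE_FLAGS = ("dataset", "x", "y", "xtrain", "ytrain", "xtest", "ytest")


def build_vectorizer_request(project_id, parent_id, block_id,
                             vectorizer="tfidf", boundaries=":, :",
                             variables=("xtrain", "xtest", "x")):
    # Hypothetical helper: assembles a DP_VEC body like the sample request.
    args = {"vectorizer": vectorizer, "boundariestoscale": boundaries}
    # Send every variable flag explicitly; only the named variables are true.
    for flag in VARIABLE_FLAGS:
        args[flag] = flag in variables
    payload = {
        "project_id": project_id,
        "parent_id": parent_id,
        "block_id": block_id,
        "function_code": "DP_VEC",
        "args": args,
    }
    # Build (but do not send) a JSON POST request to the engine endpoint.
    return urllib.request.Request(
        ENGINE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_vectorizer_request(1, 5, 6)
```

Calling urllib.request.urlopen(req) would perform the actual network call; building the request separately keeps the payload easy to inspect before sending.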

Request Body

Name | Type | Description
project_id* | int | Current project ID
parent_id* | int | Parent block ID
block_id* | int | Current block ID
function_code* | String | Block's function code ("DP_VEC" for this block)
args* | object | Block arguments
boundariestoscale | String | Row and column boundaries to vectorize, in slice notation (the sample request uses ":, :", i.e. all rows and all columns)
dataset/x/y/xtrain/ytrain/xtest/ytest | bool | Variables to which the vectorizer is applied
vectorizer | String | Type of vectorizer to apply (values below)

Values for vectorizer:

tfidf: Converts text data into numerical features based on term frequency-inverse document frequency, capturing each word's importance within a document and across the corpus.

count: Transforms text data into numerical features by counting the occurrences of each word.

hashing: Uses the hashing trick to map words into fixed-size feature vectors.
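The parameters in the table above can be checked on the client before a request is sent. This is a minimal sketch; the function name validate_args is hypothetical, and it only checks what the table states: the three documented vectorizer values, a string for boundariestoscale, and booleans for the variable flags. The server may enforce additional rules that are not documented here.

```python
VECTORIZERS = {"tfidf", "count", "hashing"}
VARIABLE_FLAGS = ("dataset", "x", "y", "xtrain", "ytrain", "xtest", "ytest")


def validate_args(args: dict) -> list:
    """Return a list of problems found in a DP_VEC args object (empty if none)."""
    problems = []
    # Only the three documented vectorizer values are accepted.
    if "vectorizer" in args and args["vectorizer"] not in VECTORIZERS:
        problems.append(f"vectorizer must be one of {sorted(VECTORIZERS)}")
    # boundariestoscale is documented as a String.
    if "boundariestoscale" in args and not isinstance(args["boundariestoscale"], str):
        problems.append("boundariestoscale must be a string")
    # Each variable flag is documented as a bool.
    for flag in VARIABLE_FLAGS:
        if flag in args and not isinstance(args[flag], bool):
            problems.append(f"{flag} must be a boolean")
    return problems


sample_args = {
    "vectorizer": "tfidf", "boundariestoscale": ":, :",
    "dataset": False, "xtrain": True, "xtest": True, "x": True,
    "ytrain": False, "ytest": False, "y": False,
}
```

The args object from the sample request passes these checks unchanged.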

Sample Response

{
    "status": "true",
    "message": {
        "id": 3,
        "project": 1,
        "block_id": 7,
        "parent_id": 6,
        "dataset_url": "",
        "x_value_url": "",
        "y_value_url": ""
    }
}
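A minimal sketch of reading the engine response in Python. The function name parse_engine_response is hypothetical. The sample response encodes status as the string "true" rather than a JSON boolean, so the sketch accepts both; the shape of a failure response is not documented here, so treating any other status as an error is an assumption.

```python
import json


def parse_engine_response(raw: str) -> dict:
    body = json.loads(raw)
    # status is the string "true" in the sample; also accept a JSON boolean.
    if str(body.get("status")).lower() != "true":
        # Assumption: any other status means the block did not start.
        raise RuntimeError(f"engine call failed: {body.get('message')!r}")
    # The message object carries the created block's metadata.
    return body["message"]


sample_raw = (
    '{"status": "true", "message": {"id": 3, "project": 1, "block_id": 7,'
    ' "parent_id": 6, "dataset_url": "", "x_value_url": "", "y_value_url": ""}}'
)
message = parse_engine_response(sample_raw)
```

For the sample response above, message["block_id"] is 7 and message["parent_id"] is 6.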