Autogon Docs
Text Vectorizer (DP_VEC)

Transform textual data into numerical representations that are compatible with machine learning models, enabling efficient processing of text-based tasks.

A text vectorizer takes text inputs and turns each one into a feature vector. With text in numerical form, the API can run natural language processing and other text-based tasks on it.

Supported Vectorizers:

  1. TF-IDF Vectorizer: Assigns weights to words based on their importance in a document and rarity across the dataset, capturing their significance for modeling.

  2. Count Vectorizer: Counts the occurrences of each word in each document, representing the dataset as a sparse matrix of word frequencies.

  3. Hashing Vectorizer: Maps each word to a numerical index with a hash function (the hashing trick). Because no vocabulary has to be stored, the representation is memory-efficient.

These vectorizers are how the API handles text data, supporting text classification, sentiment analysis, and other natural language processing tasks.
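To make the three approaches concrete, here is a minimal pure-Python sketch of each. It is an illustration only and not the API's implementation: the function names, the whitespace tokenizer, the textbook weighting idf = ln(N / df), and the use of CRC32 as the hash are all assumptions chosen for brevity, so the numbers the API produces may differ.

```python
import math
import zlib
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def count_vectorize(docs):
    """Return (vocabulary, rows); each row holds one document's word counts."""
    vocab = sorted({word for doc in docs for word in tokenize(doc)})
    rows = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        rows.append([counts[word] for word in vocab])
    return vocab, rows


def tfidf_vectorize(docs):
    """Weight counts by idf = ln(N / df); a word found in every document scores 0."""
    vocab, counts = count_vectorize(docs)
    n_docs = len(docs)
    doc_freq = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / df) for df in doc_freq]
    rows = [[tf * weight for tf, weight in zip(row, idf)] for row in counts]
    return vocab, rows


def hashing_vectorize(docs, n_features=8):
    """Map each word to a column through a hash, so no vocabulary is stored."""
    rows = []
    for doc in docs:
        row = [0] * n_features
        for word in tokenize(doc):
            # Different words can collide in the same column; that is the trade-off.
            row[zlib.crc32(word.encode("utf-8")) % n_features] += 1
        rows.append(row)
    return rows


docs = ["the cat sat", "the dog sat", "the cat ran"]
vocab, counts = count_vectorize(docs)
_, weights = tfidf_vectorize(docs)
hashed = hashing_vectorize(docs)

print(vocab)    # sorted vocabulary shared by the count and TF-IDF rows
print(counts)   # raw word counts per document
print(weights)  # counts scaled by rarity across the three documents
print(hashed)   # fixed-width rows, independent of vocabulary size
```

In this toy corpus "the" appears in every document, so its TF-IDF weight is zero, while "dog" appears in one document only and receives the largest weight in that row. The hashed rows always have `n_features` columns, whatever the vocabulary size.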

Sample Request

The request below applies the TF-IDF vectorizer to the x, xtrain, and xtest variables. The boundariestoscale value ":, :" sets the boundaries to vectorize; read as slice notation, it covers every row and every column.

{
    "project_id": 1,
    "parent_id": 5,
    "block_id": 6,
    "function_code": "DP_VEC",
    "args": {
        "vectorizer": "tfidf",
        "boundariestoscale": ":, :",
        "dataset": false,
        "xtrain": true,
        "xtest": true,
        "x": true,
        "ytrain": false,
        "ytest": false,
        "y": false
    }
}
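The same request body can be assembled in code. The sketch below uses only the Python standard library and targets the endpoint listed under Parameter Details. The helper names `build_vectorizer_payload` and `make_request` are hypothetical, and authentication is not covered on this page, so any required headers are left to the caller as an assumption.

```python
import json
import urllib.request

ENGINE_URL = "https://autogon.ai/api/v1/engine/start"

# Every variable flag that the args object can carry.
VARIABLES = ("dataset", "xtrain", "xtest", "x", "ytrain", "ytest", "y")


def build_vectorizer_payload(project_id, parent_id, block_id,
                             vectorizer="tfidf", boundaries=":, :",
                             targets=("x", "xtrain", "xtest")):
    """Assemble a DP_VEC request body with the chosen variables switched on."""
    args = {"vectorizer": vectorizer, "boundariestoscale": boundaries}
    for name in VARIABLES:
        args[name] = name in targets
    return {
        "project_id": project_id,
        "parent_id": parent_id,
        "block_id": block_id,
        "function_code": "DP_VEC",
        "args": args,
    }


def make_request(payload, extra_headers=None):
    """Build (but do not send) the POST request.

    Authentication headers are an assumption left to the caller,
    since this page does not describe them.
    """
    headers = {"Content-Type": "application/json"}
    headers.update(extra_headers or {})
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(ENGINE_URL, data=data, headers=headers, method="POST")


payload = build_vectorizer_payload(1, 5, 6)
request = make_request(payload)
print(json.dumps(payload, indent=4))
# To send it: urllib.request.urlopen(request), once the required headers are set.
```

Passing a different `targets` tuple, for example `("dataset",)`, switches the vectorizer onto another variable without touching the rest of the body.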

Parameter Details

Text Vectorizer

POST https://autogon.ai/api/v1/engine/start

Request Body

| Name | Type | Description |
| --- | --- | --- |
| project_id* | int | Current project ID |
| parent_id* | int | Parent block ID |
| block_id* | int | Current block ID |
| function_code* | String | Block's function code |
| args* | object | Block arguments |
| boundariestoscale | String | Boundaries to vectorize |
| dataset/x/y/xtrain/ytrain/xtest/ytest | bool | Variables to apply the vectorizer to |
| vectorizer | String | Type of vectorizer to apply: tfidf, count, or hashing |

Vectorizer types:

tfidf: Converts text data into numerical features based on term frequency-inverse document frequency, capturing word importance in documents and across the corpus.

count: Transforms text data into numerical features by counting the occurrences of words.

hashing: Uses a hashing trick to map words into fixed-size feature vectors.
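The parameter descriptions above can be turned into a client-side check that runs before a request is sent. The sketch below is not part of the API. `validate_vectorizer_args` is a hypothetical helper, and two of its rules are assumptions not stated on this page: that `vectorizer` must be present, and that at least one variable flag should be true.

```python
ALLOWED_VECTORIZERS = {"tfidf", "count", "hashing"}
VARIABLE_FLAGS = ("dataset", "x", "y", "xtrain", "ytrain", "xtest", "ytest")


def validate_vectorizer_args(args):
    """Return a list of problems found in a DP_VEC args object (empty if none)."""
    problems = []

    # Assumption: a missing vectorizer is treated as an error.
    if args.get("vectorizer") not in ALLOWED_VECTORIZERS:
        problems.append(
            "vectorizer must be one of: " + ", ".join(sorted(ALLOWED_VECTORIZERS))
        )

    if "boundariestoscale" in args and not isinstance(args["boundariestoscale"], str):
        problems.append("boundariestoscale must be a string")

    for flag in VARIABLE_FLAGS:
        if flag in args and not isinstance(args[flag], bool):
            problems.append(flag + " must be a bool")

    # Assumption: a request that selects no variable is probably a mistake.
    if not any(args.get(flag) is True for flag in VARIABLE_FLAGS):
        problems.append("no variable is selected for vectorization")

    return problems


good_args = {
    "vectorizer": "tfidf",
    "boundariestoscale": ":, :",
    "dataset": False,
    "x": True,
    "xtrain": True,
    "xtest": True,
}
bad_args = {"vectorizer": "word2vec", "x": "yes"}

print(validate_vectorizer_args(good_args))
print(validate_vectorizer_args(bad_args))
```

Returning a list of messages instead of raising on the first problem lets a caller report everything that is wrong with a request in one pass.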

Example Response

{
    "status": "true",
    "message": {
        "id": 3,
        "project": 1,
        "block_id": 7,
        "parent_id": 6,
        "dataset_url": "",
        "x_value_url": "",
        "y_value_url": ""
    }
}
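A response of this shape can be unpacked with a few lines of Python. The sketch below is an illustration under stated assumptions: `parse_engine_response` and `produced_urls` are hypothetical helpers, and since the failure format is not documented here, any status other than "true" is treated as a failure.

```python
import json


def parse_engine_response(body):
    """Return the message object on success; raise ValueError otherwise."""
    data = json.loads(body)
    # The sample shows status as the string "true"; a real boolean is accepted too.
    if data.get("status") not in ("true", True):
        raise ValueError("engine reported failure: " + repr(data.get("message")))
    return data["message"]


def produced_urls(message):
    """Map output names to their URLs, skipping outputs left empty."""
    keys = ("dataset_url", "x_value_url", "y_value_url")
    return {key: message[key] for key in keys if message.get(key)}


sample_body = """
{
    "status": "true",
    "message": {
        "id": 3,
        "project": 1,
        "block_id": 7,
        "parent_id": 6,
        "dataset_url": "",
        "x_value_url": "",
        "y_value_url": ""
    }
}
"""

message = parse_engine_response(sample_body)
print(message["block_id"], message["parent_id"])
print(produced_urls(message))
```

All three URL fields are empty strings in the sample, so `produced_urls` returns an empty dictionary for it.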
// Client library call for this block; the IDs match the response above.
projectId = 1
parentId = 6
blockId = 7

client.array_reshaping(projectId, parentId, blockId, {