Data Encoding (DP_3)

This functionality converts data to a machine-readable format through encoding. Supported techniques include, but are not limited to, one-hot, label, and categorical encoding.

This process involves converting data into a format that can be understood by a computer. This can include converting text into numerical values, or categorizing data into discrete groups. The goal of encoding is to make it possible for a machine learning algorithm to interpret and learn from the data.

There are many different types of encoding techniques, such as one-hot encoding, which converts categorical data into a binary format, and label encoding, which assigns a unique numerical value to each category in a categorical variable. The appropriate encoding technique depends on the type of data and the machine learning algorithm being used.
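The difference between the two techniques named above can be sketched in a few lines of plain Python. This is only an illustration of the general idea, not Autogon's implementation; the function names and the sorted category order are assumptions made for the example.

```python
def label_encode(values):
    """Map each distinct category to an integer (categories sorted alphabetically)."""
    categories = sorted(set(values))
    mapping = {cat: i for i, cat in enumerate(categories)}
    return [mapping[v] for v in values], mapping


def one_hot_encode(values):
    """Turn each value into a 0/1 row with one column per category."""
    categories = sorted(set(values))
    rows = [[1 if v == cat else 0 for cat in categories] for v in values]
    return rows, categories


colors = ["red", "green", "blue", "green"]
labels, mapping = label_encode(colors)
onehot, columns = one_hot_encode(colors)

print(labels)   # [2, 1, 0, 1]
print(columns)  # ['blue', 'green', 'red']
print(onehot)   # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```

Label encoding keeps a single column but implies an order between categories, which some algorithms may misread; one-hot encoding avoids that at the cost of one extra column per category.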

Sample Request

This request one-hot encodes the categorical values in the X variable at column index 0 and leaves the Y variable unencoded.

{
    "project_id": 1,
    "parent_id": 2,
    "block_id": 3,
    "function_code": "DP_3",
    "args": {
        "xvalue": {
            "encode": true,
            "encoding_type": "onehot",
            "remainder": "passthrough",
            "index": 0
        },
        "yvalue": {
            "encode": false,
            "encoding_type": "categorical",
            "remainder": "drop",
            "index": 2
        },
        "save_name": "testweights",
        "load_name": "testweights"
    }
}
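A request body like the one above could be assembled programmatically. The following is a minimal sketch: the helper name `build_encoding_request` and its parameters are hypothetical, and only the field names and values are taken from the sample request. Authentication and the actual HTTP call are left out because this page does not describe them.

```python
import json


def build_encoding_request(project_id, parent_id, block_id, x_index, y_index,
                           encoding_type="onehot", remainder="passthrough",
                           save_name=None, load_name=None):
    """Assemble a DP_3 request body that encodes one X column and leaves Y as-is."""
    args = {
        "xvalue": {
            "encode": True,
            "encoding_type": encoding_type,
            "remainder": remainder,
            "index": x_index,
        },
        # Y is not encoded, but the remaining fields are still filled in,
        # mirroring the sample request.
        "yvalue": {
            "encode": False,
            "encoding_type": "categorical",
            "remainder": "drop",
            "index": y_index,
        },
    }
    # save_name and load_name are optional, so add them only when given.
    if save_name is not None:
        args["save_name"] = save_name
    if load_name is not None:
        args["load_name"] = load_name
    return {
        "project_id": project_id,
        "parent_id": parent_id,
        "block_id": block_id,
        "function_code": "DP_3",
        "args": args,
    }


payload = build_encoding_request(project_id=1, parent_id=2, block_id=3,
                                 x_index=0, y_index=2, save_name="testweights")
body = json.dumps(payload, indent=4)
print(body)
```

Serializing with `json.dumps` guarantees valid JSON, which avoids hand-editing mistakes such as a missing comma between fields.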

Encoding Data

Encode categorical values

POST https://autogon.ai/api/v1/engine/start

Encodes categorical data on specific columns with specified boundaries

Request Body

Name | Type | Description
project_id* | int | Current project ID.
parent_id* | int | Parent block ID.
block_id* | int | Current block ID.
function_code* | String | The block's function code (DP_3 for this block).
args | object | Block arguments. As in the sample request, the fields below are nested inside it.
xvalue/yvalue* | object | Arguments for the X or Y variable.
encode* | bool | Whether to encode this variable.
encoding_type* | String | Encoding technique to apply. One-Hot Encoding: converts categories into binary columns. Label Encoding: assigns numbers to categories. Binary Encoding: represents categories as binary codes. Target Encoding: replaces categories with target stats. String to Hash Encoding: hashes strings to numbers. Extract Numbers Encoding: converts text numbers to digits.
remainder* | String | How to handle columns not selected for encoding: drop removes them; passthrough leaves them unchanged.
index* | int | Index of the column to encode.
save_name | String | Name under which to save the processing models.
load_name | String | Name of the saved processing models to load. Setting it switches the block to loading mode.
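The effect of the remainder setting can be illustrated with a small pure-Python sketch. It is not Autogon's implementation: the function name, the made-up sample rows, and the choice to place the encoded columns before the untouched ones are all assumptions for the example.

```python
def one_hot_column(rows, index, remainder="passthrough"):
    """One-hot encode the column at `index`.

    remainder="passthrough" keeps the other columns unchanged;
    remainder="drop" discards them. Putting the encoded columns
    first is an assumption made for this sketch.
    """
    if remainder not in ("passthrough", "drop"):
        raise ValueError(f"unsupported remainder: {remainder!r}")
    categories = sorted({row[index] for row in rows})
    encoded = []
    for row in rows:
        onehot = [1 if row[index] == cat else 0 for cat in categories]
        rest = [v for i, v in enumerate(row) if i != index]
        encoded.append(onehot + rest if remainder == "passthrough" else onehot)
    return encoded


# Made-up sample data: country, age, salary.
rows = [["France", 44, 72000], ["Spain", 27, 48000], ["France", 35, 58000]]

kept = one_hot_column(rows, index=0, remainder="passthrough")
dropped = one_hot_column(rows, index=0, remainder="drop")

print(kept)     # [[1, 0, 44, 72000], [0, 1, 27, 48000], [1, 0, 35, 58000]]
print(dropped)  # [[1, 0], [0, 1], [1, 0]]
```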

{
    "status": "true",
    "message": {
        "id": 3,
        "project": 1,
        "block_id": 3,
        "parent_id": 2,
        "dataset_url": "",
        "x_value_url": "",
        "y_value_url": ""
    }
}
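A client might read such a response as follows. This is a sketch under assumptions: the helper name `parse_engine_response` is hypothetical, and the shape of a failed response is not documented on this page, so the error branch is a guess. Note that the sample carries status as the string "true" rather than a JSON boolean.

```python
import json


def parse_engine_response(raw):
    """Return the `message` object of a successful response, else raise."""
    data = json.loads(raw)
    # The sample response uses the string "true"; a real boolean is also
    # accepted here as a defensive measure.
    if data.get("status") not in (True, "true"):
        raise RuntimeError(f"engine request failed: {data.get('message')}")
    return data["message"]


sample = """
{
    "status": "true",
    "message": {
        "id": 3,
        "project": 1,
        "block_id": 3,
        "parent_id": 2,
        "dataset_url": "",
        "x_value_url": "",
        "y_value_url": ""
    }
}
"""

message = parse_engine_response(sample)
print(message["block_id"])   # 3
print(message["parent_id"])  # 2
```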

Last updated 1 year ago
