Autogon Docs

Automated Data Processing (DP_ADP)

This function automatically cleans and encodes supported data.

Automated data cleaning and pre-processing streamline the preparation of data for machine learning training. These techniques involve identifying and addressing missing values, outliers, and inconsistencies in the dataset, as well as standardizing and transforming features. By automating these tasks, data scientists can save time, ensure data quality, and enhance the performance and reliability of machine learning models.

Sample Request

{
    "project_id": 1,
    "parent_id": 0,
    "block_id": 1,
    "function_code": "DP_ADP",
    "args": {
        "clean": true,
        "dataset_type": "any",
        "le_thresh": 2,
        "load_name": "generated",
        "ohe_thresh": 10,
        "save_name": "generated",
        "strategy_value": "mean",
        "test_size_value": 0.25,
        "x_slice": ":-1",
        "y_slice": "-1"
    }
}
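As a rough illustration, the sample request above could be assembled and sent from Python along the following lines. This is a sketch, not an official client: the helper names `build_adp_payload` and `send_adp_request` are hypothetical, and because this page does not describe authentication, any credential headers are left to the caller. Only the endpoint URL and the field names come from this page.

```python
import json
import urllib.request

# Endpoint documented on this page.
ENGINE_URL = "https://api.autogon.ai/api/v1/engine/start"


def build_adp_payload(project_id, block_id, parent_id, args):
    """Assemble the JSON body for a DP_ADP block request (hypothetical helper)."""
    return {
        "project_id": project_id,
        "parent_id": parent_id,
        "block_id": block_id,
        "function_code": "DP_ADP",
        "args": dict(args),
    }


def send_adp_request(payload, headers=None):
    """POST the payload and return the decoded JSON reply.

    Assumption: authentication is not covered on this page, so any
    credential headers must be supplied by the caller.
    """
    request = urllib.request.Request(
        ENGINE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", **(headers or {})},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Same values as the sample request above.
payload = build_adp_payload(
    project_id=1,
    block_id=1,
    parent_id=0,
    args={
        "clean": True,
        "dataset_type": "any",
        "le_thresh": 2,
        "load_name": "generated",
        "ohe_thresh": 10,
        "save_name": "generated",
        "strategy_value": "mean",
        "test_size_value": 0.25,
        "x_slice": ":-1",
        "y_slice": "-1",
    },
)
print(json.dumps(payload, indent=4))
```

Building the payload separately from sending it makes it possible to inspect or log the body before any network call is made.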

Automated Data Preprocessing

POST https://api.autogon.ai/api/v1/engine/start

Request Body

Fields marked * are required.

| Name | Type | Description |
| --- | --- | --- |
| x_slice* | String or array | Boundaries for the x dataset |
| y_slice* | String or array | Boundaries for the y dataset |
| strategy_value | String | Method of handling missing values. See the Missing Data (DP_2) block |
| le_thresh | int | Uniques threshold for label encoding |
| ohe_thresh | int | Uniques threshold for one hot encoding |
| project_id* | int | Current project ID |
| block_id* | int | Current block ID |
| function_code* | String | Block's function code |
| args* | object | Block arguments |
| parent_id | int | Previous block ID |
| excluded_columns* | array | Columns to ignore entirely |
| excluded_fillmissing_columns | array | Columns to ignore when filling in missing data only |
| excluded_encoding_columns | array | Columns to ignore for encoding only |
| excluded_scaling_columns | array | Columns to ignore for scaling only |
| save_name* | String | Name to save processing models with |
| load_name* | String | Name to load processing models with. Used to switch to loading mode |
| dataset_type | String | Type of dataset being processed with loaded weights. Requires load_name |
| clean | bool | Sets whether or not to drop duplicates during loading mode. Requires load_name |
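The `x_slice` and `y_slice` strings in the sample request (`":-1"` and `"-1"`) look like Python-style slice notation. Assuming that reading is correct and that the slices select columns, the sketch below shows what those two values would pick out. The `parse_slice` helper and the column names are hypothetical, and the array form of these parameters is not handled here.

```python
def parse_slice(spec):
    """Turn a string such as ":-1" or "-1" into a slice or an int index.

    Assumption: the strings follow Python slice syntax (start:stop:step).
    """
    if ":" not in spec:
        return int(spec)
    parts = [int(part) if part.strip() else None for part in spec.split(":")]
    return slice(*parts)


# Hypothetical column names, used only for illustration.
columns = ["feature_a", "feature_b", "feature_c", "target"]

x_columns = columns[parse_slice(":-1")]  # every column except the last
y_column = columns[parse_slice("-1")]    # the last column only

print(x_columns)
print(y_column)
```

Under these assumptions, the sample request treats the final column as the target and everything before it as features.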

{
    "status": "true",
    "message": {
        "id": 1,
        "project": 2,
        "block_id": 1,
        "parent_id": 0,
        "dataset_url": "",
        "x_value_url": "",
        "y_value_url": ""
    }
}
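A caller would typically check `status` and then read the output URLs from `message`. The sketch below does this for the response shown above. Note that `status` arrives as the string `"true"` rather than a JSON boolean. The `extract_outputs` helper is hypothetical, and treating any other status value as a failure is an assumption, since this page does not show an error response.

```python
def extract_outputs(response):
    """Return the output URLs from an engine response (hypothetical helper).

    Assumption: any status other than the string "true" means failure.
    """
    if str(response.get("status")).lower() != "true":
        raise RuntimeError(f"Engine request failed: {response!r}")
    message = response["message"]
    keys = ("dataset_url", "x_value_url", "y_value_url")
    return {key: message.get(key, "") for key in keys}


# The sample response shown above.
sample_response = {
    "status": "true",
    "message": {
        "id": 1,
        "project": 2,
        "block_id": 1,
        "parent_id": 0,
        "dataset_url": "",
        "x_value_url": "",
        "y_value_url": "",
    },
}

outputs = extract_outputs(sample_response)
print(outputs)
```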
// Data Input (DP_1) block call through the client
await client.data_input(1, 0, 1, {
    dburl: "https://raw.githubusercontent.com/autogonai/autogon-public-datasets/main/mobile_price_prediction.csv",
    file_type: "csv"
})

Good to know: Unlike other block requests, the Data Input block isn't permitted to have parent blocks, hence its null value.


Last updated 1 year ago
