Automated Data Processing (DP_ADP)

This function automatically cleans and encodes supported data.

Automated data cleaning and pre-processing streamline the preparation of data for machine learning training. These techniques involve identifying and addressing missing values, outliers, and inconsistencies in the dataset, as well as standardizing and transforming features. By automating these tasks, data scientists can save time, ensure data quality, and enhance the performance and reliability of machine learning models.
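As a rough illustration of two of these steps, the sketch below fills missing numeric values with the column mean (the same idea as `strategy_value: "mean"` in the sample request below) and then standardizes the column to zero mean and unit variance. This is a generic, hypothetical example. It is not DP_ADP's internal implementation, and `imputeMean` and `standardize` are made-up names.

```javascript
// Hypothetical helpers illustrating mean imputation and z-score scaling.
// They are not part of DP_ADP; missing values are null or undefined.

function imputeMean(values) {
  // Collect the values that are present.
  const present = values.filter((v) => v !== null && v !== undefined);
  if (present.length === 0) {
    throw new Error("cannot impute a column with no values");
  }
  const mean = present.reduce((sum, v) => sum + v, 0) / present.length;
  // Replace each missing entry with the column mean.
  return values.map((v) => (v === null || v === undefined ? mean : v));
}

function standardize(values) {
  const n = values.length;
  const mean = values.reduce((sum, v) => sum + v, 0) / n;
  // Population variance and standard deviation.
  const variance = values.reduce((sum, v) => sum + (v - mean) ** 2, 0) / n;
  const std = Math.sqrt(variance);
  // A constant column has zero spread; map it to all zeros.
  if (std === 0) return values.map(() => 0);
  return values.map((v) => (v - mean) / std);
}

const column = [1, null, 3, 5];
const filled = imputeMean(column); // [1, 3, 3, 5]
const scaled = standardize(filled); // roughly [-1.414, 0, 0, 1.414]
console.log(filled, scaled);
```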

Sample Request

{
    "project_id": 1,
    "parent_id": 0,
    "block_id": 1,
    "function_code": "DP_ADP",
    "args": {
        "clean": true,
        "dataset_type": "any",
        "le_thresh": 2,
        "load_name": "generated",
        "ohe_thresh": 10,
        "save_name": "generated",
        "strategy_value": "mean",
        "test_size_value": 0.25,
        "x_slice": ":-1",
        "y_slice": "-1"
    }
}
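If you assemble this payload programmatically, a small builder keeps the top-level block fields separate from the block-specific `args`. The sketch below is hypothetical: `buildAdpRequest` is not part of any Autogon SDK, and its defaults simply copy the values in the sample request above.

```javascript
// Hypothetical builder for a DP_ADP request body (not an Autogon SDK function).
// Defaults copy the sample request; caller-supplied args override them.
function buildAdpRequest({ projectId, parentId = 0, blockId, args = {} }) {
  if (!Number.isInteger(projectId) || !Number.isInteger(blockId)) {
    throw new Error("projectId and blockId must be integers");
  }
  const defaults = {
    clean: true,
    dataset_type: "any",
    le_thresh: 2,
    load_name: "generated",
    ohe_thresh: 10,
    save_name: "generated",
    strategy_value: "mean",
    test_size_value: 0.25,
    x_slice: ":-1",
    y_slice: "-1",
  };
  return {
    project_id: projectId,
    parent_id: parentId,
    block_id: blockId,
    function_code: "DP_ADP",
    // Later spread wins, so explicit args take precedence over defaults.
    args: { ...defaults, ...args },
  };
}

const body = buildAdpRequest({ projectId: 1, blockId: 1, args: { ohe_thresh: 5 } });
console.log(JSON.stringify(body, null, 4));
```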

Automated Data Preprocessing

POST https://api.autogon.ai/api/v1/engine/start
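A minimal sketch of posting a request body to this endpoint with the standard `fetch` API (built into browsers and Node.js 18+). Authentication is deliberately left out because the headers the Autogon API expects are not described on this page; `extraHeaders` is a hypothetical placeholder for whatever your account requires. `buildFetchInit` and `startBlock` are illustrative names.

```javascript
const ENGINE_START_URL = "https://api.autogon.ai/api/v1/engine/start";

// Build fetch options for a JSON POST. extraHeaders is a placeholder for any
// authentication headers your account requires (assumption: not documented here).
function buildFetchInit(body, extraHeaders = {}) {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json", ...extraHeaders },
    body: JSON.stringify(body),
  };
}

// Send a block request and return the parsed JSON response.
async function startBlock(body, extraHeaders = {}) {
  const response = await fetch(ENGINE_START_URL, buildFetchInit(body, extraHeaders));
  if (!response.ok) {
    throw new Error(`engine/start failed with HTTP ${response.status}`);
  }
  return response.json();
}
```

Passing the sample request object as `body` to `startBlock` would send it as JSON to the engine start endpoint.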

Request Body

Name                          Type            Description
x_slice*                      String | array  Boundaries of the x (feature) dataset
y_slice*                      String | array  Boundaries of the y (target) dataset
strategy_value                String          Method for handling missing values (e.g. "mean"); see the Missing Data block
le_thresh                     int             Unique-value threshold for label encoding
ohe_thresh                    int             Unique-value threshold for one-hot encoding
project_id*                   int             Current project ID
block_id*                     int             Current block ID
function_code*                String          The block's function code ("DP_ADP")
args*                         object          Block arguments
parent_id                     int             Previous (parent) block ID
excluded_columns*             array           Columns to ignore entirely
excluded_fillmissing_columns  array           Columns to ignore when filling in missing data only
excluded_encoding_columns     array           Columns to ignore when encoding only
excluded_scaling_columns      array           Columns to ignore when scaling only
save_name*                    String          Name under which to save the processing models
load_name*                    String          Name of the saved processing models to load; setting it switches the block to loading mode
dataset_type                  String          Type of dataset being processed with the loaded weights; requires load_name
clean                         bool            Whether to drop duplicates in loading mode; requires load_name

Fields marked * are required.
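The slice strings in the sample request (`x_slice: ":-1"`, `y_slice: "-1"`) look like Python-style column slices: `":-1"` would select every column except the last as features, and `"-1"` the last column as the target. The sketch below resolves such a string to column indices under that assumption. `resolveSlice` is a hypothetical helper, not the service's parser, and it does not handle the array form or a step value.

```javascript
// Hypothetical resolver for Python-style column slices such as ":-1" or "-1".
// Assumptions: a slice is "start:stop" (no step) and a bare integer selects
// a single column. Returns the selected column indices for `n` columns.
function resolveSlice(spec, n) {
  // Parse one integer, rejecting empty or non-integer text.
  const toInt = (t) => {
    const v = Number(t.trim());
    if (t.trim() === "" || !Number.isInteger(v)) {
      throw new Error(`invalid slice: ${spec}`);
    }
    return v;
  };

  const parts = spec.split(":");
  if (parts.length === 1) {
    // Single index; negative values count back from the end.
    let i = toInt(parts[0]);
    if (i < 0) i += n;
    if (i < 0 || i >= n) throw new Error(`index out of range: ${spec}`);
    return [i];
  }
  if (parts.length !== 2) throw new Error(`step values are not supported: ${spec}`);

  // Resolve a slice bound, clamping to [0, n] as Python does.
  const bound = (t, fallback) => {
    if (t.trim() === "") return fallback;
    let b = toInt(t);
    if (b < 0) b += n;
    return Math.min(Math.max(b, 0), n);
  };
  const start = bound(parts[0], 0);
  const stop = bound(parts[1], n);

  const indices = [];
  for (let i = start; i < stop; i++) indices.push(i);
  return indices;
}

console.log(resolveSlice(":-1", 5)); // features: columns 0 to 3
console.log(resolveSlice("-1", 5)); // target: column 4
```

For a five-column table, `resolveSlice(":-1", 5)` returns `[0, 1, 2, 3]` and `resolveSlice("-1", 5)` returns `[4]`, matching the usual features-then-target layout.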

Sample Response

{
    "status": "true",
    "message": {
        "id": 1,
        "project": 2,
        "block_id": 1,
        "parent_id": 0,
        "dataset_url": "",
        "x_value_url": "",
        "y_value_url": ""
    }
}
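In the sample response, `status` is the string `"true"` rather than a boolean, so a strict comparison against `true` alone would treat a successful call as a failure. The hypothetical helper below accepts either form and returns the `message` fields in a flatter shape. The failure branch assumes an unsuccessful call carries some other `status` value, which this page does not document.

```javascript
// Hypothetical helper: validate a block response and unpack its message.
// The sample encodes success as the string "true"; accept the boolean too.
function unwrapBlockResponse(response) {
  const ok = response.status === "true" || response.status === true;
  if (!ok) {
    // Assumption: the error payload shape is undocumented, so just echo it.
    throw new Error(`block request failed: ${JSON.stringify(response.message)}`);
  }
  const { id, project, block_id, parent_id, dataset_url, x_value_url, y_value_url } =
    response.message;
  return {
    id,
    project,
    blockId: block_id,
    parentId: parent_id,
    urls: { dataset: dataset_url, x: x_value_url, y: y_value_url },
  };
}

const sample = {
  status: "true",
  message: { id: 1, project: 2, block_id: 1, parent_id: 0, dataset_url: "", x_value_url: "", y_value_url: "" },
};
console.log(unwrapBlockResponse(sample).blockId); // 1
```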


await client.data_input(1, 0, 1, {
    dburl: "https://raw.githubusercontent.com/autogonai/autogon-public-datasets/main/mobile_price_prediction.csv",
    file_type: "csv"
})

Good to know: Unlike other block requests, the Data Input block isn't permitted to have parent blocks, hence its null value.


