# Automated Data Processing (DP\_ADP)

Automated data cleaning and preprocessing streamline the preparation of data for machine learning training. These techniques identify and handle missing values, outliers, and inconsistencies in the dataset, and standardize and transform features. Automating these tasks saves time, helps ensure data quality, and can improve the performance and reliability of machine learning models.
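
As a rough illustration of one such step, the sketch below fills missing values in a numeric column with the column mean, which is what a `strategy_value` of `"mean"` suggests. It is a minimal sketch in plain Python, not the engine's actual implementation; the helper name `fill_missing_with_mean` and the use of `None` to mark a missing value are assumptions made for this example.

```python
from statistics import mean


def fill_missing_with_mean(column):
    """Hypothetical helper: replace None entries with the mean of the observed values."""
    observed = [value for value in column if value is not None]
    fill_value = mean(observed)
    return [fill_value if value is None else value for value in column]


# Toy column with one missing entry (None marks the gap in this sketch).
ages = [20, None, 40]
filled = fill_missing_with_mean(ages)
print(filled)
```

The same idea extends to other strategies (for example median or most frequent value) by swapping the function used to compute `fill_value`.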

## Sample Request

```javascript
{
    "project_id": 1,
    "parent_id": 0,
    "block_id": 1,
    "function_code": "DP_ADP",
    "args": {
        "clean": true,
        "dataset_type": "any",
        "le_thresh": 2,
        "load_name": "generated",
        "ohe_thresh": 10,
        "save_name": "generated",
        "strategy_value": "mean",
        "test_size_value": 0.25,
        "x_slice": ":-1",
        "y_slice": "-1"
    }
}
```
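
The request body above can also be assembled programmatically before sending it. The following is a minimal sketch using only the Python standard library; the helper name `build_dp_adp_request` is hypothetical and is not part of any Autogon client. Sending the request and authenticating are left out because this page does not describe them.

```python
import json


def build_dp_adp_request(project_id, block_id, parent_id, args):
    """Hypothetical helper: assemble a DP_ADP request body as a plain dict."""
    return {
        "project_id": project_id,
        "parent_id": parent_id,
        "block_id": block_id,
        "function_code": "DP_ADP",
        "args": args,
    }


# Argument values mirror the sample request on this page.
payload = build_dp_adp_request(
    project_id=1,
    block_id=1,
    parent_id=0,
    args={
        "clean": True,
        "dataset_type": "any",
        "le_thresh": 2,
        "load_name": "generated",
        "ohe_thresh": 10,
        "save_name": "generated",
        "strategy_value": "mean",
        "test_size_value": 0.25,
        "x_slice": ":-1",
        "y_slice": "-1",
    },
)

# Serialize to JSON; json.dumps guarantees valid separators between fields.
body = json.dumps(payload, indent=4)
print(body)
```

Building the body from a dict and serializing it with `json.dumps` avoids hand-written JSON mistakes such as a missing comma between fields.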

## Automated Data Preprocessing

<mark style="color:green;">`POST`</mark> `https://api.autogon.ai/api/v1/engine/start`

#### Request Body

| Name                                                | Type            | Description                                                                                               |
| --------------------------------------------------- | --------------- | --------------------------------------------------------------------------------------------------------- |
| x\_slice<mark style="color:red;">\*</mark>          | String \| array | boundaries for the x dataset                                                                              |
| y\_slice<mark style="color:red;">\*</mark>          | String \| array | boundaries for the y dataset                                                                              |
| strategy\_value                                     | String          | method of handling missing values. Check the Missing Data block                                           |
| le\_thresh                                          | int             | unique-value threshold for label encoding                                                                 |
| ohe\_thresh                                         | int             | unique-value threshold for one-hot encoding                                                               |
| project\_id<mark style="color:red;">\*</mark>       | int             | current project ID                                                                                        |
| block\_id<mark style="color:red;">\*</mark>         | int             | current block ID                                                                                          |
| function\_code<mark style="color:red;">\*</mark>    | String          | block's function code                                                                                     |
| args<mark style="color:red;">\*</mark>              | object          | block arguments                                                                                           |
| parent\_id                                          | int             | previous block ID                                                                                         |
| excluded\_columns<mark style="color:red;">\*</mark> | array           | Columns to ignore entirely                                                                                |
| excluded\_fillmissing\_columns                      | array           | Columns to ignore for filling in missing data only                                                        |
| excluded\_encoding\_columns                         | array           | Columns to ignore for encoding only                                                                       |
| excluded\_scaling\_columns                          | array           | Columns to ignore for scaling only                                                                        |
| save\_name<mark style="color:red;">\*</mark>        | String          | name to save processing models with                                                                       |
| load\_name<mark style="color:red;">\*</mark>        | String          | name to load processing models with. Used to switch to loading mode                                       |
| dataset\_type                                       | String          | <p>type of dataset being processed with loaded weights.<br><br><code>load\_name</code> required</p>       |
| clean                                               | bool            | <p>sets whether or not to drop duplicates during loading mode<br><br><code>load\_name</code> required</p> |
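
The `x_slice` and `y_slice` values in the sample request (`":-1"` and `"-1"`) look like Python-style column selections: every column except the last for the features, and the last column for the target. The sketch below shows that reading on a toy dataset. It assumes Python slicing semantics, which this page does not confirm, and the helper name `parse_slice` is hypothetical.

```python
def parse_slice(spec):
    """Hypothetical helper: turn a string such as ":-1" or "-1" into a slice or an int index."""
    if ":" not in spec:
        return int(spec)  # a bare number selects a single column
    # Empty parts (as in ":-1") become None, meaning "no bound on that side".
    parts = [int(part) if part.strip() else None for part in spec.split(":")]
    return slice(*parts)


# Toy dataset: two feature columns followed by one target column.
rows = [
    [5.1, 3.5, "yes"],
    [4.9, 3.0, "no"],
]

x_selector = parse_slice(":-1")  # all columns except the last
y_selector = parse_slice("-1")   # the last column only

X = [row[x_selector] for row in rows]
y = [row[y_selector] for row in rows]
print(X, y)
```

Under this reading, a dataset whose target is the final column needs no changes to the sample `x_slice` and `y_slice` values.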

{% tabs %}
{% tab title="200: OK Data Input Successful" %}

```javascript
{
    "status": "true",
    "message": {
        "id": 1,
        "project": 2,
        "block_id": 1,
        "parent_id": 0,
        "dataset_url": "",
        "x_value_url": "",
        "y_value_url": ""
    }
}
```

{% endtab %}
{% endtabs %}

{% tabs %}
{% tab title="Python" %}

```
// Some code
```

{% endtab %}

{% tab title="Node" %}

```javascript
await client.data_input(1, 0, 1, {
    dburl: "https://raw.githubusercontent.com/autogonai/autogon-public-datasets/main/mobile_price_prediction.csv",
    file_type: "csv"
})
```

{% endtab %}
{% endtabs %}

{% hint style="info" %}
**Good to know:** Unlike other block requests, the **Data Input** block isn't permitted to have parent blocks, hence its `null` value.
{% endhint %}
