# Automated Data Processing (DP\_ADP)

Automated data cleaning and pre-processing streamline the preparation of data for machine learning training. These techniques involve identifying and addressing missing values, outliers, and inconsistencies in the dataset, as well as standardizing and transforming features. By automating these tasks, data scientists can save time, ensure data quality, and enhance the performance and reliability of machine learning models.

## Sample Request

```javascript
{
    "project_id": 1,
    "parent_id": 0,
    "block_id": 1,
    "function_code": "DP_ADP",
    "args": {
        "clean": true,
        "dataset_type": "any",
        "le_thresh": 2,
        "load_name": "generated",
        "ohe_thresh": 10,
        "save_name": "generated",
        "strategy_value": "mean",
        "test_size_value": 0.25,
        "x_slice": ":-1",
        "y_slice": "-1"
    }
}
```
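
The `x_slice` and `y_slice` values in the sample request look like Python slice notation. This page does not spell out how the engine interprets them, so the following is only a sketch under that assumption: `":-1"` selects every column except the last as features, and `"-1"` selects the last column as the target. The helper `parse_slice` and the column names are hypothetical and exist only for illustration.

```python
def parse_slice(spec: str):
    """Turn a string such as ":-1" or "-1" into a slice or an int.

    Assumes Python-style slice notation; the engine's real parsing
    rules are not documented on this page.
    """
    if ":" not in spec:
        return int(spec)  # a single index, e.g. "-1"
    # Empty parts (as in ":-1") become None, like an omitted slice bound.
    parts = [int(p) if p.strip() else None for p in spec.split(":")]
    return slice(*parts)


# Hypothetical column names standing in for a real dataset header.
columns = ["feature_a", "feature_b", "feature_c", "target"]

x_columns = columns[parse_slice(":-1")]  # all columns but the last
y_column = columns[parse_slice("-1")]    # the last column

print(x_columns)
print(y_column)
```

Read this way, the sample request would train on every column except the last and treat the last column as the label.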

## Automated Data Preprocessing

<mark style="color:green;">`POST`</mark> `https://api.autogon.ai/api/v1/engine/start`

#### Request Body

| Name                                                | Type            | Description                                                                                               |
| --------------------------------------------------- | --------------- | --------------------------------------------------------------------------------------------------------- |
| x\_slice<mark style="color:red;">\*</mark>          | String \| array | boundaries for the x dataset                                                                              |
| y\_slice<mark style="color:red;">\*</mark>          | String \| array | boundaries for the y dataset                                                                              |
| strategy\_value                                     | String          | method for handling missing values; see the Missing Data block                                            |
| le\_thresh                                          | int             | unique-value threshold for label encoding                                                                 |
| ohe\_thresh                                         | int             | unique-value threshold for one-hot encoding                                                               |
| project\_id<mark style="color:red;">\*</mark>       | int             | current project ID                                                                                        |
| block\_id<mark style="color:red;">\*</mark>         | int             | current block ID                                                                                          |
| function\_code<mark style="color:red;">\*</mark>    | String          | block's function code                                                                                     |
| args<mark style="color:red;">\*</mark>              | object          | block arguments                                                                                           |
| parent\_id                                          | int             | previous block ID                                                                                         |
| excluded\_columns<mark style="color:red;">\*</mark> | array           | Columns to ignore entirely                                                                                |
| excluded\_fillmissing\_columns                      | array           | Columns to ignore for filling in missing data only                                                        |
| excluded\_encoding\_columns                         | array           | Columns to ignore for encoding only                                                                       |
| excluded\_scaling\_columns                          | array           | Columns to ignore for scaling only                                                                        |
| save\_name<mark style="color:red;">\*</mark>        | String          | name to save processing models with                                                                       |
| load\_name<mark style="color:red;">\*</mark>        | String          | name to load processing models with. Used to switch to loading mode                                       |
| dataset\_type                                       | String          | <p>type of dataset being processed with loaded weights.<br><br><code>load\_name</code> required</p>       |
| clean                                               | bool            | <p>sets whether or not to drop duplicates during loading mode<br><br><code>load\_name</code> required</p> |
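
As a rough illustration of how the fields in the table fit together, the sketch below assembles a request body in Python. The helper `build_dp_adp_payload` is hypothetical and not part of any Autogon SDK. It checks only that `x_slice` and `y_slice` are present and leaves every other field for the API to validate.

```python
import json

FUNCTION_CODE = "DP_ADP"


def build_dp_adp_payload(project_id, block_id, args, parent_id=None):
    """Assemble a DP_ADP request body (hypothetical helper)."""
    # Only the slice arguments are checked here; the remaining
    # starred rows in the table are left for the API to validate.
    missing = [name for name in ("x_slice", "y_slice") if name not in args]
    if missing:
        raise ValueError(f"missing required args: {', '.join(missing)}")

    payload = {
        "project_id": project_id,
        "block_id": block_id,
        "function_code": FUNCTION_CODE,
        "args": dict(args),  # copy so the caller's dict is not shared
    }
    if parent_id is not None:
        payload["parent_id"] = parent_id
    return payload


# Same values as the sample request on this page.
payload = build_dp_adp_payload(
    project_id=1,
    block_id=1,
    parent_id=0,
    args={
        "clean": True,
        "dataset_type": "any",
        "le_thresh": 2,
        "load_name": "generated",
        "ohe_thresh": 10,
        "save_name": "generated",
        "strategy_value": "mean",
        "test_size_value": 0.25,
        "x_slice": ":-1",
        "y_slice": "-1",
    },
)
print(json.dumps(payload, indent=4))
```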

{% tabs %}
{% tab title="200: OK Data Input Successful" %}

```javascript
{
    "status": "true",
    "message": {
        "id": 1,
        "project": 2,
        "block_id": 1,
        "parent_id": 0,
        "dataset_url": "",
        "x_value_url": "",
        "y_value_url": ""
    }
}
```

{% endtab %}
{% endtabs %}
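
For readers calling the endpoint directly rather than through a client library, the following is a minimal sketch using only the Python standard library. It assumes the endpoint accepts a JSON body and returns a response shaped like the 200 example. Authentication is not described on this page, so any auth header is passed in by the caller. The function names `make_request`, `read_result`, and `start_block` are hypothetical.

```python
import json
import urllib.request

ENGINE_URL = "https://api.autogon.ai/api/v1/engine/start"


def make_request(payload, headers=None):
    """Build (but do not send) the POST request for the engine endpoint."""
    all_headers = {"Content-Type": "application/json"}
    all_headers.update(headers or {})  # caller supplies any auth header
    return urllib.request.Request(
        ENGINE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers=all_headers,
        method="POST",
    )


def read_result(body_text):
    """Extract the output URLs from a response shaped like the 200 example."""
    message = json.loads(body_text)["message"]
    return {key: message[key] for key in ("dataset_url", "x_value_url", "y_value_url")}


def start_block(payload, headers=None):
    """Send the request and return the parsed URLs (performs network I/O)."""
    with urllib.request.urlopen(make_request(payload, headers)) as response:
        return read_result(response.read().decode("utf-8"))


# Build the request without sending it, to inspect what would go out.
request = make_request({
    "project_id": 1,
    "parent_id": 0,
    "block_id": 1,
    "function_code": "DP_ADP",
    "args": {"x_slice": ":-1", "y_slice": "-1"},
})
print(request.get_method(), request.full_url)
```

Calling `start_block` with the same payload and your own headers would perform the actual request.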

{% tabs %}
{% tab title="Python" %}

```
// Some code
```

{% endtab %}

{% tab title="Node" %}

```javascript
await client.data_input(1, 0, 1, {
    dburl: "https://raw.githubusercontent.com/autogonai/autogon-public-datasets/main/mobile_price_prediction.csv" ,
    file_type: "csv"
})
```

{% endtab %}
{% endtabs %}

{% hint style="info" %}
**Good to know:** Unlike other block requests, the **Data Input** block isn't permitted to have parent blocks, hence its `null` value.
{% endhint %}


