Autogon Docs
  • Change Log
  • Get Started
  • Libraries
  • Slicing & Indexing
  • Autogon Engine (Studio)
    • Data Processing
      • Data Input (DP_1)
      • Automated Data Processing (DP_ADP)
      • Missing Data (DP_2)
      • Data Encoding (DP_3)
      • Data Split (DP_4)
      • Feature Scaling (DP_5)
      • Drop Columns (DP_6)
      • Time Stepper (DP_7)
      • Parse Datetime (DP_PDT)
      • Reorder Columns (DP_ROC)
      • Feature Sampling (DP_FSP)
      • Reshape Array (DP_RSH)
      • Column Astype (DP_ASP)
      • Show Duplicates (DP_SDC)
      • Drop Duplicates (DP_DRD)
      • Scalar to Ndarray (DP_STN)
      • Image to Ndarray (DP_ITN)
      • Dataset Info (DP_INF)
      • Dataset Correlations (DP_CRR)
      • Dataset Description (DP_DSC)
      • Dataset Datatypes (DP_DTY)
      • Dataset Uniques (DP_UNQ)
      • Dataset Stats Counts (DP_STC)
      • Principal Component Analysis (DP_PCA)
      • Text Vectorizer (DP_VEC)
      • Resampler (DP_RES)
    • Data Visualization
      • Scatter Plots (DP_SCP)
      • Ordinary Plots (DP_ORD)
      • Compare Scatter Plots (DP_CSP)
      • Pie Plots (DP_PIE)
      • Heatmap Plots (DP_HMP)
    • Machine Learning
      • Simple Linear Regression (ML_R_1)
      • Multiple Linear Regression (ML_R_2)
      • Polynomial Linear Regression (ML_R_3)
      • Support Vector Regression (ML_R_4)
      • Decision Tree Regression (ML_R_5)
      • Random Forest Regression (ML_R_6)
      • Logistic Regression (ML_CN_1)
      • K-Nearest Neighbors - KNN (ML_CN_2)
      • Support Vector Machine (ML_CN_3)
      • Kernel SVM (ML_CN_4)
      • Naive Bayes (ML_CN_5)
      • Decision Tree Classification (ML_CN_6)
      • Random Forest Classification (ML_CN_7)
      • Hierarchical Clustering (ML_CG_1)
      • K-Means Clustering (ML_CG_2)
      • XGBoost (MS_XGBOOST)
      • Grid Search (ML_GRID)
      • Shap Explain (ML_SHAP)
      • Isolation Forest (ML_ISF)
      • (ML_DBS)
    • Automated Machine Learning
      • AutoRegression (AUTO_R_1)
      • AutoClassification (AUTO_CN_1)
      • AutoRegression II (AUTO_R_2)
    • Deep Learning
      • Artificial Neural Network (DL_ANN)
      • Self Organizing Maps (DL_SOM)
      • Restricted Boltzmann Machine (DL_RBM)
    • Automated Deep Learning
      • Auto Image Classification (A_DL_IMC)
      • Auto Image Regression (A_DL_IMR)
      • Auto Text Classification (A_DL_TXC)
      • Auto Text Regression (A_DL_TXR)
      • Auto Structured Data Classification (A_DL_SDC)
      • Auto Structured Data Regression (A_DL_SDR)
      • General AutoDL Blocks (A_DL_ALL)
  • LabelCraft
    • Images, Annotations and Augmentation
    • Import and Export
    • Model Training and Prediction
  • Production APIs
    • Production Pipelines
  • Autogon Qore
    • Vision AI
    • Natural Language AI
      • Text Classification (Deprecated)
      • Text Summary (Deprecated)
      • Ask Your Data
      • Generate Synthetic Data
      • Speech To Text
      • Text To Speech
      • Sentiment Analyzer (Deprecated)
      • Conversation with Chatbot Agent
      • Conversational Interaction with GPT-4
      • Essay Marker
      • Resume Ranker
      • Translator
    • Voice Cloning
      • Create a Voice
      • Get Voices
      • Text-To-Speech
  • Other APIs
    • Project
      • List all projects
      • Create a New Project
      • Get Project Details
      • Delete a Project
    • Dataset
      • List all datasets
      • Create a Dataset
      • Get a Dataset
      • Updating a dataset
      • Delete a Dataset
      • Dataset Connection
      • Visualize Dataset
Powered by GitBook
On this page
  • Sample Request
  • Splitting Data
  • Splits data

Was this helpful?

  1. Autogon Engine (Studio)
  2. Data Processing

Data Split (DP_4)

This functionality splits data into two subsets: a training set and a test set. The training set is used to train a model, while the test set is used to evaluate its performance.

This process of separating data into two sets is a crucial step in the process of developing and evaluating machine learning models. It ensures that the model is able to generalize well to new, unseen data, and it also allows for a more accurate assessment of the model's performance.

This functionality of splitting data into training and test sets is widely used in the field of machine learning and data science.

Sample Request

This request uses the mean strategy to fill in missing values in the second column to the end with the mean values of the X variable.

{
    "project_id": 1,
    "parent_id": 3,
    "block_id": 4,
    "function_code": "DP_4",
    "args": {
        "test_size": 0.3,
        "random_state": 0
    }
}

Splitting Data

Splits data

POST https://autogon.ai/api/v1/engine/start

Splitting data into training and test data.

Request Body

Name
Type
Description

project_id*

int

current project ID

parent_id*

int

parent block ID

block_id*

int

current block ID

function_code*

String

block's function code

args

object

block arguments

test_size

float/int

If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the test split. If int, represents the absolute number of test samples. If None, the value is set to the complement of the train size.

random_state

int

Controls the shuffling applied to the data before applying the split.

{
    "status": "true",
    "message": {
        "id": 4,
        "project": 1,
        "block_id": 4,
        "parent_id": 3,
        "dataset_url": "",
        "x_value_url": "",
        "y_value_url": "",
        "x_train_url": "",
        "y_train_url": "",
        "x_test_url": "",
        "y_test_url": ""
    }
}

// Some code
// Some code
PreviousData Encoding (DP_3)NextFeature Scaling (DP_5)

Last updated 2 years ago

Was this helpful?