[Python] 데이터 분할 :: train_test_split()

1 분 소요

데이터 분할 :: train_test_split( )

from sklearn.model_selection import train_test_split
train_test_split(
	arrays,                             # list, Numpy array, dataframe, etc...
  test_size = 0.25,                   # float : 0.0 ~ 1.0 | int : 샘플의 개수
  train_size = test_size를 제외한 나머지, # float : 0.0 ~ 1.0 | int : 샘플의 개수
  random_state = None,
  shuffle = True,
  stratify : None
)

Option
- arrays : 분할시킬 데이터를 입력
- test_size : 테스트 데이터셋의 비율(float) or 갯수(int)
- train_size : 학습 데이터셋의 비율(float) or 갯수(int)
- random_state : 데이터 분할시 셔플이 이루어지는데 이를 위한 시드값
- shuffle : 분할 하기 전 데이터 섞을지 여부
- stratify : 지정한 Data의 비율을 유지

import numpy as np
from sklearn.model_selection import train_test_split # 데이터 분할을 위한 임포트
# 데이터 생성
X, y = np.arange(10).reshape((5, 2)), range(5)

print(f"X :\n{X}\n--------\ny : \n{list(y)}\n")

# 데이터 분할
X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.2,    # test 비율을 20%
    random_state=42)

# 결과 확인
print(f"X_train :\n{X_train}\n--------\nX_test : \n{X_test}\n--------\ny_train : \n{y_train}\n--------\ny_test : \n{y_test}")

X :
[[0 1]
 [2 3]
 [4 5]
 [6 7]
 [8 9]]
--------
y : 
[0, 1, 2, 3, 4]

X_train :
[[8 9]
 [4 5]
 [0 1]
 [6 7]]
--------
X_test : 
[[2 3]]
--------
y_train : 
[4, 2, 0, 3]
--------
y_test : 
[1]

Twitter Facebook LinkedIn

Nada

[Python] 데이터 분할 :: train_test_split()

데이터 분할 :: train_test_split( )

공유하기

참고

[Python] df 행 반환

df 유니크 값 확인 :: df.nunique( )

[Python] df 열 반환

[Python] Series 차원 확인 :: Series.shape