[Sklearn] Datasets


라일락 꽃이 피는 날


eunki 2021. 5. 13. 19:22


Dataset

DESCR: a description of the dataset

data: the feature data

feature_names: the column names of the feature data

target: the label data (numeric)

target_names: the names of the labels (strings)

from sklearn.datasets import load_iris
iris = load_iris()

data = iris['data']
feature_names = iris['feature_names']
target = iris['target']
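To see what each of the attributes listed above actually holds, the following minimal sketch prints them for the iris dataset. It only uses `load_iris` and the keys named in this post; the `print` calls are there purely for illustration.

```python
from sklearn.datasets import load_iris

iris = load_iris()

# The returned object behaves like a dictionary; these are the keys described above.
print(iris.keys())

print(iris['data'].shape)      # (150, 4): 150 samples, 4 features
print(iris['feature_names'])   # column names for the feature data
print(iris['target'][:5])      # [0 0 0 0 0]: numeric labels
print(iris['target_names'])    # ['setosa' 'versicolor' 'virginica']
```

The numeric values in `target` are indices into `target_names`, so label 0 corresponds to `setosa`.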

Creating a DataFrame

import pandas as pd

df_iris = pd.DataFrame(data, columns=feature_names)
df_iris['target'] = target  # append the numeric label as the last column
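Since `target` is numeric, it can help to keep a readable label next to it. The sketch below rebuilds the same DataFrame and adds a `target_name` column; that column name is a hypothetical choice for illustration and is not part of the original post.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()

df_iris = pd.DataFrame(iris['data'], columns=iris['feature_names'])
df_iris['target'] = iris['target']

# Hypothetical extra column: use each numeric label as an index into target_names.
df_iris['target_name'] = iris['target_names'][iris['target']]

print(df_iris.head())
print(df_iris['target_name'].value_counts())  # 50 rows per class
```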

Splitting into train / validation sets

from sklearn.model_selection import train_test_split
# drop(columns=...) replaces the positional axis argument, which pandas 2.0 no longer accepts
x_train, x_valid, y_train, y_valid = train_test_split(df_iris.drop(columns='target'), df_iris['target'])

x_train.shape, y_train.shape  # ((112, 4), (112,))
x_valid.shape, y_valid.shape  # ((38, 4), (38,))
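The 112 / 38 split comes from the default `test_size` of 0.25. As a rough sketch, the split ratio and the shuffling seed can also be set explicitly; the values `test_size=0.2` and `random_state=42` below are illustrative assumptions, not settings from this post.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# return_X_y=True gives the features and labels directly as arrays.
X, y = load_iris(return_X_y=True)

# test_size: fraction of samples held out for validation (assumed value).
# random_state: fixes the shuffle so the split is reproducible (assumed value).
x_train, x_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(x_train.shape, y_train.shape)  # (120, 4) (120,)
print(x_valid.shape, y_valid.shape)  # (30, 4) (30,)
```

Without `random_state`, every call shuffles differently, so the rows that land in each set change from run to run.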

stratify: keeps the proportion of each label class the same in the train and validation sets

x_train, x_valid, y_train, y_valid = train_test_split(df_iris.drop(columns='target'), df_iris['target'], stratify=df_iris['target'])
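One way to check what `stratify` does is to count the labels in each split. The sketch below compares a plain split with a stratified one; `test_size=0.2` and `random_state=0` are assumed values chosen so the per-class counts come out as round numbers.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Plain split: class counts in the validation set depend on the shuffle.
_, _, y_train_plain, y_valid_plain = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Stratified split: each class keeps its share (one third) in both sets.
_, _, y_train_strat, y_valid_strat = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

print(np.bincount(y_valid_plain, minlength=3))  # may be uneven
print(np.bincount(y_valid_strat, minlength=3))  # [10 10 10]
print(np.bincount(y_train_strat, minlength=3))  # [40 40 40]
```

Iris has 50 samples per class, so an even split is the proportional one here; on an imbalanced dataset `stratify` would preserve the imbalance rather than remove it.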
