<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>라일락 꽃이 피는 날</title>
    <link>https://eunki.tistory.com/</link>
    <description></description>
    <language>ko</language>
    <pubDate>Wed, 6 May 2026 17:29:27 +0900</pubDate>
    <generator>TISTORY</generator>
    <ttl>100</ttl>
    <managingEditor>eunki</managingEditor>
    <image>
      <title>라일락 꽃이 피는 날</title>
      <url>https://tistory1.daumcdn.net/tistory/4693414/attach/a560f9ae581648dfb6640a3306566981</url>
      <link>https://eunki.tistory.com</link>
    </image>
    <item>
      <title>[빅분기 실기] 연관규칙분석 (Association Rule)</title>
      <link>https://eunki.tistory.com/333</link>
      <description>&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;연관규칙분석 (Association Rule, Apriori Algorithm)&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;대용량의 트랜잭션 데이터로부터 'X이면 Y이다' 라는 형식의 연관관계를 발견하는 기법이다.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;어떤 두 아이템 집합이 빈번히 발생하는가를 알려주는 일련의 규칙들을 생성하는 알고리즘이다.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;흔히 장바구니 분석(Market Basket Analysis) 이라고도 한다.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;연관규칙을 수행하기 위해서는 거래 데이터의 형식으로 되어 있어야 한다.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;지지도 (Support) : 전체 거래 건수 중에서 항목집합 X와 Y를 모두 포함하는 거래 건수의 비율&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; X와 Y를 모두 포함하는 거래 수 / 전체 거래 수 = n(X&amp;cap;Y) / N&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;신뢰도 (Confidence) : 항목집합 X를 포함하는 거래 중에서 항목집합 Y도 포함하는 거래 비율&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; X와 Y를 모두 포함하는 거래 수 / X가 포함된 거래 수 = n(X&amp;cap;Y) / n(X)&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;향상도 (Lift) : 신뢰도 / P(Y)&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 향상도가 절대값 1보다 크면 우수함을 의미하고, 1이면 X와 Y는 독립적이라는 것을 의미한다.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;[주요 하이퍼파라미터]&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;- min_support : 최소 지지도, 기준값 이상만 제시&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;- min_confidence : 최소 신뢰도, 기준값 이상만 제시&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;- min_lift : 최소 향상도, 기준값 이상만 제시&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;hr contenteditable=&quot;false&quot; data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style7&quot; /&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;1. 분석 데이터 준비&amp;nbsp;&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655653851791&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;import numpy as np 
import pandas as pd 

data = pd.read_csv('Market_Basket.csv', header = None)
data.head()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;607&quot; data-origin-height=&quot;211&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bXfPpy/btrE9Q5JCu6/nevzk7N9zzojnhWAOAOpDk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bXfPpy/btrE9Q5JCu6/nevzk7N9zzojnhWAOAOpDk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bXfPpy/btrE9Q5JCu6/nevzk7N9zzojnhWAOAOpDk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbXfPpy%2FbtrE9Q5JCu6%2Fnevzk7N9zzojnhWAOAOpDk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;607&quot; height=&quot;211&quot; data-origin-width=&quot;607&quot; data-origin-height=&quot;211&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;pre id=&quot;code_1655653866739&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;transactions = []

for i in range(data.shape[0]):
    transactions.append([str(data[j][i]) for j in
                         range(data.shape[1]-data.isnull().sum(axis=1)[i])])&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;transaction data로 변환하기 위해서 transactions 이라는 빈 리스트를 만들어 놓고 케이스(행) 수만큼 for문을 수행한다.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;transactions에 추가하면서 data에 있는 제품 품목을 담는다.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;b&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;2. 모델 적용&lt;/span&gt;&lt;/b&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655653897892&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;from apyori import apriori

rules = apriori(transactions, min_support = 0.015, min_confidence = 0.2,  
                min_lift = 1, min_length = 1)
results = list(rules)

df=pd.DataFrame(results)
df&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;607&quot; data-origin-height=&quot;359&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/FwMoG/btrE8NVKqek/hQvgKpYxUXqO1wjVIiFUr0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/FwMoG/btrE8NVKqek/hQvgKpYxUXqO1wjVIiFUr0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/FwMoG/btrE8NVKqek/hQvgKpYxUXqO1wjVIiFUr0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FFwMoG%2FbtrE8NVKqek%2FhQvgKpYxUXqO1wjVIiFUr0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;607&quot; height=&quot;359&quot; data-origin-width=&quot;607&quot; data-origin-height=&quot;359&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;b&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;3. 연관품목의 시각화&lt;/span&gt;&lt;/b&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655653996849&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;ar=(df.iloc[1:74]['items'])
ar&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;287&quot; data-origin-height=&quot;215&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/cvuOhp/btrE8NIbKEP/y4iB8M2RKVnsVaVyPbboz1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/cvuOhp/btrE8NIbKEP/y4iB8M2RKVnsVaVyPbboz1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/cvuOhp/btrE8NIbKEP/y4iB8M2RKVnsVaVyPbboz1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FcvuOhp%2FbtrE8NIbKEP%2Fy4iB8M2RKVnsVaVyPbboz1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;287&quot; height=&quot;215&quot; data-origin-width=&quot;287&quot; data-origin-height=&quot;215&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;78개 규칙 중 74개만 뽑아 그래프로 표현한다.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;pre id=&quot;code_1655654068500&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;import matplotlib.pyplot as plt
from matplotlib import font_manager
import networkx as nx
from networkx.drawing.nx_pydot import graphviz_layout

df = pd.DataFrame(list(ar), columns=['FROM', 'TO'])
G = nx.from_pandas_edgelist(df, source = 'FROM', target = 'TO')

# 한글 폰트 설정
ko_font_location = &quot;C:/Windows/Fonts/malgun.ttf&quot;
ko_font_name = font_manager.FontProperties(fname=ko_font_location).get_name()&lt;/code&gt;&lt;/pre&gt;
&lt;pre id=&quot;code_1655654080055&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;# 품목 연관 시각화
plt.figure(figsize=(10,10)) 
nx.draw_kamada_kawai(G)
pos=nx.kamada_kawai_layout(G)

nx.draw_networkx_labels(G, pos, font_family=ko_font_name, font_size=10, font_color='black')
nx.draw_networkx_nodes(G, pos, node_color='orange', node_size=2000, alpha=1)

plt.show()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;596&quot; data-origin-height=&quot;596&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/cIS3ec/btrE7xdt0ti/P9KYTY59pK3NPwnDcBplzk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/cIS3ec/btrE7xdt0ti/P9KYTY59pK3NPwnDcBplzk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/cIS3ec/btrE7xdt0ti/P9KYTY59pK3NPwnDcBplzk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FcIS3ec%2FbtrE7xdt0ti%2FP9KYTY59pK3NPwnDcBplzk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;596&quot; height=&quot;596&quot; data-origin-width=&quot;596&quot; data-origin-height=&quot;596&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;</description>
      <category>데이터 분석/빅데이터 분석 기사</category>
      <category>Python</category>
      <category>데이터 분석</category>
      <category>빅데이터 분석 기사</category>
      <category>빅분기</category>
      <category>실기</category>
      <author>eunki</author>
      <guid isPermaLink="true">https://eunki.tistory.com/333</guid>
      <comments>https://eunki.tistory.com/333#entry333comment</comments>
      <pubDate>Mon, 20 Jun 2022 00:55:05 +0900</pubDate>
    </item>
    <item>
      <title>[빅분기 실기] DBSCAN</title>
      <link>https://eunki.tistory.com/332</link>
      <description>&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;DBSCAN&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;밀도기반 클러스터링 기법&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;케이스가 집중되어 있는 밀도에 초점을 두어 밀도가 높은 그룹을 클러스터링 하는 방식이다.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;중심점을 기준으로 특정한 반경 이내에 케이스가 n개 이상 있을 경우 하나의 군집을 형성한다.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;이상값을 탐지하는 데에 많이 사용된다.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;[주요 하이퍼파라미터]&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;- eps (&amp;epsilon;, epsilon) : 근접 이웃점을 찾기 위해 정의 내려야 하는 반경 거리&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;- min_samples (minimum amount of points) : 하나의 군집을 형성하기 위해 필요한 최소 케이스 수&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;[데이터의 케이스(포인트)]&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;- Core point : &amp;epsilon; 반경 내에 최소점(minPts) 이상을 갖는 점&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;- Border point : Core point 의 &amp;epsilon; 반경 내에 있으나, 그 자체로는 최소점(minPts)&amp;nbsp;을 갖지 못하는 점&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;- Noise point : Core point 도 아니고 Border point 도 아닌 점&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;hr contenteditable=&quot;false&quot; data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style7&quot; /&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;1. 분석 데이터 준비&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655653479898&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

iris = pd.read_csv(&quot;iris.csv&quot;)
iris_data=iris[iris.columns[0:4]]
iris_data.head()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;360&quot; data-origin-height=&quot;162&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bdGUkV/btrFaLWBmLp/ukPTcJ7mRBDYSra80bd621/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bdGUkV/btrFaLWBmLp/ukPTcJ7mRBDYSra80bd621/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bdGUkV/btrFaLWBmLp/ukPTcJ7mRBDYSra80bd621/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbdGUkV%2FbtrFaLWBmLp%2FukPTcJ7mRBDYSra80bd621%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;360&quot; height=&quot;162&quot; data-origin-width=&quot;360&quot; data-origin-height=&quot;162&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;2. 모델 적용&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655653551761&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;from sklearn.cluster import DBSCAN

dbscan = DBSCAN(eps=0.5, metric='euclidean', min_samples=5)
dbscan.fit(iris_data)&lt;/code&gt;&lt;/pre&gt;
&lt;pre id=&quot;code_1655653564903&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;dbscan.labels_&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;529&quot; data-origin-height=&quot;178&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bbciTQ/btrE768632i/kAmJ4lJzZ8x0cPl2sKyVLk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bbciTQ/btrE768632i/kAmJ4lJzZ8x0cPl2sKyVLk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bbciTQ/btrE768632i/kAmJ4lJzZ8x0cPl2sKyVLk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbbciTQ%2FbtrE768632i%2FkAmJ4lJzZ8x0cPl2sKyVLk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;529&quot; height=&quot;178&quot; data-origin-width=&quot;529&quot; data-origin-height=&quot;178&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;설정한 기준에 따라 2개의 군집(0, 1) 으로 나타났다.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;모델 기준에서 이상치로 파악된 것은 -1 로 표현된다.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;pre id=&quot;code_1655653583464&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;pred=dbscan.fit_predict(iris_data)
pred=pd.DataFrame(pred)
pred.columns=['predict']

match_data = pd.concat([iris,pred],axis=1)
match_data.tail()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;495&quot; data-origin-height=&quot;160&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/u1Pc4/btrE75vxgpk/aGcKwudkMpTryxsvXAtis0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/u1Pc4/btrE75vxgpk/aGcKwudkMpTryxsvXAtis0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/u1Pc4/btrE75vxgpk/aGcKwudkMpTryxsvXAtis0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fu1Pc4%2FbtrE75vxgpk%2FaGcKwudkMpTryxsvXAtis0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;495&quot; height=&quot;160&quot; data-origin-width=&quot;495&quot; data-origin-height=&quot;160&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;pre id=&quot;code_1655653653929&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;cross = pd.crosstab(match_data['class'],match_data['predict'])
cross&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;185&quot; data-origin-height=&quot;140&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/pcCUJ/btrFaNOruf4/eqMgtZICPNUEiXRbuJfHEk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/pcCUJ/btrFaNOruf4/eqMgtZICPNUEiXRbuJfHEk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/pcCUJ/btrFaNOruf4/eqMgtZICPNUEiXRbuJfHEk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FpcCUJ%2FbtrFaNOruf4%2FeqMgtZICPNUEiXRbuJfHEk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;185&quot; height=&quot;140&quot; data-origin-width=&quot;185&quot; data-origin-height=&quot;140&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;setosa 에서 1개, versicolor 에서 6개, virginica 에서 10개의 이상치가 발견되었다.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;반경을 좀 더 넓히거나 단위의 표준화 과정을 통해 이상치를 줄여야 한다.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;3. 시각화 (Visualization)&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655653673727&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;from sklearn.decomposition import PCA

pca = PCA(n_components=2).fit(iris_data)
pca_2d = pca.transform(iris_data)&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;변수가 4개이므로 4차원의 공간에 데이터가 위치해 있겠지만 4차원을 표현하지 못한다.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;따라서 2차원으로 차원을 축소해서 보아야 한다.&amp;nbsp; (n_components=2)&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;pre id=&quot;code_1655653679264&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;for i in range(0, pca_2d.shape[0]):
    if dbscan.labels_[i] == 0:
        c1 = plt.scatter(pca_2d[i,0],pca_2d[i,1],c='r', marker='+')
    elif dbscan.labels_[i] == 1:
        c2 = plt.scatter(pca_2d[i,0],pca_2d[i,1],c='g', marker='o')
    elif dbscan.labels_[i] == -1:
        c3 = plt.scatter(pca_2d[i,0],pca_2d[i,1],c='b', marker='*')
        
plt.legend([c1, c2, c3], ['Cluster 1', 'Cluster 2', 'Noise'])
plt.title('DBSCAN finds 2 clusters and noise')
plt.show()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;381&quot; data-origin-height=&quot;265&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bHTSm1/btrE7CUFQND/yK6r5MhpogPtS41Km6Kok0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bHTSm1/btrE7CUFQND/yK6r5MhpogPtS41Km6Kok0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bHTSm1/btrE7CUFQND/yK6r5MhpogPtS41Km6Kok0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbHTSm1%2FbtrE7CUFQND%2FyK6r5MhpogPtS41Km6Kok0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;381&quot; height=&quot;265&quot; data-origin-width=&quot;381&quot; data-origin-height=&quot;265&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;</description>
      <category>데이터 분석/빅데이터 분석 기사</category>
      <category>Python</category>
      <category>데이터 분석</category>
      <category>빅데이터 분석 기사</category>
      <category>빅분기</category>
      <category>실기</category>
      <author>eunki</author>
      <guid isPermaLink="true">https://eunki.tistory.com/332</guid>
      <comments>https://eunki.tistory.com/332#entry332comment</comments>
      <pubDate>Mon, 20 Jun 2022 00:48:20 +0900</pubDate>
    </item>
    <item>
      <title>[빅분기 실기] 군집 분석 (Cluster Analysis)</title>
      <link>https://eunki.tistory.com/331</link>
      <description>&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;군집&amp;nbsp;분석&amp;nbsp;(Cluster&amp;nbsp;Analysis)&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;개체들의&amp;nbsp;특성을&amp;nbsp;대표하는&amp;nbsp;몇&amp;nbsp;개의&amp;nbsp;변수들을&amp;nbsp;기준으로&amp;nbsp;몇&amp;nbsp;개의&amp;nbsp;군집으로&amp;nbsp;세분화하는&amp;nbsp;방법&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;개체들을&amp;nbsp;다양한&amp;nbsp;변수를&amp;nbsp;기준으로&amp;nbsp;다차원&amp;nbsp;공간에서&amp;nbsp;유사한&amp;nbsp;특성을&amp;nbsp;가진&amp;nbsp;개체로&amp;nbsp;묶는다. &lt;/span&gt;&lt;br /&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;개체들&amp;nbsp;간의&amp;nbsp;유사성은&amp;nbsp;개체&amp;nbsp;간의&amp;nbsp;거리를&amp;nbsp;사용하고,&amp;nbsp;거리가&amp;nbsp;상대적으로&amp;nbsp;가까운&amp;nbsp;개체들을&amp;nbsp;동일&amp;nbsp;군집으로&amp;nbsp;묶는다. &lt;/span&gt;&lt;br /&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;개체&amp;nbsp;간의&amp;nbsp;거리를&amp;nbsp;행렬을&amp;nbsp;이용하여&amp;nbsp;계산한다.&amp;nbsp;대표적으로&amp;nbsp;유클리디안&amp;nbsp;거리를&amp;nbsp;계산한다.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;[주요 하이퍼파라미터]&lt;/b&gt;&lt;b&gt;&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;- n_cluster : 군집의 수&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;hr contenteditable=&quot;false&quot; data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style7&quot; /&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;1. 군집분석 (Clustering)&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655651705355&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.cm
from sklearn.cluster import KMeans

plt.style.use('seaborn-deep')
cmap = matplotlib.cm.get_cmap('plasma')&lt;/code&gt;&lt;/pre&gt;
&lt;pre id=&quot;code_1655651790443&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;data = pd.read_csv('Mall_Customers.csv')
X = data.iloc[:, [3,4]]
X.head()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;138&quot; data-origin-height=&quot;163&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/ba5Z4o/btrE9RwOb0S/DBKNTkVZstH9vD0F51VjJK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/ba5Z4o/btrE9RwOb0S/DBKNTkVZstH9vD0F51VjJK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/ba5Z4o/btrE9RwOb0S/DBKNTkVZstH9vD0F51VjJK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fba5Z4o%2FbtrE9RwOb0S%2FDBKNTkVZstH9vD0F51VjJK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;138&quot; height=&quot;163&quot; data-origin-width=&quot;138&quot; data-origin-height=&quot;163&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;pre id=&quot;code_1655651874610&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;wcss = []

for i in range(1,21):
    kmeans = KMeans(n_clusters=i)
    kmeans.fit_transform(X)
    wcss.append(kmeans.inertia_)&lt;/code&gt;&lt;/pre&gt;
&lt;pre id=&quot;code_1655651882803&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;wcss&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;146&quot; data-origin-height=&quot;347&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/c2x3Rf/btrFe4PbCxN/v6OgGPgknUuqHcgBsSpeS0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/c2x3Rf/btrFe4PbCxN/v6OgGPgknUuqHcgBsSpeS0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/c2x3Rf/btrFe4PbCxN/v6OgGPgknUuqHcgBsSpeS0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fc2x3Rf%2FbtrFe4PbCxN%2Fv6OgGPgknUuqHcgBsSpeS0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;146&quot; height=&quot;347&quot; data-origin-width=&quot;146&quot; data-origin-height=&quot;347&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;통계 기준으로 최적의 군집수를 찾기 위한 과정이다.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;kmeans.inertia_ : 군집의 중심과 각 케이스(개체) 간에 거리를 계산한다.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;중심과 개체 간의 거리가 작아진다는 것은 그만큼 군집이 잘 형성이 되었다는 것이다.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;pre id=&quot;code_1655652091943&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;plt.figure()
plt.plot(range(1,21), wcss)

plt.title('The Elbow Method')
plt.xlabel('Number of clusters')
plt.ylabel('WCSS')

plt.show()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;408&quot; data-origin-height=&quot;279&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/ccYWJu/btrE9RXPsLD/qQuBVfrglE0OLHXt96WvY1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/ccYWJu/btrE9RXPsLD/qQuBVfrglE0OLHXt96WvY1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/ccYWJu/btrE9RXPsLD/qQuBVfrglE0OLHXt96WvY1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FccYWJu%2FbtrE9RXPsLD%2FqQuBVfrglE0OLHXt96WvY1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;408&quot; height=&quot;279&quot; data-origin-width=&quot;408&quot; data-origin-height=&quot;279&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;군집수가 늘어날수록 수치가 작아지고 있다.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;수치가 크게 감소하다가 변화가 없는 지접에서 최적의 군집수 k를 결정한다.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;pre id=&quot;code_1655652145074&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;k = 5
kmeans = KMeans(n_clusters = k)

y_kmeans = kmeans.fit_predict(X)
y_kmeans&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;516&quot; data-origin-height=&quot;178&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bk5XWF/btrE7xR4QiD/KkPdfsdE7lkEjCJwMiMC71/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bk5XWF/btrE7xR4QiD/KkPdfsdE7lkEjCJwMiMC71/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bk5XWF/btrE7xR4QiD/KkPdfsdE7lkEjCJwMiMC71/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fbk5XWF%2FbtrE7xR4QiD%2FKkPdfsdE7lkEjCJwMiMC71%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;516&quot; height=&quot;178&quot; data-origin-width=&quot;516&quot; data-origin-height=&quot;178&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;pre id=&quot;code_1655652218360&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;Group_cluster=pd.DataFrame(y_kmeans)
Group_cluster.columns=['Group']

full_data=pd.concat([data, Group_cluster], axis=1)
full_data&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;322&quot; data-origin-height=&quot;348&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/nrt4v/btrE8SPeDJh/AS4tS2SOeenEiaYotQ8MKK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/nrt4v/btrE8SPeDJh/AS4tS2SOeenEiaYotQ8MKK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/nrt4v/btrE8SPeDJh/AS4tS2SOeenEiaYotQ8MKK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fnrt4v%2FbtrE8SPeDJh%2FAS4tS2SOeenEiaYotQ8MKK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;322&quot; height=&quot;348&quot; data-origin-width=&quot;322&quot; data-origin-height=&quot;348&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;pre id=&quot;code_1655652351363&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;kmeans_pred = KMeans(n_clusters=k, random_state=42).fit(X)
kmeans_pred.cluster_centers_&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;252&quot; data-origin-height=&quot;89&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/ccSKGh/btrE8Tm4LRt/zFSuR5z6pp0PrO4piZNsp0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/ccSKGh/btrE8Tm4LRt/zFSuR5z6pp0PrO4piZNsp0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/ccSKGh/btrE8Tm4LRt/zFSuR5z6pp0PrO4piZNsp0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FccSKGh%2FbtrE8Tm4LRt%2FzFSuR5z6pp0PrO4piZNsp0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;252&quot; height=&quot;89&quot; data-origin-width=&quot;252&quot; data-origin-height=&quot;89&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;cluster_centers_ : 각 좌표의 중심점 결과가 담겨져 있는 객체&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;군집의 중심좌표로 군집의 특성을 살펴보고, 이들 간에 직관적으로 명확한 분리가 되어야 제대로 된 군집수를 나눈 것으로 본다.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;pre id=&quot;code_1655652488433&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;labels = [('Cluster ' + str(i+1)) for i in range(k)]
X=np.array(X)

plt.figure()
for i in range(k):
    plt.scatter(X[y_kmeans == i, 0], X[y_kmeans == i, 1], s = 20,
                 c = cmap(i/k), label = labels[i])&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;377&quot; data-origin-height=&quot;249&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/rQ4ke/btrFchBvXaK/Ligq1HCkalkwDL5uc45pP0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/rQ4ke/btrFchBvXaK/Ligq1HCkalkwDL5uc45pP0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/rQ4ke/btrFchBvXaK/Ligq1HCkalkwDL5uc45pP0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FrQ4ke%2FbtrFchBvXaK%2FLigq1HCkalkwDL5uc45pP0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;377&quot; height=&quot;249&quot; data-origin-width=&quot;377&quot; data-origin-height=&quot;249&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;2차원&amp;nbsp;도표에&amp;nbsp;산점도(scatter&amp;nbsp;plot)를&amp;nbsp;그린다. &lt;/span&gt;&lt;br /&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;for문으로&amp;nbsp;각&amp;nbsp;개체만큼&amp;nbsp;하나씩&amp;nbsp;y_kmeans에&amp;nbsp;분석하여&amp;nbsp;저장된&amp;nbsp;좌표값을&amp;nbsp;찍는다. &lt;/span&gt;&lt;br /&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;라벨에&amp;nbsp;따라&amp;nbsp;색을&amp;nbsp;다르게하기&amp;nbsp;위해&amp;nbsp;label&amp;nbsp;=&amp;nbsp;labels[i]&amp;nbsp;로&amp;nbsp;설정한다.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;pre id=&quot;code_1655652553321&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;plt.scatter(kmeans.cluster_centers_[:,0], kmeans.cluster_centers_[:,1],
            s = 100, c = 'black', label = 'Centroids', marker = 'X')
            
plt.xlabel('Income')
plt.ylabel('Spend')
plt.title('Kmeans cluster plot')

plt.legend()
plt.show()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;383&quot; data-origin-height=&quot;279&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bk5axj/btrE7DeV5i9/YCl5ZtGxc84y1RhK5qYfjk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bk5axj/btrE7DeV5i9/YCl5ZtGxc84y1RhK5qYfjk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bk5axj/btrE7DeV5i9/YCl5ZtGxc84y1RhK5qYfjk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fbk5axj%2FbtrE7DeV5i9%2FYCl5ZtGxc84y1RhK5qYfjk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;383&quot; height=&quot;279&quot; data-origin-width=&quot;383&quot; data-origin-height=&quot;279&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;중심좌표&amp;nbsp;결과(kmeans.cluster_centers_)를&amp;nbsp;이용하여&amp;nbsp;각&amp;nbsp;군집의&amp;nbsp;중심점만&amp;nbsp;찍는다. &lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;pre id=&quot;code_1655652570278&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;plt.figure()
for i in range(k):
    plt.scatter(X[y_kmeans == i, 0], X[y_kmeans == i, 1], s = 20,
                 c = cmap(i/k), label = labels[i])
                 
plt.scatter(kmeans.cluster_centers_[:,0], kmeans.cluster_centers_[:,1],
            s = 100, c = 'black', label = 'Centroids', marker = 'X')
            
plt.xlabel('Income')
plt.ylabel('Spend')
plt.title('Kmeans cluster plot')

plt.legend()
plt.show()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;391&quot; data-origin-height=&quot;279&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/cVO9AI/btrFcg3F6Te/Zw13oWCGoYNDXvZKtyeKDk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/cVO9AI/btrFcg3F6Te/Zw13oWCGoYNDXvZKtyeKDk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/cVO9AI/btrFcg3F6Te/Zw13oWCGoYNDXvZKtyeKDk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FcVO9AI%2FbtrFcg3F6Te%2FZw13oWCGoYNDXvZKtyeKDk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;391&quot; height=&quot;279&quot; data-origin-width=&quot;391&quot; data-origin-height=&quot;279&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;각&amp;nbsp;개체의&amp;nbsp;좌표와&amp;nbsp;군집&amp;nbsp;중심의&amp;nbsp;좌표를&amp;nbsp;한&amp;nbsp;도표에&amp;nbsp;표현한다.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;2. K-mean Clustering&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;분석 속도가 빠르고 군집을 형성하면서 유연하게 다른 군집으로 다시 재군집화가 가능하다.&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655652619737&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;from scipy.spatial.distance import cdist, pdist
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

iris = pd.read_csv(&quot;iris.csv&quot;)
print (iris.head())&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;494&quot; data-origin-height=&quot;109&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/cmSkXA/btrFdHGI2rI/zUglqALkekSFC6VM80UlUk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/cmSkXA/btrFdHGI2rI/zUglqALkekSFC6VM80UlUk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/cmSkXA/btrFdHGI2rI/zUglqALkekSFC6VM80UlUk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FcmSkXA%2FbtrFdHGI2rI%2FzUglqALkekSFC6VM80UlUk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;494&quot; height=&quot;109&quot; data-origin-width=&quot;494&quot; data-origin-height=&quot;109&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;sepal_length : 꽃받침의 길이 / sepal_width&amp;nbsp;:&amp;nbsp;꽃받침의&amp;nbsp;넓이 &lt;/span&gt;&lt;br /&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;petal_length : 꽃잎의 길이 / petal_width&amp;nbsp;:&amp;nbsp;꽃잎의&amp;nbsp;넓이 &lt;/span&gt;&lt;br /&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;class&amp;nbsp;:&amp;nbsp;꽃의&amp;nbsp;종류&amp;nbsp;(3종류)&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;pre id=&quot;code_1655652640005&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;x_iris = iris.drop(['class'],axis=1)
y_iris = iris[&quot;class&quot;]&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;군집분석은&amp;nbsp;레이블이&amp;nbsp;있으면&amp;nbsp;안&amp;nbsp;되기&amp;nbsp;때문에&amp;nbsp;class&amp;nbsp;변수는&amp;nbsp;drop&amp;nbsp;한다. &lt;/span&gt;&lt;br /&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;추후&amp;nbsp;분류가&amp;nbsp;잘&amp;nbsp;되어있는지&amp;nbsp;확인하기&amp;nbsp;위해&amp;nbsp;y_iris&amp;nbsp;에&amp;nbsp;담는다.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;pre id=&quot;code_1655652675075&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;x_iris.describe()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;383&quot; data-origin-height=&quot;240&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bdRbHO/btrFbNtWkzO/hgXlTkfnWRJP3OXxdBFol1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bdRbHO/btrFbNtWkzO/hgXlTkfnWRJP3OXxdBFol1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bdRbHO/btrFbNtWkzO/hgXlTkfnWRJP3OXxdBFol1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbdRbHO%2FbtrFbNtWkzO%2FhgXlTkfnWRJP3OXxdBFol1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;383&quot; height=&quot;240&quot; data-origin-width=&quot;383&quot; data-origin-height=&quot;240&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;4개의&amp;nbsp;변수&amp;nbsp;단위가&amp;nbsp;달라서&amp;nbsp;평균에&amp;nbsp;차이가&amp;nbsp;있고,&amp;nbsp;최댓값과&amp;nbsp;최솟값도&amp;nbsp;다르다. &lt;/span&gt;&lt;br /&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;개체들&amp;nbsp;간의&amp;nbsp;거리를&amp;nbsp;계산할&amp;nbsp;때&amp;nbsp;단위&amp;nbsp;때문에&amp;nbsp;더&amp;nbsp;다르거나&amp;nbsp;비슷하게&amp;nbsp;될&amp;nbsp;수&amp;nbsp;있다. &lt;/span&gt;&lt;br /&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;따라서&amp;nbsp;데이터&amp;nbsp;단위의&amp;nbsp;정규화가&amp;nbsp;필요하다.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;pre id=&quot;code_1655652694779&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;from sklearn.preprocessing import StandardScaler

scale=StandardScaler()
scale.fit(x_iris)

X_scale=scale.transform(x_iris)
pd.DataFrame(X_scale).head()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;299&quot; data-origin-height=&quot;162&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/dxK16N/btrE7DsvLWs/2yQ1YUkuyIxvgWJgg41aL1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/dxK16N/btrE7DsvLWs/2yQ1YUkuyIxvgWJgg41aL1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/dxK16N/btrE7DsvLWs/2yQ1YUkuyIxvgWJgg41aL1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FdxK16N%2FbtrE7DsvLWs%2F2yQ1YUkuyIxvgWJgg41aL1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;299&quot; height=&quot;162&quot; data-origin-width=&quot;299&quot; data-origin-height=&quot;162&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;데이터를&amp;nbsp;표준화하여&amp;nbsp;X_scale&amp;nbsp;이라는&amp;nbsp;데이터셋에&amp;nbsp;담고,&amp;nbsp;다시&amp;nbsp;기술&amp;nbsp;통계를&amp;nbsp;확인한다. &lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;pre id=&quot;code_1655652717211&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;K = range(1,10)
KM = [KMeans(n_clusters=k).fit(X_scale) for k in K]

centroids = [k.cluster_centers_ for k in KM]
D_k = [cdist(x_iris, centrds, 'euclidean') for centrds in centroids]

cIdx = [np.argmin(D,axis=1) for D in D_k]
dist = [np.min(D,axis=1) for D in D_k]
avgWithinSS = [sum(d)/X_scale.shape[0] for d in dist]&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;유클리디안(euclidean)&amp;nbsp;방법으로&amp;nbsp;거리를&amp;nbsp;계산하여&amp;nbsp;D_k에&amp;nbsp;담는다. &lt;/span&gt;&lt;br /&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;계산한&amp;nbsp;거리&amp;nbsp;중&amp;nbsp;최솟값(np.argmin)&amp;nbsp;을&amp;nbsp;cIdx&amp;nbsp;에&amp;nbsp;담고,&amp;nbsp;dist&amp;nbsp;로&amp;nbsp;최솟값을&amp;nbsp;구한다. &lt;/span&gt;&lt;br /&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;argWithinSS&amp;nbsp;은&amp;nbsp;군집의&amp;nbsp;중심과&amp;nbsp;개체들&amp;nbsp;간의&amp;nbsp;최소&amp;nbsp;거리들의&amp;nbsp;평균을&amp;nbsp;구하여&amp;nbsp;저장한&amp;nbsp;값이다.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;pre id=&quot;code_1655652744168&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;wcss = [sum(d**2) for d in dist]
tss = sum(pdist(X_scale)**2)/X_scale.shape[0]
bss = tss-wcss&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;각&amp;nbsp;군집&amp;nbsp;내에&amp;nbsp;wcss&amp;nbsp;는&amp;nbsp;개체들간의&amp;nbsp;거리를&amp;nbsp;제곱하여&amp;nbsp;더한&amp;nbsp;값이다. &lt;/span&gt;&lt;br /&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;tss&amp;nbsp;는&amp;nbsp;전체&amp;nbsp;개체들간의&amp;nbsp;거리를&amp;nbsp;제곱하여&amp;nbsp;개체수로&amp;nbsp;나눈&amp;nbsp;값이다. &lt;/span&gt;&lt;br /&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;bss&amp;nbsp;로&amp;nbsp;wcss&amp;nbsp;와&amp;nbsp;tss&amp;nbsp;의&amp;nbsp;차이를&amp;nbsp;계산한다.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;pre id=&quot;code_1655652762480&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;fig = plt.figure()
ax = fig.add_subplot(111)

ax.plot(K, avgWithinSS, 'b*-')

plt.grid(True)
plt.xlabel('Number of clusters')
plt.ylabel('Average within-cluster sum of squares')
plt.title('Elbow for KMeans clustering')&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;387&quot; data-origin-height=&quot;263&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/cSyMAX/btrFaNHGjeP/kx6XoMAQD1HaQyThvwLkx0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/cSyMAX/btrFaNHGjeP/kx6XoMAQD1HaQyThvwLkx0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/cSyMAX/btrFaNHGjeP/kx6XoMAQD1HaQyThvwLkx0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FcSyMAX%2FbtrFaNHGjeP%2Fkx6XoMAQD1HaQyThvwLkx0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;387&quot; height=&quot;263&quot; data-origin-width=&quot;387&quot; data-origin-height=&quot;263&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;최적의&amp;nbsp;군집수를&amp;nbsp;구하기&amp;nbsp;위해&amp;nbsp;군집을&amp;nbsp;1~10&amp;nbsp;까지&amp;nbsp;변화할&amp;nbsp;때의&amp;nbsp;평균&amp;nbsp;군집내&amp;nbsp;거리(avgWithinSS)를&amp;nbsp;그린다. &lt;/span&gt;&lt;br /&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;군집수를&amp;nbsp;4&amp;nbsp;또는&amp;nbsp;6으로&amp;nbsp;하는&amp;nbsp;것이&amp;nbsp;적합해보인다. &lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;pre id=&quot;code_1655652782557&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;fig = plt.figure()
ax = fig.add_subplot(111)

ax.plot(K, bss/tss*100, 'b*-')

plt.grid(True)
plt.xlabel('Number of clusters')
plt.ylabel('Percentage of variance explained')
plt.title('Elbow for KMeans clustering')&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;404&quot; data-origin-height=&quot;263&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/wIJ6m/btrFaOs2Edy/Ya9RTJOpu8FNFIZg9Lxj10/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/wIJ6m/btrFaOs2Edy/Ya9RTJOpu8FNFIZg9Lxj10/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/wIJ6m/btrFaOs2Edy/Ya9RTJOpu8FNFIZg9Lxj10/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FwIJ6m%2FbtrFaOs2Edy%2FYa9RTJOpu8FNFIZg9Lxj10%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;404&quot; height=&quot;263&quot; data-origin-width=&quot;404&quot; data-origin-height=&quot;263&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;개체들&amp;nbsp;간의&amp;nbsp;분산비율을&amp;nbsp;군집수를&amp;nbsp;달리하면서&amp;nbsp;본다. &lt;/span&gt;&lt;br /&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;Y축이&amp;nbsp;음수이기&amp;nbsp;때문에&amp;nbsp;0에&amp;nbsp;가까울수록&amp;nbsp;적합한&amp;nbsp;k&amp;nbsp;이다. &lt;/span&gt;&lt;br /&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;4~6&amp;nbsp;사이에서&amp;nbsp;군집을&amp;nbsp;결정하는&amp;nbsp;것이&amp;nbsp;적합해보인다.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;pre id=&quot;code_1655652796265&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;import numpy as np

w,v = np.linalg.eig(np.array([[ 0.91335 ,0.75969 ],[ 0.75969,0.69702]]))

print (&quot;\nEigen Values\n&quot;,w) 
print (&quot;\nEigen Vectors\n&quot;,v)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;207&quot; data-origin-height=&quot;117&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bftHxs/btrFe3W5fhk/5ajL6AFid8GRlqloeKv181/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bftHxs/btrFe3W5fhk/5ajL6AFid8GRlqloeKv181/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bftHxs/btrFe3W5fhk/5ajL6AFid8GRlqloeKv181/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbftHxs%2FbtrFe3W5fhk%2F5ajL6AFid8GRlqloeKv181%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;207&quot; height=&quot;117&quot; data-origin-width=&quot;207&quot; data-origin-height=&quot;117&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;pre id=&quot;code_1655652812130&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;k_means_fit = KMeans(n_clusters=4, max_iter=300)
k_means_fit.fit(X_scale)

k_means_fit.cluster_centers_&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;436&quot; data-origin-height=&quot;78&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/dFypNC/btrFaNubvIc/J65ubnMd1MEQhrCkBQr4r1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/dFypNC/btrFaNubvIc/J65ubnMd1MEQhrCkBQr4r1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/dFypNC/btrFaNubvIc/J65ubnMd1MEQhrCkBQr4r1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FdFypNC%2FbtrFaNubvIc%2FJ65ubnMd1MEQhrCkBQr4r1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;436&quot; height=&quot;78&quot; data-origin-width=&quot;436&quot; data-origin-height=&quot;78&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;pre id=&quot;code_1655652837633&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;print (&quot;\nK-Means Clustering - Confusion Matrix\n\n&quot;,
       pd.crosstab(y_iris,k_means_fit.labels_,rownames = [&quot;Actuall&quot;],colnames = [&quot;Predicted&quot;]) )&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;279&quot; data-origin-height=&quot;132&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/HNdrh/btrFdH01vZb/CZKojSKKZZ0WaYl5KFAQmK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/HNdrh/btrFdH01vZb/CZKojSKKZZ0WaYl5KFAQmK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/HNdrh/btrFdH01vZb/CZKojSKKZZ0WaYl5KFAQmK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FHNdrh%2FbtrFdH01vZb%2FCZKojSKKZZ0WaYl5KFAQmK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;279&quot; height=&quot;132&quot; data-origin-width=&quot;279&quot; data-origin-height=&quot;132&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;실제&amp;nbsp;3종류의&amp;nbsp;꽃(y_iris)&amp;nbsp;과&amp;nbsp;군집으로&amp;nbsp;나눈&amp;nbsp;4개의&amp;nbsp;군집&amp;nbsp;간에&amp;nbsp;일치하는지를&amp;nbsp;확인한다.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;pre id=&quot;code_1655652851946&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;print (&quot;\nSilhouette-score: %0.3f&quot; % silhouette_score(x_iris, k_means_fit.labels_, metric='euclidean'))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;172&quot; data-origin-height=&quot;24&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bBw6pV/btrFe33QmlS/DpmJ2vTpnLCXDGw8D5D6K0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bBw6pV/btrFe33QmlS/DpmJ2vTpnLCXDGw8D5D6K0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bBw6pV/btrFe33QmlS/DpmJ2vTpnLCXDGw8D5D6K0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbBw6pV%2FbtrFe33QmlS%2FDpmJ2vTpnLCXDGw8D5D6K0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;172&quot; height=&quot;24&quot; data-origin-width=&quot;172&quot; data-origin-height=&quot;24&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;pre id=&quot;code_1655652865802&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;for k in range(2,10):
    k_means_fitk = KMeans(n_clusters=k,max_iter=300)
    k_means_fitk.fit(x_iris)
    print (&quot;For K value&quot;,k,&quot;,Silhouette-score: %0.3f&quot; % silhouette_score(x_iris, k_means_fitk.labels_, metric='euclidean'))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;283&quot; data-origin-height=&quot;147&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/dLMqQJ/btrFdHtcZ4h/lNUcECQq91Bab7Np9q4cOk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/dLMqQJ/btrFdHtcZ4h/lNUcECQq91Bab7Np9q4cOk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/dLMqQJ/btrFdHtcZ4h/lNUcECQq91Bab7Np9q4cOk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FdLMqQJ%2FbtrFdHtcZ4h%2FlNUcECQq91Bab7Np9q4cOk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;283&quot; height=&quot;147&quot; data-origin-width=&quot;283&quot; data-origin-height=&quot;147&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;군집수를&amp;nbsp;2~10&amp;nbsp;까지&amp;nbsp;변경하면서&amp;nbsp;실루엣&amp;nbsp;스코어(silhouette-score)를&amp;nbsp;찾는다. &lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;3. 계층적 군집분석 (Hierarchical Clustering)&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;한&amp;nbsp;번&amp;nbsp;어떤&amp;nbsp;군집에&amp;nbsp;속한&amp;nbsp;개체는&amp;nbsp;분석&amp;nbsp;과정에서&amp;nbsp;다른&amp;nbsp;군집과&amp;nbsp;더&amp;nbsp;가깝게&amp;nbsp;계산되어도&amp;nbsp; &lt;br /&gt;다른&amp;nbsp;군집화가&amp;nbsp;허용되지&amp;nbsp;않는&amp;nbsp;방법이고,&amp;nbsp;속도도&amp;nbsp;다소&amp;nbsp;오래&amp;nbsp;걸린다.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;pre id=&quot;code_1655652900403&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.cm

plt.style.use('seaborn-deep')
cmap = matplotlib.cm.get_cmap('plasma')

data = pd.read_csv('Mall_Customers.csv')
X = data.iloc[:, [3,4]].values&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;pre id=&quot;code_1655652918384&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;import scipy.cluster.hierarchy as sch

plt.figure(1)
z = sch.linkage(X, method = 'ward')
dendrogram = sch.dendrogram(z)

plt.title('Dendrogram')
plt.xlabel('Customers')
plt.ylabel('ward distances')

plt.show()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;391&quot; data-origin-height=&quot;279&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/xevUf/btrFfYBeznt/sZtGaDuHmAWaV0KKtaQz90/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/xevUf/btrFfYBeznt/sZtGaDuHmAWaV0KKtaQz90/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/xevUf/btrFfYBeznt/sZtGaDuHmAWaV0KKtaQz90/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FxevUf%2FbtrFfYBeznt%2FsZtGaDuHmAWaV0KKtaQz90%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;391&quot; height=&quot;279&quot; data-origin-width=&quot;391&quot; data-origin-height=&quot;279&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;개체들&amp;nbsp;간의&amp;nbsp;거리를&amp;nbsp;ward&amp;nbsp;방법으로&amp;nbsp;하고&amp;nbsp;덴드로그램을&amp;nbsp;그린다.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;pre id=&quot;code_1655652937676&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;from sklearn.cluster import AgglomerativeClustering

k = 5
hc = AgglomerativeClustering(n_clusters = k, affinity = &quot;euclidean&quot;, linkage = 'ward')
y_hc = hc.fit_predict(X)&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;군집수를&amp;nbsp;5로하여&amp;nbsp;병합&amp;nbsp;군집(Agglomerative&amp;nbsp;Clustering)을&amp;nbsp;적용한&amp;nbsp;군집분석을&amp;nbsp;수행한다. &lt;/span&gt;&lt;br /&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;시작할&amp;nbsp;때&amp;nbsp;각&amp;nbsp;포인트를&amp;nbsp;하나의&amp;nbsp;클러스터로&amp;nbsp;지정하고,&amp;nbsp; &lt;/span&gt;&lt;br /&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;그&amp;nbsp;다음&amp;nbsp;종료&amp;nbsp;조건을&amp;nbsp;만족할&amp;nbsp;때까지&amp;nbsp;가장&amp;nbsp;비슷한&amp;nbsp;두&amp;nbsp;클러스터를&amp;nbsp;합치는&amp;nbsp;방식으로&amp;nbsp;진행한다.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;pre id=&quot;code_1655652950160&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;labels = [('Cluster ' + str(i+1)) for i in range(k)]

plt.figure(2)
for i in range(k):
    plt.scatter(X[y_hc == i, 0], X[y_hc == i, 1], s = 20,
                c = cmap(i/k), label = labels[i]) 
                
plt.xlabel('Income')
plt.ylabel('Spending score')
plt.title('HC cluster plot')

plt.legend()
plt.show()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;391&quot; data-origin-height=&quot;279&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/dISqv7/btrE76HXgCP/KyeFEDvDB3SNxo1bZn5O7K/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/dISqv7/btrE76HXgCP/KyeFEDvDB3SNxo1bZn5O7K/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/dISqv7/btrE76HXgCP/KyeFEDvDB3SNxo1bZn5O7K/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FdISqv7%2FbtrE76HXgCP%2FKyeFEDvDB3SNxo1bZn5O7K%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;391&quot; height=&quot;279&quot; data-origin-width=&quot;391&quot; data-origin-height=&quot;279&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;5개의&amp;nbsp;군집으로&amp;nbsp;나누어진&amp;nbsp;것을&amp;nbsp;확인할&amp;nbsp;수&amp;nbsp;있다.&lt;/span&gt;&lt;/p&gt;</description>
      <category>데이터 분석/빅데이터 분석 기사</category>
      <category>Python</category>
      <category>데이터 분석</category>
      <category>빅데이터 분석 기사</category>
      <category>빅분기</category>
      <category>실기</category>
      <author>eunki</author>
      <guid isPermaLink="true">https://eunki.tistory.com/331</guid>
      <comments>https://eunki.tistory.com/331#entry331comment</comments>
      <pubDate>Mon, 20 Jun 2022 00:36:11 +0900</pubDate>
    </item>
    <item>
      <title>[빅분기 실기] 엘라스틱넷 (Elasticnet)</title>
      <link>https://eunki.tistory.com/330</link>
      <description>&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;엘라스틱넷&amp;nbsp;(Elasticnet)&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;릿지회귀와&amp;nbsp;라쏘회귀의&amp;nbsp;절충한&amp;nbsp;모델로,&amp;nbsp;맂시와&amp;nbsp;라쏘의&amp;nbsp;규제항을&amp;nbsp;단순히&amp;nbsp;더해서&amp;nbsp;사용한다. &lt;/span&gt;&lt;br /&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;두&amp;nbsp;규제항의&amp;nbsp;혼합정도를&amp;nbsp;혼합비율&amp;nbsp;r을&amp;nbsp;사용해&amp;nbsp;조절하게&amp;nbsp;된다. &lt;/span&gt;&lt;br /&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;만약&amp;nbsp;r=0&amp;nbsp;이면&amp;nbsp;릿지회귀와&amp;nbsp;같고,&amp;nbsp;r=1&amp;nbsp;이면&amp;nbsp;라쏘회귀와&amp;nbsp;같게&amp;nbsp;된다.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;[주요 하이퍼파라미터]&lt;/b&gt;&lt;b&gt;&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;- alpha : 값이&amp;nbsp;클수록&amp;nbsp;계수를&amp;nbsp;0에&amp;nbsp;가깝게&amp;nbsp;제약하여&amp;nbsp;훈련&amp;nbsp;데이터의&amp;nbsp;정확도는&amp;nbsp;낮아지지만&amp;nbsp;일반화에&amp;nbsp;기여한다.&amp;nbsp;(default&amp;nbsp;=&amp;nbsp;1) &lt;br /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 값이 0에 가까울수록 회귀계수를 아무런 제약을 하지 않은 선형회귀와 유사하게 적용한다.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;-&amp;nbsp;l1_ratio&amp;nbsp;:&amp;nbsp;릿지와&amp;nbsp;라쏘의&amp;nbsp;규제&amp;nbsp;비율에&amp;nbsp;대한&amp;nbsp;가중&amp;nbsp;정도&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;hr contenteditable=&quot;false&quot; data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style7&quot; /&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;1. 분석 데이터 준비&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655372924203&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;# 주택 가격 데이터
data2=pd.read_csv('house_price.csv', encoding='utf-8')

X=data2[data2.columns[1:5]]
y=data2[[&quot;house_value&quot;]]&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;1-2. train-test 데이터셋 나누기&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655372924204&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test=train_test_split(X, y, random_state=42)&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;1-3. Min-Max 정규화&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655372924205&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;from sklearn.preprocessing import MinMaxScaler

scaler=MinMaxScaler()
scaler.fit(X_train)

X_scaled_train=scaler.transform(X_train)
X_scaled_test=scaler.transform(X_test)&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;2. 기본모델 적용&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;2-1. 훈련 데이터&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655372208940&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;from sklearn.linear_model import ElasticNet

model=ElasticNet()
model.fit(X_scaled_train, y_train)

pred_train=model.predict(X_scaled_train)
model.score(X_scaled_train, y_train)  # 0.050029698219161034&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;2-2. 테스트 데이터&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655372924209&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;pred_test=model.predict(X_scaled_test)
model.score(X_scaled_test, y_test)  # 0.051683303919568435&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;① RMSE (Root Mean Squared Error)&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655372924209&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;import numpy as np
from sklearn.metrics import mean_squared_error 

MSE_train = mean_squared_error(y_train, pred_train)
MSE_test = mean_squared_error(y_test, pred_test)

print(&quot;훈련   데이터 RMSE:&quot;, np.sqrt(MSE_train))
print(&quot;테스트 데이터 RMSE:&quot;, np.sqrt(MSE_test))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;270&quot; data-origin-height=&quot;41&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/enFGkK/btrFbOfhfcy/JBYwuIKKVM35Pkl6nJarPK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/enFGkK/btrFbOfhfcy/JBYwuIKKVM35Pkl6nJarPK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/enFGkK/btrFbOfhfcy/JBYwuIKKVM35Pkl6nJarPK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FenFGkK%2FbtrFbOfhfcy%2FJBYwuIKKVM35Pkl6nJarPK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;270&quot; height=&quot;41&quot; data-origin-width=&quot;270&quot; data-origin-height=&quot;41&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;3. 하이퍼파라미터 튜닝&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;3-1. Grid Search&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655372924211&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;param_grid={'alpha': [0.0, 1e-6, 1e-5, 1e-4, 1e-3, 1e-2, 0.1, 0.5, 1.0, 2.0, 3.0]}&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;alpha 값을 10가지로 설정하고 그리드 탐색을 진행한다.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;pre id=&quot;code_1655372924213&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;from sklearn.model_selection import GridSearchCV

grid_search=GridSearchCV(ElasticNet(), param_grid, cv=5)
grid_search.fit(X_scaled_train, y_train)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;559&quot; data-origin-height=&quot;55&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/2tiS8/btrFfYupaAA/DG2DYZATfRshD6Kjo1zC1K/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/2tiS8/btrFfYupaAA/DG2DYZATfRshD6Kjo1zC1K/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/2tiS8/btrFfYupaAA/DG2DYZATfRshD6Kjo1zC1K/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2F2tiS8%2FbtrFfYupaAA%2FDG2DYZATfRshD6Kjo1zC1K%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;559&quot; height=&quot;55&quot; data-origin-width=&quot;559&quot; data-origin-height=&quot;55&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;pre id=&quot;code_1655372924213&quot; class=&quot;lisp&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;print(&quot;Best Parameter: {}&quot;.format(grid_search.best_params_))
print(&quot;Best Score: {:.4f}&quot;.format(grid_search.best_score_))
print(&quot;TestSet Score: {:.4f}&quot;.format(grid_search.score(X_scaled_test, y_test)))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;230&quot; data-origin-height=&quot;55&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/HuRrq/btrE7C79Jkc/7XDTXc8du3M5TzSzddrck1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/HuRrq/btrE7C79Jkc/7XDTXc8du3M5TzSzddrck1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/HuRrq/btrE7C79Jkc/7XDTXc8du3M5TzSzddrck1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FHuRrq%2FbtrE7C79Jkc%2F7XDTXc8du3M5TzSzddrck1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;230&quot; height=&quot;55&quot; data-origin-width=&quot;230&quot; data-origin-height=&quot;55&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;최적의 alpha는 1e-05로, 이때 훈련 데이터의 정확도는 54.5%, 테스트 데이터의 정확도는 56.3% 이다.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;3-2. Random Search&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655372924215&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;from scipy.stats import randint

param_distribs = {'alpha': randint(low=0.00001, high=10)}&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;랜덤 탐색을 위해 alpha를 0.00001 ~ 10 사이의 범위에서 100번(n_iter) 무작위 추출하는 방식을 적용한다.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;pre id=&quot;code_1655372924216&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;from sklearn.model_selection import RandomizedSearchCV

random_search=RandomizedSearchCV(ElasticNet(), 
                                 param_distributions=param_distribs, n_iter=100, cv=5)
random_search.fit(X_scaled_train, y_train)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;594&quot; data-origin-height=&quot;55&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/rhdOg/btrFciNWB8X/naPNKRbmKhQHuUiqjHrdWK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/rhdOg/btrFciNWB8X/naPNKRbmKhQHuUiqjHrdWK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/rhdOg/btrFciNWB8X/naPNKRbmKhQHuUiqjHrdWK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FrhdOg%2FbtrFciNWB8X%2FnaPNKRbmKhQHuUiqjHrdWK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;594&quot; height=&quot;55&quot; data-origin-width=&quot;594&quot; data-origin-height=&quot;55&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;pre id=&quot;code_1655372924217&quot; class=&quot;lisp&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;print(&quot;Best Parameter: {}&quot;.format(random_search.best_params_))
print(&quot;Best Score: {:.4f}&quot;.format(random_search.best_score_))
print(&quot;TestSet Score: {:.4f}&quot;.format(random_search.score(X_scaled_test, y_test)))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;208&quot; data-origin-height=&quot;57&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/mq5j7/btrE753kshY/d3r0COa5dwtuK09BAwDmcK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/mq5j7/btrE753kshY/d3r0COa5dwtuK09BAwDmcK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/mq5j7/btrE753kshY/d3r0COa5dwtuK09BAwDmcK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fmq5j7%2FbtrE753kshY%2Fd3r0COa5dwtuK09BAwDmcK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;208&quot; height=&quot;57&quot; data-origin-width=&quot;208&quot; data-origin-height=&quot;57&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;최적의 alpha는 0으로, 이때 훈련 데이터의 정확도는 54.5%, 테스트 데이터의 정확도는 56.3% 이다.&lt;/span&gt;&lt;/p&gt;</description>
      <category>데이터 분석/빅데이터 분석 기사</category>
      <category>Python</category>
      <category>데이터 분석</category>
      <category>빅데이터 분석 기사</category>
      <category>빅분기</category>
      <category>실기</category>
      <author>eunki</author>
      <guid isPermaLink="true">https://eunki.tistory.com/330</guid>
      <comments>https://eunki.tistory.com/330#entry330comment</comments>
      <pubDate>Sun, 19 Jun 2022 23:59:20 +0900</pubDate>
    </item>
    <item>
      <title>[빅분기 실기] 라쏘 (Lasso)</title>
      <link>https://eunki.tistory.com/329</link>
      <description>&lt;div class=&quot;revenue_unit_wrap &quot;&gt;
&lt;div class=&quot;revenue_unit_item adfit&quot;&gt;
&lt;div class=&quot;revenue_unit_info&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;728x90&lt;/span&gt;&lt;/div&gt;
&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;ins id=&quot;kakao_ad_7sIzb5&quot; class=&quot;kakao_ad_area&quot; data-ad-unit=&quot;DAN-ztZ7AseScWWdoiQm&quot; data-ad-width=&quot;728&quot; data-ad-height=&quot;90px&quot;&gt; &lt;/ins&gt;&lt;/span&gt;
&lt;script src=&quot;//t1.daumcdn.net/kas/static/ba.min.js&quot;&gt;&lt;/script&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;라쏘&amp;nbsp;(Lasso)&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;릿지&amp;nbsp;회귀모델과&amp;nbsp;유사하게&amp;nbsp;특성의&amp;nbsp;계수값을&amp;nbsp;0에&amp;nbsp;가깝게&amp;nbsp;하지만 &lt;/span&gt;&lt;br /&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;실제&amp;nbsp;중요하지&amp;nbsp;않은&amp;nbsp;변수의&amp;nbsp;계수를&amp;nbsp;0으로&amp;nbsp;만들어&amp;nbsp;불필요한&amp;nbsp;변수를&amp;nbsp;제거하는&amp;nbsp;모델&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;[주요 하이퍼파라미터]&lt;/b&gt;&lt;b&gt;&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;- alpha : 값이&amp;nbsp;클수록&amp;nbsp;계수를&amp;nbsp;0에&amp;nbsp;가깝게&amp;nbsp;제약하여&amp;nbsp;훈련&amp;nbsp;데이터의&amp;nbsp;정확도는&amp;nbsp;낮아지지만&amp;nbsp;일반화에&amp;nbsp;기여한다.&amp;nbsp;(default&amp;nbsp;=&amp;nbsp;1) &lt;br /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 값이 0에 가까울수록 회귀계수를 아무런 제약을 하지 않은 선형회귀와 유사하게 적용한다.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;hr contenteditable=&quot;false&quot; data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style7&quot; /&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;1. 분석 데이터 준비&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655372924203&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;# 주택 가격 데이터
data2=pd.read_csv('house_price.csv', encoding='utf-8')

X=data2[data2.columns[1:5]]
y=data2[[&quot;house_value&quot;]]&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;1-2. train-test 데이터셋 나누기&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655372924204&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test=train_test_split(X, y, random_state=42)&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;1-3. Min-Max 정규화&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655372924205&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;from sklearn.preprocessing import MinMaxScaler

scaler=MinMaxScaler()
scaler.fit(X_train)

X_scaled_train=scaler.transform(X_train)
X_scaled_test=scaler.transform(X_test)&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;2. 기본모델 적용&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;2-1. 훈련 데이터&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655372208940&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;from sklearn.linear_model import Lasso

model=Lasso()
model.fit(X_scaled_train, y_train)

pred_train=model.predict(X_scaled_train)
model.score(X_scaled_train, y_train)  # 0.5455724679313863&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;2-2. 테스트 데이터&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655372924209&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;pred_test=model.predict(X_scaled_test)
model.score(X_scaled_test, y_test)  # 0.5626850497564577&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;① RMSE (Root Mean Squared Error)&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655372924209&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;import numpy as np
from sklearn.metrics import mean_squared_error 

MSE_train = mean_squared_error(y_train, pred_train)
MSE_test = mean_squared_error(y_test, pred_test)

print(&quot;훈련   데이터 RMSE:&quot;, np.sqrt(MSE_train))
print(&quot;테스트 데이터 RMSE:&quot;, np.sqrt(MSE_test))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;276&quot; data-origin-height=&quot;42&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/2QEmO/btrE8OG2Vh2/P7SOEvnrQ8UoBrpwfgP4DK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/2QEmO/btrE8OG2Vh2/P7SOEvnrQ8UoBrpwfgP4DK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/2QEmO/btrE8OG2Vh2/P7SOEvnrQ8UoBrpwfgP4DK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2F2QEmO%2FbtrE8OG2Vh2%2FP7SOEvnrQ8UoBrpwfgP4DK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;276&quot; height=&quot;42&quot; data-origin-width=&quot;276&quot; data-origin-height=&quot;42&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;3. 하이퍼파라미터 튜닝&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;3-1. Grid Search&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655372924211&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;param_grid={'alpha': [0.0, 1e-6, 1e-5, 1e-4, 1e-3, 1e-2, 0.1, 0.5, 1.0, 2.0, 3.0]}&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;alpha 값을 11가지로 설정하고 그리드 탐색을 진행한다.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;pre id=&quot;code_1655372924213&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;from sklearn.model_selection import GridSearchCV

grid_search=GridSearchCV(Lasso(), param_grid, cv=5)
grid_search.fit(X_scaled_train, y_train)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;563&quot; data-origin-height=&quot;58&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/mjA5t/btrFdGA2Rll/FKnxvRWCR5hyKyU4BFAOR1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/mjA5t/btrFdGA2Rll/FKnxvRWCR5hyKyU4BFAOR1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/mjA5t/btrFdGA2Rll/FKnxvRWCR5hyKyU4BFAOR1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FmjA5t%2FbtrFdGA2Rll%2FFKnxvRWCR5hyKyU4BFAOR1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;563&quot; height=&quot;58&quot; data-origin-width=&quot;563&quot; data-origin-height=&quot;58&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;pre id=&quot;code_1655372924213&quot; class=&quot;lisp&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;print(&quot;Best Parameter: {}&quot;.format(grid_search.best_params_))
print(&quot;Best Score: {:.4f}&quot;.format(grid_search.best_score_))
print(&quot;TestSet Score: {:.4f}&quot;.format(grid_search.score(X_scaled_test, y_test)))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;219&quot; data-origin-height=&quot;57&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bueZ5i/btrE7EdQk71/xE5uPznkCedxbJQw5mufx0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bueZ5i/btrE7EdQk71/xE5uPznkCedxbJQw5mufx0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bueZ5i/btrE7EdQk71/xE5uPznkCedxbJQw5mufx0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbueZ5i%2FbtrE7EdQk71%2FxE5uPznkCedxbJQw5mufx0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;219&quot; height=&quot;57&quot; data-origin-width=&quot;219&quot; data-origin-height=&quot;57&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;최적의 alpha는 0.5로, 이때 훈련 데이터의 정확도는 54.5%, 테스트 데이터의 정확도는 56.3% 이다.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;3-2. Random Search&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655372924215&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;from scipy.stats import randint

param_distribs = {'alpha': randint(low=0.00001, high=10)}&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;랜덤 탐색을 위해 alpha를 0.00001 ~ 10 사이의 범위에서 100번(n_iter) 무작위 추출하는 방식을 적용한다.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;pre id=&quot;code_1655372924216&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;from sklearn.model_selection import RandomizedSearchCV

random_search=RandomizedSearchCV(Lasso(), 
                                 param_distributions=param_distribs, n_iter=100, cv=5)
random_search.fit(X_scaled_train, y_train)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;594&quot; data-origin-height=&quot;54&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bMxkaK/btrFe4VWpxJ/64TFPGXkunzrqim9tFcQ50/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bMxkaK/btrFe4VWpxJ/64TFPGXkunzrqim9tFcQ50/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bMxkaK/btrFe4VWpxJ/64TFPGXkunzrqim9tFcQ50/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbMxkaK%2FbtrFe4VWpxJ%2F64TFPGXkunzrqim9tFcQ50%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;594&quot; height=&quot;54&quot; data-origin-width=&quot;594&quot; data-origin-height=&quot;54&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;pre id=&quot;code_1655372924217&quot; class=&quot;lisp&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;print(&quot;Best Parameter: {}&quot;.format(random_search.best_params_))
print(&quot;Best Score: {:.4f}&quot;.format(random_search.best_score_))
print(&quot;TestSet Score: {:.4f}&quot;.format(random_search.score(X_scaled_test, y_test)))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;203&quot; data-origin-height=&quot;57&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/yOOyK/btrFe5Ay8W0/LjuAui7K3SfRtyeGbawlyk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/yOOyK/btrFe5Ay8W0/LjuAui7K3SfRtyeGbawlyk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/yOOyK/btrFe5Ay8W0/LjuAui7K3SfRtyeGbawlyk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FyOOyK%2FbtrFe5Ay8W0%2FLjuAui7K3SfRtyeGbawlyk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;203&quot; height=&quot;57&quot; data-origin-width=&quot;203&quot; data-origin-height=&quot;57&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;최적의 alpha는 1로, 이때 훈련 데이터의 정확도는 54.5%, 테스트 데이터의 정확도는 56.3% 이다.&lt;/span&gt;&lt;/p&gt;</description>
      <category>데이터 분석/빅데이터 분석 기사</category>
      <category>Python</category>
      <category>데이터 분석</category>
      <category>빅데이터 분석 기사</category>
      <category>빅분기</category>
      <category>실기</category>
      <author>eunki</author>
      <guid isPermaLink="true">https://eunki.tistory.com/329</guid>
      <comments>https://eunki.tistory.com/329#entry329comment</comments>
      <pubDate>Sun, 19 Jun 2022 23:51:56 +0900</pubDate>
    </item>
    <item>
      <title>[빅분기 실기] 릿지 (Ridge)</title>
      <link>https://eunki.tistory.com/328</link>
      <description>&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;릿지&amp;nbsp;(Ridge)&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;선형회귀분석의&amp;nbsp;기본&amp;nbsp;원리를&amp;nbsp;따르나,&amp;nbsp;가중치(회귀계수)&amp;nbsp;값을&amp;nbsp;최대한&amp;nbsp;작게&amp;nbsp;만들어&amp;nbsp; &lt;/span&gt;&lt;br /&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;모든&amp;nbsp;독립변수가&amp;nbsp;종속변수에&amp;nbsp;미치는&amp;nbsp;영향을&amp;nbsp;최소화하는&amp;nbsp;제약을&amp;nbsp;반영한&amp;nbsp;회귀모델&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;각 특성의 영향을 최소화하여 훈련 데이터에 과대적합되지 않도록 제약한 모델이다.미&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;선형&amp;nbsp;관계뿐만&amp;nbsp;아니라&amp;nbsp;다항&amp;nbsp;곡선&amp;nbsp;추정도&amp;nbsp;가능하다.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;[주요 하이퍼파라미터]&lt;/b&gt;&lt;b&gt;&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;- alpha&amp;nbsp;:&amp;nbsp;값이&amp;nbsp;클수록&amp;nbsp;규제가&amp;nbsp;강하여&amp;nbsp;회귀&amp;nbsp;계수가&amp;nbsp;0에&amp;nbsp;근접하다.&amp;nbsp;(default&amp;nbsp;=&amp;nbsp;1) &lt;br /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 값이 0에 가까울수록 규제를 하지 않아 선형 회귀와 유사한 결과를 보인다.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;hr contenteditable=&quot;false&quot; data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style7&quot; /&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;1. 분석 데이터 준비&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655372924203&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;# 주택 가격 데이터
data2=pd.read_csv('house_price.csv', encoding='utf-8')

X=data2[data2.columns[1:5]]
y=data2[[&quot;house_value&quot;]]&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;1-2. train-test 데이터셋 나누기&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655372924204&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test=train_test_split(X, y, random_state=42)&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;1-3. Min-Max 정규화&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655372924205&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;from sklearn.preprocessing import MinMaxScaler

scaler=MinMaxScaler()
scaler.fit(X_train)

X_scaled_train=scaler.transform(X_train)
X_scaled_test=scaler.transform(X_test)&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;2. 기본모델 적용&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;2-1. 훈련 데이터&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655372208940&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;from sklearn.linear_model import Ridge

model=Ridge()
model.fit(X_scaled_train, y_train)

pred_train=model.predict(X_scaled_train)
model.score(X_scaled_train, y_train)  # 0.5455487773718164&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;2-2. 테스트 데이터&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655372924209&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;pred_test=model.predict(X_scaled_test)
model.score(X_scaled_test, y_test)  # 0.5626954941458684&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;① RMSE (Root Mean Squared Error)&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655372924209&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;import numpy as np
from sklearn.metrics import mean_squared_error 

MSE_train = mean_squared_error(y_train, pred_train)
MSE_test = mean_squared_error(y_test, pred_test)

print(&quot;훈련   데이터 RMSE:&quot;, np.sqrt(MSE_train))
print(&quot;테스트 데이터 RMSE:&quot;, np.sqrt(MSE_test))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;283&quot; data-origin-height=&quot;40&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bLjG4l/btrE8Otwxex/SckUT5tk4xmnhtAWG4n1Nk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bLjG4l/btrE8Otwxex/SckUT5tk4xmnhtAWG4n1Nk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bLjG4l/btrE8Otwxex/SckUT5tk4xmnhtAWG4n1Nk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbLjG4l%2FbtrE8Otwxex%2FSckUT5tk4xmnhtAWG4n1Nk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;283&quot; height=&quot;40&quot; data-origin-width=&quot;283&quot; data-origin-height=&quot;40&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;3. 하이퍼파라미터 튜닝&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;3-1. Grid Search&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655372924211&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;param_grid={'alpha': [1e-4, 1e-3, 1e-2, 0.1, 0.5, 1.0, 5.0, 10.0]}&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;alpha 값을 11가지로 설정하고 그리드 탐색을 진행한다.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;pre id=&quot;code_1655372924213&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;from sklearn.model_selection import GridSearchCV

grid_search=GridSearchCV(Ridge(), param_grid, cv=5)
grid_search.fit(X_scaled_train, y_train)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;532&quot; data-origin-height=&quot;58&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/cRr7k8/btrE9SbkJMd/eQpqbdCxY6xwqsHRtaEa41/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/cRr7k8/btrE9SbkJMd/eQpqbdCxY6xwqsHRtaEa41/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/cRr7k8/btrE9SbkJMd/eQpqbdCxY6xwqsHRtaEa41/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FcRr7k8%2FbtrE9SbkJMd%2FeQpqbdCxY6xwqsHRtaEa41%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;532&quot; height=&quot;58&quot; data-origin-width=&quot;532&quot; data-origin-height=&quot;58&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;pre id=&quot;code_1655372924213&quot; class=&quot;lisp&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;print(&quot;Best Parameter: {}&quot;.format(grid_search.best_params_))
print(&quot;Best Score: {:.4f}&quot;.format(grid_search.best_score_))
print(&quot;TestSet Score: {:.4f}&quot;.format(grid_search.score(X_scaled_test, y_test)))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;216&quot; data-origin-height=&quot;55&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/csf0rj/btrFaOT6MVU/hfW8cVcyP0sBEDTrG3Nzp1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/csf0rj/btrFaOT6MVU/hfW8cVcyP0sBEDTrG3Nzp1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/csf0rj/btrFaOT6MVU/hfW8cVcyP0sBEDTrG3Nzp1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fcsf0rj%2FbtrFaOT6MVU%2FhfW8cVcyP0sBEDTrG3Nzp1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;216&quot; height=&quot;55&quot; data-origin-width=&quot;216&quot; data-origin-height=&quot;55&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;최적의 alpha는 0.1로, 이때 훈련 데이터의 정확도는 54.5%, 테스트 데이터의 정확도는 56.3% 이다.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;3-2. Random Search&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655372924215&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;from scipy.stats import randint

param_distribs = {'alpha': randint(low=0.0001, high=100)}&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;랜덤 탐색을 위해 alpha를 0.0001 ~ 100 사이의 범위에서 100번(n_iter) 무작위 추출하는 방식을 적용한다.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;pre id=&quot;code_1655372924216&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;from sklearn.model_selection import RandomizedSearchCV

random_search=RandomizedSearchCV(Ridge(), 
                                 param_distributions=param_distribs, n_iter=100, cv=5)
random_search.fit(X_scaled_train, y_train)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;596&quot; data-origin-height=&quot;58&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/Anfo9/btrE9Qq4LNV/uIcBIucUfBKZKHX3PIkOA0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/Anfo9/btrE9Qq4LNV/uIcBIucUfBKZKHX3PIkOA0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/Anfo9/btrE9Qq4LNV/uIcBIucUfBKZKHX3PIkOA0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FAnfo9%2FbtrE9Qq4LNV%2FuIcBIucUfBKZKHX3PIkOA0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;596&quot; height=&quot;58&quot; data-origin-width=&quot;596&quot; data-origin-height=&quot;58&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;pre id=&quot;code_1655372924217&quot; class=&quot;lisp&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;print(&quot;Best Parameter: {}&quot;.format(random_search.best_params_))
print(&quot;Best Score: {:.4f}&quot;.format(random_search.best_score_))
print(&quot;TestSet Score: {:.4f}&quot;.format(random_search.score(X_scaled_test, y_test)))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;207&quot; data-origin-height=&quot;55&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/b0wcsC/btrFchhbhIS/mO7Wu4bbG34k7Vn8KPBHik/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/b0wcsC/btrFchhbhIS/mO7Wu4bbG34k7Vn8KPBHik/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/b0wcsC/btrFchhbhIS/mO7Wu4bbG34k7Vn8KPBHik/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fb0wcsC%2FbtrFchhbhIS%2FmO7Wu4bbG34k7Vn8KPBHik%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;207&quot; height=&quot;55&quot; data-origin-width=&quot;207&quot; data-origin-height=&quot;55&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;최적의 alpha는 0.1로,&amp;nbsp;이때 훈련 데이터의 정확도는 54.5%, 테스트 데이터의 정확도는 56.3% 이다.&lt;/span&gt;&lt;/p&gt;</description>
      <category>데이터 분석/빅데이터 분석 기사</category>
      <category>Python</category>
      <category>데이터 분석</category>
      <category>빅데이터 분석 기사</category>
      <category>빅분기</category>
      <category>실기</category>
      <author>eunki</author>
      <guid isPermaLink="true">https://eunki.tistory.com/328</guid>
      <comments>https://eunki.tistory.com/328#entry328comment</comments>
      <pubDate>Sun, 19 Jun 2022 23:44:30 +0900</pubDate>
    </item>
    <item>
      <title>[빅분기 실기] 선형회귀모델 (Linear Regression Model)</title>
      <link>https://eunki.tistory.com/327</link>
      <description>&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;선형회귀모델&amp;nbsp;(Linear&amp;nbsp;Regression&amp;nbsp;Model)&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;연속형 원인 변수가 연속형 결과 변수에 영향을 미치는지를 분석하여 레이블 변수를 예측&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;가장 대표적인 오차 지표인 RMSE는 실제값과 예측값 간에 전 구간에 걸친 평균적인 오차&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;hr contenteditable=&quot;false&quot; data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style7&quot; /&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;1. 분석 데이터 준비&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655372924203&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;# 주택 가격 데이터
data2=pd.read_csv('house_price.csv', encoding='utf-8')

X=data2[data2.columns[1:5]]
y=data2[[&quot;house_value&quot;]]&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;1-2. train-test 데이터셋 나누기&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655372924204&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test=train_test_split(X, y, random_state=42)&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;1-3. Min-Max 정규화&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655372924205&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;from sklearn.preprocessing import MinMaxScaler

scaler=MinMaxScaler()
scaler.fit(X_train)

X_scaled_train=scaler.transform(X_train)
X_scaled_test=scaler.transform(X_test)&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;2. Statmodel&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;파이썬의 통계분석 모듈&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;pre id=&quot;code_1655648181966&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;import statsmodels.api as sm

x_train_new = sm.add_constant(X_train)
x_test_new = sm.add_constant(X_test)
x_train_new.head()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;365&quot; data-origin-height=&quot;165&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/DH296/btrE4CtxvDW/PowjwK5GLZnCEkVcs3HWq1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/DH296/btrE4CtxvDW/PowjwK5GLZnCEkVcs3HWq1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/DH296/btrE4CtxvDW/PowjwK5GLZnCEkVcs3HWq1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FDH296%2FbtrE4CtxvDW%2FPowjwK5GLZnCEkVcs3HWq1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;365&quot; height=&quot;165&quot; data-origin-width=&quot;365&quot; data-origin-height=&quot;165&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;X_train 에 상수항 변수를 더하여 x_train_new 라는 변수로 만든다.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;const 변수가 상수를 추정하는 역할을 한다.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;통계적 회귀분석에서는 일반적으로 X 데이터의 정규화를 하지 않는다.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;2-1. 훈련 &lt;/b&gt;&lt;b&gt;데이터&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655648232454&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;multi_model = sm.OLS(y_train,x_train_new).fit()
print (multi_model.summary())&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;596&quot; data-origin-height=&quot;496&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/Ss5Yh/btrFe5gd7rU/Lyda0IJtm83GdLFIJs2JJ0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/Ss5Yh/btrFe5gd7rU/Lyda0IJtm83GdLFIJs2JJ0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/Ss5Yh/btrFe5gd7rU/Lyda0IJtm83GdLFIJs2JJ0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FSs5Yh%2FbtrFe5gd7rU%2FLyda0IJtm83GdLFIJs2JJ0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;596&quot; height=&quot;496&quot; data-origin-width=&quot;596&quot; data-origin-height=&quot;496&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;R-squared: 0.546 &amp;rarr; 54.6% 수준으로 회귀직선(예측값)과 실제값이 일치한다.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;coef : 각 X 변수가 1증가할 때마다 Y가 변화하는 정도 (=기울기)&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;p&amp;gt;|t| : 통계적으로 유의한가를 검증한 결과로, 0.05보다 작으면 유의한 영향을 미치는 변수로 본다.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;2-2. 테스트 데이터&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655648250219&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;multi_model2 = sm.OLS(y_test,x_test_new).fit()
print (multi_model2.summary())&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;596&quot; data-origin-height=&quot;496&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/lVOz4/btrE8Rpd5Y7/PXwWn6XJLAjsmQ8tz7gsVK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/lVOz4/btrE8Rpd5Y7/PXwWn6XJLAjsmQ8tz7gsVK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/lVOz4/btrE8Rpd5Y7/PXwWn6XJLAjsmQ8tz7gsVK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FlVOz4%2FbtrE8Rpd5Y7%2FPXwWn6XJLAjsmQ8tz7gsVK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;596&quot; height=&quot;496&quot; data-origin-width=&quot;596&quot; data-origin-height=&quot;496&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;3. 기본모델 적용&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;3-1. 훈련 데이터&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655372924206&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;from sklearn.linear_model import LinearRegression

model=LinearRegression()
model.fit(X_scaled_train, y_train)

pred_train=model.predict(X_scaled_train)
model.score(X_scaled_train, y_train)  # 0.5455724996358273&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;2-2. 테스트 데이터&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655372924209&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;pred_test=model.predict(X_scaled_test)
model.score(X_scaled_test, y_test)  # 0.562684388358716&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;① RMSE (Root Mean Squared Error)&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655372924209&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;import numpy as np
from sklearn.metrics import mean_squared_error 

MSE_train = mean_squared_error(y_train, pred_train)
MSE_test = mean_squared_error(y_test, pred_test)

print(&quot;훈련   데이터 RMSE:&quot;, np.sqrt(MSE_train))
print(&quot;테스트 데이터 RMSE:&quot;, np.sqrt(MSE_test))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;273&quot; data-origin-height=&quot;43&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bUcehF/btrFe3W1xLh/050sYVkFHaMAaf6SjpI1E0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bUcehF/btrFe3W1xLh/050sYVkFHaMAaf6SjpI1E0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bUcehF/btrFe3W1xLh/050sYVkFHaMAaf6SjpI1E0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbUcehF%2FbtrFe3W1xLh%2F050sYVkFHaMAaf6SjpI1E0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;273&quot; height=&quot;43&quot; data-origin-width=&quot;273&quot; data-origin-height=&quot;43&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;② MAE (Mean Absolute Error)&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655648728707&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;from sklearn.metrics import mean_absolute_error

mean_absolute_error(y_test, pred_test)  # 47230.874701637375&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;③ MSE&amp;nbsp;(Mean&amp;nbsp;Squared&amp;nbsp;Error)&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655648748210&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;from sklearn.metrics import mean_squared_error 

mean_squared_error(y_test, pred_test)  # 3996869138.1105847&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;④ MAPE&amp;nbsp;(Mean&amp;nbsp;Absolute&amp;nbsp;Percentage&amp;nbsp;Error)&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655648765241&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;def MAPE(y_test, y_pred):
    return np.mean(np.abs((y_test - pred_test) / y_test)) * 100 
    
MAPE(y_test, pred_test)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;182&quot; data-origin-height=&quot;39&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/6ki5w/btrFbNAAK63/dNaBpzQhJb7GFETsL41Tek/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/6ki5w/btrFbNAAK63/dNaBpzQhJb7GFETsL41Tek/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/6ki5w/btrFbNAAK63/dNaBpzQhJb7GFETsL41Tek/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2F6ki5w%2FbtrFbNAAK63%2FdNaBpzQhJb7GFETsL41Tek%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;182&quot; height=&quot;39&quot; data-origin-width=&quot;182&quot; data-origin-height=&quot;39&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;⑤ MPE&amp;nbsp;(Mean&amp;nbsp;Percentage&amp;nbsp;Error)&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655648786724&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;def MAE(y_test, y_pred):
    return np.mean((y_test - pred_test) / y_test) * 100
    
MAE(y_test, pred_test)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;171&quot; data-origin-height=&quot;38&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bzdcZK/btrE75oFVIk/bKekZKxv4B1BUoLrKEaKE1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bzdcZK/btrE75oFVIk/bKekZKxv4B1BUoLrKEaKE1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bzdcZK/btrE75oFVIk/bKekZKxv4B1BUoLrKEaKE1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbzdcZK%2FbtrE75oFVIk%2FbKekZKxv4B1BUoLrKEaKE1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;171&quot; height=&quot;38&quot; data-origin-width=&quot;171&quot; data-origin-height=&quot;38&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;</description>
      <category>데이터 분석/빅데이터 분석 기사</category>
      <category>Python</category>
      <category>데이터 분석</category>
      <category>빅데이터 분석 기사</category>
      <category>빅분기</category>
      <category>실기</category>
      <author>eunki</author>
      <guid isPermaLink="true">https://eunki.tistory.com/327</guid>
      <comments>https://eunki.tistory.com/327#entry327comment</comments>
      <pubDate>Sun, 19 Jun 2022 23:26:56 +0900</pubDate>
    </item>
    <item>
      <title>[빅분기 실기] 앙상블 스태킹 (Stacking)</title>
      <link>https://eunki.tistory.com/326</link>
      <description>&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;앙상블&amp;nbsp;스태킹&amp;nbsp;(Stacking)&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;데이터셋이&amp;nbsp;아니라&amp;nbsp;여러&amp;nbsp;학습기에서&amp;nbsp;예측한&amp;nbsp;예측값으로 &lt;/span&gt;&lt;br /&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;다시&amp;nbsp;학습&amp;nbsp;데이터를&amp;nbsp;만들어&amp;nbsp;일반화된&amp;nbsp;최종&amp;nbsp;모델을&amp;nbsp;구성하는&amp;nbsp;방법&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;[주요 하이퍼파라미터]&lt;/b&gt;&lt;b&gt;&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;-&amp;nbsp;estimators&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;hr contenteditable=&quot;false&quot; data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style7&quot; /&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;Part 1. 분류 (Classification)&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;1. 분석 데이터 준비&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655372128783&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;import pandas as pd

# 유방암 예측 분류 데이터
data1=pd.read_csv('breast-cancer-wisconsin.csv', encoding='utf-8')

X=data1[data1.columns[1:10]]
y=data1[[&quot;Class&quot;]]&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;1-2. train-test 데이터셋 나누기&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655369148159&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test=train_test_split(X, y, stratify=y, random_state=42)&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;1-3. Min-Max 정규화&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655369378258&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;from sklearn.preprocessing import MinMaxScaler

scaler=MinMaxScaler()
scaler.fit(X_train) 

X_scaled_train=scaler.transform(X_train)
X_scaled_test=scaler.transform(X_test)&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;2. 기본모델 적용&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;2-1. 훈련 데이터&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655372208940&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import StackingClassifier

estimators = [('rf', RandomForestClassifier(n_estimators=10, random_state=42)),
              ('svr', SVC(random_state=42))]
model = StackingClassifier(estimators=estimators, final_estimator=LogisticRegression())
model.fit(X_scaled_train, y_train)

pred_train=model.predict(X_scaled_train)
model.score(X_scaled_train, y_train)  # 0.986328125&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;① 오차행렬 (confusion matrix)&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655372304391&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;from sklearn.metrics import confusion_matrix

confusion_train=confusion_matrix(y_train, pred_train)
print(&quot;훈련데이터 오차행렬:\n&quot;, confusion_train)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;146&quot; data-origin-height=&quot;57&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/1g6ut/btrE76gL3lR/Cwc4WW8Tcjmdu5g4w2o26k/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/1g6ut/btrE76gL3lR/Cwc4WW8Tcjmdu5g4w2o26k/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/1g6ut/btrE76gL3lR/Cwc4WW8Tcjmdu5g4w2o26k/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2F1g6ut%2FbtrE76gL3lR%2FCwc4WW8Tcjmdu5g4w2o26k%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;146&quot; height=&quot;57&quot; data-origin-width=&quot;146&quot; data-origin-height=&quot;57&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;정상(0) 중 3명이 오분류, 환자(1) 중 4명이 오분류되었다.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;②&amp;nbsp;분류예측&amp;nbsp;레포트&amp;nbsp;(classification&amp;nbsp;report)&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655372392067&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;from sklearn.metrics import classification_report

cfreport_train=classification_report(y_train, pred_train)
print(&quot;분류예측 레포트:\n&quot;, cfreport_train)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;389&quot; data-origin-height=&quot;161&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bGRz2t/btrE7ycjZA5/i8ioHB9YznWsP50TbcCAb0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bGRz2t/btrE7ycjZA5/i8ioHB9YznWsP50TbcCAb0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bGRz2t/btrE7ycjZA5/i8ioHB9YznWsP50TbcCAb0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbGRz2t%2FbtrE7ycjZA5%2Fi8ioHB9YznWsP50TbcCAb0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;389&quot; height=&quot;161&quot; data-origin-width=&quot;389&quot; data-origin-height=&quot;161&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;정밀도(precision) = 0.99, 재현율(recall) = 0.98&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;2-2. 테스트 데이터&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655370087488&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;pred_test=model.predict(X_scaled_test)
model.score(X_scaled_test, y_test)  # 0.9649122807017544&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;① 오차행렬 (confusion matrix)&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655370110454&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;confusion_test=confusion_matrix(y_test, pred_test)
print(&quot;테스트데이터 오차행렬:\n&quot;, confusion_test)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;163&quot; data-origin-height=&quot;58&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/11g7J/btrFaMVrMMT/ePLys6CV9GRRUO4PgiDMOK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/11g7J/btrFaMVrMMT/ePLys6CV9GRRUO4PgiDMOK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/11g7J/btrFaMVrMMT/ePLys6CV9GRRUO4PgiDMOK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2F11g7J%2FbtrFaMVrMMT%2FePLys6CV9GRRUO4PgiDMOK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;163&quot; height=&quot;58&quot; data-origin-width=&quot;163&quot; data-origin-height=&quot;58&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;정상(0) 중 5명이 오분류, 환자(1) 중 1명이 오분류되었다.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;② 분류예측 레포트 (classification report)&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655370139549&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;cfreport_test=classification_report(y_test, pred_test)
print(&quot;분류예측 레포트:\n&quot;, cfreport_test)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;385&quot; data-origin-height=&quot;163&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/qMmAu/btrE4T2P2iC/dEdq9j4WKkyZggXKYiQ9S0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/qMmAu/btrE4T2P2iC/dEdq9j4WKkyZggXKYiQ9S0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/qMmAu/btrE4T2P2iC/dEdq9j4WKkyZggXKYiQ9S0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FqMmAu%2FbtrE4T2P2iC%2FdEdq9j4WKkyZggXKYiQ9S0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;385&quot; height=&quot;163&quot; data-origin-width=&quot;385&quot; data-origin-height=&quot;163&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;정밀도(precision) = 0.96, 재현율(recall) = 0.97&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;hr contenteditable=&quot;false&quot; data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style3&quot; /&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;Part 2. 회귀 (Regression)&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;1. 분석 데이터 준비&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655372924203&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;# 주택 가격 데이터
data2=pd.read_csv('house_price.csv', encoding='utf-8')

X=data2[data2.columns[1:5]]
y=data2[[&quot;house_value&quot;]]&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;1-2. train-test 데이터셋 나누기&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655372924204&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test=train_test_split(X, y, random_state=42)&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;1-3. Min-Max 정규화&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655372924205&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;from sklearn.preprocessing import MinMaxScaler

scaler=MinMaxScaler()
scaler.fit(X_train)

X_scaled_train=scaler.transform(X_train)
X_scaled_test=scaler.transform(X_test)&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;2. 기본모델 적용&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;2-1. 훈련 데이터&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655372924206&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.ensemble import StackingRegressor

estimators = [('lr', LinearRegression()), 
              ('knn', KNeighborsRegressor())]
model = StackingRegressor(estimators=estimators, 
                          final_estimator=RandomForestRegressor(n_estimators=10, random_state=42))
model.fit(X_scaled_train, y_train)

pred_train=model.predict(X_scaled_train)
model.score(X_scaled_train, y_train)  # 0.543404932348004&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;2-2. 테스트 데이터&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655372924209&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;pred_test=model.predict(X_scaled_test)
model.score(X_scaled_test, y_test)  # 0.4781188528801523&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;① RMSE (Root Mean Squared Error)&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655372924209&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;import numpy as np
from sklearn.metrics import mean_squared_error 

MSE_train = mean_squared_error(y_train, pred_train)
MSE_test = mean_squared_error(y_test, pred_test)

print(&quot;훈련   데이터 RMSE:&quot;, np.sqrt(MSE_train))
print(&quot;테스트 데이터 RMSE:&quot;, np.sqrt(MSE_test))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;271&quot; data-origin-height=&quot;42&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/oo1Ja/btrFdHGzDek/tJnJobuQqD0KWtcjmVDu30/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/oo1Ja/btrFdHGzDek/tJnJobuQqD0KWtcjmVDu30/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/oo1Ja/btrFdHGzDek/tJnJobuQqD0KWtcjmVDu30/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Foo1Ja%2FbtrFdHGzDek%2FtJnJobuQqD0KWtcjmVDu30%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;271&quot; height=&quot;42&quot; data-origin-width=&quot;271&quot; data-origin-height=&quot;42&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;</description>
      <category>데이터 분석/빅데이터 분석 기사</category>
      <category>Python</category>
      <category>데이터 분석</category>
      <category>빅데이터 분석 기사</category>
      <category>빅분기</category>
      <category>실기</category>
      <author>eunki</author>
      <guid isPermaLink="true">https://eunki.tistory.com/326</guid>
      <comments>https://eunki.tistory.com/326#entry326comment</comments>
      <pubDate>Sun, 19 Jun 2022 21:31:09 +0900</pubDate>
    </item>
    <item>
      <title>[빅분기 실기] 앙상블 부스팅 (Boosting)</title>
      <link>https://eunki.tistory.com/325</link>
      <description>&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;앙상블 배깅 (Boosting)&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;여러&amp;nbsp;개의&amp;nbsp;약한&amp;nbsp;학습기를&amp;nbsp;순차적으로&amp;nbsp;학습시켜&amp;nbsp;예측하면서 &lt;/span&gt;&lt;br /&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;잘&amp;nbsp;못&amp;nbsp;예측한&amp;nbsp;데이터에&amp;nbsp;가중치를&amp;nbsp;부여하여&amp;nbsp;오류를&amp;nbsp;개선해&amp;nbsp;나가며&amp;nbsp;학습하는&amp;nbsp;앙상블&amp;nbsp;모델 &lt;/span&gt;&lt;br /&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;배깅이&amp;nbsp;병렬식&amp;nbsp;앙상블인&amp;nbsp;반면,&amp;nbsp;부스팅은&amp;nbsp;순차적인&amp;nbsp;직렬식&amp;nbsp;앙상블이다.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;[주요 하이퍼파라미터]&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;1.&amp;nbsp;AdaBoosting&lt;/b&gt; &lt;/span&gt;&lt;br /&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;-&amp;nbsp;base_estimator &lt;/span&gt;&lt;br /&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;- n_estimator : 모델&amp;nbsp;수행횟수&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;2.&amp;nbsp;GradientBoosting&lt;/b&gt; &lt;/span&gt;&lt;br /&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;-&amp;nbsp;learning_rate&amp;nbsp;:&amp;nbsp;학습률&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;hr contenteditable=&quot;false&quot; data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style7&quot; /&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;Part 1. 분류 (Classification)&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;&lt;b&gt;1. 분석 데이터 준비&lt;/b&gt;&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655368684108&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;import pandas as pd

# 암 예측 분류 데이터
data=pd.read_csv('breast-cancer-wisconsin.csv', encoding='utf-8')

X=data[data.columns[1:10]]
y=data[[&quot;Class&quot;]]&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;1-2. train-test 데이터셋 나누기&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655369148159&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test=train_test_split(X, y, stratify=y, random_state=42)&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;1-3. Min-Max 정규화&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655369378258&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;from sklearn.preprocessing import MinMaxScaler

scaler=MinMaxScaler()
scaler.fit(X_train) 

X_scaled_train=scaler.transform(X_train)
X_scaled_test=scaler.transform(X_test)&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;2. &lt;/b&gt;&lt;b&gt;AdaBoosting&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;2-1. 훈련 데이터&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655369652753&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;from sklearn.ensemble import AdaBoostClassifier

model = AdaBoostClassifier(n_estimators=100, random_state=0)
model.fit(X_scaled_train, y_train)

pred_train=model.predict(X_scaled_train)
model.score(X_scaled_train, y_train)  # 1.0&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;① 오차행렬 (confusion matrix)&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655369834721&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;from sklearn.metrics import confusion_matrix

confusion_train=confusion_matrix(y_train, pred_train)
print(&quot;훈련데이터 오차행렬:\n&quot;, confusion_train)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;148&quot; data-origin-height=&quot;59&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/n3ryI/btrFe5AqqCU/fK9072yPGU4VkMoVBFnHAK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/n3ryI/btrFe5AqqCU/fK9072yPGU4VkMoVBFnHAK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/n3ryI/btrFe5AqqCU/fK9072yPGU4VkMoVBFnHAK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fn3ryI%2FbtrFe5AqqCU%2FfK9072yPGU4VkMoVBFnHAK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;148&quot; height=&quot;59&quot; data-origin-width=&quot;148&quot; data-origin-height=&quot;59&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;background-color: #ffffff; color: #333333; font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;정상&amp;nbsp;333명,&amp;nbsp;환자&amp;nbsp;179명을&amp;nbsp;정확하게&amp;nbsp;분류했다.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;② 분류예측 레포트 (classification report)&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655369953277&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;from sklearn.metrics import classification_report

cfreport_train=classification_report(y_train, pred_train)
print(&quot;분류예측 레포트:\n&quot;, cfreport_train)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;387&quot; data-origin-height=&quot;159&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/btskpz/btrE7EY6ByY/ZYKaKONXK6kxOEJ9IuAEK0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/btskpz/btrE7EY6ByY/ZYKaKONXK6kxOEJ9IuAEK0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/btskpz/btrE7EY6ByY/ZYKaKONXK6kxOEJ9IuAEK0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fbtskpz%2FbtrE7EY6ByY%2FZYKaKONXK6kxOEJ9IuAEK0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;387&quot; height=&quot;159&quot; data-origin-width=&quot;387&quot; data-origin-height=&quot;159&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;정밀도(precision) = 1.0, 재현율(recall) = 1.0&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;2-2. 테스트 데이터&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655370087488&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;pred_test=model.predict(X_scaled_test)
model.score(X_scaled_test, y_test)  # 0.9532163742690059&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;① 오차행렬 (confusion matrix)&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655370110454&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;confusion_test=confusion_matrix(y_test, pred_test)
print(&quot;테스트데이터 오차행렬:\n&quot;, confusion_test)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;165&quot; data-origin-height=&quot;61&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/cc2cBC/btrE75IUdsI/Q3S1Y1amhrXGpfs87AcuaK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/cc2cBC/btrE75IUdsI/Q3S1Y1amhrXGpfs87AcuaK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/cc2cBC/btrE75IUdsI/Q3S1Y1amhrXGpfs87AcuaK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fcc2cBC%2FbtrE75IUdsI%2FQ3S1Y1amhrXGpfs87AcuaK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;165&quot; height=&quot;61&quot; data-origin-width=&quot;165&quot; data-origin-height=&quot;61&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;정상(0) 중 5명이 오분류, 환자(1) 중 3명이 오분류되었다.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;② 분류예측 레포트 (classification report)&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655370139549&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;cfreport_test=classification_report(y_test, pred_test)
print(&quot;분류예측 레포트:\n&quot;, cfreport_test)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;386&quot; data-origin-height=&quot;162&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/QRLYL/btrFch9bhBF/SGHy2KNRGwDf9O01mz1BJ0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/QRLYL/btrFch9bhBF/SGHy2KNRGwDf9O01mz1BJ0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/QRLYL/btrFch9bhBF/SGHy2KNRGwDf9O01mz1BJ0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FQRLYL%2FbtrFch9bhBF%2FSGHy2KNRGwDf9O01mz1BJ0%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;386&quot; height=&quot;162&quot; data-origin-width=&quot;386&quot; data-origin-height=&quot;162&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;정밀도(precision) = 0.95, 재현율(recall) = 0.95&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;3. GradientBoosting &lt;br /&gt;&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;3-1. 훈련 데이터&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655640981829&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;from sklearn.ensemble import GradientBoostingClassifier

model = GradientBoostingClassifier(n_estimators=100, learning_rate=1.0, max_depth=1, random_state=0)
model.fit(X_scaled_train, y_train)

pred_train=model.predict(X_scaled_train)
model.score(X_scaled_train, y_train)  # 1.0&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;① 오차행렬 (confusion matrix)&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655640981829&quot; class=&quot;routeros&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;from sklearn.metrics import confusion_matrix

confusion_train=confusion_matrix(y_train, pred_train)
print(&quot;훈련데이터 오차행렬:\n&quot;, confusion_train)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;148&quot; data-origin-height=&quot;57&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/nvyGb/btrE75oyofE/TBFjnNsxq97bk2fSVvtL40/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/nvyGb/btrE75oyofE/TBFjnNsxq97bk2fSVvtL40/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/nvyGb/btrE75oyofE/TBFjnNsxq97bk2fSVvtL40/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FnvyGb%2FbtrE75oyofE%2FTBFjnNsxq97bk2fSVvtL40%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;148&quot; height=&quot;57&quot; data-origin-width=&quot;148&quot; data-origin-height=&quot;57&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;background-color: #ffffff; color: #333333; font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;정상&amp;nbsp;333명,&amp;nbsp;환자&amp;nbsp;179명을&amp;nbsp;정확하게&amp;nbsp;분류했다.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;3-2. 테스트 데이터&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655640981830&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;pred_test=model.predict(X_scaled_test)
model.score(X_scaled_test, y_test)  # 0.9649122807017544&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;① 오차행렬 (confusion matrix)&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655640981830&quot; class=&quot;routeros&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;confusion_test=confusion_matrix(y_test, pred_test)
print(&quot;테스트데이터 오차행렬:\n&quot;, confusion_test)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;160&quot; data-origin-height=&quot;57&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/pTn6c/btrFaN1P9vM/uATNoYnxHH4c7kTtJxdI2k/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/pTn6c/btrFaN1P9vM/uATNoYnxHH4c7kTtJxdI2k/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/pTn6c/btrFaN1P9vM/uATNoYnxHH4c7kTtJxdI2k/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FpTn6c%2FbtrFaN1P9vM%2FuATNoYnxHH4c7kTtJxdI2k%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;160&quot; height=&quot;57&quot; data-origin-width=&quot;160&quot; data-origin-height=&quot;57&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;정상(0) 중 5명이 오분류, 환자(1) 중 1명이 오분류되었다.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;hr contenteditable=&quot;false&quot; data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style3&quot; /&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;Part 2. 회귀 (Regression)&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;1. 분석 데이터 준비&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655372924203&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;# 주택 가격 데이터
data2=pd.read_csv('house_price.csv', encoding='utf-8')

X=data2[data2.columns[1:5]]
y=data2[[&quot;house_value&quot;]]&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;1-2. train-test 데이터셋 나누기&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655372924204&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test=train_test_split(X, y, random_state=42)&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;1-3. Min-Max 정규화&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655372924205&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;from sklearn.preprocessing import MinMaxScaler

scaler=MinMaxScaler()
scaler.fit(X_train)

X_scaled_train=scaler.transform(X_train)
X_scaled_test=scaler.transform(X_test)&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;2.&amp;nbsp;&lt;/b&gt;&lt;b&gt;AdaBoosting&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;2-1. 훈련 데이터&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655641133857&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;from sklearn.ensemble import AdaBoostRegressor

model = AdaBoostRegressor(random_state=0, n_estimators=100)
model.fit(X_scaled_train, y_train)

pred_train=model.predict(X_scaled_train)
model.score(X_scaled_train, y_train)  # 0.4353130085971758&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;2-2. 테스트 데이터&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655641133859&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;pred_test=model.predict(X_scaled_test)
model.score(X_scaled_test, y_test)  # 0.43568387094087124&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;① RMSE (Root Mean Squared Error)&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655372924209&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;import numpy as np
from sklearn.metrics import mean_squared_error 

MSE_train = mean_squared_error(y_train, pred_train)
MSE_test = mean_squared_error(y_test, pred_test)

print(&quot;훈련   데이터 RMSE:&quot;, np.sqrt(MSE_train))
print(&quot;테스트 데이터 RMSE:&quot;, np.sqrt(MSE_test))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;269&quot; data-origin-height=&quot;43&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/cnuLwJ/btrE9QxGth1/1O5FY4N1LKKO5pTOD3bjR1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/cnuLwJ/btrE9QxGth1/1O5FY4N1LKKO5pTOD3bjR1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/cnuLwJ/btrE9QxGth1/1O5FY4N1LKKO5pTOD3bjR1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FcnuLwJ%2FbtrE9QxGth1%2F1O5FY4N1LKKO5pTOD3bjR1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;269&quot; height=&quot;43&quot; data-origin-width=&quot;269&quot; data-origin-height=&quot;43&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;3. GradientBoosting&lt;br /&gt;&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;3-1. 훈련 데이터&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655641133860&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;from sklearn.ensemble import GradientBoostingRegressor

model = GradientBoostingRegressor(random_state=0)
model.fit(X_scaled_train, y_train)

pred_train=model.predict(X_scaled_train)
model.score(X_scaled_train, y_train)  # 0.6178724780500952&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;3-2. 테스트 데이터&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655641133861&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;pred_test=model.predict(X_scaled_test)
model.score(X_scaled_test, y_test)  # 0.5974112241813845&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;① RMSE (Root Mean Squared Error)&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655641241310&quot; class=&quot;maxima&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;import numpy as np
from sklearn.metrics import mean_squared_error 

MSE_train = mean_squared_error(y_train, pred_train)
MSE_test = mean_squared_error(y_test, pred_test)

print(&quot;훈련   데이터 RMSE:&quot;, np.sqrt(MSE_train))
print(&quot;테스트 데이터 RMSE:&quot;, np.sqrt(MSE_test))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;280&quot; data-origin-height=&quot;43&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/cnd567/btrFdHs0PIJ/eTs7sExklNyEEioxPrVCzK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/cnd567/btrFdHs0PIJ/eTs7sExklNyEEioxPrVCzK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/cnd567/btrFdHs0PIJ/eTs7sExklNyEEioxPrVCzK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fcnd567%2FbtrFdHs0PIJ%2FeTs7sExklNyEEioxPrVCzK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;280&quot; height=&quot;43&quot; data-origin-width=&quot;280&quot; data-origin-height=&quot;43&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;</description>
      <category>데이터 분석/빅데이터 분석 기사</category>
      <category>Python</category>
      <category>데이터 분석</category>
      <category>빅데이터 분석 기사</category>
      <category>빅분기</category>
      <category>실기</category>
      <author>eunki</author>
      <guid isPermaLink="true">https://eunki.tistory.com/325</guid>
      <comments>https://eunki.tistory.com/325#entry325comment</comments>
      <pubDate>Sun, 19 Jun 2022 21:16:40 +0900</pubDate>
    </item>
    <item>
      <title>[빅분기 실기] 앙상블 배깅 (Bagging)</title>
      <link>https://eunki.tistory.com/324</link>
      <description>&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;앙상블&amp;nbsp;배깅&amp;nbsp;(Bagging)&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;학습 데이터에 대해 여러 개의 부트스트랩 (Bootstrap) 데이터를 생성하고&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;각 데이터에 하나 또는 여러 알고리즘을 학습시킨 후&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;산출된 결과 중 투표 (Voting) 방식에 의해 최종 결과를 선정하는 알고리즘&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;[주요 하이퍼파라미터]&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;- n_estimators : 부트스트랩 데이터셋 수&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;hr contenteditable=&quot;false&quot; data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style7&quot; /&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;Part 1. 분류 (Classification)&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;&lt;b&gt;1. 분석 데이터 준비&lt;/b&gt;&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655368684108&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;import pandas as pd

# 암 예측 분류 데이터
data=pd.read_csv('breast-cancer-wisconsin.csv', encoding='utf-8')

X=data[data.columns[1:10]]
y=data[[&quot;Class&quot;]]&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;1-2. train-test 데이터셋 나누기&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655369148159&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test=train_test_split(X, y, stratify=y, random_state=42)&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;1-3. Min-Max 정규화&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655369378258&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;from sklearn.preprocessing import MinMaxScaler

scaler=MinMaxScaler()
scaler.fit(X_train) 

X_scaled_train=scaler.transform(X_train)
X_scaled_test=scaler.transform(X_test)&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;2. 기본모델 적용&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;2-1. 훈련 데이터&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655369652753&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;from sklearn.svm import SVC
from sklearn.ensemble import BaggingClassifier

model = BaggingClassifier(base_estimator=SVC(), n_estimators=10, random_state=0)
model.fit(X_scaled_train, y_train)

pred_train=model.predict(X_scaled_train)
model.score(X_scaled_train, y_train)  # 0.982421875&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;10개의&amp;nbsp;데이터셋에&amp;nbsp;SVC를&amp;nbsp;훈련시킨&amp;nbsp;10개&amp;nbsp;모델&amp;nbsp;결과를&amp;nbsp;종합한다.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;① 오차행렬 (confusion matrix)&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655369834721&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;from sklearn.metrics import confusion_matrix

confusion_train=confusion_matrix(y_train, pred_train)
print(&quot;훈련데이터 오차행렬:\n&quot;, confusion_train)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;149&quot; data-origin-height=&quot;59&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/diOgM5/btrE76HIOKz/y0aA3G1AF0cJOVwTGxFhKK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/diOgM5/btrE76HIOKz/y0aA3G1AF0cJOVwTGxFhKK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/diOgM5/btrE76HIOKz/y0aA3G1AF0cJOVwTGxFhKK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FdiOgM5%2FbtrE76HIOKz%2Fy0aA3G1AF0cJOVwTGxFhKK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;149&quot; height=&quot;59&quot; data-origin-width=&quot;149&quot; data-origin-height=&quot;59&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;정상(0) 중 4명이 오분류, 환자(1) 중 5명이 오분류되었다.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;② 분류예측 레포트 (classification report)&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655369953277&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;from sklearn.metrics import classification_report

cfreport_train=classification_report(y_train, pred_train)
print(&quot;분류예측 레포트:\n&quot;, cfreport_train)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;392&quot; data-origin-height=&quot;161&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/cJ4HAr/btrFaOsPhBz/fTCOjEW1iyfWd3LrzpEDc1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/cJ4HAr/btrFaOsPhBz/fTCOjEW1iyfWd3LrzpEDc1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/cJ4HAr/btrFaOsPhBz/fTCOjEW1iyfWd3LrzpEDc1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FcJ4HAr%2FbtrFaOsPhBz%2FfTCOjEW1iyfWd3LrzpEDc1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;392&quot; height=&quot;161&quot; data-origin-width=&quot;392&quot; data-origin-height=&quot;161&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;정밀도(precision) = 0.98, 재현율(recall) = 0.98&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;2-2. 테스트 데이터&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655370087488&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;pred_test=model.predict(X_scaled_test)
model.score(X_scaled_test, y_test)  # 0.9590643274853801&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;① 오차행렬 (confusion matrix)&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655370110454&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;confusion_test=confusion_matrix(y_test, pred_test)
print(&quot;테스트데이터 오차행렬:\n&quot;, confusion_test)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;164&quot; data-origin-height=&quot;54&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/qeSpK/btrE7wZQrSC/ThQkJWludhboM7Z7Sb5oB1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/qeSpK/btrE7wZQrSC/ThQkJWludhboM7Z7Sb5oB1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/qeSpK/btrE7wZQrSC/ThQkJWludhboM7Z7Sb5oB1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FqeSpK%2FbtrE7wZQrSC%2FThQkJWludhboM7Z7Sb5oB1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;164&quot; height=&quot;54&quot; data-origin-width=&quot;164&quot; data-origin-height=&quot;54&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;정상(0) 중 5명이 오분류, 환자(1) 중 2명이 오분류되었다.&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;② 분류예측 레포트 (classification report)&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655370139549&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;cfreport_test=classification_report(y_test, pred_test)
print(&quot;분류예측 레포트:\n&quot;, cfreport_test)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;388&quot; data-origin-height=&quot;161&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/c6Zgnm/btrFcitp8fC/HWCehB80Xo8LoCRTDu82Q1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/c6Zgnm/btrFcitp8fC/HWCehB80Xo8LoCRTDu82Q1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/c6Zgnm/btrFcitp8fC/HWCehB80Xo8LoCRTDu82Q1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fc6Zgnm%2FbtrFcitp8fC%2FHWCehB80Xo8LoCRTDu82Q1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;388&quot; height=&quot;161&quot; data-origin-width=&quot;388&quot; data-origin-height=&quot;161&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;정밀도(precision) = 0.95, 재현율(recall) = 0.96&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;hr contenteditable=&quot;false&quot; data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style3&quot; /&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;Part 2. 회귀 (Regression)&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;1. 분석 데이터 준비&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655372924203&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;# 주택 가격 데이터
data2=pd.read_csv('house_price.csv', encoding='utf-8')

X=data2[data2.columns[1:5]]
y=data2[[&quot;house_value&quot;]]&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;1-2. train-test 데이터셋 나누기&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655372924204&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test=train_test_split(X, y, random_state=42)&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;1-3. Min-Max 정규화&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655372924205&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;from sklearn.preprocessing import MinMaxScaler

scaler=MinMaxScaler()
scaler.fit(X_train)

X_scaled_train=scaler.transform(X_train)
X_scaled_test=scaler.transform(X_test)&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;2. 기본모델 적용&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;2-1. 훈련 데이터&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655372924206&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;from sklearn.neighbors import KNeighborsRegressor
from sklearn.ensemble import BaggingRegressor

model = BaggingRegressor(base_estimator=KNeighborsRegressor(), n_estimators=10, random_state=0)
model.fit(X_scaled_train, y_train)

pred_train=model.predict(X_scaled_train)
model.score(X_scaled_train, y_train)  # 0.6928982134381334&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size18&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;2-2. 테스트 데이터&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655372924209&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;pred_test=model.predict(X_scaled_test)
model.score(X_scaled_test, y_test)  # 0.5612676280708411&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;span style=&quot;font-family: 'Noto Sans Demilight', 'Noto Sans KR';&quot;&gt;&lt;b&gt;① RMSE (Root Mean Squared Error)&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;pre id=&quot;code_1655372924209&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;import numpy as np
from sklearn.metrics import mean_squared_error 

MSE_train = mean_squared_error(y_train, pred_train)
MSE_test = mean_squared_error(y_test, pred_test)

print(&quot;훈련   데이터 RMSE:&quot;, np.sqrt(MSE_train))
print(&quot;테스트 데이터 RMSE:&quot;, np.sqrt(MSE_test))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignLeft&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;275&quot; data-origin-height=&quot;43&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/coqx4l/btrFdGAQawr/vKohdqcxYf7YhRhVscZtkK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/coqx4l/btrFdGAQawr/vKohdqcxYf7YhRhVscZtkK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/coqx4l/btrFdGAQawr/vKohdqcxYf7YhRhVscZtkK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fcoqx4l%2FbtrFdGAQawr%2FvKohdqcxYf7YhRhVscZtkK%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;275&quot; height=&quot;43&quot; data-origin-width=&quot;275&quot; data-origin-height=&quot;43&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;</description>
      <category>데이터 분석/빅데이터 분석 기사</category>
      <category>Python</category>
      <category>데이터 분석</category>
      <category>빅데이터 분석 기사</category>
      <category>빅분기</category>
      <category>실기</category>
      <author>eunki</author>
      <guid isPermaLink="true">https://eunki.tistory.com/324</guid>
      <comments>https://eunki.tistory.com/324#entry324comment</comments>
      <pubDate>Sun, 19 Jun 2022 20:34:42 +0900</pubDate>
    </item>
  </channel>
</rss>