[Data Processing and Analysis] Preparing the Data

부루기 2024. 6. 12. 15:19

ML_Intro_Regression

Overview

1. Downloading a first dataset for ML and inspecting it

2. Methods used (preprocessing)

read_csv(file, encoding)
train_set, test_set = train_test_split(df, test_size=0.2, random_state=42)
train_set.copy()
df_copy.corr()
train_x = train_set.drop(["Salary"], axis=1)
train_y = df_copy["Salary"]

3. Methods used (training)

LinearRegression()
lin_reg.fit(train_x, train_y)
lin_reg.predict(train_x)

4. Methods used (plotting)

df_copy.plot.scatter(x='YearsExperience', y='Salary')
plt.plot(train_x,train_y_hat, color='red', linewidth=2)

In [ ]:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

In [ ]:

#sns.set()
%matplotlib inline

df = pd.read_csv("SalaryData.csv")
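The overview mentions inspecting the data once it is loaded. A minimal sketch of that step follows, using a small hypothetical stand-in for SalaryData.csv. The values are made up for illustration; only the column names YearsExperience and Salary come from this post.

```python
import pandas as pd

# Hypothetical stand-in for SalaryData.csv; the values are illustrative only.
df_demo = pd.DataFrame({
    "YearsExperience": [1.1, 2.0, 3.2, 4.5, 5.9, 7.1, 8.7, 10.3],
    "Salary": [39000, 43500, 54400, 61100, 81400, 98300, 109400, 122400],
})

print(df_demo.head())        # first five rows
print(df_demo.shape)         # (number of rows, number of columns)
print(df_demo.dtypes)        # data type of each column
print(df_demo.isna().sum())  # count of missing values per column
print(df_demo.describe())    # summary statistics (mean, std, quartiles, ...)
```

The same calls can be run on the real df to check for missing values or unexpected column types before splitting.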

In [ ]:

train_set, test_set = train_test_split(df, test_size=0.2, random_state=42)
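To see what test_size and random_state do, here is a small sketch on a hypothetical 10-row frame (not the post's data). With test_size=0.2, 20% of the rows go to the test set, and fixing random_state makes the shuffle reproducible.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical 10-row frame; the values are illustrative only.
demo = pd.DataFrame({
    "YearsExperience": list(range(1, 11)),
    "Salary": [30000 + 9000 * x for x in range(1, 11)],
})

train_demo, test_demo = train_test_split(demo, test_size=0.2, random_state=42)
print(len(train_demo), len(test_demo))  # 8 rows for training, 2 for testing

# Repeating the call with the same random_state yields the same split.
train_again, test_again = train_test_split(demo, test_size=0.2, random_state=42)
print(train_demo.index.equals(train_again.index))

# The two sets never share a row.
print(set(train_demo.index) & set(test_demo.index))
```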

In [ ]:

df_copy = train_set.copy()

In [ ]:

df_copy.plot.scatter(x='YearsExperience', y='Salary')
# Draw a scatter plot with DataFrame.plot.scatter(x=..., y=...)

Out[ ]:

<Axes: xlabel='YearsExperience', ylabel='Salary'>

[Figure: scatter plot of Salary (y-axis) against YearsExperience (x-axis)]

In [ ]:

df_copy.corr()
# .corr() computes the pairwise correlation between the columns.

Out[ ]:

                 YearsExperience   Salary
YearsExperience          1.00000  0.98211
Salary                   0.98211  1.00000
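By default, DataFrame.corr() returns the Pearson correlation coefficient. The sketch below computes it by hand on a hypothetical five-row frame (values made up for illustration) and compares the result with pandas.

```python
import numpy as np
import pandas as pd

# Hypothetical data; only the column names come from the post.
demo = pd.DataFrame({
    "YearsExperience": [1.0, 2.0, 3.0, 4.0, 5.0],
    "Salary": [40000.0, 48000.0, 61000.0, 68000.0, 80000.0],
})

x = demo["YearsExperience"].to_numpy()
y = demo["Salary"].to_numpy()

# Pearson r = covariance(x, y) / (std(x) * std(y)), written with centered sums.
dx = x - x.mean()
dy = y - y.mean()
r_manual = np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))

r_pandas = demo.corr().loc["YearsExperience", "Salary"]
print(r_manual, r_pandas)  # the two values agree
```

A value close to 1, like the 0.98211 above, indicates a strong positive linear relationship, which is why a straight-line model is a reasonable choice here.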

In [ ]:

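train_x = train_set.drop(["Salary"], axis=1)
train_y = df_copy["Salary"]
# Separate the features (x) from the target (y) for training.
# x must be 2-D (rows = samples, columns = features), so it is kept as a DataFrame.
# df_copy is a copy of train_set, so both lines refer to the same rows.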
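The shape requirement is easy to check. In this sketch (hypothetical data), drop() returns a DataFrame with shape (n_samples, 1), while selecting a single column returns a 1-D Series, which suits the target but not the feature matrix.

```python
import pandas as pd

# Hypothetical data; only the column names come from the post.
demo = pd.DataFrame({
    "YearsExperience": [1.0, 2.0, 3.0, 4.0, 5.0],
    "Salary": [40000.0, 48000.0, 61000.0, 68000.0, 80000.0],
})

X = demo.drop(["Salary"], axis=1)  # DataFrame: 2-D feature matrix
y = demo["Salary"]                 # Series: 1-D target

print(X.shape)  # (5, 1)
print(y.shape)  # (5,)

# Double brackets are another way to keep a single column 2-D.
X_alt = demo[["YearsExperience"]]
print(X_alt.equals(X))
```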

In [ ]:

lin_reg = LinearRegression()
# Create the model; it has no fitted parameters until fit() is called.
lin_reg.fit(train_x, train_y)
# Train with .fit(); this estimates the slope and the intercept.

Out[ ]:

LinearRegression()

Attributes for inspecting the fitted LinearRegression model:

coef_
intercept_

In [ ]:

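print("Coefficients: ", lin_reg.coef_)
# Slope: change in predicted Salary per additional year of experience
print("Intercept: ", lin_reg.intercept_)
# Intercept: predicted Salary when YearsExperience is 0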

Coefficients:  [9423.81532303]
Intercept:  25321.583011776813
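With a single feature, the fitted model is the line Salary = intercept + coefficient × YearsExperience. The sketch below plugs in the values printed above; predict_salary is a hypothetical helper written for this illustration, not part of scikit-learn.

```python
# Values copied from the output printed above.
coef = 9423.81532303
intercept = 25321.583011776813

def predict_salary(years):
    """Hypothetical helper: evaluate the fitted line by hand."""
    return intercept + coef * years

print(predict_salary(5.0))  # roughly 72,441 for five years of experience
```

This is the same arithmetic lin_reg.predict performs for each row of its input.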

In [ ]:

plt.scatter(train_x, train_y,  color='blue')
# Scatter plot of the training data
train_y_hat = lin_reg.predict(train_x)
# Predictions on the training data, drawn as the fitted line
plt.plot(train_x,train_y_hat, color='red', linewidth=2)

Out[ ]:

[<matplotlib.lines.Line2D at 0x152870570d0>]
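The notebook imports r2_score and creates test_set but stops at the training plot. A possible next step is to score the model on the held-out rows. The sketch below does this on synthetic data generated under assumed parameters (slope 9400, intercept 25000, Gaussian noise), so its numbers say nothing about the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic data under assumed parameters; not the post's SalaryData.csv.
rng = np.random.default_rng(0)
years = rng.uniform(1, 10, size=30)
salary = 25000 + 9400 * years + rng.normal(0, 5000, size=30)
demo = pd.DataFrame({"YearsExperience": years, "Salary": salary})

train_demo, test_demo = train_test_split(demo, test_size=0.2, random_state=42)

model = LinearRegression()
model.fit(train_demo[["YearsExperience"]], train_demo["Salary"])

# Predict on rows the model has never seen, then compare with the true values.
train_pred = model.predict(train_demo[["YearsExperience"]])
test_pred = model.predict(test_demo[["YearsExperience"]])

r2_train = r2_score(train_demo["Salary"], train_pred)
r2_test = r2_score(test_demo["Salary"], test_pred)
print("R^2 (train):", r2_train)
print("R^2 (test): ", r2_test)
```

A test score far below the training score would suggest the model does not generalize well. With only a handful of test rows, the test score can also vary considerably from split to split.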

License: Attribution-NonCommercial-NoDerivatives
