본문 바로가기

카테고리 없음

MySql 의 Character Set에 대하여

 

 

 

언제나 한글 깨짐 문제로 인한 캐릭터  확인  변경 작업이 많다.

일단 DB 저장된 한글이 클라이언트에서  보이는 경우는 클라이언트의 인코딩 값을 변경해서 조치를   있다.

그러나 저장 당시의 인코딩 값이 서로 다른 한글이 들어가 있을 경우는 어떤 한글은 정상적으로 보이지만, 어떤 한글은 깨져서 보이는 경우가 발생하게 된다.

MySQL 기본으로 서버 캐릭터 셋이 latin1으로 설정 되어 있어서 DB 생성  캐릭터  값을 지정해주지 않으면 latin1으로 생성이 된다.

 

Character set ?

사용하는 언어를 표현하기 위한 문자들의 집합을 의미. 한글을 표현하기 위하여 사용하는 캐릭터 셋에는 여러가지가 존재 한다.

Encoding ?

Character Set을 컴퓨터가 이해할 수 있는 바이트와 매핑해 주는 것.

 

0. 사용할  있는 캐릭터  종류 확인

 

mysql> show character set ;

+----------+---------------------------------+---------------------+--------+

| Charset  | Description                     | Default collation   | Maxlen |

+----------+---------------------------------+---------------------+--------+

| big5     | Big5 Traditional Chinese        | big5_chinese_ci     |      2 |

| dec8     | DEC West European               | dec8_swedish_ci     |      1 |

| cp850    | DOS West European               | cp850_general_ci    |      1 |

| hp8      | HP West European                | hp8_english_ci      |      1 |

| koi8r    | KOI8-R Relcom Russian           | koi8r_general_ci    |      1 |

| latin1   | cp1252 West European            | latin1_swedish_ci   |      1 |

| latin2   | ISO 8859-2 Central European     | latin2_general_ci   |      1 |

| swe7     | 7bit Swedish                    | swe7_swedish_ci     |      1 |

| ascii    | US ASCII                        | ascii_general_ci    |      1 |

| ujis     | EUC-JP Japanese                 | ujis_japanese_ci    |      3 |

| sjis     | Shift-JIS Japanese              | sjis_japanese_ci    |      2 |

| hebrew   | ISO 8859-8 Hebrew               | hebrew_general_ci   |      1 |

| tis620   | TIS620 Thai                     | tis620_thai_ci      |      1 |

| euckr    | EUC-KR Korean                   | euckr_korean_ci     |      2 |

| koi8u    | KOI8-U Ukrainian                | koi8u_general_ci    |      1 |

| gb2312   | GB2312 Simplified Chinese       | gb2312_chinese_ci   |      2 |

| greek    | ISO 8859-7 Greek                | greek_general_ci    |      1 |

| cp1250   | Windows Central European        | cp1250_general_ci   |      1 |

| gbk      | GBK Simplified Chinese          | gbk_chinese_ci      |      2 |

| latin5   | ISO 8859-9 Turkish              | latin5_turkish_ci   |      1 |

| armscii8 | ARMSCII-8 Armenian              | armscii8_general_ci |      1 |

| utf8     | UTF-8 Unicode                   | utf8_general_ci     |      3 |

| ucs2     | UCS-2 Unicode                   | ucs2_general_ci     |      2 |

| cp866    | DOS Russian                     | cp866_general_ci    |      1 |

| keybcs2  | DOS Kamenicky Czech-Slovak      | keybcs2_general_ci  |      1 |

| macce    | Mac Central European            | macce_general_ci    |      1 |

| macroman | Mac West European               | macroman_general_ci |      1 |

| cp852    | DOS Central European            | cp852_general_ci    |      1 |

| latin7   | ISO 8859-13 Baltic              | latin7_general_ci   |      1 |

| utf8mb4  | UTF-8 Unicode                   | utf8mb4_general_ci  |      4 |

| cp1251   | Windows Cyrillic                | cp1251_general_ci   |      1 |

| utf16    | UTF-16 Unicode                  | utf16_general_ci    |      4 |

| utf16le  | UTF-16LE Unicode                | utf16le_general_ci  |      4 |

| cp1256   | Windows Arabic                  | cp1256_general_ci   |      1 |

| cp1257   | Windows Baltic                  | cp1257_general_ci   |      1 |

| utf32    | UTF-32 Unicode                  | utf32_general_ci    |      4 |

| binary   | Binary pseudo charset           | binary              |      1 |

| geostd8  | GEOSTD8 Georgian                | geostd8_general_ci  |      1 |

| cp932    | SJIS for Windows Japanese       | cp932_japanese_ci   |      2 |

| eucjpms  | UJIS for Windows Japanese       | eucjpms_japanese_ci |      3 |

| gb18030  | China National Standard GB18030 | gb18030_chinese_ci  |      4 |

+----------+---------------------------------+---------------------+--------+

 

mysql> show character set where description like '%Korean%' ;

+---------+---------------+-------------------+--------+

| Charset | Description   | Default collation | Maxlen |

+---------+---------------+-------------------+--------+

| euckr   | EUC-KR Korean | euckr_korean_ci   |      2 |

+---------+---------------+-------------------+--------+

 

 

위에서 보면 Maxlen 칼럼이 보이는데 영어 이외의 언어를 저장하기 위해서는 2 이상의 값을 가진 언어 셋을 가져야만 한글이 DB 저장이 된다.

euckr 한글을 2바이트로 저장하고, utf8 한글을 3바이트로 저장한다. (공백이나 영문은 1바이트로 처리)

그래서 euckr 제한  한글만 표현이 가능하지만, 한글 1글자당 2바이트를 사용하므로, 제한된 한글만 사용하는 환경에서는 euckr 설정 하는 것이 유리

 

1. 간단하게 캐릭터 셋을 확인 하는 방법

 

mysql> status

--------------

mysql  Ver 14.14 Distrib 5.7.14, for linux-glibc2.5 (x86_64) using  EditLine wrapper

 

Connection id:                3836

Current database:

Current user:                root@localhost

SSL:                        Not in use

Current pager:                stdout

Using outfile:                ''

Using delimiter:        ;

Server version:                5.7.14-enterprise-commercial-advanced MySQL Enterprise Server - Advanced Edition (Commercial)

Protocol version:        10

Connection:                Localhost via UNIX socket

Server characterset:        latin1

Db     characterset:        latin1

Client characterset:        utf8

Conn.  characterset:        utf8

UNIX socket:                /tmp/mysql.sock

Uptime:                        1 hour 11 min 12 sec

 

Threads: 3  Questions: 33930  Slow queries: 0  Opens: 120  Flush tables: 1  Open tables: 113  Queries per second avg: 7.942

 

 

Servere characterset : 데이터베이스 생성  기본 값을 지정해 주지 않을 경우,  값으로 데이터 베이스 캐릭터 셋이 설정 .

DB      characterset : 테이블 생성  기본 값을 지정해 주지 않을 경우,  값으로 테이블의 캐릭터 셋이 설정 .

Client  characterset : 클라이언트 인코딩 설정 .

 

 

2. 데이터 베이스의 디폴트 캐릭터  설정

 

생성  설정해 주는 방법

CREATE DATABASE `utf8db` DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci ;

CREATE DATABASE `euekrdb` DEFAULT CHARACTER SET euckr COLLATE euckr_korean_ci ;

 

Alter 명령으로 변경 하는 방법

ALTER  DATABASE  db명 DEFAULT CHARACTER SET utf8 ;

 

데이터베이스 캐릭터  확인

SELECT schema_name , default_character_set_name

FROM information_schema.schemata ;

 

schema_name        |default_character_set_name |

-------------------|---------------------------|

information_schema |utf8                       |

euekrdb            |euckr                      |

mysql              |latin1                     |

performance_schema |utf8                       |

sys                |utf8                       |

test               |latin1                     |

utf8db             |utf8                       |

 

테이블의 캐릭터  설정  확인

CREATE TABLE 테이블명 (id int , name varchar(10)) DEFAULT CHARSET=utf8 ;

 

SELECT table_schema , table_name , table_collation

FROM information_schema.tables

WHERE table_schema = 'information_schema' AND table_name = 'PROCESSLIST'

 

table_schema       |table_name  |table_collation |

-------------------|------------|----------------|

information_schema |PROCESSLIST |utf8_general_ci | 

 

 

3. 세션레벨에서 인코딩  변경

 

임시적인 변경

 

set names 케릭터셋 ;

 

mysql> status

--------------

mysql  Ver 14.14 Distrib 5.7.14, for linux-glibc2.5 (x86_64) using  EditLine wrapper

 

Connection id:                3836

Current database:

Current user:                root@localhost

SSL:                        Not in use

Current pager:                stdout

Using outfile:                ''

Using delimiter:        ;

Server version:                5.7.14-enterprise-commercial-advanced MySQL Enterprise Server - Advanced Edition (Commercial)

Protocol version:        10

Connection:                Localhost via UNIX socket

Server characterset:        latin1

Db     characterset:        latin1

Client characterset:        utf8

Conn.  characterset:        utf8

UNIX socket:                /tmp/mysql.sock

Uptime:                        1 hour 11 min 12 sec

 

Threads: 3  Questions: 33930  Slow queries: 0  Opens: 120  Flush tables: 1  Open tables: 113  Queries per second avg: 7.942

--------------

 

mysql> set names latin1 ;

Query OK, 0 rows affected (0.00 sec)

 

mysql> status

--------------

mysql  Ver 14.14 Distrib 5.7.14, for linux-glibc2.5 (x86_64) using  EditLine wrapper

 

Connection id:                3836

Current database:

Current user:                root@localhost

SSL:                        Not in use

Current pager:                stdout

Using outfile:                ''

Using delimiter:        ;

Server version:                5.7.14-enterprise-commercial-advanced MySQL Enterprise Server - Advanced Edition (Commercial)

Protocol version:        10

Connection:                Localhost via UNIX socket

Server characterset:        latin1

Db     characterset:        latin1

Client characterset:        latin1

Conn.  characterset:        latin1

UNIX socket:                /tmp/mysql.sock

Uptime:                        1 hour 20 min 3 sec

 

Threads: 3  Questions: 38153  Slow queries: 0  Opens: 120  Flush tables: 1  Open tables: 113  Queries per second avg: 7.943

--------------

 

mysql> set names utf8 ;

Query OK, 0 rows affected (0.00 sec)

 

mysql> status

--------------

mysql  Ver 14.14 Distrib 5.7.14, for linux-glibc2.5 (x86_64) using  EditLine wrapper

 

Connection id:                3836

Current database:

Current user:                root@localhost

SSL:                        Not in use

Current pager:                stdout

Using outfile:                ''

Using delimiter:        ;

Server version:                5.7.14-enterprise-commercial-advanced MySQL Enterprise Server - Advanced Edition (Commercial)

Protocol version:        10

Connection:                Localhost via UNIX socket

Server characterset:        latin1

Db     characterset:        latin1

Client characterset:        utf8

Conn.  characterset:        utf8

UNIX socket:                /tmp/mysql.sock

Uptime:                        1 hour 25 min 38 sec

 

Threads: 3  Questions: 40805  Slow queries: 0  Opens: 120  Flush tables: 1  Open tables: 113  Queries per second avg: 7.941

--------------

 

 

영구적인 변경

 

/etc/my.cnf 파일에 캐릭터 셋을 명시  MySQL  기동.

 

) utf8 변경 

 

[client]

default-character-set = utf8

 

[mysqld]

skip-character-set-client-handshake

default-character-set = utf8

character-set-server = utf8

collation-server = utf8_general_ci

init-connect = SET NAMES utf8

 

[mysqldump]

default-character-set = utf8

 

[mysql]

default-character-set = utf8

 

 

4. 캐릭터  테스트

 

# Server   characterSet : latin1

# Database characterset : euckr

# Client   인코딩 값은 변경해 가면서 테스트

 

# Client 인코딩값을 데이터베이스 캐릭터 셋과 동일한 값으로 변경

mysql> set names euckr ;

mysql> status ;

 

Current database:        euekrdb

Current user:                root@localhost

Server characterset:        latin1

Db     characterset:        euckr

Client characterset:        euckr

Conn.  characterset:        euckr

 

#테스트 테이블 생성

mysql> create table et1  (id int , name varchar(10)) ;

mysql> create table eet1 (id int , name varchar(10)) DEFAULT CHARSET=euckr  ;

mysql> create table eut1 (id int , name varchar(10)) DEFAULT CHARSET=utf8 ;

mysql> commit ;

mysql> select table_name , table_collation from information_schema.tables where table_name like 'e%1' ;

+------------+-----------------+

| table_name | table_collation |

+------------+-----------------+

| eet1       | euckr_korean_ci |

| et1        | euckr_korean_ci |

| eut1       | utf8_general_ci |

+------------+-----------------+

캐릭터 셋을 명시적으로 지정해 주지 않았던 et1 테이블은 데이터베이스 기본 캐릭터 셋으로 설정 .

 

#데이터 Insert

mysql> insert into et1 values (1 , '1111' ) ;

mysql> commit ;

mysql> insert into eet1 values (1 , '1111' ) ;

mysql> commit ;

mysql> insert into eut1 values (1 , '1111' ) ;

mysql> commit;

 

mysql> select * from et1 ;

+------+--------+

| id   | name   |

+------+--------+

|    1 | 1111 |

+------+--------+

 

mysql> select * from eet1 ;

+------+--------+

| id   | name   |

+------+--------+

|    1 | 1111 |

+------+--------+

 

 

mysql> select * from eut1 ;

+------+--------+

| id   | name   |

+------+--------+

|    1 | 1111 |

+------+--------+

 

# 클라이언트 인코딩 변경

mysql> set names utf8 ;

mysql> status ;

 

Current database:        euekrdb

Server characterset:        latin1

Db     characterset:        euckr

Client characterset:        utf8

Conn.  characterset:        utf8

 

mysql> select * from et1 ;

+------+---------+

| id   | name    |

+------+---------+

|    1 | 11??11  |

+------+---------+

 

mysql> select * from eet1 ;

+------+---------+

| id   | name    |

+------+---------+

|    1 | 11??11  |

+------+---------+

 

mysql> select * from eut1 ;

+------+---------+

| id   | name    |

+------+---------+

|    1 | 11??11  |

+------+---------+

 

# 새로운 데이터 Insert

mysql> insert into et1 values (11 , '1111' ) ;

ERROR 1366 (HY000): Incorrect string value: '\xC8\xAB11' for column 'name' at row 1

 

mysql> insert into eet1 values (11 , '1111' ) ;

ERROR 1366 (HY000): Incorrect string value: '\xC8\xAB11' for column 'name' at row 1

 

mysql> insert into eut1 values (11 , '1111' ) ;

mysql> commit;

 

mysql> select * from eut1 ;

+------+---------+

| id   | name    |

+------+---------+

|    1 | 11??11  |

|   11 | 1111  |

+------+---------+

 

# 다시 클라이언트 인코딩 변경

mysql> set names euckr ;

mysql> status ;

 

Current database:        euekrdb

Server characterset:        latin1

Db     characterset:        euckr

Client characterset:        euckr

Conn.  characterset:        euckr

 

mysql> select * from eut1 ;

+------+--------+

| id   | name   |

+------+--------+

|    1 | 1111 |

|   11 | 11?11  |

+------+--------+

 

 

# Server   characterSet : latin1

# Database characterset : utf8

# Client  인코딩 값은 변경해 가면서 테스트

 

# Client 인코딩값을 데이터베이스 캐릭터 셋과 동일한 값으로 변경

mysql> set names utf8;

mysql> status

 

Current database:        utf8db

Server characterset:        latin1

Db     characterset:        utf8

Client characterset:        utf8

Conn.  characterset:        utf8

 

#테스트 테이블 생성

mysql> create table ut1  (id int , name varchar(10)) ;

mysql> create table uet1 (id int , name varchar(10)) DEFAULT CHARSET=euckr  ;

mysql> create table uut1 (id int , name varchar(10)) DEFAULT CHARSET=utf8 ;

mysql> commit ;

mysql> select table_name , table_collation from information_schema.tables where table_name like 'u%1' ;

+------------+-----------------+

| table_name | table_collation |

+------------+-----------------+

| uet1       | euckr_korean_ci |

| ut1        | utf8_general_ci |

| uut1       | utf8_general_ci |

+------------+-----------------+

 

#데이터 Insert

mysql> insert into ut1 values (1 , '1111' ) ;

mysql> commit ;

mysql> insert into uet1 values (1 , '1111' ) ;

ERROR 1366 (HY000): Incorrect string value: '\xC8\xAB11' for column 'name' at row 1

mysql> insert into uut1 values (1 , '1111' ) ;

mysql> commit;

 

mysql> select * from ut1 ;

+------+--------+

| id   | name   |

+------+--------+

|    1 | 1111 |

+------+--------+

 

mysql> select * from uet1 ;

Empty set (0.00 sec)

 

mysql> select * from uut1 ;

+------+--------+

| id   | name   |

+------+--------+

|    1 | 1111 |

+------+--------+

 

# 클라이언트 인코딩 변경

mysql> set names euckr ;

mysql> status

 

Current database:        utf8db

Server characterset:        latin1

Db     characterset:        utf8

Client characterset:        euckr

Conn.  characterset:        euckr

 

# 새로운 데이터 Insert

mysql> insert into ut1 values (11 , '1111' ) ;

mysql> commit ;

mysql> insert into uet1 values (11 , '1111' ) ;

mysql> commit ;

mysql> insert into uut1 values (11 , '1111' ) ;

mysql> commit ;

 

mysql> select * from ut1 ;

+------+--------+

| id   | name   |

+------+--------+

|    1 | 11?11  |

|   11 | 1111 |

+------+--------+

 

mysql> select * from uet1 ;

+------+--------+

| id   | name   |

+------+--------+

|   11 | 1111 |

+------+--------+

 

mysql> select * from uut1 ;

+------+--------+

| id   | name   |

+------+--------+

|    1 | 11?11  |

|   11 | 1111 |

+------+--------+

 

# 다시 클라이언트 인코딩 변경

mysql> set names utf8 ;

mysql> status

 

Current database:        utf8db

Server characterset:        latin1

Db     characterset:        utf8

Client characterset:        utf8

Conn.  characterset:        utf8

 

mysql> select * from ut1 ;

+------+---------+

| id   | name    |

+------+---------+

|    1 | 1111  |

|   11 | 11??11  |

+------+---------+

 

mysql> select * from uet1 ;

+------+---------+

| id   | name    |

+------+---------+

|   11 | 11??11  |

+------+---------+

 

mysql> select * from uet1 ;

+------+---------+

| id   | name    |

+------+---------+

|   11 | 11??11  |

+------+---------+

 

mysql> select * from uut1 ;

+------+---------+

| id   | name    |

+------+---------+

|    1 | 1111  |

|   11 | 11??11  |

+------+---------+

 

utf8상태는 한글을 3바이트로 저장을 한다. 그래서 utf8 테이블에 euckr 한글을 저장하게 되면 2바이트로 저장하게 되므로, 저장이 된다.

그러나 euckr 상태인 테이블에 utf8 데이터를 저장하려고 하면, 2바이트로 한글을 저장해야 하는데 3바이트로 저장을 시도하게 되므로 에러가 발생 한다.

 

테이블의 캐릭터 셋과 상관없이 한글을 저장할 당시의 인코딩 값과 일치하는 한글만 제대로 표현이 된다.

그래서 테이블 별로 캐릭터 셋을 지정해   있지만, 데이터 베이스 레벨에서 캐릭터 셋을 통일하여서 한글을 사용하면, 한글 깨짐 문제를 미리 방지할  있다.



출처: https://bstar36.tistory.com/307