Mastering MySQL Data Types: CHAR vs. VARCHAR with a Dash of Matillion ETL

Hi, As I dive into the intricacies of MySQL databases and Matillion ETL, I'm seeking some enlightenment regarding the disparities between CHAR and VARCHAR. Specifically, I'm eager to understand how these choices might harmonize within the Matillion ETL environment. Could someone provide some guidance on this intricate dance?

 

Let's begin by sharing a CodeIgniter model snippet and MySQL table structure:

 

[code]class Customer_model extends CI_Model {

  public function add_customer($customer_id, $customer_name, $email) {

    $data = array(

      'customer_id' => $customer_id,

      'customer_name' => $customer_name,

      'email' => $email

    );

 

    $this->db->insert('customers', $data);

    return $this->db->insert_id();

  }

}

[/code]

 

[code]

CREATE TABLE customers (

  id INT AUTO_INCREMENT PRIMARY KEY,

  customer_id CHAR(8),

  customer_name VARCHAR(50),

  email VARCHAR(255)

);

[/code]

 

Question 1: CHAR vs. VARCHAR in MySQL

For the customer_id, I've used CHAR(8), and for customer_name, I've chosen VARCHAR(50). Can someone unravel the key distinctions between CHAR and VARCHAR in MySQL, specifically considering storage efficiency? How might these choices align with data transformation processes in Matillion ETL?

 

Question 2: Matillion ETL Integration

Considering the Matillion ETL tool, are there considerations or best practices when working with VARCHAR fields like customer_name or email in MySQL? How does the choice between CHAR and VARCHAR influence data transformations within Matillion ETL, and are there optimizations or configurations to enhance compatibility?

 

Question #3: ETL Efficiency with MySQL Data Types

In the context of data transfer between MySQL and Matillion ETL, does the decision between CHAR and VARCHAR affect the performance of data transformation and loading processes? Are there any special Matillion ETL capabilities or setups that work better with one sort of data than another?

 

Question 4: The Impact on Matillion ETL Job Design

Finally, when developing ETL tasks in Matillion, how do the MySQL data types used affect the design and efficiency of transformation components? Are there any disadvantages or advantages to utilizing CHAR or VARCHAR fields in MySQL when communicating with Matillion ETL job components?

 

Your wisdom on these questions will be a beacon as I navigate the MySQL landscape in tandem with Matillion ETL. Thanks a million for sharing your insights!

 

 

 

 

Hi @vish234al

Thank you for a really well-written question, I took this away and spoke to my colleagues about your questions on this subject and have a series of answers for you, of course, the community can also share their thoughts too!

Question 1:

CHAR fields are fixed-length; a CHAR(8) will always use 8 space units, and values shorter than this will be marked with a terminating character. VARCHAR stores the number of characters that are required, and no more. VARCHAR is always more space efficient. However, storing a variable number of characters comes at a price - if all fields in a record are fixed-length, then the DB can access the next record in storage by jumping a fixed distance ahead. This is very fast. The jump ahead for variable length fields must either be calculated each time, or for longer fields (MySQL docs suggest 768 characters) they are stored 'off page', ie. in a separate storage area, and need to be looked up when referenced. (The DB won't look them up unless it needs to, however.) This is slower for some operations, but since less data need to be read from disk, may actually be faster in some cases.

For things like customer IDs, which are short and usually a fixed length, CHAR makes sense. Transitioning to VARCHAR needs to be judged on a 'contents' basis. For things like customer names, which will not (usually) be needed in lookup operations and which can be quite variable in length, especially as part of the size of the complete record, VARCHAR makes sense. Which is 'faster' will also depend on fine details like database and driver version; the database default values are usually well-tuned for most use cases.

Matillion ETL uses Java 'String' types internally for all character-like types and usually works in 'push down' mode for transformations, handing these decisions off to the database. Your decision should be made based on the information to be stored in that column only.

Question 2:

As before; no. METL usually won't handle your data directly, and when it does (eg. in staging operations), it uses a generic string data type.

Question 3:

In contexts of data transfer, the most efficient storage type in terms of space will be optimal - whatever is most compact will transfer faster. METL does not benefit internally from CHAR v VARCHAR types.

Question 4:

There are no advantages either way - our transformation components work equally well on CHAR and VARCHAR. The SQL which is generated will be executed by the database engine, which may work more efficiently on certain types, as per the first answer.

I hope this helps, feel free to reach out with any other questions.

Kind regards, Joe

hi joe,

thank you so much for your help