Categories Machine Learning

A Beginner’s Guide to ClickHouse Database

[ad_1]

A Beginners Guide to ClickHouse Database
Image by Author | Canva Pro

 

In this tutorial, we will explore ClickHouse, which is a powerful and fast column-store database management system. The step-by-step guide will cover installing and setting up ClickHouse using the CURL command, creating databases, running SQL queries, and interacting with the ClickHouse database using both the native client and the Python client.

 

What is ClickHouse?

 

ClickHouse is an open-source column-oriented database management system that enables users to generate analytical reports using SQL queries in real-time. Designed for online analytical processing (OLAP), ClickHouse excels at handling large volumes of data, including streaming real-time data. Its columnar storage format allows for efficient data compression and fast query processing, making it particularly well-suited for data analytics.

In addition to its open-source version, ClickHouse offers a managed service known as ClickHouse Cloud. The cloud simplifies operational overhead, reduces costs, and provides a serverless experience with features like autoscaling, tenant isolation, and security measures.

 

Installing ClickHouse Server

 

If you are using Windows, you need to run these commands in WSL; for Linux and MacOS, you just have to type the following command in the terminal.

  1. Create the `clickhouse` folder and change the directory.
  2. Download the ClickHouse using the `curl` command.
$ mkdir ./clickhouse
$ cd ./clickhouse
$ curl https://clickhouse.com | sh

 

Output:

% Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  113M  100  113M    0     0  2363k      0  0:00:49  0:00:49 --:--:-- 2369k

Successfully downloaded the ClickHouse binary, you can run it as:
    ./clickhouse

You can also install it:
    sudo ./clickhouse install

 

Running the server will first install the ClickHouse and then run the server, which can take at least 5 minutes. 

 

That’s it. Our clickhouse server is running and can be accessed with any supported client, even with DBeaver.

 

ClickHouse Native Client

 

In this section, we will use the native ClickHouse client to create the database and table, insert values into the new table, and run some SQL queries. This can all be done through your preferred terminal.

Please launch a new terminal window and type the following command to connect the ClickHouse client with the server.

 

In return, we will see the following messaging in our terminal window. 

ClickHouse client version 24.9.1.1595 (official build).
Connecting to localhost:9000 as user default.
Connected to ClickHouse server version 24.9.1.

AbidTUFF15. :)

 

Creating a new dataBase

To create a new database, use the standard SQL command `CREATE DATABASE` followed by the database name, and press Enter. ClickHouse will respond with a success or error message if the command needs to be revised.

CREATE DATABASE Ecommerce

 

Creating a new table

To create a new table in the “Ecommerce” database, we need to provide the table name, column names with data types, engine type, and primary key.

CREATE TABLE Ecommerce.ElectronicSales
(
    customer_id UInt32,
    product_description String,
    sale_timestamp DateTime,
    sale_amount Float32
)
ENGINE = MergeTree
PRIMARY KEY (customer_id, sale_timestamp);

 

Inserting the values

To insert multiple values into the “ElectronicSales” table, we can write a multi-line SQL command in any editor and then copy and paste it into the ClickHouse client. This approach allows us to write complex SQL queries with ease and execute them in the ClickHouse client.

INSERT INTO Ecommerce.ElectronicSales (customer_id, product_description, sale_timestamp, sale_amount) VALUES
    (101, 'Smartphone XYZ', now(), -1.0),
    (102, 'Laptop ABC', yesterday(), 1.41421),
    (102, 'Wireless Headphones', today(), 2.718),
    (101, '4K Television', now() + 5, 3.14159);

 

We got a successful message that four rows have been added. 

Ok.

4 rows in set. Elapsed: 0.005 sec.

 

Running SQL query

To retrieve all the rows from the “ElectronicSales” table, we will execute a simple SQL query and order the results by the “sale_timestamp” column.

SELECT *
FROM Ecommerce.ElectronicSales
ORDER BY sale_timestamp;

 

A Beginners Guide to ClickHouse DatabaseA Beginners Guide to ClickHouse Database

 

It is both easy and fast. The query took only 0.002 seconds.

 

Loading external data

We can load the external or local data and run SQL queries using the ClickHouse table functions. In our case, we are loading the CSV file from GitHub repository. 

SELECT *
FROM url('https://raw.githubusercontent.com/kingabzpro/Data-Pipeline-with-Prefect/main/Online%20Sales%20Data.csv', 'CSV')
LIMIT 3

 

A Beginners Guide to ClickHouse DatabaseA Beginners Guide to ClickHouse Database

 

Inserting values into the table using the external data

To insert the CSV file into the table, we first need to create a new table with similar column names and data types. Don’t forget to write the engine type and primary key.

CREATE TABLE Ecommerce.GeneralSale
  (
     transactionid   INT,
     date            DATE,
     productcategory STRING,
     productname     STRING,
     unitssold       INT,
     unitprice       FLOAT,
     totalrevenue    FLOAT,
     region          STRING,
     paymentmethod   STRING
  ) 
ENGINE = MergeTree
PRIMARY KEY (TransactionID, Date);

 

After that, run the `INSERT INTO` command and provide the GitHub CSV file link as shown below.

INSERT INTO Ecommerce.GeneralSale
SELECT *
FROM   Url( 'https://raw.githubusercontent.com/kingabzpro/Data-Pipeline-with-Prefect/main/Online%20Sales%20Data.csv', 'CSV' ) 
SETTINGS input_format_allow_errors_num = 25000;

 

Just run the simple SQL query to check whether the values have been successfully added. 

SELECT *
FROM Ecommerce.GeneralSale
LIMIT 3

 

A Beginners Guide to ClickHouse DatabaseA Beginners Guide to ClickHouse Database

 

ClickHouse Python Integration

 

To access the ClickHouse server using Python, we first have to install the ClickHouse Connect Python package, which will let us run SQL queries on the ClickHouse database using Python. 

%pip install clickhouse-connect

 

We haven’t set the server username, password, or changed the port number. So, to connect to the default server, we just need to leave the `clickhouse_connect.get_client()` function as default. To verify if our database is successfully connected, we will run a sample query to view the top three rows of the “GeneralSale” table.

import clickhouse_connect

client = clickhouse_connect.get_client()
print(client.query('SELECT * FROM Ecommerce.GeneralSale LIMIT 3').result_set)

 

Output:

[(10001, datetime.date(2024, 1, 1), 'Electronics', 'iPhone 14 Pro', 2, 999.989990234375, 1999.97998046875, 'North America', 'Credit Card'), (10002, datetime.date(2024, 1, 2), 'Home Appliances', 'Dyson V11 Vacuum', 1, 499.989990234375, 499.989990234375, 'Europe', 'PayPal'), (10003, datetime.date(2024, 1, 3), 'Clothing', "Levi's 501 Jeans", 3, 69.98999786376953, 209.97000122070312, 'Asia', 'Debit Card')]

 

The `command` function lets us run SQL queries without returning anything. We have used it to create a new table called “SalesSample”.

client.command ("""CREATE TABLE SalesSample (
    TransactionID UInt16,
    ProductName String,
    Quantity UInt8,
    TotalAmount Float32
) ENGINE = Memory;
"""
)

 

We will now use the `insert` command to add values to the “SalesSample” table. 

data = [
    [1, "Laptop", 2, 1500.00],
    [2, "Smartphone", 5, 2500.00],
    [3, "Headphones", 10, 500.00],
]



client.insert('SalesSample', data)

 

To verify if the values have been successfully added to the “ElectronicSales” table, you can run a simple SQL query and view the result using the `.result_set`.

print(client.query('SELECT * FROM SalesSample LIMIT 3').result_set)

 

Output:

[(1, 'Laptop', 2, 1500.0), (2, 'Smartphone', 5, 2500.0), (3, 'Headphones', 10, 500.0)

 

Conclusion

 

ClickHouse is quite similar to DuckDB (Why is DuckDB Getting Popular? – KDnuggets) in terms of performance and functionality. ClickHouse’s popularity among modern web applications is largely attributed to its simplicity in installation and API usage. Developers appreciate the ease of setting up and integrating ClickHouse into their projects, allowing for a smoother and more efficient development process.

In this tutorial, we have learned about ClickHouse features and how to start the ClickHouse server using a few bash commands. We have also learned to use native and Python clients to create databases and tables, add values, and run simple SQL queries.

Do let me know if you are interested in learning how to build a project around ClickHouse or if you would like me to write a more advanced tutorial about ClickHouse.
 
 

Abid Ali Awan (@1abidaliawan) is a certified data scientist professional who loves building machine learning models. Currently, he is focusing on content creation and writing technical blogs on machine learning and data science technologies. Abid holds a Master’s degree in technology management and a bachelor’s degree in telecommunication engineering. His vision is to build an AI product using a graph neural network for students struggling with mental illness.

Our Top 3 Course Recommendations

1. Google Cybersecurity Certificate – Get on the fast track to a career in cybersecurity.

2. Google Data Analytics Professional Certificate – Up your data analytics game

3. Google IT Support Professional Certificate – Support your organization in IT

[ad_2]

More From Author

You May Also Like