Presto has evolved into a unified SQL engine on top of cloud data lakes for both interactive queries and batch workloads across multiple data sources. This tutorial will show how to run SQL queries with Presto (running on Kubernetes) against AWS Redshift.

Presto's Redshift connector allows querying the data stored in an external Amazon Redshift cluster. This can be used to join data between different systems like Redshift and Hive, or between two different Redshift clusters.

Step 1: Set Up a Presto Cluster With Kubernetes

Set up your own Presto cluster on Kubernetes using these instructions, or use Ahana's managed service for Presto.

Step 2: Set Up an Amazon Redshift Cluster

Create an Amazon Redshift cluster from the AWS Console and make sure it is up and running with the dataset and tables. The screen below shows the Amazon Redshift cluster "redshift-presto-demo". Further, the JDBC URL from the cluster is required to set up the Redshift connector with Presto.

You can skip this section if you want to use an existing Redshift cluster; just make sure your Redshift cluster is accessible from Presto, because AWS services are secure by default. Even if you have created your Amazon Redshift cluster in a public VPC, the security group assigned to the target Redshift cluster can still prevent inbound connections to the database cluster. In simple words, the security group settings of the Redshift database play the role of a firewall and can block inbound database connections over port 5439. Find the assigned security group and check its inbound rules. If your Presto compute plane VPC and your data sources are in different VPCs, you also need to configure a VPC peering connection.

Step 3: Configure a Presto Catalog for the Amazon Redshift Connector

To configure the Redshift connector, create a catalog properties file in etc/catalog named, for example, redshift.properties, to mount the Redshift connector as the redshift catalog. Create the file with the following contents, replacing the connection properties as appropriate for your setup:

connector.name=redshift
connection-url=jdbc:postgresql://:5439/database
connection-password=secret

This is how my catalog properties file, my_redshift.properties, looks:

connection-url=jdbc:postgresql://.:5439/dev

Step 4: Check for available datasets, schemas, and tables, and run SQL queries with the Presto client to access the Redshift database

After successfully connecting to Amazon Redshift, you can connect to the Presto CLI and run the following queries. Make sure that the Redshift catalog gets picked up, and perform show schemas and show tables to understand the available data.

$ ./presto-cli.jar --server --catalog my_redshift --schema --user --password

In the below example you can see that a new catalog for the Redshift database got initialized, called "my_redshift":

Query 20210810_173543_00209_krtkp, FINISHED, 2 nodes

Further, you can check all the schemas available for your Amazon Redshift from Presto. Here, I have used the sample data that comes with the Redshift cluster setup. I have chosen the schema "public", which is part of the "dev" Redshift database.

presto> show tables from my_redshift.public

Query 20210810_185448_00211_krtkp, FINISHED, 3 nodes

Further, you can explore tables, such as "sales" in the below example:

salesid | listid | sellerid | buyerid | eventid | dateid | qtysold | pricepaid | commission | saletime

Query 20210810_185527_00212_krtkp, FINISHED, 1 node
0:03

Following are some more complex queries you can run against the sample data:

presto:public> -- Find top 10 buyers by quantity.
> SELECT firstname, lastname, total_quantity
> FROM (SELECT buyerid, sum(qtysold) total_quantity
>       FROM sales
>       GROUP BY buyerid
>       ORDER BY total_quantity desc limit 10) Q, users
> WHERE Q.buyerid = userid
> ORDER BY Q.total_quantity desc;

presto:public> -- Find events in the 99.9 percentile in terms of all-time gross sales.
> SELECT eventid, total_price
> FROM (SELECT eventid, total_price, ntile(1000) over(order by total_price desc) as percentile
>       FROM (SELECT eventid, sum(pricepaid) total_price
>             FROM sales
>             GROUP BY eventid)) Q
> WHERE percentile = 1
> ORDER BY total_price desc;

Step 5: Run a SQL query to join data between different systems like Redshift and Hive

Another great use case of Presto is data federation. In this example, I will join the Apache Hive table with the Amazon Redshift table and run a JOIN query that accesses both tables from Presto.
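The shape of the "top 10 buyers" query from Step 4 (aggregate per buyer in a subquery, keep the top N, then join back to users for display names) can be sketched locally with an in-memory SQLite database. The schema mimics the Redshift sample sales/users tables, but the table contents here are made-up illustration data, not the tutorial's dataset.

```python
# Sketch of the "top 10 buyers by quantity" pattern using SQLite as a
# stand-in for Presto/Redshift. The rows inserted below are invented
# illustration data, not the Redshift sample dataset.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (buyerid INTEGER, qtysold INTEGER)")
conn.execute("CREATE TABLE users (userid INTEGER, firstname TEXT, lastname TEXT)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [(1, 5), (1, 3), (2, 4), (3, 1)])
conn.executemany("INSERT INTO users VALUES (?, ?, ?)",
                 [(1, "Ada", "Lovelace"), (2, "Alan", "Turing"),
                  (3, "Grace", "Hopper")])

# Same shape as the tutorial's query: aggregate per buyer in a subquery,
# keep the top 10, then join to users for the display names.
rows = conn.execute("""
    SELECT firstname, lastname, total_quantity
    FROM (SELECT buyerid, SUM(qtysold) AS total_quantity
          FROM sales
          GROUP BY buyerid
          ORDER BY total_quantity DESC
          LIMIT 10) Q, users
    WHERE Q.buyerid = users.userid
    ORDER BY total_quantity DESC
""").fetchall()
print(rows)   # -> [('Ada', 'Lovelace', 8), ('Alan', 'Turing', 4), ('Grace', 'Hopper', 1)]
```

The same SQL runs unchanged on Presto once the table names are qualified with the catalog and schema (for example, my_redshift.public.sales).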
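The percentile query in Step 4 relies on the ntile window function: ntile(1000) over an ordering by gross sales splits the events into 1000 equal buckets, so bucket 1 is the top 0.1% of events. SQLite (3.25+) supports the same window function, so the mechanics can be demonstrated locally; the 2000 synthetic events below are an assumption for illustration.

```python
# Demonstrates the ntile(1000) percentile trick from Step 4 on synthetic
# data: 2000 events with ascending gross sales, so ntile bucket 1
# (the top 0.1%) contains the two highest-grossing events.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (eventid INTEGER, pricepaid REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [(i, float(i)) for i in range(1, 2001)])

rows = conn.execute("""
    SELECT eventid, total_price
    FROM (SELECT eventid, total_price,
                 ntile(1000) OVER (ORDER BY total_price DESC) AS percentile
          FROM (SELECT eventid, SUM(pricepaid) AS total_price
                FROM sales
                GROUP BY eventid))
    WHERE percentile = 1
    ORDER BY total_price DESC
""").fetchall()
print(rows)   # -> [(2000, 2000.0), (1999, 1999.0)]
```

With 2000 events and 1000 tiles, each tile holds two events, which is why exactly the two top-grossing events land in percentile 1.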
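The data-federation idea in Step 5 — one query spanning a Hive catalog and a Redshift catalog — can be sketched with SQLite's ATTACH DATABASE: two independent databases on one connection, joined in a single SQL statement. The table and column names below are illustrative assumptions, not the tutorial's actual Hive or Redshift tables.

```python
# Sketch of Presto-style data federation using SQLite as a stand-in:
# two separate databases (imagine one is Hive, one is Redshift) are
# attached to one connection and joined in a single query.
import sqlite3

conn = sqlite3.connect(":memory:")   # plays the role of the "Hive" side
conn.execute("CREATE TABLE events (eventid INTEGER, eventname TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?)",
                 [(1, "Opera"), (2, "Concert")])

# Attach a second, independent database -- the "Redshift" side.
conn.execute("ATTACH DATABASE ':memory:' AS redshift")
conn.execute("CREATE TABLE redshift.sales (eventid INTEGER, pricepaid REAL)")
conn.executemany("INSERT INTO redshift.sales VALUES (?, ?)",
                 [(1, 100.0), (1, 50.0), (2, 75.0)])

# One SQL statement spanning both "systems", just as Presto joins a
# table from the hive catalog with one from the my_redshift catalog.
rows = conn.execute("""
    SELECT e.eventname, SUM(s.pricepaid) AS total_price
    FROM events e
    JOIN redshift.sales s ON e.eventid = s.eventid
    GROUP BY e.eventname
    ORDER BY total_price DESC
""").fetchall()
print(rows)   # -> [('Opera', 150.0), ('Concert', 75.0)]
```

In Presto the qualification is catalog.schema.table (for example, hive.default.events joined with my_redshift.public.sales), but the JOIN itself is written exactly the same way.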