CodePaper
About
Services
BlogContact

Filed Under: Data Analysis

SQL Excel Automation every Analyst should know

Here is a quick review of some of the basic SQL commands that should be common if you are an analyst or work with data on a day to day basis.

By Alex Quasar
data-analytics

Here is a quick review of some of the basic SQL commands that should be common if you are an analyst or work with data on a day to day basis.

These examples use a simplified database for hypothetical corporation, "ABC Box Company". This cutting edge firm sells different boxes of various different shapes,sizes and colors. Their database includes a `CustomerTable` which contains some information typical to a customer. Inside the `customerTable` we have columns like

  • CustomerName

  • CustomerNumber

  • PhoneNumber

  • Address

  • City

Each `customerNumber` in this table is unique. (i.e each row has a unqiue customerNumber and there is no duplicates. )

Also there will be a hypothetical `revenueTable` that contains the following columns:

  • InvoiceDate

  • InvoiceID

  • Category

  • CustomerNumber

  • Amount

Where the last column `Amount` is the invoice transaction amount and `InvoiceID` is unique for each record. Okay, now that is out of the way, let's get into some basic commands:

SELECT

The select statement tells the database what columns from what table you want to return. The*keyword tells SQL to return all columns from the database.

Example:

SELECT * FROM CustomerTable

Example:

Specify what columns you want to return before theFROMkeyword. Return Customer Number, Customer Name and City:

SELECT CustomerNumberm CustomerName,City 
FROM CustomerTable 

SELECT DISTINCT

Select Distinct is used when you want to return only unique values. Example: See all the unique cities where customers are from:

SELECT DISTINCT City FROM CustomerTable
    

COMMENTS

Comments are notes or documentation within the SQL query that do not get executed. For demo purposes I will write a bunch of comments in the Queries below to help illustrate some of the examples. There are two ways to write comments in SQL. Single line comments and multiline comments

  • Single Line Comments denoted using: `--`

  • Multi Line Comments are denoted using: `/* */`

Example:

-- I am single line comment
         
/* 
  I am a 
  multi-line 
  comment
*/


TOP N Rows with Order By

TheTOPcommand selects the top number of rows you specify from the table. This is useful when you only want to return the top 10 rows of your data set (i.e the last 10 customer purchases). It is also useful when you have a large dataset and want to see what information and columns are inside to avoid unnecessarily loading the server. Specify the column and order you want to order by using theORDER BYcommand along with the optional parameters ascending (ASC) or descending (DESC)

Example:

SELECT TOP 10 CustomerNumber 
FROM CustomerTable 
ORDER BY InvoiceDate DESC


WHERE

TheWHEREcommand is used to filter the data. Example:

Select CustomerName and CustomerNumber where the city is New York:

SELECT CustomerNumber,CustomerName 
FROM CustomerTable
  
WHERE city = 'New York'


LOGICAL OPERATORS

There are several logical operators in SQL which are typically used in conjunction with the `WHERE` command. Commonly used SQL logical operators are `AND`, `OR`, `IN`, `NOT IN`:

Example:

Select `CustomerName` and `CustomerNumber` where the city is New York and the the customer name is 'Ann Smith'
SELECT CustomerNumber,CustomerName 
FROM CustomerTable
WHERE city = 'New York' AND CustomerName = 'Ann Smith'

Example:

Select CustomerName and CustomerNumber where the city is New York or Los Angeles
SELECT CustomerNumber,CustomerName 
FROM CustomerTable
WHERE city = 'New York' OR city = 'Los Angeles'

Example:

Select CustomerName and CustomerNumber where the CustomerNumber is 'C-123456' or 'C-123457'
SELECT CustomerNumber,CustomerName 
FROM CustomerTable
WHERE CustomerNumber IN ('C-123456','C-123457')


LIKE with WildCard Match

TheLIKEcommand lets you filter for multiple values when you don't know the exact criteria you want to filter for. Using LIKE with regular expressions such as the`%`character creates wildcard matches. When a phrase is placed between two % characters, than the output will be anything containing those characters. When the`%`is placed after the phrase then it must contain those starting characters. Likewise, when placed before a phrase than it must end with the select characters. For example:

Example:

Find Customer Name and Number where CustomerName __contains__ characters "`Abe`"
SELECT CustomerNumber,CustomerName 
FROM CustomerTable
WHERE CustomerName LIKE '%ABE%'
-- Expected Sample Return: Cabel, Abel, Abe, Abe Lincoln

Example:

Find Customer Name and Number where CustomerName __starts with__ characters "`Abe`"
SELECT CustomerNumber,CustomerName 
FROM CustomerTable
WHERE CustomerName LIKE 'ABE%'

-- Expected Sample Return: Cabel, Abel, Abe, Abe Lincoln

Example:

Find Customer Name and Number where CustomerName __ends with__ characters "`Abe`"
SELECT CustomerNumber,CustomerName 
FROM CustomerTable
WHERE CustomerName LIKE '%ABE'

-- Expected Sample Return: Cabel, Abe


SUM

TheSUMcommand is used for summing a particular column. This is useful when you want to get the total amount of some numerical data.

Example:

Select All Revenue from the RevenueTable
SELECT SUM(amount) 
FROM RevenueTable

Example:

Select All Revenue from the RevenueTable where Invoice Date on Jan 1 2019 or later.
SELECT SUM(amount) 
FROM RevenueTable
WHERE InvoiceDate >= '2019-01-01'


COUNT

TheCOUNTcommand is used for counting up a particular column. This is useful when you want to get the total number of records for a particular column.

Example:

Count up the number of customers from the CustomerTable.

SELECT COUNT(CustomerNumber) 
FROM CustomerTable 

Example:

Count up the number of invoices from the RevenueTable.
SELECT COUNT(InvoiceID) 
FROM RevenueTable


MIN and MAX

TheMINandMAXcommand is used to find the min and max amount of something.

Example:

`--` Find the smallest invoice transaction
 SELECT MIN(amount) FROM RevenueTable
 
 `--` Find the largest invoice transaction
 SELECT MAX(amount) FROM RevenueTable


GROUP BY

The GROUP BY command is used with the COUNT and SUM commands and other data aggregators. This is useful when you want to see aggregate data grouped by a specific column.

Example:

Find the total spend grouped by each customerNumber
SELECT SUM(amount)
FROM RevenueTable
GROUP BY CustomerNumber

Example:

Find the total spend grouped by each `customerNumber` where the invoice date is after Jan 1 2019.
SELECT SUM(amount) FROM RevenueTable
WHERE InvoiceDate >= '2019-01-01'
GROUP BY CustomerNumber


CASE

TheCASEcommand is useful when you want to sum up or count based on a particular criteria. Example: Sum the total amount in the revenue table when the category is 'foo'

SELECT 
SUM(CASE WHEN Category = 'foo' THEN Amount ELSE 0 END ) as totalAmountFoo
FROM RevenueTable

Example:

Count up the number of invoices when the category is 'bar'
SELECT 
SUM(CASE WHEN Category = 'bar' THEN 1 ELSE 0 END ) as totalAmountFoo
FROM RevenueTable

We could have also changed the query to the following to give the same result:

SELECT 
COUNT(CASE WHEN Category = 'bar' THEN 1 ELSE null END ) as totalAmountFoo
FROM RevenueTable

TheCASEcommand can also be used to create a data hierarchy. For example you want to create a "Main Category" column based on the "Category" column already in your database.

Example

Create a `MainCategory` column called Foo_OR_Bar based on the category column.
SELECT 
   CASE 
     WHEN Category = 'foo' THEN 'Foo_OR_Bar'
     WHEN Category = 'bar' THEN 'Foo_OR_Bar'
     ELSE 'OTHER' END as MainCategory
FROM RevenueTable
We could have also shortened this to
SELECT 
   CASE 
     WHEN Category IN ( 'foo' ,'bar') THEN 'Foo_OR_Bar'
     ELSE 'OTHER' END as MainCategory
FROM RevenueTable


Primary and Foreign keys

If you spent a bit of time working with datasets or SQL, you may have heard of primary and foreign keys.

  • Aprimary keyis simply the column in a table that uniquely identifies each record in a table.

  • Aforeign keyis denoted as the column in one table that lets you join to another column in another table, typically a primary key.

Renaming table names (giving them an alias)

Since table names are often very long in a SQL database it is a good convention to give them an alias. This helps to keep your queries cleaner and avoid having to type out the table name over and over again. Define the alias right after you call the Table Name:

Example:

Create an alias and name it 'r' for the RevenueTable
SELECT * FROM RevenueTable as r
Short hand version:
SELECT * FROM RevenueTable r

You don't need to explicitly specify `as` to create the alias. Aliases come in handy when joining with other tables, since now you don't have to reference the full table name.


JOINS

So far the examples have been limited to querying only one table at a time. However, the real power behind SQL is the ability to join different datasets (tables) together. When you want to join on (ie bring in data from another table) use theJOINorLEFT JOINcommand.

The crucial difference is that with JOIN you bring in rows that only occur in both tables. With LEFT JOIN, you keep all the existing rows from the original table.

Example:

Join the revenue table on the customer table and create aliases 'r' and 'c' for them.
SELECT * FROM RevenueTable r
 LEFT JOIN CustomerTable c ON c.CustomerNumber = r.CustomerNumber

Using the*key will return all columns from both databases. We don't necessarily need this. Say we want to return

  • All the columns from revenueTable

  • The `customerName` from the customerTable

Then the following query will do the job:

SELECT r.*, c.CustomerName FROM RevenueTable r
LEFT JOIN CustomerTable c ON c.CustomerNumber = r.CustomerNumber


SUB QUERIES

Sub queries are used when you want to modify the table you intend to join on. You might want to do this for several reasons. For instance, the table is to large and you want to filter the table before you join, or the table contains invoice data and you want to aggregate the invoice data to the customer level before joining. Lets do the later:

Example:

Return the customer name and customer number along with the total amount each has spent:
SELECT c.CustomerName, c.CustomerNumber, SUM(Amount) as TotalAmount
 FROM CustomerTable c
 LEFT JOIN RevenueTable r ON c.CustomerNumber = r.CustomerNumber
 GROUP BY c.CustomerName,c.CustomerNumber
This is one way to do it, but it depending on how the size of each table it might be much faster to write using a subquery.
SELECT c.CustomerNumber,CustomerName, Total 
FROM CustomerTable c 
LEFT JOIN (

   SELECT CUstomerNumber, SUM(Amount) as Total
   FROM revenueTable r
   GROUP BY CustomerNumber
)  r ON c.CustomerNumber = r.CustomerNumber 
WHERE Total IS NOT NULL

The sub query is used ( ie. inner query inside the left join ) to first aggregrate the data at the customer level before joining it on the `customerTable`. You can join or left join on the customer table as many times as needed. For instance, if there is another table say `refundTable` than we could left join that right underneath like such:

SELECT c.CustomerNumber,CustomerName, Total, TotalRefund
FROM CustomerTable c 
LEFT JOIN (

   SELECT CUstomerNumber, SUM(Amount) as Total
   FROM revenueTable r
   GROUP BY CustomerNumber
)  r ON c.CustomerNumber = r.CustomerNumber 
LEFT JOIN (

   SELECT CUstomerNumber, SUM(Amount) as TotalRefund
   FROM refundTable r
   GROUP BY CustomerNumber

)  r ON c.CustomerNumber = r.CustomerNumber
WHERE Total IS NOT NULL


SQL DATES FUNCTIONS

Common built in SQL DATE functions includeYEAR,MONTH,GETDATE()andDATEDIFF. Some common use cases for SQL queries with dynamic dates.

Example:

The SQL function GETDATE() returns the date and time when the query was run. See below:

SELECT GETDATE(); // returns the date and time.

Wrapping GETDATE() inside of YEAR or MONTH gives back the the associated year and month number for the date.

To return today's revenue by customerNumber we must first convert the date and time from GETDATE to a date only. This can be done as follows:

SELECT CONVERT(date, GETDATE()); // returns today's date

Insert that into our query to get the revenue by customer for today.

SELECT CustomerNumber, SUM(Amount) as RevenueYTD 
FROM revenueTable r
WHERE  InvoiceDate = CONVERT(date, GETDATE())  
GROUP BY CustomerNumber

Example:

Return yesterday's revenue by customerNumber

SELECT CustomerNumber, SUM(Amount) as RevenueYTD 
FROM revenueTable r
WHERE  InvoiceDate = CONVERT(date, GETDATE() -1) 
GROUP BY CustomerNumber

Example:

Return revenue by for the last 7 days by customerNumber not including today

SELECT CustomerNumber, SUM(Amount) as RevenueYTD 
FROM revenueTable r
WHERE  
   InvoiceDate >= CONVERT(date, GETDATE() -7) 
   AND InvoiceDate != CONVERT(date, GETDATE())
GROUP BY CustomerNumber

Example:

Return year-to-date revenue by customerNumber

SELECT CustomerNumber, SUM(Amount) as RevenueYTD 
FROM revenueTable r
WHERE YEAR(InvoiceDate) >= YEAR(GETDATE())
GROUP BY CustomerNumber

Example:

Return year-to-date revenue by customerNumber grouped by Year and Month also

SELECT 
   CustomerNumber, 
   YEAR(InvoiceDate) as InvoiceYear,
   MONTH(InvoiceDate) as InvoiceMonth
   SUM(Amount) as RevenueYTD 
FROM revenueTable r
WHERE YEAR(InvoiceDate) >= YEAR(GETDATE())
GROUP BY 
  CustomerNumber,
  YEAR(InvoiceDate),
  MONTH(InvoiceDate)

Example:

Return year-to-date revenue for the company grouped by Year and Month.

SELECT 
       YEAR(InvoiceDate) as InvoiceYear,
       MONTH(InvoiceDate) as InvoiceMonth
       SUM(Amount) as RevenueYTD 
   FROM revenueTable r
   WHERE YEAR(InvoiceDate) >= YEAR(GETDATE())
   GROUP BY 
      YEAR(InvoiceDate),
      MONTH(InvoiceDate)

Example:

Compare revenue for this year vs last year for the company

SELECT 
   SUM(CASE WHEN YEAR(InvoiceDate) = YEAR(GETDATE()) THEN Amount ELSE 0 END) as RevenueThisYear,
   SUM(CASE WHEN YEAR(InvoiceDate) = YEAR(GETDATE()) -1 THEN Amount ELSE 0 END) as RevenueThisYear
   FROM revenueTable r
   WHERE YEAR(InvoiceDate) >= YEAR(GETDATE()) -1

Example:

Compare revenue for this year vs last year for the company by month

SELECT 
   MONTH(InvoiceDate) as InvoiceMonth,
   SUM(CASE WHEN YEAR(InvoiceDate) = YEAR(GETDATE()) THEN Amount ELSE 0 END) as RevenueThisYear,
   SUM(CASE WHEN YEAR(InvoiceDate) = YEAR(GETDATE()) -1 THEN Amount ELSE 0 END) as RevenueThisYear
   FROM [SALESLINK_REPORTING].[dbo].[v_RevenueXrep] r

   WHERE YEAR(InvoiceDate) >= YEAR(GETDATE()) -1
   GROUP BY MONTH(invoiceDate)
   ORDER BY MONTH(InvoiceDate)

Example:

Compare revenue for this year vs last year for the company by month renaming month number to month name using the CASE SQL command.

SELECT 

   CASE 
	WHEN MONTH(InvoiceDate) = 1 THEN 'Jan'
	WHEN MONTH(InvoiceDate) = 2 THEN 'Feb'
	WHEN MONTH(InvoiceDate) = 3 THEN 'Mar'
	WHEN MONTH(InvoiceDate) = 4 THEN 'Apr'
	WHEN MONTH(InvoiceDate) = 5 THEN 'May'
	WHEN MONTH(InvoiceDate) = 6 THEN 'Jun'
	WHEN MONTH(InvoiceDate) = 7 THEN 'Jul'
	WHEN MONTH(InvoiceDate) = 8 THEN 'Aug'
	WHEN MONTH(InvoiceDate) = 9 THEN 'Sep'
	WHEN MONTH(InvoiceDate) = 10 THEN 'Oct'
	WHEN MONTH(InvoiceDate) = 11 THEN 'Nov'
	WHEN MONTH(InvoiceDate) = 12 THEN 'Dec'
	ELSE 'N/A'
	END as InvoiceMonth,

   SUM(CASE WHEN YEAR(InvoiceDate) = YEAR(GETDATE()) THEN Amount ELSE 0 END) as RevenueThisYear,
   SUM(CASE WHEN YEAR(InvoiceDate) = YEAR(GETDATE()) -1 THEN Amount ELSE 0 END) as RevenueThisYear
   FROM [SALESLINK_REPORTING].[dbo].[v_RevenueXrep] r

   WHERE YEAR(InvoiceDate) >= YEAR(GETDATE()) -1
   GROUP BY MONTH(invoiceDate)
   ORDER BY MONTH(InvoiceDate)

SQL DATE FUNCTION II (Using Dynamic Dates with Weeks)

There is currently no built in SQL function for Weeks, depending on the flavor or version of SQL that is used. This can make getting dates a bit harder. A universal SQL way to get dates is to combine the DATEDIFF and DATEADD functions as follows:

Example

Get the start of the current week based on Monday:

SELECT DATEADD(day, DATEDIFF(day, 0, GETDATE()) /7*7, 0) as WeekStart

Get the start of the current week based on Sunday:

SELECT DATEADD(day, DATEDIFF(day, 0, GETDATE()) /7*7, 0) as WeekStart

Example

Find the day the week will end based on today's date:

DATEADD(day, DATEDIFF(day, 6, CallDate-1) /7*7 + 7, 6) AS WeekEnd

Example

Determine the week over week revenue for the company this year from the revenueTable where the starting week is every Monday. Have the current week showing first.

SELECT 
      DATEADD(day, DATEDIFF(day, 0, [InvoiceDate]) /7*7, 0) as WeekStart,
      SUM([Amount]) as Weekly_Revenue	
FROM RevenueTable
WHERE YEAR(InvoiceDate) = YEAR(GETDATE())
GROUP BY DATEADD(day, DATEDIFF(day, 0, [InvoiceDate]) /7*7, 0)
ORDER BY DATEADD(day, DATEDIFF(day, 0, [InvoiceDate]) /7*7, 0) DESC



Step 2: Getting the Data into Excel

Using the GET DATA Feature in Excel

You can insert custom queries into excel and use refresh all inside your excel workbook. This always ensures your data is up-to-date and eliminates the need for vlookups and reduces potential copy paste errors that may occur when manipulating data in multiple steps manually. It also saves time for reporting as all the data is now there from the server and your workflow ( data manipulation steps) can be automated using the Query Editor feature in Excel.

In Excel versions 2016 or later select 'Data'in the top ribbon. You should see the 'Get Data'option at the top left corner. For SQL server database, select'From Database'and From 'SQL Server Database'. You can then paste your custom query intoAdvanced Options. For more data manipulation and to rename your table hit 'Edit'. This will open up the 'Query Editor' giving a whole slew of data manipulation options such as replacing values, appending or merging other queries, and adding in custom conditional columns.

** Note in older versions of Excel you will need to download the __Power Query__ add-on.

Using the Query Editor Features to transform your data

The `Query Editor` features have a ton of other options you can use for cleaning and transforming your data. Most of these are pretty self explanatory on the user interface. Here are some of the top features I use

  1. The Append method: The append method stacks two or more tables on top of each other. This is useful when you have data in two separate servers and it is not possible to append using a SQL command. When appending, make sure to have the same column names.

  2. The Merge Method: The merge feature is like a SQL left join. You can use this for joining different SQL tables together when: a) The data is in different servers or locations. b) You rather use the Query Editor features in excel to transform and manipulate the data.

Continue Reading

Navigation
HomeAboutServicesBlogContactPrivacy PolicyTerms & Conditions
Follow Us
Services
Websites
Contact Us

codepaper.dev@gmail.com

587.501.7726

Subscribe To Our Newsletter

Stay up to the date with the latest in Automation and CodePaper by subscribing to our newsletter

Special Thanks To