Gamma Gamma Hey Part One
12 June 2022
Gamma Gamma Hey (an options analysis) Part One:
Primary Data Collection and Cleaning
Abstract
In this first post in a series analyzing real-world derivatives data, we set up a Raspberry Pi 4 with an R Shiny Server. R and shell scripts are used to create a table within a PostgreSQL database to hold the options data. Additional scripts and a cron job are used to automate scraping data from the CBOE website and appending it to that table. An R Markdown document is used to take an initial look at our data.
Background
I have been doing some thinking over the past months about how my previous projects have been guilty of trying to cover too much ground at once. You take some data, you slap a model on it, voila! A true data project should be more than that. It should grow in time with your understanding of the data, and truly understanding that data requires more than a one-off project and a bit of research. It is with this intention that I am starting this post as a series going through the entire lifecycle of a data project, from inception and data sourcing to advanced models and system maintenance.

In this first post we start our journey from a blank slate. We have no fancy models. We have no data. We have nothing except a Raspberry Pi 4 and a curiosity about options. The options market can be a (very dangerous) playground for a statistics nerd. Plenty of APIs for stock data can be reached from R through readily available libraries. Considerably fewer sources exist for options, but I am spoiled. I don’t want Yahoo Finance data on an API someone else built. Why have room temperature Dasani when you can go drink from an Alpine spring? I want the source! Bring me to Pablo!

That being said, the only rational conclusion is to scrape data directly from the CBOE ourselves. I don’t just want this data for today, nor am I satisfied with the limited historical data on offer. Predictably, that will cost you, which is not a good start for an enterprise aimed primarily at not losing money! We may not be able to recover past data (or tick-level data), but one solution is to create our own database, starting today.
Setting up the Raspberry Pi and R Shiny Server
Hardware
The Raspberry Pi is the perfect tool to carry out our scraping needs. It’s portable (currently hidden behind a nightstand), it uses very little power, it saves me from keeping my main laptop on all the time, and we can dedicate it solely to this project.
system(
"cat /sys/firmware/devicetree/base/model",
intern = TRUE
)
## [1] "Raspberry Pi 4 Model B Rev 1.4"
Memory
I am using a 128 GB SD card to hold both the OS and all of our data (we will probably need to mount external storage eventually). For our operating system we want the lightest option that can still handle the task(s) at hand. We will be using Debian Bullseye.
system(
"df -h",
intern = TRUE
)
## [1] "Filesystem Size Used Avail Use% Mounted on"
## [2] "/dev/root 118G 15G 98G 14% /"
## [3] "devtmpfs 3.7G 0 3.7G 0% /dev"
## [4] "tmpfs 3.9G 16K 3.9G 1% /dev/shm"
## [5] "tmpfs 1.6G 776K 1.6G 1% /run"
## [6] "tmpfs 5.0M 4.0K 5.0M 1% /run/lock"
## [7] "/dev/mmcblk0p1 253M 31M 222M 12% /boot"
## [8] "tmpfs 787M 0 787M 0% /run/user/1000"
Setting up the R Shiny Server
I followed this guide to install R Shiny Server on a Raspberry Pi. There is an updated article that uses an Ansible build and would probably play much nicer with my 64-bit Raspberry Pi, but meh. The instructions (mostly) work, and if you take enough stress-breaks you’ll get there. The main workaround involves getting 64-bit builds of NGINX and Node for this architecture instead of the 32-bit ones. Nothing too crazy. This does take about a day though, so make sure you bring snacks.
version[c(2,13)]
## _
## arch aarch64
## version.string R version 4.0.2 (2020-06-22)
installed.packages()[,3]
## askpass assertthat backports bench bit
## "1.1" "0.2.1" "1.4.1" "1.1.2" "4.0.4"
## bit64 blob boot brio broom
## "4.0.5" "1.2.3" "1.3-28" "1.1.3" "0.8.0"
## callr class cluster codetools colorspace
## "3.7.0" "7.3-20" "2.1.3" "0.2-18" "2.0-3"
## covr cpp11 crayon curl data.table
## "3.5.1" "0.4.2" "1.5.1" "4.3.2" "1.14.2"
## DBI dbplyr desc diffobj dplyr
## "1.1.2" "2.2.0" "1.4.1" "0.3.5" "1.0.9"
## ellipsis evaluate fansi farver filehash
## "0.3.2" "0.15" "1.0.3" "2.1.0" "2.4-3"
## foreign generics ggplot2 glue gridBase
## "0.8-82" "0.1.2" "3.3.6" "1.6.2" "0.4-7"
## gridExtra gtable highr httr igraph
## "2.3" "0.3.0" "0.9" "1.4.3" "1.3.1"
## isoband jsonlite KernSmooth knitr labeling
## "0.2.5" "1.8.0" "2.23-20" "1.39" "0.4.2"
## Lahman lattice lazyeval lifecycle lobstr
## "10.0-1" "0.20-45" "0.2.2" "1.0.1" "1.1.1"
## lubridate magrittr MASS Matrix memoise
## "1.8.0" "2.0.3" "7.3-57" "1.4-1" "2.0.1"
## mgcv microbenchmark munsell nlme nnet
## "1.8-40" "1.4.9" "0.5.0" "3.1-157" "7.3-17"
## nycflights13 openssl pdftools pillar pkgconfig
## "1.0.2" "2.0.2" "3.2.1" "1.7.0" "2.0.3"
## pkgload plogr plyr png praise
## "1.2.4" "0.2.0" "1.8.7" "0.1-7" "1.0.0"
## processx profmem ps purrr qpdf
## "3.6.0" "0.6.0" "1.7.0" "0.3.4" "1.2.0"
## R6 RColorBrewer rematch2 reshape2 rex
## "2.5.1" "1.1-3" "2.1.2" "1.4.4" "1.2.1"
## rlang rmarkdown rpart rprojroot RSQLite
## "1.0.2" "2.14" "4.1.16" "2.0.3" "2.2.14"
## scales spatial stringi stringr survival
## "1.2.0" "7.3-15" "1.7.6" "1.4.0" "3.3-1"
## sys testthat tibble tidyr tidyselect
## "3.4" "3.1.4" "3.1.7" "1.2.0" "1.1.2"
## tikzDevice tinytex treemap utf8 vctrs
## "0.12.3.1" "0.39" "2.4-3" "1.2.2" "0.4.1"
## viridisLite waldo wesanderson xfun yaml
## "0.4.0" "0.4.0" "0.3.6" "0.31" "2.3.5"
## evaluate getPass highr knitr markdown
## "0.15" "0.2-2" "0.9" "1.39" "1.1"
## rmarkdown RPostgreSQL rstudioapi stringr timeDate
## "2.14" "0.7-3" "0.13" "1.4.0" "3043.102"
## tinytex xfun yaml base base64enc
## "0.39" "0.31" "2.3.5" "4.0.2" "0.1-3"
## boot bslib cachem Cairo class
## "1.3-25" "0.3.1" "1.0.6" "1.5-15" "7.3-17"
## cli cluster codetools commonmark compiler
## "3.3.0" "2.1.0" "0.2-16" "1.8.0" "4.0.2"
## crayon datasets DBI digest dplyr
## "1.5.1" "4.0.2" "1.1.2" "0.6.29" "1.0.9"
## ellipsis fansi fastmap fontawesome foreign
## "0.3.2" "1.0.3" "1.1.0" "0.2.2" "0.8-80"
## fs generics glue graphics grDevices
## "1.5.2" "0.1.2" "1.6.2" "4.0.2" "4.0.2"
## grid htmltools httpuv jquerylib jsonlite
## "4.0.2" "0.5.2" "1.6.5" "0.1.4" "1.8.0"
## KernSmooth later lattice lifecycle magrittr
## "2.23-17" "1.3.0" "0.20-41" "1.0.1" "2.0.3"
## MASS Matrix methods mgcv mime
## "7.3-51.6" "1.2-18" "4.0.2" "1.8-31" "0.12"
## nlme nnet parallel pillar pkgconfig
## "3.1-148" "7.3-14" "4.0.2" "1.7.0" "2.0.3"
## promises purrr R6 rappdirs Rcpp
## "1.2.0.1" "0.3.4" "2.5.1" "0.3.3" "1.0.8.3"
## rlang rpart sass shiny sourcetools
## "1.0.2" "4.1-15" "0.4.1" "1.7.1" "0.1.7"
## spatial splines stats stats4 stringi
## "7.3-12" "4.0.2" "4.0.2" "4.0.2" "1.7.6"
## survival tcltk tibble tidyselect tools
## "3.1-12" "4.0.2" "3.1.7" "1.1.2" "4.0.2"
## utf8 utils vctrs withr xtable
## "1.2.2" "4.0.2" "0.4.1" "2.5.0" "1.8-4"
Before we proceed
A couple of things to keep in mind when setting this up. Our Raspberry Pi is good for scraping, but it’s not going to be training neural nets anytime soon. We just want to use it to grab data and store it. We can still use the Raspberry Pi for some simple plots (and to create this R Markdown), but anything more in the way of heavy lifting will probably be done on a more powerful machine.
Setting up PostgreSQL
So now we have a situation where we will ideally be collecting an ever-increasing amount of data. Saving this data as an .Rdata object or a .csv would quickly add up in terms of storage, and we would be stuck importing all of the data before we could filter out what we wanted, slowing down every analysis that touches it. In addition, we previously stated that we want to be able to run more advanced models on other machines, and sending a large data file around through SCP is tedious and tough to replicate at scale. PostgreSQL is a lightweight RDBMS that is already included in our Shiny build, so let’s just use that. We will be using the public schema, but be sure to set up a database before proceeding. R has many packages that allow users to connect to a PostgreSQL database; we will be using RPostgreSQL and DBI, but there are others.
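As a rough sketch of that setup step (assuming PostgreSQL is already installed by the Shiny build, and using sebi as both the role name and the database name purely to mirror the connection settings used later in this post), the role and database can be created from the Pi like so:
# a minimal sketch, not run here: create a PostgreSQL role and database
# the name "sebi" is an assumption that mirrors the dbname/user used later on
system("sudo -u postgres createuser --pwprompt sebi")
system("sudo -u postgres createdb --owner=sebi sebi")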
Scraping options data from the CBOE
We can scrape data directly from the CBOE delayed quotes page. Here is an example of the SPY option chain. This provides plenty of info, including the Greeks, already calculated for us. Unfortunately, the CBOE uses scripts on their page to prevent scrapers from getting their data. Fortunately, the internet exists. Here is a link showing how to do this specifically for the CBOE. Here is another link on how to scrape data from sites running scripts. Go nuts.
Scraping a single ticker (scraper.R)
To start, we only need the jsonlite package to scrape; the others are for manipulation and cleaning.
# import our libraries
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(jsonlite)
library(stringi)
We create a variable called ticker to hold our desired symbol, then splice it into the URL string to build our request.
# let's pick a ticker to scrape
ticker <- "QQQ"
# create our scraping address
scrape_target <- paste0(
"https://cdn.cboe.com/api/global/delayed_quotes/options/",
ticker,
".json"
)
print(scrape_target)
## [1] "https://cdn.cboe.com/api/global/delayed_quotes/options/QQQ.json"
This will fetch data for every strike for every expiration for that ticker.
# next we read the json data at our scrape target
scrape_data <- read_json(
scrape_target,
simplifyVector = TRUE
)
This returns a list object (think of it as R’s closest approximation of JSON’s nested data structure).
typeof(scrape_data)
## [1] "list"
Let’s take a quick look at where most of our useful data is housed.
glimpse(scrape_data$data$options)
## Rows: 7,032
## Columns: 23
## $ option <chr> "QQQ220613C00230000", "QQQ220613P00230000", "QQQ22061…
## $ bid <dbl> 59.11, 0.00, 54.11, 0.00, 49.11, 0.00, 44.11, 0.00, 3…
## $ bid_size <dbl> 50, 0, 50, 0, 75, 0, 50, 0, 45, 0, 45, 0, 50, 0, 75, …
## $ ask <dbl> 59.44, 0.01, 54.44, 0.01, 49.42, 0.01, 44.46, 0.02, 3…
## $ ask_size <dbl> 50, 106, 50, 85, 75, 70, 75, 212, 50, 5, 50, 81, 75, …
## $ iv <dbl> 1.7101, 0.8180, 1.5628, 0.7465, 1.4124, 0.6762, 1.285…
## $ open_interest <dbl> 0, 514, 0, 7, 0, 100, 0, 6, 0, 401, 0, 3, 0, 3, 0, 30…
## $ volume <dbl> 10, 1, 0, 0, 16, 0, 0, 200, 10, 401, 6, 4, 0, 1, 24, …
## $ delta <dbl> 0.9994, -0.0005, 0.9991, -0.0008, 0.9987, -0.0012, 0.…
## $ gamma <dbl> 0.0001, 0.0001, 0.0001, 0.0001, 0.0002, 0.0002, 0.000…
## $ theta <dbl> -0.0010, -0.0026, -0.0022, -0.0038, -0.0039, -0.0054,…
## $ rho <dbl> 0.0172, 0.0000, 0.0186, 0.0000, 0.0195, 0.0000, 0.020…
## $ vega <dbl> 0.0005, 0.0005, 0.0007, 0.0007, 0.0011, 0.0011, 0.001…
## $ theo <dbl> 60.1050, 0.0057, 55.1050, 0.0057, 50.1250, 0.0057, 45…
## $ change <dbl> -11.765, 0.005, 0.000, 0.000, -10.245, 0.000, 0.000, …
## $ open <dbl> 59.30, 0.01, 0.00, 0.00, 51.19, 0.00, 0.00, 0.01, 40.…
## $ high <dbl> 59.30, 0.01, 0.00, 0.00, 51.19, 0.00, 0.00, 0.01, 40.…
## $ low <dbl> 58.96, 0.01, 0.00, 0.00, 50.49, 0.00, 0.00, 0.01, 40.…
## $ tick <chr> "down", "no_change", "no_change", "down", "down", "no…
## $ last_trade_price <dbl> 58.96, 0.01, 0.00, 0.01, 50.49, 0.02, 0.00, 0.01, 40.…
## $ last_trade_time <chr> "2022-06-10T12:24:25", "2022-06-10T15:51:18", NA, "20…
## $ percent_change <dbl> -16.6349, 100.0000, 0.0000, 0.0000, -16.8684, 0.0000,…
## $ prev_day_close <dbl> 70.725, 0.005, 65.725, 0.005, 60.735, 0.005, 55.765, …
This is full of information, but we need to transform it into a data frame so we can then store it in a similarly structured PostgreSQL table.
# make it into a data frame
option_data <- as.data.frame(
scrape_data$data$options
)
We also need to convert the last trade time from a string into an R datetime object.
# clean last trade datetime from string
option_data$last_trade_time <- as.POSIXct(
option_data$last_trade_time,
"%Y-%m-%dT%H:%M:%S",
tz = "America/Chicago"
)
We can also grab some useful data about the underlying from a different part of the nested data.
# grab underlying close
option_data$underlying_close <- scrape_data$data$close
Let’s take a look at the first column of our data.
print(option_data$option[1])
## [1] "QQQ220613C00230000"
This doesn’t look all that informative on the surface, but it actually packs in a lot of information. Looking at the string above, QQQ represents the underlying, 22 the year of expiration, 06 the month, 13 the day, C whether it is a call or a put (P), and 00230000 the strike price in thousandths of a dollar, i.e. the $230.00 strike. Let’s create some more useful columns from this.
# grab our strike price from the option name string
option_data$strike_price <- as.numeric(stri_sub(option_data$option, -8)) / 1000
# grab our contract type from the option name string
option_data$contract_type <- as.factor(stri_sub(option_data$option, -9, -9))
# grab our underlying ticker from the option name string
option_data$underlying_ticker <- as.factor(stri_sub(option_data$option, -999, -16))
# grab our expiration date from the option name string
option_data$expiration_date <- as.Date(
stri_sub(option_data$option, -15, -10),
format = "%y%m%d"
)
Options data is, unsurprisingly, not static. The value of a given contract will (probably) be different tomorrow than it is today. We need to record the date on which the data was scraped so we can tell when each snapshot was taken. Let’s create a column for that.
# if we want to set this up to scrape every day
# we need to create a column to record
# on what day the data was scraped
option_data$scrape_date <- Sys.Date()
We can also use the original string from the first column as the primary key in our PostgreSQL table. To make it a unique identifier, we append the scrape date to the end of the string, so each row is unique by the combination of underlying, expiration, contract type, strike, and scrape date.
# since we have already taken all the useful data
# from the option column, we can keep using it as
# a unique identifier for our table if we append
# the scrape date to the end of the string
option_data$option <- paste0(
option_data$option,
as.character(Sys.Date())
)
Let’s take a look at our cleaned data. We can add more columns later, but this is what will be stored in the PostgreSQL database.
print(option_data[1,])
## option bid bid_size ask ask_size iv
## 1 QQQ220613C002300002022-06-12 59.11 50 59.44 50 1.7101
## open_interest volume delta gamma theta rho vega theo change open
## 1 0 10 0.9994 1e-04 -0.001 0.0172 5e-04 60.105 -11.765 59.3
## high low tick last_trade_price last_trade_time percent_change
## 1 59.3 58.96 down 58.96 2022-06-10 12:24:25 -16.6349
## prev_day_close underlying_close strike_price contract_type underlying_ticker
## 1 70.725 288.84 230 C QQQ
## expiration_date scrape_date
## 1 2022-06-13 2022-06-12
Iterating our scraper over our watchlist (grab_watchlist.R)
Scraping and storing data for one ticker is nice, but I want to expand this to cover all of the tickers on my watchlist. First we source the scraper.R file from earlier, which defines our scraper function.
# grab our scraper script
source("/home/sebi/optionsBacktesting/scraper.R")
Due to restrictions from my job, these are all broad market ETFs. SLYV is notably less liquid than the others, but I sell covered calls against it, so why not include it too.
# declare the tickers we want in our watchlist
watchlist <- c(
"QQQ",
"SPY",
"IWM",
"SLYV",
"FXI",
"DIA",
"ARKK",
"FEZ",
"EEM",
"EWW",
"EWZ",
"XLB",
"XLV",
"XLU",
"XLF",
"XLI",
"XOP",
"GLD",
"SLV",
"TLT",
"HYG"
)
We will want to create an empty data frame to hold the combined results of running our scraper for each ticker.
# create an empty dataframe
watchlist_data <- data.frame()
summary(watchlist_data)
## < table of extent 0 x 0 >
This simply iterates through the watchlist we created, running grab_option_data() (the function we defined in scraper.R) for every ticker. Each call returns a data frame, and rbind stacks them all together, replacing our formerly empty data frame.
# for each ticker in the watchlist grab the option data
# and union it to the watchlist_data df
watchlist_data <- do.call(
rbind,
lapply(
watchlist,
grab_option_data
)
)
Let’s check the result of our pull by listing all of the unique tickers in the resulting data frame. It should match the watchlist we defined above.
print(unique(as.factor(watchlist_data$underlying_ticker)))
## [1] QQQ SPY IWM SLYV FXI DIA ARKK FEZ EEM EWW EWZ XLB XLV XLU XLF
## [16] XLI XOP GLD SLV TLT HYG
## 21 Levels: QQQ SPY IWM SLYV FXI DIA ARKK FEZ EEM EWW EWZ XLB XLV XLU ... HYG
Pulling data for our whole watchlist produces about 49k rows.
# number of rows for a single day's pull
nrow(watchlist_data)
## [1] 49190
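As a rough back-of-the-envelope estimate of growth (assuming roughly 252 trading days a year and a similarly sized chain each day), that is on the order of twelve million rows per year, which is worth keeping in mind given the storage comments earlier:
# rough storage-growth estimate: rows per pull times trading days per year
49190 * 252
# roughly 12.4 million rows per year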
Creating a table in PostgreSQL (create_watchlist_table.R)
Now we know how to get the option data for every watchlist ticker, for every expiration, for every strike. But how do we get this into our PostgreSQL database? First we load the RPostgreSQL library and source the grab_watchlist.R script containing our grab_watchlist() function.
library(RPostgreSQL)
## Loading required package: DBI
source("/home/sebi/optionsBacktesting/grab_watchlist.R")
Next we need to declare our driver name, the name of our database, the host (I am running this markdown from the Raspberry Pi, hence localhost), and the port we wish to connect through. Normally the user would have to enter their username and password to access the database, but I can’t stand around each day waiting to enter my password, nor am I going to put my credentials on the screen for you monsters, so I need to automate this too. Luckily, the PostgreSQL documentation very clearly shows how to set up a .pgpass file on the Raspberry Pi. Note that this only covers the local, automated access on the Pi itself, and the file can only be read by a user who has already logged into that account. Remote connections will still require manually entering a username and password for now, but we will get to that later.
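For reference, each line of the file is a set of colon-separated fields; the file lives at ~/.pgpass and PostgreSQL will ignore it unless it is readable only by its owner (chmod 0600). A sketch of the single entry we need, with the general format on the first line and a placeholder password (obviously not my real one) on the second:
hostname:port:database:username:password
localhost:5432:sebi:sebi:replace_with_real_password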
# set driver name
driver_name <- dbDriver(drvName = "PostgreSQL")
# establish database connection
db <- DBI::dbConnect(driver_name,
dbname="sebi",
host="localhost",
port = 5432
)
We run our grab_watchlist()
function to grab today’s data.
# grab todays watchlist data
watchlist_data <- grab_watchlist()
We only intend to build this table once; on the following days we will append to it. However, we do want to standardize this procedure, so the table-creation script also allows overwriting of existing data. We do not need to worry about defining the schema (the table will be saved in public) nor about whether a column is VARCHAR(9) or VARCHAR(10); R will handle all of this for us.
# create our table with todays data (overwrite if already exists)
DBI::dbWriteTable(
db,
value = watchlist_data,
name = "watchlist_data",
overwrite = TRUE,
row.names = FALSE
)
We do need to define a primary key for the table: the option column, which we built earlier from a combination of the option string and the scrape date.
# set primary key column
DBI::dbSendQuery(
db,
'ALTER TABLE watchlist_data ADD PRIMARY KEY ("option")'
)
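If you are curious which column types R chose on our behalf, one optional sanity check (a sketch, output omitted here) is to query information_schema while the connection is still open:
# inspect the column types PostgreSQL assigned to our new table
DBI::dbGetQuery(
  db,
  "SELECT column_name, data_type
   FROM information_schema.columns
   WHERE table_name = 'watchlist_data';"
)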
Let’s be polite and disconnect from our database.
# disconnect from database
DBI::dbDisconnect(db)
Appending to the existing PostgreSQL table (append_watchlist_data.R)
library(DBI)
library(RPostgreSQL)
source("/home/sebi/optionsBacktesting/grab_watchlist.R")
The append script looks pretty much the same, except with append = TRUE
instead of overwrite = TRUE
. This is the script that will run every day.
append_watchlist_data <- function(){
# grab our watchlist data
watchlist_data <- grab_watchlist()
# establish driver name
driver_name <- dbDriver(drvName = "PostgreSQL")
# create database connection
db <- DBI::dbConnect(driver_name,
dbname="sebi",
host="localhost",
port = 5432
)
# append our scraped data to the table
DBI::dbWriteTable(
db,
name = "watchlist_data",
value = watchlist_data,
row.names = FALSE,
append = TRUE
)
# close database connection
DBI::dbDisconnect(db)
}
Only scrape data on market days (trading_day_scheduler.R)
If we were to scrape this page on weekends, we would get the same result on Saturday as on Friday because the market is closed. The same applies to market holidays. To prevent unnecessary duplicates, we only append data when it is a weekday and not an NYSE holiday.
library(timeDate)
source("/home/sebi/optionsBacktesting/append_watchlist_data.R")
if(as.POSIXlt(Sys.Date())$wday %in% 1:5 & !(Sys.Date() %in% as.Date(timeDate::holidayNYSE()))){
append_watchlist_data()
} else {
message("Market Closed Today")
}
## Market Closed Today
Setting up a cron job to automate data scrapes
Automating the task is as simple as setting up a cron job. Make sure to do this as the same user you intend to launch the scripts as (e.g. by running crontab -e as that user). For the sake of not poking the bear we will only run this once a day. The beginning of our line, 31 22 * * *, means that I want the job to run every day at 10:31 PM for two reasons:
- The market will be closed
- 10:31 PM is when I stopped messing up the command.
The second part of our cron job tells our Raspberry Pi what to do at that time. In this case, first change directory (cd /home/sebi/optionsBacktesting), then run the R script trading_day_scheduler.R, which checks whether the market is open and adds data to the PostgreSQL database if it is (Rscript trading_day_scheduler.R). The final part (> log.txt 2>&1) redirects any terminal output, including errors, to a log file we created.
system(
"sudo crontab -u sebi -l",
intern = TRUE
)
## [1] "# Edit this file to introduce tasks to be run by cron."
## [2] "# "
## [3] "# Each task to run has to be defined through a single line"
## [4] "# indicating with different fields when the task will be run"
## [5] "# and what command to run for the task"
## [6] "# "
## [7] "# To define the time you can provide concrete values for"
## [8] "# minute (m), hour (h), day of month (dom), month (mon),"
## [9] "# and day of week (dow) or use '*' in these fields (for 'any')."
## [10] "# "
## [11] "# Notice that tasks will be started based on the cron's system"
## [12] "# daemon's notion of time and timezones."
## [13] "# "
## [14] "# Output of the crontab jobs (including errors) is sent through"
## [15] "# email to the user the crontab file belongs to (unless redirected)."
## [16] "# "
## [17] "# For example, you can run a backup of all your user accounts"
## [18] "# at 5 a.m every week with:"
## [19] "# 0 5 * * 1 tar -zcf /var/backups/home.tgz /home/"
## [20] "# "
## [21] "# For more information see the manual pages of crontab(5) and cron(8)"
## [22] "# "
## [23] "31 22 * * * cd /home/sebi/optionsBacktesting && Rscript trading_day_scheduler.R > log.txt 2>&1"
## [24] "# m h dom mon dow command"
We’ve made a few scripts so far. Let’s take a look at all of our files again.
system(
"ls -al /home/sebi/optionsBacktesting/",
intern = TRUE
)
## [1] "total 816"
## [2] "drwxr-xr-x 3 sebi sebi 4096 Jun 12 15:57 ."
## [3] "drwxr-xr-x 13 sebi sebi 4096 Jun 12 15:57 .."
## [4] "-rw-r--r-- 1 sebi sebi 700 Jun 12 15:57 append_watchlist_data.R"
## [5] "-rw-r--r-- 1 sebi sebi 836 Jun 12 15:57 create_watchlist_table.R"
## [6] "-rw-r--r-- 1 sebi sebi 747615 Jun 12 15:57 gamma_gamma_hey_an_options_analysis_part_one_data_collection_and_cleaning.html"
## [7] "-rw-r--r-- 1 sebi sebi 21015 Jun 12 15:57 gamma_gamma_hey_an_options_analysis_part_one_data_collection_and_cleaning.Rmd"
## [8] "drwxr-xr-x 8 sebi sebi 4096 Jun 12 15:57 .git"
## [9] "-rw-r--r-- 1 sebi sebi 570 Jun 12 15:57 .gitignore"
## [10] "-rw-r--r-- 1 sebi sebi 718 Jun 12 15:57 grab_watchlist.R"
## [11] "-rw-r--r-- 1 sebi sebi 0 Jun 12 15:57 log.txt"
## [12] "-rw-r--r-- 1 sebi sebi 2952 Jun 12 15:57 option_eda_scratch.Rmd"
## [13] "-rw-r--r-- 1 sebi sebi 205 Jun 12 15:57 optionsBacktesting.Rproj"
## [14] "-rw-r--r-- 1 sebi sebi 503 Jun 12 15:57 psql_data_pull_test.R"
## [15] "-rw-r--r-- 1 sebi sebi 595 Jun 12 15:57 README.md"
## [16] "-rw-r--r-- 1 sebi sebi 1848 Jun 12 15:57 scraper.R"
## [17] "-rw-r--r-- 1 sebi sebi 540 Jun 12 15:57 scratch_notebook.Rmd"
## [18] "-rw-r--r-- 1 sebi sebi 331 Jun 12 15:57 test_postgresql.R"
## [19] "-rw-r--r-- 1 sebi sebi 250 Jun 12 15:57 trading_day_scheduler.R"
Performing an initial query from our database
We’ve reached the end of the first phase. But you’re not really into data if your natural sense of curiosity isn’t starting to tingle. Let’s do an initial pull from our database and build a visualization. We will use the getPass package to prompt the user for their credentials before accessing the database.
library(getPass)
library(ggplot2)
library(lubridate)
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
library(reshape2)
library(wesanderson)
library(RColorBrewer)
# set driver name
driver_name <- dbDriver(drvName = "PostgreSQL")
# establish database connection
db <- DBI::dbConnect(driver_name,
dbname = "sebi",
host = "192.168.0.12",
port = 5432,
user = "sebi",
password = getPass("Enter Password:")
)
res <- dbSendQuery(db, "SELECT * FROM watchlist_data;")
data_pull <- dbFetch(res)
dbClearResult(res)
## [1] TRUE
dbDisconnect(db)
## [1] TRUE
Let’s just look at the most recent day’s data.
# make a vector of the tickers from the data fetch
tickers <- as.character(unique(as.factor(data_pull$underlying_ticker)))
# grab just the options from the most recent pull
recent_pull <- data_pull[data_pull$scrape_date == max(data_pull$scrape_date),]
# a blank dataframe to hold our put call ratios
pcr_list <- data.frame(matrix(ncol = length(tickers), nrow = 1))
colnames(pcr_list) <- tickers
print(pcr_list)
## QQQ SPY IWM SLYV FXI DIA ARKK FEZ EEM EWW EWZ XLB XLV XLU XLF XLI XOP GLD SLV
## 1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## TLT HYG
## 1 NA NA
Now let’s calculate the volume of puts traded divided by the volume of calls traded for each ticker (the put-call ratio). The higher the number, the more bearish the sentiment in the market that day (more put volume than call volume); for example, 2,000 puts traded against 1,000 calls gives a ratio of 2.0. A number higher than 0.7 is generally considered neutral-to-bearish.
# calculate the put-call ratio for each ticker and add it to the put-call ratio data frame
lapply(
  1:length(tickers),
  function(x){
    column_name <- tickers[x]
    # total call and put volume for this underlying in the most recent pull
    call_volume <- sum(recent_pull[recent_pull$underlying_ticker == column_name & recent_pull$contract_type == "C", "volume"])
    put_volume <- sum(recent_pull[recent_pull$underlying_ticker == column_name & recent_pull$contract_type == "P", "volume"])
    if(call_volume != 0){
      pcr_list[, column_name] <<- put_volume / call_volume
    } else {
      pcr_list[, column_name] <<- NA
    }
  }
)
## [[1]]
## [1] 1.915155
##
## [[2]]
## [1] 1.715442
##
## [[3]]
## [1] 3.06898
##
## [[4]]
## [1] NA
##
## [[5]]
## [1] 0.4678065
##
## [[6]]
## [1] 1.495895
##
## [[7]]
## [1] 1.388159
##
## [[8]]
## [1] 48.32862
##
## [[9]]
## [1] 0.6237533
##
## [[10]]
## [1] 4.414454
##
## [[11]]
## [1] 1.189949
##
## [[12]]
## [1] 6.290867
##
## [[13]]
## [1] 1.965305
##
## [[14]]
## [1] 0.9599455
##
## [[15]]
## [1] 4.549
##
## [[16]]
## [1] 4.712251
##
## [[17]]
## [1] 1.221854
##
## [[18]]
## [1] 0.371338
##
## [[19]]
## [1] 0.2026978
##
## [[20]]
## [1] 1.096202
##
## [[21]]
## [1] 7.565716
print(pcr_list)
## QQQ SPY IWM SLYV FXI DIA ARKK FEZ EEM
## 1 1.915155 1.715442 3.06898 NA 0.4678065 1.495895 1.388159 48.32862 0.6237533
## EWW EWZ XLB XLV XLU XLF XLI XOP
## 1 4.414454 1.189949 6.290867 1.965305 0.9599455 4.549 4.712251 1.221854
## GLD SLV TLT HYG
## 1 0.371338 0.2026978 1.096202 7.565716
Right away we can create a tool more useful than what I can currently get from my Fidelity Active Trader Pro software. Instead of just displaying the put-call ratio for a single underlying, we can do this for every underlying in our watchlist. Since our watchlist consists of ETFs covering various sectors and asset classes, this can give us a pretty good picture of what is going on in the market today. First let’s melt our data.
# melt our data for our plot
melted_pcr_list <- reshape2::melt(
pcr_list[, colSums(is.na(pcr_list)) < nrow(pcr_list)],
variable.name = "Underlying",
value.name = "put_call_ratio"
)
## No id variables; using all as measure variables
# turn the underlying ticker into a factor
melted_pcr_list$Underlying <- factor(
melted_pcr_list$Underlying,
levels = melted_pcr_list$Underlying[order(melted_pcr_list$put_call_ratio)]
)
print(melted_pcr_list)
## Underlying put_call_ratio
## 1 QQQ 1.9151548
## 2 SPY 1.7154418
## 3 IWM 3.0689800
## 4 FXI 0.4678065
## 5 DIA 1.4958950
## 6 ARKK 1.3881586
## 7 FEZ 48.3286219
## 8 EEM 0.6237533
## 9 EWW 4.4144543
## 10 EWZ 1.1899488
## 11 XLB 6.2908665
## 12 XLV 1.9653047
## 13 XLU 0.9599455
## 14 XLF 4.5489996
## 15 XLI 4.7122507
## 16 XOP 1.2218543
## 17 GLD 0.3713380
## 18 SLV 0.2026978
## 19 TLT 1.0962025
## 20 HYG 7.5657157
Next let’s plot our data as a bar chart so we can really see the differences. I used the Rushmore1
palette from the wesanderson
package because I am a shameless millennial. Until the next post, “I’ll just go back out the window.”
ggplot(
data = melted_pcr_list,
aes(
y = Underlying,
x = put_call_ratio,
fill = Underlying
)
) +
geom_bar(stat = "identity") +
ggtitle("Watchlist Put-Call Ratios") +
theme_minimal() +
theme(legend.position = "none") +
scale_fill_manual(values = colorRampPalette(wes_palettes$Rushmore1)(nrow(melted_pcr_list))) +
xlab("Put Call Ratio")
Part 2: Basic Options SME