Gamma Gamma Hey Part One
12 June 2022
Gamma Gamma Hey (an options analysis) Part One:
Primary Data Collection and Cleaning
Abstract
In this first post in a series analyzing real-world derivatives data, we set up a Raspberry Pi 4 with an R Shiny Server. R and shell scripts are used to create a table within a PostgreSQL database to hold the options data. Additional scripts and a cron job are used to automate scraping data from the CBOE website and appending it to that table. An R Markdown document is used to take an initial look at our data.
Background
I have been doing some thinking over the past months about how my previous projects have been guilty of trying to cover too much ground at once. You take some data, you slap a model on it, voila! A true data project should be more than that. It should grow in time with your understanding of the data, and truly understanding that data requires more than a one-off project and a bit of research. It is with this intention that I am starting this post as a series going through the entire lifecycle of a data project, from inception and data sourcing to advanced models and system maintenance.

In this first post we start our journey from a blank slate. We have no fancy models. We have no data. We have nothing except a Raspberry Pi 4 and a curiosity about options. The options market can be a (very dangerous) playground for a statistics nerd. Plenty of APIs for stock data can be reached from R through readily available libraries. Considerably fewer sources exist for options, but I am spoiled. I don’t want Yahoo Finance data on an API someone else built. Why have room temperature Dasani when you can go drink from an Alpine spring? I want the source! Bring me to Pablo!

That being said, the only rational conclusion is to scrape data directly from the CBOE ourselves. I don’t just want this data for today, nor am I satisfied with the limited historical data on offer. Predictably, that will cost you, which is not a good start for an enterprise aimed primarily at not losing money! We may not be able to recover past data (or tick-level data), but one solution is to create our own database, starting today.
Setting up the Raspberry Pi and R Shiny Server
Hardware
The Raspberry Pi is the perfect tool to carry out our scraping needs. It’s portable (currently hidden behind a nightstand), it uses very little power, it saves me from keeping my main laptop on all the time, and we can dedicate it solely to this project.
system(
"cat /sys/firmware/devicetree/base/model",
intern = TRUE
)
## [1] "Raspberry Pi 4 Model B Rev 1.4"
Memory
I am using a 128 GB SD card to hold both the OS and all of our data (we will probably need to mount external storage eventually). For our operating system we want the lightest option that can still handle the task(s) at hand. We will be using Debian Bullseye.
system(
"df -h",
intern = TRUE
)
## [1] "Filesystem Size Used Avail Use% Mounted on"
## [2] "/dev/root 118G 15G 98G 14% /"
## [3] "devtmpfs 3.7G 0 3.7G 0% /dev"
## [4] "tmpfs 3.9G 16K 3.9G 1% /dev/shm"
## [5] "tmpfs 1.6G 776K 1.6G 1% /run"
## [6] "tmpfs 5.0M 4.0K 5.0M 1% /run/lock"
## [7] "/dev/mmcblk0p1 253M 31M 222M 12% /boot"
## [8] "tmpfs 787M 0 787M 0% /run/user/1000"
Setting up the R Shiny Server
I followed this guide to install R Shiny Server on a Raspberry Pi. There is an updated article that uses an Ansible build and would probably play much nicer with my 64-bit Raspberry Pi, but meh. The instructions (mostly) work, and if you take enough stress-breaks you’ll get there. The main workaround involves getting 64-bit builds of NGINX and Node for this architecture instead of the 32-bit ones. Nothing too crazy. This does take about a day though, so make sure you bring snacks.
version[c(2,13)]
## _
## arch aarch64
## version.string R version 4.0.2 (2020-06-22)
installed.packages()[,3]
## askpass assertthat backports bench bit
## "1.1" "0.2.1" "1.4.1" "1.1.2" "4.0.4"
## bit64 blob boot brio broom
## "4.0.5" "1.2.3" "1.3-28" "1.1.3" "0.8.0"
## callr class cluster codetools colorspace
## "3.7.0" "7.3-20" "2.1.3" "0.2-18" "2.0-3"
## covr cpp11 crayon curl data.table
## "3.5.1" "0.4.2" "1.5.1" "4.3.2" "1.14.2"
## DBI dbplyr desc diffobj dplyr
## "1.1.2" "2.2.0" "1.4.1" "0.3.5" "1.0.9"
## ellipsis evaluate fansi farver filehash
## "0.3.2" "0.15" "1.0.3" "2.1.0" "2.4-3"
## foreign generics ggplot2 glue gridBase
## "0.8-82" "0.1.2" "3.3.6" "1.6.2" "0.4-7"
## gridExtra gtable highr httr igraph
## "2.3" "0.3.0" "0.9" "1.4.3" "1.3.1"
## isoband jsonlite KernSmooth knitr labeling
## "0.2.5" "1.8.0" "2.23-20" "1.39" "0.4.2"
## Lahman lattice lazyeval lifecycle lobstr
## "10.0-1" "0.20-45" "0.2.2" "1.0.1" "1.1.1"
## lubridate magrittr MASS Matrix memoise
## "1.8.0" "2.0.3" "7.3-57" "1.4-1" "2.0.1"
## mgcv microbenchmark munsell nlme nnet
## "1.8-40" "1.4.9" "0.5.0" "3.1-157" "7.3-17"
## nycflights13 openssl pdftools pillar pkgconfig
## "1.0.2" "2.0.2" "3.2.1" "1.7.0" "2.0.3"
## pkgload plogr plyr png praise
## "1.2.4" "0.2.0" "1.8.7" "0.1-7" "1.0.0"
## processx profmem ps purrr qpdf
## "3.6.0" "0.6.0" "1.7.0" "0.3.4" "1.2.0"
## R6 RColorBrewer rematch2 reshape2 rex
## "2.5.1" "1.1-3" "2.1.2" "1.4.4" "1.2.1"
## rlang rmarkdown rpart rprojroot RSQLite
## "1.0.2" "2.14" "4.1.16" "2.0.3" "2.2.14"
## scales spatial stringi stringr survival
## "1.2.0" "7.3-15" "1.7.6" "1.4.0" "3.3-1"
## sys testthat tibble tidyr tidyselect
## "3.4" "3.1.4" "3.1.7" "1.2.0" "1.1.2"
## tikzDevice tinytex treemap utf8 vctrs
## "0.12.3.1" "0.39" "2.4-3" "1.2.2" "0.4.1"
## viridisLite waldo wesanderson xfun yaml
## "0.4.0" "0.4.0" "0.3.6" "0.31" "2.3.5"
## evaluate getPass highr knitr markdown
## "0.15" "0.2-2" "0.9" "1.39" "1.1"
## rmarkdown RPostgreSQL rstudioapi stringr timeDate
## "2.14" "0.7-3" "0.13" "1.4.0" "3043.102"
## tinytex xfun yaml base base64enc
## "0.39" "0.31" "2.3.5" "4.0.2" "0.1-3"
## boot bslib cachem Cairo class
## "1.3-25" "0.3.1" "1.0.6" "1.5-15" "7.3-17"
## cli cluster codetools commonmark compiler
## "3.3.0" "2.1.0" "0.2-16" "1.8.0" "4.0.2"
## crayon datasets DBI digest dplyr
## "1.5.1" "4.0.2" "1.1.2" "0.6.29" "1.0.9"
## ellipsis fansi fastmap fontawesome foreign
## "0.3.2" "1.0.3" "1.1.0" "0.2.2" "0.8-80"
## fs generics glue graphics grDevices
## "1.5.2" "0.1.2" "1.6.2" "4.0.2" "4.0.2"
## grid htmltools httpuv jquerylib jsonlite
## "4.0.2" "0.5.2" "1.6.5" "0.1.4" "1.8.0"
## KernSmooth later lattice lifecycle magrittr
## "2.23-17" "1.3.0" "0.20-41" "1.0.1" "2.0.3"
## MASS Matrix methods mgcv mime
## "7.3-51.6" "1.2-18" "4.0.2" "1.8-31" "0.12"
## nlme nnet parallel pillar pkgconfig
## "3.1-148" "7.3-14" "4.0.2" "1.7.0" "2.0.3"
## promises purrr R6 rappdirs Rcpp
## "1.2.0.1" "0.3.4" "2.5.1" "0.3.3" "1.0.8.3"
## rlang rpart sass shiny sourcetools
## "1.0.2" "4.1-15" "0.4.1" "1.7.1" "0.1.7"
## spatial splines stats stats4 stringi
## "7.3-12" "4.0.2" "4.0.2" "4.0.2" "1.7.6"
## survival tcltk tibble tidyselect tools
## "3.1-12" "4.0.2" "3.1.7" "1.1.2" "4.0.2"
## utf8 utils vctrs withr xtable
## "1.2.2" "4.0.2" "0.4.1" "2.5.0" "1.8-4"
Before we proceed
A couple of things to keep in mind when setting this up. Our Raspberry Pi is good for scraping, but it’s not going to be training neural nets anytime soon. We just want to use it to grab data and store it. We can still use the Raspberry Pi for some simple plots (and to create this R Markdown), but anything more in the way of heavy lifting will probably be done on a more powerful machine.
Setting up PostgreSQL
So now we have a situation where we will ideally be collecting an ever-increasing amount of data. Saving this data as an .Rdata object or a .csv would quickly add up in terms of storage, and we would be stuck importing all of the data before we could filter out what we wanted, slowing down every analysis that touches it. In addition, we previously stated that we want to be able to run more advanced models on other machines, and sending a large data file around through SCP is tedious and tough to replicate at scale. PostgreSQL is a lightweight RDBMS that is already included in our Shiny build, so let’s just use that. We will be using the public schema, but be sure to set up a database before proceeding. R has many packages that allow users to connect to a PostgreSQL database; we will be using RPostgreSQL and DBI, but there are others.
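As a rough sketch of that setup step (assuming PostgreSQL is already installed by the Shiny build, and using sebi as both the role name and the database name purely to mirror the connection settings used later in this post), the role and database can be created from the Pi like so:
# a minimal sketch, not run here: create a PostgreSQL role and database
# the name "sebi" is an assumption that mirrors the dbname/user used later on
system("sudo -u postgres createuser --pwprompt sebi")
system("sudo -u postgres createdb --owner=sebi sebi")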
Scraping options data from the CBOE
We can scrape data directly from the CBOE delayed quotes page. Here is an example of the SPY option chain. This provides plenty of info, including the Greeks, already calculated for us. Unfortunately, the CBOE uses scripts on their page to prevent scrapers from getting their data. Fortunately, the internet exists. Here is a link showing how to do this specifically for the CBOE. Here is another link on how to scrape data from sites running scripts. Go nuts.
Scraping a single ticker (scraper.R)
To start, we only need the jsonlite package to scrape; the others are for manipulation and cleaning.
# import our libraries
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(jsonlite)
library(stringi)
We create a variable called ticker to hold our desired symbol, then splice it into the URL string to build our request.
# let's pick a ticker to scrape
ticker <- "QQQ"
# create our scraping address
scrape_target <- paste0(
"https://cdn.cboe.com/api/global/delayed_quotes/options/",
ticker,
".json"
)
print(scrape_target)
## [1] "https://cdn.cboe.com/api/global/delayed_quotes/options/QQQ.json"
This will fetch data for every strike for every expiration for that ticker.
# next we read the json data at our scrape target
scrape_data <- read_json(
scrape_target,
simplifyVector = TRUE
)
This returns a list object (think of it as R’s closest approximation of JSON’s nested data structure).
typeof(scrape_data)
## [1] "list"
Let’s take a quick look at where most of our useful data is housed.
glimpse(scrape_data$data$options)
## Rows: 7,032
## Columns: 23
## $ option <chr> "QQQ220613C00230000", "QQQ220613P00230000", "QQQ22061…
## $ bid <dbl> 59.11, 0.00, 54.11, 0.00, 49.11, 0.00, 44.11, 0.00, 3…
## $ bid_size <dbl> 50, 0, 50, 0, 75, 0, 50, 0, 45, 0, 45, 0, 50, 0, 75, …
## $ ask <dbl> 59.44, 0.01, 54.44, 0.01, 49.42, 0.01, 44.46, 0.02, 3…
## $ ask_size <dbl> 50, 106, 50, 85, 75, 70, 75, 212, 50, 5, 50, 81, 75, …
## $ iv <dbl> 1.7101, 0.8180, 1.5628, 0.7465, 1.4124, 0.6762, 1.285…
## $ open_interest <dbl> 0, 514, 0, 7, 0, 100, 0, 6, 0, 401, 0, 3, 0, 3, 0, 30…
## $ volume <dbl> 10, 1, 0, 0, 16, 0, 0, 200, 10, 401, 6, 4, 0, 1, 24, …
## $ delta <dbl> 0.9994, -0.0005, 0.9991, -0.0008, 0.9987, -0.0012, 0.…
## $ gamma <dbl> 0.0001, 0.0001, 0.0001, 0.0001, 0.0002, 0.0002, 0.000…
## $ theta <dbl> -0.0010, -0.0026, -0.0022, -0.0038, -0.0039, -0.0054,…
## $ rho <dbl> 0.0172, 0.0000, 0.0186, 0.0000, 0.0195, 0.0000, 0.020…
## $ vega <dbl> 0.0005, 0.0005, 0.0007, 0.0007, 0.0011, 0.0011, 0.001…
## $ theo <dbl> 60.1050, 0.0057, 55.1050, 0.0057, 50.1250, 0.0057, 45…
## $ change <dbl> -11.765, 0.005, 0.000, 0.000, -10.245, 0.000, 0.000, …
## $ open <dbl> 59.30, 0.01, 0.00, 0.00, 51.19, 0.00, 0.00, 0.01, 40.…
## $ high <dbl> 59.30, 0.01, 0.00, 0.00, 51.19, 0.00, 0.00, 0.01, 40.…
## $ low <dbl> 58.96, 0.01, 0.00, 0.00, 50.49, 0.00, 0.00, 0.01, 40.…
## $ tick <chr> "down", "no_change", "no_change", "down", "down", "no…
## $ last_trade_price <dbl> 58.96, 0.01, 0.00, 0.01, 50.49, 0.02, 0.00, 0.01, 40.…
## $ last_trade_time <chr> "2022-06-10T12:24:25", "2022-06-10T15:51:18", NA, "20…
## $ percent_change <dbl> -16.6349, 100.0000, 0.0000, 0.0000, -16.8684, 0.0000,…
## $ prev_day_close <dbl> 70.725, 0.005, 65.725, 0.005, 60.735, 0.005, 55.765, …
This is full of information, but we need to transform it into a data frame so we can then store it in a similarly structured PostgreSQL table.
# make it into a data frame
option_data <- as.data.frame(
scrape_data$data$options
)
We also need to convert the last trade time from a string into an R datetime object.
# clean last trade datetime from string
option_data$last_trade_time <- as.POSIXct(
option_data$last_trade_time,
"%Y-%m-%dT%H:%M:%S",
tz = "America/Chicago"
)
We can also grab some useful data about the underlying from a different part of the nested data.
# grab underlying close
option_data$underlying_close <- scrape_data$data$close
Let’s take a look at the first column of our data.
print(option_data$option[1])
## [1] "QQQ220613C00230000"
This doesn’t look all that informative on the surface, but it actually packs in a lot of information. Looking at the string above, QQQ represents the underlying, 22 the year of expiration, 06 the month, 13 the day, C whether it is a call or a put (P), and 00230000 the strike price in thousandths of a dollar, i.e. the $230.00 strike. Let’s create some more useful columns from this.
# grab our strike price from the option name string
option_data$strike_price <- as.numeric(stri_sub(option_data$option, -8)) / 1000
# grab our contract type from the option name string
option_data$contract_type <- as.factor(stri_sub(option_data$option, -9, -9))
# grab our underlying ticker from the option name string
option_data$underlying_ticker <- as.factor(stri_sub(option_data$option, -999, -16))
# grab our expiration date from the option name string
option_data$expiration_date <- as.Date(
stri_sub(option_data$option, -15, -10),
format = "%y%m%d"
)
Options data is, unsurprisingly, not static. The value of a given contract will (probably) be different tomorrow than it is today. We need to record the date on which the data was scraped so we can tell when each snapshot was taken. Let’s create a column for that.
# if we want to set this up to scrape every day
# we need to create a column to record
# on what day the data was scraped
option_data$scrape_date <- Sys.Date()
We can also use the original string from the first column as the primary key in our PostgreSQL table. To make it a unique identifier, we append the scrape date to the end of the string, so each row is unique by the combination of underlying, expiration, contract type, strike, and scrape date.
# since we have already taken all the useful data
# from the option column, we can keep using it as
# a unique identifier for our table if we append
# the scrape date to the end of the string
option_data$option <- paste0(
option_data$option,
as.character(Sys.Date())
)
Let’s take a look at our cleaned data. We can add more columns later, but this is what will be stored in the PostgreSQL database.
print(option_data[1,])
## option bid bid_size ask ask_size iv
## 1 QQQ220613C002300002022-06-12 59.11 50 59.44 50 1.7101
## open_interest volume delta gamma theta rho vega theo change open
## 1 0 10 0.9994 1e-04 -0.001 0.0172 5e-04 60.105 -11.765 59.3
## high low tick last_trade_price last_trade_time percent_change
## 1 59.3 58.96 down 58.96 2022-06-10 12:24:25 -16.6349
## prev_day_close underlying_close strike_price contract_type underlying_ticker
## 1 70.725 288.84 230 C QQQ
## expiration_date scrape_date
## 1 2022-06-13 2022-06-12
Iterating our scraper over our watchlist (grab_watchlist.R)
Scraping and storing data for one ticker is nice, but I want to expand this to cover all of the tickers on my watchlist. First we source the scraper.R file from earlier, which defines our scraper function.
# grab our scraper script
source("/home/sebi/optionsBacktesting/scraper.R")
Due to restrictions from my job, these are all broad market ETFs. SLYV is notably less liquid than the others, but I sell covered calls against it, so why not include it too.
# declare the tickers we want in our watchlist
watchlist <- c(
"QQQ",
"SPY",
"IWM",
"SLYV",
"FXI",
"DIA",
"ARKK",
"FEZ",
"EEM",
"EWW",
"EWZ",
"XLB",
"XLV",
"XLU",
"XLF",
"XLI",
"XOP",
"GLD",
"SLV",
"TLT",
"HYG"
)
We will want to create an empty data frame to hold the combined results of running our scraper for each ticker.
# create an empty dataframe
watchlist_data <- data.frame()
summary(watchlist_data)
## < table of extent 0 x 0 >
This simply iterates through the watchlist we created, running grab_option_data() (the function we defined in scraper.R) for every ticker. Each call returns a data frame, and rbind stacks them all together, replacing our formerly empty data frame.
# for each ticker in the watchlist grab the option data
# and union it to the watchlist_data df
watchlist_data <- do.call(
rbind,
lapply(
watchlist,
grab_option_data
)
)
Let’s check the result of our pull by listing all of the unique tickers in the resulting data frame. It should match the watchlist we defined above.
print(unique(as.factor(watchlist_data$underlying_ticker)))
## [1] QQQ SPY IWM SLYV FXI DIA ARKK FEZ EEM EWW EWZ XLB XLV XLU XLF
## [16] XLI XOP GLD SLV TLT HYG
## 21 Levels: QQQ SPY IWM SLYV FXI DIA ARKK FEZ EEM EWW EWZ XLB XLV XLU ... HYG
Pulling data for our whole watchlist produces about 49k rows.
# number of rows for a single day's pull
nrow(watchlist_data)
## [1] 49190
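As a rough back-of-the-envelope estimate of growth (assuming roughly 252 trading days a year and a similarly sized chain each day), that is on the order of twelve million rows per year, which is worth keeping in mind given the storage comments earlier:
# rough storage-growth estimate: rows per pull times trading days per year
49190 * 252
# roughly 12.4 million rows per year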
Creating a table in PostgreSQL (create_watchlist_table.R)
Now we know how to get the option data for every watchlist ticker, for every expiration, for every strike. But how do we get this into our PostgreSQL database? First we load the RPostgreSQL library and source the grab_watchlist.R script containing our grab_watchlist() function.
library(RPostgreSQL)
## Loading required package: DBI
source("/home/sebi/optionsBacktesting/grab_watchlist.R")
Next we need to declare our driver name, the name of our database, the host (I am running this markdown from the Raspberry Pi, hence localhost), and the port we wish to connect through. Normally the user would have to enter their username and password to access the database, but I can’t stand around each day waiting to enter my password, nor am I going to put my credentials on the screen for you monsters, so I need to automate this too. Luckily, the PostgreSQL documentation very clearly shows how to set up a .pgpass file on the Raspberry Pi. Note that this only covers the local, automated access on the Pi itself, and the file can only be read by a user who has already logged into that account. Remote connections will still require manually entering a username and password for now, but we will get to that later.
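For reference, each line of the file is a set of colon-separated fields; the file lives at ~/.pgpass and PostgreSQL will ignore it unless it is readable only by its owner (chmod 0600). A sketch of the single entry we need, with the general format on the first line and a placeholder password (obviously not my real one) on the second:
hostname:port:database:username:password
localhost:5432:sebi:sebi:replace_with_real_password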
# set driver name
driver_name <- dbDriver(drvName = "PostgreSQL")
# establish database connection
db <- DBI::dbConnect(driver_name,
dbname="sebi",
host="localhost",
port = 5432
)
We run our grab_watchlist()
function to grab today’s data.
# grab todays watchlist data
watchlist_data <- grab_watchlist()
We only intend to build this table once; on the following days we will append to it. However, we do want to standardize this procedure, so the table-creation script also allows overwriting of existing data. We do not need to worry about defining the schema (the table will be saved in public) nor about whether a column is VARCHAR(9) or VARCHAR(10); R will handle all of this for us.
# create our table with todays data (overwrite if already exists)
DBI::dbWriteTable(
db,
value = watchlist_data,
name = "watchlist_data",
overwrite = TRUE,
row.names = FALSE
)
We do need to define a primary key for the table: the option column, which we built earlier from a combination of the option string and the scrape date.
# set primary key column
DBI::dbSendQuery(
db,
'ALTER TABLE watchlist_data ADD PRIMARY KEY ("option")'
)
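If you are curious which column types R chose on our behalf, one optional sanity check (a sketch, output omitted here) is to query information_schema while the connection is still open:
# inspect the column types PostgreSQL assigned to our new table
DBI::dbGetQuery(
  db,
  "SELECT column_name, data_type
   FROM information_schema.columns
   WHERE table_name = 'watchlist_data';"
)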
Let’s be polite and disconnect from our database.
# disconnect from database
DBI::dbDisconnect(db)
Appending to the existing PostgreSQL table (append_watchlist_data.R)
library(DBI)
library(RPostgreSQL)
source("/home/sebi/optionsBacktesting/grab_watchlist.R")
The append script looks pretty much the same, except with append = TRUE
instead of overwrite = TRUE
. This is the script that will run every day.
append_watchlist_data <- function(){
# grab our watchlist data
watchlist_data <- grab_watchlist()
# establish driver name
driver_name <- dbDriver(drvName = "PostgreSQL")
# create database connection
db <- DBI::dbConnect(driver_name,
dbname="sebi",
host="localhost",
port = 5432
)
# append our scraped data to the table
DBI::dbWriteTable(
db,
name = "watchlist_data",
value = watchlist_data,
row.names = FALSE,
append = TRUE
)
# close database connection
DBI::dbDisconnect(db)
}
Only scrape data on market days (trading_day_scheduler.R)
If we were to scrape this page on weekends, we would get the same result on Saturday as on Friday because the market is closed. The same applies to market holidays. To prevent unnecessary duplicates, we only append data when it is a weekday and not an NYSE holiday.
library(timeDate)
source("/home/sebi/optionsBacktesting/append_watchlist_data.R")
if(as.POSIXlt(Sys.Date())$wday %in% 1:5 & !(Sys.Date() %in% as.Date(timeDate::holidayNYSE()))){
append_watchlist_data()
} else {
message("Market Closed Today")
}
## Market Closed Today
Setting up a cron job to automate data scrapes
Automating the task is as simple as setting up a cron job. Make sure to do this as the same user you intend to launch the scripts as (e.g. by running crontab -e as that user). For the sake of not poking the bear we will only run this once a day. The beginning of our line, 31 22 * * *, means that I want the job to run every day at 10:31 PM for two reasons:
- The market will be closed
- 10:31 PM is when I stopped messing up the command.
The second part of our cron job tells our Raspberry Pi what to do at that time. In this case, first change directory (cd /home/sebi/optionsBacktesting), then run the R script trading_day_scheduler.R, which checks whether the market is open and adds data to the PostgreSQL database if it is (Rscript trading_day_scheduler.R). The final part (> log.txt 2>&1) redirects any terminal output, including errors, to a log file we created.
system(
"sudo crontab -u sebi -l",
intern = TRUE
)
## [1] "# Edit this file to introduce tasks to be run by cron."
## [2] "# "
## [3] "# Each task to run has to be defined through a single line"
## [4] "# indicating with different fields when the task will be run"
## [5] "# and what command to run for the task"
## [6] "# "
## [7] "# To define the time you can provide concrete values for"
## [8] "# minute (m), hour (h), day of month (dom), month (mon),"
## [9] "# and day of week (dow) or use '*' in these fields (for 'any')."
## [10] "# "
## [11] "# Notice that tasks will be started based on the cron's system"
## [12] "# daemon's notion of time and timezones."
## [13] "# "
## [14] "# Output of the crontab jobs (including errors) is sent through"
## [15] "# email to the user the crontab file belongs to (unless redirected)."
## [16] "# "
## [17] "# For example, you can run a backup of all your user accounts"
## [18] "# at 5 a.m every week with:"
## [19] "# 0 5 * * 1 tar -zcf /var/backups/home.tgz /home/"
## [20] "# "
## [21] "# For more information see the manual pages of crontab(5) and cron(8)"
## [22] "# "
## [23] "31 22 * * * cd /home/sebi/optionsBacktesting && Rscript trading_day_scheduler.R > log.txt 2>&1"
## [24] "# m h dom mon dow command"
We’ve made a few scripts so far. Let’s take a look at all of our files again.
system(
"ls -al /home/sebi/optionsBacktesting/",
intern = TRUE
)
## [1] "total 816"
## [2] "drwxr-xr-x 3 sebi sebi 4096 Jun 12 15:57 ."
## [3] "drwxr-xr-x 13 sebi sebi 4096 Jun 12 15:57 .."
## [4] "-rw-r--r-- 1 sebi sebi 700 Jun 12 15:57 append_watchlist_data.R"
## [5] "-rw-r--r-- 1 sebi sebi 836 Jun 12 15:57 create_watchlist_table.R"
## [6] "-rw-r--r-- 1 sebi sebi 747615 Jun 12 15:57 gamma_gamma_hey_an_options_analysis_part_one_data_collection_and_cleaning.html"
## [7] "-rw-r--r-- 1 sebi sebi 21015 Jun 12 15:57 gamma_gamma_hey_an_options_analysis_part_one_data_collection_and_cleaning.Rmd"
## [8] "drwxr-xr-x 8 sebi sebi 4096 Jun 12 15:57 .git"
## [9] "-rw-r--r-- 1 sebi sebi 570 Jun 12 15:57 .gitignore"
## [10] "-rw-r--r-- 1 sebi sebi 718 Jun 12 15:57 grab_watchlist.R"
## [11] "-rw-r--r-- 1 sebi sebi 0 Jun 12 15:57 log.txt"
## [12] "-rw-r--r-- 1 sebi sebi 2952 Jun 12 15:57 option_eda_scratch.Rmd"
## [13] "-rw-r--r-- 1 sebi sebi 205 Jun 12 15:57 optionsBacktesting.Rproj"
## [14] "-rw-r--r-- 1 sebi sebi 503 Jun 12 15:57 psql_data_pull_test.R"
## [15] "-rw-r--r-- 1 sebi sebi 595 Jun 12 15:57 README.md"
## [16] "-rw-r--r-- 1 sebi sebi 1848 Jun 12 15:57 scraper.R"
## [17] "-rw-r--r-- 1 sebi sebi 540 Jun 12 15:57 scratch_notebook.Rmd"
## [18] "-rw-r--r-- 1 sebi sebi 331 Jun 12 15:57 test_postgresql.R"
## [19] "-rw-r--r-- 1 sebi sebi 250 Jun 12 15:57 trading_day_scheduler.R"
Performing an initial query from our database
We’ve reached the end of the first phase. But you’re not really into data if your natural sense of curiosity isn’t starting to tingle. Let’s do an initial pull from our database and build a visualization. We will use the getPass package to prompt the user for their credentials before accessing the database.
library(getPass)
library(ggplot2)
library(lubridate)
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
library(reshape2)
library(wesanderson)
library(RColorBrewer)
# set driver name
driver_name <- dbDriver(drvName = "PostgreSQL")
# establish database connection
db <- DBI::dbConnect(driver_name,
dbname = "sebi",
host = "192.168.0.12",
port = 5432,
user = "sebi",
password = getPass("Enter Password:")
)
res <- dbSendQuery(db, "SELECT * FROM watchlist_data;")
data_pull <- dbFetch(res)
dbClearResult(res)
## [1] TRUE
dbDisconnect(db)
## [1] TRUE
Let’s just look at the most recent day’s data.
# make a vector of the tickers from the data fetch
tickers <- as.character(unique(as.factor(data_pull$underlying_ticker)))
# grab just the options from the most recent pull
recent_pull <- data_pull[data_pull$scrape_date == max(data_pull$scrape_date),]
# a blank dataframe to hold our put call ratios
pcr_list <- data.frame(matrix(ncol = length(tickers), nrow = 1))
colnames(pcr_list) <- tickers
print(pcr_list)
## QQQ SPY IWM SLYV FXI DIA ARKK FEZ EEM EWW EWZ XLB XLV XLU XLF XLI XOP GLD SLV
## 1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## TLT HYG
## 1 NA NA
Now let’s calculate the volume of puts traded divided by the volume of calls traded for each ticker (the put-call ratio). The higher the number, the more bearish the sentiment in the market that day (more put volume than call volume); for example, 2,000 puts traded against 1,000 calls gives a ratio of 2.0. A number higher than 0.7 is generally considered neutral-to-bearish.
# calculate the put-call ratio for each ticker and add it to the put-call ratio data frame
lapply(
  1:length(tickers),
  function(x){
    column_name <- tickers[x]
    # total call and put volume for this underlying in the most recent pull
    call_volume <- sum(recent_pull[recent_pull$underlying_ticker == column_name & recent_pull$contract_type == "C", "volume"])
    put_volume <- sum(recent_pull[recent_pull$underlying_ticker == column_name & recent_pull$contract_type == "P", "volume"])
    if(call_volume != 0){
      pcr_list[, column_name] <<- put_volume / call_volume
    } else {
      pcr_list[, column_name] <<- NA
    }
  }
)
## [[1]]
## [1] 1.915155
##
## [[2]]
## [1] 1.715442
##
## [[3]]
## [1] 3.06898
##
## [[4]]
## [1] NA
##
## [[5]]
## [1] 0.4678065
##
## [[6]]
## [1] 1.495895
##
## [[7]]
## [1] 1.388159
##
## [[8]]
## [1] 48.32862
##
## [[9]]
## [1] 0.6237533
##
## [[10]]
## [1] 4.414454
##
## [[11]]
## [1] 1.189949
##
## [[12]]
## [1] 6.290867
##
## [[13]]
## [1] 1.965305
##
## [[14]]
## [1] 0.9599455
##
## [[15]]
## [1] 4.549
##
## [[16]]
## [1] 4.712251
##
## [[17]]
## [1] 1.221854
##
## [[18]]
## [1] 0.371338
##
## [[19]]
## [1] 0.2026978
##
## [[20]]
## [1] 1.096202
##
## [[21]]
## [1] 7.565716
print(pcr_list)
## QQQ SPY IWM SLYV FXI DIA ARKK FEZ EEM
## 1 1.915155 1.715442 3.06898 NA 0.4678065 1.495895 1.388159 48.32862 0.6237533
## EWW EWZ XLB XLV XLU XLF XLI XOP
## 1 4.414454 1.189949 6.290867 1.965305 0.9599455 4.549 4.712251 1.221854
## GLD SLV TLT HYG
## 1 0.371338 0.2026978 1.096202 7.565716
Right away we can create a tool more useful than what I can currently get from my Fidelity Active Trader Pro software. Instead of just displaying the put-call ratio for a single underlying, we can do this for every underlying in our watchlist. Since our watchlist consists of ETFs covering various sectors and asset classes, this can give us a pretty good picture of what is going on in the market today. First let’s melt our data.
# melt our data for our plot
melted_pcr_list <- reshape2::melt(
pcr_list[, colSums(is.na(pcr_list)) < nrow(pcr_list)],
variable.name = "Underlying",
value.name = "put_call_ratio"
)
## No id variables; using all as measure variables
# turn the underlying ticker into a factor
melted_pcr_list$Underlying <- factor(
melted_pcr_list$Underlying,
levels = melted_pcr_list$Underlying[order(melted_pcr_list$put_call_ratio)]
)
print(melted_pcr_list)
## Underlying put_call_ratio
## 1 QQQ 1.9151548
## 2 SPY 1.7154418
## 3 IWM 3.0689800
## 4 FXI 0.4678065
## 5 DIA 1.4958950
## 6 ARKK 1.3881586
## 7 FEZ 48.3286219
## 8 EEM 0.6237533
## 9 EWW 4.4144543
## 10 EWZ 1.1899488
## 11 XLB 6.2908665
## 12 XLV 1.9653047
## 13 XLU 0.9599455
## 14 XLF 4.5489996
## 15 XLI 4.7122507
## 16 XOP 1.2218543
## 17 GLD 0.3713380
## 18 SLV 0.2026978
## 19 TLT 1.0962025
## 20 HYG 7.5657157
Next let’s plot our data as a bar chart so we can really see the differences. I used the Rushmore1
palette from the wesanderson
package because I am a shameless millennial. Until the next post, “I’ll just go back out the window.”
ggplot(
data = melted_pcr_list,
aes(
y = Underlying,
x = put_call_ratio,
fill = Underlying
)
) +
geom_bar(stat = "identity") +
ggtitle("Watchlist Put-Call Ratios") +
theme_minimal() +
theme(legend.position = "none") +
scale_fill_manual(values = colorRampPalette(wes_palettes$Rushmore1)(nrow(melted_pcr_list))) +
xlab("Put Call Ratio")
Part 2: Basic Options SME