This assignment attempts to solve the 2021 IEEE Visual Analytics Science and Technology (VAST) Challenge: Mini-Challenge 2 by applying different visual analytics concepts, methods, and techniques with relevant R data visualisation and data analysis packages.
Can you infer the owners of each credit card and loyalty card? What is your evidence? Where are there uncertainties in your method? Where are there uncertainties in the data?
It is challenging to infer the owners of the credit cards and loyalty cards since there is no data field that joins the credit card data to the GPS data.
One possible approach is to plot the GPS path of each car and check it against the timestamps of the credit card transactions. This approach assumes that the person driving the car is the same person making the credit card transaction. Additionally, it assumes that the GPS coordinates and timestamps, as well as the credit card locations and timestamps, are accurate.
To implement this approach, it is further assumed that the geospatial tracking software installed in each employee's car stops recording when the vehicle is not moving. Borrowing the concept of ‘Point of Interest’ (POI) from Virginia Tech, a POI is identified when the employee stops for more than 5 minutes.
Identify the POIs by computing the difference between consecutive GPS timestamps of each car. If the difference is greater than 5 minutes, the record is flagged with poi = TRUE.
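gps_poi_sf <- car_gps_sf %>%
  group_by(CarID) %>%
  # Time since the previous GPS record of the same car, in seconds
  mutate(diff = difftime(timestamp, lag(timestamp, order_by = timestamp), units = "secs")) %>%
  # Flag a POI when the gap exceeds 5 minutes (the first record of each car has diff = NA)
  mutate(poi = diff > 60*5) %>%
  filter(poi == TRUE) %>%
  ungroup()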
glimpse(gps_poi_sf)
Rows: 3,067
Columns: 11
$ timestamp <dttm> 2014-01-06 06:53:01, 2014-01-06 07:05:01, 2014-01~
$ CarID <fct> 4, 35, 4, 10, 34, 26, 20, 19, 18, 12, 32, 33, 3, 7~
$ date <dttm> 2014-01-06, 2014-01-06, 2014-01-06, 2014-01-06, 2~
$ day <ord> Mon, Mon, Mon, Mon, Mon, Mon, Mon, Mon, Mon, Mon, ~
$ hour <int> 6, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,~
$ geometry <POINT [°]> POINT (24.86419 36.07333), POINT (24.87336 3~
$ Deparment <chr> "Executive", "Executive", "Executive", "Executive"~
$ Title <chr> "SVP/CFO", "Environmental Safety Advisor", "SVP/CF~
$ FullName <chr> "Ingrid Barranco", "Willem Vasco-Pais", "Ingrid Ba~
$ diff <drtn> 1057 secs, 1920 secs, 1320 secs, 1980 secs, 1633 ~
$ poi <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TR~
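The gap rule can be checked on a small example. The following is a minimal sketch, assuming a toy GPS log with made-up timestamps (`toy_gps`, `gap_secs` and `toy_poi` are illustrative names, not objects from this assignment). It applies the same 5-minute threshold within each car.

```r
library(dplyr)

# Toy GPS log with hypothetical values: two cars, timestamps in UTC
toy_gps <- data.frame(
  CarID = c(1, 1, 1, 1, 2, 2, 2),
  timestamp = as.POSIXct(
    c("2014-01-06 07:00:00", "2014-01-06 07:00:05", "2014-01-06 07:20:05",
      "2014-01-06 07:20:10", "2014-01-06 08:00:00", "2014-01-06 08:04:00",
      "2014-01-06 08:04:05"),
    tz = "UTC"
  )
)

toy_poi <- toy_gps %>%
  group_by(CarID) %>%
  # Sort within each car so that lag() refers to the previous record in time
  arrange(timestamp, .by_group = TRUE) %>%
  mutate(
    # Seconds elapsed since the previous record of the same car (NA for the first)
    gap_secs = as.numeric(difftime(timestamp, lag(timestamp), units = "secs")),
    # A gap longer than 5 minutes is treated as a stop
    poi = !is.na(gap_secs) & gap_secs > 60 * 5
  ) %>%
  ungroup()

# Only car 1 has a gap above 5 minutes (20 minutes); car 2's 4-minute gap is not flagged
filter(toy_poi, poi)
```

Note that the flagged record is the first one after the gap, so its timestamp is when the car starts moving again. Under the stated assumption that tracking pauses while the car is parked, the stop itself began at the previous record of that car.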
Plot the POIs on the background tourist map and identify the locations visited by each employee. From the Combined Credit Card & Loyalty Transactions by Hour of Day heatmap of Question 1, the earliest transaction is around 3:00 AM at Kronos Mart while the last transaction is around 10:00 PM at Hippokampos. This information can be used to limit the number of POIs; the filter below keeps hours 2 to 23 to leave a margin around these times.
gps_poi_points <- gps_poi_sf %>%
  filter(hour >= 2 & hour <= 23) %>%
  select(timestamp,
         CarID,
         Deparment,
         Title,
         FullName)
tmap_mode("plot")
tm_shape(bgmap) +
  tm_rgb(bgmap, r = 1, g = 2, b = 3,
         alpha = NA,
         saturation = 1,
         interpolate = TRUE,
         max.value = 255) +
  tm_shape(gps_poi_points) +
  tm_dots(col = 'red', border.col = 'black', size = 1, alpha = 0.5, jitter = .8) +
  tm_facets(by = "FullName", ncol = 2)
Alternatively, plot the gps path of each employee.
gps_path <- car_gps_sf %>%
  group_by(CarID) %>%
  # Collapse each car's points into one feature; do_union = FALSE keeps the point order
  summarize(m = mean(timestamp),
            do_union = FALSE) %>%
  # Connect the points of each car into a single line
  st_cast("LINESTRING")
tmap_mode("plot")
tm_shape(bgmap) +
  tm_rgb(bgmap, r = 1, g = 2, b = 3,
         alpha = NA,
         saturation = 1,
         interpolate = TRUE,
         max.value = 255) +
  tm_shape(gps_path) +
  tm_lines() +
  tm_facets(by = "CarID", ncol = 2)
Create an interactive data table of the credit card transactions and use it, together with the POI plot, to infer the owner of each credit card.
cc_owner <- cc_data %>%
  select(timestamp, location, last4ccnum)
DT::datatable(cc_owner)
Finally, manually match the timestamps of the credit card purchases against the POI map. This approach is very time-consuming since it entails manual effort. In many cases, the POI map shows points in the vicinity of a location for which there is no corresponding credit card purchase.
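Part of this manual matching could be narrowed down programmatically. The following is a minimal sketch, not the method used in this assignment. It assumes the POIs have been turned into stop intervals with hypothetical `arrive` and `depart` columns, and it uses made-up card numbers, locations and times. It only checks whether a transaction falls inside a car's stop interval; it does not check that the stop is near the shop, so the result is a candidate list to verify against the map.

```r
library(dplyr)

# Hypothetical stop intervals, one row per stop (not real data)
stops <- data.frame(
  CarID  = c(4, 4, 35),
  arrive = as.POSIXct(c("2014-01-06 07:35:00", "2014-01-06 12:00:00",
                        "2014-01-06 07:25:00"), tz = "UTC"),
  depart = as.POSIXct(c("2014-01-06 07:50:00", "2014-01-06 13:10:00",
                        "2014-01-06 07:40:00"), tz = "UTC")
)

# Hypothetical credit card transactions (not real data)
cc <- data.frame(
  last4ccnum = c(1111, 1111, 2222),
  location   = c("Cafe A", "Cafe B", "Cafe A"),
  timestamp  = as.POSIXct(c("2014-01-06 07:45:00", "2014-01-06 12:30:00",
                            "2014-01-06 07:30:00"), tz = "UTC")
)

candidates <- merge(cc, stops, by = NULL) %>%   # every transaction paired with every stop
  # Keep pairs where the purchase happened while the car was parked
  filter(timestamp >= arrive, timestamp <= depart) %>%
  # Count how often each card coincides with each car
  count(last4ccnum, CarID, name = "n_matches") %>%
  arrange(last4ccnum, desc(n_matches))

candidates
```

A card that repeatedly coincides with the same car is a stronger candidate than one with a single match, but the same uncertainties apply as in the manual approach: shared cars, purchases made on foot, and inaccurate timestamps can all produce false or missing matches.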
Once the owner of a credit card has been identified, the owner of the loyalty card can be inferred from how the two cards are used together. As noted earlier, however, there is no strict one-to-one correspondence between credit cards and loyalty cards.
The heatmap below visualizes the number of transactions for each combination of credit card and loyalty card, which indicates the most likely owner of each loyalty card.
# Count the transactions for each credit card and loyalty card pair
cc_loyal_count <- cc_loyalty_data %>%
  group_by(last4ccnum, loyaltynum) %>%
  summarise(count = n(), .groups = "drop")
cc_loyal_correlate <- ggplot(data = cc_loyal_count,
                             aes(x = loyaltynum, y = as.factor(last4ccnum),
                                 fill = count,
                                 text = paste("Last 4 Credit Card Number :", last4ccnum, "\n",
                                              "Loyalty Card Number:", loyaltynum, "\n",
                                              "Number of transactions :", count))) +
  geom_tile() +
  scale_fill_gradient(low = "lightsteelblue1", high = "royalblue4") +
  ggtitle("Correlation of Credit Card & Loyalty Card by Number of Transactions") +
  labs(x = "Loyalty Card Number", y = "Credit Card Number") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90))
# Convert to an interactive plot that shows the text aesthetic as the tooltip
ggplotly(cc_loyal_correlate, tooltip = "text")
From the heatmap, it can be seen that certain combinations of credit card and loyalty card are used together most often. Examples are 6901 and L9363 with 28 transactions, and 7117 and L6417, also with 28 transactions.
Nevertheless, even for credit card 6901, there are still 9 transactions where no loyalty card was used. Similarly, credit card 7117 has 3 transactions where no loyalty card was used.
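The pairing can also be read off the counts directly. The following is a minimal sketch on a toy table that reuses the figures quoted above. It assumes that transactions without a loyalty card are recorded as NA and that each of the two credit cards was used with only one loyalty card; both are assumptions, and `pair_counts`, `share` and `best_match` are illustrative names.

```r
library(dplyr)

# Toy pair counts based on the figures quoted above;
# NA as the marker for "no loyalty card used" is an assumption
pair_counts <- data.frame(
  last4ccnum = c(6901, 6901, 7117, 7117),
  loyaltynum = c("L9363", NA, "L6417", NA),
  count      = c(28, 9, 28, 3)
)

best_match <- pair_counts %>%
  group_by(last4ccnum) %>%
  # Share of the credit card's transactions made with each loyalty card
  mutate(share = count / sum(count)) %>%
  # Drop the rows without a loyalty card, then keep the most frequent pairing
  filter(!is.na(loyaltynum)) %>%
  slice_max(count, n = 1, with_ties = FALSE) %>%
  ungroup()

best_match
```

Under these assumptions the share is 28/37 (about 76%) for card 6901 and 28/31 (about 90%) for card 7117, which gives a rough measure of how confident each pairing is.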
For attribution, please cite this work as
Dolit (2021, July 25). Visual Analytics & Applications: Visual Detective Assignment Part 3. Retrieved from https://adolit-vaa.netlify.app/posts/2021-07-26-assignment-3/
BibTeX citation
@misc{dolit2021visual,
  author = {Dolit, Archie},
  title = {Visual Analytics & Applications: Visual Detective Assignment Part 3},
  url = {https://adolit-vaa.netlify.app/posts/2021-07-26-assignment-3/},
  year = {2021}
}