
Summary

I built a product that validates analytics triggered by Android devices. It started as a prototype from the Data Team: a script that performed 3 validations against a set of analytics in production. I took over the project and made it work end to end at the push of a button. This included creating an Android simulation for multiple Android products, wiring the scripts into Jenkins, building on top of the prototype, and integrating the results with S3 and Slack. The task now runs nightly, with the option of running it manually, and it has caught well over 50 bugs related to product analytics.

AVS Diagram


Android Simulation

To trigger events sent to Treasure Data, I had to write a script to navigate through the Android app. The script uses the Selenium library to click buttons and links, watch videos, navigate from page to page, and so on, adding up to 20 steps per scenario. There are about 4-5 scenarios per Android product, enough to cover each event at least once as well as the high-priority user scenarios.

There are 4 different custom-built Android devices, each with its own mobile app. I leveraged Selenium Grid to manage the different devices used for the simulation. Each device is hooked up to an Appium server, which interprets the Selenium script for the Android device. The script takes a variable input that determines which device to run against.
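
Here is a minimal sketch of how that device selection might look, assuming the Appium Python client (v2+); the device names, capabilities, and grid URL below are placeholders, not the actual project values.

from appium import webdriver
from appium.options.android import UiAutomator2Options

# Hypothetical capability map; the real project keys these off the 4 custom devices.
DEVICE_CAPS = {
    "device_a": {"platformName": "Android", "appium:appPackage": "com.example.device_a",
                 "appium:appActivity": ".MainActivity"},
    "device_b": {"platformName": "Android", "appium:appPackage": "com.example.device_b",
                 "appium:appActivity": ".MainActivity"},
}

def create_driver(device_type, grid_url="http://selenium-grid:4444/wd/hub"):
    # Connect to the Appium server registered with Selenium Grid for this device.
    options = UiAutomator2Options().load_capabilities(DEVICE_CAPS[device_type])
    return webdriver.Remote(grid_url, options=options)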

At the end of the job run, the Android log covering the window between the start and end of the test is sent to S3, and a notification for the successful Jenkins job is sent to Slack.
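
A rough sketch of the log upload, assuming boto3; the bucket name and key layout are hypothetical.

import boto3

def upload_android_log(log_path, device_type, run_timestamp):
    # Ship the logcat capture for this run to S3 so it can be referenced later.
    key = "avs/android-logs/{}/{}.log".format(device_type, run_timestamp)  # hypothetical layout
    boto3.client("s3").upload_file(log_path, "avs-results-bucket", key)  # hypothetical bucket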

Sample Selenium Run:

Home.navigate_to_google_homepage()
# Event 1) Homepage event loaded

Home.search("Lam Doan")
# Event 2) Searching event
# Event 3) Google Search button clicked

Analytics Validation

The expected validation results reside in a Google Sheet. This is definitely not best practice since it relies on Google being up, and moving it somewhere else is known tech debt. The script authenticates with Google and extracts all of the content it needs; this is explained further in the Validation section below. Once the data is fetched, it gets converted to a dataframe. pandas is an excellent Python library for handling dataframes.
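
A minimal sketch of the Google Sheets fetch, assuming the gspread library and a service-account credential; the sheet, tab, and credential file names are hypothetical.

import gspread
import pandas as pd

def fetch_expected_results(sheet_name="AVS Expected Results", tab_name="expected_events"):
    # Authenticate with a service account and pull the expected-results tab.
    gc = gspread.service_account(filename="avs-service-account.json")
    worksheet = gc.open(sheet_name).worksheet(tab_name)
    # get_all_records() treats the first row as column headers.
    return pd.DataFrame(worksheet.get_all_records())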

Next, the script queries Treasure Data (TD) for the given device type and time range and converts the result set into a dataframe.
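
A minimal sketch of the TD query, assuming the pytd client library; the database, table, and column names are hypothetical.

import pandas as pd
import pytd

def fetch_actual_results(device_type, start, end):
    # Query the events table for this device and time range, then build a dataframe.
    client = pytd.Client(apikey="TD_API_KEY", database="analytics")  # hypothetical database
    result = client.query(
        "SELECT * FROM events "
        "WHERE device_type = '{}' AND TD_TIME_RANGE(time, '{}', '{}')".format(device_type, start, end)
    )
    return pd.DataFrame(result["data"], columns=result["columns"])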

A lot of data transformation happens on both dataframes so that they can be compared against each other. The Treasure Data dataframe holds the actual results, whereas the Google Sheets dataframe holds the expected results. When any of the results don't match, the following steps happen:

  1. Send a failure slack notification to the appropriate channel
  2. Include a link to the Jenkins analytics job run
  3. Include a link to the Jenkins Android Simulation job run
  4. The dataframe with the bad results gets converted to a CSV file and sent to S3 under device_type/date_time_job_run.

Regardless of whether the run is successful, the TD result is always uploaded to S3 for reference.
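
A rough sketch of the failure handling described above, assuming boto3 and the requests library; the Slack incoming-webhook URL, bucket name, and key layout are hypothetical.

import boto3
import requests

SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/XXX/YYY/ZZZ"  # hypothetical webhook

def report_failures(bad_rows, device_type, run_timestamp, analytics_job_url, simulation_job_url):
    # Steps 1-3: failure notification with links to both Jenkins jobs.
    requests.post(SLACK_WEBHOOK_URL, json={
        "text": "AVS validation failed for {}\nAnalytics job: {}\nAndroid Simulation job: {}".format(
            device_type, analytics_job_url, simulation_job_url)
    })
    # Step 4: bad results as CSV, keyed by device_type/date_time_job_run.
    csv_path = "/tmp/bad_results.csv"
    bad_rows.to_csv(csv_path, index=False)
    boto3.client("s3").upload_file(
        csv_path, "avs-results-bucket", "avs/failures/{}/{}.csv".format(device_type, run_timestamp))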

AVS workflow

Validation

There are 4 specific validations done for each Android device, and each device type uses about 30 columns. Having good data is necessary for the Data Team to analyze and predict user experience.

Count of Events Check

A specific event can be triggered in different ways, and it's important to note the common user paths that trigger these events. Once all the details are taken into consideration, a series of steps is incorporated. Because the Android devices are unstable and slow, I had to break them up into multiple scenarios so that if a test breaks, re-running that scenario doesn't take long. After the steps are completed, I counted each of the events and recorded the counts in the Google Sheet. A flaw in this version is that if event A was not triggered in step 6 but was accidentally triggered twice during step 12, this validation wouldn't catch it. The fix would be to compare the timestamps of when the before and after events were triggered, as a dual check to make sure the count is what we expected.
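
A minimal sketch of what the count comparison might look like in pandas; the column names (event_name, expected_count) are hypothetical.

import pandas as pd

def check_event_counts(actual, expected):
    # Count how many times each event actually fired during the run.
    actual_counts = actual.groupby("event_name").size().rename("actual_count").reset_index()
    merged = expected.merge(actual_counts, on="event_name", how="left")
    merged["actual_count"] = merged["actual_count"].fillna(0).astype(int)
    # Keep only the rows where the triggered count differs from the sheet's expectation.
    return merged[merged["actual_count"] != merged["expected_count"]]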

Null Check

Empty column values can be bad if data is supposed to exist! However, some columns are allowed to be null depending on other column values, so it's hard to tell which columns should or should not be null. In the Google Sheet, there is a tab that lists the requirements for all column values. The script harnesses those requirements to create a new dataframe of column values that specifies which columns should or should not be null. The TD result set is iterated over against this dataframe to perform the null check on the corresponding columns.
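
A minimal sketch of the null check in pandas, assuming the requirements tab yields a dataframe with hypothetical column_name and nullable columns.

import pandas as pd

def check_nulls(actual, requirements):
    # Columns that the requirements tab marks as must-not-be-null.
    required = requirements.loc[~requirements["nullable"].astype(bool), "column_name"]
    failures = []
    for column in required:
        if column in actual.columns and actual[column].isna().any():
            failures.append(column)  # a null slipped into a column where data must exist
    return failures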

Enumerated Value Type Check

Some columns should only contain enumerated values, e.g. column A can only have one of the values ("D", "E", or "F") if column B contains "Z". How do we spot these column values across 30 columns and 4 different devices? Since the requirements are already captured in the Google Sheet, the script uses them as a reference and generates a new dataframe. For each column value, we make sure it is one of the allowed enumerated values.
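
A minimal sketch of the enumerated-value check in pandas; the rule below mirrors the example above, and the column names are hypothetical.

import pandas as pd

# Each rule: (condition column, condition value, constrained column, allowed values).
ENUM_RULES = [
    ("column_b", "Z", "column_a", {"D", "E", "F"}),
]

def check_enumerated_values(actual):
    bad_rows = []
    for cond_col, cond_val, target_col, allowed in ENUM_RULES:
        subset = actual[actual[cond_col] == cond_val]
        bad_rows.append(subset[~subset[target_col].isin(allowed)])
    # Rows that violate at least one enumeration rule.
    return pd.concat(bad_rows) if bad_rows else actual.iloc[0:0]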

Data Validity Check

Some columns have values like device_id = 5 versus device_id = 999999999999999. Let's be real, there aren't enough devices out there to reach such a high device_id. I filled in the range of valid values for each column that can be ambiguous. There is definitely a better way to do this, e.g. querying the database for the min/max of existing values, but for version 1.0 it is a manual step of coming up with these ranges just so we have some validation up and running. If any column has an outrageous value, it raises a flag.
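
A minimal sketch of the range check in pandas, using manually filled min/max bounds per ambiguous column; the bounds and column names are hypothetical.

import pandas as pd

# Hypothetical, manually filled min/max bounds per ambiguous column.
VALID_RANGES = {
    "device_id": (1, 100000),
    "session_length_seconds": (0, 86400),
}

def check_value_ranges(actual):
    out_of_range = []
    for column, (low, high) in VALID_RANGES.items():
        if column in actual.columns:
            values = pd.to_numeric(actual[column], errors="coerce")
            out_of_range.append(actual[(values < low) | (values > high)])
    # Rows with at least one outrageous value raise a flag.
    return pd.concat(out_of_range) if out_of_range else actual.iloc[0:0]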


Jenkins

The Jenkins job has an agent that points to a VM that Docker creates when the job runs; the VM runs the script and then shuts down afterward. The Android Simulation and Analytics Validation runs are stored in Jenkins so that we can reference when the last job ran successfully or first failed, which makes it easier to know when a specific bug started surfacing. Because these scripts run on a nightly basis, it's easy to pinpoint which development build broke analytics. UPDATE: After I left the company, the dev team hooked this test into the development build process so that it runs right after the Android build is finished.

The Android Simulation script runs first so that the events can be sent to Treasure Data. After the run completes successfully, it kicks off the Analytics job. The Analytics job sleeps for 5 minutes, the time it takes for Treasure Data to ingest the events and make the results queryable. Then the script runs and validates the results using the 4 validation checks.