HIPAA, the Health Insurance Portability and Accountability Act, requires companies to use safeguards for your medical data. Imagine if your online order for prescription drugs or medical supplies was leaked onto the Internet! Consumers need companies to be responsible stewards of medical data.
For Iterable to serve the medical industry, HIPAA compliance is a must. To become HIPAA compliant, Iterable must sign Business Associate Agreements (BAAs) with any vendor to whom we send Personally Identifiable Information (PII), such as email and IP addresses. BAAs ensure that our partners also employ the appropriate safeguards for PII handling. If a vendor refuses to sign, then Iterable can’t send PII to that vendor.
DataDog, the APM vendor Iterable uses, is critical to how Iterable operates. DataDog provides metrics and graphs that show the health of the Iterable service and help pinpoint any performance problems that arise. At the time, DataDog couldn’t sign a BAA with Iterable, which meant Iterable couldn’t send any PII to DataDog. Any PII sent to DataDog would become a security incident in which Iterable would need to quickly stop sending the PII and ask DataDog to erase the data on their end, one way or another.
I was to lead the effort to strip all PII sent to DataDog to meet our contractual obligation for a new customer.
The Scope
Iterable sends spans to DataDog. A span consists of timing information, tags, and child spans. See the following example:
The tags contain any information that Iterable sends, some of which may include PII. So the scope boiled down to making sure no PII remained in tags for HIPAA customers.
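To make the shape of a span concrete, here is a minimal sketch of the structure described above. The `Span` case class, its field names, and the sample tag values are assumptions made for illustration; they are not Iterable's or DataDog's actual types.

```scala
// Hypothetical model of a span: timing information, tags, and child spans.
final case class Span(
  name: String,
  startMillis: Long,
  durationMillis: Long,
  tags: Map[String, String],
  children: List[Span] = Nil
)

// Collect every tag in a span tree, since any of them might carry PII.
def allTags(span: Span): List[(String, String)] =
  span.tags.toList ++ span.children.flatMap(allTags)

// Illustrative data only: the email tag is the kind of value that must not leave for HIPAA customers.
val child = Span("db.query", 1005L, 12L, Map("table" -> "users"))
val root = Span(
  "api.request", 1000L, 40L,
  Map("org.id" -> "42", "email" -> "user@example.com"),
  List(child)
)
```

Because tags hang off every node of the tree, the audit has to cover child spans as well as the top-level one.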
Tags were created in roughly two groups of ways. The first group looked like the following.
setTags called setTag underneath. So far so good: the structure was very predictable.
The second group looked like this.
trace.setError(msg) contained the following line:
So again, it would be a matter of not passing PII in msg to DataDog for HIPAA customers.
trace.setError(exception) boiled down to the following:
So three tags were set in this case. The exception could be a third-party one over which Iterable had no control, so the exception message was also something that Iterable couldn’t control. How would developers validate all the possible messages?
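The two groups of tag writers can be sketched roughly as follows. This is not the original code: the `Trace` class is a hypothetical stand-in, and the three tag names (`error.msg`, `error.type`, `error.stack`) are an assumption based on DataDog's conventional error tags.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a trace; not Iterable's actual class.
final class Trace {
  val tags: mutable.Map[String, String] = mutable.Map.empty

  def setTag(key: String, value: String): Unit = tags.update(key, value)

  // Group 1: setTags simply delegates to setTag for each pair.
  def setTags(pairs: (String, String)*): Unit =
    pairs.foreach { case (k, v) => setTag(k, v) }

  // Group 2a: a message written by our own code, so we control its content.
  def setError(msg: String): Unit = setTag("error.msg", msg)

  // Group 2b: an exception produces three tags (names assumed);
  // the message may come from a third-party library.
  def setError(e: Throwable): Unit = {
    setTag("error.msg", String.valueOf(e.getMessage))
    setTag("error.type", e.getClass.getName)
    setTag("error.stack", e.getStackTrace.mkString("\n"))
  }
}

// A third-party style exception that happens to embed an email address.
val trace = new Trace
trace.setError(new IllegalStateException("no user bob@example.com"))
```

The last two lines show the problem: nothing in the call site mentions an email address, yet one ends up in a tag.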
In addition, there were some engineering requirements. PII filtering should be simple and mostly automatic. That way, developers largely wouldn’t have to think about it, making development easy and safe.
Moreover, spans were used everywhere in the code base, including latency-sensitive services like the API. Thus, the performance overhead must be minimal.
In conclusion, there were three major requirements for this project:
- No PII can remain in tags from spans for HIPAA customers.
- PII filtering should be simple and mostly automatic.
- Performance overhead must be minimal.
The Solutions
Our team had a tight deadline to finish this work before the customer signed their contract. We experimented with a number of creative solutions before we found the right approach. Let’s look at how the project unfolded.
Automatic Hashing
Some tags with PII, such as emails, are important for debugging but can’t be sent to DataDog. Hashing was an acceptable way to hide PII because deriving the original string from a hash is hard. Yet a hash is easy to derive from the original string, allowing an engineer to debug what happened to a specific recipient.
To automatically hash tags containing PII, an engineer created a new class, Sha256. Such tags would take a Sha256 type, which once created contained only the SHA-256 of the original String. This made it foolproof for such tags to always contain a SHA-256 hash.
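A minimal sketch of what such a type could look like, assuming a private constructor and a hex-encoded digest. Only the class name Sha256 comes from the article; the `hex` field, the `emailTag` helper, and the tag name are hypothetical.

```scala
import java.nio.charset.StandardCharsets
import java.security.MessageDigest

// The private constructor means a Sha256 can only be built through the
// companion object, so an instance always holds a digest, never the raw PII.
final class Sha256 private (val hex: String) {
  override def toString: String = hex
}

object Sha256 {
  def apply(raw: String): Sha256 = {
    val digest = MessageDigest
      .getInstance("SHA-256")
      .digest(raw.getBytes(StandardCharsets.UTF_8))
    // Render each byte as two lowercase hex characters.
    new Sha256(digest.map(b => f"${b & 0xff}%02x").mkString)
  }
}

// Hypothetical tag helper: it accepts only the hashed type, so passing a raw
// String is a compile error rather than a leak.
def emailTag(value: Sha256): (String, String) = "email.sha256" -> value.hex

val tag = emailTag(Sha256("user@example.com"))
```

To debug a specific recipient, an engineer hashes the known address the same way and searches for the resulting digest.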
Other Attempts
Then another engineer created a TraceFilter class to obfuscate PII inside a tag. He wrote regular expressions to search for an email username or domain within a string. The problem, however, was that the solution was rather complicated, so it was hard to validate its correctness. What happens when there are two or more email addresses in the tag? When the consequences are security incidents, a complicated solution wouldn’t do. The PR was closed.
In another PR, the same engineer gave different TraceTags different types. Each type implemented a method for writing its tag content. A type without PII would just write its value into the tag. A type with PII would obfuscate it somehow. After a call for review went out to all the engineers, there were so many comments that no one could agree on exactly what to do, and no one approved the PR. That PR also ended up getting scrapped.
A Nuclear Solution
With only a month left before the deadline, my manager and I came up with a PII Kill Switch, a global feature flag, to meet the requirement. Turning on the feature flag would redact all trace tags and exception messages. This would ensure that no PII would ever get sent to DataDog. Then I could slowly allow more PII from non-HIPAA customers to be sent to DataDog.
As the PII Kill Switch was going to be a feature flag, trace creation would have a dependency on this feature flag, which would be stored in the database. Since the trace creation code had no dependency on any data store, this change required injecting a new dependency to create traces.
Originally, the code looked like
For this PR, I created a traceFactory to create the top-level traces, which would pass down whether the kill switch was on or off. So I could have something like
Note that this isn’t the only way to deal with the new dependency. Another way would be to simply inject a TraceService into every method of TraceUtils as an implicit argument. But at the time, the above method of injecting the dependency seemed more natural. The result, however, was over 1,000 added or changed lines!
When I later had a chance to go over the PR at our Architecture Support Group (ASG), the changes were considered too extensive. The PR also potentially removed too much information, as all traces with PII would be affected, not just those for the new customer!
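The two ways of supplying the dependency described above can be contrasted in a small sketch. None of this is the original code: `KillSwitch`, the `Trace` case class, and the method names are hypothetical, and the flag lookup is reduced to a single Boolean.

```scala
// Hypothetical trace that remembers whether everything must be redacted.
final case class Trace(name: String, redactAll: Boolean)

// Stand-in for the feature-flag lookup that is backed by the database.
trait KillSwitch { def isOn: Boolean }

// Option 1: a factory owns the dependency and stamps it onto top-level traces.
// Every place that creates a top-level trace must be handed a factory.
final class TraceFactory(killSwitch: KillSwitch) {
  def topLevel(name: String): Trace = Trace(name, redactAll = killSwitch.isOn)
}

// Option 2: each helper takes the dependency as an implicit argument.
// Call sites keep their shape as long as an implicit value is in scope.
object TraceUtils {
  def topLevel(name: String)(implicit killSwitch: KillSwitch): Trace =
    Trace(name, redactAll = killSwitch.isOn)
}

val on: KillSwitch = new KillSwitch { def isOn: Boolean = true }
val viaFactory = new TraceFactory(on).topLevel("api.request")
val viaImplicit = TraceUtils.topLevel("api.request")(on)
```

Both options produce the same trace. They differ in how many call sites have to change, which is what drove the size of the PR.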
The Quick Fix
An alternative solution was proposed that would require far fewer changes. First, the new customer would be hard-coded. This meant no new dependency had to be injected, so the change was minimal. This would help us make the deadline, which was one week away! A subsequent solution would then be introduced later to allow more customers to be HIPAA compliant in a more general way.
The new hard-coding solution also introduced a way to filter out PII. Many traces contain org.id (organization ID), though many only have project.id (project ID). And a minority has neither.
In Iterable, a customer (an org) can have multiple projects, and each project belongs to an org. So for the first pass with the hard-coded solution, the decision on whether a trace can contain PII involves checking whether the trace or any of its ancestors has an org.id tag.
If an org.id tag is found, the org.id is checked against the hard-coded org ID. If it matches, PII is prohibited.
Any tag that may have PII simply has its content replaced with redacted. If no org.id tag is found, then assume the worst case and redact potential PII. Otherwise, keep the tag content. As for exceptions, the old exception handling code was copied over to the new PR.
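The decision rule above can be sketched as follows. The `Trace` type, the function names, and the org ID value are hypothetical; only the org.id tag name, the redacted placeholder, and the worst-case rule come from the text.

```scala
// Hypothetical trace with a link to its parent trace.
final case class Trace(tags: Map[String, String], parent: Option[Trace] = None)

val HipaaOrgId = "12345" // placeholder, not a real customer ID
val Redacted = "redacted"

// Look for an org.id tag on the trace itself, then on each ancestor in turn.
def findOrgId(trace: Trace): Option[String] =
  trace.tags.get("org.id").orElse(trace.parent.flatMap(findOrgId))

def piiProhibited(trace: Trace): Boolean =
  findOrgId(trace) match {
    case Some(orgId) => orgId == HipaaOrgId // known owner: compare with the hard-coded org
    case None        => true                // unknown owner: assume the worst case
  }

// Only tags that may contain PII are ever redacted.
def tagValue(trace: Trace, value: String, mayContainPii: Boolean): String =
  if (mayContainPii && piiProhibited(trace)) Redacted else value

val hipaaRoot = Trace(Map("org.id" -> "12345"))
val hipaaChild = Trace(Map("project.id" -> "7"), Some(hipaaRoot))
val otherRoot = Trace(Map("org.id" -> "999"))
val orphan = Trace(Map.empty)
```

Here `hipaaChild` has no org.id of its own but inherits the verdict from its parent, while `orphan` is redacted because no owner can be established.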
As a result of the smaller scope of changes, the PR was submitted and approved in a few days, and the contract was fulfilled. On the day of the contractual obligation, the engineers responsible for making Iterable HIPAA compliant had a short party with our CEO to celebrate the success!
Follow-Ups
Next, work commenced to generalize to arbitrary orgs. A new column, data_policy, was added to the organizations table in the database, taking the possible values Unrestricted and HIPAA, encoded as enums in Scala. This leaves room for other data_policy values in the future. To ensure database access would introduce minimal overhead, a refreshing cache was used.
The in-memory cache fetches the list of all orgs with a data_policy value of HIPAA initially and periodically fetches the whole list again. The cache then atomically swaps out the whole list by switching references. From the perspective of a user of the cache, the value is always in memory. Thus, the refreshing cache minimizes the performance hit from using the database.
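A minimal sketch of such a refreshing cache, assuming an `AtomicReference` holds an immutable set. The class name `HipaaOrgCache`, the `fetchHipaaOrgIds` function (standing in for the database query), and the use of `Long` org IDs are assumptions. The periodic scheduling is left out: a timer would simply call `refresh()`.

```scala
import java.util.concurrent.atomic.AtomicReference

// The two data_policy values named in the article, as a sealed hierarchy.
sealed trait DataPolicy
object DataPolicy {
  case object Unrestricted extends DataPolicy
  case object HIPAA extends DataPolicy
}

final class HipaaOrgCache(fetchHipaaOrgIds: () => Set[Long]) {
  // Load eagerly so that readers always find a value in memory.
  private val current = new AtomicReference[Set[Long]](fetchHipaaOrgIds())

  // Replace the whole immutable set by swapping a single reference.
  // If the fetch throws, the previous set stays in place.
  def refresh(): Unit = current.set(fetchHipaaOrgIds())

  // Reads never touch the database.
  def policyFor(orgId: Long): DataPolicy =
    if (current.get().contains(orgId)) DataPolicy.HIPAA
    else DataPolicy.Unrestricted
}

// Simulated table contents; a real fetch would query the organizations table.
var rows = Set(1L)
val cache = new HipaaOrgCache(() => rows)
```

Readers see either the old set or the new one, never a partially updated list, because the set itself is immutable and only the reference changes.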
A new dependency, TraceService, was introduced and injected into every method of TraceUtils as an implicit argument. This TraceService depends on the refreshing cache to decide which orgs require HIPAA compliance. Coincidentally, Iterable already used traces for structured logging, called the event stream, so everywhere traces were used, eventStreamer was already injected as a dependency.
So a quick change was made by replacing eventStreamer with traceService and making eventStreamer a member of traceService. This search-and-replace changed about 300 lines in about 20 minutes. All these changes took about another week to finish, well within the deadline for the next customer to be HIPAA compliant.
Eventually, a few more optimizations were made. A way to whitelist exceptions whose messages don’t contain PII was added. A whitelisted exception simply needs to extend ThrowableWithoutPii for its exception message to be sent to DataDog, even for traces potentially from HIPAA customers.
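One way the whitelist could work is a marker trait that an exception opts into. Only the name ThrowableWithoutPii comes from the article; making it a trait, the sample `RateLimitExceeded` exception, and the `errorMessageTag` function are assumptions for this sketch.

```scala
// Marker for exceptions whose messages are known to be free of PII.
trait ThrowableWithoutPii extends Throwable

// Hypothetical exception whose message is built only from non-PII values.
final class RateLimitExceeded(limit: Int)
  extends RuntimeException(s"rate limit of $limit requests exceeded")
  with ThrowableWithoutPii

// Whitelisted messages pass through; anything else is redacted when PII is prohibited.
def errorMessageTag(e: Throwable, piiProhibited: Boolean): String =
  e match {
    case _: ThrowableWithoutPii => e.getMessage
    case _ if piiProhibited     => "redacted"
    case _                      => e.getMessage
  }

val safe = errorMessageTag(new RateLimitExceeded(10), piiProhibited = true)
val unsafe = errorMessageTag(new RuntimeException("no user a@b.com"), piiProhibited = true)
```

The default stays safe: an exception is redacted unless its author has explicitly vouched for its message.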
Conclusion
So far, we’ve seen no security incidents from injecting PII into trace tags.
Recently, we decided that we actually wanted to avoid sending any PII to DataDog. It took only a one-line change to make this work: we simply flipped the PII policy calculation to always return PiiProhibited, a win for modularity.
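The reason a one-line change is enough can be illustrated with a sketch in which every tag writer asks a single function for the policy. PiiProhibited is the name used in the article; `PiiAllowed`, the function names, and the `Option[Boolean]` input are assumptions.

```scala
sealed trait PiiPolicy
case object PiiAllowed extends PiiPolicy    // assumed name for the permissive case
case object PiiProhibited extends PiiPolicy // name taken from the article

// Earlier behaviour (assumed shape): allow PII only for orgs known not to be HIPAA.
def piiPolicyPerOrg(orgIsHipaa: Option[Boolean]): PiiPolicy =
  orgIsHipaa match {
    case Some(false) => PiiAllowed
    case _           => PiiProhibited
  }

// Later behaviour: the same entry point ignores its input and always prohibits.
def piiPolicy(orgIsHipaa: Option[Boolean]): PiiPolicy = PiiProhibited

// Tag writers depend only on the policy value, so they do not change.
def writeTag(value: String, policy: PiiPolicy): String =
  policy match {
    case PiiProhibited => "redacted"
    case PiiAllowed    => value
  }
```

Since `writeTag` never inspects the org itself, swapping the body of the policy function changes the behaviour everywhere at once.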
Eventually, a similar refactoring to TraceFactory and another refactoring to group trace tags by type were rewritten and merged by an architect. Looking back, I wish I had taken a more iterative approach using smaller PRs, with refactorings submitted separately from implementations. Even though it’s more effort to create multiple PRs, they’d be approved more quickly, resulting in a much faster time to deploy.
In the end, there have been no security incidents from injecting PII into trace tags for HIPAA customers. I attribute our success to choosing simpler solutions. The idea of filtering trace content was overly complicated and made it hard to guarantee accuracy, so we discarded it. Having PII filtering that the developer doesn’t have to think about is a big win both for developer productivity and for security.