Building Understanding Out of Incomplete and Biased Datasets using Machine Learning and Databricks

At Atlassian, product analytics exists to help our teams build better products by capturing and describing in-product behaviour. Within our on-premise products, only a subset of customers choose to send us anonymised event data, so the dataset we work with is incomplete and biased. In this setting, even a question as simple as ‘what percentage of customers use feature X’ becomes a non-trivial estimation task. It becomes more complex still when a metric is subadditive, such as estimating distinct users of a pr
