Deepening The Data Lake: How Second-Party Data Increases AI For Enterprises


I have been hearing a lot about data lakes lately. Progressive marketers and some large enterprise publishers have been breaking out of traditional data warehouses, which are mostly used to store structured data, and investing in infrastructure so they can store tons of their first-party data and query it for analytics purposes.

“A data lake is a storage repository that holds a vast amount of raw data in its native format until it is needed,” according to Amazon Web Services. “While a hierarchical data warehouse stores data in files or folders, a data lake uses a flat architecture to store data.”

A few years ago, data lakes were thought to be limited to Hadoop applications (object storage), but the term is now more broadly applied to an environment in which an enterprise can store both structured and unstructured data and have it organized for fast query processing. In the ad tech and mar tech world, this is almost universally about first-party data. For example, a big airline might want to store transactional data from ecommerce alongside beacon pings to understand how often online ticket buyers in its loyalty program use a certain airport lounge.
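To make that airline example concrete, here is a minimal sketch (in Python with pandas) of the kind of flat-storage query a lake enables once structured transactions and raw beacon pings sit side by side. Every name below, from the loyalty IDs to the lounge codes, is invented for illustration.

```python
import pandas as pd

# Hypothetical extracts from the lake's flat storage: structured e-commerce
# transactions alongside raw lounge beacon pings. All columns are invented.
purchases = pd.DataFrame({
    "loyalty_id": ["L001", "L002", "L003"],
    "channel": ["web", "web", "call_center"],
    "fare": [420.00, 188.50, 610.00],
})
lounge_pings = pd.DataFrame({
    "loyalty_id": ["L001", "L001", "L003"],
    "lounge": ["JFK-T4", "JFK-T4", "SFO-T2"],
    "ts": ["2017-03-01", "2017-04-11", "2017-03-09"],
})

# How often do online ticket buyers in the loyalty program use a lounge?
online_buyers = purchases[purchases["channel"] == "web"][["loyalty_id"]].drop_duplicates()
visits = online_buyers.merge(lounge_pings, on="loyalty_id", how="left")
print(visits.groupby("loyalty_id")["lounge"].count())  # L001: 2 visits, L002: 0
```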

However, as we discussed earlier this year, there are many marketers with surprisingly sparse data, like the food marketer who does not get many website visitors or authenticated customers downloading coupons. Today, those marketers face a situation where they want to use data science to do user scoring and modeling but, because they only have enough of their own data to fill a shallow lake, they have trouble justifying the costs of scaling the approach in a way that moves the sales needle.


Figure 1: Marketers with sparse data often do not have enough raw data to create measurable outcomes in audience targeting through modeling. Source: Chris O’Hara.

In the example above, we can think of the marketer’s first-party data – media exposure data, email marketing data, website analytics data, etc. – as the water that fills a data lake. That data is pumped into a data management platform (pictured here as a hydroelectric dam), sent like electricity through ad tech pipes (demand-side platforms, supply-side platforms and ad servers) and finally delivered to the places where it is activated (the town, where people live).

As the metaphor makes apparent, this infrastructure can exist with even a tiny bit of water, but at the end of the cycle not enough electricity will be generated to create decent outcomes and sustain a data-driven approach to marketing. This is a long way of saying that the data itself, in both quality and quantity, is needed in ever-larger amounts to create the potential for better targeting and analytics.

Most marketers today – even those with lots of data – find themselves overly reliant on third-party data to fill in these gaps. However, even when they have the rights to model it in their own environment, there are loads of restrictions on using it for targeting. It is also highly commoditized and can be of questionable provenance. (Is my Ferrari-browsing son really an “auto intender”?) While third-party data can be highly valuable, adding it is akin to pouring sediment into a data lake: the water turns murky just when you are trying to peer down to the bottom for deep insights.

So, how can marketers fill data lakes with large amounts of high-quality data that can be used for modeling? I am starting to see the emergence of peer-to-peer data-sharing agreements that help marketers fill their lakes, deepen their ability to leverage data science and add layers of artificial intelligence through machine learning to their stacks.


Figure 2: Second-party data is simply someone else’s first-party data. When relevant data is added to a data lake, the result is a more robust environment for deeper data-led insights for both targeting and analytics. Source: Chris O’Hara.

In the above example (Figure 2), second-party data deepens the marketer’s data lake, powering the DMP with richer data that can be used for modeling, activation and analytics. Imagine a huge beer company launching a country music promotion for its flagship brand. As a CPG company with relatively sparse first-party data, its traditional approach would be to seek out music fans of a certain location and demographic through third-party sources and apply those third-party segments to a programmatic campaign.

But what if the beer manufacturer teamed up with a big online ticket seller and arranged a data subscription for “all viewers or buyers of a Garth Brooks ticket in the last 180 days”? Those are exactly the people I would want to target, and they are unavailable anywhere in the third-party data ecosystem.

The data is also of extremely high provenance, and I would also be able to use that data in my own environment, where I could model it against my first-party data, such as site visitors or mobile IDs I gathered when I sponsored free Wi-Fi at the last Country Music Awards. The ability to gather and license those specific data sets and use them for modeling in a data lake is going to create massive outcomes in my addressable campaigns and give me an edge I cannot get using traditional ad network approaches with third-party segments.

Moreover, the flexibility around data capture enables marketers to use highly disparate data sets, combining and normalizing them with metadata – without having to worry about mapping them to a predefined schema. The associative work happens when the query takes place. That means I don’t need a predefined schema in place for that data to become valuable – a way of saying that the inherent observational bias of traditional approaches (“country music fans love mainstream beer, so I’d better capture that”) never hinders the ability to activate against unforeseen insights.
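Here is what that schema-on-read idea looks like in a minimal sketch, again in Python with pandas. The event sources and field names are hypothetical; the point is simply that heterogeneous records land as-is and structure is imposed only when a question is asked.

```python
import pandas as pd

# Raw events land in the lake as-is, with no predefined schema.
# (Hypothetical records: a ticket view, a Wi-Fi sign-in, a coupon download.)
raw_events = [
    {"source": "ticketing", "user": "abc123", "artist": "Garth Brooks", "action": "view"},
    {"source": "wifi", "device_id": "idfa-789", "venue": "CMA Awards", "ts": "2016-11-02"},
    {"source": "coupon", "user": "abc123", "brand": "flagship_beer", "redeemed": False},
]

# Structure is applied at query time: json_normalize flattens whatever
# fields happen to exist into a single frame.
events = pd.json_normalize(raw_events)

# An unforeseen question ("which users touched both ticketing and coupons?")
# can be answered without having designed a schema for it up front.
by_user = events.dropna(subset=["user"]).groupby("user")["source"].apply(set)
print(by_user[by_user.apply(lambda s: {"ticketing", "coupon"} <= s)])
```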

Large, sophisticated marketers and publishers are just starting to get their lakes built and begin gathering the data assets to deepen them, so we will likely see a great many examples of this approach over the coming months.

It’s a great time to be a data-driven marketer.

Follow Chris O’Hara (@chrisohara) and AdExchanger (@adexchanger) on Twitter.

(Interview) On Beacons and DMPs

How Beacons Might Alter The Data Balance Between Manufacturers And Retailers

As Salesforce integrates DMP Krux, Chris O’Hara considers how proximity-based personalization will complement access to first-party data. For one thing, imagine how coffeemakers could form the basis of the greatest OOH ad network.

Data Science is the New Measurement

It’s a hoary old chestnut, but “understanding the customer journey” in a world of fragmented consumer attention and multiple devices is not just an AdExchanger meme. Attribution is a big problem, and one that marketers pay dearly for. Getting away from last-touch models is hard to begin with. Add in the fact that many of the largest marketers have no actual relationship with the customer (such as CPG, where the customer is actually a wholesaler or retailer), and it gets even harder. Big companies are selling big-money solutions to marketers for multi-touch attribution (MTA) and media-mix modeling (MMM), but some marketers feel light years away from a true understanding of what actually moves the sales needle.

As marketers take more direct ownership of their own customer relationships via data management platforms, “consumer data platforms” and the like, they are starting to obtain the missing pieces of the measurement puzzle: highly granular, user-level data. Marketers are now pulling in not just media exposure data but also offline data such as beacon pings, point-of-sale data (where they can get it), modeled purchase data from vendors like Datalogix and IRI, weather data and more to build a true picture. When that data can be associated with a person through a cross-device graph, it’s like going from a blunt 8-pack of Crayolas to a full set of Faber-Castells.

Piercing the Retail Veil

Think about the company that makes single-serve coffee machines. Some make their money on the coffee they sell, rather than the machine—but they have absolutely no idea what their consumers like to drink. Again, they sell coffee but don’t really have a complete picture of who buys it or why. It’s the same problem for the beer or soda company, where the sale (and the customer data relationship) resides with the retailer. The default is to go to panel-based solutions that sample a tiny percentage of consumers for insights, or to wait for complicated and expensive media-mix models to reveal what drove sales lift. But what if a company could partner with a retailer and a beacon company to understand how in-store visitation and even things like an offline visit to a store shelf compared with online media exposure? The marketer could use geofencing to understand where else consumers shopped, offer a mobile coupon so the user could authenticate upon redemption, get access to POS data from the retailer to confirm purchase and understand basket contents—and ultimately tie that data back to media exposure. That sounds a lot like closed-loop attribution to me.
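Here is a minimal sketch of what closing that loop could look like once the extracts sit in the marketer’s own environment, assuming a hashed ID that can be joined across all four data sets. The IDs, campaign names and basket values are invented for illustration.

```python
import pandas as pd

# Hypothetical, already-hashed user-level extracts. IDs and columns are invented.
exposures = pd.DataFrame({"hashed_id": ["h1", "h2", "h3"], "campaign": ["summer"] * 3})
store_visits = pd.DataFrame({"hashed_id": ["h1", "h3"], "store_id": ["s9", "s9"]})  # geofence/beacon
redemptions = pd.DataFrame({"hashed_id": ["h1"], "coupon_id": ["c77"]})             # mobile coupon
pos = pd.DataFrame({"hashed_id": ["h1", "h4"], "basket_value": [23.40, 8.99]})      # retailer POS feed

# Walk the loop: exposed -> visited the store -> redeemed the coupon -> rang up at the register.
loop = (
    exposures
    .merge(store_visits[["hashed_id"]], on="hashed_id")
    .merge(redemptions[["hashed_id"]], on="hashed_id")
    .merge(pos, on="hashed_id")
)

# Purchases and basket value attributable to media exposure, closed end to end.
print(loop.groupby("campaign")["basket_value"].agg(["count", "sum"]))
```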

Overcoming Walled Gardens

Why do specialty health sites charge so much for media? Like any other walled garden, they are taking advantage of a unique set of data—and their own data science capabilities—to better understand user intent. (There’s nothing wrong with that, by the way.) If I’m a maker of allergy medicine, the most common trigger for purchase is probably the onset of an allergy attack, but how am I supposed to know when someone is about to sneeze? It’s an incredibly tough problem, but one that a large health site can solve, largely thanks to people who have searched for “hay fever” online. Combine that with a 7-day weather forecast, pollen indices, and past search intent behavior, and you have a pretty good model for finding allergy sufferers. However, almost all of that data—plus past purchase data—can be ingested and modeled inside a marketer’s DMP, enabling the allergy medicine manufacturer to segment those users in a similar way—and then use an overlap analysis to find them on sites with $5 CPMs, rather than $20. That’s the power of user modeling. Why don’t sites like Facebook give marketers user-level media exposure data? The question answers itself.
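As a rough illustration of that kind of user model, here is a sketch with invented signals and weights. It is the shape of a scoring model rather than a real one, followed by the overlap check against a cheaper site’s audience.

```python
import pandas as pd

# Hypothetical user-level signals already ingested and modeled inside the DMP.
users = pd.DataFrame({
    "hashed_id": ["u1", "u2", "u3", "u4"],
    "searched_hay_fever": [1, 0, 1, 0],       # past search intent
    "pollen_index_7d": [8.5, 2.0, 6.0, 3.0],  # forecast pollen level near the user
    "bought_antihistamine": [0, 0, 1, 0],     # past purchase data
})

# A crude, illustrative intent score; the weights are invented, not a trained model.
users["allergy_score"] = (
    2.0 * users["searched_hay_fever"]
    + 0.3 * users["pollen_index_7d"]
    + 1.5 * users["bought_antihistamine"]
)
segment = users[users["allergy_score"] > 3.0]

# Overlap analysis: how much of that segment also shows up on the $5 CPM site?
cheap_site_audience = pd.DataFrame({"hashed_id": ["u1", "u3", "u9"]})
overlap = segment.merge(cheap_site_audience, on="hashed_id")
print(f"{len(overlap)} of {len(segment)} modeled allergy sufferers reachable at the lower CPM")
```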

Understanding the Full Journey

Building journeys always falls down due to one missing piece of the puzzle or another. Panel-based models continually overemphasize the power of print and linear television. CRM-based models always look at the journey from the e-mail perspective, and value declared user data above all else. Digital journeys can get pretty granular with media exposure data, but miss big pieces of data from social networks, website interactions, and things that are hard to measure (like location data from beacon exposure). What we are starting to see today is that, through the ability to ingest highly differentiated signals, marketers can combine granular attribute data to complete the picture. Think about the data a marketer can ingest: all addressable media exposure (ad logs), all mobile app data (SDKs), location data (beacon or third party), modeled sales data (IRI or DLX), actual sales data (POS systems), website visitation data (JavaScript on the site), media performance data (through click and impression trackers), real people data through a CRM (that’s been hashed and anonymized), survey data that has been mapped to a user (pixel-enabled online survey), and even addressable TV exposure (think Comscore’s Rentrak data set). Wow.
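Here is a simplified sketch of how those signals get stitched into a single journey once a cross-device graph can resolve device IDs to a person. The graph, the events and every ID below are hypothetical.

```python
import pandas as pd

# Hypothetical cross-device graph: several device and cookie IDs resolve to one person.
graph = pd.DataFrame({
    "device_id": ["cookie-1", "idfa-9", "ctv-4"],
    "person_id": ["p1", "p1", "p1"],
})

# Heterogeneous event extracts (ad logs, beacon pings, POS), each keyed by a device ID.
ad_logs = pd.DataFrame({"device_id": ["cookie-1", "ctv-4"],
                        "event": ["display_impression", "tv_exposure"],
                        "ts": ["2016-08-01", "2016-08-02"]})
beacons = pd.DataFrame({"device_id": ["idfa-9"], "event": ["store_visit"], "ts": ["2016-08-03"]})
pos = pd.DataFrame({"device_id": ["idfa-9"], "event": ["purchase"], "ts": ["2016-08-04"]})

# Stitch everything to a person through the graph and sort it into one timeline.
events = pd.concat([ad_logs, beacons, pos]).merge(graph, on="device_id")
journeys = events.sort_values("ts").groupby("person_id")["event"].apply(list)
print(journeys["p1"])  # ['display_impression', 'tv_exposure', 'store_visit', 'purchase']
```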

Why is data science “the new measurement”? Because, when a marketer has all of that data at their fingertips, something close to true attribution becomes possible. Now that marketers have the right tools to draw with, the winners are going to be the ones with the most artists (data scientists).

It’s a really interesting space to watch. More and more data is becoming available to marketers, who are increasingly owning the data and technology to manage it, and the models are growing more powerful and accurate with every byte of data that enters their systems.

It’s a great time to be a data-driven marketer!

[This post originally appeared in AdExchanger on 8/12/16]

New Whitepaper: Agencies and DMPs!

We’ve just published our latest best practice guide, entitled ‘The Role of the Agency in Data Management.’

The report looks at the challenges and opportunities for agencies that want to become trusted stewards of their clients’ data.

I sat down with the author, Chris O’Hara, to find out more.

Q. It seems like the industry press is continually heralding the decline of media agencies, but they seem to be very much alive. What’s your take on the current landscape?

For a very long time, agencies have been dependent upon using low-cost labor for media planning and other low-value operational tasks.

While there are many highly-skilled digital media practitioners – strategists and the like – agencies still work against “cost-plus” models that don’t necessarily map to the new realities in omnichannel marketing.

Over the last several years as marketers have come to license technology – data management platforms (DMP) in particular – agencies have lost some ground to the managed services arms of ad tech companies, systems integrators, and management consultancies.

Q. How do agencies compete?

Agencies aren’t giving up the fight to win more technical and strategic work.

Over the last several years, we have seen many smaller, data-led agencies pop up to support challenging work – and we have also seen holding companies up-level staff and build practice groups to accommodate marketers that are licensing DMP technology and starting to take programmatic buying “in-house.”

It’s a trend that is only accelerating as more and more marketer clients are hiring Chief Data Officers and fusing the media, analytics, and IT departments into “centers of excellence” and the like.

Not only are agencies starting to build consultative practices, but it looks like traditional consultancies are starting to build out agency-like services as well.

Not long ago, you wouldn’t have thought of names like Accenture, McKinsey, Infinitive, and Boston Consulting Group in connection with digital media, but they are working closely with a lot of Fortune 500 marketers to do things like DMP and DSP (demand-side platform) evaluations, programmatic strategy, and even creative work.

We are also seeing CRM-type agencies like Merkle and Epsilon acquire technologies and partner with big cloud companies as they start to work with more of a marketer’s first-party data.

As services businesses, they would love to take share away from traditional agencies.

Q. Who is winning?

I think it’s early days in the battle for supremacy in data-driven marketing, but I think agencies that are nimble and willing to take some risk upfront are well positioned to be successful.

They are the closest to marketers’ media budgets, and those with transparent business models are strongly trusted partners when it comes to bringing new products to market.

Also, as creative starts to touch data more, this gives them a huge advantage.

You can be as efficient as possible in terms of reaching audiences through technology, but at the end of the day, creative is what drives brand building and ultimately sales.

Q. Why should agencies embrace DMPs? What is in it for them? It seems like yet another platform to operate, and agencies are already managing DSPs, search, direct buys, and things like creative optimization platforms.

Ultimately, agencies must align with the marketer’s strategy, and DMPs are starting to become the single source of “people data” that touch all sorts of execution channels, from email to social.

That being said, DMP implementations can be really tough if an agency isn’t scoped (or paid) to do the additional work that the DMP requires.

Think about it: A marketer licenses a DMP and plops a pretty complicated piece of software on an agency team’s desk and says, “get started!”

That can be a recipe for disaster. Agencies need to be involved in scoping the personnel and work they will be required to do to support new technologies, and marketers are better off involving agencies early on in the process.

Q. So, what do agencies do with DMP technology? How can they succeed?

As you’ll read in the new guide, there are a variety of amazing use cases that come out of the box that agencies can use to immediately make an impact.

Because the DMP can control for the delivery of messages against specific people across all channels, one piece of really low-hanging fruit is frequency management.

Doing it well can eliminate anywhere from 10% to 40% of wasteful spending on media that reaches consumers too many times.
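To put rough numbers on that, here is a back-of-the-envelope sketch of how a frequency cap surfaces the waste. The cap, the log format and the costs are all assumed for illustration.

```python
import pandas as pd

# Hypothetical impression log keyed to a cross-device person ID; cost per impression in dollars.
impressions = pd.DataFrame({
    "person_id": ["p1"] * 9 + ["p2"] * 3,
    "campaign": ["launch"] * 12,
    "cost": [0.004] * 12,
})

CAP = 5  # assumed effective-frequency ceiling per person, per campaign

# Rank each person's impressions within a campaign; anything past the cap is waste.
impressions["nth"] = impressions.groupby(["person_id", "campaign"]).cumcount() + 1
wasted = impressions[impressions["nth"] > CAP]

waste_share = wasted["cost"].sum() / impressions["cost"].sum()
print(f"Spend past the frequency cap: {waste_share:.0%}")  # 4 of 12 impressions, about 33%
```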

Doing analytics around customer journeys is another use case – and one that attribution companies get paid handsomely for.

With this newly discovered data at their fingertips, agencies can start proving value quickly, and build entire practice groups around media efficiency, analytics, data science – even leverage DMP tech to build specialized trading desks. There’s a lot to take advantage of.

Q. You interviewed a lot of senior people in the agency and marketer space. Are they optimistic about the future?

Definitely. It’s sort of a biased sample, since I interviewed a lot of practitioners that do data management on a daily basis.

But I think ultimately everyone sees the need to get a lot better at digital marketing and views technology as the way out of what I consider to be the early and dark ages of addressable marketing.

The pace of change is very rapid, and I think we are seeing that people who really lean into the big problems of the moment like cross-device identity, location-based attribution, and advanced analytics are future-proofing themselves.