Why Most of Today’s DMPs are Lousy

How Granular Data Collection and a Robust Second-Party Data Strategy Change the Game

The world’s largest marketers and media companies have strongly embraced data management technology to provide personalization for customers who demand Amazon-like experiences. As a single, smart hub for all of their owned data (CRM, email, etc.) and acquired data, such as third-party demographic data, DMPs go a long way towards building a sustainable, modern marketing strategy that accounts for massively fragmented digital audiences.

The good news is most enterprises have taken a technological leap of faith, and embraced a data strategy to help them navigate our digital future. The bad news is, the systems they are using today are deeply flawed and do not produce optimal audience segmentation.  

A Little DMP History

Ten years ago, a great thing called the data management platform (DMP) started to power big publishers. These companies wanted to shift power away from ad networks (upon whom the publishers relied to monetize their sites) and give publishers the power to create relevant audiences directly for advertisers. Once a publisher placed a bit of JavaScript in the header of its website, a DMP could build audience segments using web cookies, turning the average $2 CPM news reader into a $15 CPM “auto-intender.” By understanding what people read, and the content of those pages, DMPs could sort people into large audience “segments” and make them available for targeting. Now, instead of handing over 50% of their revenue to ad networks, publishers could pay a monthly licensing fee to a DMP and retain the lion’s share of their digital advertising dollars by creating their own segmented audiences to sell directly to advertisers.

Marketers were slower to embrace DMP technology, but they quickly grasped the opportunity too. Now, instead of depending on ad networks to aggregate reach for them, they started to assemble their own first-party data asset, overlapping their known users with publishers’ segments and buying access to those more relevant audiences. The more cookies, mobile IDs, and other addressable keys they could collect, the bigger their potential reach. Since most marketers had relatively small amounts of their own data, they supplemented with third-party data: segments of “intenders” from providers like Datalogix, Nielsen, and Acxiom.

The two primary use cases for DMPs have not changed all that much over the years: both sides want to leverage technology to understand their users (analytics) and grow their base of addressable IDs (reach). Put simply, “who are these people interacting with my brand, and how can I find more of them?” DMPs seem really efficient at tackling those basic use cases, until you find out that they were doing it the wrong way the whole time.

What’s the Problem?

To dig a bit deeper, the way first-generation DMPs go about analyzing and expanding audiences is through mapping cookies to a predetermined taxonomy, based on user behavior and context. For example, if my 17-year-old son is browsing an article on the cool new Ferrari online, he would be identified as an “auto intender” and placed in a bucket of other auto intenders. The system would not store any of the data associated with that browsing session, or additional context. It is enough that the online behavior met a predetermined set of rules for “auto-intender” to place that cookie among several hundred thousand other “auto-intenders.”

The problem with a fixed, taxonomy-based collection methodology is just that—it is fixed, and based on a rigid set of rules for data collection. Taxonomy results are stored (“cookie 123 equals auto-intender”)—not the underlying data itself. That is called “schema-on-write,” an approach that writes taxonomy results to an existing table when the data is collected. That was fine for the days when data collection was desktop-based and the costs of data storage were sky-high, but it fails in a mobile world where artificial intelligence systems crave truly granular, attribute-level data collected from all consumer interactions to power machine learning.

There is another way to do this. It’s called “schema-on-read,” which is the opposite of schema-on-write. In these types of systems, all of the underlying data is collected, and the taxonomy result is created upon reading all of the raw data. In this instance, say I collected everything that happened on a popular auto site like Cars.com. I would collect how many pages were viewed, dwell times on ads, all of the clickstream collected in the “build your own” car module, and the data from event pixels that collected how many pictures a user viewed of a particular car model. I would store all of this data so I could look it up later.

Then, if my really smart data science team told me that users who viewed 15 of the 20 car pictures in the photo carousel in one viewing session were 50% more likely to buy a car in the next 30 days than the average user, I would build a segment of such users by “reading” the attribute data I had stored.  This notion—total data storage at the attribute (or “trait”) level, independent of a fixed taxonomy—is called completeness of data. Most DMPs don’t have it.
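
To make the contrast concrete, here is a minimal sketch of the two approaches. Everything in it is an assumption made for illustration: the event records, the field names (cookie_id, session_id, event, photo_id) and the function names are hypothetical and do not come from any particular DMP.

```python
from collections import defaultdict

# Hypothetical raw event log, stored exactly as collected.
# Cookie c1 views 16 distinct photos in one session; cookie c2 views 3.
RAW_EVENTS = (
    [{"cookie_id": "c1", "session_id": "s1", "event": "photo_view", "photo_id": i}
     for i in range(16)]
    + [{"cookie_id": "c2", "session_id": "s2", "event": "photo_view", "photo_id": i}
       for i in range(3)]
    + [{"cookie_id": "c2", "session_id": "s2", "event": "page_view", "url": "/ferrari-review"}]
)


def schema_on_write(events):
    """First-generation approach: keep only the taxonomy result per cookie."""
    segments = {}
    for e in events:
        # Any activity on the auto site qualifies; the event itself is thrown away.
        segments[e["cookie_id"]] = "auto-intender"
    return segments


def heavy_photo_viewers(events, min_photos=15):
    """Schema-on-read: define the segment at query time from the stored raw events."""
    photos_per_session = defaultdict(set)
    for e in events:
        if e["event"] == "photo_view":
            photos_per_session[(e["cookie_id"], e["session_id"])].add(e["photo_id"])
    return {
        cookie
        for (cookie, _session), photos in photos_per_session.items()
        if len(photos) >= min_photos
    }


flat_segments = schema_on_write(RAW_EVENTS)        # both cookies look identical
carousel_segment = heavy_photo_viewers(RAW_EVENTS)  # only the heavy viewer qualifies
```

In the schema-on-write result the two cookies are indistinguishable. The 15-photo rule can only be applied after the fact because the raw events were kept.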

Why Completeness Matters

Isn’t one auto-intender as good as another, regardless of how the data was collected? No. Think about the other main uses of DMPs: overlap reporting and indexing. Overlap reporting seeks to overlay an enterprise’s first-party data asset with another. This is like taking all the visitors to Ford’s website, and comparing that audience to every user on a non-endemic site, like the Wall Street Journal. Every auto marketer would love to understand which high-income WSJ readers were interested in their latest model. But how can they understand the real intent of users if they are just tagged as “auto intenders”? How did the publisher come to that conclusion? What signals contributed to those users qualifying as “intenders” in the first place? How long ago did they engage with an auto article? Was it a story about a horrific traffic crash, or an article on the hottest new model? Without completeness, these “auto intenders” become very vague. Without all of the attributes stored, Ford cannot put its data science team to work to better understand their true intent.
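
A rough sketch of overlap reporting, with and without stored attributes, might look like the following. The user IDs, topics and recency values are invented for the example, and the function names are hypothetical rather than part of any vendor's API.

```python
def overlap_report(first_party_ids, partner_ids):
    """Summarize how two sets of user IDs overlap."""
    if not first_party_ids or not partner_ids:
        raise ValueError("both ID sets must be non-empty")
    shared = first_party_ids & partner_ids
    return {
        "shared": len(shared),
        "pct_of_first_party": 100 * len(shared) / len(first_party_ids),
        "pct_of_partner": 100 * len(shared) / len(partner_ids),
    }


def qualified_overlap(first_party_ids, partner_events, topic, max_days_ago):
    """Narrow the overlap using stored attributes, which requires complete data."""
    return {
        e["user_id"]
        for e in partner_events
        if e["user_id"] in first_party_ids
        and e["topic"] == topic
        and e["days_ago"] <= max_days_ago
    }


# Hypothetical data, for illustration only.
ford_visitors = {"u1", "u2", "u3", "u4"}
wsj_auto_events = [
    {"user_id": "u3", "topic": "new-model-review", "days_ago": 5},
    {"user_id": "u4", "topic": "traffic-crash", "days_ago": 2},
    {"user_id": "u5", "topic": "new-model-review", "days_ago": 90},
]
# With a fixed taxonomy, this flat set is all the publisher can offer.
wsj_auto_intenders = {e["user_id"] for e in wsj_auto_events}

report = overlap_report(ford_visitors, wsj_auto_intenders)
recent_model_readers = qualified_overlap(
    ford_visitors, wsj_auto_events, topic="new-model-review", max_days_ago=30
)
```

The flat report counts the crash-story reader as an overlapping “auto intender.” The attribute-level query can exclude that reader, but only because the underlying events were stored.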

Indexing, the other prominent use case, scores user IDs based on their similarity to a baseline population. For example, a popular women’s publisher like Meredith might have an index score of 150 against a segment of “active moms.” Another way of saying this is that indexing helps understand the “momness” of those women, based on similarity to the overall population. Index scoring is the way marketers have been buying audience data for the last 20 years. If I can get good reach with an index score above 100 at a good price, then I’m buying those segments all day long. Most of this index-based buying happens with third-party data providers who have been collecting the data in the same flawed way for years. What’s the ultimate source of truth for such indexing? What data underlies the scoring in the first place? The fact is, it is impossible to validate these relevancy scores without the granular, attribute-level data being available to analyze.
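
The article does not spell out how an index score is calculated. The sketch below assumes the conventional definition: the segment's share of an audience divided by its share of the baseline population, multiplied by 100. The audience counts are made up to land on the score of 150 mentioned above.

```python
def index_score(segment_in_audience, audience_size, segment_in_baseline, baseline_size):
    """Return 100 * (segment share of audience) / (segment share of baseline)."""
    if min(audience_size, segment_in_baseline, baseline_size) <= 0:
        raise ValueError("sizes and the baseline segment count must be positive")
    # Multiply before dividing so round numbers give exact results.
    return 100 * (segment_in_audience * baseline_size) / (audience_size * segment_in_baseline)


# Hypothetical counts: 300 of 1,000 site visitors are "active moms",
# against 2,000 of 10,000 people in the baseline population.
score = index_score(300, 1_000, 2_000, 10_000)
```

Here 30% of the audience falls in the segment versus 20% of the baseline, so the audience indexes at 150. A score of 100 would mean the audience looks exactly like the general population. Note that every input is a count of users already labeled as in or out of the segment, so the score is only as trustworthy as the data behind those labels.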

Therefore, it is entirely fair to say that most DMPs have excellent intentions, but lack the infrastructure to perform 100% of the most important things DMPs are meant to do: understand IDs, and grow them through overlap analysis and indexing. If the underlying data has been improperly collected (or is not there at all), then any type of audience profiling by any means is fundamentally flawed.

Whoops.

What to do?

To be fair, most DMPs were architected during a time when it was unnecessary to collect data through a schema-on-read methodology—and extremely costly. Today’s unrelenting shift to AI-driven marketing necessitates this approach to data collection and storage, and older systems are tooling up to compete. If you want to create a consumer data platform (“CDP”), the hottest new buzzword in marketing, you need to collect data in this way. So, the industry is moving there quickly. That said, many marketers are still stuck in the 1990s. Older DMPs are somewhat like the technology mullet of marketing—businesslike in the front, with something awkward and hideous hidden behind.

Beyond licensing a modern, schema-on-read system for data management so marketers can collect their own data in a granular way, there is another way to do things like indexing and overlap analysis well: license data from other data owners who have collected their data in such a way. This means going well beyond leveraging commoditized third-party data, and looking at the world of second-party data. Done correctly, real audience planning starts with collecting your own data effectively and extends to leveraging similarly collected data from others—second party data that is transparent, exclusive, and unique.

Data’s New Paradigm

The Rules of Data-Driven Marketing are Changing as Data Rights Management Takes Center Stage

Unless you’ve been living off the grid, you’ve seen the promise of “data as the new oil” slowly come to fruition over the last five years. Connected devices are producing data at a Moore’s Law-like rate, and companies are building the artificial intelligence systems to mine that data into fuel that will power our ascension into a new paradigm we can’t yet understand.  Whether you are in the Stephen Hawking camp (“The development of full artificial intelligence could spell the end of the human race”) or the Larry Page camp (“artificial intelligence [is] the ultimate version of Google”), we can all agree that data is the currency in the AI future.

In our world, we are witnessing an incredible synthesis of fast-moving, data-driven advertising technology coming rapidly together with the slower (yet still data-driven) world of marketing technology. Gartner’s Marty Kihn thinks the only way these two worlds tie the knot for the long term is centered around data management platforms. I think he’s right, but I also think what we know as a DMP today will evolve quickly as the data it manages grows and its applications evolve alongside it.

I think the most immediate changes we will bear witness to in this ongoing evolution are the changes in how data, the lifeblood of modern marketing, will be piped among data owners and those who want to use it. Why? Because the way we have been doing it for the past 20 years is incredibly flawed, and second- and third-party data owners are getting the short end of the stick.

Unless you are Google, Facebook, Amazon or the United States government, you will never have enough data as a marketer. Big CPG companies have been collecting data for years (think of rewards programs and the like), but the tens or even hundreds of millions of addressable IDs they have managed to gather often pale in comparison to the billions of people who interact with their brands every day across the globe. To fill the gaps, they turn to second- and third-party sources of data for segmentation, targeting and analytics.

In the wild and wooly early days, when most digital consumers were targeted on desktop devices using cookies, this meant buying pre-packaged segments of audience data. Website owners were happy to have some JavaScript placed on their pages, and let data companies gather cookies in return for a monthly rent check, as long as it didn’t interfere with the revenue from their direct sales efforts. Part of that deal included offering their data as a mechanism of insights and discovery for marketers and agencies. Adtech companies would showcase their data in various ways, or use it as an input to lookalike modeling. In the end, the data owner would only infrequently be rewarded if the data found its way into a delivered advertising impression.

The real usage of the data was sometimes unknown. Many cookies got hijacked for use in other (even competitive) systems, and there was little transparency into what was happening with the underlying data asset. But, the checks still came every single month. The approach worked when the best data owners (quality publishers) had a thriving direct sales channel.

Fast-forward to today, and the game has changed considerably. More than half of enterprise marketers own a DMP, and even smaller mid-market advertisers are starting to license data technology. Data is being valued as a true financial asset and differentiator. On the publisher’s side, manual sales continue to plummet as programmatic evolves and header bidding supercharges the direct model with big data technology. In short, marketers need more and more quality data to feed the machines they are building to compete, and publishers are getting better and more granular control over their data.

More importantly, data owners are beginning to organize around a core principle: Any system that uses my data for insights that doesn’t result in a purchase of that data is theft.

Theft is a strong word but, if we truly value data and agree that it’s a big differentiator, it’s hard to argue with. For years, data owners have accepted a system that allowed wide access to their data for modeling and analytics in return for the occasional check. For every cookie targeted in programmatic that was activated to create revenue, a million more were churned to power analytics in another system. Put simply from the data owner’s perspective, if you are going to use my data for analytics and activation, but only pay me for activation, that’s going to be a problem.

In order to fix this, the systems of the future have to offer the ability for data owners to provision their data in more granular ways. Data owners need complete control of the following:

How is the data being used? Is it for activation, lookalike modeling, analytics in a data warehouse, user matching, cross-device purposes or another use case? Data owners need to be able to approve the exact modalities in which the data are leveraged by their partners.

What is the business model? Is this a trade deal, paid usage, fixed-price or CPM? How long is the term: a single campaign, or a year’s worth of modeling? Data owners should be able to set their own price, directly with the buyer, with full transparency into all fees associated with piping the data to a partner.

What is being shared? What attributes or traits are being shared? Is it just user IDs, or IDs loaded with valuable attributes, such as a device graph that links an individual to all the devices they use? Data owners need powerful tools that offer granular control over data at the attribute level, so they can decide how much of their data they are willing to share, and at what price.
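
The three questions above map naturally onto a provisioning policy that a data owner sets per partner. The following is only a sketch of what such a policy could look like. The class name, field names, partner name and use-case labels are all hypothetical and are not drawn from any existing product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProvisioningPolicy:
    """Hypothetical per-partner policy set by the data owner."""
    partner: str
    allowed_uses: frozenset       # how the data may be used
    shared_attributes: frozenset  # what is being shared
    business_model: str           # e.g. "CPM", "fixed-price", "trade"
    term_days: int                # how long the agreement runs


def provision(record, policy, use):
    """Release only the approved attributes, and only for an approved use."""
    if use not in policy.allowed_uses:
        raise PermissionError(f"{policy.partner} is not approved for {use!r}")
    return {k: v for k, v in record.items() if k in policy.shared_attributes}


# Illustrative policy: activation only, IDs and segment only, CPM for 30 days.
policy = ProvisioningPolicy(
    partner="example-dsp",
    allowed_uses=frozenset({"activation"}),
    shared_attributes=frozenset({"user_id", "segment"}),
    business_model="CPM",
    term_days=30,
)

record = {
    "user_id": "u1",
    "segment": "country-music-fan",
    "device_graph": ["phone-123", "tablet-456"],
}

released = provision(record, policy, "activation")
```

Under this policy the partner receives the user ID and segment but not the device graph, and a request to use the same record for lookalike modeling is refused rather than silently allowed.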

Outside of big data and blockchain conversations, the phrase “data provisioning” is rarely heard, but it’s about to be a big part of our advertising ecosystem. However, it is security concerns that have kept data sharing at scale from becoming a reality. The answer is an ecosystem that offers complete control and transparency, along with a smart layer of software-enabled governance tools that can stay ahead of nuances in law, such as the new GDPR requirements. As adtech and marketing tech continue to come together, and systems evolve in parallel with their ability to make the best use of data, the systems of the future must first ensure data security before data innovation can truly happen.

Data may be the new oil, but will it be run by adtech wildcatters, or will the rules be governed by the data owners themselves?

[This was originally published in AdExchanger on 9/26/17]

What is the future of DMPs?

In the 1989 film “Back to the Future Part II,” Marty McFly traveled to Oct. 21, 2015, a future with flying cars, auto-drying clothes, and shoes that lace automatically. Sadly, none of these things happened.

What is the future of data management platforms? This is a question I get asked a lot.

The short answer is that DMPs are now part of larger marketing stacks, and brands realize that harnessing their data is a top priority in order to deliver more efficient marketing.

This is a fast-moving trend in which companies are licensing large enterprise stacks and using systems integrators to manage all marketing—not just online advertising.

As detailed in Ad Age (Marketing clouds loom), the days of turning to an agency trade desk or demand side platform (DSP) to manage the “digital” portions of advertising are fading rapidly as marketers are intent on having technology that covers more than just advertising.

Building consumer data platforms

A few years ago, a good “stack” might have been a connected DMP, DSP and ad server. A really good stack would feature a viewability vendor and the beginnings of dynamic creative optimization (DCO). The focus then was on optimizing for the world of programmatic buying and getting the most out of digital advertising as consumers’ attention shifted online, to mobile and social, rather than television.

Fast forward a few years, and the conversations we are having with marketers are vastly different. As reported in AdExchanger, more than 40% of enterprise marketers license a DMP, and another 20% will do so within the next 12 months. DMP owners and those in the market for one are increasingly talking about more than just optimizing digital ads. They want to know how to put email marketing, customer service and commerce data inside their systems. They also want data to flow from their systems to their own data lakes.

Many are undertaking the process of building internal consumer data platforms (CDPs), which can house all of their first-party data assets—both known and pseudonymous user data.

We are moving beyond ad tech. Quickly.

Today, when those in the market are considering licensing a “DMP” they are often thinking about “data management” more broadly. Yes, they need a DMP for its identity infrastructure, ability to connect to dozens of different execution systems and its analytical capabilities. But they also need a DMP to align with the systems they use to manage their CRM data, email data, commerce systems, and marketing automation tools.

Data-driven marketing no longer lives in isolation. After I acquire a “luxury sedan intender” online, I want to retarget her—but I also want to show her a red sedan on my website, e-mail her an offer to come to the dealership, serve her an SMS message when she gets within range of the dealership to give her a test drive incentive, and capture her e-mail address when she signs up to talk to a salesperson. All of that needs to work together.

Personalization demands adtech and martech come together

We live in a world that demands Netflix and Amazon-like instant gratification at all times. To a Millennial or a member of Generation Z, it is nearly inconceivable that a brand would forget they are a loyal customer, because they have so many other brands they can switch to when they have a bad experience.

This is a world that requires adtech and martech to come together to provide personalized experiences—not simply to create more advertising lift, but as the price of admission for customer loyalty.

So, when I am asked, what is the future of DMPs, I say that the idea of licensing something called a “DMP” will not exist in a few years.

DMPs will be completely integrated into larger stacks that offer a layer of data management (for both known and unknown data) for the “right person”; an orchestration layer of connected execution systems that seek to answer the “right message, right time” quandary; and an artificial intelligence layer, which is the brains of the operation trying to figure out how to stitch billions of individual data points together in real time.

DMPs will never be the same, but only in the sense that they are so important that tomorrow’s enterprise marketing stacks cannot survive without integrating them completely, and deeply.

[This post was originally published 11 May, 2017 by Chris O’Hara in Econsultancy blog]

Deepening The Data Lake: How Second-Party Data Increases AI For Enterprises

“A data lake is a storage repository that holds a vast amount of raw data in its native format until it is needed,” according to Amazon Web Services. “While a hierarchical data warehouse stores data in files or folders, a data lake uses a flat architecture to store data.”

A few years ago, data lakes were thought to be limited to Hadoop applications (object storage), but the term is now more broadly applied to an environment in which an enterprise can store both structured and unstructured data and have it organized for fast query processing. In the ad tech and mar tech world, this is almost universally about first-party data. For example, a big airline might want to store transactional data from ecommerce alongside beacon pings to understand how often online ticket buyers in its loyalty program use a certain airport lounge.

However, as we discussed earlier this year, there are many marketers with surprisingly sparse data, like the food marketer who does not get many website visitors or authenticated customers downloading coupons. Today, those marketers face a situation where they want to use data science to do user scoring and modeling but, because they only have enough of their own data to fill a shallow lake, they have trouble justifying the costs of scaling the approach in a way that moves the sales needle.

Figure 1: Marketers with sparse data often do not have enough raw data to create measurable outcomes in audience targeting through modeling. Source: Chris O’Hara.

In the example above, we can think of the marketer’s first-party data – media exposure data, email marketing data, website analytics data, etc. – being the water that fills a data lake. That data is pumped into a data management platform (pictured here as a hydroelectric dam), pumped like electricity through ad tech pipes (demand-side platforms, supply-side platforms and ad servers) and finally delivered to places where it is activated (in the town, where people live).

As becomes apparent, this infrastructure can exist with even a tiny bit of water but, at the end of the cycle, not enough electricity will be generated to create decent outcomes and sustain a data-driven approach to marketing. This is a long way of saying that the data itself, both in quality and quantity, is needed in ever-larger amounts to create the potential for better targeting and analytics.

Most marketers today – even those with lots of data – find themselves overly reliant on third-party data to fill in these gaps. However, even if they have the rights to model it in their own environment, there are loads of restrictions on using it for targeting. It is also highly commoditized and can be of questionable provenance. (Is my Ferrari-browsing son really an “auto intender”?) While third-party data can be highly valuable, it would be akin to adding sediment to a data lake, creating murky visibility when trying to peer into the bottom for deep insights.

So, how can marketers fill data lakes with large amounts of high-quality data that can be used for modeling? I am starting to see the emergence of peer-to-peer data-sharing agreements that help marketers fill their lakes, deepen their ability to leverage data science and add layers of artificial intelligence through machine learning to their stacks.

Figure 2: Second-party data is simply someone else’s first-party data. When relevant data is added to a data lake, the result is a more robust environment for deeper data-led insights for both targeting and analytics. Source: Chris O’Hara.

In the above example (Figure 2), second-party data deepens the marketer’s data lake, powering the DMP with more rich data that can be used for modeling, activation and analytics. Imagine a huge beer company that was launching a country music promotion for its flagship brand. As a CPG company with relatively sparse amounts of first-party data, the traditional approach would be to seek out music fans of a certain location and demographic through third-party sources and apply those third-party segments to a programmatic campaign.

But what if the beer manufacturer teamed up with a big online ticket seller and arranged a data subscription for “all viewers or buyers of a Garth Brooks ticket in the last 180 days”?

Those are exactly the people I would want to target, and they are unavailable anywhere in the third-party data ecosystem.

The data is also of extremely high provenance, and I would also be able to use that data in my own environment, where I could model it against my first-party data, such as site visitors or mobile IDs I gathered when I sponsored free Wi-Fi at the last Country Music Awards. The ability to gather and license those specific data sets and use them for modeling in a data lake is going to create massive outcomes in my addressable campaigns and give me an edge I cannot get using traditional ad network approaches with third-party segments.

Moreover, the flexibility around data capture enables marketers to use highly disparate data sets, combine and normalize them with metadata – and not have to worry about mapping them to a predefined schema. The associative work happens after the query takes place. That means I don’t need a predefined schema in place for that data to become valuable – a way of saying that the inherent observational bias in traditional approaches (“country music fans love mainstream beer, so I’d better capture that”) never hinders the ability to activate against unforeseen insights.

Large, sophisticated marketers and publishers are just starting to get their lakes built and begin gathering the data assets to deepen them, so we will likely see a great many examples of this approach over the coming months.

It’s a great time to be a data-driven marketer.

(Interview) On Beacons and DMPs

How Beacons Might Alter The Data Balance Between Manufacturers And Retailers

As Salesforce integrates DMP Krux, Chris O’Hara considers how proximity-based personalization will complement access to first-party data. For one thing, imagine how coffeemakers could form the basis of the greatest OOH ad network.