Why Most of Today’s DMPs are Lousy

How Granular Data Collection and a Robust Second-Party Data Strategy Change the Game

The world’s largest marketers and media companies have strongly embraced data management technology to provide personalization for customers who demand Amazon-like experiences. As a single, smart hub for all of their owned data (CRM, email, etc.)—and acquired data, such as third-party demographic data—DMPs go a long way toward building a sustainable, modern marketing strategy that accounts for massively fragmented digital audiences.

The good news is that most enterprises have taken a technological leap of faith and embraced a data strategy to help them navigate our digital future. The bad news is that the systems they are using today are deeply flawed and do not produce optimal audience segmentation.

A Little DMP History

Ten years ago, a great thing called the data management platform (DMP) started to power big publishers. These companies wanted to shift power away from ad networks (upon whom the publishers relied to monetize their sites) and give publishers the power to create relevant audiences directly for advertisers. By simply placing a bit of JavaScript in the header of their websites, DMPs could build audience segments using web cookies, turning the average $2 CPM news reader into a $15 CPM “auto-intender.” By understanding what people read, and the content of those pages, DMPs could sort people into large audience “segments” and make them available for targeting. Now, instead of handing over 50% of their revenue to ad networks, publishers could pay a monthly licensing fee to a DMP and retain the lion’s share of their digital advertising dollars by creating their own segmented audiences to sell directly to advertisers.

Marketers were slower to embrace DMP technology, but once they did, they quickly grasped the opportunity. Now, instead of depending on ad networks to aggregate reach for them, they started to assemble their own first-party data asset—overlapping their known users with publishers’ segments, and buying access to those more relevant audiences. The more cookies, mobile IDs, and other addressable keys they could collect, the bigger their potential reach. Since most marketers had relatively small amounts of their own data, they supplemented with third-party data—segments of “intenders” from providers like Datalogix, Nielsen, and Acxiom.

The two primary use cases for DMPs have not changed all that much over the years: both sides want to leverage technology to understand their users (analytics) and grow their base of addressable IDs (reach). Put simply, “who are these people interacting with my brand, and how can I find more of them?” DMPs seem really efficient at tackling those basic use cases, until you find out that they have been doing it the wrong way the whole time.

What’s the Problem?

To dig a bit deeper, the way first-generation DMPs go about analyzing and expanding audiences is by mapping cookies to a predetermined taxonomy, based on user behavior and context. For example, if my 17-year-old son is browsing an article on the cool new Ferrari online, he would be identified as an “auto-intender” and placed in a bucket of other auto-intenders. The system would not store any of the data associated with that browsing session, or any additional context. It is enough that the online behavior met a predetermined set of rules for “auto-intender” to place that cookie among several hundred thousand other auto-intenders.

The problem with a fixed, taxonomy-based collection methodology is just that—it is fixed, and based on a rigid set of rules for data collection. Taxonomy results are stored (“cookie 123 equals auto-intender”)—not the underlying data itself. That is called “schema-on-write,” an approach that writes taxonomy results to an existing table when the data is collected. That was fine for the days when data collection was desktop-based and the costs of data storage were sky-high, but it fails in a mobile world where artificial intelligence systems crave truly granular, attribute-level data collected from all consumer interactions to power machine learning.
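
To make the distinction concrete, here is a minimal Python sketch of schema-on-write collection. All of the names (SEGMENT_RULES, assign_segments) and event fields are hypothetical; the point is only that the rule fires at collection time and nothing but the label survives.

```python
# Hypothetical sketch of schema-on-write: rules are evaluated when the
# data arrives, and only the resulting segment label is stored.

SEGMENT_RULES = {
    "auto-intender": lambda event: event.get("page_category") == "autos",
    "travel-intender": lambda event: event.get("page_category") == "travel",
}

segment_store = {}  # cookie_id -> set of segment labels (all we ever keep)

def assign_segments(cookie_id, event):
    for segment, rule in SEGMENT_RULES.items():
        if rule(event):
            segment_store.setdefault(cookie_id, set()).add(segment)
    # The raw event is discarded here: dwell time, URL, and every other
    # attribute of the session are gone once the taxonomy result is written.

assign_segments("cookie123", {"page_category": "autos", "dwell_seconds": 4})
print(segment_store)  # {'cookie123': {'auto-intender'}}
```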

There is another way to do this. It’s called “schema-on-read,” which is the opposite of schema-on-write. In these types of systems, all of the underlying data is collected, and the taxonomy result is created upon reading the raw data. In this instance, say I collected everything that happened during a session on a popular auto site like Cars.com. I would collect how many pages were viewed, dwell times on ads, all of the clickstream collected in the “build your own” car module, and the data from event pixels that recorded how many pictures a user viewed of a particular car model. I would store all of this data so I could look it up later.
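
The collection side of schema-on-read looks quite different. A hedged sketch, again with hypothetical names and fields: everything is appended to a raw event log, and no taxonomy is applied at write time.

```python
import time

event_log = []  # append-only store of raw, attribute-level events

def log_event(cookie_id, event_type, **attributes):
    """Store everything now; decide what it means later (schema-on-read)."""
    event_log.append({
        "cookie_id": cookie_id,
        "event_type": event_type,
        "timestamp": time.time(),
        **attributes,
    })

# Raw signals from a hypothetical Cars.com-style session
log_event("cookie123", "page_view", url="/new-models", dwell_seconds=45)
log_event("cookie123", "carousel_photo_view", model="roadster", photo_index=14)
log_event("cookie123", "build_your_own_step", model="roadster", step="wheels")
```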

Then, if my really smart data science team told me that users who viewed 15 of the 20 car pictures in the photo carousel in one viewing session were 50% more likely to buy a car in the next 30 days than the average user, I would build a segment of such users by “reading” the attribute data I had stored. This notion—total data storage at the attribute (or “trait”) level, independent of a fixed taxonomy—is called completeness of data. Most DMPs don’t have it.
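
Continuing that sketch, the data science team’s new rule can be applied retroactively by reading the stored attributes. The function name is invented; the 15-of-20 threshold comes straight from the example above.

```python
from collections import defaultdict

def build_high_intent_segment(event_log, min_photos=15):
    """Apply a rule invented after collection to the raw event log."""
    photos_seen = defaultdict(set)
    for event in event_log:
        if event["event_type"] == "carousel_photo_view":
            photos_seen[event["cookie_id"]].add(event["photo_index"])
    return {cid for cid, photos in photos_seen.items()
            if len(photos) >= min_photos}

# Reuses the event_log populated in the previous sketch
high_intent_ids = build_high_intent_segment(event_log)
```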

Why Completeness Matters

Isn’t one auto-intender as good as another, regardless of how the data was collected? No. Think about the other main uses of DMPs: overlap reporting and indexing. Overlap reporting seeks to overlay one enterprise’s first-party data asset with another’s. This is like taking all the visitors to Ford’s website and comparing that audience to every user on a non-endemic site, like the Wall Street Journal. Every auto marketer would love to understand which high-income WSJ readers were interested in their latest model. But how can they understand the real intent of users who are just tagged as “auto intenders”? How did the publisher come to that conclusion? What signals qualified those users as “intenders” in the first place? How long ago did they engage with an auto article? Was it a story about a horrific traffic crash, or an article on the hottest new model? Without completeness, these “auto intenders” become very vague. Without all of the attributes stored, Ford cannot put its data science team to work to better understand those users’ true intent.
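
At its core, overlap reporting is a set intersection over shared IDs. A sketch with made-up IDs and profile fields shows how completeness lets you qualify the overlap instead of just counting it:

```python
ford_site_visitors = {"id1", "id2", "id3", "id7"}
wsj_auto_intenders = {"id2", "id3", "id9"}

# Without completeness, this raw count is all you get.
overlap = ford_site_visitors & wsj_auto_intenders

# With attribute-level storage, the overlap can be qualified: how recent
# was the engagement, and what kind of article drove the label?
wsj_profiles = {
    "id2": {"days_since_auto_article": 3, "article_type": "new_model_review"},
    "id3": {"days_since_auto_article": 90, "article_type": "traffic_accident"},
}
qualified = {
    cid for cid in overlap
    if wsj_profiles.get(cid, {}).get("days_since_auto_article", 999) <= 30
    and wsj_profiles.get(cid, {}).get("article_type") == "new_model_review"
}
print(len(overlap), len(qualified))  # 2 1
```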

Indexing, the other prominent use case, scores user IDs based on their similarity to a baseline population. For example, a popular women’s publisher like Meredith might have an index score of 150 against a segment of “active moms.” Another way of saying this is that indexing helps you understand the “momness” of those women, based on similarity to the overall population. Index scoring is the way marketers have been buying audience data for the last 20 years. If I can get good reach with an index score above 100 at a good price, then I’m buying those segments all day long. Most of this index-based buying happens with third-party data providers who have been collecting the data in the same flawed way for years. What’s the ultimate source of truth for such indexing? What data underlies the scoring in the first place? The fact is, it is impossible to validate these relevancy scores without the granular, attribute-level data being available to analyze.
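
For reference, an index score is conventionally the segment’s penetration in the audience divided by its penetration in the baseline population, times 100. The penetration numbers below are invented to produce the 150 from the Meredith example:

```python
def index_score(audience_penetration, baseline_penetration):
    """Index = (share of audience in segment / share of baseline) * 100."""
    return 100 * audience_penetration / baseline_penetration

# If 30% of a publisher's users are "active moms" versus 20% of the
# overall population, the publisher indexes at 150 for that segment.
print(index_score(0.30, 0.20))  # 150.0
```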

Therefore, it is entirely fair to say that most DMPs have excellent intentions, but lack the infrastructure to fully perform the most important things DMPs are meant to do: understand IDs, and grow them through overlap analysis and indexing. If the underlying data has been improperly collected (or is not there at all), then any type of audience profiling, by any means, is fundamentally flawed.

Whoops.

What to do?

To be fair, most DMPs were architected at a time when collecting data through a schema-on-read methodology was both unnecessary and extremely costly. Today’s unrelenting shift to AI-driven marketing necessitates this approach to data collection and storage, and older systems are tooling up to compete. If you want to create a customer data platform (“CDP”), the hottest new buzzword in marketing, you need to collect data in this way. So, the industry is moving there quickly. That said, many marketers are still stuck in the 1990s. Older DMPs are somewhat like the technology mullet of marketing—businesslike in the front, with something awkward and hideous hidden behind.

Beyond licensing a modern, schema-on-read system for data management so marketers can collect their own data in a granular way, there is another way to do things like indexing and overlap analysis well: license data from other data owners who have collected their data in the same granular way. This means going well beyond leveraging commoditized third-party data, and looking to the world of second-party data. Done correctly, real audience planning starts with collecting your own data effectively and extends to leveraging similarly collected data from others—second-party data that is transparent, exclusive, and unique.


Signal to Noise

What Data Should Inform Media Investment Decisions?

The other day, I was updating my Spotify app on my Android device. When it finally loaded, I was asked to log in again. I immediately loaded up a new playlist that I had been building—a real deep dive into the 1980s hardcore music I loved back in my early youth. I’m not sure if you are familiar with the type of music that was happening on New York City’s Lower East Side between 1977 and 1986, but it was some pretty raw stuff…bands like the Beastie Boys (before they went rap), False Prophets, the Dead Boys, Minor Threat, the Bad Brains, etc. They had some very aggressive songs, with the lyrics and titles to match.

Well, I put my headphones in, and started walking from my office on 6th Avenue and 36th Street across to Penn Station to catch the 6:30 train home to Long Island…all the while broadcasting every single song I was listening to on Facebook. Among the least offensive tunes that showed up in my Facebook stream was a Dead Kennedys song with the F-word featured prominently in the title. A classic, to be sure, but probably not something all of my wife’s friends wanted to know about.

As you can imagine, my wife (online at the time) was frantically e-mailing me, trying to get me to stop the offensive social media madness that was seemingly giving the lie to my carefully cultivated, clean, preppy, suburban image.

So why, as a digital marketer, would you care about my Spotify Facebook horror story?

Because my listening habits (and everything else you and I do online, for that matter) are considered invaluable social data “signals” that you are mining to discover my demographic profile, buying habits, shoe size, and (ultimately) what banner ad to serve me in real time. The only problem is that, although I love hardcore music, it doesn’t really define who I am, what I buy, or anything else about me. It is just a sliver of time, captured digitally, sitting alongside billions of pieces of atomic-level data stored somewhere in a massive columnar database.

Here are some other examples of data that are commonly available to marketers, and why they may not offer the insights we think they might:

— Zip Code: Generally, zip codes are considered a decent proxy for income, especially in areas like Alpine, New Jersey, which is small and exclusive. But how about Huntington, Long Island, with an average home value of $516,000? That zip code contains the village of Lloyd Harbor (average home value of $1,300,000) and waterside areas in Huntington Bay, like Wincoma, where people with lots of disposable income live.

— Income: In the same vein, income is certainly important, and there are a variety of reliable sources that can get close to a consumer’s income profile, but isn’t disposable income a better metric? If you earn $250,000 per year and your expenses are $260,000, then you are not exactly Nordstrom’s choicest customer. In fact, you are what we call “broke.” Maybe that was okay back in the good old days of government-style deficit spending, but these days, luxury marketers need a sharper scalpel to separate the truly wealthy from the paper tigers.

— Self-Declared Data: We all like to put a lot of emphasis on the answers real consumers give us on surveys, but who hasn’t told a little fib from time to time? If I am “considering a new car,” is my price range “$19,000 – $25,500” or “$35,000 – $50,000”? This type of social desirability bias is so common that researchers have sought other ways of inferring income and purchase behavior. When people lie about themselves, to themselves (in private, no less), you must take a good deal of self-declared data with a hearty grain of salt.

— Automobile Ownership: Want to know how much dough a person has? Don’t bother looking at his home or zip code. Look at his car. A person who has $1,800 a month to burn on a Land Rover is probably the same person liable to blow $120 on mail order steaks, or book that Easter condo at Steamboat. Auto ownership, among other things, is a great proxy for disposable income.

It would be overly didactic to rehearse all of the possible iterations of false data signals that are being used by marketers right now to make real-time bidding decisions in digital media. There are literally thousands—and social “listening” is starting to make traditional segmentation errors look tame. Take a recent Wall Street Journal article that reported that the three most widely socially-touted television shows fared worse than shows that received far less social media attention.

Sorry, but maybe that hot social “meme” you are trying to connect with just isn’t that valuable as a “signal.” We all hear the fire truck going by on 7th Avenue. The problem is that the only people who turn to look at it are the tourists. So what is the savvy marketer to do?

Remember that all data signals are just that: signals. Small pieces of a very complicated data puzzle that you must weave together to create a profile. Unless you are leveraging reliable first-party, second-party, and third-party data, and stitching that data together, you cannot get a true view of the consumer.
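
As one hedged sketch of what that stitching could look like: merge signals keyed by a shared ID, letting more reliable sources override weaker ones. The source names and fields are illustrative only.

```python
def stitch_profile(user_id, first_party, second_party, third_party):
    """Merge per-source signals into one profile for a shared ID."""
    profile = {}
    # Apply the weakest source first so first-party data wins on conflicts.
    for source in (third_party, second_party, first_party):
        profile.update(source.get(user_id, {}))
    return profile

third = {"id42": {"income_band": "100-150k", "auto_intender": True}}
second = {"id42": {"car_model_viewed": "roadster", "photos_viewed": 17}}
first = {"id42": {"crm_status": "lapsed_customer", "income_band": "150-200k"}}

print(stitch_profile("id42", first, second, third))
```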

In my next column, we’ll look at how stitching together disparate data sources can reveal new, more reliable, “signals” of consumer interest and intent.

[This article was originally published in ClickZ on 12/2/2011]