Big Data (for Marketing) is Real!

We’ve been hearing about big data driving marketing for a long time, and to be honest, most of it is purely aspirational.

Using third-party data to target an ad in real time does deploy some back-end big-data architecture for sure. But the real promise of data-driven marketing has always been that computers, which can crunch more data than people and do it in real time, could find the golden needle of insight in the proverbial haystack of information.

This long-heralded capability is finally moving beyond the early adopters and starting to “cross the chasm” into early majority use among major global marketers and publishers. 

Leveraging Machine Learning For Segmentation 

Now that huge global marketers are embracing data management technology, they are finally able to start activating their carefully built offline audience personas in today’s multichannel world.

Big marketers were always good at segmentation. All kinds of consumer-facing companies already segment their customers along behavioral and psychographic dimensions. Big Beer Company knows how different a loyal, light-beer-drinking “fun lover” is from a trendsetting “craft lover” who likes new music and tries new foods frequently. The difference is that now they can find those people online, across all of their devices.

The magic of data management, however, is not just onboarding offline identities to the addressable media space. Think about how those segments were created. Basically, an army of consultants and marketers took loads of panel-based market data and gut instincts and divided their audience into a few dozen broad segments.

There’s nothing wrong with that. Marketers were working with the most, and best, data available. Those concepts around segmentation were taken to market, where loads of media dollars were applied to find those audiences. Performance data was collected and segments refined over time, based on the results.

In the linear world, those segments are applied to demographics, where loose approximations are made based on television and radio audiences. It’s crude, but the awesome reach power of broadcast media and friendly CPMs somewhat obviate the need for precision.

In digital, those segments find closer approximation with third-party data, similar to Nielsen Prizm segments and the like. These approximations are sharper, but in the online world, precision means more data expense and less reach, so the habit has been to translate offline segments into broader demographic buckets, such as “men who like sports.”

What if, instead of guessing which online attributes approximated the ideal audience and creating segments from a little bit of data and a lot of gut instinct, marketers could look at all of the data at once to see what the important attributes were?

No human being can take the entirety of a website’s audience, which probably shares more than 100,000 granular data attributes, and decide what really matters. Does gender matter for the “Mom site”? Obviously. Having kids? Certainly. Those attributes are evident, and they’re probably shared widely across a great portion of the audience of Popular Mom Site.

But what really defines the special “momness” of the site that only an algorithm can see? Maybe there are key clusters of attributes among the most loyal readers that are the things really driving the engagement. Until you deploy a machine to analyze the entirety of the data and find out which specific attributes cluster together, you really can’t claim a full understanding of your audience.

It’s all about correlations. Of course, it’s pretty easy to find a correlation between only two distinct attributes, such as age and income. But think about having to do a multivariable correlation on hundreds of different attributes. Humans can’t do it. It takes a machine-learning algorithm to parse the data and find the unique clusters that form among a huge audience.

Welcome to machine-discovered segmentation.

Machines can quickly look across the entirety of a specific audience and figure out how many people share the same attributes. Any time folks cluster together around more than five or six specific data attributes, you arguably have struck gold.

Say I’m a carmaker that learned that some of my sedan buyers were men who love NASCAR. But I also discovered that those NASCAR dads loved fitness and gaming, and I found a cluster of single guys who just graduated college and work in finance. Now, instead of guessing who is buying my car, I can let an algorithm create segments from the top 20 clusters, and I can start finding people predisposed to buy right away.
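To make the carmaker example concrete, here is a minimal sketch of the idea in Python. It is not a production machine-learning algorithm: it simply counts, by brute force, how many users share each combination of five attributes. The function name, the attribute labels and the sample users are all hypothetical, and a real audience with 100,000 attributes would need a proper frequent-itemset or clustering method rather than exhaustive enumeration.

```python
from collections import Counter
from itertools import combinations


def shared_attribute_clusters(users, size=5, min_users=2):
    """Count how many users share each combination of `size` attributes.

    `users` maps a user ID to a set of attribute labels (hypothetical data).
    Returns only the combinations shared by at least `min_users` users.
    """
    counts = Counter()
    for attrs in users.values():
        # Sort first so the same combination always produces the same key.
        for combo in combinations(sorted(attrs), size):
            counts[combo] += 1
    return {combo: n for combo, n in counts.items() if n >= min_users}


# Hypothetical sedan buyers, loosely following the example in the text.
users = {
    "u1": {"male", "nascar", "fitness", "gaming", "sedan_buyer", "has_kids"},
    "u2": {"male", "nascar", "fitness", "gaming", "sedan_buyer"},
    "u3": {"male", "finance", "recent_grad", "single", "sedan_buyer"},
    "u4": {"male", "finance", "recent_grad", "single", "sedan_buyer", "gaming"},
    "u5": {"female", "has_kids", "cooking"},
}

clusters = shared_attribute_clusters(users, size=5, min_users=2)
for combo, n in sorted(clusters.items(), key=lambda kv: -kv[1]):
    print(n, combo)
```

With this toy data, two five-attribute clusters surface: the NASCAR, fitness and gaming group, and the single recent graduates in finance. Each cluster could then become a candidate segment.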

This trend is just starting to happen in both publishing and marketing, and it has been made possible by the wider adoption of real big-data technologies, such as Hadoop, MapReduce and Spark.

This also opens up a larger conversation about data. If I can look at all of my data for segmentation, is there really anything off the table?

Using New Kinds Of Data To Drive Addressable Marketing 

That’s an interesting question. Take the company that’s manufacturing coffee machines for home use. Its loyal customer base buys a machine every five years or so and brews many pods every day.

The problem is that the manufacturer has no clue what the consumer is doing with the machine unless that machine is data-enabled. If a small chip enabled it to connect to the Internet and share data about what was brewed and when, the manufacturer would know everything their customers do with the machine.

Would it be helpful to know that a customer drank Folgers in the morning, Starbucks in the afternoon and Twinings Tea at night? I might want to send the family that brews 200 pods of coffee every month a brand-new machine after a few years for free and offer the lighter-category customers a discount on a new machine.

Moreover, now I can tell Folgers exactly who is brewing their coffee, who drinks how much and how often. I’m no longer blind to customers who buy pods at the supermarket – I actually have hugely valuable insights to share with manufacturers whose products create an ecosystem around my company. That’s possible with real big-data technology that collects and stores highly granular device data.
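As a rough sketch of what that device data could support, the Python below rolls up one month of brew events by household and by brand. The event format, the function name and the offer labels are assumptions made for illustration. Only the 200-pods-per-month figure comes from the example above.

```python
from collections import defaultdict

# Threshold taken from the example in the text; everything else is hypothetical.
HEAVY_USER_PODS_PER_MONTH = 200


def summarize_month(brew_events):
    """Aggregate (household, brand, pods) events for a single month.

    Returns a tuple of:
      - offers: household -> "free_machine" or "discount"
      - pods_by_brand: brand -> total pods brewed
    """
    pods_by_household = defaultdict(int)
    pods_by_brand = defaultdict(int)
    for household, brand, pods in brew_events:
        pods_by_household[household] += pods
        pods_by_brand[brand] += pods

    offers = {
        household: (
            "free_machine" if total >= HEAVY_USER_PODS_PER_MONTH else "discount"
        )
        for household, total in pods_by_household.items()
    }
    return offers, dict(pods_by_brand)


# Hypothetical events reported by connected machines.
events = [
    ("smith", "Folgers", 120),
    ("smith", "Starbucks", 90),
    ("jones", "Twinings", 30),
    ("jones", "Folgers", 25),
]

offers, pods_by_brand = summarize_month(events)
print(offers)
print(pods_by_brand)
```

The per-brand totals are the kind of insight the manufacturer could share with a partner such as Folgers, while the per-household totals drive the upgrade offers.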

Marketers are embracing big-data technology, both for segmentation and to go beyond the cookie by using real-world data from the Internet of Things to build audiences.

It’s creating somewhat of a “cluster” for companies that are stuck in 2015.

Rise of the Machines

Where do People Fit into a World that Promises Endless Media Automation?

Ever since man tied a rope to an ox, there has been a relentless drive to automate work processes. Like primitive farming, digital media buying is a thankless, low-value task where results (and profits) do not often match the effort involved. Many companies are seeking, with various degrees of success, to alleviate many of the process-heavy, detail-oriented tasks involved in finding, placing, serving, optimizing, tracking, and (most importantly) billing digital media campaigns.

Let’s take the bleeding-edge world of real-time audience buying. Trading desk managers often work in multiple environments, on multiple screens. On a typical day, a trading desk manager may be logging into his AppNexus account, bidding on AdBrite for inventory, bidding for BlueKai stamps in that UI, looking for segmentation data in AdAdvisor, buying guaranteed audience on Legolas, trafficking ads in Atlas, and probably looking at some deep analytics data as well. If he is smart, he is probably managing that through a master platform, where he can look at performance of guaranteed display and even other media types. How efficient does that sound?

To me, it sounds like six logins too many. Putting aside the obvious fact that an abundance of technology doesn’t lead to efficiency (how’s “multitasking” working out for your 12-year-old, by the way?), I wonder whether we aren’t asking too much of digital as a whole. How many ads have you clicked on lately? If the answer is zero, then you are in a large club. Broken down to its most basic level, we are working in a business that believes a 0.1% “success” rate is reason to celebrate. But “the click is a dead metric,” some say. Really? Isn’t the whole point of a banner ad to drive someone to your website? When did that change?

All of this is simply to illustrate the larger point that the display advertising industry, for all of its supposed efficiencies, is really still in its very nascent stages. Navigating the commoditized world of banner advertising is still very much a human task, and the many machines we have created to wrestle the immense Internet into delivering an advertiser the perfect user are still primitive. For a short while longer, digital media is still the game of the agency media buyer…but not for long.

Let’s look at the areas in which smart media people add value to digital campaigns: site discovery, pricing, analytics and optimization, and billing.

Site Discovery

In the past, half the battle was knowing where to go. Which travel sites sold the most airline tickets? Which sites indexed most highly against men of a certain age, looking for their next automobile? What publisher did you call to get to IT professionals who made purchasing decisions on corporate laptops? Agencies had (and still have) plenty of institutional knowledge to help their clients partner with the right media to reach audiences efficiently and—even with the abundance of measurement tools out there—a lot of human guidance was needed. Now, given the ability to purchase that audience exactly using widely available data segments, the trick is simply knowing where to log in. I just found the latter IT professional segment in Bizo in less than 2 minutes. So the question becomes: how are you leveraging data and placement to achieve the desired result, and how efficiently are you doing it?


Pricing

It used to be that the big agencies could gain a huge pricing advantage through buying media in bulk. Holding company shops leveraged their power and muscled down publisher rate cards by (sometimes) 80% or more with promised volume commitments, leaving smaller media agencies behind. Then, a funny thing happened: ad exchanges. All of a sudden, nearly all of the inventory in the world was available, and ready to be had in a second-price auction environment. Now, any Tom, Dick, and Harry with a network relationship could access relatively high-quality impressions at prices that were guaranteed never to be too high (in a second-price auction, the highest bidder wins but pays the second-highest bid, meaning runaway “ceiling” bids are collapsed). Whoops. With their pricing advantage eliminated, large agencies did the next best thing: eliminated the middleman by building their own exchanges, which we have been calling “DSPs.” So, you don’t need human intervention to ensure pricing advantages.
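The second-price mechanic is simple enough to sketch in a few lines of Python. This is an illustration under simplifying assumptions, not how any particular exchange works: the function name and bidder labels are hypothetical, and real exchanges typically layer on price floors, bid increments and tie-breaking rules that are ignored here.

```python
def run_second_price_auction(bids):
    """Pick the winner of a sealed-bid second-price auction.

    `bids` maps a bidder name to its bid (for example, a CPM in dollars).
    The highest bidder wins but pays the second-highest bid.
    """
    if len(bids) < 2:
        raise ValueError("a second-price auction needs at least two bids")
    # Rank bidders from highest bid to lowest.
    ranked = sorted(bids.items(), key=lambda item: item[1], reverse=True)
    winner = ranked[0][0]
    clearing_price = ranked[1][1]
    return winner, clearing_price


# A runaway "ceiling" bid of $12.00 still clears at the runner-up's $3.50.
winner, price = run_second_price_auction(
    {"agency_a": 12.00, "agency_b": 3.50, "agency_c": 2.75}
)
print(winner, price)  # agency_a 3.5
```

The winner's own bid only determines whether it wins, not what it pays, which is why an aggressive ceiling bid does not automatically translate into an inflated price.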

Analytics and Optimization

What about figuring out what all the data means? After all, spreadsheets don’t optimize media campaigns. Don’t you need really smart, analytical media people to draw down click- and view-based data, sift through conversion metrics, and build attribution models? Maybe not. Not only are incredible algorithms taking that data and using machine learning to automatically optimize against clicks or conversions, but programmatic buying is slowly coming to all digital media as well. In the future, smart technology will enable planners to create dynamic media mixes that span guaranteed and real-time, and apply pricing across multiple methodologies (CPM, CPC, CPA). Much of that work is being done manually right now, but not for long.
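One piece of that manual work is putting CPM, CPC and CPA buys on a common footing. A standard way to do it is effective CPM (eCPM): total cost divided by impressions, times 1,000. The Python sketch below applies that formula. The function name, the line items and their delivery numbers are made up for illustration, with the CPC example using the 0.1% click rate mentioned earlier.

```python
def effective_cpm(model, rate, impressions, clicks=0, conversions=0):
    """Convert a CPM, CPC or CPA buy into cost per thousand impressions.

    `rate` is the price per thousand impressions (CPM), per click (CPC)
    or per conversion (CPA), depending on `model`.
    """
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    if model == "CPM":
        cost = rate * impressions / 1000
    elif model == "CPC":
        cost = rate * clicks
    elif model == "CPA":
        cost = rate * conversions
    else:
        raise ValueError(f"unknown pricing model: {model}")
    return cost * 1000 / impressions


# Hypothetical line items, each delivering 100,000 impressions.
line_items = [
    {"name": "guaranteed", "model": "CPM", "rate": 2.00,
     "impressions": 100_000},
    {"name": "exchange", "model": "CPC", "rate": 0.50,
     "impressions": 100_000, "clicks": 100},  # 0.1% click rate
    {"name": "affiliate", "model": "CPA", "rate": 20.00,
     "impressions": 100_000, "conversions": 2},
]

for item in line_items:
    ecpm = effective_cpm(
        item["model"], item["rate"], item["impressions"],
        clicks=item.get("clicks", 0),
        conversions=item.get("conversions", 0),
    )
    print(f'{item["name"]}: eCPM ${ecpm:.2f}')
```

Once every buy is expressed as eCPM, a platform can compare guaranteed and real-time inventory directly and shift budget toward whichever delivers at the lowest effective cost.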


Billing

Sadly, much of the digital media business comes down to billing at the end of the day. Media companies struggle tremendously with reconciling numbers across multiple systems, and agency ad servers don’t seem to speak the same language as publisher ones. The bulk of a media company’s time seems to be spent just trying to get paid, and an incredible amount of good salary gets burnt in the details of reconciliation and reporting. This is slowly changing, as the advent of good API development is starting to make the machines talk to each other more clearly. The platforms that can “plug in” ad serving and data APIs most easily have a lot to gain, and the industry as a whole will benefit from interoperability.

So, are people doomed in digital media? Not at all. There are going to be a lot fewer digital media buyers and planners needed, but what agencies are really going to need are smart media people. Right now, you need 4 people to manage 10 machines. In the near future, you will need 1 smart person to manage 1 platform, and the other three people can focus on something else. Maybe like talking to their clients.

[This article originally appeared in ClickZ on 4/14/11]