Is your data yours?

Personal responsibility extends to data governance.

Jul 23, 2019

Hi, Can here. Today, we talk data. Nothing personal.

Data as a Liability, Again

We talk a lot about data governance here. The main argument is that data comes not just as an asset, but also as a liability. The asset part is obvious. If you are in the ads business, data about your customers allows you to target them better, and charge more for your ads.

The liability part is less obvious. For years, the thinking was that data is an unadulterated good. Yet, the tide is slowly turning. For example, as GDPR came into force and its enforcement picked up, even companies that you’d not immediately associate with data collection end up having to pony up millions in fines. It’s not like airlines are particularly profitable to begin with.

Again, my argument has never been that data is without value. Rather, we simply haven’t been able to account for its liabilities until now. Storage is practically free nowadays but governance is not. How do you make sure the data is accessed only by those who need it? Or more subtly, how do you make sure it is only used in the way you said you would use it? We are still figuring out these answers.

For example, if you are asking people to enter (and verify!) their phone numbers to improve your security, you end up with a bunch of phone numbers in your database. Now, can you use those phone numbers to target people as an advertiser? You probably shouldn’t. It’s tempting, however, to think that you have all this data in your database, so why not use it? It’s so much work to build all those controls and flags and tags to make sure data is not just stored, but also used properly. Engineers aren’t cheap, after all.
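To make the "flags and tags" idea a bit more concrete, here is a minimal sketch in Python of what purpose tagging could look like. Everything in it is hypothetical and made up for illustration (the `Purpose` enum, `TaggedField`, `read_field`, and the sample phone number); it is not how any particular company does it, just one way to attach the reason you collected a piece of data to the data itself.

```python
from dataclasses import dataclass
from enum import Enum


class Purpose(Enum):
    """Hypothetical reasons a piece of data may be used for."""
    SECURITY = "security"
    ADVERTISING = "advertising"


class PurposeError(Exception):
    """Raised when data is requested for a purpose it was not collected for."""


@dataclass(frozen=True)
class TaggedField:
    """A stored value plus the purposes the user was told about."""
    value: str
    allowed_purposes: frozenset


def read_field(field: TaggedField, purpose: Purpose) -> str:
    """Return the value only if the caller's purpose was part of the deal."""
    if purpose not in field.allowed_purposes:
        raise PurposeError(f"this data was not collected for {purpose.value}")
    return field.value


# A phone number collected for two-factor login, and nothing else.
phone = TaggedField("+1-555-0100", frozenset({Purpose.SECURITY}))

# The security team can read it.
print(read_field(phone, Purpose.SECURITY))

# The ads team gets an error instead of a phone number.
try:
    read_field(phone, Purpose.ADVERTISING)
except PurposeError as err:
    print(f"blocked: {err}")
```

The hard part, of course, is not these twenty-odd lines. It is making sure every system that touches the database actually goes through a check like this, which is exactly the engineering cost in question.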

Of course, that data has to come from somewhere. And it’s not that users are always losing something when they part with their data. Again, some of the examples are more obvious than others. The majority of people do not pay a dime to Google or Facebook to use their services. If you want to be academic about it, you could argue that people who advertise on Google and Facebook charge you higher prices, but how do you know they’d not charge you more if they bought ads in the local paper?

Cheap, Smart, Private. Choose Two?

There are more interesting cases, however. For example, if you bought a smart TV in the last few years, it’s probably subsidized to some degree thanks to the data it collects on you. That’s why smart TVs are sometimes cheaper than the non-smart ones. The tech that goes into making a TV “smart” is not expensive, but the data you get from figuring out what people watch is enough to make the TV cheaper. Is that a win-win?

That’s more of a philosophical question, and hard to answer. How you approach it really comes down to your personal politics too. If you are an individualist, you could argue that people who bought those TV sets consented to various privacy policies and knew what they were getting themselves into. Don’t want to get your data collected? There’s no dearth of dumb TVs to be bought.

But that sounds a bit simplistic. First of all, it doesn’t sound reasonable to argue that we should subject people to hundreds of pages of legalese to catch the latest episode of The Bachelor (or The Bachelorette, I don’t judge). Smart TVs might be an improvement over dumb TVs (bear with me) in terms of functionality, but maybe we can take a more holistic approach.

The more subtle argument is whether collecting this much personal data is something we should as a society put some brakes on. When you give up your data, it’s not just *you* that’s affected. When you end up with a huge collection of personal data in one place, there are larger risks to society. You don’t have to go that far back, even, to come up with examples here.

Data, In Your Face

Facial recognition is an interesting case. I remember when identifying a face in a photo (i.e. “is there a face in this photo?”) seemed like magic, but now it seems like table stakes; a first-year student in a CS program could do it. Then the problem became facial recognition (i.e. “who is in this photo?”). Again, for a while, it seemed like you would really need a lot of computing power and millions of tagged photos to figure that out. But now, the smartphone I carry in my pocket is able to miraculously tag my parents in photos that I took several years ago, without ever having to talk to a bunch of expensive servers in the cloud. Is there a cloud in my pocket?

Again, one of my beats is that people (or I, at least) really underappreciate how fast technology develops sometimes. What was once a hard CS problem that took a whole bunch of PhDs is now a library that you can import into your application. This stuff is going to be everywhere. Last week, the internet was abuzz with FaceApp, a fun app that allows you to see yourself (or anyone, really) aged. Of course, it didn’t take people long to realize it’s an app made in Russia. Since all-things-Russia is obviously bad, there were even American senators involved in figuring out what’s going on.

This is an interesting moment for me to ponder. I do not have an antagonistic view of the government, like many Americans do. Generally, I believe most people in public service have good intentions, are capable, and have the best interest of the people they serve at heart. So, while I may not agree it’s the best use of a senator’s time to worry about FaceApp, given, ahem, today, it’s fine.

At the same time, however, I worry about where our responsibilities as individuals start and end. If millions of people willingly download an app from various app stores, and then willingly upload their photo (yes, I know, you can upload any photo), who is at fault? Is it Apple’s fault for allowing such easy access to the device camera, or for not disclosing to people that it’s a Russian-made app?

It simply seems hard for me to imagine a regulatory framework by which we can prevent people from uploading their selfies. How do you make sure that FaceApp’s activity is restricted but people can still use Instagram freely? There are millions of selfies there too, and most of them even have the location tagged. Isn’t that even creepier? You could make the argument that Instagram is equally creepy and we shouldn’t let Facebook have access to so many photos either, but I don’t think you’d get a lot of support for that one.

Does Data Have Borders?

Or do we go full nationalist and segregate by origin? That doesn’t seem right either. What does the origin even mean, anyway, if the application is made by a Russian company but the photos are uploaded to a public cloud operated by an American company, on American soil? Or what would it mean if it was an American company that operated those servers in some other country, say, Singapore? Would it make it less creepy, or more so?

There are societal benefits to collecting a lot of data, but there are also risks. My personal view is that we can mitigate a lot of the risks by making sure the data doesn’t get stored forever, and is responsibly discarded. Moreover, there are probably ways to get the value of the data, even in aggregate form, without building dossiers on every mere mortal on the planet, so we should invest more in those.
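Here is a rough sketch in Python of what "don't store it forever" and "aggregate form" could mean in practice. The 90-day window, the record shape, and the function names (`purge_expired`, `aggregate_by_show`) are all assumptions I made up for illustration, and the viewing records are invented sample data, not anything a real TV maker has published.

```python
from collections import Counter
from datetime import datetime, timedelta

# Assumed retention window; a real policy would pick its own number.
RETENTION = timedelta(days=90)


def purge_expired(records, now):
    """Keep only the records collected within the retention window."""
    return [r for r in records if now - r["collected_at"] <= RETENTION]


def aggregate_by_show(records):
    """Count views per show, dropping who watched and when."""
    return Counter(r["show"] for r in records)


# Invented sample data: per-viewer records a smart TV might report.
now = datetime(2019, 7, 23)
records = [
    {"viewer": "a", "show": "The Bachelor", "collected_at": datetime(2019, 7, 1)},
    {"viewer": "b", "show": "The Bachelor", "collected_at": datetime(2019, 1, 1)},
    {"viewer": "c", "show": "The Bachelorette", "collected_at": datetime(2019, 6, 15)},
]

recent = purge_expired(records, now)   # the January record is discarded
totals = aggregate_by_show(recent)     # only per-show counts survive
print(totals)
```

Per-show counts still tell you what people are watching, which is most of the commercial value, without keeping a viewing history for any one person. With small numbers, aggregates can still leak information about individuals, so treat this as a starting point rather than a privacy guarantee.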

However, how we generate, send over, and store that data is also a personal question. It’s tempting and largely valid to point the lens of scrutiny at those who have the data and the power, but as individuals, we are also responsible. We control what apps we use, and what we do with them. It’s fun to enjoy the fruits of technology, but part of the entry fee is to be a knowledgeable consumer of it.

What I’m Reading

A Spreadsheet Way of Knowledge: There’s no end to how people use and abuse spreadsheets. Steve Ballmer, reportedly, uses Excel as a personal calendar, and I’ve met people who track the personal favors they’ve given and received in a Google Spreadsheet. Fun! There are few pieces of software that have changed our world as deeply and widely as spreadsheets. They are everywhere, and they model not just our businesses but our life, our thinking. This is a good historical narrative on how they came to be.

[…] He ran fifteen different scenarios on his computer, including one in which he took the money set aside for renovation and invested it elsewhere. What Maxwell found was startling: Not only would renovation be foolhardy, but “even the ‘best case’ showed I’d get nearly as good a rate of a return on my investment in a money market fund as staying in the restaurant business.” Get out of the restaurant business! the spreadsheet said. What the spreadsheet left out, of course was the unquantifiable emotional factor — Maxwell loved what he did. He kept the restaurant (though scuttled the renovation)

How to Hire: There are many ways tech companies compete with each other, but there’s no competition like the one for talent. It’s hard to hire good people, harder to keep the better ones. But then, there are few more important decisions than who to hire and how. What’s a company, other than a bunch of people working together on a shared mission? This is a good talk by the Carta CEO in text format about how the company makes decisions. Nothing too controversial, but it’s a good overview and some bits are interesting.

I want to repeat this point. We are increasing overhead by 50% because we failed to execute. It is not something to be proud of. It is humbling to go back to the labor market, hat-in-hand, asking for help. We did this when we hired you. We asked each of you to help us. You did not need us. There are plenty of great jobs. But we needed you. And thank goodness you came. We wouldn’t be here without you. But each of you was hired because the team before you failed to execute without you. And this is still true today.

Margins by Ranjan Roy and Can Duruk
