PCLinuxOS Magazine


De-Googling Yourself, Part 2

by Alessandro Ebersol (Agent Smith)

Google is a threat to the privacy, and even to the daily life, of its users. Why? Because Google profiles its users. So how did it all begin?

Profiling to better serve ... (?)

To get better search results, Google started profiling its users. And thanks to that, the search results have been improving over the years.

But Google needed a better way to capture users' personal information in order to refine its search results further. And what better way than having users hand over their personal data to Google themselves?

So, in 2004, Gmail was born. It marked the beginning of Google's office suite (later branded G Suite) and was initially an invitation-only service, labeled as a beta program. Gmail opened to the general public in 2007 and finally shed its beta label in 2009, five years after launch.

And so, Google began to profile the users of its search service through its email service. To get an idea of how profiling works, try searching Google while you're logged in to your account, then search for the same terms while logged out, and compare how different the results look.

How Gmail reads emails (and, time to freak out now)

Google's email servers automatically scan email for a variety of purposes, including placing contextual ads alongside messages and filtering spam and malware. That is the official explanation of why your messages are read. But these seemingly innocent and noble purposes can lead to much darker consequences.

Privacy advocates have raised many concerns about this practice. One is that, because email content is read by a machine (as opposed to a person), Google may feel free to keep unlimited amounts of information forever. Automated background scanning also risks reducing, or eroding altogether, the expectation of privacy people have when using email. Information collected from emails can be retained by Google for years after it stops being relevant, and used to build complete user profiles. Emails sent by users of other email providers are scanned too, even though those senders have never agreed to Google's privacy policy or terms of service. Google may change its privacy policy unilaterally, and for minor changes it may do so without informing users. In court cases, governments and organizations may find it easier to monitor email communications legally. At any time, Google may change its policies to allow email information to be combined with data collected from its other services. And any internal security problem in Google's systems can expose many, or even all, of its users.

Indeed, in January 2010, Google detected a "highly sophisticated" cyber attack on its infrastructure originating in China. The primary targets were Chinese human rights activists, and Google found that the Gmail accounts of U.S.-, Europe- and China-based advocates of human rights in China had been "routinely accessed by third parties." In other words, data held by Google has already been accessed and abused by third parties.

On June 23, 2017, Google announced that by the end of that year it would stop scanning email content to generate contextual advertising, relying instead on personal data collected through its other services. The company said the change was meant to clarify its practices and ease the concerns of G Suite business customers, who found the distinction between the free consumer version and the paid professional version (the latter being ad-free) ambiguous.

However, that very announcement shows that, rather than diminishing, Google's data collection has grown: the company no longer needs to read your email for ads, because it can draw on the sheer number of products it offers and its ability to monitor how people use them.

Google creates many tools that are useful to users and administrators, but they are designed to collect user data and build as complete a profile as possible, which is sold to advertisers, governments, other data brokers, or anyone else willing to pay. What sets Google apart is its method: aggregation.

I'll list some of Google's popular services and products (some of them invisible to the end user) that profile their users and help the company catalog the online habits and actions of the people who use them.

Google AMP (Accelerated Mobile Pages)

Google AMP is a service that stores data, usually media, on Google servers around the world. This means that when you load a website with AMP enabled, its images and media come from Google's servers, so Google knows every resource you loaded on the page. Interestingly, this gives Google access to substantially more information than your internet provider can get, because HTTPS encryption prevents the provider from seeing the specific pages you visit. It can only see the domain. For example, your ISP might see that you visited Reddit, but not what you looked at on Reddit. The linked Google AMP content on Reddit (and there's a ton of it) gives Google a direct link between your IP address and the specific content you viewed, which it can log and use to map user behavior and build activity profiles.
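To make the ISP-versus-Google distinction concrete, here is a minimal sketch that splits a URL into the part an ISP can typically observe over HTTPS (the hostname, which leaks through DNS lookups and the TLS SNI field) and the part that travels inside the encrypted connection (the path and query string). The function name isp_view and the Reddit thread URL are made up for illustration, and the model is simplified: it ignores encrypted DNS and Encrypted Client Hello, which can hide even the hostname.

```python
from urllib.parse import urlsplit

def isp_view(url: str) -> dict:
    """Hypothetical helper: separate what an on-path observer (your ISP)
    usually sees over HTTPS from what stays encrypted."""
    parts = urlsplit(url)
    encrypted = parts.path
    if parts.query:
        encrypted += "?" + parts.query
    return {
        "visible_to_isp": parts.hostname,  # leaks via DNS and TLS SNI
        "encrypted": encrypted,            # hidden inside the TLS session
    }

# Illustrative URL only; the thread ID is invented.
view = isp_view("https://www.reddit.com/r/linux/comments/abc123/some_thread/?sort=new")
print(view["visible_to_isp"])  # www.reddit.com
print(view["encrypted"])       # /r/linux/comments/abc123/some_thread/?sort=new
```

A server that actually delivers content embedded in the page, such as an AMP cache, sits at the other end of the encrypted connection, so it receives the full request for each resource along with your IP address.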

This problem is widespread. Sites built on WordPress, the most popular content management system in the world, have AMP enabled by default.

Worse, Google has recently announced that Chrome mobile users will not even know when they're viewing AMP-served content: Chrome will display the original publisher's URL in place of the address of the Google-hosted AMP copy.

Google Analytics

Google Analytics uses cookies and cross-site tracking to identify and follow users as they browse the Web in their daily routines. It works by assigning each user a cookie containing a unique ID number; every time that user visits a site with Google Analytics enabled, Google records the activity and links it to the user carrying that cookie. Often, this happens without the user's knowledge.
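The mechanism described above, a random ID handed out once and then used to file every later page view, fits in a few lines. This is a toy model, not Google's code: the class ToyTracker, the method record_hit and the cookie name tracker_id are all hypothetical. It also assumes the same tracker cookie reaches the tracker from every site that embeds it, which is the cross-site case the paragraph describes; real deployments are more complicated.

```python
import uuid
from collections import defaultdict

class ToyTracker:
    """Hypothetical, simplified tracker: assigns each new browser a random
    ID stored in a cookie and logs every later page view under that ID."""

    def __init__(self):
        # cookie ID -> list of page URLs seen with that ID
        self.profiles = defaultdict(list)

    def record_hit(self, cookies: dict, page_url: str) -> dict:
        # Reuse the ID if the browser already carries one; otherwise
        # mint a new random one (the "unique ID number").
        client_id = cookies.get("tracker_id") or uuid.uuid4().hex
        self.profiles[client_id].append(page_url)
        # Return the updated cookie jar the browser keeps for next time.
        return {**cookies, "tracker_id": client_id}

tracker = ToyTracker()
jar = {}  # one browser's cookies for the tracker's domain
for url in ["https://news.example/politics",
            "https://shop.example/shoes",
            "https://clinic.example/appointments"]:
    jar = tracker.record_hit(jar, url)

# Three unrelated sites, one profile.
print(tracker.profiles[jar["tracker_id"]])
```

Note that none of the three sites needs to know the others exist; the link between them is made entirely by the tracker, through the shared ID.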

Recently, the European Union's General Data Protection Regulation (GDPR) has led to a flood of warnings about this type of analytics software, but it has done little to restrict its use: Google Analytics is so ubiquitous that users are pestered with cookie warnings on almost every site they visit. And many sites do not follow the GDPR's rules, failing to let users browse the site or use its services without being tracked.

Google Cloud

Another major source of data for Google is the cloud. As of 2018, Google held about 9.5% of the cloud market by revenue; since many Google cloud services are "free", the share of content it actually hosts may be much larger by volume. If you're using an application or website built on Google Cloud infrastructure, that is another gateway for your information, and you, the end user, are never asked to grant Google permission to retain data about you.

Google Maps API

Every time you visit a company website with an embedded, interactive Google Map (rather than a static screenshot), the page calls the Google Maps API. This data is combined with Google Analytics data to attach location information to your Google profile.

Google Firebase

Firebase is a tool that allows developers to easily synchronize data between different sites, applications and services. The catch is that this data is synchronized through Google's servers, which can record all of it and build profiles without the user's knowledge.

Google Chrome

Google Chrome records everything you've searched for on Google Search, all the websites you've visited or tagged, all the YouTube videos you've watched, all the ads you've clicked on, and how many passwords were automatically filled by Google Chrome.

The browsing habits that Google Chrome records include:

Everything you've searched for using Google Search or YouTube
Your YouTube history
How many Google searches you've done this month
All the sites you've clicked on
Every website address you have typed into the address bar
All the sites you have visited
All the Google Chrome tabs open on all your devices
How many Gmail conversations you've had
The apps you've downloaded from the Chrome Web Store and the Google Play Store
Your Chrome Web Store extensions
Your Google Chrome browser settings
All the email addresses, street addresses and phone numbers you've set Chrome to fill in automatically
All the usernames and passwords you've asked Chrome to save
All the sites for which you've asked Chrome not to save passwords

And since late 2018, signing in to a Google site such as Gmail automatically signs you in to Chrome itself. In other words, users are all but forced to share their online activity with Google.

Android devices

We now come to a very important point, and perhaps the most vulnerable one in the user's relationship with Google: the Android operating system, Google's app store (Google Play) and most, if not all, of Google's services (GSF, the Google Services Framework) give the company an inexhaustible means of intruding into the personal lives of its users.

To download and use Google Play Store apps on an Android device, you must have (or create) a Google Account, which becomes an important gateway through which Google collects personal information, including user name, email address and phone number. If a user signs up for services like Google Pay, Android will also collect credit card information, zip code and date of birth. All of this information becomes part of the personal information associated with that user's Google Account.

In addition to personal data, Chrome and Android send Google information about browsing activity and mobile app usage, respectively. Every visit to a webpage is tracked and automatically collected under the user's Google credentials. If you sign in to Chrome, the browser will also collect your browsing history, passwords, site-specific permissions, cookies, download history and additional user data.

Android sends periodic updates to Google's servers, including device type, mobile service provider name, crash reports, and information about apps installed on the phone. It also notifies Google whenever an application is accessed on the phone (for example, Google knows when an Android user accesses the Uber application).

The Android and Chrome platforms meticulously collect user location and motion information using a variety of sources, as shown in the figure below.

For example, a "coarse location" estimate can be obtained from GPS coordinates on an Android phone or from a laptop's network IP address. The accuracy can be further improved ("fine location") by using the IDs of nearby cell towers, or by scanning BSSIDs (basic service set identifiers), the device-specific identifiers assigned to the radio chipsets of nearby WiFi access points. Android phones can also use information from Bluetooth beacons registered with Google's Proximity Beacon API. These beacons not only provide the user's geolocation coordinates, but can also identify the exact floor of a building the user is on.
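To show why a list of nearby BSSIDs is enough to place a phone on a map, here is a minimal sketch of one simple, well-known technique: a weighted centroid over access points whose positions are already known. This is an illustration under stated assumptions, not Google's actual method; the AP_DATABASE table, its BSSIDs and coordinates, and the function estimate_position are all hypothetical.

```python
# Hypothetical database mapping access-point BSSIDs to known (lat, lon).
# All entries are invented for illustration.
AP_DATABASE = {
    "aa:bb:cc:00:00:01": (40.0000, -75.0000),
    "aa:bb:cc:00:00:02": (40.0010, -75.0000),
    "aa:bb:cc:00:00:03": (40.0000, -75.0010),
}

def estimate_position(scan):
    """Estimate (lat, lon) from a WiFi scan given as {bssid: rssi_dbm}.

    Weighted centroid: each known access point pulls the estimate toward
    itself in proportion to its received power. Unknown BSSIDs are
    ignored; returns None if no scanned BSSID is in the database.
    """
    total_w = lat_acc = lon_acc = 0.0
    for bssid, rssi in scan.items():
        if bssid not in AP_DATABASE:
            continue
        lat, lon = AP_DATABASE[bssid]
        # dBm -> linear power (milliwatts): -40 dBm weighs 100x more than -60 dBm.
        w = 10 ** (rssi / 10)
        total_w += w
        lat_acc += w * lat
        lon_acc += w * lon
    if total_w == 0.0:
        return None
    return (lat_acc / total_w, lon_acc / total_w)

# Equal signal strengths: the estimate is the plain average of the three APs.
print(estimate_position({
    "aa:bb:cc:00:00:01": -50,
    "aa:bb:cc:00:00:02": -50,
    "aa:bb:cc:00:00:03": -50,
}))
```

The key point is that the phone does not need to connect to any of these networks; merely hearing their identifiers, and reporting them, is enough.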

It is difficult for an Android user to "refuse" location tracking. For example, even if a user turns off WiFi on an Android device, the device can still be located through WiFi scanning. To avoid this tracking, WiFi scanning must be explicitly disabled as a separate step, as shown below:

Because WiFi access points are everywhere, location data is collected very frequently.

For example, in a study conducted by Professor Douglas C. Schmidt of Vanderbilt University, an Android device sent 50 location requests to Google during a short 15-minute walk around a residential neighborhood. Collectively, the requests contained about 100 unique BSSIDs from public and private WiFi access points.

Google can determine with a high degree of confidence whether a user is standing still, walking, running, cycling, or riding in a train or car. It does this by tracking an Android user's location coordinates at frequent intervals and combining them with data from the phone's onboard sensors (such as the accelerometer).
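Here is a minimal sketch of the location half of that idea: compute the speed between two timestamped position fixes using the standard haversine great-circle formula, then map the speed to an activity label. The speed bands, the function names and the fix format are assumptions made up for illustration; a real system would also fuse accelerometer data, which this sketch does not model.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points."""
    r = 6_371_000  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Hypothetical upper speed limits in m/s for each label, checked in order.
SPEED_BANDS = [(0.5, "still"), (2.5, "walking"), (4.5, "running"), (8.0, "cycling")]

def classify(fix_a, fix_b):
    """Each fix is (lat, lon, unix_time_seconds), fix_b later than fix_a.
    Returns a guessed activity label based on average speed alone."""
    (lat1, lon1, t1), (lat2, lon2, t2) = fix_a, fix_b
    speed = haversine_m(lat1, lon1, lat2, lon2) / (t2 - t1)
    for limit, label in SPEED_BANDS:
        if speed < limit:
            return label
    return "vehicle"

# About 111 m north in 60 s is roughly 1.85 m/s: a walking pace.
print(classify((40.000, -75.0, 0), (40.001, -75.0, 60)))  # walking
```

Speed alone confuses, say, a slow bus with a bicycle; that ambiguity is exactly what the sensor data mentioned above helps resolve.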

But it does not stop there. Because of the infrastructure that Google services enable, there is a whole host of enterprise products that are used daily by advertisers and internet advertising companies.

Google products for online ads and advertisers

Another important source of user activity data for Google is its set of publisher and advertiser tools, such as Google Analytics, DoubleClick, AdSense, AdWords and AdMob. These tools have a huge reach: for example, more than a million mobile apps use AdMob, over 1 million advertisers use Google AdWords, more than 15 million websites use Google AdSense, and more than 30 million websites use Google Analytics.

There are two main groups of users of Google's publisher and advertiser tools:

Publishers of sites and applications, which are organizations that run websites and create applications for mobile devices. These entities use Google's tools to make money by allowing ads to appear on their websites or in their applications, and to better track and understand who is visiting their websites and using their applications. Google's tools place cookies and run scripts in website visitors' browsers that help determine each user's identity, track their interest in the content, and follow their behavior online. Google's mobile application libraries track application usage on mobile devices.
Advertisers, which are organizations that pay to display banners, videos or other ads to users while they browse the Internet or use applications. These entities use Google's tools to target ads at specific user profiles and so increase the return on their marketing spend (better targeted ads generally lead to more click-throughs and conversions). These tools also allow advertisers to analyze their audiences and evaluate the effectiveness of their digital advertising, by tracking which ads were clicked and how often, and by providing information about the profiles of the people who clicked them.

Together, these tools collect information about user activity on websites and in applications, such as the content visited and the ads clicked. They work in the background, largely imperceptible to users. The figure below shows some of these main tools, with arrows indicating data collected from users and ads displayed to users.

Conclusions

It's frightening how far Google intrudes into our lives, whether we know it or, worse, without our knowing anything at all. Without suspecting it, we live in a birdcage, where advertisers, big corporations and governments know our every step, our tastes, and our political and religious orientations.

Our secrets are no longer ours; they belong to the corporations now. Google, Apple and Facebook know when a woman visits an abortion clinic, even if she tells no one else. The GPS coordinates on the phone do not lie. Extramarital affairs are just as easy to picture: two smartphones that have never crossed paths before meet in a bar, then head to an apartment across town, stay together overnight and leave in the morning... well, you can imagine the rest.

It's a sad brave new world. But not all is lost. It is possible to improve your privacy and lead a more private life. We will see how in the next article, so stay with us next month for more.
