How Computers Trawl a Sea of Data for Stock Picks


An example from the Wall Street Journal of more data improving efficiency – in this case, allowing firms to allocate resources in our economy more efficiently and bring about better growth. By leveraging increasingly large and diverse datasets, they can form a more accurate picture of the world’s needs and risks. How equitably the gains from these efficiencies are distributed can be a matter of debate, but more data unequivocally enable more efficiency than less data.

Two Sigma’s funds all take a big-data approach. Among its data sources are news bulletins, National Weather Service reports, market data, tweets and information from smartphone users who have agreed to be tracked by a retail-trend-analysis company.

Thirty years ago, it was easier to make investment picks because the world wasn’t as interconnected, Mr. Siegel says. “Here’s the problem: What affects the price of a share of Apple stock? The answer: Pretty much everything. Absolutely every little thing has some effect. Every sale, every earthquake.”

To comb through data 24 hours a day, the firm has more than 100 teraflops of power—more than 100 trillion calculations a second—and more than 11 petabytes of storage, the equivalent of five times the data stored in all U.S. academic libraries.

Visualization Reimagined


Database visualization seems to be particularly relevant to discussion about life and death represented through digital media. Nick Gagnon made an interesting observation through his analysis of Jonathan Harris’ Whale Hunt. Gagnon notes that he was initially inclined to interpret the way the data were visualized to represent excitement in the photo series or death of the whale, only to realize after investigation that “spikes” in the visualization connoted frequency of photos over time.

Gagnon’s observations highlight how subjective the observation and experience of database representation can be. In class, we touched on the fact that Adobe’s Flash standard is used to codify the digital information of Harris’ work. Because users are asked to install Flash software to properly compile and represent the work, they are able to see it largely as Harris intended; but viewed critically, one could call Harris only one “reader” of his own work in the post-structuralist tradition. He can say that his narrative can only be represented in one way, but one can argue that if his work is understood as a database, it can be represented in many ways.



Flash rendering: narrative? database?

Source code: narrative? database?

Is the source code for a program a valid representation for the data therein? If a programmer can visualize the code in her mind’s eye, does that constitute a narrative? These questions remind us that representation is a multifaceted process with many layers; equally important to the form of a work is its function.

Disintermediated Existence


Distraction becomes all too easy as we are awash in the cornucopia of advancements spurred by the digital revolution. With eyeballs fixed to headlines about the next iPhone or pictures of crushes on Facebook, existential questions can tacitly seep into the shadows. It is often said that nothing can hide from the watchful eye of the NSA in today’s database-driven world, but a few new things are born when the entropic force of time is slowed even slightly. Death is a fundamental element of human existence, but it would seem that even the dissolution of existence into the sands of time can be partially abated.

While a best friend may pass away in the real world, their Facebook profile, emails, game avatars, and other digital paraphernalia live on in cyberspace. The mourning process has also evolved. Joyce Walker notes that,

Not only did the Web allow for the creation of both public and private spaces for the activities of mourning, it also allowed these spaces to exist in direct cohabitation with sites developed to meet other rhetorical goals (i.e., information sharing, news, and political discussions).

In other words, with a transformative medium of information exchange and culture creation comes a new paradigm for existence, mourning, and death. Cultural cross-pollination takes place between political and personal spheres. These new modes of sociocultural interface were brought to the forefront by the events of September 11, 2001. For the first time, news and images of these events were disseminated globally over a medium that allowed not only for democratized consumption but also democratized creation. Lev Manovich considers this new form of interaction “telepresence,” describing it as

one example of representational technologies used to enable action, that is, to allow the viewer to manipulate reality through representations.

The key element of this new reality lies in its negotiability. People don’t have to experience these events and mourn them through one-way channels like television and radio (transformative in their own right); they are able to actively participate in defining their life, death, and memory. Because this cultural exchange itself takes place over digital media in databases, these processes can themselves be studied, and our understanding of them can evolve over time.


Data is the Oxygen to Machine Learning’s Fire

YouTube’s Conception of “Cat”

This past week’s discussion of Big Data seems incomplete without mentioning machine learning. For the uninitiated, machine learning is a method of computing that “trains” algorithms to achieve a desired result using large amounts of data. While machine learning has been around since 1959, it has only recently come back into fashion as businesses are waking up to its potential in the age of Big Data.
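To make “training” concrete: a classic example is the perceptron rule, which dates to roughly the same era as the 1959 coining of “machine learning.” It adjusts internal weights whenever the model misclassifies a labeled example, so more data means more corrections and a better fit. The toy data and parameters below are my own illustration, not any company’s actual system:

```python
def train_perceptron(examples, epochs=20, lr=0.1):
    """Learn weights for 2-D points labeled +1 or -1."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), label in examples:
            pred = 1 if (w[0] * x1 + w[1] * x2 + b) > 0 else -1
            if pred != label:  # misclassified: nudge weights toward the label
                w[0] += lr * label * x1
                w[1] += lr * label * x2
                b += lr * label
    return w, b

def predict(w, b, point):
    x1, x2 = point
    return 1 if (w[0] * x1 + w[1] * x2 + b) > 0 else -1

# Toy "dataset": points above the line y = x are +1, points below are -1.
data = [((0, 1), 1), ((1, 2), 1), ((2, 3), 1),
        ((1, 0), -1), ((2, 1), -1), ((3, 2), -1)]
w, b = train_perceptron(data)
```

The same loop, fed a trillion labeled photos instead of six points (and far more parameters), is the basic shape of the systems described below.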


The key ingredient in effective machine learning applications, from Facebook to Baidu, is data. Lots of data. Because of their scale, these companies are able to gather trillions of data points on everything, including individuals’ emotions, shopping habits, facial features and much more. To a human, or even an army of humans, one trillion pictures of people’s faces would yield little utility. But to an elite team of computer scientists, such pictures make it possible to construct systems that recognize identity and emotion more accurately, and at vastly greater scale, than human beings.

The paradigmatic shift from silos of individual networks to aggregated data and computing resources known as cloud computing brings with it greater efficiency and more data. Joseph Sirosh, Microsoft’s VP of Machine Learning, was recently interviewed by the cloud and data experts at GigaOM. Sirosh explains that Microsoft is rapidly transitioning from an operating system provider to a cloud provider with expertise in big data and machine learning. He goes so far as to state that computing itself is less important than the data that it provides:

“I think you should even first ask, ‘How big is the world of data to computing itself?’” he said. “I would say that in the future, a huge part of the value being generated in the field of computing . . . is going to come from data, as opposed to storage and operating systems and basic infrastructure. It’s the data that is most valuable.”

Microsoft has made its billions by providing software and services. Pivoting the business model of a $360B company is a herculean task; it’s safe to say that Microsoft and every other major player in technology wouldn’t be chasing so desperately after data science if it weren’t a huge deal. Data is often described as the new oil of the 21st century – machine learning is the new refinery.




DATACIDE: The Total Annihilation of Life as We Know It

An excellent article I recently read talks about how data, algorithms, and connectedness have changed our world and made us aware of heretofore hidden aspects of our collective selves.

I’ve seen the best minds of my generation sucked dry by the economics of the infinite scroll. Amidst the innovation fatigue inherent to a world with more phones than people, we’ve experienced a spectacular failure of the imagination and turned the internet, likely the only thing between us and a very dark future, into little more than a glorified counting machine.

Am I data, or am I human? The truth is somewhere in between. Next time you click I AGREE on some purposefully confusing terms and conditions form, pause for a moment to interrogate the power that lies behind the code. The dream of the internet may have proven difficult to maintain, but the solution is not to dream less, but to dream harder.

Invisible Observations: Week of February 2nd

Tasked with making observations and gathering data for two DIG 210 classes, on February 3rd and 5th, I sought to try something unconventional. Rather than focus on visible characteristics, I thought I might gather data about the auditory aspect of our class. Given that we exchange knowledge in our class primarily over the medium of sound, through our voices, I thought it might be interesting to transpose the sound waves that reverberated through Studio D of the Davidson library from 1:40pm to 2:55pm on those two days into static 2D images that we can view and make observations about.

Sound Waves Captured on February 3rd
Sound Waves Captured on February 5th

There are several limitations to this method of observation. One is the fact that my computer’s microphone was the sole conduit for generating data; being in one place and not designed with high-fidelity audio capture in mind, the data that my computer generates are limited in accuracy. Sounds that I or others at my table made register more prominently than equally loud sounds made by others. These data paint a picture from the vantage point of my computer, which on both occasions I tried to position as close to the center of the room as possible. Finally, my skills in audio analysis are extremely limited; with more expertise and better software, I would be able to provide many more statistics that might allow us to glean more information from these data.

The vertical blue lines indicate the amplitude of the sound waves along the horizontal axis of time. Comparing the two, we notice that the February 5th class was, on average, louder than the February 3rd class – given that a guest speaker addressed the class through loudspeakers on the 5th, whereas Dr. Sample primarily lectured using only his vocal cords on the 3rd, this makes sense. We notice in both graphs more peaks toward the end of class than at the beginning, which could be the result of many factors – perhaps the conversation becomes more heated and interesting as the class gets more involved in the material?
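My “louder on average” comparison was done by eye; a standard way to make it precise is root-mean-square (RMS) amplitude, computed over the whole recording or per window to trace loudness over time. Since I’m not sharing the actual recordings, this sketch runs on synthetic sine-wave samples that stand in for the real data:

```python
import math

def rms(samples):
    """Root-mean-square amplitude: a common proxy for perceived loudness."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def windowed_rms(samples, window=1000):
    """RMS per fixed-size window, i.e. loudness over time."""
    return [rms(samples[i:i + window])
            for i in range(0, len(samples) - window + 1, window)]

# Synthetic stand-ins for the two class recordings: one quieter tone,
# one louder tone, each one second at an assumed 8 kHz sample rate.
quiet = [0.2 * math.sin(2 * math.pi * 440 * t / 8000) for t in range(8000)]
loud = [0.6 * math.sin(2 * math.pi * 440 * t / 8000) for t in range(8000)]
```

Comparing `rms(loud)` against `rms(quiet)` turns the visual impression of the two waveform images into a single number per recording, and `windowed_rms` would expose the end-of-class peaks as a rising trend.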

Note: I have chosen to not make the actual sound files available for a variety of reasons.

The Atlantic – ‘The Cloud’ and Other Dangerous Metaphors

Interesting read from The Atlantic:

Underlying the discussion has been a tangle of big, thorny questions: What policies should govern the use of online data collection, use, and manipulation by companies? Do massive online platforms like Google and Facebook, who now hold unprecedented quantities of sensitive behavioral data about people and groups, have the right to research and experiment on their users? And, if so, how and to what extent should they be permitted to do so?

Data escapes attempts to fit it neatly into a single conceptual box. Consider three phrases—now so commonplace as to be unremarkable—that we use to talk about data:

  • “Data Stream,” which refers to the delivery of many chunks of data over time;
  • “Data Mining,” which refers to what we do to get insightful information from data; and
  • “The Cloud,” which refers to a place where we store data.

These tropes are notable because they use distinct, physical metaphors to try to make sense of data within a specific context. What’s more, all three impute radically different physical properties to data. Depending on the situation, data is either like a liquid (data streams), a solid (data mining), or a gas (the cloud). Why and how these metaphors get used when they do is not immediately obvious. There are tons of alternatives: Data could be stored in a “data mountain,” or data could be made useful through a process of “data desalination.”

And in all our talk about streams and exhaust and mines and clouds, one thing is striking: People are nowhere to be found. These metaphors overwhelmingly draw from the natural world and the processes we use to draw resources from it; because of this, they naturalize and depersonalize data and its collection. Our current data metaphors do us a disservice by masking the human behaviors, relationships, and communications that make up all that data we’re streaming and mining. They make it easy to get lost in the quantity of the data without remembering how personal so much of it is. And if people forget that, it’s easy to understand how large-scale ethical breaches happen; the metaphors help us to lose track of what we’re really talking about.

X-Post: Economic Adaptations of Big Data

Wanted to share a post that I wrote for my Digital Anthropology class at Davidson that discusses how businesses are attempting to shift to Big Data cultures:


“Big Data” epitomizes a buzzword. About 10 years ago, tech blogs began following the emerging field of analyzing the very, very, very large amounts of data that humanity is generating at an unprecedented pace. What was once an interesting exercise has become much more as startups have built tools that allow businesses to gather and analyze big data in real time. Tools like Hadoop and MongoDB are a few among many entrants vying to lead a big data revolution that transcends mainstay relational SQL databases in favor of NoSQL databases that hoard every last datapoint they can, even if those data provide no perceivable analytical value at present.
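The “hoard every last datapoint” pattern is easiest to see in a schemaless document store: where a relational table must declare its columns up front, a document collection keeps whatever fields each record arrives with. This toy in-memory collection is only a sketch of the idea; real systems like MongoDB add persistence, indexing, and a query language:

```python
# A minimal schemaless "collection": just a list of dicts.
collection = []

def insert(doc):
    """Store a record without any schema check - nothing is discarded."""
    collection.append(doc)

def find(predicate):
    """Return every document matching an arbitrary predicate."""
    return [d for d in collection if predicate(d)]

# Heterogeneous records: no field is dropped just because the
# schema didn't anticipate it.
insert({"user": "a", "clicked": "buy", "ts": 1})
insert({"user": "b", "scroll_depth": 0.8, "device": "phone", "ts": 2})
```

The upside is that fields with no analytical value today (like `scroll_depth`) are still there tomorrow; the downside, as the rest of this post suggests, is that someone still has to make sense of the hoard.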

While many organizations are waking up to the potential of big data, most are slow to understand it. Companies are quick to advertise that they “leverage big data analytics” to “provide value” and discover “new synergies”, but more often than not they are putting a new dress on the same dog. While tools to harness big data are developing rapidly, people can be the limiting factor in catalyzing a shift to big data culture. This point was recently argued by Mark Gazit in an op-ed on TechCrunch. He cites research that catalogues the talent drought for data scientists, noting that there simply isn’t enough human horsepower available for companies attempting to integrate big data into their core business models. Rather than calling for more training programs, Gazit champions the efficacy of machine learning and artificial intelligence in bringing big data to the masses:

Ultimately, by solving the issues that prevent full optimization of big data analytics — especially the human factor and its disproportionate impact on the current-day process — organizations will be able to detect and address all types of threats and opportunities much more rapidly. This capability is becoming increasingly crucial in an era when data is being generated by both humans and machines, and is sure to become a pivotal way for businesses to create situational awareness, detect issues and optimize operations to achieve their business objectives. – Mark Gazit, CEO of ThetaRay

It remains to be seen how machines and humans will complement and subordinate one another as the data economy continues to evolve. Greater efficiencies, lower costs, and more predictability are inevitable – the question is, who will we have to thank?

Note: I am sharing this post out of interest and relevance, not for any form of academic credit. This post was created as an assignment for a separate class.

Organic Eyes, Electronic Glasses


In his analysis of surveillance, The Electronic Eye, David Lyon begins by noting that computer scientists back in the 1970s were already beginning to imagine how plausible Orwell’s Nineteen Eighty-Four might be. An equal amount of time has passed since Lyon’s analysis, and time would seem to further confirm his suspicion that increasing computing capability is birthing an entirely new surveillance paradigm.

Lyon alludes to his thesis that a panoptic framework, derived practically from Foucault but originally from Bentham, goes further in articulating the modern multivalence of surveillance than an Orwellian one by noting that “the old dichotomy between decentralization and centralization is itself now questionable”. On its face, such an observation might soothe an Orwell, who feared the emergence of a massive state surveillance apparatus, but pulling the string of decentralization further unravels an arguably more terrifying ball of yarn.

Rather than being dominated by a carefully controlled and maintained surveillance state, society today is impelled “forward” (if we grant teleology to technology) by a hairy amalgam of economically coordinated actors that realize gains through strategic information gleaned from the firehose of data that humanity is pumping out. Google, Facebook, and many others large and small thrive on a business model that positions users as products for advertisers to consume. Indeed, the state is more capable of surveilling, but so are its citizens. One shudders to think how well Orwell described the Microsoft Kinect, which in its second iteration is capable of determining a user’s heart rate and mood by looking at their skin alone.



The telescreen received and transmitted simultaneously. Any sound that Winston made, above the level of a very low whisper, would be picked up by it; moreover so long as he remained within the field of vision which the metal plaque commanded, he could be seen as well as heard. There was of course no way of knowing whether you were being watched at any given moment.

Most telling of the panopticon’s prescience is that the state doesn’t have to force citizens to have one of these boxes in their homes. Rather, in 2015, people willingly go out and spend $500 to set up surveillance boxes themselves.
