Case Study: NYSE Trading Specialists



An interview with Sanjay Unni, Ph.D.

Our experts operate at the core of commerce, for example determining whether patterns in data reveal foul play in NYSE stock trades. The same analytics work applies directly to key elements of a firm’s growth strategy, including the prediction of customer behavior, pricing, and customer segmentation.

Describe for us the business scenario. What was the challenge that you encountered?

This particular engagement arose because of certain allegations made by the United States government against specialists on the New York Stock Exchange. Specialists are designated intermediaries whose role is to match buyers with sellers in order to accelerate and deepen the degree of public trading on the exchange. Specialists have an obligation to match public buyers with public sellers whenever such participants are available on both sides of the trade.

The government alleged that for several years the specialists on the exchange had given their own transactions priority over public orders. In effect, they were trading ahead of public orders, or interposing themselves unnecessarily between public orders, either to earn additional commissions on the trade or to profit from short-term price movements in the stock.

What was novel about these allegations was that they rested substantially on a detailed and intricate data analysis that the government had conducted on transactions on the New York Stock Exchange. They were not based on, say, secret videotapes of the NYSE floor catching specialists engaging in such trades. The government alleged that it had carefully analyzed the underlying NYSE transactions data and processed it through detailed computerized algorithms to identify when specialists were trading improperly. Complex data analytics was thus at the heart of allegations of improper trading brought against financial market participants.

What role did you specifically play in this situation?

I was brought in to evaluate the merits of the evidence that the government claimed to have developed against the specialists through data analytics. In particular, I was asked to evaluate two things. The first was whether the underlying data used in the government's algorithm was reliable enough to support the allegations. The second was whether there were other ways to interpret the evidence the government claimed to have found through its algorithmic analysis.

Because there were also separate SEC proceedings and a shareholder class action, these allegations became the core of a wider range of legal proceedings against the specialists and the firms that employed them.

The New York Stock Exchange provided us with data on all the transactions taking place in the underlying stocks. The transactions in even one typical NYSE stock generate an enormous amount of data. Every electronic order that arrives at the exchange is tracked through various posts and data servers as it passes through the NYSE's electronic system. The order gradually makes its way to the post where the specialist trades the stock. There it is executed, and the exchange then reports it as a completed trade.

Each point of passage through the NYSE's electronic data system generates time stamps for the order, along with other relevant metadata. The government sent data for only a small sample of the stocks at issue. Even so, that data alone ran to hundreds of gigabytes of raw records covering every conceivable attribute of the orders. It included:

- the number of shares involved
- the price at which the execution took place
- which parties submitted the orders and received execution
- what role the specialist played in the trade
- a variety of other details that helped the exchange reconstruct how each trade was carried out

In a market where millions of transactions can occur in a single stock in a single month, this quickly became a genuine Big Data problem.

The first challenge, quite simply, was to work out how to organize these enormous datasets so that they could start to reveal the patterns they contained. The second challenge came after potential patterns had been identified. We had to draw simple, communicable insights from them about what may have happened around the disputed trades, and about what explanations could plausibly account for those trades.
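The interview does not describe how the data was actually organized. The Python below is a minimal, hypothetical sketch of that first step. It assumes each raw record carries an order ID, a stage label, and a millisecond time stamp. These field names and stage labels are illustrative assumptions, not the NYSE's schema. The sketch groups the records into one time-ordered lifecycle per order and computes the elapsed time between consecutive checkpoints.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderEvent:
    """One time-stamped checkpoint for an order (hypothetical schema, not the NYSE's)."""
    order_id: str
    stage: str         # assumed labels, e.g. "arrival", "at_post", "executed", "reported"
    timestamp_ms: int  # assumed unit: milliseconds since midnight


def build_lifecycles(events):
    """Group raw checkpoint records by order ID and sort each order's checkpoints by time."""
    by_order = defaultdict(list)
    for ev in events:
        by_order[ev.order_id].append(ev)
    return {oid: sorted(evs, key=lambda e: e.timestamp_ms) for oid, evs in by_order.items()}


def stage_latencies(lifecycle):
    """Return (from_stage, to_stage, elapsed_ms) for each consecutive pair of checkpoints."""
    return [
        (a.stage, b.stage, b.timestamp_ms - a.timestamp_ms)
        for a, b in zip(lifecycle, lifecycle[1:])
    ]


if __name__ == "__main__":
    # Made-up records, deliberately out of order, as raw feeds often are.
    raw = [
        OrderEvent("A1", "executed", 34_200_950),
        OrderEvent("A1", "arrival", 34_200_000),
        OrderEvent("A1", "at_post", 34_200_400),
        OrderEvent("B7", "arrival", 34_200_100),
    ]
    lifecycles = build_lifecycles(raw)
    print(stage_latencies(lifecycles["A1"]))
    # [('arrival', 'at_post', 400), ('at_post', 'executed', 550)]
```

Keeping each order's checkpoints together is what makes later questions possible, such as which order reached the post first.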

A fair evaluation of these trades hinges upon the quality of the underlying data and how precisely it has been measured; essentially, the millisecond timing of when orders arrived and trades were executed could crucially affect the assessment of these trades. And when electronic systems are required to track data with that degree of precision, an important question arises: is the degree of accuracy in the capturing of that data sufficiently high to allow you to make inferences about improper trading?

To analyze trends within your data, or to extract meaningful intelligence from it, you must understand the degree of precision or error built into the data. Otherwise you risk capturing noise in the belief that you are capturing true intelligence. Knowing how accurate your data really is, and therefore where true intelligence ends and noise begins, is an important technical challenge. The business community often misunderstands or underappreciates it.
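One simple way to make this concrete is to refuse to put two events in order when their recorded gap is smaller than the combined timing uncertainty. This assumes a fixed maximum error per time stamp. That is an assumption: the interview does not state the actual error model. The hypothetical Python sketch below labels each pair of time stamps as reliably ordered or indeterminate. The names `TimedPair`, `classify_sequence`, and `summarize` are illustrative only.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class TimedPair:
    """A specialist trade and a related public order (hypothetical record)."""
    specialist_ts_ms: float
    public_order_ts_ms: float


def classify_sequence(pair, clock_error_ms):
    """Label the ordering of two time stamps under an assumed symmetric error bound.

    Each recorded time may be off by up to clock_error_ms in either direction, so the
    true gap can differ from the recorded gap by up to 2 * clock_error_ms. Only gaps
    wider than that are treated as a reliable ordering.
    """
    gap = pair.public_order_ts_ms - pair.specialist_ts_ms  # > 0: specialist recorded first
    tolerance = 2 * clock_error_ms
    if gap > tolerance:
        return "specialist_first"
    if gap < -tolerance:
        return "public_first"
    return "indeterminate"


def summarize(pairs, clock_error_ms):
    """Count how many pairs fall into each ordering label."""
    return Counter(classify_sequence(p, clock_error_ms) for p in pairs)


if __name__ == "__main__":
    # Made-up time stamps with an assumed 5 ms error bound per time stamp.
    print(classify_sequence(TimedPair(1000, 1003), 5))  # indeterminate
    print(classify_sequence(TimedPair(1000, 1050), 5))  # specialist_first
    print(classify_sequence(TimedPair(1050, 1000), 5))  # public_first
```

The share of flagged trades that fall into `indeterminate` shows how much of an apparent pattern the data's precision can actually support.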

Once you have done that, you can move on to the substance of the engagement, here an allegation of criminal violation. Were certain trades indeed improperly executed by the specialist? And if some trades were improper, can the data help explain why they occurred?

Here is where we began to analyze the story revealed by the patterns in the data. Trading can become particularly hectic, with the specialist crossing public buy and sell orders at a rate of four or five a minute. We reasoned that at such times he or she is more likely to make mistakes than when trading is modest and there is much less room for error. Similarly, some trades are more complex. To pair buyers with sellers, the specialist may have to line up several buyers on one side and several sellers on the other so that their volumes match. Matching is inherently harder for such trades, and errors are more likely.
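This reasoning amounts to comparing the rate of disputed trades across conditions. The Python below is a minimal sketch of that comparison. It assumes hypothetical per-trade fields for trading intensity, number of counterparties, and a disputed flag. The cutoffs used to define "hectic" and "multi-party" trades are also illustrative assumptions, not values from the engagement.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Trade:
    """Hypothetical per-trade record."""
    crosses_per_minute: int  # crosses the specialist handled in the surrounding minute
    counterparties: int      # distinct buyers and sellers paired in the trade
    disputed: bool           # whether the trade was flagged as improper


def disputed_rate_by(trades, key):
    """Share of disputed trades within each group defined by key(trade)."""
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for t in trades:
        group = key(t)
        totals[group] += 1
        flagged[group] += t.disputed  # a bool adds as 0 or 1
    return {g: flagged[g] / totals[g] for g in sorted(totals)}


def intensity_band(t):
    # Hypothetical cutoff: four or more crosses per minute counts as "hectic".
    return "hectic" if t.crosses_per_minute >= 4 else "calm"


def complexity_band(t):
    # Hypothetical cutoff: more than two counterparties means volumes had to be assembled.
    return "multi_party" if t.counterparties > 2 else "one_to_one"


if __name__ == "__main__":
    # Made-up trades for illustration only.
    trades = [
        Trade(5, 2, True), Trade(4, 3, True), Trade(6, 2, False),
        Trade(1, 2, False), Trade(2, 2, False), Trade(1, 4, True),
    ]
    print(disputed_rate_by(trades, complexity_band))
    # {'multi_party': 1.0, 'one_to_one': 0.25}
```

Passing a different `key` function lets the same grouping code test each candidate explanation without rewriting it.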

What were the end results?

The ultimate results had to be easily communicable. Jury members are typically not experts in statistical research. It was particularly important not to present complex statistical models that made tremendous sense to us but were like a foreign language to the jury. The big challenge here was to develop statistically significant conclusions and make them understandable to an audience of non-statisticians.

Drawing on the patterns we observed in the data, we found several simple, intuitive facts that demonstrated the possibility of human error. We concluded that the erroneous transactions occurred at times when humans were particularly susceptible to making mistakes. In addition, the way the specialists executed these disputed trades in some sense left a significant amount of money on the table. That was inconsistent with an attempt to profit deliberately from the trading flow.
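The interview does not name the statistical tests that were used. One standard option is a pooled two-proportion z-test. It checks whether the disputed-trade rate in hectic periods differs from the rate in calm periods by more than chance would explain. The sketch below uses only the Python standard library. The counts it takes would come from grouping trades by trading intensity, and the numbers in the usage line are placeholders, not results from the case.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-proportion z-test.

    x1 of n1 trades are disputed in group 1, and x2 of n2 in group 2.
    Returns (z, two-sided p-value).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("pooled rate is 0 or 1; the test is undefined")
    z = (p1 - p2) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)), which equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Placeholder counts: 30 of 100 hectic-period trades vs 10 of 100 calm-period trades.
    z, p = two_proportion_z_test(30, 100, 10, 100)
    print(round(z, 2), round(p, 4))  # 3.54 0.0004
```

A single z-score and p-value per comparison is also easier to present to a non-specialist audience than a full regression model.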

How is this case relevant to the growth strategy of our client firms?

Although this case arose in a litigation setting and concerned financial market transactions, it drew on many of the skills needed to analyze a business's operations from large, complex electronic datasets. Firms face very similar challenges when they look at the Big Data generated by their business flows and ask, “What do the patterns in this data tell us about how our business operates? What are the attributes of the customers we attract, or more importantly, fail to attract? What factors drive these customers' purchasing choices on our website or in our physical stores? What can we learn from their recent consumption choices that can better predict the types of products they are likely to choose in the future? What are the drivers of efficiencies or bottlenecks in our supply chain?”

In each instance, the challenge is to organize and evaluate very large bodies of data, recognize the patterns lying within this data, and differentiate insightful patterns from noise. By leveraging the data, we essentially recreate the business environment that was in place at the time customer decisions or business operations occurred in order to understand what happened and anticipate future behavior. This is the very definition of customer analytics.