The Problem of Measuring Average Neighborhood Housing Prices

Even today, it's normal to see double-digit housing price growth in New York neighborhoods. Recently, though, I started thinking about the effect new developments have on those numbers, and what that means for homeowners.

Without taking development into account, housing price figures get skewed. When newer buildings are added, the average price of a house is no longer a reliable indicator of how an individual property can be expected to perform.

To see how misleading this sort of measure can be, consider a neighborhood made up of identical units worth $300,000 each. Over some period of time, those units appreciate to $310,000. During that same period, the housing stock grows by 25%, and these new units are of higher quality, worth $600,000 each.

The average price in this neighborhood is now $368,000, and one would be tempted to say housing prices have increased by 23%. While technically true, this has absolutely no bearing on what will happen to your individual house. Based on the evidence, all one can say about the neighborhood is that existing house prices increased by about 3%.
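
To make the arithmetic concrete, here's the example as a short Python sketch. The unit counts (100 existing, 25 new) are illustrative assumptions; only the ratios matter.

```python
# Worked version of the example above; unit counts are assumed for illustration.
old_units, old_price, new_price = 100, 300_000, 310_000
added_units, added_price = 25, 600_000  # 25% more stock, higher-quality units

avg_before = old_price
avg_after = (old_units * new_price + added_units * added_price) / (old_units + added_units)

print(f"average after development: ${avg_after:,.0f}")                    # $368,000
print(f"apparent 'growth' in average: {avg_after / avg_before - 1:.0%}")  # 23%
print(f"growth for an existing unit: {new_price / old_price - 1:.0%}")    # 3%
```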

This sort of mismeasurement has several implications. Expensive new development creates a price differential between new and old units, and with it a direct mechanical link between new development and a rising average price, one that may or may not be counteracted by the added supply further satisfying demand.

More importantly, this does something to the psychology of real estate. In an area with rising housing values and at least some development, the average price of a neighborhood will ALWAYS overstate what's actually happening to each individual house. But you can be sure that developers will point to this number when trying to sell $600,000 units.

Overall, this shows how important it is to really think about what measurements mean. A number that seems clear may actually be systematically misrepresenting what you truly want to know, and figuring out why can uncover unseen mechanisms in a market.

A Few Thoughts on Satellite Sourced Economic Data

Ever since Richard Florida cited this data set in Who's Your City?, I knew it was a novel way to look at economic data, but also one with problems that would need to be sorted out. Florida just wrote a feature about this data set for CityLab, and it highlights a few easy improvements that could be made to this data. With a little training, it could predict economic activity very accurately.

The first paragraph highlights one of the problems one would run into:

Looking at aerial images of nighttime lights can tell us a surprising amount about human activity on the ground. Satellite images of the world at night have been used to see how North Korea's political isolation has left its residents in the dark, and how Texas's booming oil industry has spread across the landscape. As Yale University economist William Nordhaus has noted, roughly 3,000 studies have used nighttime lights as a proxy for various economic activities just since 2000. 
In a picture of Texas linked from another article, the difference between these sorts of economic activity becomes apparent:



The potential problem this can cause is explained further into the piece:

Their research found that the satellite data correlated more strongly with population density than with economic measures, finding close statistical associations between luminosity levels and population levels, population density, the number of establishments, and the number of employees. But the satellite data were considerably less accurate in estimating for a key measure of the level of economic activity—wages. Based on a geographically weighted regression analysis, they found that nighttime light levels overestimated wages for the largest cities like Stockholm, Gothenburg, and Malmo, where together more than 40 percent of Sweden’s population resides. In contrast, satellite images generally underestimated wages for smaller towns and rural areas.

My first impression of this statement: what would lead someone to think wages are correlated with light levels within a country? I can certainly think of reasons why wages would correlate with light levels when comparing countries, and the comparison would definitely be useful between urban and rural areas. When comparing urban areas within a country, however, I'm not surprised smaller cities are underestimated. I do believe that, within the same country, larger cities have higher wages than smaller cities; I simply don't think wages are proportional to the amount of light a city emits. To put it another way, adding one more household to a small city will have a greater impact on light output than adding one more household to a large city, since larger cities are more efficient.

This may seem unrelated to the energy economy in Texas, but that picture lets me highlight how to fix this problem. The new energy economy in the United States is perhaps one of the biggest examples of economic activity taking a unique spatial form in a very short time. In the span of a few years, the prospect of fossil fuel independence became very real for North America.

The pinprick pattern of energy development would be very easy to pick apart from more general urban economic activity. Combined with existing well production data (full disclosure: I work for a company that could provide this data!), a computer could look at these satellite images and automatically pick out energy regions.
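
As a rough illustration of what that automation could look like, here's a minimal sketch assuming the nighttime lights come as a 2D luminosity array: label connected bright regions and flag small, isolated blobs as candidate energy sites. The brightness threshold and size cutoff are invented values, not calibrated ones.

```python
# Sketch: separate pinprick-style bright spots from contiguous urban glow.
import numpy as np
from scipy import ndimage

def find_flare_candidates(lights, brightness_thresh=50, max_blob_pixels=20):
    """Return centroids of small, isolated bright blobs in a 2D luminosity array."""
    bright = lights > brightness_thresh              # mask of lit pixels
    labels, n = ndimage.label(bright)                # connected bright regions
    sizes = ndimage.sum(bright, labels, range(1, n + 1))
    small = [i + 1 for i, s in enumerate(sizes) if s <= max_blob_pixels]
    return ndimage.center_of_mass(bright, labels, small)

# Synthetic demo: one large "city" blob plus scattered flare-like pinpricks.
rng = np.random.default_rng(0)
img = np.zeros((200, 200))
img[80:120, 80:120] = 60                             # contiguous urban core
for y, x in rng.integers(0, 200, size=(15, 2)):
    img[y, x] = 60                                   # isolated bright points
print(len(find_flare_candidates(img)))               # counts pinpricks, not the city
```

Matched against known well locations, blobs like these could be tagged as energy activity automatically.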

This is a very clear example, but the same approach could be used to improve the model for wages. Once again, there's a fairly clear pattern: larger cities are being overestimated and smaller cities underestimated. One could train a model to pick out contiguous metro areas, measure their size, and properly control for the efficiency effect I hypothesized above.
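
A minimal sketch of that control, assuming a hypothetical dataset with one row per metro area (the file and column names are placeholders I invented): regress log wages on log light output plus log population, and let the size term absorb the efficiency effect.

```python
# Sketch: control for city size when predicting wages from light output.
import numpy as np
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("city_lights.csv")  # hypothetical: total_light, population, wages
X = sm.add_constant(np.log(df[["total_light", "population"]]))
model = sm.OLS(np.log(df["wages"]), X).fit()
print(model.params)  # if the efficiency story holds, the population coefficient
                     # soaks up the size-related bias left by raw light levels
```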

In this way, satellite data can be fine-tuned to the economic concepts we actually want to measure. Premise has done amazing work analyzing pictures to track the price and quality of goods across the developing world. With a similar sort of evolving model, other macro concepts could be predicted as well.

One more thing satellite data could be used for is determining whether growth is "healthy." In Cities and the Wealth of Nations, Jane Jacobs explains five ways city regions can grow. When these forces act in concert, they can transform formerly inert land into economically productive city economies. When they act in an unbalanced way, they lead to economic imbalances that may seem healthy in the short term, but in the long run are damaging and can only lead to ruin.

In the context of satellite data, this sort of balanced city growth should have a very noticeable pattern. There's a reason slime molds can accurately reproduce train networks: growth done in a healthy, balanced way should follow an organic pattern.

Compared to organic-looking cities, the growth in the US energy sector seems very haphazard at first glance. Using Jacobs's categorization, these regions look like supply regions, which, instead of replacing their imports, become addicted to the outside city economic activity that led to their founding. Some of the effects of this are positive: Texas and North Dakota have some of the healthiest job markets in the country. But it won't lead to long-lasting economic development, and once this activity dries up, the regions will revert to being inert, and we'll have a lot of upset, unemployed oil workers.

The Open Source Economics That Caused Heartbleed and How to Prevent It From Happening Again

The Heartbleed bug has seemingly shone a light on the dark side of our open source architecture. The benefits of open source have long been known: it's free, and you have a large community of users that will continuously improve the software. It's a naturally occurring collaborative arrangement that seems to beat the corporate options out there.

The downsides have previously been less publicized, but have always been there. Many large corporations have moved to open source, but there is a reason why some stick to commercial solutions. I work for a company that creates commercial data analytics software, and while I am a huge proponent of open source alternatives, I see why clients turn to us rather than options like R, Python, or Gretl. If you're using an open source option, answers are almost always a Google or Stack Overflow search away. If you're using a commercial option like my company's, you'll be able to get an expert on the phone who can show you what to do (that expert would probably be me). At its core, the main service my company provides over open source options is that we assume responsibility for the software we produce.

This dynamic is why it took two years to notice Heartbleed. OpenSSL has always had an active development community around it, one that in many ways is more dynamic than any commercial community can be, but no one can ultimately be held responsible if things go wrong. Security is something users won't notice unless it fails. Open source dynamics get many things right, but this is not one of them.

In my industry there are also the beginnings of a solution to this problem. Companies such as Revolution Analytics and Continuum Analytics have emerged as the commercial faces of open source R and Python, respectively. The underlying architecture is free, but companies like these add consulting services or custom add-ons on top of the open source software.

The dream of open source was that users would actively maintain the environment. While this has come to fruition for user-facing functionality, there are holes, and ultimately no responsibility. This evolution in open source economics allows us to have it both ways. We get large open source communities, but also paid options for those who need them. The providers of paid options can take responsibility for the software and care about it the way commercial providers do. Large open source userbases provide the externality of a well-maintained infrastructure that these consulting companies can take advantage of. And consulting companies, worried that paying clients won't trust software with security and other non-user-facing bugs that volunteer communities would never notice, will work to fix those bugs, providing an externality back to the free user community.

Much has already been written about how we need to pay people to solve security issues. Grants might be feasible in the short term, but the industry arrangement I describe above came about fairly naturally in data analytics. I don't see why something like this can't be encouraged elsewhere.

Uber Wants to Replace Your Local Department of Transportation


If you were unsure about Uber's long-term goals, their April Fool's joke makes things crystal clear. Along with surge pricing, Uber is testing long-held consumer assumptions about transportation.

It's rarely acknowledged as such, but choices are truly limited when it comes to transportation. You have your selection among available cars, but car ownership itself is a bundled package: you're paying a hefty premium to be able to use a car whenever you want. On the other side of the spectrum, you have the option of a completely fixed public transit system. For a very low price, you can travel between fixed points on a preset schedule. Compared to your other options you're getting a bargain, but you are constrained by the lines and timetables on your transit map. Cabs have always been in the mix as well, but sit closer to cars on this spectrum: you're paying a premium to choose your pickup and dropoff locations on your own schedule.

I previously thought it would take wide-scale adoption of driverless cars to bring about this sort of change. Uber is trying to break long-standing consumer expectations and make people truly think about what they want from transportation. Looking at private cars and public transit as two ends of a spectrum, Uber is trying to give consumers a choice of any point between them. You can pay through the nose for a cab to pick you up right outside the bar now. Paying $2.50 is great, but you may not want to wait 45 minutes for a subway. I'm sure there are lots of people who would pay something in the middle to walk a block and wait 8 minutes for the next shared cab, especially when you can see exactly where it is on your phone.

This system is not even novel. In New York, several major outer borough streets leading to express subway stations have informal rush-hour cab share systems. Over time, cabbies have begun picking up people at bus stops, usually charging around $2 per person. Someone waiting for the bus in the morning can choose to pay $2 more than usual to get a faster ride now rather than a slower trip 8 minutes from now to the subway station.

When Uber builds up its fleet of cars and institutes this sort of pricing en masse, this April Fool's joke will look hilarious in hindsight.

Surge Pricing, Markets, and Expectation in the Taxi Industry

Recently, Uber's practice of "surge pricing" during storms and other periods of increased demand has come under fire. On one level, this is a fairly standard story of a more efficient market beginning to form in a highly regulated and currently distorted one.

But I have to admit, seeing receipts like these just leads to a visceral reaction you can't control if you're a New Yorker. Cab rides just SHOULDN'T cost that much!



This uncovers a few more layers of what's going on. There aren't just entrenched interests that want to keep the medallion system in place, but deeply entrenched expectations among consumers.

First, consumers have an aversion to what's perceived as unfair price gouging. Gothamist asks the question this way:

But when does surge pricing become price gouging?
Basically, there is no discernible difference.
"The best way to look at it is during Hurricane Sandy," says one financial expert. "You had these long lines because there was a limited amount of gas. But if you had surge pricing, gas could have been raised to $10 a gallon and there would have been less people on the line."

Raising gas prices during Hurricane Sandy has to be the most egregious example of gouging. But it's still true that if gas prices had been raised to $10 a gallon, there would have been no lines. Those who absolutely needed gas would have been able to get it without issue, and those who could do without would have waited for prices to come down and found other ways to get around.

This raises the question of what a fair way to ration things is during situations like this. Is it ethically more desirable to give gas only to those willing to put in the time to wait in line?

Sandy was a disaster, though; surge pricing happens more regularly. If Uber continues its growth and eventual domination of car services, it won't take long for things to find a new equilibrium.

Let's say you're a potential UberX driver, and you see how this surge pricing happens. As a driver who has control over your own schedule, you begin taking note of when these surge prices occur, and you act accordingly. Surge pricing will train drivers to allocate themselves where they're needed, and eventually the surges will be minimal.
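
Here's a toy simulation of that learning process, with every number invented for illustration: drivers shift their hours toward whichever period surged last week, and the gap between surge multipliers shrinks.

```python
# Toy model: drivers reallocate toward last week's surge; the differential decays.
demand = [100, 300]        # riders in a normal hour vs. a "storm" hour
drivers = [100.0, 100.0]   # drivers initially spread evenly

for week in range(12):
    surge = [d / s for d, s in zip(demand, drivers)]  # multiplier ~ demand/supply
    print(f"week {week}: multipliers {surge[0]:.2f} vs {surge[1]:.2f}")
    shift = 10 * (surge[1] - surge[0])   # drivers drift toward the hotter period
    drivers = [drivers[0] - shift, drivers[1] + shift]
```

Since total supply is fixed in this toy version, the two multipliers converge toward each other rather than to 1; the point is simply that the spike gets arbitraged away as drivers learn.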

There are still ethical questions that need to be answered. This won't solve the problem of ensuring universal access to cabs, for example, the way the introduction of metered green outer borough cabs has. Even in a perfect Uber world, cabs will still be incentivized to cluster in downtowns.

On any side of a transaction, though, it's always healthy to let go of a sense of entitlement. Medallion owners have no legitimate reason to keep their system in place, and consumers have no intrinsic right to a cheap cab ride regardless of the weather. An Uber-like system will bring these unsustainable expectations to light, and we'll end up with an overall better system.

Premise and The Hurdles Big Data Will Face With Economics

Premise is a fascinating new company that aims to disrupt how macroeconomic data is collected. Individual companies are using every conceivable aspect of consumer data to predict their own sales. Why can't this same insight be used to measure the economy and revolutionize economic indicators?

There is a reason Big Data started out in the collection and analysis of data within individual companies. A huge component of the value of information is access, and within a company that variable is taken out of the equation: all the data is proprietary, and everyone in the company is working toward a common goal.

The value of information about the economy depends on the information everyone else has, and this value expresses itself in a few unique ways.

The first and relatively simple way this expresses itself is the value of exclusive information. It's clear to anyone working on Wall Street how valuable an information edge is. This drives researchers to search for ever more esoteric datasets, trying to tease out a relationship no one has thought of. As a relationship becomes more widely known, its value quickly drops to zero, since stock prices, or whatever market is being watched, already have the information baked in.

There is a converse dynamic to this value of exclusivity that comes into play once data becomes widely known. People use information to make decisions. Once information is "out there," its value increases the more people use it, because multiple actors are all making decisions that affect each other.

Think of a specific industry as an example. If you're an automotive company and have information that no other automotive company has access to, you can use that as an information edge. This edge pretty much disappears the minute one other company learns of it.

At this point the second dynamic takes precedence. If 10% of an industry is using a certain piece of information, there's not much incentive to use it. On the other hand, if 95% of an industry is using it, the 5% who aren't are at a distinct disadvantage, and if you're a new entrant or analyzing the industry from the outside, you're going to want this information too.

Economic data responds to these dynamics in some weird ways. Headline indicators are venerated as describing the pulse of the economy, and they maintain their momentum through their own heavy use. GDP is a deeply flawed measure of the economy. There are alternatives out there, and more data arrives every day that would let us create better measures, but GDP remains the agreed-upon standard through sheer inertia.

A company like Premise can take advantage of these dynamics in two ways. The first is to simply create indicators that compete with the standard indices we now have available. If Premise creates a few "rockstar" economic indicators, it will have a stable business as long as there exists an economy to be measured.

This isn't truly unleashing the power of big data on the economy. In my opinion it would be an upsetting outcome: using all of that power simply to create a new static status quo in how we measure the economy.

I see true success coming via a second path: the creation of a platform where anyone can create their own indicators. Instead of a headline CPI number built with particular statistical methods by the Bureau of Labor Statistics, those methods would be available to anyone using the platform.

With this platform in place, the dynamics that currently let certain data become stodgy and out of touch will start driving innovation instead. All the platform has to do is link to already existing economic data.

How would this process play out? A headline indicator like CPI has a gravitational pull that makes it a benchmark of the economy. But if a user can make a few tweaks to that CPI data using its atomized components, they gain the benefits of BOTH dynamics: their measurement is close enough to CPI to benefit from its wide use, but different enough to create an informational edge. Instead of crystallizing into a series of discrete indicators that never change despite a changing economy, the dynamics will now incentivize the discovery of new and unique indicators.
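
A minimal sketch of what that atomization could enable, with made-up component series and weights: rebuild a headline-style index from its components, then apply a user's tweaked weights to produce a custom variant.

```python
# Sketch: a headline-style index rebuilt from atomized components,
# plus a user-tweaked variant. All series and weights are invented.
import pandas as pd

components = pd.DataFrame({          # hypothetical monthly component indices
    "housing":   [100, 101.0, 102.1],
    "food":      [100, 100.5, 101.2],
    "transport": [100, 102.0, 101.5],
})

official_weights = {"housing": 0.40, "food": 0.35, "transport": 0.25}
custom_weights   = {"housing": 0.50, "food": 0.30, "transport": 0.20}  # the tweak

def weighted_index(df, weights):
    """Collapse component indices into one headline-style series."""
    return sum(df[name] * w for name, w in weights.items())

print(weighted_index(components, official_weights))  # close to the benchmark
print(weighted_index(components, custom_weights))    # different enough for an edge
```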

Our understanding of the economy as a whole will be revolutionized, and we'll all be making smarter decisions. People will be encouraged to develop a deeper understanding of the economy, instead of relying on measures that are used simply because everyone else uses them.

Apple and the End of Technological Progress

Hyperbolic title, but as I'm just starting to read Average is Over by Tyler Cowen, I couldn't help but think about what sort of limits there are to some of the technologies Cowen predicts in his new book.

All of the smart technologies we can hopefully look forward to depend on interconnectivity, and any technology that depends on this sort of interconnectivity faces the same monopolistic temptations as any other ambitious business: every company in a competitive market, at least on some level, has the goal of destroying its competition.

If you have a traditional monopoly, you have traditional consequences. A utility company with a natural monopoly doesn't face the same price pressures a competitive market imposes, and won't innovate as fast as it would in one. For these traditional monopolies, we also have traditional solutions: governments come in to regulate prices. In terms of innovation, I'd argue that natural monopolies also tend to be industries in which innovation has a lower value. Think about how much improvement one can reasonably expect in a city's water infrastructure (a contrary example to this might be the fact that we still don't have true high speed rail in the United States).

Things are different in the tech sector. The temptation to create "walled gardens" and closed app ecosystems could be compared to the natural utility monopolies. Prices mean something completely different in the tech sector, and discussing that would probably warrant its own post, but there is a very clear link between the creation of these walled gardens and innovation.

Consider Apple. Apple makes several products that it sells directly to consumers, and it maintains an ecosystem around those products. It doesn't charge directly for the ecosystem, but the ecosystem adds value to Apple's products, so Apple is incentivized to make the ecosystem as flawless as possible in order to make its products more valuable. This is a positive externality: assuming the prices of the products remain the same, consumers get the benefits of the ecosystem for free. Of course, Apple recaptures some of this value by selling its products at a premium.

Part of this value is "stickiness." Consumers begin to depend on the ecosystem. The longer a consumer remains in the ecosystem, and the more consumers are in it, the harder it is for any one of them to leave. But as an ecosystem becomes more successful, the company faces fewer incentives to improve it. This is a classic negative feedback loop, and it acts as a drag on innovation: the ecosystem sows the seeds of its own destruction.

Google addresses this problem as a company. It realizes that it is in its own best interest to keep innovating: while a walled ecosystem would bring monopoly power, Google cannot afford to stultify its own evolution. This is why Android is open, and why Google lets you (relatively) easily get your data out of its services. It's far from perfect, but at least it's a recognition that the issue exists, rather than the way a company like Apple blatantly ignores it.

There are unfortunately more Apples out there than Googles. At this point in our economic development, the benefits Google foresees are too long-term for the vast majority of companies. Even Google only partly embraces this concept rather than fully living it (cough, Google+). As far as I'm aware, there is no way to properly align incentives within our economy to avoid this drag on innovation.