Where is the Amazon Marketplace?

Uncategorized — kpw @ July 12, 2012

After listening to the incredible On Point show on the “Amazon Economy” and discovering Barney Jopson’s series on the same topic, I decided to dig up this research project I put together on Amazon Marketplace.

This project was inspired by a series of confounding Amazon Prime purchases — purchases that clearly cost Amazon far more money to fulfill than I paid for the goods (in one case I estimated that Amazon spent $100+ just to ship a $30 purchase).

At the prompting of friends curating a gallery show in NYC on “work” I turned this curiosity into a small experiment: buying several of the same item in hopes of mapping out some of the internal dynamics of the Amazon Marketplace supply chain.

Not sure how successful I was but here’s the text I drafted for that show:


How does Amazon Marketplace compare with physical marketplaces?

Where is the Amazon Marketplace?

How do sellers participate and how are they paid?

Who makes money and how much?


Buy low-cost commodity item (HTC Micro-USB Charger U250), from multiple sources. Compare shipping times, postage and seller fees to understand the dynamics of Amazon Marketplace.

Price similar items from local retail outlets.


January 2nd: Purchased three new USB chargers from five vendors. One vendor sold a complete charger, others sold components (plug/adaptor and cable) separately. Vendor price included shipping.

Average price paid for complete charger including shipping and tax: $5.88. Total time spent selecting items and submitting order: 30 minutes.

January 4th: First shipment arrives via UPS. Amazon-fulfilled order from 11th Street Wireless. Cardboard box containing a used HTC plug. Scratches are evident and one plug prong is bent; it will require minor repair before use. Checked order records for any notation that the item would be in used condition; could find no such indication.

Researched Amazon Fulfillment service. Discovered that the seller, 11th Street Wireless, shipped its inventory to an Amazon warehouse (Lexington, KY). Although the item was “sold by 11th Street Wireless,” Amazon was responsible for storing, packing and shipping the item.

In return for this service the seller paid $2.40 in fees to Amazon ($1.00 per order fee + $0.75 pick & pack fee + $0.60 referral fee + estimated $0.05 per month storage fee). Amazon covered the shipping costs as part of their Amazon Prime program. Minimum estimated shipping charges for Amazon: $2.50 (UPS charges $10 retail for an equivalent shipment, estimating Amazon’s rates are 25% of retail).

Estimated that 11th Street Wireless made a $2.59 profit on a $4.99 sale. Amazon lost, at minimum, $0.10 on the sale; however, the free shipping was available only as a result of my paying a $79 annual Amazon Prime membership fee.

January 6th: Received two envelopes via USPS: one contains a cable, the other a complete cable and charger set.

Cable received from Seller1on1 (Brooklyn, NY), used condition. Checked order records, no indication item would be used. Seller paid first-class postage ($0.44) plus an estimated $0.20 in fees to Amazon for listing the item. Seller made $1.86 profit on a $2.50 item.

Complete charger/cable set from item44less (Sunrise, FL), new condition in original packaging. Seller paid first-class postage ($0.64) plus an estimated $0.36 in fees to Amazon for listing the item. Seller made a $3.51 profit on a $4.51 item.
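The per-order arithmetic above can be sketched in a few lines of Python. All fee figures are the rough estimates from my notes, not published Amazon rates:

```python
# Estimated seller economics for the three orders described above.
# Every dollar figure is an estimate from the text, not an official rate.

def profit(sale_price, costs):
    """Seller profit after postage and estimated Amazon fees."""
    return round(sale_price - sum(costs), 2)

# 11th Street Wireless: Amazon-fulfilled $4.99 plug.
fba_fees = [1.00, 0.75, 0.60, 0.05]    # per-order, pick & pack, referral, storage
print(profit(4.99, fba_fees))          # → 2.59

# Amazon's side of the same order: fees collected minus estimated shipping cost.
print(round(sum(fba_fees) - 2.50, 2))  # → -0.1

# Seller1on1: self-shipped $2.50 cable (postage + estimated listing fees).
print(profit(2.50, [0.44, 0.20]))      # → 1.86

# item44less: self-shipped $4.51 charger/cable set.
print(profit(4.51, [0.64, 0.36]))      # → 3.51
```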

January 8th: Still waiting on delivery of the remaining plug (shipped from California, estimated delivery time: 5 business days) and cable (shipped from Hong Kong, estimated delivery time: 3–4 weeks).

Priced equivalent items at local stores:

RadioShack does not carry the HTC charger. Found equivalent PointMobl™ Micro USB AC Charger for $19.79 + tax.

T-Mobile store does not carry the HTC charger. Lowest-price solution: T-Mobile 2-in-1 Wall/Car charger for $39.95 + tax. Equivalent charger: $39.95 for wall plug + $19.95 for USB cable + tax.

BestBuy does not carry the HTC charger. Found equivalent Rocketfish™ Wall Charger for $21.99 + tax.

Re-priced item on Amazon. Original lowest price of $4.51 from item44less is now $4.75 from same seller.

Addendum: Item from China arrived January 16th shipped from Shenzhen. Postage appeared to be less than $1. Tried and failed to understand Chinese postage rates. Curious how government postage rates are set: is there a subsidy for international shipments?

Why ESRI (as is) can’t be part of the open government movement

open access — kpw @ February 3, 2012

This post has been floating around in email conversations with colleagues for the past month; however, last week a semi-related Twitter discussion prompted me to finally document my thoughts in public. For those interested in that conversation you should also read Andrew Hoppin’s excellent post on the CivicCommons blog. I’m a big fan of what CivicCommons is doing but think the situation with ESRI demonstrates some of the challenges in evaluating civic software options and opportunities for further refinement of the CC Marketplace.


I recently made a frustrating–and failed–attempt to access a government data set encoded in ESRI’s proprietary and closed File Geodatabase format.

This led to an email exchange with Jack Dangermond at ESRI asking about the difference between ESRI’s stated desire to help open up government data and my experience where ESRI’s commitment to closed formats prevented me from accessing data–at least without first buying a copy of ArcGIS.

Jack wrote a lengthy and thoughtful response that can be summarized as follows:


1) ESRI’s closed formats shouldn’t prevent me from accessing the data–there are many options for extracting what I need without buying their software. While true in concept, given the particularities of the data I needed this turned out not to be the case. There was absolutely no way for me to read the data I needed without buying ESRI software.

Sure, the government could intervene and reformat their data to better serve me; however, they were unable or unwilling. And from their perspective, they had already met the spirit of the Open Government Directive by releasing a bulk download of what I needed. Unfortunately for me, that download was in ArcGIS’s default format, a format that can only be opened with ESRI tools.


2) ESRI isn’t interested in repeating the experience with Shapefile where any developer could read or write to the files. In Jack’s words, this created “lots of work for users ‘fixing’ these corrupt files because some software developers did not implement the specification correctly.”

That is pretty different from my experience where the open Shapefile spec ushered in an era of collaboration and led to a proliferation of new tools. Never once in my fifteen years in the industry have I encountered a corrupted file caused by a bad tool.

Jack frames this as a technical consideration; however, anyone familiar with geospatial software should also notice a strong business case for this perspective: ESRI doesn’t benefit from others participating in the software marketplace or building these “bad tools.”

Closed formats result in closed data. Despite the availability of interchange formats, no one is using them and it’s not clear they are a complete solution. In fact, I couldn’t find any open source tools that read or write ESRI’s XML-based interchange format, nor could I find anything on the web that had been released in that format. ESRI may think they’ve checked the data portability box but no one else seems to agree. The same is true for proprietary drivers for file access: ESRI showed up several years late with a binary driver solution that is still incomplete (it couldn’t open my data).

There’s simply no technical excuse for closed formats. If ESRI doesn’t want to maintain its own format spec it should use someone else’s: SQLite or many other formats like it could offer a great foundation for an open file-based spatial data store.
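As a toy illustration of that point — this is an invented table layout for the example, not a real specification and not ESRI’s format — an ordinary SQLite file can serve as a single-file spatial store that any language with a SQLite client can read, with geometry carried as Well-Known Text and a bounding box for crude spatial filtering:

```python
import sqlite3

# Toy sketch: an invented single-file "spatial" store built on SQLite.
# Geometry is stored as WKT text plus a bounding box, so simple spatial
# filters work with ordinary SQL -- no proprietary reader required.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE features (
        id INTEGER PRIMARY KEY,
        name TEXT,
        geom_wkt TEXT,                               -- geometry as Well-Known Text
        minx REAL, miny REAL, maxx REAL, maxy REAL   -- feature bounding box
    )""")
conn.execute(
    "INSERT INTO features (name, geom_wkt, minx, miny, maxx, maxy) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    ("city hall", "POINT(-73.99 40.71)", -73.99, 40.71, -73.99, 40.71))

# A crude bounding-box intersection query, runnable from any SQLite client.
rows = conn.execute(
    "SELECT name, geom_wkt FROM features "
    "WHERE maxx >= ? AND minx <= ? AND maxy >= ? AND miny <= ?",
    (-74.1, -73.9, 40.6, 40.8)).fetchall()
print(rows)   # → [('city hall', 'POINT(-73.99 40.71)')]
```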


3) ESRI is a private company and has every right to create proprietary formats.


But I have a problem when ESRI also claims to be an enabler of open government, facilitating data sharing and collaboration. I have an even bigger problem when it takes taxpayer money to manage public data, and then requires taxpayers to buy ESRI software in order to access that data.

One of the fundamental tenets of the OGD is providing data in bulk, platform-independent formats. Instead of supporting this outcome, Jack explained his preferred approach: APIs exposed via ESRI software that allow users to access what they need from data stored within closed ESRI file formats.

This is a great strategy if you’re in the business of selling GIS software, however, this is a bad idea if you’re interested in sharing data. APIs aren’t the same as bulk access (bulk access is critical for many GIS applications). And APIs with EULAs != open data.


ESRI has chosen to vertically integrate their GIS platform and has no interest in allowing others in. This is a fundamental incompatibility with the current conception of open government.  They have every right to follow this course but there’s no reason why those of us that have invested years specifying the terms for dissemination of public data should celebrate their participation until they follow the guidelines we’ve set.

It doesn’t have to be this way. Their software is incredibly useful and in many cases has no counterpart, open or closed. ESRI could become a valuable partner in opening government data–if they embrace open formats and let others in.


For those actually interested in the gory details, here’s the TL;DR’er version of my recent encounter:


Stealing Ideas

open access — kpw @ July 19, 2011

Reading about Aaron Swartz’s most recent run-in with the law dredged up all kinds of feelings. I’m a long-time admirer of his work and was obviously saddened to hear of his troubles. At the same time, reading the indictment I was surprised by the seriousness of the charges and evidence against him.

I was also reminded of my own attempts at similar work, collecting and analyzing journal articles, patents, and various forms of metadata. I’ve lost count of how many hours I’ve spent sitting in basements of academic buildings, breaking federal laws in the pursuit of answers. And I was reminded of my colleagues who still spend their days painstakingly scraping data off the web–sometimes legally, sometimes not–in the name of academic inquiry.

None of us want to break the law. It’s simply that we don’t have a choice.

The mechanisms for sharing academic discourse are broken. They barely even function as systems for connecting interested parties within existing disciplines. Ask just about anyone who spends their time writing or consuming scholarly work and you will hear a litany of complaints about how poorly suited the academic publishing industry is to modern day collaboration.

I’ve spent most of my professional career just outside of the academy but have seen the failures of these systems first hand. I formed my opinion on the matter as an undergraduate assistant in a major neuroscience laboratory–building publishing tools to help the lab’s director break copyright law.

His work regularly appeared in and on the cover of major journals. Yet he was in a field that was moving faster than the journals could help facilitate. He took matters into his own hands by publishing the articles on the laboratory’s site, almost always violating the licensing terms of his own work (rights now held by Elsevier or AAAS, not the author). I asked about the legality of what we were doing and was told not to worry. If the journals didn’t like him bending or breaking the law he’d publish elsewhere and it would be their loss.

As far as I know the publishers understood the bargain and never complained. Unfortunately this sort of non-aggression pact is available only to a select few. Your average untenured neuroscience professor doesn’t have the luxury of pissing off Science or Nature.

But for those of us interested in meta-analysis–these questions about questions that people like Aaron and myself are forced to pursue from basement wiring cabinets, scraping large swaths of text from the web–the hobbled and clunky tools for downloading PDFs through research library proxy servers, one poorly OCR’ed page at a time, simply do not work.

If you want to understand the collaborative nature of a specific field or follow the trajectory of an idea across disciplines a reference librarian can’t help you. Instead, you have to become a felon.

What’s missing from the news articles about Aaron’s arrest is a realization that the methods of collection and analysis he’s used are exactly what makes companies like Google valuable to its shareholders and its users. The difference is that Google can throw the weight of its name behind its scrapers, just as my former boss used his name to set the terms with those publishing his work.

Aaron and the other “hackers and thieves” like him don’t have that option. But their work is no less important–they are collecting and organizing information in order to ask deep questions about the nature of academic discourse. Unfortunately for most, the structure of the publishing industry and the laws that surround creative works prevent these questions from being asked, at least without taking sometimes substantial risks.

It shouldn’t and doesn’t have to be this way but there are at least two main issues holding back progress:

First, as a society we’ve forgotten the Jeffersonian ideal that intellectual property laws should enable and encourage the spread of ideas and creative pursuits rather than lock them away.  Many have fought for a return to this vision, however, the prospects for such change seem dim. If there’s anywhere this idea should still have a fighting chance, it’s within the walls of universities.

However, it is this most basic failure, our inability to create a rational set of intellectual property laws, that necessitates the creation of things like JSTOR. We shouldn’t need it in the first place. Nor should anyone curious enough to ask questions as big as Aaron’s ever need to break JSTOR or the law to find answers.

We should offer people with big questions more than a trip to jail–we should celebrate their willingness to explore our collective intellectual heritage. Universities should take the lead in building the platforms needed to support such inquiry. It is an embarrassment that JSTOR is the best the academy has to offer.

But this leads to the second and perhaps more fundamental problem: journals are only partly about communicating. They’re also about controlling academic discourse. The editorial power held by journals and those that run them (quite different from those that own them) shapes most academic careers and the very structure of disciplines. It’s almost certain that pursuing new forms of collaboration and communication will reshape these power structures–sometimes subtly, sometimes not. That’s the nature of change.

Change, however, doesn’t come easily within academic communities. It should be no surprise that universities have done far more to free the content of their courses than they have the content of their publications. The former has economic value, however, the latter holds the keys to the academy itself.

This conservatism is at least in part responsible for why, despite the new possibilities offered by the web, most scholarly work is still published as though it were 1580. It’s also responsible for allowing a handful of powerful corporations to gate access to this knowledge and make authors pay for the privilege of signing away rights to their own work.

Sir Tim Berners-Lee invented the web to solve this very problem. Twenty years later it allows us to do almost everything imaginable–except get unfettered access to scholarly communication.

It is not technology that holds us back.

Aaron’s arrest should be a wake-up call to universities–evidence of how fundamentally broken this core piece of their architecture remains despite decades of progress in advancing communication and collaboration.

The MIT staff who called the FBI* would have been served better by calling the chancellor to ask, “How have we created a system that forces 24-year-olds to sneak around in the basement, hiding hard drives in closets in order to ask basic and important questions about our work? Can’t we do better?”


Update: I’m not OK with scraping JSTOR or any other copyrighted data source for the purpose of re-distribution. Some, including the federal prosecutors, have made the claim that this is what Aaron planned to do with the data. Others have pointed to his past research analyzing influence in academic writing. I have no insight into his real intentions; however, I do believe the latter goal is important and likely not possible without breaking the kinds of laws discussed above.

Also, it’s true that JSTOR does offer a bulk interface for research users. That interface didn’t exist when I was doing my work. But it’s not clear it would have made any difference. There are many, many research applications, including mine, that are still not possible with approved means of accessing data. Giving researchers a straw is not a useful response to requests for open and complete access. We shouldn’t settle for less.


* For those interested in the blow by blow: since writing this post I’ve learned that no one at MIT called the FBI–in fact it’s not clear the FBI was ever involved. As I now understand it, the local police were called to investigate a break-in. Because this involved network equipment the Secret Service were called by the Cambridge police. After that the investigation took on a life of its own outside the MIT campus.


A version of this essay appeared on Reuters MediaFile under the title “The difference between Google and Aaron Swartz.”

Measuring Centrality in Tacit Social Networks

tacit social networks — kpw @ January 21, 2009

There’s an interesting new paper (Maslov, arXiv:0901.2640v1) up on the arXiv this month about using centrality metrics (in their case a modified PageRank) to analyze citation graphs in academic publishing. I’ll refrain from summarizing the paper as a related post on the arXiv physics blog has already done a great job. But the upshot is that there’s a lot of value in applying these kinds of metrics to citation networks.
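For readers unfamiliar with the mechanics, here is a plain power-iteration PageRank on a tiny invented citation graph. The paper uses a modified variant; this sketch shows only the basic idea of rank flowing along citation links:

```python
# Plain PageRank by power iteration on a small directed citation graph.
# (Maslov et al. use a modified variant; this is only the textbook mechanics.)
def pagerank(edges, damping=0.85, iters=100):
    nodes = sorted({n for edge in edges for n in edge})
    out = {n: [t for s, t in edges if s == n] for n in nodes}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        new = {n: (1 - damping) / len(nodes) for n in nodes}
        for n in nodes:
            targets = out[n] or nodes          # dangling nodes spread rank evenly
            share = damping * rank[n] / len(targets)
            for t in targets:
                new[t] += share
        rank = new
    return rank

# Paper "A" is cited by all the others, so it should rank highest.
citations = [("B", "A"), ("C", "A"), ("D", "A"), ("C", "B")]
ranks = pagerank(citations)
print(max(ranks, key=ranks.get))   # → A
```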

This paper fit closely with work I did in the past looking at citation graphs in patent data (the complete set back to the 1970s). In my case I was trying to assess the importance of inventors within a given field of innovation using a betweenness centrality metric (though PageRank/eigenvector centrality would have also been an appropriate choice). Like the Maslov paper illustrated, this approach had a very high degree of success in finding key individuals in given fields. As an example, I ran a test on patents issued for technologies related to video games, and the betweenness centrality metric showed Shigeru Miyamoto, the lead designer at Nintendo, as the top innovator in an inventor-to-inventor citation graph. This result appears to be supported by his biography, which includes such honors as being named the “Walt Disney of electronic gaming” by TIME Magazine.
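Betweenness centrality itself can be computed with Brandes’ algorithm. A minimal sketch for an unweighted directed graph, with invented inventor names standing in for real data:

```python
from collections import deque

# Brandes' algorithm for betweenness centrality on an unweighted directed
# graph, represented as {node: [successor, ...]}. Names are invented.
def betweenness(graph):
    bc = {v: 0.0 for v in graph}
    for s in graph:
        # BFS from s, counting shortest paths (sigma) and predecessors.
        stack = []
        preds = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}; sigma[s] = 1
        dist = {v: -1 for v in graph}; dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate path dependencies in reverse BFS order.
        delta = {v: 0.0 for v in graph}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc

# "B" sits on every shortest path between the other nodes, so it scores highest.
g = {"A": ["B"], "B": ["C", "D"], "C": [], "D": []}
scores = betweenness(g)
print(max(scores, key=scores.get))   # → B
```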

One problem not addressed in the Maslov paper, however, is the translation from papers to people. The Maslov approach only ranks papers, though it makes inferences about the rank of the people who wrote them. I considered this in my work with patents and found it problematic. Rather than looking at centrality across a paper-to-paper citation graph, I decided to first derive a person-to-person graph that summed citation edges between inventors across the complete body of each inventor’s work. I was fortunate that the data I was working with had already attempted to disambiguate the inventors (no small feat!), so it was possible to translate between a paper citation graph and a people citation graph with relative ease. In a sense the person-to-person citation graph forms a sort of tacit social network extracted from the patent data.
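That paper-to-person projection can be sketched as follows. The papers, authors, and citation links here are invented for illustration; each paper-level citation contributes one weighted edge between every (citing author, cited author) pair, summed over the corpus:

```python
from collections import Counter

# Derive a person-to-person citation graph from paper-level data.
# All papers, author names, and citations below are invented examples.
authors = {                  # paper id -> disambiguated author/inventor names
    "P1": ["miyamoto"],
    "P2": ["smith", "jones"],
    "P3": ["jones"],
}
citations = [("P2", "P1"), ("P3", "P1"), ("P3", "P2")]   # citing -> cited

person_edges = Counter()
for citing, cited in citations:
    for a in authors[citing]:
        for b in authors[cited]:
            if a != b:                     # drop self-citation edges
                person_edges[(a, b)] += 1

print(person_edges[("jones", "miyamoto")])   # → 2
print(sum(person_edges.values()))            # → 4
```

Centrality metrics can then be run directly on this weighted person-to-person graph rather than inferred from paper ranks.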

I’m very much aware of the challenges in doing this with journals so I don’t fault them for not addressing this in their paper. However, it would be a great follow-on study to explore the difference in rankings using these approaches. I believe that there’s a need to continue thinking about the value of tacit social networks derived from sources like journals and patents, particularly in cases where the data is used to generate sociometric values like impact and importance.

Getting started…

house keeping — kpw @ January 7, 2009

I’ll spare you any prognostication about what’s to come and simply list a few of the things I’m thinking about these days:

See you soon.

This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License.
(c) 2015 Structural Knowledge | powered by WordPress with Barecity