Crowdsourced vision for visually impaired

Read an article the other day in Christian Science Monitor (CSM) on the Be My Eyes App. The app is from and is available for the iPhone and Android smart phones.

Essentially there are two groups of people that use the app:

  • Visually helpful volunteers – these people sign up for the app and, when a visually impaired person needs help, they provide visual aid by speaking to the person on the other end.
  • Visually impaired individuals – these people sign up for the app and, when they are having problems understanding what they are (or are not) looking at, they can turn on their camera and take video with their phone, which is sent to a volunteer. They can then ask the volunteer for help in deciding what they are looking at.

So, the visually impaired ask questions about the scenes they are shooting with their phone camera and volunteers will provide an answer.

It’s easy to register as Sighted and I assume Blind. I downloaded the app, registered and tried a test call in minutes. You have to enable notifications, microphone access and camera access on your phone to use the app. The camera access is required to display the scene/video on your phone.

According to the app there are 492K sighted individuals, 34.1K blind individuals and they have been helped 214K times.

Sounds like an easy way to help the world.

There were no requests to identify a language to use, so it may only work for English speakers. And there was no way to disable/enable it for a period of time when you don’t want to be disturbed. But maybe you would just close the app.

But other than that it was simple to use and seemed effective.

Now if there were only an app that would provide the same service for the hearing impaired to supply captions or a “filtered” audio feed to ear buds.

The world needs more apps like this…


There’s a new cluster filesystem on the block, Elastifile

At SFD12 last month we talked with the team from Elastifile. They are a new startup out of Israel working on a better cluster file system.

Elastifile was designed to support 1000s of nodes, 100,000 users/clients and 1000s of data containers (file systems/mount points), together with an infinite (64 bit) number of files and directories and up to Exabytes (10**18) in capacity. They also offer a 100% SSD file store capability. I encourage you to view the videos of their presentations at SFD12 to learn more.

Elastifile features

Elastifile supports data compression and optionally deduplication with NAND/Flash (e.g., low-/high-endurance) storage tiering, cloud storage tiering and multi-site storage. They also provide NFSv3/v4, SMB, AWS S3 and HDFS as native access protocols for their file storage.

They also offer non-disruptive hardware/software upgrades, n-way (2- or 3-way) data and metadata redundancy, self-healing capabilities, snapshots, and synchronous/asynchronous data replication or mirroring. Further, they provide multi-tenancy and QoS support.

Elastifile can be used in hyper converged mode as well as in a dedicated storage server mode. For backend storage, they support heterogeneous, physical (block, I think?) storage systems as well as direct access storage in cluster nodes.

Internals matter

Elastifile’s architecture supports accessor, owner and data nodes. But these can all be colocated on the same server or segregated across different servers.

Owner nodes own all the metadata objects for a file or directory and cache the metadata working set in memory. Ownership of file or directory metadata may change in the case of hardware failures.

Elastifile supports a dynamic write data path, which means they determine, in real time, where to write file data rather than having the data locations identified beforehand. They call this distributed write anywhere semantics.

Notably, they don’t do data caching (with NVMe it doesn’t make sense). However, as noted above, they do use metadata caching.

Internally, Elastifile uses variable length objects for both file data and metadata.

  • File data is composed of three object types: a file metadata (FileMD) object, mapping data objects, and file data objects. FileMD objects hold the normal file metadata (name, file size, create, access & modify ToDs, etc.) as well as pointing to all the mapping objects (OIDs). Mapping objects exist for each 0.5MB of file data and consist of a 128 element table, each element mapping 4KB of file address space to a data object (OID). Each data object holds the 4KB of compressed file data and journal log entries.
  • Directory metadata is composed of a directory metadata (DirMD) object and directory listing objects. Directory listing objects map file/directory names to FileMD or DirMD OIDs. Directory listing objects are accessed via an extensible hash table and contain a list of filenames/directory names within the directory.
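The mapping arithmetic described above (128 elements of 4KB each, so 0.5MB per mapping object) can be sketched in a few lines of Python. This is only an illustration of the address math; the function and constant names are made up here and are not Elastifile's.

```python
KB = 1024
BLOCK_SIZE = 4 * KB                 # each mapping element covers 4KB of file address space
ELEMENTS_PER_MAPPING = 128          # table size of one mapping object
MAPPING_SPAN = BLOCK_SIZE * ELEMENTS_PER_MAPPING  # 512KB, i.e. 0.5MB per mapping object


def locate(offset):
    """Return (mapping object index, element index, byte offset inside the 4KB block).

    Hypothetical helper: it only shows how a file offset would be split up
    if the layout is exactly as described in the text.
    """
    if offset < 0:
        raise ValueError("offset must be non-negative")
    mapping_index, within_span = divmod(offset, MAPPING_SPAN)
    element_index, within_block = divmod(within_span, BLOCK_SIZE)
    return mapping_index, element_index, within_block


print(locate(0))          # first byte of the file
print(locate(1_000_000))  # a byte roughly 1MB into the file
```

So a byte about 1MB into a file lands in the second mapping object (index 1), and the element found there names the data object (OID) that holds its 4KB block.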

The Elastifile software architecture consists of three layers:

  • A protocol layer which terminates file system access protocols and translates requests into internal requests. The hashing and data compression of file data occur at this level.
  • A metadata layer which provides file system/directory name mapping to objects for owned files/directories and maintains file/directory metadata updates/journals/checkpoints.
  • A data layer which provides transaction consistency and an n-way redundant persistent data storage for (file or metadata) objects.

Metadata operations are persisted via journaled transactions, which are distributed across the cluster. For instance, the journal entries for a mapping data object update are written to the same file data object (OID) as the actual file data, the 4KB compressed data object.

There’s plenty of discussion on how they manage consistency for their metadata across cluster nodes. Elastifile invented and uses Bizur, a key-value consensus based DB. Their chief architect Ezra Hoch (@EzraHoch) did a blog post and paper on Bizur for more information.


New file systems generally take many years to mature and get out into the market, cluster file systems even longer. Elastifile, started in 2013 by some very smart engineers, is already on the market just 4 years later. That’s impressive enough, but having their list of advanced functionality, plus cloud storage tiering and multi-site operations, all shipping in the current product is mind-blowing.

One lingering question is, does a market exist for another cluster file system? All flash is interesting but most of the current CFSs do this and ship this today. Cloud storage tiering is interesting and a long term need but some CFSs already have this and others are no doubt implementing it as we speak. A CFS’s use of objects for internal data and metadata management is not new and may make internals cleaner but doesn’t really provide a lot of customer benefit.

Exascale raw capacity, support for 100K users, 1000s of nodes, 1000s of file systems and an infinite # of files/directories is interesting. But most CFSs claim this level of support already, although this is more aspirational for some. And proving support at this scale is difficult, if not impossible.

On the other hand, Bizur is really neat. Its primary benefit is during recovery from hardware failures. For a CFS with 1000s of nodes, failures likely occur quite often. So Bizur’s advantage here may pay significant customer dividends.

Is that enough to market a new CFS?

To see what other SFD12 bloggers have written on Elastifile, please see:

AI’s Image recognition success feeds sound recognition improvements

I must do reCAPTCHA at least a dozen times a week for various websites I use. It’s become a real pain. And the fact that I know that what I am doing is helping some AI image recognition program do a better job of identifying street signs, mountains, or shop fronts doesn’t reduce my angst.

But that’s the thing with deep learning, machine learning, reinforcement learning, etc.: they all need massive amounts of annotated data that’s a correct interpretation of a scene in order to train properly.

Computers to the rescue

So, when I read a recent article in MIT News that Computers learn to recognize sounds by watching video, I was intrigued. What the researchers at MIT have done is use advanced image recognition to annotate film clips with the names of things that are making sounds on the film. They then fed this automatically annotated data into a sound identifying algorithm to improve its recognition capability.

They used this approach to train their sound recognition system to be able to identify natural and artificial sounds like bird song, speaking in crowds, traffic sounds, etc.

They tested their newly, automatically trained sound recognition system against standard labeled sound sets, and it was able to categorize sounds with 92% accuracy for a 10 category data set and with 74% accuracy for a 50 category data set. Humans are able to categorize these sounds with 96% and 81% accuracy, respectively.

AI’s need for annotation

The problem with machine learning is that it needs a massive, properly annotated data set in order to learn properly. But getting annotated data takes too long or is too expensive to do for many things that we want AI for.

Using one AI tool to annotate data to train another AI tool is sort of bootstrapping AI technology. It’s a cute trick but may have only limited application. I could think of only a few more applications of similar technology:

  • Use chest strap or EKG technology to annotate audio clips of heart beat sounds at a wrist or other appendage to train a system to accurately determine pulse rates through sound alone.
  • Use wave monitoring technology to annotate pictures and audio clips of sea waves to train a system to accurately determine wave levels for better tsunami detection.
  • Use image recognition to annotate pictures of food and then use this to train a system to recognize food smells (if they ever find a way to record smells).

But there may be many others. Just further refinement of what they have used could lead to finer grained people detection. For example, as (facial) image recognition gets better, it’s possible to annotate speaking film clips to train a sound recognition system to identify people from just hearing their speech. Intelligence applications for such technology are significant.

Nonetheless, I for one am happy that the next reCAPTCHA won’t be having me identify river sounds in a matrix of 9 sound clips.

But I fear there’s enough GreyBeards on Storage podcast recordings and Storage Field Day video clips already available to train a system to identify Ray’s and for sure, Howard’s voice anywhere on the planet…


Photo Credit(s): Wave by Matthew Potter; Waves crashing on Puget Sound by mikeskatie; Day 16: Podcasting by Laura Blankenship

The fragility of public cloud IT

I have been reading AntiFragile again (by Nassim Taleb). And although he would probably disagree with my use of his concepts, it appears to me that IT is becoming more fragile, not less.

For example, recent outages at major public cloud providers display increased fragility for IT. Yet these problems, although almost national in scope, seldom deter individual organizations from their migration to the cloud.

Tragedy of the cloud commons

The issues are somewhat similar to the tragedy of the commons. When more and more entities use a common pool of resources, occasionally that common pool can become degraded. But because no-one really owns the common resources no one has any incentive to improve the situation.

Now the public cloud, although certainly a common pool of resources, is also most assuredly owned by corporations. So it’s not a true tragedy of the commons problem. Public cloud corporations have a real incentive to improve their services.

However, the fragility of IT in general, the web, and other electronic/data services all increases as they become more and more reliant on public cloud, common infrastructure. And I would propose this general IT fragility is really not owned by any one person, corporation or organization, let alone the public cloud providers.

Pre-cloud was less fragile, post-cloud more so

In the old days of last century, pre-cloud, if a human screwed up a CLI command, the worst that could happen was to take out a corporation’s data services. Nowadays, post-cloud, if a similar human screws up a CLI command, the worst that can happen is that major portions of the internet services of a nation go down.

Yes, over time, public cloud services have become better at not causing outages, but outages aren’t going away. And if anything, better public cloud services just encourage more corporations to use them for more data services, causing any subsequent cloud outage to be more impactful, not less.

The Internet was originally designed by DARPA to be more resilient to failures, outages and nuclear attack. But by centralizing IT infrastructure onto public cloud common infrastructure, we are reversing the web’s inherent fault tolerance and causing IT to be more susceptible to failures.

What can be done?

There are certainly things that can be done to improve the situation and make IT less fragile in the short and long run:

  1. Use the cloud for non-essential or temporary data services, that don’t hurt a corporation, organization or nation when outages occur.
  2. Build in fault-tolerance, automatic switchover for public cloud data services to other regions/clouds.
  3. Physically partition public cloud infrastructure into more regions and physically separate infrastructure segments within regions, such that any one admin has limited control over an amount of public cloud infrastructure.
  4. Divide an organization’s or nation’s data services across public cloud infrastructures, across as many regions and segments as possible.
  5. Create a National Public IT Safety Board, not unlike the one for transportation, that does a formal post-mortem of every public cloud outage, proposes fixes, and enforces fix compliance.
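Item 2 above (automatic switchover to other regions/clouds) can be sketched as a simple "try the next provider" loop. This is a minimal illustration only: the provider functions are hypothetical stand-ins for calls to real cloud regions, and a production version would also need health checks, timeouts and data replication.

```python
def call_with_failover(providers):
    """Try each provider in order and return the first successful result."""
    errors = []
    for provider in providers:
        try:
            return provider()
        except Exception as exc:  # a real client would catch narrower error types
            errors.append(exc)
    raise RuntimeError(f"all {len(providers)} providers failed: {errors}")


def primary_region():
    # Hypothetical stand-in for a request to the primary region (simulated outage).
    raise ConnectionError("primary region is down")


def secondary_region():
    # Hypothetical stand-in for the same request sent to another region or cloud.
    return "served from secondary region"


result = call_with_failover([primary_region, secondary_region])
print(result)
```

The point of the sketch is only that the caller never depends on any single region being up.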

The National Public IT Safety Board

The National Transportation Safety Board (NTSB) has worked well for air transportation. It relies on the cooperation of multiple equipment vendors, airlines, countries and other parties. It performs formal post mortems on any air transportation failure. It also enforces any fixes in processes, procedures, training and any other activities on equipment vendors, maintenance services, pilots, airlines and other entities that can impact public air transport safety. At the moment, air transport is probably the safest form of transportation available, and much of this is due to the NTSB.

We need something similar for public (cloud) IT services. Yes most public cloud companies are doing this sort of work themselves in isolation, but we have a pressing need to accelerate this process across cloud vendors to improve public IT reliability even faster.

The public cloud is here to stay and if anything will become more encompassing, running more and more of the world’s IT. And as IoT, AI and automation become more pervasive, the data processes that support these services, which will no doubt run in the cloud, can impact public safety. Just think of what would happen in the future if an outage occurred in a major cloud provider running the backend for self-guided car algorithms during rush hour.

If the public cloud is to remain (at this point almost inevitable) then the safety and continuous functioning of this infrastructure becomes a public concern. As such, having a National Public IT Safety Board seems like the only way to have some entity own IT’s increased fragility due to public cloud infrastructure consolidation.


In the meantime, as corporations, government and other entities contemplate migrating data services to the cloud, they should consider the broader impact they are having on the reliability of public IT. When public cloud outages occur, all organizations suffer from the reduced public perception of IT service reliability.

Photo Credits: Fragile by Bart Everson; Fragile Planet by Dave Ginsberg; Strange Clouds by Michael Roper

Hardware vs. software innovation – round 4

We, the industry and I, have had a long running debate on whether hardware innovation still makes sense anymore (see my Hardware vs. software innovation – rounds 1, 2, & 3 posts).

The news within the last week or so is that Dell-EMC cancelled their multi-million dollar DSSD project, a new, hardware-innovation-intensive Tier 0 flash storage solution offering 10 million IO/sec at 100µsec response times to a rack of servers.

DSSD required specialized hardware and software in the client or host server, specialized cabling between the client and the DSSD storage device and specialized hardware and flash storage in the storage device.

What ultimately did DSSD in, was the emergence of NVMe protocols, NVMe SSDs and RoCE (RDMA over Converged Ethernet) NICs.

Last week’s post on Excelero (see my 4.5M IO/sec@227µsec … post) was just one example of what can be done with such “commodity” hardware. We just finished a GreyBeardsOnStorage podcast (GreyBeards podcast with Zivan Ori, CEO & Co-founder, E8 storage) with E8 Storage, which is yet another approach to using NVMe-RoCE “commodity” hardware and providing amazing performance.

Both Excelero and E8 Storage offer over 4 million IO/sec with ~120 to ~230µsec response times to multiple racks of servers. All this with off the shelf, commodity hardware and lots of software magic.

Lessons for future hardware innovation

What can be learned from the DSSD to NVMe(SSDs & protocol)-RoCE technological transition for future hardware innovation:

  1. Closely track all commodity hardware innovations, especially ones that offer similar functionality and/or performance to what you are doing with your hardware.
  2. Intensely focus any specialized hardware innovation to a small subset of functionality that gives you the most bang, most benefits at minimum cost and avoid unnecessary changes to other hardware.
  3. Speed up the hardware design-validation-prototype-production cycle as much as possible to get your solution to the market faster and try to outrun and get ahead of commodity hardware innovation for as long as possible.
  4. When (and not if) commodity hardware innovation emerges that provides similar functionality/performance, abandon your hardware approach as quickly as possible and adopt commodity hardware.

Of all the above, I believe the main problem is hardware innovation cycle times. Yes, hardware innovation costs too much (not discussed above) but I believe that these costs are a concern only if the product doesn’t succeed in the market.

When a storage (or any systems) company can start up and in 18-24 months produce a competitive product with only software development and aggressive hardware sourcing/validation/testing, having specialized hardware innovation that takes 18 months to start and another 1-2 years to get to GA ready is way too long.

What’s the solution?

I think FPGAs have to be a part of any solution to making hardware innovation faster. With FPGAs, hardware innovation can occur in days or weeks rather than months to years. Yes, ASICs cost much less, but cycle time is THE problem from my perspective.

I’d like to think that ASIC development cycle times of design, validation, prototype and production could also be reduced. But I don’t see how. Maybe AI can help to reduce time for design-validation. But independent FABs can only speed the prototype and production phases for new ASICs, so much.

ASIC failures also happen on a regular basis. There’s got to be a way to more quickly fix ASIC and other hardware errors. Yes some hardware fixes can be done in software but occasionally the fix requires hardware changes. A quicker hardware fix approach should help.

Finally, there must be an expectation that commodity hardware will catch up eventually, especially if the market is large enough. So an eventual changeover to commodity hardware should be baked in, from the start.


In the end, project failures like this happen. Hardware innovation needs to learn from them and move on. I commend Dell-EMC for making the hard decision to kill the project.

There will be a next time for specialized hardware innovation and it will be better. There are just too many problems that remain in the storage (and systems) industry and a select few of these can only be solved with specialized hardware.


Picture credit(s): Gravestones by Sherry Nelson; Motherboard 1 by Gareth Palidwor; Copy of a DSSD slide photo taken from EMC presentation by Author (c) Dell-EMC

4.5M IO/sec@227µsec 4KB Read on 100GBE with 24 NVMe cards #SFD12

At Storage Field Day 12 (SFD12) this week we talked with Excelero, which is a startup out of Israel. They support a software defined block storage for Linux.

Excelero depends on NVMe SSDs in servers (hyper converged or as a storage system), 100GBE and RDMA NICs. (At the time I wrote this post, videos from the presentation were not available, but the TFD team assures me they will be up on their website soon).

I know, yet another software defined storage startup.

Well, yesterday they demoed a single storage system that generated 2.5M IO/sec of 4KB random writes or 4.5M IO/sec of 4KB random reads. I didn’t record the random write average response time but it was less than 350µsec, and the random read average response time was 227µsec. They only did these 30 second test runs a couple of times, but the IO performance was staggering.

But they used lots of hardware, right?

No. The target storage system used during their demo consisted of:

  • 1-Supermicro 2028U-TN24RT+, a 2U dual socket server with up to 24 NVMe 2.5″ drive slots;
  • 2-Mellanox ConnectX-5 2 x 100Gbs Ethernet (R[DMA]-NICs); and
  • 24-Intel 2.5″ 400GB NVMe SSDs.

They also had a Dell Z9100-ON Switch supporting 32 x 100Gbs QSFP28 ports and I think they were using 4 hosts, but all this was not part of the storage target system.
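To put the demo numbers in perspective, here is a back-of-the-envelope calculation of the payload bandwidth and the per-drive IO rate they imply. It assumes every IO is exactly 4KB (4096 bytes) and that the load is spread evenly across the 24 SSDs; neither was stated in the demo, and protocol overhead is ignored.

```python
IO_SIZE_BYTES = 4 * 1024   # assumes every IO is exactly 4KB
DRIVES = 24                # NVMe SSDs in the demo target


def payload_gbits_per_sec(iops, io_size=IO_SIZE_BYTES):
    """Payload bandwidth in gigabits/sec, ignoring all protocol overhead."""
    return iops * io_size * 8 / 1e9


for label, iops in (("reads", 4.5e6), ("writes", 2.5e6)):
    gbits = payload_gbits_per_sec(iops)
    per_drive = iops / DRIVES  # assumes an even spread across drives
    print(f"{label}: {gbits:.1f} Gb/s of payload, {per_drive:,.0f} IO/sec per drive")
```

Under these assumptions the read test moves roughly 147 Gb/s of data, more than a single 100Gbs port can carry, which is consistent with the target having more than one 100Gbs link.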

I don’t recall the CPU processor used on the target but it was a relatively lowend, cheap ($300 or so) dual core, Intel standard CPU. I think they said the total target hardware cost $13K or so.

I priced out an equivalent system. 24 400GB 2.5″ NVMe Intel 750 SSDs would cost around $7.8K (Newegg); the 2 Mellanox ConnectX-5 cards $4K (Neutron USA); and the SuperMicro plus an Intel Cpu around $1.5K. So the total system is close to the ~$13K.
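The pricing above can be checked quickly, along with what it means per unit of performance. The component prices are the rounded figures quoted in the text; the dollars per 1000 IO/sec figure is just a derived illustration.

```python
# Rounded component prices quoted in the text, in US dollars.
components_usd = {
    "24 x Intel 750 400GB NVMe SSDs": 7800,
    "2 x Mellanox ConnectX-5 cards": 4000,
    "SuperMicro server plus Intel CPU": 1500,
}

total_usd = sum(components_usd.values())
read_iops = 4_500_000  # 4KB random read rate from the demo

usd_per_kiops = total_usd / (read_iops / 1000)
print(f"total: ${total_usd:,}")
print(f"about ${usd_per_kiops:.2f} per 1000 read IO/sec")
```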

But it burned out the target CPU, didn’t it?

During the 4.5M IO/sec random read benchmark, the storage target CPU was at 0.3% busy and the highest consuming process on the target CPU was the Linux “Top” command used to display the PS status.

Excelero claims that the storage target system consumes absolutely no CPU processing to service a 4K read or write IO request. All IO processing is done by hardware (the R(DMA)-NICs, the NVMe drives and PCIe bus) which bypasses the storage target CPU altogether.

We didn’t look at the host CPU utilization, but driving 4.5M IO/sec would take a high level of CPU power even if their client software did most of this via RDMA messaging magic.

How is this possible?

Their client software running in the Linux host is roughly equivalent to an iSCSI initiator but talks a special RDMA protocol (Excelero’s patent-pending RDDA protocol). It adds an IO request to the NVMe device submission queue and then rings the doorbell on the target system device; the SSD then takes the request off the queue and executes it. In addition to the submission queue IO request, they preprogram the PCIe MSI interrupt request message to somehow program (?) the target system R-NIC to send the read data/write status data back to the client host.

So there’s really no target CPU processing for any NVMe message handling or interrupt processing, it’s all done by the client SW and is handled between the NVMe drive and the target and client R-NICs.

The result is that the data is sent back to the requesting host automatically from the drive to the target R-NIC over the target’s PCIe bus and then from the target system to the client system via RDMA across 100GBE and the R-NICS and then from the client R-NIC to the client IO memory data buffer over the client’s PCIe bus.

Writes are a bit simpler, as the 4KB write data can be encapsulated into the submission queue command for the write operation that’s sent to the NVMe device, and the write IO status is a relatively small amount of data that needs to be sent back to the client.
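The submission queue and doorbell handshake described above can be pictured with a toy model. To be clear, this is not Excelero’s RDDA protocol or a real NVMe driver: the class and method names are invented for illustration, and in the real system the client would place the entry and ring the doorbell remotely via RDMA rather than through Python method calls.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Command:
    opcode: str                   # "read" or "write"
    lba: int                      # logical block address
    data: Optional[bytes] = None  # payload carried along with a write


class ToySubmissionQueue:
    """Ring buffer shared by a client (producer) and a drive (consumer)."""

    def __init__(self, depth: int):
        self.slots: List[Optional[Command]] = [None] * depth
        self.depth = depth
        self.head = 0      # next slot the drive will consume
        self.tail = 0      # next free slot on the client side
        self.doorbell = 0  # last tail value the drive was told about

    def submit(self, cmd: Command) -> None:
        """Client side: place a command in the next free slot."""
        next_tail = (self.tail + 1) % self.depth
        if next_tail == self.head:
            raise BufferError("submission queue full")
        self.slots[self.tail] = cmd
        self.tail = next_tail

    def ring_doorbell(self) -> None:
        """Client side: tell the drive how far the queue has been filled."""
        self.doorbell = self.tail

    def drive_fetch(self) -> List[Command]:
        """Drive side: take every command up to the doorbell value."""
        fetched = []
        while self.head != self.doorbell:
            fetched.append(self.slots[self.head])
            self.head = (self.head + 1) % self.depth
        return fetched


sq = ToySubmissionQueue(depth=4)
sq.submit(Command("write", lba=8, data=b"x" * 4096))
print(len(sq.drive_fetch()))  # nothing is visible to the drive before the doorbell
sq.ring_doorbell()
print(len(sq.drive_fetch()))  # now the drive picks up the queued command
```

The thing to notice is that nothing in this exchange needs a CPU on the drive’s side of the queue: the producer fills a slot, rings the doorbell, and the device does the rest.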

NVMe optimized for 4KB IO

Of course the NVMe protocol is set up to transfer up to 4KB of data with a (write command) submission queue element. And the PCIe MSI interrupt return message can be programmed to (I think) write a command in the R-NIC to cause the data transfer back for a read command directly into the client’s memory using RDMA with no CPU activity whatsoever in either operation. As long as your IO request is less than 4KB, this all works fine.

There is some minor CPU processing on the target to configure a LUN and set up the client to target connection. They essentially only support replicated RAID 10 protection across the NVMe SSDs.

They also showed another demo which used the same drive both across the 100Gbs Ethernet network and in local mode, directly attached as local NVMe storage. The response times shown for both local and remote were within 5µsec of each other. This means that going over the Ethernet link rather than going local cost you an additional 5µsec or so of response time.

Disaggregated vs. aggregated configuration

In addition to their standalone (disaggregated) storage target solution they also showed an (aggregated) Linux based, hyper converged client-target configuration with a smaller number of NVMe drives in them. This could be used in configurations where VMs operated and both client and target Excelero software was running on the same hardware.

Simply amazing

The product has no advanced data services. High availability, snapshots, erasure coding, dedupe, compression, replication, thin provisioning, etc. are all lacking. But if I can clone a LUN at, let’s say, 2.5M IO/sec, I can get by with no snapshotting. And with hardware that’s this cheap, I’m not sure I care about thin provisioning, dedupe and compression. Remote site replication is never going to happen at these speeds. OK, HA is an important consideration, but I think they can make that happen, and they do support RAID 10 (data mirroring), so data mirroring is there for an NVMe device failure.

But if you want 4.5M 4K random reads or 2.5M 4K random writes on <$15K of hardware and happen to be running Linux, I think they have a solution for you. They showed some volume provisioning software but I was too overwhelmed trying to make sense of their performance to notice.

Yes, it really screams for 4KB IO. But that covers a lot of IO activity these days. And if you can do millions of them a second, splitting up bigger IOs into 4K pieces should not be a problem.
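Splitting a bigger IO into 4K pieces is straightforward. Here is a minimal sketch, assuming byte offsets and pieces that must not cross a 4KB boundary; the function name is made up, and this is not how Excelero’s client actually does it (that wasn’t shown).

```python
def split_io(offset, length, chunk=4096):
    """Split a byte range into (offset, length) pieces that never cross a chunk boundary."""
    pieces = []
    end = offset + length
    while offset < end:
        boundary = (offset // chunk + 1) * chunk  # start of the next 4KB block
        piece_end = min(boundary, end)
        pieces.append((offset, piece_end - offset))
        offset = piece_end
    return pieces


print(split_io(0, 16384))    # an aligned 16KB IO becomes four 4KB IOs
print(split_io(1024, 8192))  # an unaligned 8KB IO becomes three smaller IOs
```

Note that an unaligned IO costs an extra piece, since the first and last pieces are partial blocks.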

As far as I could tell they are selling Excelero software as a standalone product and offering it to OEMs. They already have a few customers using Excelero’s standalone software and will be announcing OEMs soon.

I really want one for my Mac office environment, although what I’d do with millions of IO/sec is another question.


Ethereum enters the enterprise

Read an article the other day on NYT (Business Giants Announce Creation of … Ethereum).

In case you don’t know, Ethereum is an open source, block chain solution that’s different from the software behind Bitcoin and IBM’s Hyperledger (for more on Hyperledger see our Blockchains at IBM post or our GreyBeardsOnStorage podcast with Donna Dillinger, IBM Fellow).

Blockchains are a software based, permanent ledger which can be used to record anything. Bitcoin, Ethereum and Hyperledger are all based on blockchains that provide similar digital information services with varying security, programmability, consensus characteristics, etc.

Blockchains represent an entirely new way of doing business in the digital world and have the potential to take over many financial services and other contracting activities that are done today between organizations.

Blockchain services provide the decentralized recording of transactions into an immutable ledger. The decentralized nature of blockchains makes it difficult (if not impossible) to game the system to record an invalid transaction.


Ethereum supports an Ethereum Virtual Machine (EVM) application which offers customers and users a more programmable blockchain. That is, rather than just updating accounts with monetary transactions like Bitcoin does, one can implement specialized transaction processing for updating the immutable ledger. It’s this programmability that allows for the creation of “smart contracts” which can be programmatically verified and executed.

Ethereum miner nodes are responsible for validating transactions and the state transition(s) that update the ledger. Transactions are grouped in blocks by miners.

Miners are responsible for validating the transaction block and performing a hard mathematical computation, or proof of work (PoW), which is used to validate the block of transactions. Once the PoW computation is complete, the block is packaged up and the miner node updates its database (ledger) and communicates its result to all the other nodes on the network, which update their transaction ledgers as well. This constitutes one state transition of the Ethereum ledger.
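To give a feel for what a proof of work looks like, here is a toy example. It is not Ethash, Ethereum’s actual algorithm; it swaps in plain SHA-256 and a made-up block format purely to show the general idea: finding the answer takes many attempts, while checking it takes one hash.

```python
import hashlib


def block_hash(block_data: bytes, nonce: int) -> int:
    """Hash the block contents together with a candidate nonce (toy scheme, SHA-256)."""
    digest = hashlib.sha256(block_data + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big")


def mine(block_data: bytes, difficulty_bits: int) -> int:
    """Search for a nonce whose hash has `difficulty_bits` leading zero bits."""
    target = 1 << (256 - difficulty_bits)
    nonce = 0
    while block_hash(block_data, nonce) >= target:
        nonce += 1
    return nonce


def verify(block_data: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Any other node can check the miner's work with a single hash."""
    return block_hash(block_data, nonce) < (1 << (256 - difficulty_bits))


block = b"alice pays bob 5; bob pays carol 2"  # hypothetical block of transactions
nonce = mine(block, difficulty_bits=16)         # tens of thousands of tries on average
print(nonce, verify(block, nonce, 16))
```

Raising `difficulty_bits` by one doubles the expected mining effort, while verification stays a single hash.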

Miners that validate Ethereum transactions get paid in Ethers, which are a form of currency throughout the Ethereum ecosystem.

Blockchain consensus

Ethereum ledger consensus is based on the miner nodes executing the PoW algorithm properly. The current Ethereum PoW algorithm is Ethash, which is an “ASIC resistant” algorithm. What this means is that standard GPUs and (less so) CPUs are already very well optimized to perform this algorithm and any potential ASIC designer, if they could do better, would make more money selling their design to GPU and CPU designers, than trying to game the system.

One problem with Bitcoin is that its PoW is more ASIC friendly, which has led some organizations to develop special purpose ASICs in an attempt to dominate Bitcoin mining. If they can dominate Bitcoin mining, this can be used to game the Bitcoin consensus system and potentially implement invalid transactions.

Ethereum Accounts

Ethereum has two types of accounts:

  • Contract accounts that are controlled by the EVM application code, or
  • Externally owned accounts (EOA) that are controlled by a set of private keys and represent external agents (miner nodes, people, transaction generating entities)

Contract accounts really are code and data which constitute the EVM bytecode (application). Contract account bytecode is also stored on the Ethereum ledger (when deployed?) and is associated with an EOA that initiates the Contract account.

Contract functionality is written in Solidity, Serpent, Lisp Like Language (LLL) or other languages that can be compiled into EVM bytecode. Smart contracts use Ethereum Contract accounts to validate and execute contract actions.

Ethereum gas pricing

As EVM contract accounts can consume arbitrary amounts of computation, bandwidth and storage to process transactions, Ethereum uses a concept called “gas” to pay for their resource consumption.

When a contract account transaction is initiated, it identifies a gas price (in Ethers) and a maximum gas amount that it is willing to consume to process the transaction.

When a contract transaction takes place:

  • If the transaction consumes no more than the maximum gas amount, then the transaction is executed and is applied to the ledger. Any leftover gas (in Ethers) is credited back to the EOA.
  • If the maximum gas amount is not enough to execute the transaction, then the transaction fails and no update occurs.
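
Here’s a minimal sketch of that gas bookkeeping in Python. It is not Ethereum client code: the names settle_gas and Receipt are made up for illustration, and the failure branch assumes the whole gas limit is forfeited when a transaction runs out of gas, which the description above doesn’t spell out.

```python
from dataclasses import dataclass


@dataclass
class Receipt:
    succeeded: bool   # True if the transaction was applied to the ledger
    gas_used: int     # gas units actually charged
    fee_paid: int     # gas_used * gas_price
    refund: int       # value of unused gas credited back to the sender


def settle_gas(gas_limit: int, gas_price: int, gas_needed: int) -> Receipt:
    """Hypothetical helper: settle one transaction's gas.

    gas_limit  - maximum gas the sender is willing to consume
    gas_price  - price per unit of gas (in whatever unit the sender quotes)
    gas_needed - gas the transaction would consume if run to completion
    """
    if gas_needed <= gas_limit:
        # Enough gas: execute, charge for what was used, refund the rest.
        unused = gas_limit - gas_needed
        return Receipt(True, gas_needed, gas_needed * gas_price,
                       unused * gas_price)
    # Not enough gas: the transaction fails and the ledger is not updated.
    # ASSUMPTION: the full limit is still charged and nothing is refunded.
    return Receipt(False, gas_limit, gas_limit * gas_price, 0)


ok = settle_gas(gas_limit=100, gas_price=2, gas_needed=60)
failed = settle_gas(gas_limit=50, gas_price=2, gas_needed=60)
```

With these made up numbers, the first call succeeds and gets the value of 40 unused gas units back, while the second fails because 60 units are needed but only 50 were offered.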

Enterprise Ethereum Alliance

What’s new to Ethereum is that Accenture, Bank of New York Mellon, BP, Credit Suisse, Intel, Microsoft, JP Morgan, UBS and many others have joined together to form the Enterprise Ethereum Alliance. The alliance intends to work to create a standard version of the Ethereum software that enterprise companies can use to manage smart contracts.

Microsoft has had an Azure Blockchain-as-a-Service online since 2015. This was based on an earlier version of Ethereum called Project Bletchley.

Ethereum seems to be an alternative to IBM Hyperledger, which offers another enterprise class block chain for smart contracts. As enterprise class blockchains look like they will transform the way companies do business in the future, having multiple enterprise class blockchain solutions seems smart to many companies.


Photo Credit(s): Miner by Mark Callahan; Gas prices by; File: Ether pharmecie.jpg by Wikimedia


A college course on identifying BS

Read an article the other day from Recode (These University of Washington professors teaching a course on Calling BS) that seems very timely. The syllabus is online (Calling Bullshit — Syllabus) and it looks like a great start on identifying falsehood wherever it can be found.

In the beginning, what’s BS?

The course syllabus starts out referencing Brandolini’s Bullshit Asymmetry Principle (Law): the amount of energy needed to refute BS is an order of magnitude bigger than to produce it.

Then it goes into a rather lengthy definition of BS from Harry Frankfurt’s 1986 On Bullshit article. In sum, it starts out reviewing a previous author’s discussions on Humbug and ends up at the OED. Suffice it to say Frankfurt’s description of BS runs the gamut from deceptive misrepresentation to just short of lying.

The course syllabus goes on to reference two lengthy discussions/comments on Frankfurt’s seminal On Bullshit article, but both Cohen’s response, Deeper into BS, and Eubank & Schaeffer’s A kind word for BS: … are focused more on academic research than on everyday life and news.

How to mathematically test for BS

The course then goes into mathematical tests for BS that range from Fermi questions to the GRIM test and Benford’s 1938 Law of Anomalous Numbers. These tests are all ways of looking at data and numbers and estimating whether they are bogus or not. Benford’s paper talks about how the first pages of a table of logarithms are always more used than the others, because numbers that start with 1 are more frequent than numbers that start with any other digit.
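
To make the Benford test a bit more concrete, here’s a small Python sketch. Benford’s Law says a leading digit d should show up with probability log10(1 + 1/d), so roughly 30% of numbers start with 1 and under 5% start with 9. The function names are mine, not the course’s, and using powers of 2 as sample data is just a convenient stand-in for a real data set.

```python
import math
from collections import Counter


def benford_expected() -> dict:
    """Expected share of each leading digit 1..9 under Benford's Law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit_freq(values) -> dict:
    """Observed share of each leading digit among positive integers."""
    digits = [int(str(v)[0]) for v in values if v > 0]
    if not digits:
        return {d: 0.0 for d in range(1, 10)}
    counts = Counter(digits)
    return {d: counts.get(d, 0) / len(digits) for d in range(1, 10)}


# Sample data (an illustrative assumption): the first 1000 powers of 2.
sample = [2 ** k for k in range(1000)]
expected = benford_expected()
observed = leading_digit_freq(sample)

# Side-by-side comparison; big gaps would be a hint the data is suspect.
for d in range(1, 10):
    print(d, round(expected[d], 3), round(observed[d], 3))
```

A data set that is supposed to span several orders of magnitude but whose leading digits look nothing like the expected column deserves a second look, though a mismatch alone doesn’t prove anything is bogus.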

How rumors propagate

The next section of the course (week 4) talks about the natural ecology of BS.

Here there’s a reference to an article by Friggeri, et al, on Rumor Cascades, which discusses the frequency with which true, false and partially true/partially false rumors are “shared” on social media (Facebook).

The authors look at a website which evaluates the veracity of published rumors and use it to classify each rumor. Next they examine how these rumors are shared over time on Facebook.

Summarizing their research, both false and true rumors propagate sporadically on Facebook. But even verified false or mixed true/false rumors (as identified by the rumor checking website) continue to propagate on Facebook. This seems to indicate that rumor sharers are either ignoring the rumor’s truthfulness or are just unaware of the assessment of the rumor.

Other topics on calling BS

The course syllabus goes on to causality (correlation is not causation, a common misconception used in BS), statistical traps and trickery (used to create BS), data visualization (which can be used to hide BS), big data (GIGO leads to BS), publication bias (e.g., most published research presents positive results, so where is all the negative results research…), predatory publishing and scientific misconduct (organizations that work to create BS for others), the ethics of calling BS (the line between criticism and harassment), fake news and refuting BS.

Fake news

The section on Fake News is very interesting. They reference an article in the NYT, The Agency, about how a group in Russia has been wreaking havoc across the internet with fake news and bogus news sites.

But there’s more: another article on the NYT website, Inside a fake news sausage factory, details how multiple websites started publishing bogus news and then used their advertising revenue to learn which bogus stories paid best. Apparently there’s money to be made in advertising on fake news. (Sigh, probably explains why I can’t seem to get any sponsors for my websites…).

Improving the course

How to improve their course? I’d certainly take a look at what Facebook and others are doing to identify BS/fake news and see if these are working effectively.

Another area to add might be a historical review of fake rumors, news or information. This is not a new phenomenon. It’s been going on since time began.

In addition, there’s little discussion of the consequences of BS on life, politics, war, etc. The world has been irrevocably changed in the past on account of false information. Knowing how bad this has been might lend some urgency to studying how to better identify BS.

There’s a lot of focus on Academia in the course and although this is no doubt needed, most people need to understand whether the news they see every day is fake or not. Focusing more on this would be worthwhile.


I admire the University of Washington professors putting this course together. It’s really something that everyone needs to understand nowadays.

They say the lectures will be recorded and published online – good for them. Also, the current course syllabus is for a one credit hour course but they would like to expand it to a three to four credit hour course – another great idea.


Photo credit(s): The Donation of Constantine; New York World – Remember the Maine, Public Domain; Benjamin Franklin’s Bag of Scalps letter; fake-news-rides-sociales by Portal GDA