Analysis: Why is on-chain data analysis prone to misunderstanding?
Original link: https://twitter.com/tmel0211/status/1601068872710656000
Is Dragonfly's selling really a case of "cutting leeks" (Chinese crypto slang for fleecing investors)? Why do Amber executives move funds so often? Claims like these have been spread loosely many times recently. In fact, first, many on-chain labels are wrong; second, it is hard to infer the intention behind a transaction from on-chain data alone, and the same party may well be hedging off-chain or on exchanges at the same time. Interpreting on-chain data objectively is a real challenge for investors.
Since the FTX collapse, I have noticed that everyone has become hypersensitive to on-chain data: fund movements at a project team's addresses (are they about to run away?), whale assets flowing into exchanges (are they about to dump on the market?), and changes at institutional addresses (are they about to collapse?) are all treated as breaking news and used to spread FUD. However, on-chain data only reflects what objectively happens on-chain; it cannot fully reveal the human motives behind it off-chain.
First, a few recent pieces of FUD: 1) Dragonfly transferred PERP to Binance, apparently to sell it. Because it seemed to have bought high and sold low, this was read as the institution itself having its leeks cut, yet no one can prove that Dragonfly's purchase price matched the secondary-market price. 2) Amber co-founder TTK received 5,000 ETH from the company and was accused of lining his own pockets, but the address label may in fact be wrong. Similar stories appear every day. Can on-chain data really prove what the FUD claims?
Every Transfer event on-chain objectively exists, but address labels are off-chain attributes, and the real addresses of entities such as exchanges are not fully disclosed. Off-chain entity labels are therefore not guaranteed to be accurate. Establishing them requires algorithmic inference, offline verification, and other social-engineering methods, which can get close to the facts but can rarely prove them with certainty.
As for address labels, major block explorers and data-service platforms have collected hundreds of millions of entity address labels, largely by applying the Common Spending and One-Time-Change heuristics, both of which rely on properties of the UTXO model. In addition, exchange addresses, mining pool addresses, mixer (money-laundering) addresses, gambling addresses, and so on each show distinct business patterns that can be modeled and screened separately.
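The One-Time-Change idea can be illustrated with a simplified sketch. It assumes a hypothetical transaction record holding only input and output addresses, plus a set of addresses already seen on-chain. The rule used here (no output pays back to an input address, and exactly one output address has never appeared before, so that fresh address is treated as the sender's change) is one common simplified form of the heuristic, not necessarily what any given platform implements.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Tx:
    """Hypothetical simplified UTXO transaction: only addresses matter here."""
    inputs: list[str]
    outputs: list[str]


def one_time_change(tx: Tx, seen: set[str]) -> str | None:
    """Return the likely change address of tx, or None if the rule does not apply.

    Assumed simplified rule:
      - the tx spends at least one input and pays at least two distinct addresses,
      - no output pays back to one of the input addresses (no self-change),
      - exactly one output address has never been seen on-chain before.
    """
    outputs = set(tx.outputs)
    if not tx.inputs or len(outputs) < 2:
        return None
    if set(tx.inputs) & outputs:
        return None  # change went back to an input address; heuristic not applicable
    fresh = [addr for addr in outputs if addr not in seen]
    # Only an unambiguous single fresh output is attributed to the sender.
    return fresh[0] if len(fresh) == 1 else None
```

For example, with seen = {"A1", "B7"}, a transaction from "A1" paying "B7" and "C9" yields "C9", which would then be attributed to the same entity as "A1". If two outputs are fresh, the sketch declines to guess, which is one reason such labels can be incomplete.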
What is Common Spending? Simply put, if a (BTC) transaction spends from multiple input addresses at once, those input addresses are inferred to be controlled by the same entity. Starting from seed addresses, such as the deposit addresses of exchange users, analysts keep expanding outward, tracing upstream and downstream transactions to uncover more associated addresses, then classify hot and cold wallets by how the addresses interact, accumulating more and more labels.
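The clustering described above can be sketched with a union-find (disjoint-set) structure. This is a minimal illustration, assuming each transaction is given simply as its list of input addresses and that a few seed addresses already carry known labels; the label "ExchangeX" is hypothetical. Every set of co-spent inputs is merged into one cluster, and each seed's label is then spread to its whole cluster.

```python
from __future__ import annotations


class UnionFind:
    """Disjoint-set structure: addresses in the same set are treated as one entity."""

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}

    def find(self, addr: str) -> str:
        self.parent.setdefault(addr, addr)
        root = addr
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every visited address directly at the root.
        while addr != root:
            nxt = self.parent[addr]
            self.parent[addr] = root
            addr = nxt
        return root

    def union(self, a: str, b: str) -> None:
        self.parent[self.find(a)] = self.find(b)


def cluster_common_inputs(txs: list[list[str]]) -> UnionFind:
    """Common Spending: all input addresses of one tx are merged into one entity."""
    uf = UnionFind()
    for inputs in txs:
        for addr in inputs:
            uf.find(addr)  # register single-input addresses too
        for other in inputs[1:]:
            uf.union(inputs[0], other)
    return uf


def propagate_labels(uf: UnionFind, seeds: dict[str, str]) -> dict[str, str]:
    """Give every address in a seed's cluster that seed's label."""
    root_label = {uf.find(addr): label for addr, label in seeds.items()}
    return {addr: root_label[uf.find(addr)]
            for addr in list(uf.parent) if uf.find(addr) in root_label}
```

With txs = [["a", "b"], ["b", "c"], ["x", "y"]] and the seed {"a": "ExchangeX"}, addresses "a", "b" and "c" all receive the label, while "x" and "y" stay unlabeled. An address that is never co-spent with a labeled one remains unknown, and if two seeds with different labels end up in the same cluster, this sketch simply keeps the last one, so a real pipeline would need conflict handling.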
In any case, these labels are computed by third parties using technical heuristics. If an exchange frequently rotates its seed addresses, or deliberately routes funds through a mixer to obscure them, existing labels become invalid. The accuracy of labels derived from Common Spending depends on how many seed addresses an entity has and how often it replaces them, which an entity could exploit to evade tracking. In practice, however, exchanges have to stay compliant in many jurisdictions, so they have no reason to do so.
Moreover, exchanges' business scenarios are complex, and it is hard to infer intent from on-chain data alone:
1) Large transfers into an exchange's hot or cold wallets may simply be the exchange consolidating and reorganizing its own wallets;
2) Large transfers between an exchange and another labeled entity may simply be a withdrawal by a large account holder;
3) Funds flowing from an exchange address to an unknown address may be a customer withdrawal or the same entity reorganizing its wallets.
These movements can support an analysis, but they are not rigorous enough to infer human motives such as running away or dumping on the market.
In fact, security and data companies originally collected address labels mainly to serve AML asset-tracing: when hackers launder stolen funds, the labels help the police react immediately and gather technical evidence so that the stolen funds can be effectively frozen. Because hackers mostly use fresh, clean addresses but still depend on the trading depth of CEXs to launder money, changes in on-chain data have the greatest early-warning value in asset-tracing scenarios.
Today many on-chain alert bots broadcast large transfers and transactions involving labeled entities every day. If people catch and respond to these alerts in time, they can avoid some risks. However, individual data points may be misread or exaggerated, and the resulting market FOMO and runs can in turn affect everyone's assets.
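A minimal sketch of what such an alert bot does, assuming a hypothetical list of transfers (sender, receiver, amount), a label dictionary produced by third-party heuristics, and an arbitrary threshold. Unlabeled addresses are reported as "unknown" rather than guessed, and each alert states only what the chain shows, not a motive.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    """Hypothetical simplified transfer record."""
    sender: str
    receiver: str
    amount: float  # in the token's own units


def large_transfer_alerts(transfers: list[Transfer], labels: dict[str, str],
                          threshold: float) -> list[str]:
    """Describe transfers at or above threshold; labels may be wrong or missing."""
    alerts = []
    for t in transfers:
        if t.amount < threshold:
            continue
        src = labels.get(t.sender, "unknown")
        dst = labels.get(t.receiver, "unknown")
        # Report the fact only; intent (dumping, running away) cannot be read from it.
        alerts.append(f"{t.amount:,.0f} moved from {src} to {dst} "
                      "(labels are third-party guesses)")
    return alerts
```

For instance, a 5,000-unit transfer from an unlabeled address to one labeled "ExchangeX hot wallet" produces "5,000 moved from unknown to ExchangeX hot wallet (labels are third-party guesses)". Whether that is a deposit to sell, wallet housekeeping, or a hedge elsewhere is exactly what the alert cannot tell you.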