Kamu’s Open Data Fabric network brings the power of enterprise data pipelines into a global trustless environment:
On- and off-chain data
Verifiably trustworthy data
Any storage from IPFS to S3
Open protocol & code
Streams, not Files
While many solutions continue to treat data as books in a library, most information flows in the world are dynamic, and the value of data drops sharply with any delay.
So we made Kamu work seamlessly with both static and near real-time data to achieve minimal time from data to impact.
Supply Chains, not Silos
Efficient and fair exchange of data between organizations is the foundation of the new-generation data economy.
Whether your company wants to monetize its data, or enrich internal data to make it exponentially more valuable - Kamu can help you become part of a global data supply chain.
Kamu is designed for trustless environments and is resistant to malicious actors.
No matter how many hands the data has passed through - you can easily see where it originated, who transformed it and how, verify it, and hold all parties accountable.
All data manipulations in Kamu are done through code - a novel stream-processing SQL.
Pipelines are easy to audit and work highly autonomously at near real-time speeds. Write a query once - and it will continue to produce valuable results forever, requiring near-zero maintenance.
Kamu achieves for data what version control systems did for software, but does so without diffs, versioning, or snapshotting.
Our new paradigm streamlines collaboration on data within your company and enables an effect similar to the Open Source Software revolution for data globally.
How it works
We turn data into a ledger
Data preserves its complete history and is never updated destructively. Trust is anchored at the publisher, who can always be held accountable for the data they provide.
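To make the ledger idea concrete, here is a minimal, hypothetical sketch of an append-only, hash-chained event log - it is not Kamu's actual metadata format, and the `Ledger` class and its fields are illustrative names only:

```python
import hashlib
import json

def block_hash(record: dict, prev_hash: str) -> str:
    """Hash a record together with the previous block's hash,
    chaining blocks so history cannot be rewritten unnoticed."""
    payload = json.dumps(record, sort_keys=True) + prev_hash
    return hashlib.sha256(payload.encode()).hexdigest()

class Ledger:
    """Append-only event log: new records are added, old ones never change."""
    def __init__(self):
        self.blocks = []  # list of (record, hash) pairs

    def append(self, record: dict):
        prev = self.blocks[-1][1] if self.blocks else ""
        self.blocks.append((record, block_hash(record, prev)))

    def verify(self) -> bool:
        """Recompute every hash; any retroactive edit breaks the chain."""
        prev = ""
        for record, h in self.blocks:
            if block_hash(record, prev) != h:
                return False
            prev = h
        return True
```

Because each hash covers the previous one, tampering with any historical record invalidates every later block - which is what lets a consumer hold the publisher accountable for exactly the history they published.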
Datasets are registered on the network
As a publisher you don’t need to move data into any central point. You maintain complete ownership and control.
People process data using special SQL code
Our decentralized ETL pipelines can span across teams, organizations, and even continents. People can collaborate on cleaning and enriching data and confidently reuse data from any step.
Data flows in near real-time
Our streaming SQL engines process data within seconds, continuously and autonomously. All of your science projects, dashboards, and automation get the fidelity of stock-ticker data.
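The key property of such streaming computation is that results are updated incrementally as events arrive, rather than recomputed from scratch. A tiny illustrative sketch (the `RunningAverage` class is a made-up example, not Kamu's engine):

```python
from collections import defaultdict

class RunningAverage:
    """Incrementally maintained per-key average: each arriving event
    updates the result in O(1), with no rescan of historical data."""
    def __init__(self):
        self.count = defaultdict(int)
        self.total = defaultdict(float)

    def push(self, key: str, value: float) -> float:
        """Fold one new event into the running state and
        return the up-to-date average for that key."""
        self.count[key] += 1
        self.total[key] += value
        return self.total[key] / self.count[key]
```

A query written once in this style keeps producing fresh results forever: each sensor reading pushed in updates the answer immediately, which is why downstream dashboards stay near real-time with near-zero maintenance.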
Accountability, verifiability, and provenance built-in
Our SQL has the properties of Smart Contracts, so you can trace every single data cell to its source, and easily tell who processed it and how.
Ready to see it in action?
Watch a demo or try it yourself
Get started
While the tagline of almost every enterprise data product is "breaking down silos", none of them addresses the problems of cross-organizational data exchange. At best, they give you just a bigger silo - the size of your company.
Efficient and fair exchange of information across company boundaries is the foundation of the digital economy, yet data sharing today remains stuck in the past:
- It relies on cumbersome bilateral contracts
- Most critical data is still exchanged as CSV and Excel files via email
- JSON APIs remain the primary data-exchange "workhorse" while barely offering any interoperability and requiring significant infrastructure investments on both the publisher and consumer sides.
Kamu is designed from the ground up to solve the unique challenges of data exchange in a global trustless environment:
- It makes the power of enterprise data technologies accessible to even the smallest organizations
- Seamlessly integrates with internal data pipelines (both IN & OUT)
- Allows you to share data in a privacy-preserving way
- Enables new data monetization schemes
- Keeps all parties accountable for the data they provide
- Resolves the dreaded copy problem by ensuring that data acquired illegally is detached from the verifiable provenance trail and becomes worthless.
While governments worldwide continue to invest in Open Data, implementations of data sharing remain highly fragmented, uncoordinated, and wasteful.
Publishing data correctly requires significant technical expertise that even most large organizations lack. Underfunded municipalities that approach data sharing as just another "website job" unfortunately end up overpaying for development and maintenance while failing to realize even a fraction of their data's true potential.
- Publishing data can be 50-100x cheaper than with leading "open government" vendors and affordable to even the smallest organizations
- You maintain full ownership of data and control the access
- You automatically comply with all best publishing practices and all data is made available via modern APIs
- You can see who uses your data and how, along with your place in the global data supply chain
- Your data is available for analytics alongside that of many other organizations like yours, as if it were in a single database.
The ability to compare performance indicators across organizations is highlighted by McKinsey as the biggest value driver of open government data.
Science & Research Data
Scientific progress today is undermined by a severe reproducibility and verifiability crisis, and while tools for reproducible analysis and ML are becoming more widespread, they are of little use if you cannot access or verify the source data.
Multiple incidents where poor data verification standards allowed invalid or outright malicious research to be published are shifting the tide towards considering all research unreliable until proven otherwise.
Existing RDM portals and even the new DeSci solutions unfortunately address this by snapshotting and archiving data, which:
- Results in hundreds of millions of poorly systematized datasets
- Bases all research on stale and outdated data
- Creates lineage and provenance trail that is nearly impossible to verify
We believe that the world cannot continue to treat data as books in a library. Most valuable data is dynamic and flows continuously (weather, health records, etc.), and the value of insights diminishes rapidly with any delay. So we designed Kamu to bring reproducibility and verifiability even to near real-time data.
Instead of bundling your research with some CSVs of unknown provenance, Kamu lets you link it to decentralized data supply chains extending directly to governments, universities, and hospitals, where all data is cleaned and enriched by the community and is verifiably trustworthy. Think of the Open Source Software revolution, but now in data!
IoT & Smart Cities
Cars and ferries, traffic lights and parking meters, weather and hydrology stations - we are witnessing a rapid growth of data sources and await the coming transformation of the world into a data-centric economy... but where will all this data be stored? And who will pay for it?
While our data sources are becoming more and more decentralized, our data processing capabilities still revolve around centralized solutions - data can be analyzed only when brought into a single warehouse or data lake. But it is safe to say that there will never be a "World Data Warehouse", and all attempts to group data that "belongs together" are futile.
Kamu is the world's first planet-scale data pipeline that offers decentralized and federated storage AND processing of data.
- Supports IoT data volumes and near real-time latencies
- Friendly to micro-publishers - down to individual-device granularity
- Decouples data ownership from storage and compute infrastructure, allowing you to freely choose between vendors without having your data held hostage
- Is infinitely composable, to allow combining many sources into larger units
- Provides novel monetization mechanisms to ensure publishers and data processors are fairly compensated.
FinTech & InsurTech
More efficient use of data is already revolutionizing these industries, leaving those who failed to transform their organizations around new data strategies far behind. But access to reliable and up-to-date information remains an unsolved problem.
Opening a line of credit or designing a new insurance policy requires data from hundreds of external sources, but the current state of global data sharing, which still relies on cumbersome bilateral contracts, makes data acquisition an insurmountable task for most companies. Even the "hottest" FinTech and InsurTech startups have to rely on a group of antiquated actuaries and data aggregators that provide outdated and unverifiable information for which they cannot even be held accountable.
What if ...
- We could bring the velocity of stock tickers to every data source out there?
- All middlemen could be replaced by transparent and autonomous data processing pipelines?
- Your decision-making was powered by constantly up-to-date data from these global supply chains?
- Audits and compliance could be fully automated?
Kamu's decentralized data technology brings verifiability and auditability properties to data that surpass those of banks while ensuring data flows rapidly between all parties.
It's widely acknowledged that the inadequate state of the global data infrastructure was a top contributing factor in our poor response to the COVID-19 pandemic. Even in developed countries it took nearly 8 months to establish basic data flows such as daily case counts, and while the crisis created a significant uptick in projects that use data to improve the transparency and efficiency of healthcare systems, a vast majority of them stopped functioning within a year as urgency and funding declined.
It shouldn't take a global pandemic for us to create such data flows, nor should it take the heroic effort of thousands to maintain them.
Kamu's technology addresses key root problems of healthcare data flows:
- Publishing data should not involve giving up ownership or compromising privacy - Kamu is a decentralized network and a foundation for multiple privacy-preserving analysis techniques
- Data analysis should not require heavy infrastructure investments - Kamu makes the latest enterprise-grade technologies accessible to any researcher at no cost
- Creation of new data flows should not require establishing new legal entities for the sole purpose of giving results credibility - analysis in Kamu can be performed by any individual in the world and the research community can then establish the validity of results purely through ease of audit and verification of computations
- Once created, data flows in Kamu require nearly no maintenance - they will continue to bring long-lasting value after a project is over and researchers have moved on to other tasks.
Blockchain technologies, for the first time in history, enabled complex interactions between people who don't know or trust one another. They've done so by guaranteeing verifiability and accountability. Unfortunately, these mechanisms only work for on-chain assets like currencies and tokens. So far, we have failed to extend them to data - the lifeblood of the digital economy.
On one hand, Oracle Networks force consumers to choose between high fees and unreliable data, as their attempts to establish trust anchors at an unnatural place - during the API call to a data source - carry a lot of overhead. On the other hand, providing data to blockchains requires significant upfront investment - a potential publisher needs to turn their data into a product and develop and maintain API infrastructure - a bar that is beyond the reach of the vast majority of organizations that have useful data.
As a result, the Web3 data space is currently dominated by large companies that exercise unhealthy control over data that fuels the Smart Contract ecosystem. The system favors data monopolies and prevents small publishers from entering the space - a state that is very anti-Web3.
Kamu creates a natural bridge between external data and blockchains:
- It anchors trust in data on the publisher side, ensuring full accountability for the data they provide
- This enables verifiable data transfer and computations so that publishers can delegate operating the infrastructure to other parties without man-in-the-middle concerns
- The global community can then collaborate on data cleaning and enrichment
- Intrinsic verifiability of data simplifies the Oracle Problem and lowers data access fees
All this encourages micro-publishers to enter the space and improves the diversity of data sources, eliminating monopolies and ensuring greater trustworthiness of data in the system as a whole.
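One way verifiable computation over published data can work is deterministic recomputation: the publisher binds inputs and outputs with hashes, and any consumer can re-run the transformation to check the claim. The sketch below is a hypothetical illustration of that idea - `publish`, `verify`, and the example `transform` are made-up names, not Kamu's protocol:

```python
import hashlib
import json

def digest(rows) -> str:
    """Canonical hash of a list of records (order- and key-stable)."""
    return hashlib.sha256(json.dumps(rows, sort_keys=True).encode()).hexdigest()

def transform(rows):
    """A deterministic transformation: drop missing readings, round temps."""
    return [{"city": r["city"], "temp": round(r["temp"])}
            for r in rows if r["temp"] is not None]

def publish(rows):
    """Publisher releases derived rows plus a claim binding them to inputs."""
    out = transform(rows)
    return out, {"input_hash": digest(rows), "output_hash": digest(out)}

def verify(rows, claimed_out, claim) -> bool:
    """Anyone can re-run the deterministic transform and check both hashes,
    so a middleman cannot silently alter data in transit."""
    return (digest(rows) == claim["input_hash"]
            and digest(transform(rows)) == claim["output_hash"]
            and digest(claimed_out) == claim["output_hash"])
```

Because the transformation is deterministic code rather than an opaque service, trust reduces to checking hashes - which is what lets small publishers participate without expensive trusted intermediaries.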
The name "kaˈmju" is a reference to Albert Camus' essay "The Myth of Sisyphus". Just like Sisyphus, who was punished by the gods to roll an immense boulder up a hill only to watch it repeatedly roll down, we think data science is stuck in a loop where all forward progress is constantly lost because no mechanism exists for making incremental improvements to data.
Kamu is a passion project of engineers and data scientists who spent most of their careers pushing the boulder up, but are now ready to break the cycle.
Read more