
CKAN: OUR FIRST LOW-HANGING FRUIT IN DATA SPACES

NOT REALLY, I MEAN, NOT SO LOW-HANGING AFTER ALL - DATA SPACE / 3
September 29, 2025 by Jure Lampe

If you want to start playing with dataspace preparation in the FIWARE world, you need something easy enough to not scare your team (or yourself) away, but still useful. For us, that “low-hanging fruit” was CKAN. It’s been around forever, every open data portal you ever cursed at probably ran on it, and it fits perfectly into our “let’s not reinvent the wheel (yet)” strategy.

CKAN IN A NUTSHELL

CKAN is basically a warehouse for datasets. Think of it as Excel on steroids, but instead of one overgrown spreadsheet, it’s hundreds (or thousands) of neatly packaged datasets with metadata, tags, and APIs. It’s the tool cities, governments, and even some corporations use to publish data in a way that at least looks semi-organized. Why is it important? Because every dataspace needs a front door, and CKAN is still one of the most widely used ones.
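To make the "APIs" part a bit more concrete, here is a minimal sketch of a dataset search against CKAN's Action API. The `package_search` endpoint and the `success` / `result` response envelope are standard CKAN; the helper names (`build_search_url`, `summarize`, `search`) are our own illustration, and any portal URL you pass in is up to you.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_search_url(portal_url, query, rows=5):
    """Build a package_search URL for CKAN's Action API (v3)."""
    params = urlencode({"q": query, "rows": rows})
    return f"{portal_url.rstrip('/')}/api/3/action/package_search?{params}"


def summarize(response):
    """Pick title, tags and resource count out of a package_search response."""
    if not response.get("success"):
        raise RuntimeError("CKAN reported an error")
    return [
        {
            "title": pkg.get("title") or pkg["name"],
            "tags": [tag["name"] for tag in pkg.get("tags", [])],
            "resources": len(pkg.get("resources", [])),
        }
        for pkg in response["result"]["results"]
    ]


def search(portal_url, query, rows=5, timeout=10):
    """Run the search against a live portal (needs network access)."""
    url = build_search_url(portal_url, query, rows)
    with urlopen(url, timeout=timeout) as resp:
        return summarize(json.load(resp))
```

Pointing `search()` at a reachable CKAN portal (for example a placeholder like `https://ckan.example.org`) with a query such as `"water"` returns a short list of titles, tags, and resource counts.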

WHAT DATA ARE WE TALKING ABOUT?

We’re not picky (yet). Our target sources include:

  • Open Data (the usual suspects: cities, governments, NGOs)
  • Sensor Data (yes, the real stuff that comes in too fast and in too many formats)
  • Broker Data (from Orion, Scorpio, Stellio, MQTT - you name it)
  • Simulators (sometimes you just fake it till you make it)

Basically, if it produces JSON, CSV, XML, or anything else we can harass with a harvester, or if it goes to a broker and we can publish its metadata, we’ll take it.

FORMATS GALORE

CKAN is wonderfully agnostic. You’ll see formats like:

  • CSV, XLSX (the bread and butter).
  • JSON, JSON-LD (our personal favorite).
  • XML (hello, 1999).
  • GeoJSON, Shapefiles (because maps make people happy).
  • ArcGIS (a lot of cities use this format).
  • PDFs (yes, some people think a PDF is a dataset…).

We take them all. And then cry about them later.
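If you want to know what you actually took before the crying starts, a quick tally helps. The sketch below counts resource formats across dataset dictionaries as CKAN returns them (each dataset has a `resources` list, each resource a free-text `format` field). The `ALIASES` table and the function names are hypothetical; extend the table with whatever your portals really emit.

```python
from collections import Counter

# Hypothetical alias table: publishers type formats by hand, so spellings vary.
ALIASES = {
    "excel": "XLSX",
    "shp": "SHAPEFILE",
    "esri shapefile": "SHAPEFILE",
    ".csv": "CSV",
}


def normalize(fmt):
    """Map a free-text format label to one canonical upper-case name."""
    key = (fmt or "").strip().lower()
    if not key:
        return "UNKNOWN"
    return ALIASES.get(key, key.upper())


def tally_formats(packages):
    """Count resources per normalized format across CKAN dataset dicts."""
    counts = Counter()
    for pkg in packages:
        for res in pkg.get("resources", []):
            counts[normalize(res.get("format"))] += 1
    return counts
```

Feeding it the `results` list of a dataset search gives a rough picture of how much of the catalog is CSV, how much is PDF, and how much did not bother to declare a format at all.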

HARVESTERS: OUR NEW FRENEMIES

Harvesters are CKAN’s way of pulling data from other portals. In theory, it’s brilliant: set it, forget it, and let the data flow in. In practice, well… let’s just say our server’s disk filled up in a few days. No warnings. No controls. Just harvesters happily doing their thing until the lights went out.
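One cheap guard we could have used from day one is a disk check in front of every harvest run. This is only a sketch of that idea, not a CKAN feature: the 20% threshold and the function names are assumptions, and how you hook it into your harvest schedule depends on your setup.

```python
import shutil


def free_fraction(path="/"):
    """Return the share of free space (0.0 to 1.0) on the disk holding `path`."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def safe_to_harvest(path="/", min_free=0.20):
    """True when at least `min_free` of the disk is still free.

    The 20% default is an arbitrary starting point; tune it to your storage.
    """
    return free_fraction(path) >= min_free
```

Calling `safe_to_harvest()` from whatever script or cron job triggers your harvest run, and skipping the run (plus sending an alert) when it returns `False`, would at least turn "the lights went out" into "the harvester paused".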

WHAT DID WE HARVEST?

You can check our test playground at datasets.dataspace.fiwarebox.com. We pulled (with harvesting, broker connection or API) from:

  • Seattle (because why not?) and some other US cities, counties, and states (like Washington).
  • A few European cities (they love CKAN).
  • Some international catalogs we probably shouldn’t have touched (oops).
  • Some real-time sensor data (from our internal sensors, such as Shelly devices, on the FIWAREBox Scorpio context broker).
  • Some real-time simulated data (like electricity meters, solar generators, and trackers on the FIWAREBox Scorpio context broker).
  • Some near real-time metering data from our partners (like Brunata water meters).

Why such a diversified portfolio? Because we can, and to show you what is possible. It’s a bit of a mix: some real gems, some empty shells, and some “datasets” that are literally just CSV. The wild west of data. Everything goes into CKAN with metadata: some entries carry static data, others carry connectors to a broker, a database, and the like.
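For the broker-backed entries, the trick is that CKAN only stores metadata plus a link. A minimal sketch of such a dataset record, assuming an NGSI-LD broker (such as Scorpio) queried through its standard `/ngsi-ld/v1/entities` endpoint, could look like this. The function names, the description text, and every value you pass in are hypothetical; this is not necessarily how our connectors are built.

```python
from urllib.parse import urlencode


def ngsi_ld_query_url(broker_url, entity_type, limit=100):
    """Build an NGSI-LD entity query URL for one entity type."""
    params = urlencode({"type": entity_type, "limit": limit})
    return f"{broker_url.rstrip('/')}/ngsi-ld/v1/entities?{params}"


def broker_dataset(name, title, broker_url, entity_type, owner_org, tags=()):
    """Build a payload for CKAN's package_create action.

    The dataset holds metadata only; its single resource links to the broker,
    so the data itself never lands on the CKAN disk.
    """
    return {
        "name": name,  # URL-safe slug
        "title": title,
        "owner_org": owner_org,
        "notes": f"Live {entity_type} entities served by an NGSI-LD context broker.",
        "tags": [{"name": tag} for tag in tags],
        "resources": [
            {
                "name": f"{entity_type} (live query)",
                "url": ngsi_ld_query_url(broker_url, entity_type),
                "format": "JSON-LD",
            }
        ],
    }
```

The resulting dictionary would be sent as JSON in a POST to `/api/3/action/package_create` with an API token in the `Authorization` header. Depending on the broker configuration, the entity type may need to be a full URI, or the request may need a `Link` header pointing at the right `@context`.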

THE DOWNSIDES (SPOILER: THERE ARE MANY)

  • Harvesters: no control panel, no limits, no mercy. To be honest, there are limits you can set up, but they simply didn't work for us.
  • The dataset list shows no “last updated” time. You have to open each dataset to find it, so you can never be sure whether the data is being updated as it should. Of course there are logs and the like; I am talking about the end-user, non-technical perspective.
  • Search: it’s basically “good luck, type something and pray.”
  • User-friendliness: non-existent... sorry, it could be better. This is not grandma’s portal.
  • Metadata quality: let’s say… inconsistent, but CKAN is not to blame for that.
  • Our disk space: RIP.
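The "last updated" gap can be worked around from the API side, because every CKAN dataset carries a `metadata_modified` timestamp. The sketch below flags datasets that have not changed for a while; the seven-day default and the function names are assumptions, and the input is the list of dataset dictionaries a `package_search` call returns.

```python
from datetime import datetime, timedelta, timezone


def parse_ckan_time(value):
    """Parse a CKAN ISO 8601 timestamp, treating naive values as UTC."""
    parsed = datetime.fromisoformat(value)
    return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)


def stale_datasets(packages, max_age_days=7, now=None):
    """Return (name, age_in_days) for datasets older than the cutoff, oldest first."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    rows = []
    for pkg in packages:
        modified = parse_ckan_time(pkg["metadata_modified"])
        if modified < cutoff:
            rows.append((pkg["name"], (now - modified).days))
    return sorted(rows, key=lambda row: row[1], reverse=True)
```

Keep in mind that `metadata_modified` tells you when the record in CKAN last changed, which is not always the same as when the upstream data was refreshed. It is still a better signal than nothing.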

OUR TAKE (AND OUR PAIN)

We still like CKAN. It’s imperfect, outdated in parts, but it’s the only thing we found that actually works today at scale. The problems we hit are exactly why we’re working on DataGrid - our Extended Metadata Browser (EMB) and Extended Data Browser (EDB). Think CKAN++ with:

  • Proper filtering, sorting, grouping, and column selection for both metadata and data (JSON, CSV).
  • Profiles for storing & retrieving our views.
  • SQL query generation.
  • Metadata + data browsing in one place (Master / Detail view).
  • Controls so harvesters don’t eat your infrastructure alive.
  • Integration into the CKAN portal.

DATAGRID PREVIEW 

Extended Metadata Browser (EMB)

CONCLUSION

CKAN gave us a bruised ego, full disks, and a few laughs. But it’s still our gateway into dataspace building. The next step? Making it usable for normal humans (and ourselves) with DataGrid. Stay tuned, because the CKAN saga has just begun.

This is becoming our Data Catalog Solution: 

FIWAREBox Solution: Data Catalog-As-A-Service (DCAAS).

Is this a Data Space? Heck, no! It is just a "starter pack" with CKAN, a UI for harvester and metadata control, Keycloak integration for OAuth, DCAT-AP, storage, APIs, and monitoring & alerting. As a service. To make your life easier.

For a full-blown data space with real-time data, Smart Data Models, and EU-compliant governance, all in one platform, you need this:

FIWAREBox Solution: Data Space-As-A-Service (DSAAS).



WHAT DOES IT TAKE TO BE A REAL DATA SPACE?

DATA SPACE / 2

TO BE CONTINUED...