Three days after video emerged of the execution of journalist James Foley, the top human rights official at the United Nations happened to provide a grim numerical context to his murder. Citing the precise and shocking figure of 191,369 known killings over three years, High Commissioner for Human Rights Navi Pillay condemned the Security Council's inaction in unusually strong language.
When I learned that this figure was based on a report by the Human Rights Data Analysis Group (HRDAG), I suspected there was even more to the story. HRDAG staff and partners work in the wake of human rights catastrophes around the world, extracting data from forgotten archives, door-to-door surveys, cemetery censuses, and registers of the dead. They then build statistical models that can reveal hidden violence and the patterns, or even perpetrators, that lie behind it.
I asked HRDAG's director, Patrick Ball, if he would explain the Syria figure, and what might be learned from it. In a video call from his home office in the Bay Area, Patrick provided an engaging and discursive explanation.
It turns out that after decades of authoritarianism and years of intense violence, Syria is rich in one respect: the number of documentation groups. Four of them (the Syrian Center for Statistics and Research; the Syrian Network for Human Rights; the Violations Documentation Centre; and the Syrian Observatory for Human Rights) shared with HRDAG some 319,000 recorded deaths, painstakingly collected from witnesses and relatives at great personal risk. HRDAG combined the lists and eliminated incomplete or duplicate records. (FiveThirtyEight has a great explanation of the mix of machine learning and human effort behind this process.)
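Matching records across lists is hard in practice. Names are transliterated in different ways, dates are approximate, and places are spelled inconsistently, which is why the real work pairs machine learning with human review. The Python sketch below is a much simpler illustration and is not HRDAG's method. It drops incomplete records and merges exact matches on a normalized name, date, and place. The `Record` fields, the helper names, the exact-match rule, and the toy data are all assumptions made for this example. The sketch also records which lists reported each death, because that overlap is what the estimation described below depends on.

```python
import unicodedata
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    source: str  # documentation group that reported the death (hypothetical field)
    name: str
    date: str    # date of death as "YYYY-MM-DD"
    place: str


def normalize(text: str) -> str:
    """Lowercase, strip accents, and collapse runs of whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(stripped.lower().split())


def merge_lists(*lists: list[Record]) -> dict[tuple[str, str, str], set[str]]:
    """Drop incomplete records, then merge exact matches on a normalized key.

    Returns a mapping from (name, date, place) to the set of sources
    that reported that death.
    """
    merged: dict[tuple[str, str, str], set[str]] = {}
    for records in lists:
        for r in records:
            if not (r.name.strip() and r.date.strip() and r.place.strip()):
                continue  # incomplete record: cannot be matched reliably
            key = (normalize(r.name), r.date.strip(), normalize(r.place))
            merged.setdefault(key, set()).add(r.source)
    return merged


# Toy lists with placeholder names (not real victims).
list_a = [Record("A", "Person One", "2013-05-02", "Aleppo"),
          Record("A", "", "2013-05-03", "Homs")]
list_b = [Record("B", "person  one", "2013-05-02", "ALEPPO"),
          Record("B", "Person Two", "2013-05-04", "Idlib")]
merged = merge_lists(list_a, list_b)
print(len(merged))  # 2
```

Exact matching is deliberately naive here. It would miss "Person One" spelled a second way, and that kind of near-duplicate is what the machine-learning-plus-review approach is meant to catch.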
But as important as the effort to compile all known deaths has been, it is actually just a first step: Patrick reminded me that the "challenge lies in figuring out what is going on that you can't observe."
Revealing the Uncounted
For now, the UN limits its reporting to enumeration of the known dead, while warning it is probably far below the real figure. But HRDAG continues to work with the data, using models it developed in other conflicts. Patrick explained the process of estimating the full scope of violence by asking me to imagine two pitch-black rooms of unknown sizes. Into each you throw a set of rubber balls that emit a sound when they collide. In the first room you hear them hit each other rapidly: cthock! cthock! cthock! In the second room, you hear only an occasional cthock as the balls mostly sail past each other in the dark. Which room is bigger?
Now imagine that the room is all the killings in Aleppo in one month. There are multiple, incomplete lists from Syrian groups for that period: "the balls are the databases, and we observe when they collide or don't collide. That tells us the size of the room." The technique is known as multiple systems estimation, a form of capture-recapture.
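The analogy corresponds to a classic technique. The simplest case uses two lists and the Lincoln-Petersen estimator from capture-recapture studies of animal populations. If one list records `n1` deaths, a second records `n2`, and `m` deaths appear on both, the total is estimated as `n1 * n2 / m`. Many collisions (a large `m`) imply a small room, and few collisions imply a large one. The Python sketch below rests on strong assumptions: the lists are independent, and every death is equally likely to be recorded. Real conflict data violate both, which is why HRDAG works with more than two lists and richer models. This is not HRDAG's model, and the function names are hypothetical.

```python
def lincoln_petersen(n1: int, n2: int, m: int) -> float:
    """Two-list capture-recapture estimate of the total number of deaths.

    n1: deaths on list 1, n2: deaths on list 2, m: deaths on both lists.
    Assumes the lists are independent and every death is equally likely
    to be recorded (assumptions that rarely hold for conflict data).
    """
    if m == 0:
        raise ValueError("the lists share no records; the estimate is undefined")
    return n1 * n2 / m


def chapman(n1: int, n2: int, m: int) -> float:
    """Chapman's bias-corrected variant, which stays finite when m == 0."""
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


# Two hypothetical "rooms", each sampled by two lists of 100 records.
crowded = lincoln_petersen(100, 100, 50)   # many collisions: small room
sparse = lincoln_petersen(100, 100, 10)    # few collisions: big room
print(crowded, sparse)  # 200.0 1000.0
```

The two lists have the same length in both rooms. Only the overlap differs, and the estimate moves from 200 to 1,000. Chapman's variant is included because it stays defined even when the lists share no records.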
They have only begun modeling, but Patrick thinks the total may prove to be twice the number of documented deaths. For every one of the 191,369 confirmed deaths, there may be another victim known only to loved ones, or to no one at all.
Why Does It Matter?
To Patrick, the total number is not what is important. But filling out the picture addresses the problem of selection bias, and that is important: "We are pretty sure that the killings we know about are different from the ones we don't know about."
For one, large-scale killings have many "rememberers" who saw the violence or its aftermath, heard about it, or lost family members. However, when one man is taken from his house in the night, or killed while searching for food and fuel, there are few witnesses, and they may fear retaliation. A bias emerges in the data towards large events. In addition, researchers may have contacts mainly in one religious group or another, or may not move around freely. During surging violence, the reported number of killings often drops: it's simply too dangerous for enumerators to work. For all these reasons, apparent patterns in the raw data can be deeply misleading.
But if you can accurately estimate the true size of the room, and of the slices of data broken down by time, space, or type of victim, then you can identify patterns. Did killings rise or fall after a town changed hands, or when a military unit was deployed? Was one group targeted?
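The Python sketch below shows how raw counts can mislead. It applies the same simple two-list estimate separately to each stratum, here two months. The figures are invented for illustration. In the second month violence surges, enumerators reach fewer people, and the lists overlap far less. As a result, the documented total (deaths on either list) falls while the estimated total rises. The sketch carries the same simplifying assumptions as any two-list estimate, and its stratum labels and numbers are hypothetical.

```python
def two_list_estimate(n1: int, n2: int, m: int) -> float:
    """Lincoln-Petersen estimate n1 * n2 / m (requires m > 0)."""
    return n1 * n2 / m


def compare_strata(strata: dict[str, tuple[int, int, int]]) -> dict[str, tuple[int, float]]:
    """For each stratum, return (documented deaths, estimated total).

    Documented deaths are the union of the two lists: n1 + n2 - m.
    """
    results = {}
    for label, (n1, n2, m) in strata.items():
        documented = n1 + n2 - m
        results[label] = (documented, two_list_estimate(n1, n2, m))
    return results


# Invented figures: (list 1 count, list 2 count, records on both lists).
strata = {
    "month 1": (300, 300, 150),
    "month 2": (200, 200, 20),   # surge: lists overlap much less
}
for label, (documented, estimated) in compare_strata(strata).items():
    print(f"{label}: documented {documented}, estimated {estimated:.0f}")
```

In this toy case the documented count drops from 450 to 380, while the estimate rises from 600 to 2,000. A reader of the raw data alone would conclude that violence declined.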
The ability to answer these questions informed Patrick's testimony in several groundbreaking trials. In The Hague he helped dismantle Slobodan Milošević's defense that Serb forces were not to blame for the violence. Patrick was also an expert witness in the first genocide prosecution of a former head of state in a national court, demonstrating that under General Ríos Montt the indigenous population of Guatemala was deliberately targeted.
The Syrian conflict has no end in sight. As the UN High Commissioner noted, "the Security Council has failed to refer the case of Syria to the International Criminal Court, where it clearly belongs." However, if parties to the conflict ever face justice, the Syrian documentation effort and the Human Rights Data Analysis Group will deserve some credit.
A Final Word of Caution
But what about big data and the mobile revolution? Won't the profusion of mobile video and citizen journalists, from Aleppo to Ferguson, fill in the gaps and end selection bias?
Patrick shook his head: in fact, technology may make the bias problem worse, because it "shows you what you already were seeing anyway." The record may be more complete than ever for a major incident, but "when someone is executed in the night and buried under a trash heap we still don't know about it." And even as selection bias increases, it becomes harder to see: we have the "appearance of perfect knowledge, when in fact the shape of that knowledge has not changed that much."
So what is the answer? "Better models," Patrick shrugged. "Technology is not a substitute for science."