data & society
– – – –
danah boyd’s new research institute:
The Data & Society Research Institute is a new think/do tank in New York City dedicated to addressing social, technical, ethical, legal, and policy issues that are emerging because of data-centric technological development. Data & Society will launch in 2014.
In under six weeks, our amazing team produced six guiding documents and crafted a phenomenal event called The Social, Cultural & Ethical Dimensions of “Big Data.” On our conference page, you can find an event summary, videos of the sessions, copies of the workshop primers and discussion notes, a zip file of important references, and documents that list participants, the schedule, and production team.
d&s reporting on data and civil rights conference.. oct 2013:
Today, we’re releasing all of the pre-conference primers, write-ups from the workshops and breakouts we held, the videos from the level-setting opening, and an executive summary of what we learned: http://www.datacivilrights.org.
from executive summary:
The event had three main narratives: (1) the roots and contemporary state of civil rights issues, which centered primarily on discrimination on the basis of protected classes, and issues of privacy; (2) the inner workings of the technology and how and when it can create discriminatory outcomes and impacts, particularly through algorithmic decision-making; and (3) the next steps for these discussions, especially in the areas of policymaking, government actions, technology development, generating social change, industry innovation, and new research.
These gaps in knowledge raised an opportunity to address key questions: How can civil rights values be embedded into the designs of emerging technologies? Is it possible to develop a new civil rights science that aids the cause of equal opportunity and social justice?
Discussions also focused on what it takes to achieve, and how to measure, equality and equity. For example, all schools might receive equal funding, but if one school is teaching students who require more resources, ..
To the surprise of many, some of the most insidious examples of “big data” applications or outcomes represent unintentional discrimination – byproducts of technical designs and practices that were never intended to cause harm and are hard to prevent from happening again.
A recurrent theme centered on the need for new or expanded uses of existing laws and regulations, particularly the Fair Credit Reporting Act, to protect civil rights across sectors in response to, and in anticipation of, the flood of data.
In this sense, the convening emphasized the possibility for technology to optimize for civil rights goals, above and beyond what the law provides. Many questioned whether legislation and policy work can keep apace with new technologies, and whether technology can offer a more efficient avenue than laws for designing and implementing systems that perpetuate civil rights values.
yes. this. as long as it’s not added onto existing laws. refresh.
In conversations about positive uses of technology, speculation trumped evidence, prompting critics to be wary of the claims made by those seeking to use technology for good.
what if it’s our obsession with – and/or dependence on – evidence that is keeping us from a doable vision of good. ie: that we’ve not yet experienced because it’s not yet been evidenced.. because it’s not yet been experienced… etc.
[end of executive summary]
– – – –
videos from conference:
intro – danah – all social systems are guided by 4 forces: market, law, social norms, tech/architecture – focus on market and tech – so when having convo in dc – don’t jump to what companies can/should do
mc – Solon Barocas
Harlan Yu – spam tracker – training data – patterns/learning based on past examples – machine learning
Ulysses Andrews – error rate – predictions show disparity – ie: google naming
Solon – the big of big data is exciting because it might be able to detect the often invisible
Latanya Sweeney – reproducing prejudice – audience exclusivity – mechanism producing these results… could be advertiser intent – then to max income – more clicks. or opposite, if advertiser equity in intent.. clicks took over all. etc
Ashkan Soltani – price discrimination on law – 2012 research on wall st
so much seems based on consumerism.. what if money were no object. what if that is our pathway
Cynthia Dwork – fairness through awareness – hiding sensitive info can erode both fairness & utility – ie: carnegie mellon grad criteria.. dropped # of yrs ness.. the metric to figuring how similar two people are for a task.. results of own learning algorithms
Jenna Burrell – interpretability – internet in ghana not same as in us – esp dating web sites – spam filtering purely because you’re in ghana – machine learning – 3 levels of opacity: proprietary, complexity (require expert knowledge to understand), interpretability – end user facing component
interesting – how to understand complex w/o being an expert in it
Solon – often get greater performance at the expense of interpretability
Ed Felten – accountability while allowing people to withhold info
David Robinson – new tools needed to protect civil rights – turn it into a science of civil rights.. now – the returns to having precise rules are going up – census as original big data
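a rough sketch of the idea in cynthia dwork's note above – that people who are similar for a task should be treated similarly. the names, feature values, scores and the distance function here are all made up for illustration (assumptions, not anything shown in the talk). the check flags any pair whose scores differ by more than the task metric says they should:

```python
from itertools import combinations


def lipschitz_violations(scores, distance):
    """Return pairs whose score gap exceeds their task-specific distance.

    scores:   dict of person -> probability of a positive outcome
    distance: function (a, b) -> how different a and b are FOR THIS TASK
    """
    violations = []
    for a, b in combinations(sorted(scores), 2):
        gap = abs(scores[a] - scores[b])
        if gap > distance(a, b) + 1e-12:  # small tolerance for float noise
            violations.append((a, b))
    return violations


# hypothetical applicants with a single made-up task-relevant feature
features = {"ana": 0.90, "ben": 0.88, "cy": 0.40}


def task_distance(a, b):
    # assumption: the "metric" is just the gap in that one feature
    return abs(features[a] - features[b])


# hypothetical outputs of some scoring system
scores = {"ana": 0.80, "ben": 0.50, "cy": 0.35}

violations = lipschitz_violations(scores, task_distance)
print(violations)
```

ana and ben are nearly identical for the task but get very different scores, so that pair is flagged. the hard part, as the note says, is agreeing on the metric itself – the code just assumes one exists.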
– – – –
as i start reading through the report on the education focus – http://www.datacivilrights.org/pubs/2014-1030/Education-Writeup.pdf i’m thinking – surely this idea that perhaps we are gathering – focusing – on the wrong data pertains to other sectors as well. then i get to this.. and think – yeah.. but perhaps not as visibly (because it’s so invisible to many) or as dangerously (because we are spending hours of our days on it, ie: school that kills curiosity et al):
Marginalized student populations may be distrustful of the organizations and platforms that mediate their education because of the ways they have been historically excluded from high-quality educational opportunities, and EduTech systems may not be designed in ways that are cognizant of student needs and situations across the board. For instance, being tardy to class may be different for a student who has major caretaking responsibilities at home than for a more privileged student, but the system might log it in the same way for both parties, and impose consequences accordingly.
e. 3. – Groups with the most at stake have the deepest distrust because data has been used against them in the past.
ie: For Kay, the DynaBook was meant to help build capacity so that children (and adults too) would create their own interactive learning tools. The DynaBook was not simply about a new piece of hardware or new software, but about a new literacy, a new way of teaching and learning. And that remains largely unrealized. ness… http://www.amazon.com/dp/B00QEDGMMW/ref=r_soa_w_d
– – – –
report on new research http://www.datacivilrights.org/pubs/2014-1030/NewResearch-Writeup.pdf –
In each of the spheres of concern discussed in other sessions, the group summarized the most pressing concerns:
Education — what is meaningful consent and data ownership; what is useful educational data?
Housing — housing data has a deep capacity for positive and negative impacts in a system that is remarkably hard to reverse engineer. In particular, this is exacerbated by the prevalence of very large housing companies, who are responsible for a range of decisions distinct from those even 20 years ago.
Employment — how are algorithmic processes and different types of data brought to bear on hiring decisions?
Criminal Justice — much of the useful data currently exists in limited forms.
Health — public health authorities might have an obligation to collect even more data, but there might be significant barriers to how they use and distribute that data, particularly sensitive data surrounding stigmatizing medical conditions like H.I.V. However, this is one area where privacy is a diminishing value compared to the overall goals of improving ..
what if self-talk gets us here – taking care of (in many cases by making irrelevant) the other spheres of concerns…
In every case, however, good and bad had to be contextualized by for whom it was good or bad. This is why an analytical language of power (one beyond “good” and “bad”) is required to advance the discourse on data. There is a need to better describe the structural character of commercial, scientific, governmental, and private individual uses of data, and data analytic capabilities.
– – – –
from write-up of employment – http://www.datacivilrights.org/pubs/2014-1030/Employment-Writeup.pdf –
This session approached the question of Big Data’s growing influence on employment, with a specific focus on hiring practices.
a mechanism that gives the most human/honest look at a person..but perhaps more important.. frees us all from days filled with proving.. to days filled with doing the thing(s) we can’t not do.. (whether or not hiring becomes irrelevant)
– – – –
write up from finance – http://www.datacivilrights.org/pubs/2014-1030/Finance-Writeup.pdf –
Another line of questioning revolved around new fair and just alternatives to traditional banking. First came the question of what it means to be “alternative” within this context. Does it mean non-regulated or outside? Or does alternative mean innovative?
perhaps innovative – as in money becomes irrelevant
– – – –
from write-up of tech development – http://www.datacivilrights.org/pubs/2014-1030/TechnologyDevelopment-Writeup.pdf –
Despite some discussion during the panel, the group acknowledged that a more detailed definition of fairness still needed to be developed. Without a concrete description of what constitutes fairness, consistent regulation cannot be created.
how can we? – is fairness really algorithmable..? like Yaacov’s definition of democratic ed (asking yourself everyday what it is) and Michael’s definition of revolution (instigating utopia everyday) ..perhaps instead we use tech to allow everyone to do whatever they want. everyone getting a go everyday.
perhaps that we can’t define fairness… is a blessing… meaning we can’t then regulate it..
In addition, once criteria of fairness are defined, there need to be appropriate techniques for testing for that fairness. This will be a process that needs careful design, as gathering the data necessary for such tests could (of course) produce new opportunities for inequality. Therefore test cases must be thoroughly assessed for their representation of all involved parties from users to companies to vendors and beyond.
Technical researchers started questioning whether it was possible for those working with data to make assertions about their practices and for those assertions to be validated technically without revealing the data itself. This, alongside the broader question of technical auditing, is a fruitful area for further technical research.
perhaps our greatest potential to be afforded a hiding yet in public ness – is when everyone is busy with something else (they can’t not do.) ie: now any/all data is fully transparent/accessible – but everyone is too usefully preoccupied to dig into any data for ill purposes – security becomes irrelevant
– – – –
We need to work across sectors to imagine how we can create a more robust society, free of the cancerous nature of inequity. We need to imagine how technology can be used to empower all of us as a society, not just the most privileged individuals.
The material we are releasing today is a baby step, an attempt to scope out the landscape as best we know it so that we can all work together to go further and deeper. Please help us imagine how we should move forward. If you have any ideas or feedback, don’t hesitate to contact us at: nextsteps at datacivilrights dot org
2014-1015 report – link is a pdf download:
We’ve distilled Data & Society’s first year into a report that we hope you’ll check out.
on captivation (& algorithms)
traps as artwork and artwork as traps
traps of conjunction of 2 diff players
designed to be hidden.. but amazing intensity you can read out of them. traps as constructed environment
17 min – when i gamble i feel like a rat in a trap
i of t very much about our bodies
15 min – on not able to get to privacy.. but able to get to equal access…(?)
a version of neil’s something else ness?
29 min – on not being able to pull nsa out of i of t. but rather create spaces to play
sounds like tim berners-lee – in weaving the web – on – to be a totality – can’t exclude anything..
33 min – not that people use phones to govern.. but to provide governance goods..
37 min – to me the epistemological problem.. the material world might get filled with drm (digital rights management).. with objects that decide if used by right person, then report deviance… … engineers i talk to – i of t as a loss leader for the data. chip prices are falling.. several power issues need to be solved before reach level.. value proposition is in data flow.. not in selling of devices.
43 min – audience – on not market or not market – but it seems you’re saying the market is around the data.. and the data is value.. .. so then we’d have – not a market for data but a market for chips.. all you’d be regulating is battery life..
45 min – i would like to see a possible path for what that idea would be..
48 min – audience – on the i of t – perhaps becoming unintended consequences… with real material impact, ie: sudan
49 min – most of what we write about the i of t – is about imagination.. so convo not yet evolved enough to be about waste.. and the aspiration is that batteries would last for decades..
54 min – i’m sort of a fan of the right to forget initiative.. might help us feel better about our right to privacy
on cultural heritage
7 min – what does it mean to operate a museum in a world with the internet
9 min – what if you could go to the museum and remember your visit.. without having to mess around with device.. we want to encourage people to have a heads up visit.. we built the world’s most complicated bookmarking system…
13 min – everything has a permanent url.. idea – in 100 yrs.. url still works..
16 min – what if there were a way to take in this data.. (at smithsonian).. and say – we will keep this stuff safe from the present and for the future..
19 min – on when data will quit being poison to players and become rich trove of heritage..
always a revisiting.. always an interpretation
21 min – what if there was a way to actively preserve more voices than we do now…
everyday convo ness.
24 min – w/our collections website – most of metadata is garbage – but what we did – say – this thing exists… and you have nothing or you have something you can share with somebody else – and we need to get out of the way. trying to create a system where things achieve weight mass in the universe.. that they become communal proof.. we let people find their own meaning in them… opens up for all of us to have opinions
26 min – danah – what’s interesting is how we think of intellectual property now and how we will think of it in the future… ie: artifacts i donate – that is my history …
28 min – i have the luxury of my position because you (user) think this is important… the opportunity we have now is for a kind of voluntary participation
30 min – about longevity… and people wanting to participate – ie: my life is a part of this larger story. then institute saying.. what buffer do we need around this
32 min – audience – for me.. the visit was a lot more than the things i saw….
34 min – our goal is not to be your memory.. but to help you make those larger constructions (we spend a ton of time on referencing.. needing to know where everything is.. to make what the pen affords possible)
36 min – on smithsonian being a reliable institution/source you can build on.. add to larger narrative around the object
45 min – what becomes important is the stories we tell (on design vs art museums)
47 min – on weird pants from the past – and what is the ancillary material around it.. when you watch how people have used the internet… you start to see that same motivation of – what is there a way to start to preserve some of that.. w/o being creepy..
again – output ness.. document everything ness
48 min – on recall being a power dynamic.. ie: tv, the thing that forces us all to get together… when web happened.. we could discover/arrive at an argument (something of substance) on their own terms..
53 min – danah – on relationality you can do in digital that you can’t do in physical… answer – shape analysis.. how objects are framed.. bridging across institutions.. data visualization.. ie: timeline
56 min – the more you dig through relationality… not replacing curatory insight.. but different ways you’re invited to see… great opportunity of having data at these multiple levels..
deep learning, machine perception, and the future of memory
6 min – deep learning is basically neural networks.. way to match inputs to outputs
9 min – convolutional network
11 min – on (template?) for images: https://en.wikipedia.org/wiki/Gabor_wavelet
all ways of solving problem of how to represent images have converged on same thing – gabor wavelet ness
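for reference, a minimal sketch of what a gabor filter looks like in code: a gaussian envelope multiplied by a cosine wave at some orientation. the function name, parameter names and default values below are my own picks for illustration, not anything from the talk:

```python
import math


def gabor_kernel(size=7, wavelength=4.0, theta=0.0, sigma=2.0, gamma=0.5, psi=0.0):
    """Build the real part of a 2D Gabor filter as a list of rows.

    size:       kernel width/height in pixels (odd)
    wavelength: period of the cosine carrier, in pixels
    theta:      orientation of the stripes, in radians
    sigma:      width of the gaussian envelope
    gamma:      aspect ratio (how stretched the envelope is)
    psi:        phase offset of the carrier
    """
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # rotate coordinates so the carrier runs along theta
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            carrier = math.cos(2 * math.pi * xr / wavelength + psi)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


kernel = gabor_kernel()
for row in kernel:
    print(" ".join(f"{v:+.2f}" for v in row))
```

printing it shows alternating positive and negative stripes that fade toward the edges – an oriented edge detector, roughly the kind of pattern that first-layer filters in image networks tend to end up resembling.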
14 min – deep learning starting 2012
20 min – documenting everything outside of you –
23 min – the ambient stuff that you go about your day – what if that becomes searchable..
25 min – at the cafe – visual search – using image similarity object.. cluster by particular images
27 min – why i want it – the past is one blob.. and i’m deeply nostalgic
29 min – it’s my past – i should be able to use it if i want.. not depending on how good my memory is.. and the experience of watching yourself in an experience you had before
stephen wolfram doing it
31 min – can’t talk about this w/o talking surveillance. you’re already being recorded… and you don’t have control. it’s almost like a right to bear arms.. ie: whose evidence can be admissible.. who gets to know.. in some cases advertisers know more about you than you do now….
snapchat designed in the vein of live now… throw it away.. there’s more later
34 min – on cell phones – and before they came to be – people saying – i don’t want to be reachable all the time. google glass – recording everything so you don’t have to be looking at it now.
35 min – if we know we’re on record.. do we become more honest… nicer.. or more fake.. honing this image all the time
36 min – design challenge – so that you are free to be in the moment
adam at clarifai
43 min – i’m not sure this would kill serendipity
47 min – q: why keep data around.. adam – keeping everything can be helpful.. you don’t know what you might want or can better understand later
49 min – q: how the role of interaction will change.. adam: the tools this can give you… you don’t have to remember it the same way.. an image is not a memory
52 min – on people who want to hold a grudge…
57 min – a major battlefront is if your employer will let you wear this at work
unless this changes all that.. no?
59 min – q: on the narrative clip documenting biases…. adam: you’ll be able to tailor it.. it’s machine learning.. if built that enables people to teach it well.. it will.
so there’s the need for synchronicity – everyone having something else to do – to keep us away from grudge ness et al
problematic if not everyone has it..
53 min – q: since constantly reshaping our narrative, how did you experience this.. how did it change you? … adam: i felt free not to be forced to take my phone out to take pics.. esp with my daughter.. that was really great..
55 min – i do like the feeling of – when something awesome happens.. that i probably have a photo of it
1:06 – my interest.. in building things that help people ask/answer the right questions
deep learning ness
mission of debt collective: 1\ education empowerment 2\ cultural intervention 3\ collective action
erased 33 mill of ed and medical debt
current campaign – corinthian collective – now corinthian150 and growing.. hit a capacity issue
14 min – it’s a data driven society… for who..
24 min – family doesn’t get what i’m doing at all
– – –
Databite No. 42: Maurice Mitchell
26 min – a lot of our work is around staking claim to black people’s humanity. if black people are in fact human.. then logical things should follow that..
nationality: human ness
38 min – danah asking – on connecting people (rapid response et al) through networks.. without having to do it through institutions (previously constructed)
42 min – how we understand organization needs to be more fluid
45 min – physical space is really important in organizing…
53 min – conference calls, ability to video tape anonymously and upload to cloud instantly, ability to communicate w/o wifi, livestreaming, …
55 min – documenting the enormity of this movement as we’re doing it… how to use data to help tell our story
1:01 – democratizing discomfort… ie: stopping traffic not same as people dying
1:07 – on how open/clear it is that we are being surveilled … social control
in mid read of shock doctrine:
current tweet from data and society:
looking through his tweet stream while livestream gets rolling:
notes from livestream:
data is always lying to you.. but if we know that we can fix it.. can’t use data in its raw form… ie: not what it seems…last century…. solved this.. ie: big data doesn’t work.. but now have so much more data.. we forgot what we learned then…
if we don’t know what we don’t know and we compare two different counts.. the stakes of ignoring this.. huge
stats rarely about magnitude.. but rather .. pattern…
what you don’t know is systematically diff than what you do know
i’m not talking about models that give us explanations… talking about models that help us figure out what’s not in our sample.. ie: data base of iraq body count
what does it mean to not know about something.. it means we have zero sources… so the notion of a source.. gives us our first insight about what it means to have knowledge in the world.. so ie: count proportion that has certain number of sources…
sources..? – who defines that..? i mean we’ve seen the exponential fly of ie: rumor ness
what kinds of events are covered by zero sources..
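a toy version of the "how much is covered by zero sources" idea: with two overlapping lists, the size of the overlap hints at how much neither list caught (the two-list capture-recapture estimate, often called lincoln-petersen). this is a simplification – the record ids are made up, and it assumes the two sources are independent and every event is equally likely to be recorded, which is exactly the kind of hidden assumption the talk warns about:

```python
def lincoln_petersen(list_a, list_b):
    """Estimate total population size from two overlapping source lists.

    Assumes (strongly) that the sources are independent and that every
    event has the same chance of being recorded by each source.
    """
    a, b = set(list_a), set(list_b)
    overlap = len(a & b)
    if overlap == 0:
        raise ValueError("no overlap between sources; estimate is undefined")
    return len(a) * len(b) / overlap


# hypothetical record ids: source a saw records 1-60, source b saw 41-90
source_a = {f"r{i:03d}" for i in range(1, 61)}
source_b = {f"r{i:03d}" for i in range(41, 91)}

documented = len(source_a | source_b)            # events with at least one source
estimated_total = lincoln_petersen(source_a, source_b)
undocumented = estimated_total - documented      # events with zero sources

print(documented, estimated_total, undocumented)
```

here 90 events are documented, but the small overlap (20) suggests about 150 in total, so roughly 60 would have zero sources. with more than two lists and dependent sources the real methods get a lot more involved; this is only the simplest case.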
i’m not suggesting rise of isis is a result of bad data analysis
bad data analysis much worse than no data analysis..
on stats.. now we know something we didn’t know before..
? do we really…?
problem is.. assumption hidden in there.
indeed.. always.. no?
can figure 1500 deaths per year by police
rather than pattern
problem.. w data always lying to us – looking at curves of trends over time, ..ie: estimated deaths pretty close.. pattern roughly right. but not in hama… we would miss really huge peak jan 2013 – turns out govt retook hama
data is not a representation of the world.. it’s terrific work done by people.. don’t make it a good stat rep of reality..
ie: 2005-6 – para military demobilization.. putting down weapons.. changed forms.. then killing each other.. so nobody occupied them.. so this crucial piece of colombian history goes completely undocumented… so says.. this was successful process for decreasing violence
until we presented these data.. no one had looked at these 5 data sets together before…
often rather than patterns.. get beautiful graph of how data was collected..
people doing fantastic work.. but not statisticians… people are making policy on this..
sometimes we think if we have really big data.. this problem goes away.. tech gives more info.. but doesn’t address problems that some areas are just dead zones… but we get deluded by bigness
stats generally about comparison.. not magnitude… mag doesn’t tell story… if we don’t have way to tell stories by comparison.. that affects bias
when we talk big data.. we think we have all the data…
or the right data for a reboot…. ie: self talk as data.
we’ve got to get the story right.
counting and estimating.. sometimes turn out..
if reduce uncertainty.. it’s at cost of increased bias..
danah asking about malpractice – pat telling her they approach problems diff.. he goes to one tiny point to see right ness.. she goes to strategic way to fix entire problem.. we need to start talking about bad data analysis as bigger than oops
q: who benefits from this political position of accountability… being counted/not counted/governed/not governed.. messy.. the idea that we need to re enforce good stats.. leaves me uneasy
a: unpack notion of accountability… 1\ of sr military/police officials against gross human rights violations… i’m not talking about accountability of powerless people.. but of powerful for their actions… 2\ for data scientists to assure we’re getting story right.. i don’t have idea how to talk about it as strategic goal.. just able to put up as issue… we have expertise and getting more and more expertise… customers don’t have that expertise.. so don’t know if we’re right/wrong.. so norms in our field as ethical foundation…
now to your q on people who evade being counted… on getting more data… and graphs change… so all our data depends on who we talk to…
again – call for a reboot – global do over – via ie: self talk as data
from reading pdf.. toward end:
The vision Asimov described, of students ‘following their own bent’ – the notion that not only can a learning plan adapt to a student’s pacing, but also enable individualized pursuit of interest – is endlessly reiterated in promises of personalized learning. Positive as this possibility sounds, current infrastructures may not be prepared for the practical realities of students pursuing their own interests.
What if students wish, for example, to not go to college, or to play games instead of completing assignments? To what extent do parents, school districts, or future employers really want students to pursue their own interests?
Asimov (1988) uses the analogy of baseball to describe the pursuit of interests: “You learn all you want about baseball, because the more you learn about baseball the more you might grow interested in mathematics to try to figure out what they mean by those earned run averages and the batting averages and so on. You might, in the end, become more interested in math than baseball if you follow your own bent, and you’re not told.” There is an assumption that left to pursue their own interests, students will gravitate toward creative, fulfilling, socially valued intellectual pursuits. But what if an interest in baseball doesn’t lead to an interest in math?
What if an interest in baseball leads to a deeply satisfying sports hobby or playing baseball video games? Or what if an interest in baseball comes at the expense of an interest in math, literature, or other topics?
Another possibility is that a student really fulfills this expectation, that a love of baseball does lead to a love of math and that love of math leads to a career exploring complex questions. How long will algorithmic measurement allow between the child’s interest in baseball and her/his demonstrated interest in math before suggesting something else?
Returning to Asimov’s ideal of students following their interests, when would the algorithm declare the student’s trajectory a success or failure? In the realities of an iterative world, which focuses on small gains, such as test scores, rather than larger ones, such as well-being or job satisfaction, how is the open-ended process of intellectual discovery accounted for? Algorithms can only measure what they are programmed to measure. Given the limitations of technologies in determining the success of an open-ended process with no clear outcomes until the outcomes are clear, how would progress be measured, or allowed, and what data or meta-data are available to be part of the calculation?
databite 109 – Safiya Umoja Noble
At 5:30PM ET this evening, we’ll be livestreaming Databite 109 with @safiyanoble. Watch here: https://t.co/LTj7qdtUPR
Original Tweet: https://twitter.com/datasociety/status/996425725971349504
this talk is based on her most recent book, Algorithms of Oppression – requested from library
if the tech story could include a story of these patterns of global exploitation.. we might have a possibility for organizing and thinking in local and global ways for resisting that.. because ultimately that’s what i’m interested in.. t
the voice in academia is often not our voice.. t
begs mech to listen to every voice.. as it could be..
the most vulnerable in our society becomes the experimental groups.. the commodities.. this is where indigenous communities have very deep knowledge and also very powerful forms of resistance to that and we need to keep those things in mind .. t