Data science

from Fast Company
2 days ago

How AWS-powered Next Gen Stats changed the NFL forever

Next Gen Stats began in 2015, when the National Football League deployed RFID chips in player shoulder pads and even in the football itself, enabling the league to capture location data multiple times per second through sensors installed throughout stadiums.
from Harvard Gazette
2 days ago

Breaking chess's rating stalemate - Harvard Gazette

This is the conundrum of elite chess. The stronger the players, the greater the odds of the match ending in a draw. "What ended up happening," said Mark Glickman, senior lecturer in the Department of Statistics and longtime chess enthusiast, "is that these top players were not having their ratings change very much, just because the games would be drawn all the time."
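The excerpt above does not name a rating formula, so the following is only a minimal sketch under the assumption of a standard Elo-style update. The ratings and the K-factor of 10 are illustrative values, not figures from the article. It shows why a draw between closely rated players barely moves either rating.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score for player A under the classic Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def updated_rating(rating_a: float, rating_b: float, score_a: float, k: float = 10.0) -> float:
    """New rating for A after one game; score_a is 1 (win), 0.5 (draw), or 0 (loss)."""
    return rating_a + k * (score_a - expected_score(rating_a, rating_b))


# Hypothetical ratings chosen for illustration only.
draw_equal = updated_rating(2800, 2800, 0.5)  # equal ratings: a draw matches expectation
draw_close = updated_rating(2800, 2780, 0.5)  # slight favorite draws: a small loss
win_close = updated_rating(2800, 2780, 1.0)   # a decisive result moves the rating more

print(draw_equal, draw_close, win_close)
```

Under these assumptions a draw between equally rated players leaves both ratings unchanged, and a draw between nearly equal players shifts them by a fraction of a point.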
from WIRED
3 days ago

Sports Betting Is Skyrocketing. Will It Take Over the Olympics?

Integrity agencies monitor live betting data to detect suspicious patterns and coordinate investigations into match-fixing, collusion, and other gambling malfeasance.
from News Center
3 days ago

New Computational Biology Track Added to PhD Graduate Program - News Center

A new PhD track is being added to the Walter S. and Lucienne Driskill Graduate Program in Life Sciences (DGP) for the 2026 application cycle, to enhance student learning and build community around computational biology and bioinformatics at Feinberg. The computational biology and bioinformatics (CBB) track in the graduate program will prepare students through coursework and lectures to use modern computational approaches, including machine learning and artificial intelligence, to extract biological insight from large-scale datasets to address complex biological problems.
from InfoQ
5 days ago

Beyond the Warehouse: Why BigQuery Alone Won't Solve Your Data Problems

Data warehouses like BigQuery perform well initially but become slow, costly, and disorganized at scale, undermining low-latency operational use and innovation.
from InfoWorld
6 days ago

Snowflake debuts Cortex Code, an AI agent that understands enterprise data context

Cortex Code enables developers to use natural language to build, optimize, and deploy governed, production-ready data pipelines, analytics, ML workloads, and AI agents.
from DevOps.com
6 days ago

Why Data Contracts Need Apache Kafka and Apache Flink - DevOps.com

Data contracts formalize schemas, types, and quality constraints through early producer-consumer collaboration to prevent pipeline failures and reduce operational downtime.
from Cornell Chronicle
6 days ago

Maps offer neighborhood-level insight into American migration | Cornell Chronicle

That local exodus is documented by Cornell-led research that mapped annual moves between U.S. neighborhoods from 2010 to 2019 in detail 4,600 times greater than standard public data. Called MIGRATE, the new, publicly available dataset revealed that most of those displaced remained within the affected county - moves not captured in county-level public migration data aggregated every five years.
from Business Insider
1 week ago

Economic data is getting harder to come by, and the alternative won't help everyone

Erosion of BLS economic data undermines public data reliability and will widen information gaps as costly alternative data favors wealthy investors.
from Nature
1 week ago

Science finds its song

Scientists are translating research data into music, fostering interdisciplinary collaboration, revealing patterns, and increasing accessibility through data-driven music events.
from Business Insider
1 week ago

The under-the-radar risk that could sink America's economy

Government-produced data that underpins markets and decision-making is eroding, risking poorer decisions across economies and households.
from InfoWorld
1 week ago

Google expands BigQuery with conversational agent and custom agent tools

"Instead of treating each prompt as a one-off request, the new agent remembers what was asked earlier, including datasets, filters, time ranges, and assumptions, and uses that context when answering follow-up questions. This lets users refine an analysis progressively rather than starting from scratch each time," Satapathy added. Satapathy pointed out that this eases the pressure on developers to prebuild dashboards or predefined business logic for every possible question that a data analyst or business user could ask.
from FlowingData
2 weeks ago

Pentagon Pizza dashboard to track activities

A real-time dashboard (PizzINT) monitors pizza shop popularity around the Pentagon to track potential correlations between late-night pizza orders and military activity.
from Techzine Global
1 week ago

Alteryx and Google Cloud bring analytics closer to BigQuery

With the introduction of Live Query for BigQuery and Alteryx One: Google Edition, users no longer need to move data to run workflows. Companies that standardize cloud platforms for analytics and AI often see a gap between where data is stored and how it is prepared and used. Alteryx wants to change that by bringing analytics workflows directly to BigQuery. The promise: from data to insight to action, without compromising on security or scalability.
from Computerworld
2 weeks ago

Great R packages for data import, wrangling, and visualization

A set of R packages (dplyr, purrr, readr/vroom, datapasta, Hmisc) streamline data wrangling, importing, and analysis with faster, standardized, and reproducible tools.
from TheServerSide.com
2 weeks ago

Why Java devs should switch to Python or R for data science | TheServerSide

Python and R dominate data science front-end work, offering richer ecosystems and easier data analysis than Java for many statistical and machine learning tasks.
from CIO
2 weeks ago

5 perspectives on modern data analytics

Data/business analytics is the top IT investment priority, yet analytics projects often fail due to poor data, vague objectives, and one-size-fits-all solutions.
from Computerworld
2 weeks ago

Tableau re-engineers dashboards, adds new analytics tools for business analysts

Tableau 2022.3 adds Data Guide and Table Extension, dynamic dashboards, event auditing, and performance/cost optimization to simplify self-service analytics for business users.
from Cmxhub
1 week ago

Ready to Nerd Out About Community Data? Join Richard Millington's Workshop at CMX Summit 2023

Learn data-driven community management techniques in a hands-on Pre-Summit workshop to increase engagement, prioritize actions, and prove community value.
from Computerworld
2 weeks ago

R syntax quirks you'll want to know

R primarily uses <- for assignment; = can also assign in some contexts and is used for default arguments and in some functions; R is case-sensitive; c() combines values into vectors.
from Business Insider
2 weeks ago

How hedge funds are tapping prediction markets and their data for an edge

Hedge funds primarily use prediction market data rather than trading on platforms like Kalshi and Polymarket.
from Fortune
2 weeks ago

How Walmart is using AI to reroute essential supplies ahead of Winter Storm Fern | Fortune

From a meteorological perspective, the winter storm sweeping across the country this weekend is a supply chain disruption in its own right: A high-pressure system from the north is smashing into a low-pressure system from the south, belting large swaths of the US with heavy snow, sleet, and freezing rain. While the snarl in the upper atmosphere could trickle down to the real supply chain on the ground, some retailers are taking steps to anticipate the impact of the storm and position their products accordingly.
from ComputerWeekly.com
2 weeks ago

Interview: Barry Panayi, group chief data officer, Howden | Computer Weekly

Our work is not about producing a list of tables with numbers in rows and columns,
from London Business News | Londonlovesbusiness.com
2 weeks ago

Is Maptive the best mapping software to conduct complex spatial analysis - London Business News | Londonlovesbusiness.com

Maptive delivers cloud-based, no-code spatial analysis and mapping that handles large datasets, automated territories, route planning, and enterprise-grade global mapping infrastructure.
from Treehouse Blog
3 weeks ago

Beginning SQL: 10 Essential Query Patterns

Recognizing common SQL query patterns enables beginners to retrieve, filter, summarize, and reason about data effectively across industries.
from moz.com
3 weeks ago

Vibe Coding Your Own SEO Tools Whiteboard Friday

You can always make it better. You can improve things. But it does give you a good taste of what can be done in vibe coding. Those are things that I made maybe in 15 minutes, half an hour. It is quite simple to get those first steps and say, "Oh, this works." Maybe you want to do some improvements, and you refine the code and what you're expecting.
from InfoQ
3 weeks ago

How Agoda Unified Multiple Data Pipelines Into a Single Source of Truth

A centralized Apache Spark-based financial pipeline (FINUDP) creates a single source of truth and a multi-layered quality framework to ensure accurate, consistent financial metrics.
from Gael Varoquaux
3 weeks ago

Stepping up as probabl's CSO to supercharge scikit-learn and its ecosystem

I'm thrilled to announce that I'm stepping up as Probabl's CSO (Chief Science Officer) to supercharge scikit-learn and its ecosystem, pursuing my dreams of tools that help go from data to impact. Scikit-learn is central to data scientists' work: it is the most used machine-learning package. It has grown over more than a decade, supported by volunteers' time, donations, and grant funding, with a central role of Inria.
from Medium
3 weeks ago

How I Fixed a Critical Spark Production Performance Issue (and Cut Runtime by 70%)

A Spark job slowed roughly 10x after data growth; diagnosing and optimizing Spark execution reduced runtime by about 70% without adding cluster resources.
from New Relic
1 month ago

The Power and Cost of Data Cardinality

The more attributes you add to your metrics, the more complex and valuable questions you can answer. Every additional attribute provides a new dimension for analysis and troubleshooting. For instance, adding an infrastructure attribute, such as region, can help you determine if a performance issue is isolated to a specific geographic area or is widespread. Similarly, adding business context, like a store location attribute for an e-commerce platform, allows you to understand if an issue is specific to a particular set of stores
from Medium
1 month ago

The Complete Guide to Optimizing Apache Spark Jobs: From Basics to Production-Ready Performance

Optimize Spark jobs by using lazy evaluation awareness, early filter and column pruning, partition pruning, and appropriate join strategies to minimize shuffles and I/O.
from www.bbc.com
1 month ago

Excel: The software that's hard to quit

Excel's ubiquity enables quick analysis but spreadsheet-based workflows and macros create maintenance, security, centralization, and AI integration problems.
from Computerworld
1 month ago

Accenture to acquire UK AI startup Faculty

Faculty, renamed from ASI Data Science, built NHS Covid predictive systems and aligns with Accenture's AI-focused Reinvention Services.
from Business Insider
1 month ago

CEO of AI training startup says humans will still be involved in data creation for decades

"When I first started this job, the main push back I always got was that synthetic data will take over and you just will not need human feedback two to three years from now," said Fitzpatrick, who joined the startup last year. "From first principles, that actually doesn't make very much sense." Synthetic data refers to data that is artificially created.
from Medium
1 month ago

Migrating from Historical Batch Processing to Incremental CDC Using Apache Iceberg (Glue 4...

Use Apache Iceberg Copy-on-Write tables in AWS Glue 4 to migrate from full historical batch reprocessing to incremental CDC, reducing redundant computation, I/O, and costs.
from www.housingwire.com
1 month ago

The spreadsheet trap: Why investor reporting still operates like it's 2005

Investor reporting offices in loan servicing rely on legacy, spreadsheet-based processes due to historical adoption, cultural inertia, and perceived transparency despite significant operational risk.
from Medium
1 month ago

Data Quality on Spark, Part 4: Deequ

Deequ enables scalable, automated data quality checks, profiling, analyzers, and suggestions on Apache Spark for open-source Data Quality assessments.
from Medium
1 month ago

Data Quality on Spark, Part 4: Deequ

Deequ provides scalable, Spark-native tools for defining, profiling, and analyzing data quality checks with Scala APIs and an optional Python wrapper (PyDeequ).
from InfoQ
1 month ago

Beyond Win Rates: How Spotify Quantifies Learning in Product Experiments

Experiments should be judged by decision-ready learning—valid and actionable outcomes that tell teams to ship, abort, or iterate—rather than by win rates alone.
from Theregister
1 month ago

AI has pumped hyperscale - but how long can it last?

Hyperscale datacenter operators nearly tripled infrastructure spending and increased quarterly operational capacity by roughly 170%, driven by surging demand for AI workloads since late 2022.
from InfoQ
1 month ago

Decathlon Switches to Polars to Optimize Data Pipelines and Infrastructure Costs

Migrating small-to-medium data workloads from Apache Spark to Polars yields major performance and cost improvements by enabling single-node execution and faster in-memory processing.
from Medium
1 month ago

Data Quality on Spark, Part 4: Deequ

Deequ is a Spark-based open-source library for expressing, evaluating, and profiling data quality checks at scale, with analyzers, automatic suggestions, and Scala/Python support.
from Medium
1 month ago

Ten Open-Source Business Intelligence Tools for Improved ROI and Productivity

Open-source BI tools deliver flexible, transparent, cost-effective analytics that enable nontechnical users to build dashboards and achieve higher ROI.
from The ODI
2 weeks ago

Data Ethics Professional #10: Advertising & Ethics - Audience Selection & Proxy Data

Digital advertising requires ethical, inclusive audience selection practices to prevent harmful exclusion and prioritize human safety alongside brand safety.
from Yahoo Creators
1 month ago

Boring remote jobs that pay at least $100,000 a year and employers can't fill fast enough

Numerous data-heavy, low-drama remote careers pay six figures and offer steady, repetitive tasks with strong demand and clear career paths.
from InfoQ
1 month ago

Breaking Silos: Netflix Introduces Upper Metamodel to Bring Consistency Across Content Engineering

Upper is based on W3C standards such as RDF for conceptual graph representation and SHACL for validation, and it enables the principle of "model once, represent everywhere" across the data ecosystem. Upper organizes concepts through keyed entities, their attributes, and their relationships across domain boundaries. The modeling grammar and validation structure are designed to maintain consistency as definitions evolve. Keyed concepts can be extended monotonically, allowing new attributes or relationships without modifying existing definitions, so domains can expand over time without breaking existing models.
from ZDNET
1 month ago

This company's AI success was built on 5 essential steps - see how they work for you

AI initiatives succeed when grounded in strong data foundations, clear user-focused goals, measurable value, governance, and an iterative approach that builds confidence and delivers outcomes.
from Treehouse Blog
2 months ago

Beginning Data Analysis: From Questions to Insights

Learning data analysis enables beginners to turn raw information into meaningful insights, spot trends, and support evidence-based decision-making across many fields.
from Medium
2 months ago

From Zero to Scala Expertise: My Step-by-Step Homework Path

Learning Scala requires overcoming unfamiliar functional syntax and errors, but mastery enables high-performance, cleaner code and access to big data frameworks like Apache Spark.
from ComputerWeekly.com
2 months ago

Interview: Paul Neville, director of digital, data and technology, The Pensions Regulator | Computer Weekly

TPR is shifting from compliance-based to risk-based regulation by building strong IT foundations, improving data, automation, and cross-organisational information flows.
from InfoWorld
2 months ago

OpenAI to acquire AI training tracker Neptune

Neptune's hosted experiment-tracking SaaS will shut down March 4, 2026; users have months to export data while stability and security fixes continue.
from Realpython
2 months ago

Introduction to pandas - Real Python

The pandas DataFrame is a structure that contains two-dimensional data and its corresponding labels. DataFrames are widely used in data science, machine learning, scientific computing, and many other data-intensive fields. DataFrames are similar to SQL tables or the spreadsheets that you work with in Excel or Calc.
from Theregister
2 months ago

MongoDB talks up its AI chops by talking down PostgreSQL

Speaking to investment analysts, he said that while MongoDB had all the elements needed to be the right foundational platform for AI workloads, it was too early to say what might be the platform of choice. However, he said MongoDB had been winning work from AI-native companies, citing a customer that recently "switched from PostgreSQL to MongoDB because PostgreSQL could not just scale."
from Theregister
2 months ago

HPE pumps AI cloud lineup with extra Nvidia capabilities

HPE upgrades Private Cloud AI with Nvidia Blackwell GPUs, GPU fractionalization, STIG-hardened NIMs, Juniper networking integration, and Alletra storage for inline data preparation.
from InfoQ
2 months ago

Reliable Data Flows and Scalable Platforms: Tackling Key Data Challenges

Uncoordinated data schema changes between application and analytics teams cause silent failures and incorrect analytics; software practices must ensure versioning and compatibility.
from Techzine Global
2 months ago

Snowflake acquires Select Star for broader data context

Snowflake has signed an agreement to acquire Select Star. This company's technology will expand Snowflake Horizon Catalog by integrating with databases, BI tools, and data pipelines. This will increase the context for AI agents such as Snowflake Intelligence. The full context of data assets is often scattered across upstream and downstream systems. This fragmentation makes it difficult to find the right data and understand the full context. In the AI era, this limited context poses a problem for both humans and agents.
from IT Pro
2 months ago

Chief data officers believe they'll be a 'pivotal' force in the C-suite within five years

CDOs will become equal or highly influential C-suite leaders as data, AI, budgets, and teams expand.
from Fortune
2 months ago

A World Bank expert thinks countries should leverage 'small AI'-and avoid competing with the biggest tech giants | Fortune

Smaller Southeast Asian countries can pursue targeted 'small AI' but require expanded data centers, reliable power infrastructure, and regulatory collaboration to scale.
from Battery Power
2 months ago

2025 Atlanta Braves Player Review: Vidal Brujan

Vidal Bruján, a versatile but previously below-average offensive player, provided needed depth for the Braves and delivered a stronger-than-expected 2025 performance.
from Ars Technica
2 months ago

Data-driven sport: How Red Bull and AT&T move terabytes of F1 info

Race teams use hundreds of sensors and high-speed, low-latency, secure data links to optimize setups, strategy, and efficiency while reducing costs.
from www.ocregister.com
2 months ago

California shoppers intensely searching for bargains

California searches for budget-related terms rose 12% year-over-year, reflecting increased consumer thriftiness amid post-pandemic inflationary pressure.
from InfoWorld
2 months ago

Improving annotation quality with machine learning

Treat annotation as data understanding to systematically reduce error rates, development time, and cost while improving dataset quality for machine learning.
from Business Insider
2 months ago

The gruesome new data on tech jobs

Data and analytics jobs really stand out, though. This sector had a Jobs Posting Index of 60, the lowest of all sectors Indeed tracked as of the end of October. That means there are 40% fewer data and analytics job openings than before the pandemic. Even worse: There is still a rising number of applications per job in this sector, according to Indeed.
from Pycoders
2 months ago

PyCoder's Weekly | Issue #709

Why Python's deepcopy Can Be So Slow: "copy.deepcopy() creates a fully independent clone of an object, traversing every nested element of the object graph." That can be expensive. Learn what it is doing and how you can sometimes avoid the cost.
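As a rough illustration of the behavior the newsletter item describes, the sketch below contrasts `copy.copy()` with `copy.deepcopy()` from Python's standard library. The nested dictionary is a hypothetical example, not taken from the linked article.

```python
import copy

# Hypothetical nested structure, used only for illustration.
original = {"name": "config", "layers": [[1, 2], [3, 4]]}

shallow = copy.copy(original)    # new outer dict, but the inner lists are shared
deep = copy.deepcopy(original)   # recursively copies every nested object

# Mutate a nested list through the original.
original["layers"][0].append(99)

print(shallow["layers"][0])  # reflects the change, because the list is shared
print(deep["layers"][0])     # unaffected, because the deep copy owns its own lists
```

Because the deep copy walks the whole object graph, its cost grows with the number of nested objects. A shallow copy may be enough when the nested parts are never mutated.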
from MarTech
2 months ago

Data readiness is the missing foundation of AI-powered marketing | MarTech

Our industry is rushing headlong toward an AI-powered future. The promise is captivating: intelligent systems that can predict market shifts, personalize customer experiences and drive unprecedented growth. Yet in that race, many organizations are short-changing or even skipping a critical first step. They are building sophisticated engines but trying to run them on unrefined fuel. The result is a quiet crisis of confidence, where powerful technology underwhelms because the marketers don't trust the data it relies on.
from Miami Herald
2 months ago

The Best AI Jobs Are No Longer Concentrated in Silicon Valley - Report

AI job growth and high salaries are shifting nationwide, with top-paying roles and remote options emerging outside Silicon Valley.
from Psychology Today
2 months ago

The Data Within

It is clean and complete. It captures almost everything I have watched over the last decade, with the exception of a couple of hours of viewing on flights or in hotel rooms. Normally, the algorithm serves up a menu of options that includes something that will satisfy me. And that's the thing about algorithms: They are tuned to normality. They make predictions based on statistical likelihoods, past behavior, and expectations about the continuation of trends.
from Business Matters
2 months ago

Managing Big Data Storage: The Role of Object Storage

Object storage provides scalable, flat-namespace management and archiving of massive unstructured big data that overwhelms traditional hierarchical storage systems.